Implementation notes: aarch64, pi4b, crypto_hash/blake3

Computer: pi4b
Architecture: aarch64
CPU ID: 410fd083
SUPERCOP version: 20221019
Operation: crypto_hash
Primitive: blake3
Time   Object size  Test size      Implementation  Compiler                                                                            Benchmark date  SUPERCOP version
10408  16028 0 0    25771 824 728  T:neon          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20220529        20220506
10460  16264 0 0    26971 824 736  T:neon          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20220529        20220506
10507  9324 0 0     20043 824 736  T:portable      gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20220529        20220506
10684  15368 0 0    24027 808 720  T:neon          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20220529        20220506
10745  8948 0 0     17627 808 720  T:portable      gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20220529        20220506
12255  16212 0 0    27594 840 728  T:neon          clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221007        20221005
12314  9932 0 0     21322 840 728  T:portable      clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221007        20221005
13824  17440 0 0    27034 816 728  T:neon          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20220529        20220506
14183  10836 0 0    20426 816 728  T:portable      gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20220529        20220506

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:1:1: error: unknown directive
blake3_avx2_x86-64_unix.S: .intel_syntax noprefix
blake3_avx2_x86-64_unix.S: ^
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:12:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx2_x86-64_unix.S: push r15
blake3_avx2_x86-64_unix.S: ^
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:13:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx2_x86-64_unix.S: push r14
blake3_avx2_x86-64_unix.S: ^
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:14:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx2_x86-64_unix.S: push r13
blake3_avx2_x86-64_unix.S: ^
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:15:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx2_x86-64_unix.S: push r12
blake3_avx2_x86-64_unix.S: ^
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:16:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx2_x86-64_unix.S: push rbx
blake3_avx2_x86-64_unix.S: ^
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:17:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx2_x86-64_unix.S: push rbp
blake3_avx2_x86-64_unix.S: ^
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:18:13: error: invalid operand for instruction
blake3_avx2_x86-64_unix.S: mov rbp, rsp
blake3_avx2_x86-64_unix.S: ^
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:19:13: error: invalid operand for instruction
blake3_avx2_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
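
Note on the errors above: `.intel_syntax` and mnemonics such as `push` belong to x86-64, so an assembler targeting aarch64 rejects the file from its first line. The following is a minimal sketch, not SUPERCOP's or BLAKE3's actual selection logic, of guarding on the compiler's predefined architecture macros. The function name `candidate_impl` is hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of SUPERCOP or BLAKE3: names the
   implementation family that could plausibly be built for the current
   target, based on compiler-predefined architecture macros. */
static const char *candidate_impl(void) {
#if defined(__x86_64__) || defined(_M_X64)
    /* x86-64 target: the Intel-syntax .S files can be assembled here. */
    return "x86-64 SIMD (sse41/avx2/avx512)";
#elif defined(__aarch64__) || defined(_M_ARM64)
    /* Assumption: Advanced SIMD is available, as on this pi4b,
       where T:neon builds and runs. */
    return "neon";
#else
    /* Anything else falls back to plain C. */
    return "portable";
#endif
}
```

On an aarch64 machine such as this one, the sketch would select "neon", which matches the implementations that appear in the timing table.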

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3.c: In file included from /usr/include/string.h:519,
blake3.c: from blake3.c:3:
blake3.c: In function 'memcpy',
blake3.c: inlined from 'compress_subtree_to_parent_node' at blake3.c:237:5,
blake3.c: inlined from 'blake3_default_hash' at blake3.c:249:3:
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h:29:10: warning: '__builtin_memcpy' reading 64 bytes from a region of size 32 [-Wstringop-overread]
blake3.c: 29 | return __builtin___memcpy_chk (__dest, __src, __len,
blake3.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: 30 | __glibc_objsize0 (__dest));
blake3.c: | ~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h: In function 'blake3_default_hash':
blake3.c: blake3.c:233:11: note: source object 'out_array' of size 32
blake3.c: 233 | uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
blake3.c: | ^~~~~~~~~
blake3.c: In file included from /usr/include/string.h:519,
blake3.c: from blake3.c:3:
blake3.c: In function 'memcpy',
blake3.c: inlined from 'compress_parents_parallel' at blake3.c:125:5,
blake3.c: inlined from 'compress_subtree_to_parent_node' at blake3.c:236:9,
blake3.c: inlined from 'blake3_default_hash' at blake3.c:249:3:
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h:29:10: warning: '__builtin_memcpy' writing 32 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
blake3.c: 29 | return __builtin___memcpy_chk (__dest, __src, __len,
blake3.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: 30 | __glibc_objsize0 (__dest));
blake3.c: | ~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2 T:avx512 T:portable T:sse41
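
The sizes in the warnings above can be checked by hand. The note reports `out_array` as 32 bytes, which is consistent with `MAX_SIMD_DEGREE_OR_2` evaluating to 2 and `BLAKE3_OUT_LEN` being 32, while the flagged `memcpy` reads 64 bytes. The sketch below only reproduces that arithmetic. The macro values are assumptions chosen to match the log, and `overread_bytes` is a hypothetical helper. Whether the flagged path is reachable at run time is not something the log shows.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, chosen to match the diagnostics on this page. */
#define BLAKE3_OUT_LEN 32
#define MAX_SIMD_DEGREE 1
#define MAX_SIMD_DEGREE_OR_2 (MAX_SIMD_DEGREE > 2 ? MAX_SIMD_DEGREE : 2)

/* Size in bytes of out_array as declared at blake3.c:233. */
static size_t out_array_size(void) {
    return (size_t)MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2;
}

/* Hypothetical helper: how many bytes a copy of copy_len bytes would
   extend past the end of out_array (0 if it fits). */
static size_t overread_bytes(size_t copy_len) {
    size_t size = out_array_size();
    return copy_len > size ? copy_len - size : 0;
}
```

With these values, `out_array_size()` is 32 and `overread_bytes(64)` is 32, which is the mismatch reported by `-Wstringop-overread`.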

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3.c: In file included from /usr/include/string.h:519,
blake3.c: from blake3.c:3:
blake3.c: In function 'memcpy',
blake3.c: inlined from 'compress_parents_parallel' at blake3.c:125:5,
blake3.c: inlined from 'compress_subtree_to_parent_node' at blake3.c:236:9,
blake3.c: inlined from 'blake3_default_hash' at blake3.c:249:3:
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h:29:10: warning: '__builtin_memcpy' writing 32 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
blake3.c: 29 | return __builtin___memcpy_chk (__dest, __src, __len,
blake3.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: 30 | __glibc_objsize0 (__dest));
blake3.c: | ~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h: In function 'blake3_default_hash':
blake3.c: blake3.c:233:11: note: at offset 32 into destination object 'out_array' of size 32
blake3.c: 233 | uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
blake3.c: | ^~~~~~~~~
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S: Assembler messages:
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:1: Error: unknown pseudo-op: `.intel_syntax'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:12: Error: unknown mnemonic `push' -- `push r15'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:13: Error: unknown mnemonic `push' -- `push r14'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:14: Error: unknown mnemonic `push' -- `push r13'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:15: Error: unknown mnemonic `push' -- `push r12'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:16: Error: unknown mnemonic `push' -- `push rbx'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:17: Error: unknown mnemonic `push' -- `push rbp'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:18: Error: operand 1 must be an integer register -- `mov rbp,rsp'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:19: Error: operand 1 must be an integer or stack pointer register -- `sub rsp,680'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:20: Error: operand 1 must be an integer or stack pointer register -- `and rsp,0xFFFFFFFFFFFFFFC0'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:21: Error: operand 1 must be a SIMD vector register -- `neg r9d'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:22: Error: unknown mnemonic `vmovd' -- `vmovd xmm0,r9d'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:23: Error: unknown mnemonic `vpbroadcastd' -- `vpbroadcastd ymm0,xmm0'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:24: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymmword ptr[rsp+0x280],ymm0'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:25: Error: unknown mnemonic `vpand' -- `vpand ymm1,ymm0,ymmword ptr[ADD0+rip]'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:26: Error: unknown mnemonic `vpand' -- `vpand ymm2,ymm0,ymmword ptr[ADD1+rip]'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:27: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymmword ptr[rsp+0x220],ymm2'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:28: Error: unknown mnemonic `vmovd' -- `vmovd xmm2,r8d'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:29: Error: unknown mnemonic `vpbroadcastd' -- `vpbroadcastd ymm2,xmm2'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:30: Error: unknown mnemonic `vpaddd' -- `vpaddd ymm2,ymm2,ymm1'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:31: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymmword ptr[rsp+0x240],ymm2'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:32: Error: unknown mnemonic `vpxor' -- `vpxor ymm1,ymm1,ymmword ptr[CMP_MSB_MASK+rip]'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:33: Error: unknown mnemonic `vpxor' -- `vpxor ymm2,ymm2,ymmword ptr[CMP_MSB_MASK+rip]'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:34: Error: unknown mnemonic `vpcmpgtd' -- `vpcmpgtd ymm2,ymm1,ymm2'
blake3_avx2_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 2, namely:
Compiler Implementations
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S: Assembler messages:
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:1: Error: unknown pseudo-op: `.intel_syntax'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:12: Error: unknown mnemonic `push' -- `push r15'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:13: Error: unknown mnemonic `push' -- `push r14'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:14: Error: unknown mnemonic `push' -- `push r13'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:15: Error: unknown mnemonic `push' -- `push r12'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:16: Error: unknown mnemonic `push' -- `push rbx'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:17: Error: unknown mnemonic `push' -- `push rbp'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:18: Error: operand 1 must be an integer register -- `mov rbp,rsp'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:19: Error: operand 1 must be an integer or stack pointer register -- `sub rsp,680'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:20: Error: operand 1 must be an integer or stack pointer register -- `and rsp,0xFFFFFFFFFFFFFFC0'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:21: Error: operand 1 must be a SIMD vector register -- `neg r9d'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:22: Error: unknown mnemonic `vmovd' -- `vmovd xmm0,r9d'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:23: Error: unknown mnemonic `vpbroadcastd' -- `vpbroadcastd ymm0,xmm0'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:24: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymmword ptr[rsp+0x280],ymm0'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:25: Error: unknown mnemonic `vpand' -- `vpand ymm1,ymm0,ymmword ptr[ADD0+rip]'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:26: Error: unknown mnemonic `vpand' -- `vpand ymm2,ymm0,ymmword ptr[ADD1+rip]'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:27: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymmword ptr[rsp+0x220],ymm2'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:28: Error: unknown mnemonic `vmovd' -- `vmovd xmm2,r8d'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:29: Error: unknown mnemonic `vpbroadcastd' -- `vpbroadcastd ymm2,xmm2'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:30: Error: unknown mnemonic `vpaddd' -- `vpaddd ymm2,ymm2,ymm1'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:31: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymmword ptr[rsp+0x240],ymm2'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:32: Error: unknown mnemonic `vpxor' -- `vpxor ymm1,ymm1,ymmword ptr[CMP_MSB_MASK+rip]'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:33: Error: unknown mnemonic `vpxor' -- `vpxor ymm2,ymm2,ymmword ptr[CMP_MSB_MASK+rip]'
blake3_avx2_x86-64_unix.S: blake3_avx2_x86-64_unix.S:34: Error: unknown mnemonic `vpcmpgtd' -- `vpcmpgtd ymm2,ymm1,ymm2'
blake3_avx2_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:1:1: error: unknown directive
blake3_avx512_x86-64_unix.S: .intel_syntax noprefix
blake3_avx512_x86-64_unix.S: ^
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:18:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx512_x86-64_unix.S: push r15
blake3_avx512_x86-64_unix.S: ^
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:19:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx512_x86-64_unix.S: push r14
blake3_avx512_x86-64_unix.S: ^
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:20:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx512_x86-64_unix.S: push r13
blake3_avx512_x86-64_unix.S: ^
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:21:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx512_x86-64_unix.S: push r12
blake3_avx512_x86-64_unix.S: ^
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:22:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx512_x86-64_unix.S: push rbx
blake3_avx512_x86-64_unix.S: ^
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:23:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_avx512_x86-64_unix.S: push rbp
blake3_avx512_x86-64_unix.S: ^
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:24:13: error: invalid operand for instruction
blake3_avx512_x86-64_unix.S: mov rbp, rsp
blake3_avx512_x86-64_unix.S: ^
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:25:13: error: invalid operand for instruction
blake3_avx512_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3.c: In file included from /usr/include/string.h:519,
blake3.c: from blake3.c:3:
blake3.c: In function 'memcpy',
blake3.c: inlined from 'compress_parents_parallel' at blake3.c:125:5,
blake3.c: inlined from 'compress_subtree_to_parent_node' at blake3.c:236:9,
blake3.c: inlined from 'blake3_default_hash' at blake3.c:249:3:
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h:29:10: warning: '__builtin_memcpy' writing 32 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
blake3.c: 29 | return __builtin___memcpy_chk (__dest, __src, __len,
blake3.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: 30 | __glibc_objsize0 (__dest));
blake3.c: | ~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h: In function 'blake3_default_hash':
blake3.c: blake3.c:233:11: note: at offset 32 into destination object 'out_array' of size 32
blake3.c: 233 | uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
blake3.c: | ^~~~~~~~~
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S: Assembler messages:
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:1: Error: unknown pseudo-op: `.intel_syntax'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:18: Error: unknown mnemonic `push' -- `push r15'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:19: Error: unknown mnemonic `push' -- `push r14'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:20: Error: unknown mnemonic `push' -- `push r13'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:21: Error: unknown mnemonic `push' -- `push r12'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:22: Error: unknown mnemonic `push' -- `push rbx'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:23: Error: unknown mnemonic `push' -- `push rbp'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:24: Error: operand 1 must be an integer register -- `mov rbp,rsp'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:25: Error: operand 1 must be an integer or stack pointer register -- `sub rsp,144'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:26: Error: operand 1 must be an integer or stack pointer register -- `and rsp,0xFFFFFFFFFFFFFFC0'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:27: Error: operand 1 must be a SIMD vector register -- `neg r9'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:28: Error: unknown mnemonic `kmovw' -- `kmovw k1,r9d'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:29: Error: unknown mnemonic `vmovd' -- `vmovd xmm0,r8d'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:30: Error: unknown mnemonic `vpbroadcastd' -- `vpbroadcastd ymm0,xmm0'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:31: Error: unknown mnemonic `shr' -- `shr r8,32'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:32: Error: unknown mnemonic `vmovd' -- `vmovd xmm1,r8d'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:33: Error: unknown mnemonic `vpbroadcastd' -- `vpbroadcastd ymm1,xmm1'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:34: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymm4,ymm1'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:35: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymm5,ymm1'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:36: Error: unknown mnemonic `vpaddd' -- `vpaddd ymm2,ymm0,ymmword ptr[ADD0+rip]'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:37: Error: unknown mnemonic `vpaddd' -- `vpaddd ymm3,ymm0,ymmword ptr[ADD0+32+rip]'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:38: Error: unknown mnemonic `vpcmpltud' -- `vpcmpltud k2,ymm2,ymm0'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:39: Error: unknown mnemonic `vpcmpltud' -- `vpcmpltud k3,ymm3,ymm0'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:40: Error: unknown mnemonic `vpaddd' -- `vpaddd ymm4{k2},ymm4,dword ptr[ADD1+rip]{1to8}'
blake3_avx512_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 2, namely:
Compiler Implementations
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S: Assembler messages:
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:1: Error: unknown pseudo-op: `.intel_syntax'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:18: Error: unknown mnemonic `push' -- `push r15'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:19: Error: unknown mnemonic `push' -- `push r14'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:20: Error: unknown mnemonic `push' -- `push r13'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:21: Error: unknown mnemonic `push' -- `push r12'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:22: Error: unknown mnemonic `push' -- `push rbx'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:23: Error: unknown mnemonic `push' -- `push rbp'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:24: Error: operand 1 must be an integer register -- `mov rbp,rsp'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:25: Error: operand 1 must be an integer or stack pointer register -- `sub rsp,144'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:26: Error: operand 1 must be an integer or stack pointer register -- `and rsp,0xFFFFFFFFFFFFFFC0'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:27: Error: operand 1 must be a SIMD vector register -- `neg r9'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:28: Error: unknown mnemonic `kmovw' -- `kmovw k1,r9d'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:29: Error: unknown mnemonic `vmovd' -- `vmovd xmm0,r8d'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:30: Error: unknown mnemonic `vpbroadcastd' -- `vpbroadcastd ymm0,xmm0'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:31: Error: unknown mnemonic `shr' -- `shr r8,32'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:32: Error: unknown mnemonic `vmovd' -- `vmovd xmm1,r8d'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:33: Error: unknown mnemonic `vpbroadcastd' -- `vpbroadcastd ymm1,xmm1'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:34: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymm4,ymm1'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:35: Error: unknown mnemonic `vmovdqa' -- `vmovdqa ymm5,ymm1'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:36: Error: unknown mnemonic `vpaddd' -- `vpaddd ymm2,ymm0,ymmword ptr[ADD0+rip]'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:37: Error: unknown mnemonic `vpaddd' -- `vpaddd ymm3,ymm0,ymmword ptr[ADD0+32+rip]'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:38: Error: unknown mnemonic `vpcmpltud' -- `vpcmpltud k2,ymm2,ymm0'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:39: Error: unknown mnemonic `vpcmpltud' -- `vpcmpltud k3,ymm3,ymm0'
blake3_avx512_x86-64_unix.S: blake3_avx512_x86-64_unix.S:40: Error: unknown mnemonic `vpaddd' -- `vpaddd ymm4{k2},ymm4,dword ptr[ADD1+rip]{1to8}'
blake3_avx512_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512

Compiler output

Implementation: T:neon
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake3.c: In file included from blake3.c:12:
blake3.c: ./blake3_static_dispatch.h:17:9: warning: 'MAX_SIMD_DEGREE' macro redefined [-Wmacro-redefined]
blake3.c: #define MAX_SIMD_DEGREE 4
blake3.c: ^
blake3.c: ./blake3_impl.h:53:9: note: previous definition is here
blake3.c: #define MAX_SIMD_DEGREE 1
blake3.c: ^
blake3.c: In file included from blake3.c:12:
blake3.c: ./blake3_static_dispatch.h:18:9: warning: 'MAX_SIMD_DEGREE_OR_2' macro redefined [-Wmacro-redefined]
blake3.c: #define MAX_SIMD_DEGREE_OR_2 4
blake3.c: ^
blake3.c: ./blake3_impl.h:58:9: note: previous definition is here
blake3.c: #define MAX_SIMD_DEGREE_OR_2 (MAX_SIMD_DEGREE > 2 ? MAX_SIMD_DEGREE : 2)
blake3.c: ^
blake3.c: 2 warnings generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon

Compiler output

Implementation: T:neon
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3.c: In file included from blake3.c:12:
blake3.c: blake3_static_dispatch.h:17: warning: "MAX_SIMD_DEGREE" redefined
blake3.c: 17 | #define MAX_SIMD_DEGREE 4
blake3.c: |
blake3.c: In file included from blake3.c:6:
blake3.c: blake3_impl.h:53: note: this is the location of the previous definition
blake3.c: 53 | #define MAX_SIMD_DEGREE 1
blake3.c: |
blake3.c: In file included from blake3.c:12:
blake3.c: blake3_static_dispatch.h:18: warning: "MAX_SIMD_DEGREE_OR_2" redefined
blake3.c: 18 | #define MAX_SIMD_DEGREE_OR_2 4
blake3.c: |
blake3.c: In file included from blake3.c:6:
blake3.c: blake3_impl.h:58: note: this is the location of the previous definition
blake3.c: 58 | #define MAX_SIMD_DEGREE_OR_2 (MAX_SIMD_DEGREE > 2 ? MAX_SIMD_DEGREE : 2)
blake3.c: |

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
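
The redefinition warnings above arise because `blake3_static_dispatch.h` defines `MAX_SIMD_DEGREE` and `MAX_SIMD_DEGREE_OR_2` after `blake3_impl.h` has already defined them with different values. One common way to make such an override explicit is to `#undef` the macro first. The sketch below shows that pattern with stand-in definitions copied from the log. It is not a claim about how the SUPERCOP or BLAKE3 sources handle this, and the two accessor functions are hypothetical.

```c
#include <assert.h>

/* Stand-in for the earlier definitions (blake3_impl.h:53 and :58 in the log). */
#define MAX_SIMD_DEGREE 1
#define MAX_SIMD_DEGREE_OR_2 (MAX_SIMD_DEGREE > 2 ? MAX_SIMD_DEGREE : 2)

/* Stand-in for the later override (blake3_static_dispatch.h:17 and :18).
   Undefining first makes the override explicit, so the compiler has
   no conflicting earlier definition to warn about. */
#undef MAX_SIMD_DEGREE
#define MAX_SIMD_DEGREE 4
#undef MAX_SIMD_DEGREE_OR_2
#define MAX_SIMD_DEGREE_OR_2 4

/* Hypothetical accessors so the effective values can be inspected. */
static int max_simd_degree(void) { return MAX_SIMD_DEGREE; }
static int max_simd_degree_or_2(void) { return MAX_SIMD_DEGREE_OR_2; }
```

Both accessors return 4 here, the value the later header sets for the NEON build.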

Compiler output

Implementation: T:portable
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3.c: In file included from /usr/include/string.h:519,
blake3.c: from blake3.c:3:
blake3.c: In function 'memcpy',
blake3.c: inlined from 'compress_parents_parallel' at blake3.c:125:5,
blake3.c: inlined from 'compress_subtree_to_parent_node' at blake3.c:236:9,
blake3.c: inlined from 'blake3_default_hash' at blake3.c:249:3:
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h:29:10: warning: '__builtin_memcpy' writing 32 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
blake3.c: 29 | return __builtin___memcpy_chk (__dest, __src, __len,
blake3.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: 30 | __glibc_objsize0 (__dest));
blake3.c: | ~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h: In function 'blake3_default_hash':
blake3.c: blake3.c:233:11: note: at offset 32 into destination object 'out_array' of size 32
blake3.c: 233 | uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
blake3.c: | ^~~~~~~~~

Number of similar (compiler,implementation) pairs: 2, namely:
Compiler Implementations
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:portable
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:portable

Compiler output

Implementation: T:sse41
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:1:1: error: unknown directive
blake3_sse41_x86-64_unix.S: .intel_syntax noprefix
blake3_sse41_x86-64_unix.S: ^
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:16:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_sse41_x86-64_unix.S: push r15
blake3_sse41_x86-64_unix.S: ^
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:17:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_sse41_x86-64_unix.S: push r14
blake3_sse41_x86-64_unix.S: ^
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:18:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_sse41_x86-64_unix.S: push r13
blake3_sse41_x86-64_unix.S: ^
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:19:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_sse41_x86-64_unix.S: push r12
blake3_sse41_x86-64_unix.S: ^
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:20:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_sse41_x86-64_unix.S: push rbx
blake3_sse41_x86-64_unix.S: ^
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:21:9: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
blake3_sse41_x86-64_unix.S: push rbp
blake3_sse41_x86-64_unix.S: ^
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:22:13: error: invalid operand for instruction
blake3_sse41_x86-64_unix.S: mov rbp, rsp
blake3_sse41_x86-64_unix.S: ^
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:23:13: error: invalid operand for instruction
blake3_sse41_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse41

Compiler output

Implementation: T:sse41
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3.c: In file included from /usr/include/string.h:519,
blake3.c: from blake3.c:3:
blake3.c: In function 'memcpy',
blake3.c: inlined from 'compress_parents_parallel' at blake3.c:125:5,
blake3.c: inlined from 'compress_subtree_to_parent_node' at blake3.c:236:9,
blake3.c: inlined from 'blake3_default_hash' at blake3.c:249:3:
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h:29:10: warning: '__builtin_memcpy' writing 32 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
blake3.c: 29 | return __builtin___memcpy_chk (__dest, __src, __len,
blake3.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: 30 | __glibc_objsize0 (__dest));
blake3.c: | ~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: /usr/include/aarch64-linux-gnu/bits/string_fortified.h: In function 'blake3_default_hash':
blake3.c: blake3.c:233:11: note: at offset 32 into destination object 'out_array' of size 32
blake3.c: 233 | uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
blake3.c: | ^~~~~~~~~
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S: Assembler messages:
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:1: Error: unknown pseudo-op: `.intel_syntax'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:16: Error: unknown mnemonic `push' -- `push r15'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:17: Error: unknown mnemonic `push' -- `push r14'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:18: Error: unknown mnemonic `push' -- `push r13'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:19: Error: unknown mnemonic `push' -- `push r12'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:20: Error: unknown mnemonic `push' -- `push rbx'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:21: Error: unknown mnemonic `push' -- `push rbp'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:22: Error: operand 1 must be an integer register -- `mov rbp,rsp'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:23: Error: operand 1 must be an integer or stack pointer register -- `sub rsp,360'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:24: Error: operand 1 must be an integer or stack pointer register -- `and rsp,0xFFFFFFFFFFFFFFC0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:25: Error: operand 1 must be a SIMD vector register -- `neg r9d'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:26: Error: unknown mnemonic `movd' -- `movd xmm0,r9d'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:27: Error: unknown mnemonic `pshufd' -- `pshufd xmm0,xmm0,0x00'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:28: Error: unknown mnemonic `movdqa' -- `movdqa xmmword ptr[rsp+0x130],xmm0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:29: Error: unknown mnemonic `movdqa' -- `movdqa xmm1,xmm0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:30: Error: unknown mnemonic `pand' -- `pand xmm1,xmmword ptr[ADD0+rip]'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:31: Error: unknown mnemonic `pand' -- `pand xmm0,xmmword ptr[ADD1+rip]'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:32: Error: unknown mnemonic `movdqa' -- `movdqa xmmword ptr[rsp+0x150],xmm0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:33: Error: unknown mnemonic `movd' -- `movd xmm0,r8d'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:34: Error: unknown mnemonic `pshufd' -- `pshufd xmm0,xmm0,0x00'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:35: Error: unknown mnemonic `paddd' -- `paddd xmm0,xmm1'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:36: Error: unknown mnemonic `movdqa' -- `movdqa xmmword ptr[rsp+0x110],xmm0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:37: Error: unknown mnemonic `pxor' -- `pxor xmm0,xmmword ptr[CMP_MSB_MASK+rip]'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:38: Error: unknown mnemonic `pxor' -- `pxor xmm1,xmmword ptr[CMP_MSB_MASK+rip]'
blake3_sse41_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 2, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse41
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse41

Compiler output

Implementation: T:sse41
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S: Assembler messages:
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:1: Error: unknown pseudo-op: `.intel_syntax'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:16: Error: unknown mnemonic `push' -- `push r15'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:17: Error: unknown mnemonic `push' -- `push r14'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:18: Error: unknown mnemonic `push' -- `push r13'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:19: Error: unknown mnemonic `push' -- `push r12'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:20: Error: unknown mnemonic `push' -- `push rbx'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:21: Error: unknown mnemonic `push' -- `push rbp'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:22: Error: operand 1 must be an integer register -- `mov rbp,rsp'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:23: Error: operand 1 must be an integer or stack pointer register -- `sub rsp,360'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:24: Error: operand 1 must be an integer or stack pointer register -- `and rsp,0xFFFFFFFFFFFFFFC0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:25: Error: operand 1 must be a SIMD vector register -- `neg r9d'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:26: Error: unknown mnemonic `movd' -- `movd xmm0,r9d'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:27: Error: unknown mnemonic `pshufd' -- `pshufd xmm0,xmm0,0x00'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:28: Error: unknown mnemonic `movdqa' -- `movdqa xmmword ptr[rsp+0x130],xmm0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:29: Error: unknown mnemonic `movdqa' -- `movdqa xmm1,xmm0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:30: Error: unknown mnemonic `pand' -- `pand xmm1,xmmword ptr[ADD0+rip]'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:31: Error: unknown mnemonic `pand' -- `pand xmm0,xmmword ptr[ADD1+rip]'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:32: Error: unknown mnemonic `movdqa' -- `movdqa xmmword ptr[rsp+0x150],xmm0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:33: Error: unknown mnemonic `movd' -- `movd xmm0,r8d'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:34: Error: unknown mnemonic `pshufd' -- `pshufd xmm0,xmm0,0x00'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:35: Error: unknown mnemonic `paddd' -- `paddd xmm0,xmm1'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:36: Error: unknown mnemonic `movdqa' -- `movdqa xmmword ptr[rsp+0x110],xmm0'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:37: Error: unknown mnemonic `pxor' -- `pxor xmm0,xmmword ptr[CMP_MSB_MASK+rip]'
blake3_sse41_x86-64_unix.S: blake3_sse41_x86-64_unix.S:38: Error: unknown mnemonic `pxor' -- `pxor xmm1,xmmword ptr[CMP_MSB_MASK+rip]'
blake3_sse41_x86-64_unix.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse41