Implementation notes: amd64, h3neo, crypto_hash/blake3

Computer: h3neo
Microarchitecture: amd64; K10 45nm (100f63)
Architecture: amd64
CPU ID: AuthenticAMD-00100f63-078bfbff
SUPERCOP version: 20240107
Operation: crypto_hash
Primitive: blake3
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
94729946 0 023604 820 960T:portablegcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121620231212
95069461 0 021756 820 960T:portablegcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121620231212
99258854 0 019215 796 928T:portablegcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121620231212
109088644 0 018661 844 896T:portableclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121620231212
110349605 0 021139 852 896T:portableclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121620231212
110359605 0 022059 852 896T:portableclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121620231212
110479134 0 019787 852 896T:portableclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121620231212
1105410024 0 022891 852 896T:portableclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121620231212
1291711196 0 022603 812 960T:portablegcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121620231212

Test failure

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
error 111

Number of similar (compiler,implementation) pairs: 27, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse41
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse41
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse41
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse41
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse41
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse41
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse41
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse41
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse41

Compiler output

Implementation: T:neon
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake3_neon.c: In file included from blake3_neon.c:3:
blake3_neon.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_neon.h:28:2: error: "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
blake3_neon.c: #error "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
blake3_neon.c: ^
blake3_neon.c: blake3_neon.c:11:8: error: unknown type name 'uint32x4_t'; did you mean 'uint32_t'?
blake3_neon.c: INLINE uint32x4_t loadu_128(const uint8_t src[16]) {
blake3_neon.c: ^~~~~~~~~~
blake3_neon.c: uint32_t
blake3_neon.c: /usr/include/x86_64-linux-gnu/bits/stdint-uintn.h:26:20: note: 'uint32_t' declared here
blake3_neon.c: typedef __uint32_t uint32_t;
blake3_neon.c: ^
blake3_neon.c: blake3_neon.c:13:3: error: unknown type name 'uint32x4_t'; did you mean 'uint32_t'?
blake3_neon.c: uint32x4_t x;
blake3_neon.c: ^~~~~~~~~~
blake3_neon.c: uint32_t
blake3_neon.c: /usr/include/x86_64-linux-gnu/bits/stdint-uintn.h:26:20: note: 'uint32_t' declared here
blake3_neon.c: typedef __uint32_t uint32_t;
blake3_neon.c: ^
blake3_neon.c: blake3_neon.c:14:3: warning: 'memcpy' will always overflow; destination buffer has size 4, but size argument is 16 [-Wfortify-source]
blake3_neon.c: memcpy(&x, src, 16);
blake3_neon.c: ^
blake3_neon.c: blake3_neon.c:18:24: error: unknown type name 'uint32x4_t'; did you mean 'uint32_t'?
blake3_neon.c: INLINE void storeu_128(uint32x4_t src, uint8_t dest[16]) {
blake3_neon.c: ^~~~~~~~~~
blake3_neon.c: uint32_t
blake3_neon.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon

Compiler output

Implementation: T:neon
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3_neon.c: blake3_neon.c:3:10: fatal error: arm_neon.h: No such file or directory
blake3_neon.c: 3 | #include <arm_neon.h>
blake3_neon.c: | ^~~~~~~~~~~~
blake3_neon.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon