Implementation notes: amd64, zen3, crypto_hash/blake3

Computer: zen3
Architecture: amd64
CPU ID: AuthenticAMD-00a20f10-178bfbff
SUPERCOP version: 20211108
Operation: crypto_hash
Primitive: blake3
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
569813234 0 024063 828 952T:sse41gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
570023067 0 034400 836 952T:avx2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
570023178 0 033999 828 952T:avx2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
570113676 0 027664 860 888T:sse41clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
570723752 0 037792 860 888T:avx2clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
570723961 0 037248 836 952T:avx2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
571413119 0 024464 836 952T:sse41gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
571613821 0 027120 836 952T:sse41gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
572512443 0 023912 860 888T:sse41clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
572512585 0 022471 812 920T:sse41gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
572725804 0 044976 860 920T:avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
572822327 0 033848 860 888T:avx2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
572915808 0 034720 860 920T:sse41clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
573022728 0 033846 852 888T:avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
573125820 0 044784 860 920T:avx2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
574315792 0 034912 860 920T:sse41clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
575612844 0 023910 852 888T:sse41clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
586522517 0 032407 812 920T:avx2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
688613202 0 032048 860 920T:portableclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
690013202 0 032288 860 920T:portableclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
690210061 0 021344 836 952T:portablegcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
693210110 0 024048 860 888T:portableclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
69909981 0 021400 860 888T:portableclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
71519838 0 019671 812 920T:portablegcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
727210995 0 024232 836 952T:portablegcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
745712022 0 022839 828 952T:portablegcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
943610269 0 021334 852 888T:portableclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108

Test failure

Implementation: T:avx512
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
error 111

Number of similar (compiler,implementation) pairs: 9, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512

Compiler output

Implementation: T:neon
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake3.c: In file included from blake3.c:12:
blake3.c: ./blake3_static_dispatch.h:17:9: warning: 'MAX_SIMD_DEGREE' macro redefined [-Wmacro-redefined]
blake3.c: #define MAX_SIMD_DEGREE 4
blake3.c: ^
blake3.c: ./blake3_impl.h:49:9: note: previous definition is here
blake3.c: #define MAX_SIMD_DEGREE 16
blake3.c: ^
blake3.c: In file included from blake3.c:12:
blake3.c: ./blake3_static_dispatch.h:18:9: warning: 'MAX_SIMD_DEGREE_OR_2' macro redefined [-Wmacro-redefined]
blake3.c: #define MAX_SIMD_DEGREE_OR_2 4
blake3.c: ^
blake3.c: ./blake3_impl.h:58:9: note: previous definition is here
blake3.c: #define MAX_SIMD_DEGREE_OR_2 (MAX_SIMD_DEGREE > 2 ? MAX_SIMD_DEGREE : 2)
blake3.c: ^
blake3.c: 2 warnings generated.
blake3_neon.c: In file included from blake3_neon.c:3:
blake3_neon.c: /usr/lib/clang/13.0.0/include/arm_neon.h:28:2: error: "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
blake3_neon.c: #error "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
blake3_neon.c: ^
blake3_neon.c: blake3_neon.c:6:8: error: unknown type name 'uint32x4_t'; did you mean 'uint32_t'?
blake3_neon.c: INLINE uint32x4_t loadu_128(const uint8_t src[16]) {
blake3_neon.c: ^~~~~~~~~~
blake3_neon.c: uint32_t
blake3_neon.c: /usr/include/bits/stdint-uintn.h:26:20: note: 'uint32_t' declared here
blake3_neon.c: typedef __uint32_t uint32_t;
blake3_neon.c: ^
blake3_neon.c: blake3_neon.c:8:3: error: unknown type name 'uint32x4_t'; did you mean 'uint32_t'?
blake3_neon.c: uint32x4_t x;
blake3_neon.c: ^~~~~~~~~~
blake3_neon.c: uint32_t
blake3_neon.c: /usr/include/bits/stdint-uintn.h:26:20: note: 'uint32_t' declared here
blake3_neon.c: typedef __uint32_t uint32_t;
blake3_neon.c: ^
blake3_neon.c: blake3_neon.c:9:3: warning: 'memcpy' will always overflow; destination buffer has size 4, but size argument is 16 [-Wfortify-source]
blake3_neon.c: memcpy(&x, src, 16);
blake3_neon.c: ^
blake3_neon.c: blake3_neon.c:13:24: error: unknown type name 'uint32x4_t'; did you mean 'uint32_t'?
blake3_neon.c: INLINE void storeu_128(uint32x4_t src, uint8_t dest[16]) {
blake3_neon.c: ^~~~~~~~~~
blake3_neon.c: uint32_t
blake3_neon.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon

Compiler output

Implementation: T:neon
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake3.c: In file included from blake3.c:12:
blake3.c: blake3_static_dispatch.h:17: warning: "MAX_SIMD_DEGREE" redefined
blake3.c: 17 | #define MAX_SIMD_DEGREE 4
blake3.c: |
blake3.c: In file included from blake3.c:6:
blake3.c: blake3_impl.h:49: note: this is the location of the previous definition
blake3.c: 49 | #define MAX_SIMD_DEGREE 16
blake3.c: |
blake3.c: In file included from blake3.c:12:
blake3.c: blake3_static_dispatch.h:18: warning: "MAX_SIMD_DEGREE_OR_2" redefined
blake3.c: 18 | #define MAX_SIMD_DEGREE_OR_2 4
blake3.c: |
blake3.c: In file included from blake3.c:6:
blake3.c: blake3_impl.h:58: note: this is the location of the previous definition
blake3.c: 58 | #define MAX_SIMD_DEGREE_OR_2 (MAX_SIMD_DEGREE > 2 ? MAX_SIMD_DEGREE : 2)
blake3.c: |
blake3_neon.c: blake3_neon.c:3:10: fatal error: arm_neon.h: No such file or directory
blake3_neon.c: 3 | #include <arm_neon.h>
blake3_neon.c: | ^~~~~~~~~~~~
blake3_neon.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon