Implementation notes: aarch64, pi4b, crypto_hash/blake2b

Computer: pi4b
Microarchitecture: aarch64; Cortex-A72 (410fd083)
Architecture: aarch64
CPU ID: 410fd083
SUPERCOP version: 20240107
Operation: crypto_hash
Primitive: blake2b
Time   Object size  Test size      Implementation  Compiler                                                                            Benchmark date  SUPERCOP version
8512   12400 0 0    26335 824 728  T:regs          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20231224        20231222
8513   12400 0 0    23211 824 736  T:regs          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20231224        20231222
8525   12080 0 0    20747 808 720  T:regs          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20231224        20231222
9122   12580 0 0    24394 840 728  T:regs          clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231224        20231222
10125  12404 0 0    21986 816 728  T:regs          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20231224        20231222
11219  8788 0 0     20674 840 728  T:ref           clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231224        20231222

Compiler output

Implementation: T:avx2-1
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake2b.c: In file included from blake2b.c:11:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: #error "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: ^
blake2b.c: In file included from blake2b.c:11:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:17:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/x86gprintrin.h:15:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
blake2b.c: __asm__ ("hreset $0" :: "a"(__eax));
blake2b.c: ^
blake2b.c: In file included from blake2b.c:11:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:21:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: #error "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: ^
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
blake2b.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
blake2b.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: return (__m64)__builtin_ia32_packssdw((__v2si)__m1, (__v2si)__m2);
blake2b.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:193:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2-1

Compiler output

Implementation: T:avx2-1
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake2b.c: blake2b.c:11:10: fatal error: immintrin.h: No such file or directory
blake2b.c: 11 | #include <immintrin.h>
blake2b.c: | ^~~~~~~~~~~~~
blake2b.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-1
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-1
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-1
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-1
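
The T:avx2-1 failures above come from including the x86-only header immintrin.h on an aarch64 machine. One common way to avoid this class of error is to guard the include with the compiler's predefined architecture macros. The sketch below assumes the source has a portable C fallback; the names demo_backend, demo_self_check and DEMO_HAVE_X86_INTRINSICS are hypothetical and do not appear in SUPERCOP or in the blake2b sources.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical guard macro: pull in the x86 intrinsics header only when
   the compiler reports an x86 target. */
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define DEMO_HAVE_X86_INTRINSICS 1
#else
#define DEMO_HAVE_X86_INTRINSICS 0
#endif

/* Reports which code path was selected at compile time. */
static const char *demo_backend(void)
{
    return DEMO_HAVE_X86_INTRINSICS ? "x86-intrinsics" : "portable";
}

/* Sanity check: exactly one of the two labels must come back. */
static void demo_self_check(void)
{
    const char *b = demo_backend();
    assert(strcmp(b, "x86-intrinsics") == 0 || strcmp(b, "portable") == 0);
}
```

On an aarch64 target such as this one, neither x86 macro is defined, so the header is never included and demo_backend() returns "portable".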

Compiler output

Implementation: T:avx2-2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake2b.c: In file included from blake2b.c:12:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: #error "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: ^
blake2b.c: In file included from blake2b.c:12:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:17:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/x86gprintrin.h:15:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
blake2b.c: __asm__ ("hreset $0" :: "a"(__eax));
blake2b.c: ^
blake2b.c: In file included from blake2b.c:12:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:21:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: #error "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: ^
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
blake2b.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
blake2b.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: return (__m64)__builtin_ia32_packssdw((__v2si)__m1, (__v2si)__m2);
blake2b.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:193:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: ...

Number of similar (compiler,implementation) pairs: 2, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2-2 T:avx2-3

Compiler output

Implementation: T:avx2-2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake2b.c: blake2b.c:12:10: fatal error: immintrin.h: No such file or directory
blake2b.c: 12 | #include <immintrin.h>
blake2b.c: | ^~~~~~~~~~~~~
blake2b.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-2
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-3
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-3
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-3
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2-3

Compiler output

Implementation: T:avxicc
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake2b.S: blake2b.S:3:2: error: unknown directive
blake2b.S: .intel_syntax noprefix
blake2b.S: ^
blake2b.S: blake2b.S:14:13: error: invalid operand for instruction
blake2b.S: sub rsp, 552
blake2b.S: ^
blake2b.S: blake2b.S:16:9: error: unrecognized instruction mnemonic, did you mean: eor, orn, orr, ror?
blake2b.S: xor r11d, r11d
blake2b.S: ^
blake2b.S: blake2b.S:17:9: error: unrecognized instruction mnemonic, did you mean: eor, orn, orr, ror?
blake2b.S: xor ecx, ecx
blake2b.S: ^
blake2b.S: blake2b.S:18:13: error: invalid operand for instruction
blake2b.S: mov r9, rsi
blake2b.S: ^
blake2b.S: blake2b.S:19:31: error: unexpected token in argument list
blake2b.S: vmovdqu xmm2, XMMWORD PTR .L_2il0floatpacket.13[rip]
blake2b.S: ^
blake2b.S: blake2b.S:20:9: error: unrecognized instruction mnemonic, did you mean: eor, orn, orr, ror?
blake2b.S: xor eax, eax
blake2b.S: ^
blake2b.S: blake2b.S:21:31: error: unexpected token in argument list
blake2b.S: vmovdqu xmm3, XMMWORD PTR .L_2il0floatpacket.14[rip]
blake2b.S: ^
blake2b.S: blake2b.S:22:13: error: invalid operand for instruction
blake2b.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avxicc

Compiler output

Implementation: T:avxicc
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake2b.S: blake2b.S: Assembler messages:
blake2b.S: blake2b.S:3: Error: unknown pseudo-op: `.intel_syntax'
blake2b.S: blake2b.c:14: Error: operand 1 must be an integer or stack pointer register -- `sub rsp,552'
blake2b.S: blake2b.c:16: Error: unknown mnemonic `xor' -- `xor r11d,r11d'
blake2b.S: blake2b.c:17: Error: unknown mnemonic `xor' -- `xor ecx,ecx'
blake2b.S: blake2b.c:18: Error: operand 1 must be an integer register -- `mov r9,rsi'
blake2b.S: blake2b.c:19: Error: unknown mnemonic `vmovdqu' -- `vmovdqu xmm2,XMMWORD PTR .L_2il0floatpacket.13[rip]'
blake2b.S: blake2b.c:20: Error: unknown mnemonic `xor' -- `xor eax,eax'
blake2b.S: blake2b.c:21: Error: unknown mnemonic `vmovdqu' -- `vmovdqu xmm3,XMMWORD PTR .L_2il0floatpacket.14[rip]'
blake2b.S: blake2b.c:22: Error: operand 1 must be an integer register -- `mov r8,rdx'
blake2b.S: blake2b.c:23: Error: unknown mnemonic `vmovdqu' -- `vmovdqu xmm4,XMMWORD PTR .L_2il0floatpacket.15[rip]'
blake2b.S: blake2b.c:24: Error: unknown mnemonic `xor' -- `xor r10d,r10d'
blake2b.S: blake2b.c:25: Error: unknown mnemonic `vmovdqu' -- `vmovdqu xmm11,XMMWORD PTR .L_2il0floatpacket.16[rip]'
blake2b.S: blake2b.c:26: Error: unknown mnemonic `vmovdqu' -- `vmovdqu xmm1,XMMWORD PTR .L_2il0floatpacket.11[rip]'
blake2b.S: blake2b.c:27: Error: unknown mnemonic `vmovdqu' -- `vmovdqu xmm0,XMMWORD PTR .L_2il0floatpacket.12[rip]'
blake2b.S: blake2b.c:28: Error: unknown mnemonic `vmovdqu' -- `vmovdqu XMMWORD PTR[448+rsp],xmm2'
blake2b.S: blake2b.c:29: Error: unknown mnemonic `vmovdqu' -- `vmovdqu XMMWORD PTR[464+rsp],xmm3'
blake2b.S: blake2b.c:30: Error: unknown mnemonic `vmovdqu' -- `vmovdqu XMMWORD PTR[496+rsp],xmm4'
blake2b.S: blake2b.c:31: Error: unknown mnemonic `vmovdqu' -- `vmovdqu XMMWORD PTR[480+rsp],xmm11'
blake2b.S: blake2b.c:32: Error: unknown mnemonic `vmovdqu' -- `vmovdqu xmm5,XMMWORD PTR .L_2il0floatpacket.17[rip]'
blake2b.S: blake2b.c:33: Error: operand 1 must be an integer or stack pointer register -- `cmp rdx,128'
blake2b.S: blake2b.c:34: Error: unknown mnemonic `jbe' -- `jbe ..B1.5'
blake2b.S: blake2b.c:37: Error: operand 1 must be an integer register -- `mov QWORD PTR[rsp],rbp'
blake2b.S: blake2b.c:38: Error: unknown mnemonic `lea' -- `lea rsi,QWORD PTR[-1+rdx]'
blake2b.S: blake2b.c:39: Error: unknown mnemonic `sar' -- `sar rsi,6'
blake2b.S: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avxicc
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avxicc
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avxicc
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avxicc

Compiler output

Implementation: T:ref
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake2b-ref.c: In file included from blake2b-ref.c:19:
blake2b-ref.c: blake2.h:101:5: error: size of array element is not a multiple of its alignment
blake2b-ref.c: 101 | blake2s_state S[8][1];
blake2b-ref.c: | ^~~~~~~~~~~~~
blake2b-ref.c: blake2.h:102:5: error: size of array element is not a multiple of its alignment
blake2b-ref.c: 102 | blake2s_state R[1];
blake2b-ref.c: | ^~~~~~~~~~~~~
blake2b-ref.c: blake2.h:109:5: error: size of array element is not a multiple of its alignment
blake2b-ref.c: 109 | blake2b_state S[4][1];
blake2b-ref.c: | ^~~~~~~~~~~~~
blake2b-ref.c: blake2.h:110:5: error: size of array element is not a multiple of its alignment
blake2b-ref.c: 110 | blake2b_state R[1];
blake2b-ref.c: | ^~~~~~~~~~~~~
blake2b-ref.c: blake2b-ref.c: In function 'blake2b':
blake2b-ref.c: blake2b-ref.c:342:3: error: size of array element is not a multiple of its alignment
blake2b-ref.c: 342 | blake2b_state S[1];
blake2b-ref.c: | ^~~~~~~~~~~~~

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
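
The gcc error "size of array element is not a multiple of its alignment" means that an array was declared whose element type has a size that is not a whole multiple of its alignment, so the second element could not be aligned. This typically arises when a type combines byte packing with a larger alignment request; the log does not show the definition in blake2.h, so that cause is an assumption here. The sketch below shows a layout that satisfies the constraint. The type demo_state and its fields are hypothetical stand-ins, not the real blake2b_state.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hash state; NOT the real blake2b_state.
   No packing pragma is applied, so the compiler pads the struct and
   sizeof(demo_state) becomes a multiple of its 64-byte alignment. */
typedef struct demo_state {
    alignas(64) uint64_t h[8];
    uint64_t t[2];
    uint64_t f[2];
    uint8_t buf[256];
    size_t buflen;
    uint8_t last_node;
} demo_state;

/* Compile-time check of the property that gcc enforces for arrays. */
static_assert(sizeof(demo_state) % alignof(demo_state) == 0,
              "element size must be a multiple of its alignment");

/* Distance in bytes between consecutive array elements. */
static size_t demo_array_stride(void)
{
    demo_state S[2];
    return (size_t)((unsigned char *)&S[1] - (unsigned char *)&S[0]);
}
```

Because the struct is padded rather than packed, declarations of the form demo_state S[1] compile, at the cost of a few unused trailing bytes per element.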

Compiler output

Implementation: T:xmm
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blake2b.c: In file included from blake2b.c:6:
blake2b.c: ./blake2-config.h:68:2: error: "This code requires at least SSE2."
blake2b.c: #error "This code requires at least SSE2."
blake2b.c: ^
blake2b.c: In file included from blake2b.c:11:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: #error "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: ^
blake2b.c: In file included from blake2b.c:11:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:17:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/x86gprintrin.h:15:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
blake2b.c: __asm__ ("hreset $0" :: "a"(__eax));
blake2b.c: ^
blake2b.c: In file included from blake2b.c:11:
blake2b.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:21:
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: #error "This header is only meant to be used on x86 and x64 architecture"
blake2b.c: ^
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
blake2b.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake2b.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
blake2b.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
blake2b.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake2b.c: ...

Number of similar (compiler,implementation) pairs: 2, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:xmm T:ymm

Compiler output

Implementation: T:xmm
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blake2b.c: In file included from blake2b.c:5:
blake2b.c: blake2.h:89:5: error: size of array element is not a multiple of its alignment
blake2b.c: 89 | blake2s_state S[8][1];
blake2b.c: | ^~~~~~~~~~~~~
blake2b.c: blake2.h:90:5: error: size of array element is not a multiple of its alignment
blake2b.c: 90 | blake2s_state R[1];
blake2b.c: | ^~~~~~~~~~~~~
blake2b.c: blake2.h:97:5: error: size of array element is not a multiple of its alignment
blake2b.c: 97 | blake2b_state S[4][1];
blake2b.c: | ^~~~~~~~~~~~~
blake2b.c: blake2.h:98:5: error: size of array element is not a multiple of its alignment
blake2b.c: 98 | blake2b_state R[1];
blake2b.c: | ^~~~~~~~~~~~~
blake2b.c: In file included from blake2b.c:6:
blake2b.c: blake2-config.h:68:2: error: #error "This code requires at least SSE2."
blake2b.c: 68 | #error "This code requires at least SSE2."
blake2b.c: | ^~~~~
blake2b.c: blake2b.c:11:10: fatal error: immintrin.h: No such file or directory
blake2b.c: 11 | #include <immintrin.h>
blake2b.c: | ^~~~~~~~~~~~~
blake2b.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:xmm
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:xmm
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:xmm
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:xmm
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ymm
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ymm
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ymm
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ymm
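
The message "This code requires at least SSE2." is a compile-time feature check in blake2-config.h that fails because the aarch64 compilers do not define the x86 SSE2 macro. A minimal sketch of the same kind of check, written so that it selects a level instead of aborting the build, might look as follows. The enum demo_simd and the function names are hypothetical; __AVX2__, __SSE2__ and __ARM_NEON are predefined macros of gcc and clang.

```c
#include <assert.h>

/* Hypothetical feature levels, chosen for illustration only. */
enum demo_simd {
    DEMO_SIMD_NONE,
    DEMO_SIMD_SSE2,
    DEMO_SIMD_AVX2,
    DEMO_SIMD_NEON
};

/* Maps the compiler's predefined feature macros to one level. */
static enum demo_simd demo_simd_level(void)
{
#if defined(__AVX2__)
    return DEMO_SIMD_AVX2;
#elif defined(__SSE2__)
    return DEMO_SIMD_SSE2;
#elif defined(__ARM_NEON)
    return DEMO_SIMD_NEON;
#else
    return DEMO_SIMD_NONE;
#endif
}

/* Human-readable label for a level. */
static const char *demo_simd_name(enum demo_simd level)
{
    switch (level) {
    case DEMO_SIMD_SSE2: return "sse2";
    case DEMO_SIMD_AVX2: return "avx2";
    case DEMO_SIMD_NEON: return "neon";
    default:             return "none";
    }
}

/* Sanity check: the detected level always has a non-empty label. */
static void demo_simd_self_check(void)
{
    assert(demo_simd_name(demo_simd_level())[0] != '\0');
}
```

A build system could use such a level to skip x86-only implementations like T:xmm and T:ymm on this machine rather than attempting to compile them.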