Implementation notes: aarch64, pi4b, crypto_stream/chacha12

Computer: pi4b
Architecture: aarch64
CPU ID: 410fd083
SUPERCOP version: 20221019
Operation: crypto_stream
Primitive: chacha12
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
51304124 0 415612 816 792dolbeau/arm-neongcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
51515004 0 417468 816 800dolbeau/arm-neongcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
54344228 0 117442 840 800dolbeau/arm-neonclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022100720221005
66203444 0 413908 800 784dolbeau/arm-neongcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
663216969 2368 01665911 146937 15128T:cryptoppg++_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
664716625 2368 01668813 146945 15128T:cryptoppg++_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
669316618 2368 01664804 146945 15128T:cryptoppg++_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
67528022 2880 01650204 147585 15112T:cryptoppg++_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
79592588 0 414068 816 792e/mergedgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
79592272 0 412716 800 784e/mergedgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
86444324 0 415795 808 792dolbeau/arm-neongcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
91853732 0 416204 816 800e/mergedgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
92853476 0 415964 816 800dolbeau/mipsel-msagcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
93283380 0 415852 816 800e/refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
94442828 0 116042 840 800dolbeau/mipsel-msaclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022100720221005
94442828 0 116026 840 800e/refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022100720221005
95112828 0 116026 840 800e/regsclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022100720221005
95895104 0 417580 816 800e/regsgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
102033340 0 116546 840 800e/mergedclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022100720221005
108873864 0 415307 808 792e/mergedgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
125112396 0 413892 816 792e/regsgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
140352012 0 412452 800 784e/refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
140362012 0 412468 800 784dolbeau/mipsel-msagcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
143103332 0 414779 808 792e/regsgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
150002348 0 413844 816 792e/refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
156322428 0 413924 816 792dolbeau/mipsel-msagcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
157382180 0 412612 800 784e/regsgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
168392836 0 414291 808 792dolbeau/mipsel-msagcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506
168392836 0 414275 808 792e/refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022053020220506

Compiler output

Implementation: amd64-ssse3
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
chacha.S: chacha.S:19:5: error: unknown token in expression
chacha.S: mov %rsp,%r11
chacha.S: ^
chacha.S: chacha.S:19:5: error: invalid operand
chacha.S: mov %rsp,%r11
chacha.S: ^
chacha.S: chacha.S:20:9: error: unknown token in expression
chacha.S: and $31,%r11
chacha.S: ^
chacha.S: chacha.S:20:9: error: invalid operand
chacha.S: and $31,%r11
chacha.S: ^
chacha.S: chacha.S:21:10: error: unknown token in expression
chacha.S: add $384,%r11
chacha.S: ^
chacha.S: chacha.S:21:10: error: invalid operand
chacha.S: add $384,%r11
chacha.S: ^
chacha.S: chacha.S:22:5: error: unknown token in expression
chacha.S: sub %r11,%rsp
chacha.S: ^
chacha.S: chacha.S:22:5: error: invalid operand
chacha.S: sub %r11,%rsp
chacha.S: ^
chacha.S: chacha.S:23:5: error: unknown token in expression
chacha.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE amd64-ssse3

Compiler output

Implementation: amd64-ssse3
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.S: chacha.S: Assembler messages:
chacha.S: chacha.S:19: Error: operand 1 must be an integer register -- `mov %rsp,%r11'
chacha.S: chacha.S:20: Error: operand 1 must be an integer or stack pointer register -- `and $31,%r11'
chacha.S: chacha.S:21: Error: operand 1 must be an integer or stack pointer register -- `add $384,%r11'
chacha.S: chacha.S:22: Error: operand 1 must be an integer or stack pointer register -- `sub %r11,%rsp'
chacha.S: chacha.S:23: Error: operand 1 must be an integer register -- `mov %rdi,%r8'
chacha.S: chacha.S:24: Error: operand 1 must be an integer register -- `mov %rsi,%rsi'
chacha.S: chacha.S:25: Error: operand 1 must be an integer register -- `mov %rsi,%rdi'
chacha.S: chacha.S:26: Error: operand 1 must be an integer register -- `mov %rdx,%rdx'
chacha.S: chacha.S:27: Error: operand 1 must be an integer or stack pointer register -- `cmp $0,%rdx'
chacha.S: chacha.S:29: Error: unknown mnemonic `jbe' -- `jbe ._done'
chacha.S: chacha.S:31: Error: operand 1 must be an integer register -- `mov $0,%rax'
chacha.S: chacha.S:33: Error: operand 1 must be an integer register -- `mov %rdx,%rcx'
chacha.S: chacha.S:35: Error: unknown mnemonic `rep' -- `rep stosb'
chacha.S: chacha.S:37: Error: operand 1 must be an integer or stack pointer register -- `sub %rdx,%rdi'
chacha.S: chacha.S:39: Error: unknown mnemonic `jmp' -- `jmp ._start'
chacha.S: chacha.S:47: Error: operand 1 must be an integer register -- `mov %rsp,%r11'
chacha.S: chacha.S:48: Error: operand 1 must be an integer or stack pointer register -- `and $31,%r11'
chacha.S: chacha.S:49: Error: operand 1 must be an integer or stack pointer register -- `add $384,%r11'
chacha.S: chacha.S:50: Error: operand 1 must be an integer or stack pointer register -- `sub %r11,%rsp'
chacha.S: chacha.S:52: Error: operand 1 must be an integer register -- `mov %rdi,%r8'
chacha.S: chacha.S:54: Error: operand 1 must be an integer register -- `mov %rsi,%rsi'
chacha.S: chacha.S:56: Error: operand 1 must be an integer register -- `mov %rdx,%rdi'
chacha.S: chacha.S:58: Error: operand 1 must be an integer register -- `mov %rcx,%rdx'
chacha.S: chacha.S:60: Error: operand 1 must be an integer or stack pointer register -- `cmp $0,%rdx'
chacha.S: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE amd64-ssse3
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE amd64-ssse3
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE amd64-ssse3
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE amd64-ssse3

Compiler output

Implementation: goll_gueron
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: In file included from stream.c:11:
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
stream.c: #error "This header is only meant to be used on x86 and x64 architecture"
stream.c: ^
stream.c: In file included from stream.c:11:
stream.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:17:
stream.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/x86gprintrin.h:15:
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
stream.c: __asm__ ("hreset $0" :: "a"(__eax));
stream.c: ^
stream.c: In file included from stream.c:11:
stream.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:21:
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
stream.c: #error "This header is only meant to be used on x86 and x64 architecture"
stream.c: ^
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_packssdw((__v2si)__m1, (__v2si)__m2);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:193:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE goll_gueron

Compiler output

Implementation: goll_gueron
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:11:10: fatal error: immintrin.h: No such file or directory
stream.c: 11 | #include <immintrin.h>
stream.c: | ^~~~~~~~~~~~~
stream.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE goll_gueron
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE goll_gueron
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE goll_gueron
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE goll_gueron

Compiler output

Implementation: krovetz/avx2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: In file included from stream.c:8:
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
stream.c: #error "This header is only meant to be used on x86 and x64 architecture"
stream.c: ^
stream.c: In file included from stream.c:8:
stream.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:17:
stream.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/x86gprintrin.h:15:
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
stream.c: __asm__ ("hreset $0" :: "a"(__eax));
stream.c: ^
stream.c: In file included from stream.c:8:
stream.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:21:
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
stream.c: #error "This header is only meant to be used on x86 and x64 architecture"
stream.c: ^
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_packssdw((__v2si)__m1, (__v2si)__m2);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:193:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE krovetz/avx2

Compiler output

Implementation: krovetz/avx2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:8:10: fatal error: immintrin.h: No such file or directory
stream.c: 8 | #include <immintrin.h>
stream.c: | ^~~~~~~~~~~~~
stream.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/avx2

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:80:2: error: -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:151:9: error: initializing 'vec' (vector of 4 'unsigned int' values) with an expression of incompatible type 'int'
stream.c: vec s3 = NONCE(np);
stream.c: ^ ~~~~~~~~~
stream.c: stream.c:152:36: error: use of undeclared identifier 'VBPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^
stream.c: stream.c:91:19: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: error: use of undeclared identifier 'GPR_TOO'
stream.c: stream.c:91:26: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:155:19: error: use of undeclared identifier 'ONE'
stream.c: v7 = v3 + ONE;
stream.c: ^
stream.c: stream.c:176:13: warning: implicit declaration of function 'ROTW16' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE krovetz/vec128

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: 80 | #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: | ^~~~~
stream.c: stream.c: In function 'crypto_stream_chacha12_krovetz_vec128_constbranchindex_xor':
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' [-Wimplicit-function-declaration]
stream.c: 151 | vec s3 = NONCE(np);
stream.c: | ^~~~~
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec' {aka '__vector(4) unsigned int'} using type 'int'
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function); did you mean 'BPI'?
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: | ^~~
stream.c: stream.c:91:19: note: each undeclared identifier is reported only once for each function it appears in
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: | ^~~
stream.c: stream.c:91:26: error: 'GPR_TOO' undeclared (first use in this function)
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128