Implementation notes: aarch64, gcc185, crypto_stream/chacha8

Computer: gcc185
Microarchitecture: aarch64; Skylark (503f0002)
Architecture: aarch64
CPU ID: 503f0002
SUPERCOP version: 20221122
Operation: crypto_stream
Primitive: chacha8
Time   Object size   Test size             Implementation      Compiler                                                                            Benchmark date  SUPERCOP version
5700   2412 0 4      15629 824 808         e/merged            gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
5700   3356 0 4      17798 832 824         e/merged            gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
6225   2044 0 4      14285 808 800         e/merged            gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
6300   3004 0 4      17462 832 824         dolbeau/mipsel-msa  gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
6375   4088 0 1      19282 816 816         dolbeau/arm-neon    clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221010        20221005
6375   3028 0 4      17470 832 824         e/ref               gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
6375   3028 0 4      17470 832 824         e/regs              gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
6525   4812 0 4      19246 832 824         dolbeau/arm-neon    gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
6600   3948 0 4      17173 824 808         dolbeau/arm-neon    gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
6825   2888 0 1      18074 816 816         e/merged            clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221010        20221005
6900   3328 0 4      15565 808 800         dolbeau/arm-neon    gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
6900   2632 0 1      17818 816 816         e/regs              clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221010        20221005
7350   2640 0 1      17826 816 816         dolbeau/mipsel-msa  clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221010        20221005
7425   2640 0 1      17818 816 816         e/ref               clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221010        20221005
7950   4308 0 4      17477 824 808         dolbeau/arm-neon    gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221124        20221122
8775   2172 0 4      15397 824 808         e/regs              gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
9075   17919 2368 0  1200757 146929 15144  T:cryptopp          g++_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
9150   18242 2368 0  1202307 146929 15144  T:cryptopp          g++_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
9150   16879 2368 0  1199424 146937 15144  T:cryptopp          g++_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221124        20221122
9150   7411 2880 0   1190736 147577 15128  T:cryptopp          g++_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
10725  3720 0 4      16885 824 808         e/merged            gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221124        20221122
12000  1816 0 4      14061 808 800         dolbeau/mipsel-msa  gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
12000  1816 0 4      14053 808 800         e/ref               gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
12150  2204 0 4      15445 824 808         dolbeau/mipsel-msa  gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
13200  2188 0 4      15405 824 808         e/ref               gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
13875  2596 0 4      15781 824 808         dolbeau/mipsel-msa  gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221124        20221122
13875  2596 0 4      15765 824 808         e/ref               gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221124        20221122
16650  1976 0 4      14197 808 800         e/regs              gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE        20221124        20221122
21525  3092 0 4      16261 824 808         e/regs              gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221124        20221122
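
Every implementation in the table computes the same function, ChaCha reduced to 8 rounds and used as a stream cipher; they differ only in how the rounds are scheduled and vectorized. For orientation, here is a minimal scalar sketch of the ChaCha core in C. It is not the code of e/ref or of any other listed implementation: `chacha_core`, `chacha8_block` and `QR` are hypothetical names, and the block layout (a 64-bit block counter in words 12 and 13, a 64-bit nonce in words 14 and 15) is an assumption about the original ChaCha variant rather than something stated on this page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (1 <= n <= 31). */
static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* ChaCha quarter round: add, xor, rotate by 16, 12, 8 and 7. */
#define QR(a, b, c, d)                          \
    do {                                        \
        a += b; d ^= a; d = rotl32(d, 16);      \
        c += d; b ^= c; b = rotl32(b, 12);      \
        a += b; d ^= a; d = rotl32(d, 8);       \
        c += d; b ^= c; b = rotl32(b, 7);       \
    } while (0)

/* Run `rounds` rounds (8 for ChaCha8, 20 for ChaCha20) over a 16-word
   state, then add the input state back in (the feed-forward). */
void chacha_core(uint32_t out[16], const uint32_t in[16], int rounds)
{
    uint32_t x[16];
    assert(rounds > 0 && rounds % 2 == 0);  /* rounds come in pairs */
    memcpy(x, in, sizeof x);
    for (int i = 0; i < rounds; i += 2) {
        QR(x[0], x[4], x[8],  x[12]);       /* column round */
        QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]);       /* diagonal round */
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]);
        QR(x[3], x[4], x[9],  x[14]);
    }
    for (int i = 0; i < 16; i++)
        out[i] = x[i] + in[i];
}

static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: one 64-byte ChaCha8 keystream block, assuming the
   original layout with a 64-bit block counter in words 12-13 and a 64-bit
   nonce in words 14-15. */
void chacha8_block(uint8_t out[64], const uint8_t key[32],
                   const uint8_t nonce[8], uint64_t counter)
{
    uint32_t in[16], x[16];
    in[0] = 0x61707865; in[1] = 0x3320646e;   /* "expand 32-byte k" */
    in[2] = 0x79622d32; in[3] = 0x6b206574;
    for (int i = 0; i < 8; i++)
        in[4 + i] = load32_le(key + 4 * i);
    in[12] = (uint32_t)counter;
    in[13] = (uint32_t)(counter >> 32);
    in[14] = load32_le(nonce);
    in[15] = load32_le(nonce + 4);
    chacha_core(x, in, 8);
    for (int i = 0; i < 16; i++) {            /* serialize little-endian */
        out[4 * i + 0] = (uint8_t)x[i];
        out[4 * i + 1] = (uint8_t)(x[i] >> 8);
        out[4 * i + 2] = (uint8_t)(x[i] >> 16);
        out[4 * i + 3] = (uint8_t)(x[i] >> 24);
    }
}
```

With rounds = 20, chacha_core is the ChaCha20 core, so it can be checked against the ChaCha20 block-function test vector in RFC 8439 (the IETF variant only changes how words 12 to 15 are filled). ChaCha8 differs only in the round count.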

Compiler output

Implementation: amd64-ssse3
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
chacha.S: chacha.S:19:5: error: unknown token in expression
chacha.S: mov %rsp,%r11
chacha.S: ^
chacha.S: chacha.S:19:5: error: invalid operand
chacha.S: mov %rsp,%r11
chacha.S: ^
chacha.S: chacha.S:20:9: error: unknown token in expression
chacha.S: and $31,%r11
chacha.S: ^
chacha.S: chacha.S:20:9: error: invalid operand
chacha.S: and $31,%r11
chacha.S: ^
chacha.S: chacha.S:21:10: error: unknown token in expression
chacha.S: add $384,%r11
chacha.S: ^
chacha.S: chacha.S:21:10: error: invalid operand
chacha.S: add $384,%r11
chacha.S: ^
chacha.S: chacha.S:22:5: error: unknown token in expression
chacha.S: sub %r11,%rsp
chacha.S: ^
chacha.S: chacha.S:22:5: error: invalid operand
chacha.S: sub %r11,%rsp
chacha.S: ^
chacha.S: chacha.S:23:5: error: unknown token in expression
chacha.S: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler    Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE amd64-ssse3

Compiler output

Implementation: amd64-ssse3
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.S: chacha.S: Assembler messages:
chacha.S: chacha.S:19: Error: operand 1 must be an integer register -- `mov %rsp,%r11'
chacha.S: chacha.S:20: Error: operand 1 must be an integer or stack pointer register -- `and $31,%r11'
chacha.S: chacha.S:21: Error: operand 1 must be an integer or stack pointer register -- `add $384,%r11'
chacha.S: chacha.S:22: Error: operand 1 must be an integer or stack pointer register -- `sub %r11,%rsp'
chacha.S: chacha.S:23: Error: operand 1 must be an integer register -- `mov %rdi,%r8'
chacha.S: chacha.S:24: Error: operand 1 must be an integer register -- `mov %rsi,%rsi'
chacha.S: chacha.S:25: Error: operand 1 must be an integer register -- `mov %rsi,%rdi'
chacha.S: chacha.S:26: Error: operand 1 must be an integer register -- `mov %rdx,%rdx'
chacha.S: chacha.S:27: Error: operand 1 must be an integer or stack pointer register -- `cmp $0,%rdx'
chacha.S: chacha.S:29: Error: unknown mnemonic `jbe' -- `jbe ._done'
chacha.S: chacha.S:31: Error: operand 1 must be an integer register -- `mov $0,%rax'
chacha.S: chacha.S:33: Error: operand 1 must be an integer register -- `mov %rdx,%rcx'
chacha.S: chacha.S:35: Error: unknown mnemonic `rep' -- `rep stosb'
chacha.S: chacha.S:37: Error: operand 1 must be an integer or stack pointer register -- `sub %rdx,%rdi'
chacha.S: chacha.S:39: Error: unknown mnemonic `jmp' -- `jmp ._start'
chacha.S: chacha.S:47: Error: operand 1 must be an integer register -- `mov %rsp,%r11'
chacha.S: chacha.S:48: Error: operand 1 must be an integer or stack pointer register -- `and $31,%r11'
chacha.S: chacha.S:49: Error: operand 1 must be an integer or stack pointer register -- `add $384,%r11'
chacha.S: chacha.S:50: Error: operand 1 must be an integer or stack pointer register -- `sub %r11,%rsp'
chacha.S: chacha.S:52: Error: operand 1 must be an integer register -- `mov %rdi,%r8'
chacha.S: chacha.S:54: Error: operand 1 must be an integer register -- `mov %rsi,%rsi'
chacha.S: chacha.S:56: Error: operand 1 must be an integer register -- `mov %rdx,%rdi'
chacha.S: chacha.S:58: Error: operand 1 must be an integer register -- `mov %rcx,%rdx'
chacha.S: chacha.S:60: Error: operand 1 must be an integer or stack pointer register -- `cmp $0,%rdx'
chacha.S: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler    Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE amd64-ssse3
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE amd64-ssse3
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE amd64-ssse3
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE amd64-ssse3
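
Every error above has the same cause: chacha.S in amd64-ssse3 is x86-64 assembly in AT&T syntax (`%rsp`, `%r11`, `rep stosb`), and gcc185 is an aarch64 machine, so neither clang's integrated assembler nor the GNU assembler invoked by gcc recognizes the registers or mnemonics. These failures are therefore expected on this host rather than bugs in the implementation. A build driver can avoid such attempts by checking the target before compiling architecture-specific code. A minimal sketch in C, using the `__x86_64__`, `__aarch64__` and `__arm__` macros that GCC and clang predefine; `target_arch` and `implementation_applies` are hypothetical helpers, not part of SUPERCOP.

```c
#include <assert.h>
#include <string.h>

/* Short name of the ISA this file is being compiled for, taken from
   macros that GCC and clang predefine for the target. */
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "amd64";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#else
    return "other";
#endif
}

/* Hypothetical build-driver check: compile an architecture-specific
   implementation only when the target matches what it requires. */
int implementation_applies(const char *required_arch)
{
    assert(required_arch != NULL);
    return strcmp(required_arch, target_arch()) == 0;
}
```

Built with gcc on gcc185, target_arch() returns "aarch64", so implementation_applies("amd64") is false and an amd64-only implementation such as amd64-ssse3 would be skipped instead of failing in the assembler.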

Compiler output

Implementation: goll_gueron
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: In file included from stream.c:11:
stream.c: /usr/lib64/clang/14.0.6/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
stream.c: #error "This header is only meant to be used on x86 and x64 architecture"
stream.c: ^
stream.c: In file included from stream.c:11:
stream.c: In file included from /usr/lib64/clang/14.0.6/include/immintrin.h:17:
stream.c: In file included from /usr/lib64/clang/14.0.6/include/x86gprintrin.h:15:
stream.c: /usr/lib64/clang/14.0.6/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
stream.c: __asm__ ("hreset $0" :: "a"(__eax));
stream.c: ^
stream.c: In file included from stream.c:11:
stream.c: In file included from /usr/lib64/clang/14.0.6/include/immintrin.h:21:
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
stream.c: #error "This header is only meant to be used on x86 and x64 architecture"
stream.c: ^
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_packssdw((__v2si)__m1, (__v2si)__m2);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:193:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler    Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE goll_gueron

Compiler output

Implementation: goll_gueron
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:11:10: fatal error: immintrin.h: No such file or directory
stream.c: #include <immintrin.h>
stream.c: ^~~~~~~~~~~~~
stream.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler    Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE goll_gueron
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE goll_gueron
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE goll_gueron
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE goll_gueron
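
goll_gueron fails for a related reason: stream.c includes <immintrin.h>, the x86 SIMD intrinsics header. The aarch64 gcc installation does not ship that header at all, while clang ships a copy that stops with #error on non-x86 targets, and the remaining clang errors follow from parsing x86-only declarations for the wrong target. A minimal sketch of guarding such an include by target, with a NEON path and a scalar fallback. `rotl16_x4`, `HAVE_X86_SIMD` and `HAVE_NEON` are hypothetical names, and the rotate-by-16 is only an illustrative ChaCha step, not goll_gueron's code.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>          /* x86 only: SSE/AVX intrinsics */
#define HAVE_X86_SIMD 1
#elif defined(__aarch64__) || defined(__ARM_NEON)
#include <arm_neon.h>           /* Arm: Advanced SIMD (NEON) intrinsics */
#define HAVE_NEON 1
#endif

/* Rotate each of four 32-bit lanes left by 16 bits (the first ChaCha
   rotation), choosing the instruction set at compile time. */
void rotl16_x4(uint32_t v[4])
{
    assert(v != NULL);
#if defined(HAVE_X86_SIMD) && defined(__SSE2__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    x = _mm_or_si128(_mm_slli_epi32(x, 16), _mm_srli_epi32(x, 16));
    _mm_storeu_si128((__m128i *)v, x);
#elif defined(HAVE_NEON)
    uint32x4_t x = vld1q_u32(v);
    /* swapping the two halfwords of each lane is a rotate by 16 */
    x = vreinterpretq_u32_u16(vrev32q_u16(vreinterpretq_u16_u32(x)));
    vst1q_u32(v, x);
#else
    for (int i = 0; i < 4; i++)          /* portable scalar fallback */
        v[i] = (v[i] << 16) | (v[i] >> 16);
#endif
}
```

Rotating by 16 is the one ChaCha rotation that NEON can do with a single halfword reversal; the other rotations (12, 8 and 7) need different instruction sequences.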

Compiler output

Implementation: krovetz/avx2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: In file included from stream.c:8:
stream.c: /usr/lib64/clang/14.0.6/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
stream.c: #error "This header is only meant to be used on x86 and x64 architecture"
stream.c: ^
stream.c: In file included from stream.c:8:
stream.c: In file included from /usr/lib64/clang/14.0.6/include/immintrin.h:17:
stream.c: In file included from /usr/lib64/clang/14.0.6/include/x86gprintrin.h:15:
stream.c: /usr/lib64/clang/14.0.6/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
stream.c: __asm__ ("hreset $0" :: "a"(__eax));
stream.c: ^
stream.c: In file included from stream.c:8:
stream.c: In file included from /usr/lib64/clang/14.0.6/include/immintrin.h:21:
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
stream.c: #error "This header is only meant to be used on x86 and x64 architecture"
stream.c: ^
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: return (__m64)__builtin_ia32_packssdw((__v2si)__m1, (__v2si)__m2);
stream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: /usr/lib64/clang/14.0.6/include/mmintrin.h:193:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler    Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE krovetz/avx2

Compiler output

Implementation: krovetz/avx2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:8:10: fatal error: immintrin.h: No such file or directory
stream.c: #include <immintrin.h>
stream.c: ^~~~~~~~~~~~~
stream.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler    Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/avx2
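
krovetz/avx2 fails in exactly the same way, at the #include <immintrin.h> on stream.c line 8. Besides guarding the include, another option is to write the 8-lane data path with GCC/clang generic vector types, which need no target header: the compiler emits AVX2 instructions when AVX2 is enabled and splits the 256-bit operations into narrower ones otherwise. The sketch below applies the ChaCha quarter round to eight independent lanes. `u32x8`, `ROTL_X8` and `quarter_round_x8` are hypothetical names, and how well such code performs on aarch64 depends on the compiler's lowering and has not been measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eight 32-bit lanes in one 256-bit GCC/clang generic vector. No target
   header is needed; the compiler picks the instructions. */
typedef uint32_t u32x8 __attribute__((vector_size(32)));

/* Lane-wise rotate left by a constant n (1..31). */
#define ROTL_X8(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* ChaCha quarter round on eight independent lanes. Pointers are used so
   no 256-bit vector is passed by value, which can change the calling
   convention when AVX is not enabled. */
void quarter_round_x8(u32x8 *a, u32x8 *b, u32x8 *c, u32x8 *d)
{
    assert(a != NULL && b != NULL && c != NULL && d != NULL);
    *a += *b; *d ^= *a; *d = ROTL_X8(*d, 16);
    *c += *d; *b ^= *c; *b = ROTL_X8(*b, 12);
    *a += *b; *d ^= *a; *d = ROTL_X8(*d, 8);
    *c += *d; *b ^= *c; *b = ROTL_X8(*b, 7);
}
```

Each lane would hold the corresponding word of a different ChaCha block, so one call advances the same quarter round in eight blocks at once.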

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:80:2: error: -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:151:9: error: initializing 'vec' (vector of 4 'unsigned int' values) with an expression of incompatible type 'int'
stream.c: vec s3 = NONCE(np);
stream.c: ^ ~~~~~~~~~
stream.c: stream.c:152:36: error: use of undeclared identifier 'VBPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^
stream.c: stream.c:91:19: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: error: use of undeclared identifier 'GPR_TOO'
stream.c: stream.c:91:26: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:155:19: error: use of undeclared identifier 'ONE'
stream.c: v7 = v3 + ONE;
stream.c: ^
stream.c: stream.c:176:13: warning: implicit declaration of function 'ROTW16' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler    Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE krovetz/vec128

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^~~~~
stream.c: stream.c: In function 'crypto_stream_chacha8_krovetz_vec128_constbranchindex_xor':
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^~~~~
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec' {aka '__vector(4) unsigned int'} using type 'int'
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function); did you mean 'BPI'?
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^~~
stream.c: stream.c:91:19: note: each undeclared identifier is reported only once for each function it appears in
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^~~
stream.c: stream.c:91:26: error: 'GPR_TOO' undeclared (first use in this function)
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^~~~~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler    Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
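
For krovetz/vec128 the first error is the implementation's own #error at stream.c line 80: its backend check recognized none of NEON, AltiVec or SSE2 on this aarch64 host, so the macros it would otherwise define (NONCE, VBPI, GPR_TOO, ONE, ROTW16) are missing and every later error follows from that. The log alone does not show which macro the check tests. As an assumption, a check that accepts both the ACLE spelling `__ARM_NEON`, which GCC and clang predefine on aarch64 when Advanced SIMD is enabled (the default), and the older 32-bit spelling `__ARM_NEON__` might avoid missing NEON here. A sketch of such a selector with a generic-vector fallback instead of #error; `VEC128_BACKEND`, `vec128_backend`, `vec128_is_native`, `ROTV` and `rotw16_generic` are hypothetical, and only the type name `vec` (a vector of four unsigned ints) is taken from the log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pick a 128-bit SIMD backend from compiler-predefined macros, accepting
   both the ACLE spelling __ARM_NEON and the older __ARM_NEON__. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#define VEC128_BACKEND "neon"
#elif defined(__ALTIVEC__)
#define VEC128_BACKEND "altivec"
#elif defined(__SSE2__)
#define VEC128_BACKEND "sse2"
#else
/* Instead of #error, fall back to generic vectors so the file still builds. */
#define VEC128_BACKEND "generic"
#endif

/* Four 32-bit lanes, matching the 'vec' type shown in the log. */
typedef uint32_t vec __attribute__((vector_size(16)));

/* Generic lane-wise rotate by a constant n; the compiler lowers it to
   whatever the target supports. */
#define ROTV(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

const char *vec128_backend(void) { return VEC128_BACKEND; }

/* Nonzero when a specific SIMD backend was recognized. */
int vec128_is_native(void) { return strcmp(VEC128_BACKEND, "generic") != 0; }

/* Hypothetical example: rotate every lane of v by 16 (cf. ROTW16). */
vec rotw16_generic(vec v) { return ROTV(v, 16); }
```

With gcc -march=native on gcc185, this selector would report "neon"; on a target it does not recognize, it still compiles using the generic path rather than stopping at #error.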