Implementation notes: aarch64, pi3bplus, crypto_stream/chacha20

Computer: pi3bplus
Microarchitecture: aarch64; Cortex-A53 (410fd034)
Architecture: aarch64
CPU ID: 410fd034
SUPERCOP version: 20230530
Operation: crypto_stream
Primitive: chacha20
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
85004604 0 419399 912 824dolbeau/arm-neongcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
86253836 0 119176 824 824dolbeau/arm-neonclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023060320230530
86253928 0 417398 904 808dolbeau/arm-neongcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
98753324 0 415942 888 800dolbeau/arm-neongcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
108753972 0 119320 824 824dolbeau/generic-gccsimd128clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023060320230530
108754040 0 417526 904 808dolbeau/generic-gccsimd128gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
108754732 0 419543 912 824dolbeau/generic-gccsimd128gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
130004300 0 417870 904 808dolbeau/arm-neongcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
131255620 0 120968 824 824dolbeau/generic-gccsimd256clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023060320230530
135002392 0 415846 904 808e/mergedgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
135003228 0 418015 912 824e/mergedgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
136253428 0 416070 888 800dolbeau/generic-gccsimd128gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
140002948 0 417743 912 824e/refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
140002948 0 417743 912 824e/regsgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
141252044 0 414670 888 800e/mergedgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
150002668 0 117992 824 824e/mergedclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023060320230530
153754368 0 417950 904 808dolbeau/generic-gccsimd128gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
155002444 0 117768 824 824e/refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023060320230530
156252516 0 117848 824 824e/regsclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023060320230530
166256816 0 419446 888 800dolbeau/generic-gccsimd256gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
176257832 0 421318 904 808dolbeau/generic-gccsimd256gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
176258548 0 423367 912 824dolbeau/generic-gccsimd256gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
182507812 0 421390 904 808dolbeau/generic-gccsimd256gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
215002164 0 415646 904 808e/regsgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
290001816 0 414430 888 800e/refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
303751976 0 414574 888 800e/regsgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
308752164 0 415638 904 808e/refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
345003732 0 417286 904 808e/mergedgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
472503076 0 416630 904 808e/regsgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530
512502616 0 416166 904 808e/refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023060320230530

Compiler output

Implementation: T:cryptopp
Security model: timingleaks
Compiler: g++ -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.cpp: stream.cpp:1:10: fatal error: cryptopp/chacha.h: No such file or directory
stream.cpp: #include <cryptopp/chacha.h>
stream.cpp: ^~~~~~~~~~~~~~~~~~~
stream.cpp: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
g++ -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:cryptopp
g++ -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:cryptopp
g++ -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:cryptopp
g++ -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:cryptopp

Compiler output

Implementation: dolbeau/arm-sve
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
chacha.c: chacha.c:11:10: fatal error: 'arm_sve.h' file not found
chacha.c: #include <arm_sve.h>
chacha.c: ^~~~~~~~~~~
chacha.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 2, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE dolbeau/arm-sve dolbeau/arm-sve2

Compiler output

Implementation: dolbeau/arm-sve
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.c: chacha.c:11:10: fatal error: arm_sve.h: No such file or directory
chacha.c: #include <arm_sve.h>
chacha.c: ^~~~~~~~~~~
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:80:2: error: -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:151:9: error: initializing 'vec' (vector of 4 'unsigned int' values) with an expression of incompatible type 'int'
stream.c: vec s3 = NONCE(np);
stream.c: ^ ~~~~~~~~~
stream.c: stream.c:152:36: error: use of undeclared identifier 'VBPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^
stream.c: stream.c:91:19: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: error: use of undeclared identifier 'GPR_TOO'
stream.c: stream.c:91:26: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:155:19: error: use of undeclared identifier 'ONE'
stream.c: v7 = v3 + ONE;
stream.c: ^
stream.c: stream.c:176:13: warning: implicit declaration of function 'ROTW16' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE krovetz/vec128

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^~~~~
stream.c: stream.c: In function 'crypto_stream_chacha20_krovetz_vec128_constbranchindex_xor':
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^~~~~
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec' {aka '__vector(4) unsigned int'} using type 'int'
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function); did you mean 'BPI'?
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^~~
stream.c: stream.c:91:19: note: each undeclared identifier is reported only once for each function it appears in
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^~~
stream.c: stream.c:91:26: error: 'GPR_TOO' undeclared (first use in this function)
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^~~~~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128