Implementation notes: aarch64, pi3bplus, crypto_stream/chacha20

Computer: pi3bplus
Microarchitecture: aarch64; Cortex-A53 (410fd034)
Architecture: aarch64
CPU ID: 410fd034
SUPERCOP version: 202311020231107
Operation: crypto_stream
Primitive: chacha20
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
83753752 0 417063 824 816dolbeau/arm-neongcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
83754704 0 418871 824 816dolbeau/arm-neongcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
93753292 0 415447 808 800dolbeau/arm-neongcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
107503928 0 417255 824 816dolbeau/generic-gccsimd128gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
107504880 0 419063 824 816dolbeau/generic-gccsimd128gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
127504172 0 417358 816 808dolbeau/arm-neongcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
135003488 0 415671 808 800dolbeau/generic-gccsimd128gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
141252148 0 414311 808 800e/mergedgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
150002896 0 416207 824 816e/mergedgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
150003528 0 417687 824 816e/mergedgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
152504384 0 417598 816 808dolbeau/generic-gccsimd128gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
152503232 0 417407 824 816e/refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
158754620 0 418799 824 816e/regsgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
160006228 0 418407 808 800dolbeau/generic-gccsimd256gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
180007868 0 421070 816 808dolbeau/generic-gccsimd256gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
187507128 0 420455 824 816dolbeau/generic-gccsimd256gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
192508312 0 422495 824 816dolbeau/generic-gccsimd256gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
238752168 0 415487 824 816e/refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
243752704 0 416015 824 816e/regsgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
280001876 0 414023 808 800e/refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
307502028 0 414175 808 800e/regsgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
342503680 0 416854 816 808e/mergedgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
428753068 0 416246 816 808e/regsgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107
511252604 0 415782 816 808e/refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111120231107

Compiler output

Implementation: T:cryptopp
Security model: timingleaks
Compiler: g++ -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.cpp: stream.cpp:1:10: fatal error: cryptopp/chacha.h: No such file or directory
stream.cpp: 1 | #include <cryptopp/chacha.h>
stream.cpp: | ^~~~~~~~~~~~~~~~~~~
stream.cpp: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
g++ -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:cryptopp
g++ -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:cryptopp
g++ -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:cryptopp
g++ -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:cryptopp

Compiler output

Implementation: dolbeau/arm-sve
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.c: In file included from chacha.c:94:
chacha.c: uX.h: In function 'crypto_stream_chacha20_dolbeau_arm_sve_constbranchindex_ECRYPT_encrypt_bytes':
chacha.c: uX.h:21:15: error: ACLE function 'svcntb' requires ISA extension 'sve'
chacha.c: 21 | uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: | ^~~~~~
chacha.c: uX.h:21:15: note: you can enable 'sve' using the command-line option '-march', or by using the 'target' attribute or pragma

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve

Compiler output

Implementation: dolbeau/arm-sve2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.c: In file included from chacha.c:94:
chacha.c: uX.h: In function 'crypto_stream_chacha20_dolbeau_arm_sve2_constbranchindex_ECRYPT_encrypt_bytes':
chacha.c: uX.h:18:15: error: ACLE function 'svcntb' requires ISA extension 'sve'
chacha.c: 18 | uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: | ^~~~~~
chacha.c: uX.h:18:15: note: you can enable 'sve' using the command-line option '-march', or by using the 'target' attribute or pragma

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: 80 | #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: | ^~~~~
stream.c: stream.c: In function 'crypto_stream_chacha20_krovetz_vec128_constbranchindex_xor':
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' [-Wimplicit-function-declaration]
stream.c: 151 | vec s3 = NONCE(np);
stream.c: | ^~~~~
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec' {aka '__vector(4) unsigned int'} using type 'int'
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function); did you mean 'BPI'?
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: | ^~~
stream.c: stream.c:91:19: note: each undeclared identifier is reported only once for each function it appears in
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: | ^~~
stream.c: stream.c:91:26: error: 'GPR_TOO' undeclared (first use in this function)
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128