Implementation notes: aarch64, pi4b, crypto_stream/chacha20

Computer: pi4b
Microarchitecture: aarch64; Cortex-A72 (410fd083)
Architecture: aarch64
CPU ID: 410fd083
SUPERCOP version: 20240425
Operation: crypto_stream
Primitive: chacha20
Time  | Object size  | Test size            | Implementation             | Compiler | Benchmark date | SUPERCOP version
 8209 | 5012 0 4     | 17476 816 800        | dolbeau/arm-neon           | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
 8210 | 4108 0 4     | 15596 816 792        | dolbeau/arm-neon           | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
 8553 | 4228 0 1     | 17746 840 800        | dolbeau/arm-neon           | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240426 | 20240425
 9007 | 4220 0 4     | 15724 816 792        | dolbeau/generic-gccsimd128 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
 9012 | 5124 0 4     | 17604 816 800        | dolbeau/generic-gccsimd128 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
 9554 | 4468 0 1     | 18002 840 800        | dolbeau/generic-gccsimd128 | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240426 | 20240425
 9693 | 14057 2240 0 | 1691799 147001 15016 | T:cryptopp                 | g++_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
 9714 | 13765 2240 0 | 1666041 147009 15016 | T:cryptopp                 | g++_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
 9736 | 13638 2240 0 | 1674103 147009 15016 | T:cryptopp                 | g++_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
 9742 | 6713 2880 0  | 1655704 147649 15000 | T:cryptopp                 | g++_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
10410 | 652 0 0      | 12743 888 792        | openssl                    | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
10459 | 480 0 0      | 14558 896 792        | openssl                    | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240426 | 20240425
10475 | 612 0 0      | 11663 872 784        | openssl                    | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
10490 | 636 0 0      | 12767 888 792        | openssl                    | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
10503 | 828 0 0      | 13903 888 800        | openssl                    | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
10583 | 3444 0 4     | 13908 800 784        | dolbeau/arm-neon           | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
10746 | 8996 0 4     | 21476 816 800        | dolbeau/generic-gccsimd256 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
10863 | 8020 0 4     | 19532 816 792        | dolbeau/generic-gccsimd256 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
11186 | 6020 0 1     | 19554 840 800        | dolbeau/generic-gccsimd256 | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240426 | 20240425
12615 | 2588 0 4     | 14068 816 792        | e/merged                   | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
12615 | 2272 0 4     | 12716 800 784        | e/merged                   | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
13269 | 6988 0 4     | 17452 800 784        | dolbeau/generic-gccsimd256 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
13565 | 7956 0 4     | 19419 808 792        | dolbeau/generic-gccsimd256 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
13841 | 3788 0 4     | 16260 816 800        | e/merged                   | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
13887 | 3600 0 4     | 14084 800 784        | dolbeau/generic-gccsimd128 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
13935 | 3380 0 4     | 15852 816 800        | e/ref                      | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
14133 | 4320 0 4     | 15795 808 792        | dolbeau/arm-neon           | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
14245 | 5104 0 4     | 17580 816 800        | e/regs                     | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
14340 | 2828 0 1     | 16330 840 800        | e/ref                      | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240426 | 20240425
14409 | 2828 0 1     | 16330 840 800        | e/regs                     | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240426 | 20240425
14790 | 4500 0 4     | 15987 808 792        | dolbeau/generic-gccsimd128 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
15729 | 3340 0 1     | 16850 840 800        | e/merged                   | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240426 | 20240425
15879 | 3864 0 4     | 15307 808 792        | e/merged                   | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
17152 | 2364 0 4     | 13860 816 792        | e/regs                     | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
18691 | 2012 0 4     | 12452 800 784        | e/ref                      | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
19329 | 3332 0 4     | 14779 808 792        | e/regs                     | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
19999 | 2324 0 4     | 13820 816 792        | e/ref                      | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
20394 | 2180 0 4     | 12612 800 784        | e/regs                     | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425
21854 | 2836 0 4     | 14275 808 792        | e/ref                      | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240426 | 20240425

Compiler output

Implementation: dolbeau/arm-sve
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_sve.h:15:2: error: "SVE support not enabled"
chacha.c: #error "SVE support not enabled"
chacha.c: ^
chacha.c: In file included from chacha.c:94:
chacha.c: ./uX.h:21:9: error: expected ';' after expression
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ;
chacha.c: ./uX.h:21:1: error: use of undeclared identifier 'uint64_t'
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:21:10: error: use of undeclared identifier 'vc'
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:21:15: warning: implicit declaration of function 'svcntb' is invalid in C99 [-Wimplicit-function-declaration]
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:22:15: error: use of undeclared identifier 'vc'
chacha.c: if (bytes>=16*vc) {
chacha.c: ^
chacha.c: ./uX.h:24:3: error: unknown type name 'svuint32_t'; did you mean '__uint32_t'?
chacha.c: svuint32_t x_0 = svdup_n_u32(x[0]);
chacha.c: ^~~~~~~~~~
chacha.c: __uint32_t
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE dolbeau/arm-sve

Compiler output

Implementation: dolbeau/arm-sve
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.c: In file included from chacha.c:94:
chacha.c: uX.h: In function 'crypto_stream_chacha20_dolbeau_arm_sve_constbranchindex_ECRYPT_encrypt_bytes':
chacha.c: uX.h:21:15: error: ACLE function 'svcntb' requires ISA extension 'sve'
chacha.c: 21 | uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: | ^~~~~~
chacha.c: uX.h:21:15: note: you can enable 'sve' using the command-line option '-march', or by using the 'target' attribute or pragma

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve
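
The errors above arise because the SVE intrinsics are used although the compiler was not targeting SVE (with -march=native or -mcpu=native on this Cortex-A72). A minimal sketch of how such code could be guarded at compile time, assuming the ACLE feature macro __ARM_FEATURE_SVE; the names HAVE_SVE, sve_vector_bytes and use_sve_path are hypothetical and do not come from chacha.c or uX.h:

```c
#include <stddef.h>
#include <stdint.h>

/* __ARM_FEATURE_SVE is the ACLE macro a compiler defines when it targets SVE.
   HAVE_SVE is a hypothetical convenience macro for this sketch. */
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#define HAVE_SVE 1
#else
#define HAVE_SVE 0
#endif

/* Hypothetical helper: bytes per SVE vector, or 0 when SVE is not compiled in. */
static uint64_t sve_vector_bytes(void)
{
#if HAVE_SVE
    return svcntb(); /* same intrinsic as in the failing line of uX.h */
#else
    return 0;
#endif
}

/* Hypothetical helper mirroring the "bytes>=16*vc" test seen in the log,
   but returning 0 instead of failing to build when SVE is absent. */
static int use_sve_path(size_t bytes)
{
    uint64_t vc = sve_vector_bytes();
    return vc != 0 && bytes >= 16 * vc;
}
```

Built without SVE support, use_sve_path always returns 0, so a caller could fall back to another code path rather than stop at a compile error. Whether a fallback is appropriate for a benchmark implementation is a separate design question; SUPERCOP simply records the failure, as shown above.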

Compiler output

Implementation: dolbeau/arm-sve2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_sve.h:15:2: error: "SVE support not enabled"
chacha.c: #error "SVE support not enabled"
chacha.c: ^
chacha.c: In file included from chacha.c:94:
chacha.c: ./uX.h:18:9: error: expected ';' after expression
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ;
chacha.c: ./uX.h:18:1: error: use of undeclared identifier 'uint64_t'
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:18:10: error: use of undeclared identifier 'vc'
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:18:15: warning: implicit declaration of function 'svcntb' is invalid in C99 [-Wimplicit-function-declaration]
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:19:15: error: use of undeclared identifier 'vc'
chacha.c: if (bytes>=16*vc) {
chacha.c: ^
chacha.c: ./uX.h:21:3: error: unknown type name 'svuint32_t'; did you mean '__uint32_t'?
chacha.c: svuint32_t x_0 = svdup_n_u32(x[0]);
chacha.c: ^~~~~~~~~~
chacha.c: __uint32_t
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE dolbeau/arm-sve2

Compiler output

Implementation: dolbeau/arm-sve2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.c: In file included from chacha.c:94:
chacha.c: uX.h: In function 'crypto_stream_chacha20_dolbeau_arm_sve2_constbranchindex_ECRYPT_encrypt_bytes':
chacha.c: uX.h:18:15: error: ACLE function 'svcntb' requires ISA extension 'sve'
chacha.c: 18 | uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: | ^~~~~~
chacha.c: uX.h:18:15: note: you can enable 'sve' using the command-line option '-march', or by using the 'target' attribute or pragma

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE dolbeau/arm-sve2

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:80:2: error: -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:151:9: error: initializing 'vec' (vector of 4 'unsigned int' values) with an expression of incompatible type 'int'
stream.c: vec s3 = NONCE(np);
stream.c: ^ ~~~~~~~~~
stream.c: stream.c:152:36: error: use of undeclared identifier 'VBPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^
stream.c: stream.c:91:19: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: error: use of undeclared identifier 'GPR_TOO'
stream.c: stream.c:91:26: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:155:19: error: use of undeclared identifier 'ONE'
stream.c: v7 = v3 + ONE;
stream.c: ^
stream.c: stream.c:176:13: warning: implicit declaration of function 'ROTW16' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE krovetz/vec128

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: 80 | #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: | ^~~~~
stream.c: stream.c: In function 'crypto_stream_chacha20_krovetz_vec128_constbranchindex_xor':
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' [-Wimplicit-function-declaration]
stream.c: 151 | vec s3 = NONCE(np);
stream.c: | ^~~~~
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec' {aka '__vector(4) unsigned int'} using type 'int'
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function); did you mean 'BPI'?
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: | ^~~
stream.c: stream.c:91:19: note: each undeclared identifier is reported only once for each function it appears in
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: | ^~~
stream.c: stream.c:91:26: error: 'GPR_TOO' undeclared (first use in this function)
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE krovetz/vec128
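
The #error at stream.c:80 indicates that none of the platform checks in the source matched on this machine, even though the Cortex-A72 has NEON. The log does not show the actual conditions, so the following is only a sketch of a detection chain that also accepts the ACLE spelling __ARM_NEON, which GCC and Clang define for AArch64. SIMD_FAMILY, simd_family and simd_available are hypothetical names, not taken from stream.c:

```c
#include <string.h>

/* Hypothetical detection chain. __ARM_NEON is the ACLE macro;
   __ARM_NEON__ is the older spelling used by 32-bit ARM toolchains. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#define SIMD_FAMILY "neon"
#elif defined(__ALTIVEC__)
#define SIMD_FAMILY "altivec"
#elif defined(__SSE2__)
#define SIMD_FAMILY "sse2"
#else
#define SIMD_FAMILY "none"
#endif

/* Name of the SIMD family selected at compile time. */
static const char *simd_family(void)
{
    return SIMD_FAMILY;
}

/* Nonzero when one of the three families was detected. */
static int simd_available(void)
{
    return strcmp(SIMD_FAMILY, "none") != 0;
}
```

If stream.c tests only a macro that AArch64 compilers do not define, a chain like this would select the NEON branch instead of reaching the #error. That is an assumption about the cause, not something the log confirms.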