Implementation notes: aarch64, pi4b, crypto_stream/chacha20

Computer: pi4b
Microarchitecture: aarch64; Cortex-A72 (410fd083)
Architecture: aarch64
CPU ID: 410fd083
SUPERCOP version: 20240107
Operation: crypto_stream
Primitive: chacha20
Time   Object size   Test size             Implementation              Compiler  Benchmark date  SUPERCOP version
8208   4108 0 4      15596 816 792         dolbeau/arm-neon            gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
8209   5012 0 4      17476 816 800         dolbeau/arm-neon            gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
8553   4228 0 1      17746 840 800         dolbeau/arm-neon            clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231224  20231222
9007   4220 0 4      15724 816 792         dolbeau/generic-gccsimd128  gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
9012   5124 0 4      17604 816 800         dolbeau/generic-gccsimd128  gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
9554   4468 0 1      18002 840 800         dolbeau/generic-gccsimd128  clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231224  20231222
9698   13638 2240 0  1659687 146945 15128  T:cryptopp                  g++_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
9740   13765 2240 0  1659817 146945 15128  T:cryptopp                  g++_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
9743   6713 2880 0   1649504 147585 15112  T:cryptopp                  g++_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
9814   14057 2240 0  1660999 146937 15128  T:cryptopp                  g++_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
10581  3444 0 4      13908 800 784         dolbeau/arm-neon            gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
10761  8996 0 4      21476 816 800         dolbeau/generic-gccsimd256  gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
10849  8020 0 4      19532 816 792         dolbeau/generic-gccsimd256  gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
11232  6020 0 1      19554 840 800         dolbeau/generic-gccsimd256  clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231224  20231222
12615  2588 0 4      14068 816 792         e/merged                    gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
12615  2272 0 4      12716 800 784         e/merged                    gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
13269  6988 0 4      17452 800 784         dolbeau/generic-gccsimd256  gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
13572  7956 0 4      19419 808 792         dolbeau/generic-gccsimd256  gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
13841  3788 0 4      16260 816 800         e/merged                    gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
13888  3600 0 4      14084 800 784         dolbeau/generic-gccsimd128  gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
13935  3380 0 4      15852 816 800         e/ref                       gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
14128  4320 0 4      15795 808 792         dolbeau/arm-neon            gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
14245  5104 0 4      17580 816 800         e/regs                      gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
14340  2828 0 1      16330 840 800         e/ref                       clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231224  20231222
14408  2828 0 1      16330 840 800         e/regs                      clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231224  20231222
14793  4500 0 4      15987 808 792         dolbeau/generic-gccsimd128  gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
15797  3340 0 1      16850 840 800         e/merged                    clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231224  20231222
15879  3864 0 4      15307 808 792         e/merged                    gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
17168  2364 0 4      13860 816 792         e/regs                      gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
18698  2012 0 4      12452 800 784         e/ref                       gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
19306  3332 0 4      14779 808 792         e/regs                      gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
20025  2324 0 4      13820 816 792         e/ref                       gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
20394  2180 0 4      12612 800 784         e/regs                      gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
21854  2836 0 4      14275 808 792         e/ref                       gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE  20231224  20231222
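
Reading the table: the Time values are sorted in ascending order, so relative performance can be read off as a ratio of two rows. The sketch below works through one such comparison, taking 13935 (best e/ref entry, gcc -O3) and 8208 (best dolbeau/arm-neon entry, gcc -O2) from the table. The function names are hypothetical, and the unit of Time is not stated in this section, so only the ratio is meaningful here.

```c
#include <stdio.h>

/* Ratio of a baseline time to a candidate time.
 * A result above 1 means the candidate is faster than the baseline. */
double speedup(unsigned long baseline_time, unsigned long candidate_time)
{
    return (double)baseline_time / (double)candidate_time;
}

/* Prints the comparison for two rows copied from the table above. */
void print_speedup_example(void)
{
    const unsigned long ref_best  = 13935; /* e/ref, gcc -O3 */
    const unsigned long neon_best = 8208;  /* dolbeau/arm-neon, gcc -O2 */

    printf("dolbeau/arm-neon vs e/ref: %.2fx\n",
           speedup(ref_best, neon_best));
}
```

On these two rows the ratio is roughly 1.7, so the NEON implementation is about 1.7 times as fast as the best reference build on this machine.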

Compiler output

Implementation: dolbeau/arm-sve
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_sve.h:15:2: error: "SVE support not enabled"
chacha.c: #error "SVE support not enabled"
chacha.c: ^
chacha.c: In file included from chacha.c:94:
chacha.c: ./uX.h:21:9: error: expected ';' after expression
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ;
chacha.c: ./uX.h:21:1: error: use of undeclared identifier 'uint64_t'
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:21:10: error: use of undeclared identifier 'vc'
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:21:15: warning: implicit declaration of function 'svcntb' is invalid in C99 [-Wimplicit-function-declaration]
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:22:15: error: use of undeclared identifier 'vc'
chacha.c: if (bytes>=16*vc) {
chacha.c: ^
chacha.c: ./uX.h:24:3: error: unknown type name 'svuint32_t'; did you mean '__uint32_t'?
chacha.c: svuint32_t x_0 = svdup_n_u32(x[0]);
chacha.c: ^~~~~~~~~~
chacha.c: __uint32_t
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  dolbeau/arm-sve

Compiler output

Implementation: dolbeau/arm-sve
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.c: In file included from chacha.c:94:
chacha.c: uX.h: In function 'crypto_stream_chacha20_dolbeau_arm_sve_constbranchindex_ECRYPT_encrypt_bytes':
chacha.c: uX.h:21:15: error: ACLE function 'svcntb' requires ISA extension 'sve'
chacha.c: 21 | uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: | ^~~~~~
chacha.c: uX.h:21:15: note: you can enable 'sve' using the command-line option '-march', or by using the 'target' attribute or pragma

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE  dolbeau/arm-sve
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE  dolbeau/arm-sve
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE  dolbeau/arm-sve
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE  dolbeau/arm-sve
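
Note on the dolbeau/arm-sve failures: the clang and gcc diagnostics agree that the source uses SVE intrinsics (svcntb, svuint32_t) while the compiler flags do not enable the SVE extension for this target. A minimal sketch of guarding such code on the ACLE feature macro __ARM_FEATURE_SVE follows. The helper names sve_vector_bytes and use_sve_path are hypothetical and are not part of dolbeau/arm-sve; the threshold only mirrors the "bytes>=16*vc" test visible in the log.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>   /* only usable when the target enables SVE */
#endif

/* Hypothetical helper: bytes per SVE vector, or 0 when the
 * translation unit was built without SVE support. */
uint64_t sve_vector_bytes(void)
{
#if defined(__ARM_FEATURE_SVE)
    return svcntb();
#else
    return 0;
#endif
}

/* Hypothetical helper: nonzero when an SVE bulk path could be taken,
 * using the same shape of test as "bytes>=16*vc" in the log. */
int use_sve_path(uint64_t bytes)
{
    uint64_t vc = sve_vector_bytes();
    return vc != 0 && bytes >= 16 * vc;
}
```

Built with the flags shown above on this machine, such a guard would presumably compile the fallback branch, so sve_vector_bytes() would return 0 instead of the build failing.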

Compiler output

Implementation: dolbeau/arm-sve2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_sve.h:15:2: error: "SVE support not enabled"
chacha.c: #error "SVE support not enabled"
chacha.c: ^
chacha.c: In file included from chacha.c:94:
chacha.c: ./uX.h:18:9: error: expected ';' after expression
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ;
chacha.c: ./uX.h:18:1: error: use of undeclared identifier 'uint64_t'
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:18:10: error: use of undeclared identifier 'vc'
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:18:15: warning: implicit declaration of function 'svcntb' is invalid in C99 [-Wimplicit-function-declaration]
chacha.c: uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: ^
chacha.c: ./uX.h:19:15: error: use of undeclared identifier 'vc'
chacha.c: if (bytes>=16*vc) {
chacha.c: ^
chacha.c: ./uX.h:21:3: error: unknown type name 'svuint32_t'; did you mean '__uint32_t'?
chacha.c: svuint32_t x_0 = svdup_n_u32(x[0]);
chacha.c: ^~~~~~~~~~
chacha.c: __uint32_t
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  dolbeau/arm-sve2

Compiler output

Implementation: dolbeau/arm-sve2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
chacha.c: In file included from chacha.c:94:
chacha.c: uX.h: In function 'crypto_stream_chacha20_dolbeau_arm_sve2_constbranchindex_ECRYPT_encrypt_bytes':
chacha.c: uX.h:18:15: error: ACLE function 'svcntb' requires ISA extension 'sve'
chacha.c: 18 | uint64_t vc = svcntb(); /* how many bytes in a vector */
chacha.c: | ^~~~~~
chacha.c: uX.h:18:15: note: you can enable 'sve' using the command-line option '-march', or by using the 'target' attribute or pragma

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE  dolbeau/arm-sve2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE  dolbeau/arm-sve2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE  dolbeau/arm-sve2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE  dolbeau/arm-sve2
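
Note on the dolbeau/arm-sve2 failures: with -march=native and -mcpu=native the compilers target the host CPU, and the diagnostics show that SVE is not enabled for it, which is consistent with the Cortex-A72 not implementing SVE. Whether a given machine has SVE or SVE2 can also be checked at run time. The sketch below assumes Linux on aarch64 with a C library providing getauxval and kernel headers defining HWCAP_SVE and HWCAP2_SVE2; on any other platform it reports no support. The function names are hypothetical.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, AT_HWCAP2 */
#include <asm/hwcap.h>  /* HWCAP_SVE, HWCAP2_SVE2 */
#endif

/* True when the kernel reports SVE for the running CPU. */
bool cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

/* True when the kernel reports SVE2; SVE2 is only meaningful with SVE. */
bool cpu_has_sve2(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_SVE2)
    return cpu_has_sve() && (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return false;
#endif
}
```

On the pi4b described here both functions would be expected to return false, matching the compile-time errors above.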

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:80:2: error: -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:151:9: error: initializing 'vec' (vector of 4 'unsigned int' values) with an expression of incompatible type 'int'
stream.c: vec s3 = NONCE(np);
stream.c: ^ ~~~~~~~~~
stream.c: stream.c:152:36: error: use of undeclared identifier 'VBPI'
stream.c: for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ^
stream.c: stream.c:91:19: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: error: use of undeclared identifier 'GPR_TOO'
stream.c: stream.c:91:26: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:155:19: error: use of undeclared identifier 'ONE'
stream.c: v7 = v3 + ONE;
stream.c: ^
stream.c: stream.c:176:13: warning: implicit declaration of function 'ROTW16' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  krovetz/vec128

Compiler output

Implementation: krovetz/vec128
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: 80 | #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: | ^~~~~
stream.c: stream.c: In function 'crypto_stream_chacha20_krovetz_vec128_constbranchindex_xor':
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' [-Wimplicit-function-declaration]
stream.c: 151 | vec s3 = NONCE(np);
stream.c: | ^~~~~
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec' {aka '__vector(4) unsigned int'} using type 'int'
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function); did you mean 'BPI'?
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: | ^~~
stream.c: stream.c:91:19: note: each undeclared identifier is reported only once for each function it appears in
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: | ^~~
stream.c: stream.c:91:26: error: 'GPR_TOO' undeclared (first use in this function)
stream.c: 91 | #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: | ^~~~~~~
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: 152 | for (iters = 0; iters < inlen/(BPI*64); iters++) {
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE  krovetz/vec128
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE  krovetz/vec128
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE  krovetz/vec128
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE  krovetz/vec128
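
Note on the krovetz/vec128 failures: the first diagnostic is an explicit #error at stream.c:80, and the later errors (NONCE, VBPI, GPR_TOO, ONE, ROTW16 undeclared) look like consequences of it, since those names are presumably defined only inside a platform branch that was not selected. The conditions stream.c actually tests are not shown in the log. The sketch below illustrates the general pattern of selecting a SIMD backend with predefined macros; the choice of __ARM_NEON, __ALTIVEC__ and __SSE2__ is an assumption for illustration, and the function names are hypothetical.

```c
#include <string.h>

/* Pick a backend name from compiler-predefined feature macros.
 * Unlike the #error seen in the log, this sketch falls back to
 * "none" so that it compiles on every target. */
#if defined(__ARM_NEON)
#define SIMD_BACKEND "neon"
#elif defined(__ALTIVEC__)
#define SIMD_BACKEND "altivec"
#elif defined(__SSE2__)
#define SIMD_BACKEND "sse2"
#else
#define SIMD_BACKEND "none"
#endif

/* Name of the backend chosen at compile time. */
const char *simd_backend_name(void)
{
    return SIMD_BACKEND;
}

/* Nonzero when one of the three SIMD backends was selected. */
int simd_backend_available(void)
{
    return strcmp(SIMD_BACKEND, "none") != 0;
}
```

A dispatch that tests a macro the compiler does not define on aarch64 would fall through to the error branch even though the CPU has NEON; whether that is what happens in stream.c cannot be determined from this log alone.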