Implementation notes: aarch64, a53, crypto_stream/chacha8

Computer: a53
Architecture: aarch64
CPU ID: unknown CPU ID
SUPERCOP version: 20160731
Operation: crypto_stream
Primitive: chacha8
 Time  Implementation      Compiler                                                                    Benchmark date  SUPERCOP version
 3645  dolbeau/arm-neon    clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160801        20160731
 4860  dolbeau/arm-neon    gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv                       20160801        20160731
 4860  dolbeau/arm-neon    gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv                       20160801        20160731
 6075  dolbeau/mipsel-msa  gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv                       20160801        20160731
 6075  e/ref               gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv                       20160801        20160731
 6075  dolbeau/arm-neon    gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv                        20160801        20160731
 6075  dolbeau/arm-neon    gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv                       20160801        20160731
 7290  e/merged            clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160801        20160731
 7290  e/regs              clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160801        20160731
 8505  e/regs              gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv                       20160801        20160731
10935  e/merged            gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv                       20160801        20160731
12150  dolbeau/mipsel-msa  clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160801        20160731
12150  e/ref               clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160801        20160731
12150  e/merged            gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv                       20160801        20160731
13365  e/merged            gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv                        20160801        20160731
14580  e/regs              gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv                       20160801        20160731
15200  e/merged            gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv                       20160801        20160731
24300  dolbeau/mipsel-msa  gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv                       20160801        20160731
24300  e/ref               gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv                       20160801        20160731
24300  dolbeau/mipsel-msa  gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv                        20160801        20160731
24300  e/ref               gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv                       20160801        20160731
24800  e/regs              gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv                        20160801        20160731
31590  e/ref               gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv                        20160801        20160731
32000  dolbeau/mipsel-msa  gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv                       20160801        20160731
37600  e/regs              gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv                       20160801        20160731

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/ppc-altivec
Compiler: clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/include/clang/3.5.2/include/altivec.h:27:2: error: "AltiVec support not enabled"
chacha.c: #error "AltiVec support not enabled"
chacha.c: ^
chacha.c: /usr/include/clang/3.5.2/include/altivec.h:39:8: error: unknown type name 'vector'
chacha.c: static vector signed char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.2/include/altivec.h:39:15: error: expected identifier or '('
chacha.c: static vector signed char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.2/include/altivec.h:42:8: error: unknown type name 'vector'
chacha.c: static vector unsigned char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.2/include/altivec.h:42:15: error: expected identifier or '('
chacha.c: static vector unsigned char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.2/include/altivec.h:47:8: error: unknown type name 'vector'
chacha.c: static vector bool char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.2/include/altivec.h:47:19: error: expected ';' after top level declarator
chacha.c: static vector bool char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.2/include/altivec.h:48:10: error: unknown type name 'vector'
chacha.c: vec_perm(vector bool char __a, vector bool char __b, vector unsigned char __c);
chacha.c: ^
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha8/amd64-ssse3
Compiler: clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
chacha.s: chacha.s:22:5: error: unknown token in expression
chacha.s: mov %rsp,%r11
chacha.s: ^
chacha.s: chacha.s:22:5: error: invalid operand
chacha.s: mov %rsp,%r11
chacha.s: ^
chacha.s: chacha.s:23:5: error: invalid token in expression
chacha.s: and $31,%r11
chacha.s: ^
chacha.s: chacha.s:23:5: error: invalid operand
chacha.s: and $31,%r11
chacha.s: ^
chacha.s: chacha.s:24:5: error: invalid token in expression
chacha.s: add $384,%r11
chacha.s: ^
chacha.s: chacha.s:24:5: error: invalid operand
chacha.s: add $384,%r11
chacha.s: ^
chacha.s: chacha.s:25:5: error: unknown token in expression
chacha.s: sub %r11,%rsp
chacha.s: ^
chacha.s: chacha.s:25:5: error: invalid operand
chacha.s: sub %r11,%rsp
chacha.s: ^
chacha.s: chacha.s:26:6: error: unknown token in expression
chacha.s: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments amd64-ssse3

Compiler output

Implementation: crypto_stream/chacha8/goll_gueron
Compiler: clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
stream.c: stream.c:126:2: error: -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: ^
stream.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments goll_gueron

Compiler output

Implementation: crypto_stream/chacha8/krovetz/avx2
Compiler: clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
stream.c: stream.c:54:5: error: use of undeclared identifier '__m256i'
stream.c: __m256i v0,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10,v11;
stream.c: ^
stream.c: stream.c:56:5: error: use of undeclared identifier '__m256i'
stream.c: __m256i s0 = _mm_broadcastsi128_si256((__m128i *)sigma);
stream.c: ^
stream.c: stream.c:60:5: error: use of undeclared identifier '__m256i'
stream.c: __m256i s1 = _mm256_loadu_si256((__m256i *)k);
stream.c: ^
stream.c: stream.c:61:5: error: use of undeclared identifier '__m256i'
stream.c: __m256i s2 = _mm256_permute2x128_si256(s1,s1,0x11);
stream.c: ^
stream.c: stream.c:62:5: error: use of undeclared identifier 's1'
stream.c: s1 = _mm256_permute2x128_si256(s1,s1,0x00);
stream.c: ^
stream.c: stream.c:62:10: warning: implicit declaration of function '_mm256_permute2x128_si256' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: s1 = _mm256_permute2x128_si256(s1,s1,0x00);
stream.c: ^
stream.c: stream.c:62:36: error: use of undeclared identifier 's1'
stream.c: s1 = _mm256_permute2x128_si256(s1,s1,0x00);
stream.c: ^
stream.c: stream.c:62:39: error: use of undeclared identifier 's1'
stream.c: s1 = _mm256_permute2x128_si256(s1,s1,0x00);
stream.c: ^
stream.c: stream.c:63:5: error: use of undeclared identifier '__m256i'
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments krovetz/avx2

Compiler output

Implementation: crypto_stream/chacha8/krovetz/vec128
Compiler: clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
stream.c: stream.c:80:2: error: -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:151:9: error: initializing 'vec' (vector of 4 'unsigned int' values) with an expression of incompatible type 'int'
stream.c: vec s3 = NONCE(np);
stream.c: ^ ~~~~~~~~~
stream.c: stream.c:152:36: error: use of undeclared identifier 'VBPI'
stream.c: for (iters = 0; iters
stream.c: ^
stream.c: stream.c:91:19: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: error: use of undeclared identifier 'GPR_TOO'
stream.c: stream.c:91:26: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:155:19: error: use of undeclared identifier 'ONE'
stream.c: v7 = v3 + ONE;
stream.c: ^
stream.c: stream.c:176:13: warning: implicit declaration of function 'ROTW16' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments krovetz/vec128
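
The #error at stream.c:80 comes from a compile-time feature guard: when no supported vector extension is detected, helper macros such as NONCE, VBPI, GPR_TOO and ONE are never defined, which explains the cascade of follow-on errors. The log does not show which predefined macros the real guard tests, so the following is only a minimal sketch of how such a guard is commonly written. The names SIMD_BACKEND and simd_backend are hypothetical and do not appear in krovetz/vec128. The sketch checks both the ACLE macro __ARM_NEON and the older __ARM_NEON__ spelling, since a given compiler may define only one of them.

```c
/* Hypothetical compile-time SIMD guard; NOT the actual krovetz/vec128 source.
 * A real implementation would also include the matching intrinsics header
 * (arm_neon.h, altivec.h or emmintrin.h) inside each branch. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  define SIMD_BACKEND "neon"      /* ACLE macro or legacy spelling */
#elif defined(__ALTIVEC__)
#  define SIMD_BACKEND "altivec"   /* PowerPC AltiVec enabled */
#elif defined(__SSE2__)
#  define SIMD_BACKEND "sse2"      /* x86 SSE2 enabled */
#else
#  define SIMD_BACKEND "portable"  /* fall back instead of #error */
#endif

/* Report which branch of the guard was selected at compile time. */
const char *simd_backend(void)
{
    return SIMD_BACKEND;
}
```

Falling back to a portable path instead of stopping with #error is a design choice made for this sketch only; an implementation that exists solely for vector units may reasonably prefer to fail the build, as the one in this log does.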

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/ppc-altivec
Compiler: gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv
chacha.c: chacha.c:11:21: fatal error: altivec.h: No such file or directory
chacha.c: #include <altivec.h>
chacha.c: ^
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv dolbeau/ppc-altivec
gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv dolbeau/ppc-altivec
gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv dolbeau/ppc-altivec
gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha8/amd64-ssse3
Compiler: gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv
chacha.s: chacha.s: Assembler messages:
chacha.s: chacha.s:22: Error: operand 1 should be an integer register -- `mov %rsp,%r11'
chacha.s: chacha.s:23: Error: operand 1 should be an integer or stack pointer register -- `and $31,%r11'
chacha.s: chacha.s:24: Error: operand 1 should be an integer or stack pointer register -- `add $384,%r11'
chacha.s: chacha.s:25: Error: operand 1 should be an integer or stack pointer register -- `sub %r11,%rsp'
chacha.s: chacha.s:26: Error: operand 1 should be an integer register -- `mov %rdi,%r8'
chacha.s: chacha.s:27: Error: operand 1 should be an integer register -- `mov %rsi,%rsi'
chacha.s: chacha.s:28: Error: operand 1 should be an integer register -- `mov %rsi,%rdi'
chacha.s: chacha.s:29: Error: operand 1 should be an integer register -- `mov %rdx,%rdx'
chacha.s: chacha.s:30: Error: operand 1 should be an integer or stack pointer register -- `cmp $0,%rdx'
chacha.s: chacha.s:32: Error: unknown mnemonic `jbe' -- `jbe ._done'
chacha.s: chacha.s:34: Error: operand 1 should be an integer register -- `mov $0,%rax'
chacha.s: chacha.s:36: Error: operand 1 should be an integer register -- `mov %rdx,%rcx'
chacha.s: chacha.s:38: Error: unknown mnemonic `rep' -- `rep stosb'
chacha.s: chacha.s:40: Error: operand 1 should be an integer or stack pointer register -- `sub %rdx,%rdi'
chacha.s: chacha.s:42: Error: unknown mnemonic `jmp' -- `jmp ._start'
chacha.s: chacha.s:50: Error: operand 1 should be an integer register -- `mov %rsp,%r11'
chacha.s: chacha.s:51: Error: operand 1 should be an integer or stack pointer register -- `and $31,%r11'
chacha.s: chacha.s:52: Error: operand 1 should be an integer or stack pointer register -- `add $384,%r11'
chacha.s: chacha.s:53: Error: operand 1 should be an integer or stack pointer register -- `sub %r11,%rsp'
chacha.s: chacha.s:55: Error: operand 1 should be an integer register -- `mov %rdi,%r8'
chacha.s: chacha.s:57: Error: operand 1 should be an integer register -- `mov %rsi,%rsi'
chacha.s: chacha.s:59: Error: operand 1 should be an integer register -- `mov %rdx,%rdi'
chacha.s: chacha.s:61: Error: operand 1 should be an integer register -- `mov %rcx,%rdx'
chacha.s: chacha.s:63: Error: operand 1 should be an integer or stack pointer register -- `cmp $0,%rdx'
chacha.s: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv amd64-ssse3
gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv amd64-ssse3
gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv amd64-ssse3
gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv amd64-ssse3

Compiler output

Implementation: crypto_stream/chacha8/goll_gueron
Compiler: gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv
stream.c: stream.c:11:23: fatal error: immintrin.h: No such file or directory
stream.c: #include <immintrin.h>
stream.c: ^
stream.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv goll_gueron
gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv goll_gueron
gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv goll_gueron
gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv goll_gueron

Compiler output

Implementation: crypto_stream/chacha8/krovetz/vec128
Compiler: gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c: In function 'crypto_stream_chacha8_krovetz_vec128_xor':
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec' using type 'int'
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function)
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters
stream.c: ^
stream.c: stream.c:91:19: note: each undeclared identifier is reported only once for each function it appears in
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters
stream.c: ^
stream.c: stream.c:91:26: error: 'GPR_TOO' undeclared (first use in this function)
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv krovetz/vec128
gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv krovetz/vec128
gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv krovetz/vec128
gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv krovetz/vec128

Compiler output

Implementation: crypto_stream/chacha8/krovetz/avx2
Compiler: gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv
stream.c: stream.c:8:23: fatal error: immintrin.h: No such file or directory
stream.c: #include <immintrin.h>
stream.c: ^
stream.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -mcpu=cortex-a53 -O2 -fomit-frame-pointer -fwrapv krovetz/avx2
gcc -mcpu=cortex-a53 -O3 -fomit-frame-pointer -fwrapv krovetz/avx2
gcc -mcpu=cortex-a53 -O -fomit-frame-pointer -fwrapv krovetz/avx2
gcc -mcpu=cortex-a53 -Os -fomit-frame-pointer -fwrapv krovetz/avx2