Implementation notes: armeabi, cubox, crypto_stream/chacha12

Computer: cubox
Architecture: armeabi
CPU ID: unknown CPU ID
SUPERCOP version: 20161026
Operation: crypto_stream
Primitive: chacha12
Time   Implementation      Compiler                                  Benchmark date  SUPERCOP version
14195  moon/armv6/32       gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
14196  moon/armv6/32       gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
14232  moon/armv6/32       gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
14232  moon/armv6/32       gcc -mcpu=marvell-pj4 -Os                 20161210        20161026
14332  moon/armv6/32       gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
14367  moon/armv6/32       gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
18120  e/merged            gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
19268  e/merged            gcc -mcpu=marvell-pj4 -Os                 20161210        20161026
19627  e/merged            gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
19676  e/merged            gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
19828  e/merged            gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
19992  e/merged            gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
22789  e/regs              gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
22813  e/ref               gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
23369  dolbeau/mipsel-msa  gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
24080  e/ref               gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
24312  e/regs              gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
24380  dolbeau/mipsel-msa  gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
27611  dolbeau/mipsel-msa  gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
28143  e/regs              gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
28859  e/ref               gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
30660  e/regs              gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
33556  e/regs              gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
33692  e/regs              gcc -mcpu=marvell-pj4 -Os                 20161210        20161026
35052  e/ref               gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
35264  dolbeau/mipsel-msa  gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
37392  e/ref               gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
37792  dolbeau/mipsel-msa  gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
38528  e/ref               gcc -mcpu=marvell-pj4 -Os                 20161210        20161026
39516  dolbeau/mipsel-msa  gcc -mcpu=marvell-pj4 -Os                 20161210        20161026
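All implementations in the table compute the same ChaCha12 keystream; they differ in how they schedule the arithmetic. For orientation, here is a minimal scalar sketch of the core permutation, assuming the quarter round as specified in RFC 8439 section 2.1. The RFC fixes 20 rounds and may use a different nonce/counter layout than SUPERCOP's crypto_stream API, but ChaCha12 runs the same column/diagonal double round, 6 times instead of 10. The names rotl32, quarter_round and chacha12_permute are illustrative and do not come from any listed implementation. Key/nonce/counter setup and the final feed-forward addition (the vaddq_u32(x_##a, orig##a) step visible in the arm-neon compiler output) are omitted.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32). */
static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* ChaCha quarter round on four words of the 16-word state (RFC 8439, 2.1). */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* ChaCha12 permutation: 6 double rounds (column round + diagonal round).
 * A caller still has to add the original input words back in
 * (the feed-forward) to obtain a keystream block. */
static void chacha12_permute(uint32_t x[16])
{
    for (int i = 0; i < 6; i++) {
        /* column round */
        quarter_round(&x[0], &x[4], &x[8],  &x[12]);
        quarter_round(&x[1], &x[5], &x[9],  &x[13]);
        quarter_round(&x[2], &x[6], &x[10], &x[14]);
        quarter_round(&x[3], &x[7], &x[11], &x[15]);
        /* diagonal round */
        quarter_round(&x[0], &x[5], &x[10], &x[15]);
        quarter_round(&x[1], &x[6], &x[11], &x[12]);
        quarter_round(&x[2], &x[7], &x[8],  &x[13]);
        quarter_round(&x[3], &x[4], &x[9],  &x[14]);
    }
}
```

Applied to the RFC 8439 section 2.1.1 test vector (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567), quarter_round yields a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.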

Test failure

Implementation: crypto_stream/chacha12/moon/neon/32
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
error 111

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementation
gcc -funroll-loops -mcpu=marvell-pj4 -O2  moon/neon/32
gcc -funroll-loops -mcpu=marvell-pj4 -O3  moon/neon/32
gcc -funroll-loops -mcpu=marvell-pj4 -Os  moon/neon/32
gcc -mcpu=marvell-pj4 -O2                 moon/neon/32
gcc -mcpu=marvell-pj4 -O3                 moon/neon/32
gcc -mcpu=marvell-pj4 -Os                 moon/neon/32

Compiler output

Implementation: crypto_stream/chacha12/dolbeau/arm-neon
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
chacha.c: In file included from chacha.c:11:0:
chacha.c: u4.h: In function 'ECRYPT_encrypt_bytes':
chacha.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:5816:1: error: inlining failed in call to always_inline 'vdupq_n_u32': target specific option mismatch
chacha.c: vdupq_n_u32 (uint32_t __a)
chacha.c: ^~~~~~~~~~~
chacha.c: In file included from chacha.c:94:0:
chacha.c: u4.h:45:14: note: called from here
chacha.c: uint32x4_t x_15 = vdupq_n_u32(x[15]);
chacha.c: ^~~~
chacha.c: In file included from chacha.c:11:0:
chacha.c: ...
chacha.c: In file included from chacha.c:94:0:
chacha.c: u4.h:139:13: note: called from here
chacha.c: x_##a = vaddq_u32(x_##a, orig##a);
chacha.c: ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
chacha.c: u4.h:159:26: note: in expansion of macro 'ONEQUAD_TRANSPOSE'
chacha.c: #define ONEQUAD(a,b,c,d) ONEQUAD_TRANSPOSE(a,b,c,d)
chacha.c: ^~~~~~~~~~~~~~~~~
chacha.c: u4.h:161:5: note: in expansion of macro 'ONEQUAD'
chacha.c: ONEQUAD(0,1,2,3);
chacha.c: ^~~~~~~

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementation
gcc -funroll-loops -mcpu=marvell-pj4 -O2  dolbeau/arm-neon
gcc -funroll-loops -mcpu=marvell-pj4 -O3  dolbeau/arm-neon
gcc -funroll-loops -mcpu=marvell-pj4 -Os  dolbeau/arm-neon
gcc -mcpu=marvell-pj4 -O2                 dolbeau/arm-neon
gcc -mcpu=marvell-pj4 -O3                 dolbeau/arm-neon
gcc -mcpu=marvell-pj4 -Os                 dolbeau/arm-neon
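The "target specific option mismatch" errors mean that the always_inline NEON intrinsics from arm_neon.h (vdupq_n_u32, vaddq_u32) were called from code that these compiler flags did not build with NEON enabled. A minimal sketch of how a source file can degrade gracefully instead, assuming the ACLE feature macro __ARM_NEON (or GCC's older spelling __ARM_NEON__) signals NEON support: gate the intrinsics at compile time and fall back to scalar code otherwise. The helpers add_broadcast_u32x4 and uses_neon_path are hypothetical and do not come from the dolbeau sources.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_PATH 1
#else
#define HAVE_NEON_PATH 0
#endif

/* Hypothetical helper: out[i] = in[i] + k (mod 2^32) for four lanes.
 * Mirrors the vdupq_n_u32 / vaddq_u32 pattern from u4.h, but only
 * touches NEON when the compiler reports that NEON is enabled. */
static void add_broadcast_u32x4(uint32_t out[4], const uint32_t in[4], uint32_t k)
{
#if HAVE_NEON_PATH
    uint32x4_t v  = vld1q_u32(in);       /* load four lanes */
    uint32x4_t kv = vdupq_n_u32(k);      /* broadcast k to every lane */
    vst1q_u32(out, vaddq_u32(v, kv));    /* lane-wise add, store result */
#else
    for (int i = 0; i < 4; i++)
        out[i] = in[i] + k;              /* scalar fallback */
#endif
}

/* Reports which path was compiled in: 1 = NEON, 0 = scalar. */
static int uses_neon_path(void) { return HAVE_NEON_PATH; }
```

With GCC on 32-bit ARM, NEON code generation is usually requested with -mfpu=neon together with a suitable -mfloat-abi setting. This report does not establish whether the CuBox's PJ4 core implements NEON, so the scalar path is the conservative default.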

Compiler output

Implementation: crypto_stream/chacha12/dolbeau/ppc-altivec
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
chacha.c: chacha.c:11:21: fatal error: altivec.h: No such file or directory
chacha.c: #include <altivec.h>
chacha.c: ^
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementation
gcc -funroll-loops -mcpu=marvell-pj4 -O2  dolbeau/ppc-altivec
gcc -funroll-loops -mcpu=marvell-pj4 -O3  dolbeau/ppc-altivec
gcc -funroll-loops -mcpu=marvell-pj4 -Os  dolbeau/ppc-altivec
gcc -mcpu=marvell-pj4 -O2                 dolbeau/ppc-altivec
gcc -mcpu=marvell-pj4 -O3                 dolbeau/ppc-altivec
gcc -mcpu=marvell-pj4 -Os                 dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha12/amd64-ssse3
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
chacha.s: chacha.s: Assembler messages:
chacha.s: chacha.s:22: Error: ARM register expected -- `mov %rsp,%r11'
chacha.s: chacha.s:23: Error: ARM register expected -- `and $31,%r11'
chacha.s: chacha.s:24: Error: ARM register expected -- `add $384,%r11'
chacha.s: chacha.s:25: Error: immediate expression requires a # prefix -- `sub %r11,%rsp'
chacha.s: chacha.s:26: Error: ARM register expected -- `mov %rdi,%r8'
chacha.s: chacha.s:27: Error: ARM register expected -- `mov %rsi,%rsi'
chacha.s: chacha.s:28: Error: ARM register expected -- `mov %rsi,%rdi'
chacha.s: chacha.s:29: Error: ARM register expected -- `mov %rdx,%rdx'
chacha.s: chacha.s:30: Error: ARM register expected -- `cmp $0,%rdx'
chacha.s: ...
chacha.s: chacha.s:1516: Error: bad instruction `movl 0(%rsi),%eax'
chacha.s: chacha.s:1518: Error: bad instruction `movl 4(%rsi),%esi'
chacha.s: chacha.s:1520: Error: bad instruction `movl %r8d,48(%rdi)'
chacha.s: chacha.s:1522: Error: bad instruction `movl %r9d,52(%rdi)'
chacha.s: chacha.s:1524: Error: bad instruction `movl %eax,56(%rdi)'
chacha.s: chacha.s:1526: Error: bad instruction `movl %esi,60(%rdi)'
chacha.s: chacha.s:1528: Error: immediate expression requires a # prefix -- `add %r11,%rsp'
chacha.s: chacha.s:1529: Error: ARM register expected -- `mov %rdi,%rax'
chacha.s: chacha.s:1530: Error: ARM register expected -- `mov %rsi,%rdx'
chacha.s: chacha.s:1531: Error: bad instruction `ret'

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementation
gcc -funroll-loops -mcpu=marvell-pj4 -O2  amd64-ssse3
gcc -funroll-loops -mcpu=marvell-pj4 -O3  amd64-ssse3
gcc -funroll-loops -mcpu=marvell-pj4 -Os  amd64-ssse3
gcc -mcpu=marvell-pj4 -O2                 amd64-ssse3
gcc -mcpu=marvell-pj4 -O3                 amd64-ssse3
gcc -mcpu=marvell-pj4 -Os                 amd64-ssse3

Compiler output

Implementation: crypto_stream/chacha12/goll_gueron
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
stream.c: stream.c:11:23: fatal error: immintrin.h: No such file or directory
stream.c: #include <immintrin.h>
stream.c: ^
stream.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementation
gcc -funroll-loops -mcpu=marvell-pj4 -O2  goll_gueron
gcc -funroll-loops -mcpu=marvell-pj4 -O3  goll_gueron
gcc -funroll-loops -mcpu=marvell-pj4 -Os  goll_gueron
gcc -mcpu=marvell-pj4 -O2                 goll_gueron
gcc -mcpu=marvell-pj4 -O3                 goll_gueron
gcc -mcpu=marvell-pj4 -Os                 goll_gueron

Compiler output

Implementation: crypto_stream/chacha12/krovetz/vec128
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^~~~~
stream.c: stream.c: In function 'crypto_stream_chacha12_krovetz_vec128_xor':
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^~~~~
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec {aka __vector(4) unsigned int}' using type 'int'
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function)
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ...
stream.c: ^
stream.c: stream.c:248:13: note: in expansion of macro 'DQROUND_VECTORS'
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^~~~~~~~~~~~~~~
stream.c: stream.c:103:35: error: incompatible types when assigning to type 'vec {aka __vector(4) unsigned int}' from type 'int'
stream.c: b = ROTV3(b); c = ROTV2(c); d = ROTV1(d);
stream.c: ^
stream.c: stream.c:248:13: note: in expansion of macro 'DQROUND_VECTORS'
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^~~~~~~~~~~~~~~

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementation
gcc -funroll-loops -mcpu=marvell-pj4 -O2  krovetz/vec128
gcc -funroll-loops -mcpu=marvell-pj4 -O3  krovetz/vec128
gcc -funroll-loops -mcpu=marvell-pj4 -Os  krovetz/vec128
gcc -mcpu=marvell-pj4 -O2                 krovetz/vec128
gcc -mcpu=marvell-pj4 -O3                 krovetz/vec128
gcc -mcpu=marvell-pj4 -Os                 krovetz/vec128
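The krovetz/vec128 #error, like the missing altivec.h (PowerPC) and immintrin.h (x86) headers in the sections above, is a compile-time target check failing: with these flags, none of the vector extensions that the implementation tests for were detected. A minimal sketch of such a check, assuming the commonly predefined feature macros __ARM_NEON or __ARM_NEON__, __ALTIVEC__ and __SSE2__, includes only the header that matches the detected extension and reports the result instead of stopping the build. The function name vector_isa is hypothetical.

```c
/* Hypothetical helper: names the 128-bit vector extension enabled by the
 * current compiler flags, based on predefined feature macros. Only the
 * header for the detected extension is included, so this file builds on
 * any target, including one where none of them is available. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define VEC_ISA "NEON"
#elif defined(__ALTIVEC__)
#include <altivec.h>
#define VEC_ISA "AltiVec"
#elif defined(__SSE2__)
#include <emmintrin.h>
#define VEC_ISA "SSE2"
#else
#define VEC_ISA "none"
#endif

static const char *vector_isa(void) { return VEC_ISA; }
```

Given the arm-neon failure above, these cubox flags would presumably yield "none", which is consistent with the vec128 #error; a portable implementation would then select a scalar code path.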

Compiler output

Implementation: crypto_stream/chacha12/krovetz/avx2
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
stream.c: stream.c:8:23: fatal error: immintrin.h: No such file or directory
stream.c: #include <immintrin.h>
stream.c: ^
stream.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementation
gcc -funroll-loops -mcpu=marvell-pj4 -O2  krovetz/avx2
gcc -funroll-loops -mcpu=marvell-pj4 -O3  krovetz/avx2
gcc -funroll-loops -mcpu=marvell-pj4 -Os  krovetz/avx2
gcc -mcpu=marvell-pj4 -O2                 krovetz/avx2
gcc -mcpu=marvell-pj4 -O3                 krovetz/avx2
gcc -mcpu=marvell-pj4 -Os                 krovetz/avx2