Implementation notes: armeabi, cubox, crypto_hash/groestl256

Computer: cubox
Architecture: armeabi
CPU ID: unknown CPU ID
SUPERCOP version: 20161026
Operation: crypto_hash
Primitive: groestl256
Time     Implementation            Compiler                                  Benchmark date  SUPERCOP version
121000   arm32                     gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
123353   arm32                     gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
125329   arm32                     gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
149447   arm11                     gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
149581   arm11                     gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
149598   arm11                     gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
153132   arm11                     gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
154350   arm11                     gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
154508   arm11                     gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
214877   opt32                     gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
214942   opt32                     gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
218895   sphlib-small              gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
221508   opt32                     gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
224584   opt32                     gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
260558   sphlib-small              gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
266337   sphlib-small              gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
268413   sphlib-small              gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
271385   sphlib-small              gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
271504   32bit-2ktable             gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
272533   32bit-2ktable             gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
274440   sphlib-small              gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
276801   32bit-2ktable             gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
279364   32bit-2ktable             gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
311744   opt64                     gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
314348   32bit-2ktable             gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
341278   opt64                     gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
344488   32bit-2ktable             gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
348931   opt32                     gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
351408   opt32                     gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
413887   32bit-bytesliced-c-fast   gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
442626   sphlib                    gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
445418   sphlib                    gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
448421   sphlib-adapted            gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
450511   sphlib-adapted            gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
472949   sphlib                    gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
480731   sphlib                    gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
481926   sphlib                    gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
490018   sphlib-adapted            gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
491986   opt64                     gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
495578   sphlib-adapted            gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
504561   opt64                     gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
507770   32bit-bytesliced-c-fast   gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
527428   sphlib                    gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
528355   sphlib-adapted            gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
530874   32bit-bytesliced-c-fast   gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
541670   8bit_c                    gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
550928   8bit_c                    gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
553077   8bit_c                    gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
561828   32bit-bytesliced-c-small  gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
569744   8bit_c                    gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
575324   sphlib-adapted            gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
575415   32bit-bytesliced-c-small  gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
581992   32bit-bytesliced-c-small  gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161209        20161026
644308   32bit-bytesliced-c-fast   gcc -mcpu=marvell-pj4 -O2                 20161209        20161026
651427   opt64                     gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161209        20161026
666330   opt64                     gcc -mcpu=marvell-pj4 -O3                 20161209        20161026
713748   8bit_c                    gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
756360   32bit-bytesliced-c-fast   gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
765331   8bit_c                    gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
815424   32bit-bytesliced-c-fast   gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
1050543  32bit-bytesliced-c-small  gcc -mcpu=marvell-pj4 -Os                 20161209        20161026
1056910  32bit-bytesliced-c-small  gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161209        20161026
1194470  32bit-bytesliced-c-small  gcc -mcpu=marvell-pj4 -O2                 20161209        20161026

Checksum failure

Implementation: crypto_hash/groestl256/arm32
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
f079b87636261cf3c9ea6c0c0fa5429569bc7bd103f8d0f0bb23bd4ba5d49053
Number of similar (compiler,implementation) pairs: 3, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  arm32
gcc -funroll-loops -mcpu=marvell-pj4 -O3  arm32
gcc -mcpu=marvell-pj4 -O3                 arm32

Test failure

Implementation: crypto_hash/groestl256/neon-bitslice
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
error 111

Number of similar (compiler,implementation) pairs: 22, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  neon-bitslice neon-table thumb-asm-fast
gcc -funroll-loops -mcpu=marvell-pj4 -O3  neon-bitslice neon-table thumb-asm-fast thumb-asm-small
gcc -funroll-loops -mcpu=marvell-pj4 -Os  neon-bitslice neon-table thumb-asm-fast thumb-asm-small
gcc -mcpu=marvell-pj4 -O2                 neon-bitslice neon-table thumb-asm-fast thumb-asm-small
gcc -mcpu=marvell-pj4 -O3                 neon-bitslice neon-table thumb-asm-fast thumb-asm-small
gcc -mcpu=marvell-pj4 -Os                 neon-bitslice neon-table thumb-asm-small

Test failure

Implementation: crypto_hash/groestl256/thumb-asm-small
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
error 142
sh: line 1: 10451 Alarm clock killafter 3600 ./try

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  thumb-asm-small

Test failure

Implementation: crypto_hash/groestl256/thumb-asm-fast
Compiler: gcc -mcpu=marvell-pj4 -Os
error 142
sh: line 1: 10053 Alarm clock killafter 3600 ./try

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                  Implementations
gcc -mcpu=marvell-pj4 -Os                 thumb-asm-fast
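
Note on "error 142": the "Alarm clock" message above is what a shell prints when a process dies from SIGALRM, and 142 matches the usual shell convention of 128 plus the signal number (SIGALRM is 14 on Linux). The sketch below reproduces that status on a POSIX system. The helper name shell_status_after_alarm is hypothetical and is not part of SUPERCOP or killafter; a one-second alarm stands in for the 3600-second limit seen in the log.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child that is killed by its own alarm and
 * return the exit status a POSIX shell would report for it. */
static int shell_status_after_alarm(unsigned seconds)
{
    pid_t pid = fork();
    assert(pid >= 0);                 /* fork failure is not handled in this sketch */

    if (pid == 0) {
        signal(SIGALRM, SIG_DFL);     /* make sure the default action (terminate) applies */
        alarm(seconds);               /* request SIGALRM after `seconds` */
        for (;;)
            pause();                  /* stand-in for a test that never finishes */
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);

    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);  /* shell convention for death by signal */
    return WEXITSTATUS(status);
}
```

Calling shell_status_after_alarm(1) on Linux should yield 128 + SIGALRM, the same value the report lists for the two thumb-asm runs that hit the time limit.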

Compiler output

Implementation: crypto_hash/groestl256/vperm-intr
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
hash.c: In file included from hash.c:34:0:
hash.c: groestl-intr-vperm.h:13:23: fatal error: tmmintrin.h: No such file or directory
hash.c: #include <tmmintrin.h>
hash.c: ^
hash.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  vperm-intr
gcc -funroll-loops -mcpu=marvell-pj4 -O3  vperm-intr
gcc -funroll-loops -mcpu=marvell-pj4 -Os  vperm-intr
gcc -mcpu=marvell-pj4 -O2                 vperm-intr
gcc -mcpu=marvell-pj4 -O3                 vperm-intr
gcc -mcpu=marvell-pj4 -Os                 vperm-intr
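
Note on the missing header: tmmintrin.h is the x86 SSSE3 intrinsics header, so it is expected to be absent from an ARM toolchain, and the failure simply excludes vperm-intr on this machine. In code that must build on several architectures, one common approach is to include the header only when GCC defines __SSSE3__. The sketch below shows that pattern; HAVE_SSSE3_INTRINSICS and vperm_backend are hypothetical names used for illustration and do not come from the vperm-intr source.

```c
/* Include the x86 SSSE3 header only when the compiler targets SSSE3.
 * GCC and Clang predefine __SSSE3__ in that case (for example with -mssse3). */
#if defined(__SSSE3__)
#include <tmmintrin.h>
#define HAVE_SSSE3_INTRINSICS 1
#else
#define HAVE_SSSE3_INTRINSICS 0
#endif

/* Hypothetical helper reporting which code path was compiled in. */
static const char *vperm_backend(void)
{
    return HAVE_SSSE3_INTRINSICS ? "ssse3-intrinsics" : "portable-fallback";
}
```

On the armeabi target in this report the guard would select the fallback branch, and the include that triggers the fatal error would never be reached.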

Compiler output

Implementation: crypto_hash/groestl256/neon-bitslice
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
hash.c: hash.c: In function 'crypto_hash_groestl256_neon_bitslice':
hash.c: hash.c:40:12: warning: iteration 64 invokes undefined behavior [-Waggressive-loop-optimizations]
hash.c: ctx[i] = 0;
hash.c: ~~~~~~~^~~
hash.c: hash.c:39:3: note: within this loop
hash.c: for(i=0;i
hash.c: ^~~

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  neon-bitslice
gcc -funroll-loops -mcpu=marvell-pj4 -O3  neon-bitslice
gcc -funroll-loops -mcpu=marvell-pj4 -Os  neon-bitslice
gcc -mcpu=marvell-pj4 -O2                 neon-bitslice
gcc -mcpu=marvell-pj4 -O3                 neon-bitslice
gcc -mcpu=marvell-pj4 -Os                 neon-bitslice
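
Note on -Waggressive-loop-optimizations: GCC emits this warning when it can prove that a loop eventually indexes past the end of an array. The loop header is truncated in the log, so the actual bound and array size are not known here. The sketch below is only an illustration of the usual remedy, which is to derive the loop bound from the array length; CTX_WORDS and clear_ctx are hypothetical names, and the size 64 is an assumption chosen to match the iteration number in the warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_WORDS 64   /* hypothetical size; the real declaration is not shown in the log */

/* Zero a context array without running past its end. The caller passes the
 * element count, so the loop never touches index `words` or beyond. */
static void clear_ctx(uint32_t *ctx, size_t words)
{
    assert(ctx != NULL);
    for (size_t i = 0; i < words; i++)
        ctx[i] = 0;
}
```

A typical call would be clear_ctx(ctx, sizeof ctx / sizeof ctx[0]) on an array declared as uint32_t ctx[CTX_WORDS], which keeps the bound and the declaration in step if the size ever changes.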

Compiler output

Implementation: crypto_hash/groestl256/neon-vperm
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
hash.c: hash.c: In function 'crypto_hash_groestl256_neon_vperm':
hash.c: hash.c:38:12: warning: iteration 64 invokes undefined behavior [-Waggressive-loop-optimizations]
hash.c: ctx[i] = 0;
hash.c: ~~~~~~~^~~
hash.c: hash.c:37:3: note: within this loop
hash.c: for(i=0;i
hash.c: ^~~
vperm-neon.S: vperm-neon.S: Assembler messages:
vperm-neon.S: vperm-neon.S:911: Error: expected symbol name
vperm-neon.S: vperm-neon.S:913: Error: selected processor does not support `veor q0,q0,q8' in ARM mode
vperm-neon.S: vperm-neon.S:913: Error: selected processor does not support `veor q1,q1,q9' in ARM mode
vperm-neon.S: vperm-neon.S:913: Error: selected processor does not support `veor q2,q2,q9' in ARM mode
vperm-neon.S: vperm-neon.S:913: Error: selected processor does not support `veor q3,q3,q9' in ARM mode
vperm-neon.S: vperm-neon.S:913: Error: selected processor does not support `veor q4,q4,q9' in ARM mode
vperm-neon.S: vperm-neon.S:913: Error: selected processor does not support `veor q5,q5,q9' in ARM mode
vperm-neon.S: vperm-neon.S:913: Error: selected processor does not support `veor q6,q6,q9' in ARM mode
vperm-neon.S: vperm-neon.S:913: Error: selected processor does not support `veor q7,q7,q10' in ARM mode
vperm-neon.S: ...
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.8 d0,d1' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.8 d2,d3' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.8 d4,d5' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.8 d6,d7' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.16 d0,d2' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.16 d1,d3' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.16 d4,d6' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.16 d5,d7' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.32 q0,q2' in ARM mode
vperm-neon.S: vperm-neon.S:1072: Error: selected processor does not support `vtrn.32 q1,q3' in ARM mode

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  neon-vperm
gcc -funroll-loops -mcpu=marvell-pj4 -O3  neon-vperm
gcc -funroll-loops -mcpu=marvell-pj4 -Os  neon-vperm
gcc -mcpu=marvell-pj4 -O2                 neon-vperm
gcc -mcpu=marvell-pj4 -O3                 neon-vperm
gcc -mcpu=marvell-pj4 -Os                 neon-vperm

Compiler output

Implementation: crypto_hash/groestl256/opt64
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
hash.c: hash.c:194:14: warning: 'inP' is static but declared in inline function 'F1024' which is not static
hash.c: static u64 inP[COLS1024] __attribute__((aligned(16)));
hash.c: ^~~
hash.c: hash.c:193:14: warning: 'outQ' is static but declared in inline function 'F1024' which is not static
hash.c: static u64 outQ[COLS1024] __attribute__((aligned(16)));
hash.c: ^~~~
hash.c: hash.c:192:14: warning: 'z' is static but declared in inline function 'F1024' which is not static
hash.c: static u64 z[COLS1024] __attribute__((aligned(16)));
hash.c: ^
hash.c: hash.c:191:14: warning: 'y' is static but declared in inline function 'F1024' which is not static
hash.c: static u64 y[COLS1024] __attribute__((aligned(16)));
hash.c: ^

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  opt64
gcc -funroll-loops -mcpu=marvell-pj4 -O3  opt64
gcc -funroll-loops -mcpu=marvell-pj4 -Os  opt64
gcc -mcpu=marvell-pj4 -O2                 opt64
gcc -mcpu=marvell-pj4 -O3                 opt64
gcc -mcpu=marvell-pj4 -Os                 opt64
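
Note on the "static but declared in inline function" warning: in C99 and later, an inline function with external linkage may not define a modifiable object with static storage duration, which is what GCC reports for F1024. One way to avoid the warning, assuming the function is only used within its own translation unit, is to give it internal linkage with static inline. The sketch below shows that variant; sum_columns and COLS are hypothetical stand-ins and are not taken from the opt64 source.

```c
#include <stdint.h>

#define COLS 16   /* hypothetical; stands in for COLS1024 from the log */

/* `static inline` gives the function internal linkage, so a static local
 * buffer is permitted and the warning does not apply. */
static inline uint64_t sum_columns(const uint64_t *in)
{
    static uint64_t scratch[COLS] __attribute__((aligned(16)));
    uint64_t acc = 0;

    for (int i = 0; i < COLS; i++) {
        scratch[i] = in[i];   /* copy into the aligned scratch buffer */
        acc += scratch[i];
    }
    return acc;
}
```

The static buffer still makes the function non-reentrant; if that matters, declaring the buffers as ordinary automatic variables removes both the warning and the shared state.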