Implementation notes: armeabi, cubox, crypto_stream/salsa20

Computer: cubox
Architecture: armeabi
CPU ID: unknown CPU ID
SUPERCOP version: 20161026
Operation: crypto_stream
Primitive: salsa20
Time   Implementation  Compiler                                  Benchmark date  SUPERCOP version
24844  e/merged        gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
24879  e/merged        gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
24904  e/merged        gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
25112  e/merged        gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
28791  e/ref           gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
28856  ref             gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
28924  e/regs          gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
28932  ref             gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
29160  e/ref           gcc -mcpu=marvell-pj4 -O3                 20161210        20161026
29227  e/regs          gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161210        20161026
30956  e/ref           gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
33031  e/regs          gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
34160  e/regs          gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
34635  ref             gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161210        20161026
35660  e/merged        gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
37484  e/merged        gcc -mcpu=marvell-pj4 -Os                 20161210        20161026
37972  ref             gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
39470  e/ref           gcc -mcpu=marvell-pj4 -O2                 20161210        20161026
39956  ref             gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
40028  ref             gcc -mcpu=marvell-pj4 -Os                 20161210        20161026
45596  e/regs          gcc -mcpu=marvell-pj4 -Os                 20161210        20161026
45740  e/regs          gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
47892  e/ref           gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161210        20161026
48156  e/ref           gcc -mcpu=marvell-pj4 -Os                 20161210        20161026

Test failure

Implementation: crypto_stream/salsa20/armneon3
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
error 111

Number of similar (compiler, implementation) pairs: 12, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  armneon3 armneon6
gcc -funroll-loops -mcpu=marvell-pj4 -O3  armneon3 armneon6
gcc -funroll-loops -mcpu=marvell-pj4 -Os  armneon3 armneon6
gcc -mcpu=marvell-pj4 -O2                 armneon3 armneon6
gcc -mcpu=marvell-pj4 -O3                 armneon3 armneon6
gcc -mcpu=marvell-pj4 -Os                 armneon3 armneon6

Compiler output

Implementation: crypto_stream/salsa20/armneon2
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
xor.c: In file included from xor.c:8:0:
xor.c: xor.c: In function 'crypto_stream_salsa20_armneon2_xor':
xor.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:6187:1: error: inlining failed in call to always_inline 'vcombine_u32': target specific option mismatch
xor.c: vcombine_u32 (uint32x2_t __a, uint32x2_t __b)
xor.c: ^~~~~~~~~~~~
xor.c: xor.c:39:14: note: called from here
xor.c: uint32x4_t start1 = vcombine_u32(k5k0,n0k4);
xor.c: ^~~~~~
xor.c: In file included from xor.c:8:0:
xor.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:7553:1: error: inlining failed in call to always_inline 'vext_u32': target specific option mismatch
xor.c: ...
xor.c: xor.c:353:3: note: called from here
xor.c: vst1q_u8((uint8_t *) c,(uint8x16_t) x0x1x2x3);
xor.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
xor.c: In file included from xor.c:8:0:
xor.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:565:1: error: inlining failed in call to always_inline 'vadd_u64': target specific option mismatch
xor.c: vadd_u64 (uint64x1_t __a, uint64x1_t __b)
xor.c: ^~~~~~~~
xor.c: xor.c:363:23: note: called from here
xor.c: n2n3 = (uint32x2_t) vadd_u64(nextblock,(uint64x1_t) n2n3);
xor.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Number of similar (compiler, implementation) pairs: 6, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  armneon2
gcc -funroll-loops -mcpu=marvell-pj4 -O3  armneon2
gcc -funroll-loops -mcpu=marvell-pj4 -Os  armneon2
gcc -mcpu=marvell-pj4 -O2                 armneon2
gcc -mcpu=marvell-pj4 -O3                 armneon2
gcc -mcpu=marvell-pj4 -Os                 armneon2
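
The "target specific option mismatch" errors above are what GCC reports when an always_inline NEON intrinsic is called from a function compiled without NEON support; the flags used here (`-mcpu=marvell-pj4` with no `-mfpu=neon`) apparently do not enable it. One way a source file can avoid this is to guard the intrinsic path at compile time and fall back to scalar code otherwise. The sketch below assumes the compiler defines the ACLE macro `__ARM_NEON` (or GCC's older `__ARM_NEON__`) only when NEON is enabled; `combine_halves` is a hypothetical helper, not part of SUPERCOP, that mirrors the lane order of `vcombine_u32` (first argument in lanes 0 and 1, second in lanes 2 and 3).

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* NEON path: only compiled when NEON is enabled for this translation unit,
 * so the always_inline intrinsics can actually be inlined. */
static void combine_halves(uint32_t out[4], const uint32_t lo[2], const uint32_t hi[2])
{
    uint32x4_t v = vcombine_u32(vld1_u32(lo), vld1_u32(hi));
    vst1q_u32(out, v);
}
#else
/* Scalar fallback (hypothetical): same lane order as vcombine_u32. */
static void combine_halves(uint32_t out[4], const uint32_t lo[2], const uint32_t hi[2])
{
    out[0] = lo[0];
    out[1] = lo[1];
    out[2] = hi[0];
    out[3] = hi[1];
}
#endif
```

With this guard, a build using flags like the ones above would silently take the scalar path instead of failing, so a separate check is still needed if the NEON path is the one being benchmarked.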

Compiler output

Implementation: crypto_stream/salsa20/armneon
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
xor.c: In file included from xor.c:8:0:
xor.c: xor.c: In function 'crypto_stream_salsa20_armneon_xor':
xor.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:6187:1: error: inlining failed in call to always_inline 'vcombine_u32': target specific option mismatch
xor.c: vcombine_u32 (uint32x2_t __a, uint32x2_t __b)
xor.c: ^~~~~~~~~~~~
xor.c: xor.c:39:14: note: called from here
xor.c: uint32x4_t start1 = vcombine_u32(k5k0,n0k4);
xor.c: ^~~~~~
xor.c: In file included from xor.c:8:0:
xor.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:7553:1: error: inlining failed in call to always_inline 'vext_u32': target specific option mismatch
xor.c: ...
xor.c: xor.c:165:3: note: called from here
xor.c: vst1q_u8((uint8_t *) c,(uint8x16_t) x0x1x2x3);
xor.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
xor.c: In file included from xor.c:8:0:
xor.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:565:1: error: inlining failed in call to always_inline 'vadd_u64': target specific option mismatch
xor.c: vadd_u64 (uint64x1_t __a, uint64x1_t __b)
xor.c: ^~~~~~~~
xor.c: xor.c:175:23: note: called from here
xor.c: n2n3 = (uint32x2_t) vadd_u64(nextblock,(uint64x1_t) n2n3);
xor.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Number of similar (compiler, implementation) pairs: 6, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  armneon
gcc -funroll-loops -mcpu=marvell-pj4 -O3  armneon
gcc -funroll-loops -mcpu=marvell-pj4 -Os  armneon
gcc -mcpu=marvell-pj4 -O2                 armneon
gcc -mcpu=marvell-pj4 -O3                 armneon
gcc -mcpu=marvell-pj4 -Os                 armneon
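
The line flagged at xor.c:363 (armneon2) and xor.c:175 (armneon) reinterprets the two 32-bit lanes of `n2n3` as a single 64-bit integer and adds `nextblock` to it with `vadd_u64`. Assuming `n2n3` holds the 64-bit Salsa20 block counter as two 32-bit words with the low word in lane 0 (the little-endian lane layout), a scalar equivalent might look like the sketch below; `add_to_counter` is a hypothetical name, not taken from the SUPERCOP source.

```c
#include <stdint.h>

/* Hypothetical scalar equivalent of
 *   n2n3 = (uint32x2_t) vadd_u64(nextblock, (uint64x1_t) n2n3);
 * assuming n2n3[0] is the low 32 bits and n2n3[1] the high 32 bits. */
static void add_to_counter(uint32_t n2n3[2], uint64_t inc)
{
    /* Reassemble the 64-bit counter from its two 32-bit halves. */
    uint64_t ctr = (uint64_t)n2n3[0] | ((uint64_t)n2n3[1] << 32);
    ctr += inc; /* wraps modulo 2^64, like vadd_u64 */
    /* Split it back into low and high words. */
    n2n3[0] = (uint32_t)ctr;
    n2n3[1] = (uint32_t)(ctr >> 32);
}
```

Doing the addition in 64-bit arithmetic lets a carry out of the low word propagate into the high word, which a lane-wise 32-bit add such as `vadd_u32` would not do.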