Implementation notes: armeabi, cubox, crypto_aead/scream10v3

Computer: cubox
Architecture: armeabi
CPU ID: unknown CPU ID
SUPERCOP version: 20161026
Operation: crypto_aead
Primitive: scream10v3
Time     Implementation  Compiler                                  Benchmark date  SUPERCOP version
916121   ref             gcc -funroll-loops -mcpu=marvell-pj4 -O2  20161218        20161026
926460   ref             gcc -funroll-loops -mcpu=marvell-pj4 -O3  20161218        20161026
947532   ref             gcc -mcpu=marvell-pj4 -O3                 20161218        20161026
1894328  ref             gcc -mcpu=marvell-pj4 -O2                 20161218        20161026
2039411  ref             gcc -funroll-loops -mcpu=marvell-pj4 -Os  20161218        20161026
2063015  ref             gcc -mcpu=marvell-pj4 -Os                 20161218        20161026

Compiler output

Implementation: crypto_aead/scream10v3/neon
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
scream.c: In file included from tae.h:7:0,
scream.c: from scream.c:10:
scream.c: helper.h: In function 'write128':
scream.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:8519:1: error: inlining failed in call to always_inline 'vzipq_u8': target specific option mismatch
scream.c: vzipq_u8 (uint8x16_t __a, uint8x16_t __b)
scream.c: ^~~~~~~~
scream.c: In file included from scream.c:13:0:
scream.c: helper.h:10:18: note: called from here
scream.c: uint8x16x2_t c__ = vzipq_u8 (X(i), X(j)); ^
scream.c: ...
scream.c: /usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/include/arm_neon.h:9523:1: error: inlining failed in call to always_inline 'vst1q_u8': target specific option mismatch
scream.c: vst1q_u8 (uint8_t * __a, uint8x16_t __b)
scream.c: ^~~~~~~~
scream.c: In file included from scream.c:10:0:
scream.c: tae.h:11:25: note: called from here
scream.c: #define neon_store(p,x) vst1q_u8((uint8_t*)(p), x)
scream.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~
scream.c: helper.h:85:5: note: in expansion of macro 'neon_store'
scream.c: neon_store(out8+16*0 , X(0 ));
scream.c: ^~~~~~~~~~

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  neon
gcc -funroll-loops -mcpu=marvell-pj4 -O3  neon
gcc -funroll-loops -mcpu=marvell-pj4 -Os  neon
gcc -mcpu=marvell-pj4 -O2                 neon
gcc -mcpu=marvell-pj4 -O3                 neon
gcc -mcpu=marvell-pj4 -Os                 neon
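
Note on the neon failures above: GCC reports "target specific option mismatch" when an always_inline intrinsic from arm_neon.h is called in code compiled without NEON enabled, which is consistent with the compiler flags listed here (none of them contains -mfpu=neon). The following is a minimal sketch, not taken from the scream10v3 sources, of guarding a NEON code path behind the __ARM_NEON feature macro with a portable fallback. The names HAVE_NEON, zip16 and zip16_selfcheck are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* __ARM_NEON is the ACLE feature macro; older GCC releases also define
   __ARM_NEON__. Only include arm_neon.h when one of them is set. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: interleave two 16-byte blocks the way vzipq_u8 does,
   producing a0,b0,a1,b1,...,a15,b15 in out[0..31]. */
static void zip16(const uint8_t a[16], const uint8_t b[16], uint8_t out[32])
{
#if HAVE_NEON
    uint8x16x2_t z = vzipq_u8(vld1q_u8(a), vld1q_u8(b));
    vst1q_u8(out, z.val[0]);
    vst1q_u8(out + 16, z.val[1]);
#else
    /* Portable fallback used when NEON is not enabled for this build. */
    for (int i = 0; i < 16; i++) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
#endif
}

/* Small self-check that holds on either code path. */
static void zip16_selfcheck(void)
{
    uint8_t a[16], b[16], out[32];
    for (int i = 0; i < 16; i++) {
        a[i] = (uint8_t)i;
        b[i] = (uint8_t)(100 + i);
    }
    zip16(a, b, out);
    assert(out[0] == 0 && out[1] == 100);
    assert(out[30] == 15 && out[31] == 115);
}
```

With a guard of this kind, a build without NEON support would take the portable branch instead of failing at the intrinsic call. Whether the neon implementation is meant to be built on this machine at all is a separate question that this sketch does not answer.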

Compiler output

Implementation: crypto_aead/scream10v3/sse
Compiler: gcc -funroll-loops -mcpu=marvell-pj4 -O2
scream.c: scream.c: In function 'LBox16P':
scream.c: scream.c:16:32: warning: implicit declaration of function '__builtin_ia32_psrldi128' [-Wimplicit-function-declaration]
scream.c: #define shift_right(x) ((v16qi)__builtin_ia32_psrldi128((v4si)x, 4))
scream.c: ^
scream.c: scream.c:199:10: note: in expansion of macro 'shift_right'
scream.c: t0 = shift_right(in[0]) & V(0xf);
scream.c: ^~~~~~~~~~~
scream.c: scream.c:199:5: error: can't convert a value of type 'int' to vector type '__vector(16) char' which has different size
scream.c: t0 = shift_right(in[0]) & V(0xf);
scream.c: ^~
scream.c: ...
scream.c: ^~
scream.c: scream.c:337:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: C ^= __builtin_ia32_pshufb128(table, in[3]);
scream.c: ^~
scream.c: scream.c:341:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: B ^= __builtin_ia32_pshufb128(table, in[1]);
scream.c: ^~
scream.c: scream.c:342:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: D ^= __builtin_ia32_pshufb128(table, in[3]);
scream.c: ^~

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                  Implementations
gcc -funroll-loops -mcpu=marvell-pj4 -O2  sse
gcc -funroll-loops -mcpu=marvell-pj4 -O3  sse
gcc -funroll-loops -mcpu=marvell-pj4 -Os  sse
gcc -mcpu=marvell-pj4 -O2                 sse
gcc -mcpu=marvell-pj4 -O3                 sse
gcc -mcpu=marvell-pj4 -Os                 sse
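
Note on the sse failures above: the __builtin_ia32_* functions are x86-only GCC builtins. On an ARM target they are undeclared, so the compiler treats each call as an implicitly declared function returning int, and the int-to-vector conversion errors follow from that. The following is a minimal sketch, not taken from the scream10v3 sources, of restricting an SSE2 path to x86 builds and falling back to portable C elsewhere. It computes the high nibble of each byte, which is what the shift_right(x) & V(0xf) pattern in the log appears to do. The name high_nibbles16 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* __SSE2__ is defined by GCC and Clang only when targeting x86 with SSE2. */
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: out[i] = in[i] >> 4 for 16 bytes. */
static void high_nibbles16(const uint8_t in[16], uint8_t out[16])
{
#if defined(__SSE2__)
    /* Shift 32-bit lanes right by 4, then mask every byte with 0x0f so that
       bits shifted in from the neighbouring byte are discarded. */
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i t = _mm_and_si128(_mm_srli_epi32(x, 4), _mm_set1_epi8(0x0f));
    _mm_storeu_si128((__m128i *)out, t);
#else
    /* Portable fallback for non-x86 targets such as armeabi. */
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(in[i] >> 4);
#endif
}
```

Both branches give the same result, so a caller does not need to know which one was compiled in.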