Implementation notes: amd64, devoptimis, crypto_kem/kyber512

Computer: devoptimis
Architecture: amd64
CPU ID: GenuineIntel-000206c2-bfebfbff
SUPERCOP version: 20190910
Operation: crypto_kem
Primitive: kyber512
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
54065556814 512 073757 1312 1600refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019100320190910
64302811588 512 026293 1312 1600refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019100320190910
127063712129 512 027013 1312 1600refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019100320190910
131007710829 512 024693 1304 1568refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019100320190910

Compiler output

Implementation: avx2
Security model: unknown
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2

Namespace violations

Implementation: ref
Security model: unknown
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
cbd.o cbd T
fips202.o KeccakF1600_StatePermute T
fips202.o sha3_256 T
fips202.o sha3_512 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake256 T
indcpa.o gen_matrix T
indcpa.o indcpa_dec T
indcpa.o indcpa_enc T
indcpa.o indcpa_keypair T
ntt.o basemul T
ntt.o invntt T
ntt.o ntt T
ntt.o zetas D
ntt.o zetas_inv D
poly.o poly_add T
poly.o poly_basemul T
poly.o poly_compress T
poly.o poly_csubq T
poly.o poly_decompress T
poly.o poly_frombytes T
poly.o poly_frommont T
poly.o poly_frommsg T
poly.o poly_getnoise T
poly.o poly_invntt T
poly.o poly_ntt T
poly.o poly_reduce T
poly.o poly_sub T
poly.o poly_tobytes T
poly.o poly_tomsg T
polyvec.o polyvec_add T
polyvec.o polyvec_compress T
polyvec.o polyvec_csubq T
polyvec.o polyvec_decompress T
polyvec.o polyvec_frombytes T
polyvec.o polyvec_invntt T
polyvec.o polyvec_ntt T
polyvec.o polyvec_pointwise_acc T
polyvec.o polyvec_reduce T
polyvec.o polyvec_tobytes T
reduce.o barrett_reduce T
reduce.o csubq T
reduce.o montgomery_reduce T
symmetric-fips202.o kyber_shake128_absorb T
symmetric-fips202.o kyber_shake128_squeezeblocks T
symmetric-fips202.o shake256_prf T
verify.o cmov T
verify.o verify T

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE ref