Implementation notes: amd64, intelnuci8, crypto_kem/kyber1024

Computer: intelnuci8
Architecture: amd64
CPU ID: GenuineIntel-000906e9-bfebfbff
SUPERCOP version: 20191221
Operation: crypto_kem
Primitive: kyber1024
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
202276189757 0 0212981 792 1608avx2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2020011720191221
220719144277 0 0165713 784 1576avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
223940154331 0 0175433 784 1576avx2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
224458144277 0 0165713 784 1576avx2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
241166126869 0 0144679 776 1576avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
253886121146 0 0140981 792 1608avx2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2020011720191221
254501122078 0 0142029 792 1608avx2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2020011720191221
283546119713 0 0138541 784 1576avx2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2020011720191221
86604335497 512 056777 1312 1576refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
87571112793 512 032606 1304 1608refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2020011720191221
88710752014 512 075134 1304 1608refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2020011720191221
89720338348 512 058913 1312 1576refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
90308953864 512 074193 1312 1576refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
91736138348 512 058913 1312 1576refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
93643512202 512 031870 1304 1608refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2020011720191221
97719311615 512 029791 1304 1576refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2020011720191221
103549311441 512 030110 1296 1576refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2020011720191221

Compiler output

Implementation: avx2
Security model: unknown
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:136:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: lanes1 = LOAD256u( curData1[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:137:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: lanes2 = LOAD256u( curData2[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:138:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2

Namespace violations

Implementation: avx2
Security model: unknown
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
KeccakP-1600-times4-SIMD256.o KeccakF1600times4_FastLoop_Absorb T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_12rounds_FastLoop_Absorb T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_AddBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_AddLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractAndAddBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractAndAddLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_InitializeAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteWithZeroes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_PermuteAll_12rounds T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_PermuteAll_24rounds T
basemul.o basemul_acc_avx T
basemul.o basemul_avx T
cbd.o cbd T
consts.o _16xfhi R
consts.o _16xflo R
consts.o _16xmask R
consts.o _16xmontsqhi R
consts.o _16xmontsqlo R
consts.o _16xq R
consts.o _16xqinv R
consts.o _16xv R
consts.o zetas_exp R
consts.o zetas_inv_exp R
fips202.o KeccakF1600_StatePermute T
fips202.o sha3_256 T
fips202.o sha3_512 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake256 T
fips202x4.o kyber_shake128x4_absorb T
fips202x4.o shake128x4_squeezeblocks T
fips202x4.o shake256x4_prf T
fq.o csubq_avx T
fq.o frommont_avx T
fq.o reduce_avx T
indcpa.o gen_matrix T
indcpa.o indcpa_dec T
indcpa.o indcpa_enc T
indcpa.o indcpa_keypair T
invntt.o invntt_level6_avx T
invntt.o invntt_levels0t5_avx T
ntt.o ntt_level0_avx T
ntt.o ntt_levels1t6_avx T
poly.o poly_add T
poly.o poly_basemul T
poly.o poly_compress T
poly.o poly_csubq T
poly.o poly_decompress T
poly.o poly_frombytes T
poly.o poly_frommont T
poly.o poly_frommsg T
poly.o poly_getnoise T
poly.o poly_getnoise4x T
poly.o poly_invntt T
poly.o poly_ntt T
poly.o poly_nttunpack T
poly.o poly_reduce T
poly.o poly_sub T
poly.o poly_tobytes T
poly.o poly_tomsg T
polyvec.o polyvec_add T
polyvec.o polyvec_compress T
polyvec.o polyvec_csubq T
polyvec.o polyvec_decompress T
polyvec.o polyvec_frombytes T
polyvec.o polyvec_invntt T
polyvec.o polyvec_ntt T
polyvec.o polyvec_pointwise_acc T
polyvec.o polyvec_reduce T
polyvec.o polyvec_tobytes T
rejsample.o rej_uniform T
shuffle.o nttfrombytes_avx T
shuffle.o ntttobytes_avx T
shuffle.o nttunpack_avx T
symmetric-fips202.o kyber_shake128_absorb T
symmetric-fips202.o kyber_shake128_squeezeblocks T
symmetric-fips202.o shake256_prf T
verify.o cmov T
verify.o verify T

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2

Namespace violations

Implementation: ref
Security model: unknown
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
cbd.o cbd T
fips202.o KeccakF1600_StatePermute T
fips202.o sha3_256 T
fips202.o sha3_512 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake256 T
indcpa.o gen_matrix T
indcpa.o indcpa_dec T
indcpa.o indcpa_enc T
indcpa.o indcpa_keypair T
ntt.o basemul T
ntt.o invntt T
ntt.o ntt T
ntt.o zetas D
ntt.o zetas_inv D
poly.o poly_add T
poly.o poly_basemul T
poly.o poly_compress T
poly.o poly_csubq T
poly.o poly_decompress T
poly.o poly_frombytes T
poly.o poly_frommont T
poly.o poly_frommsg T
poly.o poly_getnoise T
poly.o poly_invntt T
poly.o poly_ntt T
poly.o poly_reduce T
poly.o poly_sub T
poly.o poly_tobytes T
poly.o poly_tomsg T
polyvec.o polyvec_add T
polyvec.o polyvec_compress T
polyvec.o polyvec_csubq T
polyvec.o polyvec_decompress T
polyvec.o polyvec_frombytes T
polyvec.o polyvec_invntt T
polyvec.o polyvec_ntt T
polyvec.o polyvec_pointwise_acc T
polyvec.o polyvec_reduce T
polyvec.o polyvec_tobytes T
reduce.o barrett_reduce T
reduce.o csubq T
reduce.o montgomery_reduce T
symmetric-fips202.o kyber_shake128_absorb T
symmetric-fips202.o kyber_shake128_squeezeblocks T
symmetric-fips202.o shake256_prf T
verify.o cmov T
verify.o verify T

Number of similar (compiler,implementation) pairs: 9, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE ref