Implementation notes: amd64, gcc68, crypto_kem/kyber768

Computer: gcc68
Architecture: amd64
CPU ID: AuthenticAMD-00800f82-178bfbff
SUPERCOP version: 20191221
Operation: crypto_kem
Primitive: kyber768
Time   | Object size | Test size       | Implementation | Compiler | Benchmark date | SUPERCOP version
167328 | 161364 0 0  | 180731 792 1576 | avx2 | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
170016 | 196885 0 0  | 216524 808 1608 | avx2 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20200101 | 20191221
170784 | 143354 0 0  | 162763 792 1576 | avx2 | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
171840 | 143354 0 0  | 162763 792 1576 | avx2 | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
184000 | 127013 0 0  | 143537 784 1576 | avx2 | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
196672 | 125413 0 0  | 143676 808 1608 | avx2 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20200101 | 20191221
203072 | 124274 0 0  | 142252 808 1608 | avx2 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20200101 | 20191221
221152 | 123513 0 0  | 140380 800 1576 | avx2 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20200101 | 20191221
423168 | 60153 512 0 | 79707 1320 1576 | ref  | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
451168 | 42685 512 0 | 62244 1328 1608 | ref  | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20200101 | 20191221
452320 | 37375 512 0 | 57043 1320 1576 | ref  | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
452864 | 37375 512 0 | 57043 1320 1576 | ref  | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
453760 | 39271 512 0 | 59451 1320 1576 | ref  | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
484256 | 11038 512 0 | 27945 1312 1576 | ref  | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20200101 | 20191221
516448 | 10912 512 0 | 28796 1328 1608 | ref  | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20200101 | 20191221
539232 | 11718 512 0 | 29892 1328 1608 | ref  | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20200101 | 20191221
607456 | 10144 512 0 | 26948 1320 1576 | ref  | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20200101 | 20191221

Compiler output

Implementation: avx2
Security model: unknown
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'avx'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'avx'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:136:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: lanes1 = LOAD256u( curData1[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'avx'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:137:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: lanes2 = LOAD256u( curData2[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'avx'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:138:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
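
The diagnostics above report that `_mm256_loadu_si256` would be inlined into a function compiled without AVX support. Every clang configuration that used -march=native built the avx2 implementation, so a plausible reading (an assumption, not something the log states) is that -mcpu=native did not enable AVX for this target. The C sketch below shows one general way to make such code independent of the command line: enable AVX2 for a single function with the GCC/clang `target` attribute and check the CPU at run time. The names xor32, xor32_avx2 and xor32_scalar are hypothetical and do not appear in the Kyber or Keccak sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define DEMO_HAVE_X86 1
#else
#define DEMO_HAVE_X86 0
#endif

/* Portable fallback: XOR 32 bytes of src into dst. */
static void xor32_scalar(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < 32; i++)
        dst[i] ^= src[i];
}

#if DEMO_HAVE_X86
/* The target attribute enables AVX2 code generation for this function only,
 * so the always_inline intrinsics can be inlined here even when the
 * translation unit as a whole is compiled without AVX flags. */
__attribute__((target("avx2")))
static void xor32_avx2(uint8_t *dst, const uint8_t *src)
{
    __m256i a = _mm256_loadu_si256((const __m256i *)dst);
    __m256i b = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, _mm256_xor_si256(a, b));
}
#endif

/* Dispatcher: use the AVX2 path only if the running CPU reports AVX2. */
void xor32(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
#if DEMO_HAVE_X86
    if (__builtin_cpu_supports("avx2")) {
        xor32_avx2(dst, src);
        return;
    }
#endif
    xor32_scalar(dst, src);
}
```

Within a benchmark framework that fixes the compiler flags per configuration, the simpler remedy is usually to compile the avx2 implementation only with flags that enable AVX2; the per-function attribute is an alternative when the flags cannot be changed.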

Namespace violations

Implementation: avx2
Security model: unknown
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
KeccakP-1600-times4-SIMD256.o KeccakF1600times4_FastLoop_Absorb T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_12rounds_FastLoop_Absorb T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_AddBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_AddLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractAndAddBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractAndAddLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_InitializeAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteWithZeroes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_PermuteAll_12rounds T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_PermuteAll_24rounds T
basemul.o basemul_acc_avx T
basemul.o basemul_avx T
cbd.o cbd T
consts.o _16xfhi R
consts.o _16xflo R
consts.o _16xmask R
consts.o _16xmontsqhi R
consts.o _16xmontsqlo R
consts.o _16xq R
consts.o _16xqinv R
consts.o _16xv R
consts.o zetas_exp R
consts.o zetas_inv_exp R
fips202.o KeccakF1600_StatePermute T
fips202.o sha3_256 T
fips202.o sha3_512 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake256 T
fips202x4.o kyber_shake128x4_absorb T
fips202x4.o shake128x4_squeezeblocks T
fips202x4.o shake256x4_prf T
fq.o csubq_avx T
fq.o frommont_avx T
fq.o reduce_avx T
indcpa.o gen_matrix T
indcpa.o indcpa_dec T
indcpa.o indcpa_enc T
indcpa.o indcpa_keypair T
invntt.o invntt_level6_avx T
invntt.o invntt_levels0t5_avx T
ntt.o ntt_level0_avx T
ntt.o ntt_levels1t6_avx T
poly.o poly_add T
poly.o poly_basemul T
poly.o poly_compress T
poly.o poly_csubq T
poly.o poly_decompress T
poly.o poly_frombytes T
poly.o poly_frommont T
poly.o poly_frommsg T
poly.o poly_getnoise T
poly.o poly_getnoise4x T
poly.o poly_invntt T
poly.o poly_ntt T
poly.o poly_nttunpack T
poly.o poly_reduce T
poly.o poly_sub T
poly.o poly_tobytes T
poly.o poly_tomsg T
polyvec.o polyvec_add T
polyvec.o polyvec_compress T
polyvec.o polyvec_csubq T
polyvec.o polyvec_decompress T
polyvec.o polyvec_frombytes T
polyvec.o polyvec_invntt T
polyvec.o polyvec_ntt T
polyvec.o polyvec_pointwise_acc T
polyvec.o polyvec_reduce T
polyvec.o polyvec_tobytes T
rejsample.o rej_uniform T
shuffle.o nttfrombytes_avx T
shuffle.o ntttobytes_avx T
shuffle.o nttunpack_avx T
symmetric-fips202.o kyber_shake128_absorb T
symmetric-fips202.o kyber_shake128_squeezeblocks T
symmetric-fips202.o shake256_prf T
verify.o cmov T
verify.o verify T

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2

Namespace violations

Implementation: ref
Security model: unknown
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
cbd.o cbd T
fips202.o KeccakF1600_StatePermute T
fips202.o sha3_256 T
fips202.o sha3_512 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake256 T
indcpa.o gen_matrix T
indcpa.o indcpa_dec T
indcpa.o indcpa_enc T
indcpa.o indcpa_keypair T
ntt.o basemul T
ntt.o invntt T
ntt.o ntt T
ntt.o zetas D
ntt.o zetas_inv D
poly.o poly_add T
poly.o poly_basemul T
poly.o poly_compress T
poly.o poly_csubq T
poly.o poly_decompress T
poly.o poly_frombytes T
poly.o poly_frommont T
poly.o poly_frommsg T
poly.o poly_getnoise T
poly.o poly_invntt T
poly.o poly_ntt T
poly.o poly_reduce T
poly.o poly_sub T
poly.o poly_tobytes T
poly.o poly_tomsg T
polyvec.o polyvec_add T
polyvec.o polyvec_compress T
polyvec.o polyvec_csubq T
polyvec.o polyvec_decompress T
polyvec.o polyvec_frombytes T
polyvec.o polyvec_invntt T
polyvec.o polyvec_ntt T
polyvec.o polyvec_pointwise_acc T
polyvec.o polyvec_reduce T
polyvec.o polyvec_tobytes T
reduce.o barrett_reduce T
reduce.o csubq T
reduce.o montgomery_reduce T
symmetric-fips202.o kyber_shake128_absorb T
symmetric-fips202.o kyber_shake128_squeezeblocks T
symmetric-fips202.o shake256_prf T
verify.o cmov T
verify.o verify T

Number of similar (compiler,implementation) pairs: 9, namely:
Compiler Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
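
Each entry under "Namespace violations" names an object file, a global symbol, and a one-letter type that appears to follow nm conventions (T for code, R for read-only data, D for initialized data). Two common remedies are to make file-local helpers `static` and to rename the remaining exported symbols with a preprocessor prefix. The C sketch below illustrates both. The prefix crypto_kem_kyber768_ref_ and the macro DEMO_NAMESPACE are assumptions chosen for illustration, and csubq here is a simplified stand-in rather than a copy of the benchmarked source.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_Q 3329 /* modulus used by this illustrative reduction */

/* Hypothetical prefix macro: pastes a per-implementation prefix onto a name. */
#define DEMO_NAMESPACE(s) crypto_kem_kyber768_ref_##s

/* After this line, every use of csubq in the file refers to the prefixed
 * symbol, so the object file no longer exports a bare "csubq". */
#define csubq DEMO_NAMESPACE(csubq)

int16_t csubq(int16_t a);

/* File-local helper: static gives it internal linkage, so it is not
 * visible as a global symbol at all. */
static int16_t sub_q(int16_t a)
{
    return (int16_t)(a - DEMO_Q);
}

/* Conditional subtraction of q; input assumed to lie in [0, 2q). */
int16_t csubq(int16_t a)
{
    assert(a >= 0 && a < 2 * DEMO_Q);
    a = sub_q(a);
    /* If a went negative, the arithmetic shift yields an all-ones mask
     * (as on GCC and clang) and q is added back. */
    a = (int16_t)(a + ((a >> 15) & DEMO_Q));
    return a;
}
```

Callers keep writing csubq(x); only the linker-visible name changes, which is why this approach needs no edits at call sites.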