Implementation notes: amd64, intelnuci7, crypto_sign/dilithium2

Computer: intelnuci7
Architecture: amd64
CPU ID: GenuineIntel-000806e9-bfebfbff
SUPERCOP version: 20191017
Operation: crypto_sign
Primitive: dilithium2
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
683082233621 0 0257733 792 1632avx2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019120720191017
719158155033 0 0176721 784 1600avx2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
719562170542 0 0190089 784 1600avx2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
728352155033 0 0176721 784 1600avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
738796135319 0 0153175 784 1600avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
872986127015 0 0147893 792 1632avx2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019120720191017
877716123560 0 0143677 784 1600avx2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019120720191017
901666125614 0 0146445 792 1632avx2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019120720191017
213430048104 0 068449 784 1600refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
220343047972 0 072278 784 1632refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019120720191017
220810834191 0 056497 784 1600refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
231059031759 0 054329 784 1600refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
232432631759 0 054329 784 1600refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
239876818099 0 037319 776 1600refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2019120720191017
244466419783 0 040822 784 1632refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019120720191017
252881819005 0 039966 784 1632refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019120720191017
297532817852 0 037734 776 1600refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2019120720191017

Compiler output

Implementation: avx2
Security model: unknown
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:136:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: lanes1 = LOAD256u( curData1[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:137:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: lanes2 = LOAD256u( curData2[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:138:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2

Namespace violations

Implementation: avx2
Security model: unknown
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
KeccakP-1600-times4-SIMD256.o KeccakF1600times4_FastLoop_Absorb T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_12rounds_FastLoop_Absorb T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_AddBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_AddLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractAndAddBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractAndAddLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_InitializeAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteWithZeroes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_PermuteAll_12rounds T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_PermuteAll_24rounds T
fips202.o shake128 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake128_stream_init T
fips202.o shake256 T
fips202.o shake256_absorb T
fips202.o shake256_squeezeblocks T
fips202.o shake256_stream_init T
fips202x4.o shake128_4x T
fips202x4.o shake128_absorb4x T
fips202x4.o shake128_squeezeblocks4x T
fips202x4.o shake256_4x T
fips202x4.o shake256_absorb4x T
fips202x4.o shake256_squeezeblocks4x T
invntt.o invntt_levels0t4_avx T
invntt.o invntt_levels5t7_avx T
ntt.o ntt_levels0t2_avx T
ntt.o ntt_levels3t8_avx T
nttconsts.o _8x23ones R
nttconsts.o _8x256q R
nttconsts.o _8x2q R
nttconsts.o _8xdiv R
nttconsts.o _8xq R
nttconsts.o _8xqinv R
nttconsts.o _mask R
nttconsts.o zetas R
nttconsts.o zetas_inv R
packing.o pack_pk T
packing.o pack_sig T
packing.o pack_sk T
packing.o unpack_pk T
packing.o unpack_sig T
packing.o unpack_sk T
pointwise.o pointwise_acc_avx T
pointwise.o pointwise_avx T
poly.o poly_add T
poly.o poly_chknorm T
poly.o poly_csubq T
poly.o poly_decompose T
poly.o poly_freeze T
poly.o poly_invntt_montgomery T
poly.o poly_make_hint T
poly.o poly_ntt T
poly.o poly_pointwise_invmontgomery T
poly.o poly_power2round T
poly.o poly_reduce T
poly.o poly_shiftl T
poly.o poly_sub T
poly.o poly_uniform T
poly.o poly_uniform_4x T
poly.o poly_uniform_eta T
poly.o poly_uniform_eta_4x T
poly.o poly_uniform_gamma1m1 T
poly.o poly_uniform_gamma1m1_4x T
poly.o poly_use_hint T
poly.o polyeta_pack T
poly.o polyeta_unpack T
poly.o polyt0_pack T
poly.o polyt0_unpack T
poly.o polyt1_pack T
poly.o polyt1_unpack T
poly.o polyw1_pack T
poly.o polyz_pack T
poly.o polyz_unpack T
polyvec.o polyveck_add T
polyvec.o polyveck_chknorm T
polyvec.o polyveck_csubq T
polyvec.o polyveck_decompose T
polyvec.o polyveck_freeze T
polyvec.o polyveck_invntt_montgomery T
polyvec.o polyveck_make_hint T
polyvec.o polyveck_ntt T
polyvec.o polyveck_power2round T
polyvec.o polyveck_reduce T
polyvec.o polyveck_shiftl T
polyvec.o polyveck_sub T
polyvec.o polyveck_use_hint T
polyvec.o polyvecl_add T
polyvec.o polyvecl_chknorm T
polyvec.o polyvecl_freeze T
polyvec.o polyvecl_ntt T
polyvec.o polyvecl_pointwise_acc_invmontgomery T
reduce.o csubq_avx T
reduce.o reduce_avx T
rejsample.o rej_eta T
rejsample.o rej_gamma1m1 T
rejsample.o rej_uniform T
rounding.o decompose T
rounding.o make_hint T
rounding.o power2round T
rounding.o use_hint T
sign.o challenge T
sign.o expand_mat T

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2

Namespace violations

Implementation: ref
Security model: unknown
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
fips202.o shake128 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake128_stream_init T
fips202.o shake256 T
fips202.o shake256_absorb T
fips202.o shake256_squeezeblocks T
fips202.o shake256_stream_init T
ntt.o invntt_frominvmont T
ntt.o ntt T
packing.o pack_pk T
packing.o pack_sig T
packing.o pack_sk T
packing.o unpack_pk T
packing.o unpack_sig T
packing.o unpack_sk T
poly.o poly_add T
poly.o poly_chknorm T
poly.o poly_csubq T
poly.o poly_decompose T
poly.o poly_freeze T
poly.o poly_invntt_montgomery T
poly.o poly_make_hint T
poly.o poly_ntt T
poly.o poly_pointwise_invmontgomery T
poly.o poly_power2round T
poly.o poly_reduce T
poly.o poly_shiftl T
poly.o poly_sub T
poly.o poly_uniform T
poly.o poly_uniform_eta T
poly.o poly_uniform_gamma1m1 T
poly.o poly_use_hint T
poly.o polyeta_pack T
poly.o polyeta_unpack T
poly.o polyt0_pack T
poly.o polyt0_unpack T
poly.o polyt1_pack T
poly.o polyt1_unpack T
poly.o polyw1_pack T
poly.o polyz_pack T
poly.o polyz_unpack T
polyvec.o polyveck_add T
polyvec.o polyveck_chknorm T
polyvec.o polyveck_csubq T
polyvec.o polyveck_decompose T
polyvec.o polyveck_freeze T
polyvec.o polyveck_invntt_montgomery T
polyvec.o polyveck_make_hint T
polyvec.o polyveck_ntt T
polyvec.o polyveck_power2round T
polyvec.o polyveck_reduce T
polyvec.o polyveck_shiftl T
polyvec.o polyveck_sub T
polyvec.o polyveck_use_hint T
polyvec.o polyvecl_add T
polyvec.o polyvecl_chknorm T
polyvec.o polyvecl_freeze T
polyvec.o polyvecl_ntt T
polyvec.o polyvecl_pointwise_acc_invmontgomery T
reduce.o csubq T
reduce.o freeze T
reduce.o montgomery_reduce T
reduce.o reduce32 T
rounding.o decompose T
rounding.o make_hint T
rounding.o power2round T
rounding.o use_hint T
sign.o challenge T
sign.o expand_mat T

Number of similar (compiler,implementation) pairs: 9, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE ref