Implementation notes: amd64, intelnuci7, crypto_sign/dilithium3

Computer: intelnuci7
Architecture: amd64
CPU ID: GenuineIntel-000806e9-bfebfbff
SUPERCOP version: 20191017
Operation: crypto_sign
Primitive: dilithium3
Time     Object size  Test size        Implementation  Compiler                                                                             Benchmark date  SUPERCOP version
642666   233468 0 0   257573 792 1632  avx2            gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20191207        20191017
688832   156583 0 0   178257 784 1600  avx2            clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20191207        20191017
689982   156583 0 0   178257 784 1600  avx2            clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20191207        20191017
692160   172079 0 0   191609 784 1600  avx2            clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20191207        20191017
728710   135610 0 0   153463 784 1600  avx2            clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20191207        20191017
852772   125879 0 0   146709 792 1632  avx2            gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20191207        20191017
861516   127372 0 0   148261 792 1632  avx2            gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20191207        20191017
900314   123763 0 0   143885 784 1600  avx2            gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20191207        20191017
2068412  35347 0 0    57665 784 1600   ref             clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20191207        20191017
2132914  49212 0 0    69569 784 1600   ref             clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20191207        20191017
2249314  34153 0 0    56729 784 1600   ref             clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20191207        20191017
2257332  19801 0 0    40830 784 1632   ref             gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20191207        20191017
2317342  34153 0 0    56729 784 1600   ref             clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20191207        20191017
2334832  47883 0 0    72198 784 1632   ref             gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20191207        20191017
2504316  18966 0 0    39918 784 1632   ref             gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20191207        20191017
2506064  18118 0 0    37335 776 1600   ref             clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20191207        20191017
2740222  17781 0 0    37662 776 1600   ref             gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20191207        20191017
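
As a rough reading aid, the gap between the two implementations can be worked out from the fastest row of each. The sketch below assumes the Time column is a cycle count (smaller is faster); the dictionary and the function name are illustrative and not part of SUPERCOP.

```python
# Times copied from the table above: the fastest avx2 row and the fastest ref row.
# Assumption: Time is a cycle count, so a smaller value means a faster run.
best_time = {
    "avx2": 642666,   # gcc -march=native -mtune=native -O3
    "ref": 2068412,   # clang -mcpu=native -O3
}


def speedup(times, baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    return times[baseline] / times[candidate]


ratio = speedup(best_time, "ref", "avx2")
print(f"fastest avx2 build vs fastest ref build: {ratio:.2f}x")
```

The ratio compares only the best compiler setting per implementation; other rows can be compared by swapping in their Time values.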

Compiler output

Implementation: avx2
Security model: unknown
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:136:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: lanes1 = LOAD256u( curData1[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:137:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: lanes2 = LOAD256u( curData2[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:44:37: note: expanded from macro 'LOAD256u'
KeccakP-1600-times4-SIMD256.c: #define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'KeccakP1600times4_AddLanesAll' that is compiled without support for 'sse4.2'
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:138:42: note: expanded from macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2

Namespace violations

Implementation: avx2
Security model: unknown
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
KeccakP-1600-times4-SIMD256.o KeccakF1600times4_FastLoop_Absorb T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_12rounds_FastLoop_Absorb T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_AddBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_AddLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractAndAddBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractAndAddLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_ExtractLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_InitializeAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteBytes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteLanesAll T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_OverwriteWithZeroes T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_PermuteAll_12rounds T
KeccakP-1600-times4-SIMD256.o KeccakP1600times4_PermuteAll_24rounds T
fips202.o shake128 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake128_stream_init T
fips202.o shake256 T
fips202.o shake256_absorb T
fips202.o shake256_squeezeblocks T
fips202.o shake256_stream_init T
fips202x4.o shake128_4x T
fips202x4.o shake128_absorb4x T
fips202x4.o shake128_squeezeblocks4x T
fips202x4.o shake256_4x T
fips202x4.o shake256_absorb4x T
fips202x4.o shake256_squeezeblocks4x T
invntt.o invntt_levels0t4_avx T
invntt.o invntt_levels5t7_avx T
ntt.o ntt_levels0t2_avx T
ntt.o ntt_levels3t8_avx T
nttconsts.o _8x23ones R
nttconsts.o _8x256q R
nttconsts.o _8x2q R
nttconsts.o _8xdiv R
nttconsts.o _8xq R
nttconsts.o _8xqinv R
nttconsts.o _mask R
nttconsts.o zetas R
nttconsts.o zetas_inv R
packing.o pack_pk T
packing.o pack_sig T
packing.o pack_sk T
packing.o unpack_pk T
packing.o unpack_sig T
packing.o unpack_sk T
pointwise.o pointwise_acc_avx T
pointwise.o pointwise_avx T
poly.o poly_add T
poly.o poly_chknorm T
poly.o poly_csubq T
poly.o poly_decompose T
poly.o poly_freeze T
poly.o poly_invntt_montgomery T
poly.o poly_make_hint T
poly.o poly_ntt T
poly.o poly_pointwise_invmontgomery T
poly.o poly_power2round T
poly.o poly_reduce T
poly.o poly_shiftl T
poly.o poly_sub T
poly.o poly_uniform T
poly.o poly_uniform_4x T
poly.o poly_uniform_eta T
poly.o poly_uniform_eta_4x T
poly.o poly_uniform_gamma1m1 T
poly.o poly_uniform_gamma1m1_4x T
poly.o poly_use_hint T
poly.o polyeta_pack T
poly.o polyeta_unpack T
poly.o polyt0_pack T
poly.o polyt0_unpack T
poly.o polyt1_pack T
poly.o polyt1_unpack T
poly.o polyw1_pack T
poly.o polyz_pack T
poly.o polyz_unpack T
polyvec.o polyveck_add T
polyvec.o polyveck_chknorm T
polyvec.o polyveck_csubq T
polyvec.o polyveck_decompose T
polyvec.o polyveck_freeze T
polyvec.o polyveck_invntt_montgomery T
polyvec.o polyveck_make_hint T
polyvec.o polyveck_ntt T
polyvec.o polyveck_power2round T
polyvec.o polyveck_reduce T
polyvec.o polyveck_shiftl T
polyvec.o polyveck_sub T
polyvec.o polyveck_use_hint T
polyvec.o polyvecl_add T
polyvec.o polyvecl_chknorm T
polyvec.o polyvecl_freeze T
polyvec.o polyvecl_ntt T
polyvec.o polyvecl_pointwise_acc_invmontgomery T
reduce.o csubq_avx T
reduce.o reduce_avx T
rejsample.o rej_eta T
rejsample.o rej_gamma1m1 T
rejsample.o rej_uniform T
rounding.o decompose T
rounding.o make_hint T
rounding.o power2round T
rounding.o use_hint T
sign.o challenge T
sign.o expand_mat T

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2

Namespace violations

Implementation: ref
Security model: unknown
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
fips202.o shake128 T
fips202.o shake128_absorb T
fips202.o shake128_squeezeblocks T
fips202.o shake128_stream_init T
fips202.o shake256 T
fips202.o shake256_absorb T
fips202.o shake256_squeezeblocks T
fips202.o shake256_stream_init T
ntt.o invntt_frominvmont T
ntt.o ntt T
packing.o pack_pk T
packing.o pack_sig T
packing.o pack_sk T
packing.o unpack_pk T
packing.o unpack_sig T
packing.o unpack_sk T
poly.o poly_add T
poly.o poly_chknorm T
poly.o poly_csubq T
poly.o poly_decompose T
poly.o poly_freeze T
poly.o poly_invntt_montgomery T
poly.o poly_make_hint T
poly.o poly_ntt T
poly.o poly_pointwise_invmontgomery T
poly.o poly_power2round T
poly.o poly_reduce T
poly.o poly_shiftl T
poly.o poly_sub T
poly.o poly_uniform T
poly.o poly_uniform_eta T
poly.o poly_uniform_gamma1m1 T
poly.o poly_use_hint T
poly.o polyeta_pack T
poly.o polyeta_unpack T
poly.o polyt0_pack T
poly.o polyt0_unpack T
poly.o polyt1_pack T
poly.o polyt1_unpack T
poly.o polyw1_pack T
poly.o polyz_pack T
poly.o polyz_unpack T
polyvec.o polyveck_add T
polyvec.o polyveck_chknorm T
polyvec.o polyveck_csubq T
polyvec.o polyveck_decompose T
polyvec.o polyveck_freeze T
polyvec.o polyveck_invntt_montgomery T
polyvec.o polyveck_make_hint T
polyvec.o polyveck_ntt T
polyvec.o polyveck_power2round T
polyvec.o polyveck_reduce T
polyvec.o polyveck_shiftl T
polyvec.o polyveck_sub T
polyvec.o polyveck_use_hint T
polyvec.o polyvecl_add T
polyvec.o polyvecl_chknorm T
polyvec.o polyvecl_freeze T
polyvec.o polyvecl_ntt T
polyvec.o polyvecl_pointwise_acc_invmontgomery T
reduce.o csubq T
reduce.o freeze T
reduce.o montgomery_reduce T
reduce.o reduce32 T
rounding.o decompose T
rounding.o make_hint T
rounding.o power2round T
rounding.o use_hint T
sign.o challenge T
sign.o expand_mat T

Number of similar (compiler,implementation) pairs: 9, namely:
Compiler Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ref
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE ref
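
The namespace listings above appear to follow an nm-style layout: object file, symbol name, then a type letter (T for a text-section symbol, R for read-only data). A minimal sketch of how such a listing could be screened for unprefixed global symbols follows. The prefix, the helper names, and the last sample line are assumptions made for illustration; this is not SUPERCOP's actual check.

```python
# Hypothetical prefix: an assumption for this sketch, not SUPERCOP's actual rule.
ASSUMED_PREFIX = "crypto_sign_dilithium3_"


def parse_listing(text):
    """Split 'object symbol type' lines into tuples, skipping anything else."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        obj, symbol, kind = fields
        entries.append((obj, symbol, kind))
    return entries


def unprefixed_globals(entries, prefix=ASSUMED_PREFIX):
    """Keep symbols with an uppercase type letter that lack the prefix.

    In nm-style output an uppercase letter generally marks a global symbol.
    """
    return [
        (obj, symbol, kind)
        for obj, symbol, kind in entries
        if kind.isupper() and not symbol.startswith(prefix)
    ]


# The first three lines are copied from the listings above; the last one is a
# made-up, correctly prefixed symbol included only to show a non-violation.
sample = """\
fips202.o shake128 T
nttconsts.o zetas R
poly.o poly_add T
sign.o crypto_sign_dilithium3_example T
"""

violations = unprefixed_globals(parse_listing(sample))
for obj, symbol, kind in violations:
    print(f"{obj}: {symbol} ({kind})")
```

Run on the full listings, a screen of this kind would report every line shown, since none of the symbols carries a primitive-specific prefix.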