Implementation notes: amd64, icelake2, crypto_sign/dilithium3

Computer: icelake2
Architecture: amd64
CPU ID: GenuineIntel-000706e5-bfebfbff
SUPERCOP version: 20221005
Operation: crypto_sign
Primitive: dilithium3
Time     Object size  Test size        Implementation  Compiler                                                                             Benchmark date  SUPERCOP version
334424   88770 64 0   111082 860 1792  avx2            clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221004        20220506
337791   99450 64 0   121850 860 1824  avx2            clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221004        20220506
339708   67922 64 0   87844 852 1824   avx2            clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221004        20220506
350374   65911 64 0   85650 860 1760   avx2            clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20221004        20220506
1029533  53224 0 0    75514 780 1824   ref             clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221004        20220506
1163446  20312 0 0    40442 772 1824   ref             gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221004        20220506
1164335  37165 0 0    59066 780 1760   ref             clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20221004        20220506
1167545  37051 0 0    58914 772 1824   ref             gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221004        20220506
1175655  42068 0 0    64210 780 1792   ref             clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221004        20220506
1181139  18096 0 0    38532 772 1824   ref             clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20221004        20220506
1220381  21609 0 0    41418 780 1760   ref             clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20221004        20220506
1339382  17551 0 0    36118 764 1792   ref             gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20221004        20220506
1462234  19209 0 0    38906 772 1824   ref             gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20221004        20220506

Compiler output

Implementation: avx2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
fips202x4.c: fips202x4.c:22:12: error: always_inline function '_mm256_setzero_si256' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: s[i] = _mm256_setzero_si256();
fips202x4.c: ^
fips202x4.c: fips202x4.c:22:12: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: fips202x4.c:24:9: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: idx = _mm256_set_epi64x((long long)in3, (long long)in2, (long long)in1, (long long)in0);
fips202x4.c: ^
fips202x4.c: fips202x4.c:24:9: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: fips202x4.c:27:11: error: '__builtin_ia32_gatherq_q256' needs target feature avx2
fips202x4.c: t = _mm256_i64gather_epi64((long long *)pos, idx, 1);
fips202x4.c: ^
fips202x4.c: /usr/lib64/clang/14.0.5/include/avx2intrin.h:1140:13: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: always_inline function '_mm256_undefined_si256' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: /usr/lib64/clang/14.0.5/include/avx2intrin.h:1140:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: /usr/lib64/clang/14.0.5/include/avx2intrin.h:1140:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: always_inline function '_mm256_set1_epi64x' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: /usr/lib64/clang/14.0.5/include/avx2intrin.h:1143:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: (__v4di)_mm256_set1_epi64x(-1), (s)))
fips202x4.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                            Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  avx2
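
The errors in this log state that keccakx4_absorb_once was compiled without support for 'avx', and that the gather builtin needs 'avx2'. As a minimal sketch, a translation unit can check which of these features the compiler enabled by testing the predefined macros __AVX__ and __AVX2__, which GCC and clang define when the corresponding features are on. The helper names below are hypothetical and are not part of SUPERCOP or the dilithium3 sources.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if AVX was enabled for this translation unit, else 0. */
static int built_with_avx(void)
{
#if defined(__AVX__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if AVX2 was enabled for this translation unit, else 0. */
static int built_with_avx2(void)
{
#if defined(__AVX2__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints a one-line report and returns 1 only when
 * both features were enabled at compile time. */
static int report_vector_features(void)
{
    int avx = built_with_avx();
    int avx2 = built_with_avx2();
    printf("AVX: %s, AVX2: %s\n", avx ? "enabled" : "disabled",
           avx2 ? "enabled" : "disabled");
    return avx && avx2;
}
```

Building such a check with the same flag set as the failing compiler line would show whether those flags turn the features on for that compiler. The sketch only reads compile-time macros and does not query the CPU at run time.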

Compiler output

Implementation: avx2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
poly.c: poly.c: In function 'crypto_sign_dilithium3_avx2_constbranchindex_poly_uniform_eta_4x':
poly.c: <command-line>: warning: 'crypto_sign_dilithium3_avx2_constbranchindex_rej_eta_avx' reading 840 bytes from a region of size 704 [-Wstringop-overread]
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: poly.c:596:10: note: in expansion of macro 'rej_eta_avx'
poly.c: 596 | ctr2 = rej_eta_avx(a2->coeffs, buf[2].coeffs);
poly.c: | ^~~~~~~~~~~
poly.c: <command-line>: note: referencing argument 2 of type 'const uint8_t[840]' {aka 'const unsigned char[840]'}
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: poly.c:596:10: note: in expansion of macro 'rej_eta_avx'
poly.c: 596 | ctr2 = rej_eta_avx(a2->coeffs, buf[2].coeffs);
poly.c: | ^~~~~~~~~~~
poly.c: <command-line>: note: in a call to function 'crypto_sign_dilithium3_avx2_constbranchindex_rej_eta_avx'
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: rejsample.h:25:14: note: in expansion of macro 'rej_eta_avx'
poly.c: 25 | unsigned int rej_eta_avx(int32_t *r, const uint8_t buf[REJ_UNIFORM_BUFLEN]);
poly.c: | ^~~~~~~~~~~
poly.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                                       Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE   avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE   avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE    avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE   avx2
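
The -Wstringop-overread warning in this log reports a callee whose second parameter is declared as 'const uint8_t[840]' being passed a region of 704 bytes. The sketch below illustrates the general pattern and one way to keep the callee's read bound and the caller's buffer in agreement, by passing the length explicitly. All names (count_small_bytes, count_in_caller_buffer, BUF_LEN) are hypothetical. This is not the dilithium3 code, and it is not presented as the fix for that code.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 704 /* size of the caller's region, taken from the warning text */

/* Hypothetical callee. Declaring the parameter as `const uint8_t buf[840]`
 * gives GCC a bound to check calls against (the log's note about
 * 'const uint8_t[840]' refers to such a declaration). Taking a pointer plus
 * a length instead lets the caller state how many bytes are readable. */
static unsigned int count_small_bytes(const uint8_t *buf, size_t len)
{
    unsigned int ctr = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] < 15)
            ctr++;
    }
    return ctr;
}

/* Hypothetical caller: fills its own buffer and passes exactly its size. */
static unsigned int count_in_caller_buffer(void)
{
    uint8_t buf[BUF_LEN];
    for (size_t i = 0; i < BUF_LEN; i++)
        buf[i] = (uint8_t)(i & 0xff);
    return count_small_bytes(buf, sizeof buf);
}
```

With this shape the callee never reads past what the caller owns, so there is no mismatch between a declared bound and the region size for the compiler to flag.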