Implementation notes: amd64, samba, crypto_sign/dilithium5

Computer: samba
Microarchitecture: amd64; Skylake (506e3)
Architecture: amd64
CPU ID: GenuineIntel-000506e3-bfebfbff
SUPERCOP version: 20240425
Operation: crypto_sign
Primitive: dilithium5
Time     Object size  Test size        Implementation  Compiler                                                                             Benchmark date  SUPERCOP version
1329042  111924 64 0  135267 924 1824  avx2            clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20240426        20240425
1353527  89084 64 0   112075 924 1792  avx2            clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20240426        20240425
1354091  71646 64 0   91701 916 1824   avx2            clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20240426        20240425
1380065  67668 64 0   87619 924 1760   avx2            clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20240426        20240425
7173923  47249 0 0    70157 812 1824   ref             gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20240426        20240425
7659493  61085 0 0    83995 852 1824   ref             clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20240426        20240425
7661547  38007 0 0    60515 852 1760   ref             clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20240426        20240425
7971621  21751 0 0    42581 812 1824   ref             gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20240426        20240425
8391176  43863 0 0    66467 852 1792   ref             clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20240426        20240425
8500972  19954 0 0    40349 812 1824   ref             gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20240426        20240425
8571204  18647 0 0    39205 844 1824   ref             clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20240426        20240425
9160316  18942 0 0    38229 804 1792   ref             gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20240426        20240425
9488831  21993 0 0    42011 852 1760   ref             clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20240426        20240425

Compiler output

Implementation: avx2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
fips202x4.c: fips202x4.c:22:12: error: always_inline function '_mm256_setzero_si256' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: s[i] = _mm256_setzero_si256();
fips202x4.c: ^
fips202x4.c: fips202x4.c:22:12: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: fips202x4.c:24:9: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: idx = _mm256_set_epi64x((long long)in3, (long long)in2, (long long)in1, (long long)in0);
fips202x4.c: ^
fips202x4.c: fips202x4.c:24:9: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: fips202x4.c:27:11: error: '__builtin_ia32_gatherq_q256' needs target feature avx2
fips202x4.c: t = _mm256_i64gather_epi64((long long *)pos, idx, 1);
fips202x4.c: ^
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:13: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: always_inline function '_mm256_undefined_si256' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: always_inline function '_mm256_set1_epi64x' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1143:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: (__v4di)_mm256_set1_epi64x(-1), (s)))
fips202x4.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                            Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  avx2
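
The errors above report that AVX intrinsics were to be inlined into keccakx4_absorb_once while that function was compiled without AVX support. This build used -mcpu=native, whereas the avx2 builds that appear in the timing table used -march=native. As a minimal sketch only (this is not the dilithium5 or SUPERCOP code), the C below shows one way a function can use 256-bit intrinsics independently of the command-line flags: a per-function target attribute combined with a runtime CPU check. The names broadcast4 and broadcast4_avx2 are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed to be available.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* The target attribute enables AVX2 for this one function, so the
 * always_inline intrinsics can be inlined here even when the file is
 * compiled without -mavx2 or -march=native. */
__attribute__((target("avx2")))
static void broadcast4_avx2(uint64_t out[4], uint64_t v)
{
    __m256i x = _mm256_set1_epi64x((long long)v);
    _mm256_storeu_si256((__m256i *)out, x);
}
#endif

/* Hypothetical helper: write v into four 64-bit lanes, taking the
 * AVX2 path only when the running CPU reports AVX2 support. */
static void broadcast4(uint64_t out[4], uint64_t v)
{
    assert(out != 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        broadcast4_avx2(out, v);
        return;
    }
#endif
    /* Portable fallback for CPUs or targets without AVX2. */
    for (int i = 0; i < 4; i++)
        out[i] = v;
}
```

Calling broadcast4(lanes, 7) on a uint64_t lanes[4] leaves all four entries equal to 7 on either path. The simpler route, and the one the successful builds in this report took, is to pass a flag that enables AVX2 for the whole translation unit.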

Compiler output

Implementation: avx2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
poly.c: poly.c: In function 'crypto_sign_dilithium5_avx2_constbranchindex_poly_uniform_eta_4x':
poly.c: <command-line>: warning: 'crypto_sign_dilithium5_avx2_constbranchindex_rej_eta_avx' reading 840 bytes from a region of size 768 [-Wstringop-overread]
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: poly.c:594:10: note: in expansion of macro 'rej_eta_avx'
poly.c: 594 | ctr0 = rej_eta_avx(a0->coeffs, buf[0].coeffs);
poly.c: | ^~~~~~~~~~~
poly.c: <command-line>: note: referencing argument 2 of type 'const uint8_t *' {aka 'const unsigned char *'}
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: poly.c:594:10: note: in expansion of macro 'rej_eta_avx'
poly.c: 594 | ctr0 = rej_eta_avx(a0->coeffs, buf[0].coeffs);
poly.c: | ^~~~~~~~~~~
poly.c: <command-line>: note: in a call to function 'crypto_sign_dilithium5_avx2_constbranchindex_rej_eta_avx'
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: rejsample.h:25:14: note: in expansion of macro 'rej_eta_avx'
poly.c: 25 | unsigned int rej_eta_avx(int32_t *r, const uint8_t buf[REJ_UNIFORM_BUFLEN]);
poly.c: | ^~~~~~~~~~~
poly.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                                      Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE  avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE  avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE   avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE  avx2
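
The -Wstringop-overread warning above arises because rej_eta_avx is declared with an array-sized parameter, buf[REJ_UNIFORM_BUFLEN], and GCC compares that declared bound (840 bytes according to the warning) with the 768-byte region passed at poly.c:594. Whether the callee actually reads past 768 bytes cannot be determined from this log. The C sketch below illustrates the pattern with hypothetical names (SKETCH_BUFLEN, count_below, count_below_demo); it is not the dilithium5 code, and it passes a buffer of the full declared size so that no warning is expected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUFLEN 840 /* hypothetical; mirrors the 840 bytes in the warning */

/* To the C language the parameter is a plain pointer, but GCC treats
 * the [SKETCH_BUFLEN] bound as the number of bytes the callee may
 * read and checks callers against it. */
static unsigned int count_below(const uint8_t buf[SKETCH_BUFLEN], uint8_t bound)
{
    unsigned int ctr = 0;
    assert(buf != 0);
    for (size_t i = 0; i < SKETCH_BUFLEN; i++)
        ctr += (buf[i] < bound);
    return ctr;
}

/* Caller that supplies a buffer of the full declared size. */
static unsigned int count_below_demo(void)
{
    uint8_t buf[SKETCH_BUFLEN];
    for (size_t i = 0; i < SKETCH_BUFLEN; i++)
        buf[i] = (uint8_t)(i & 0xff);
    /* Passing a uint8_t small[768] here instead would be the pattern
     * that GCC flags with -Wstringop-overread. */
    return count_below(buf, 15);
}
```

If a callee only ever reads a shorter prefix, declaring the parameter with that smaller bound, or as a plain pointer with an explicit length argument, is one way to keep the declaration and the call sites consistent.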