Implementation notes: amd64, bolero, crypto_sign/dilithium5

Computer: bolero
Microarchitecture: amd64; Broadwell+AES (406f1)
Architecture: amd64
CPU ID: GenuineIntel-000406f1-1fc9cbf5
SUPERCOP version: 20240425
Operation: crypto_sign
Primitive: dilithium5
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
153485287044 64 0111060 896 1608avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
1549056106876 64 0131260 896 1608avx2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
155091669002 64 089958 888 1640avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
156345267668 64 088372 896 1576avx2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
764657655205 0 079492 824 1608refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
807487646873 0 070358 784 1640refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024042620240425
828398438007 0 061260 824 1576refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
834556421993 0 042756 824 1576refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
838776441183 0 065148 824 1608refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
850971218647 0 039886 816 1640refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042620240425
871656821511 0 043038 784 1640refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024042620240425
888103219954 0 041094 784 1640refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024042620240425
973156018950 0 038990 776 1608refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024042620240425

Compiler output

Implementation: avx2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
fips202x4.c: fips202x4.c:22:12: error: always_inline function '_mm256_setzero_si256' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: s[i] = _mm256_setzero_si256();
fips202x4.c: ^
fips202x4.c: fips202x4.c:22:12: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: fips202x4.c:24:9: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: idx = _mm256_set_epi64x((long long)in3, (long long)in2, (long long)in1, (long long)in0);
fips202x4.c: ^
fips202x4.c: fips202x4.c:24:9: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: fips202x4.c:27:11: error: '__builtin_ia32_gatherq_q256' needs target feature avx2
fips202x4.c: t = _mm256_i64gather_epi64((long long *)pos, idx, 1);
fips202x4.c: ^
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:13: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: always_inline function '_mm256_undefined_si256' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: always_inline function '_mm256_set1_epi64x' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1143:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: (__v4di)_mm256_set1_epi64x(-1), (s)))
fips202x4.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2

Compiler output

Implementation: avx2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
poly.c: poly.c: In function 'crypto_sign_dilithium5_avx2_constbranchindex_poly_uniform_eta_4x':
poly.c: <command-line>: warning: 'crypto_sign_dilithium5_avx2_constbranchindex_rej_eta_avx' reading 840 bytes from a region of size 768 [-Wstringop-overread]
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: poly.c:594:10: note: in expansion of macro 'rej_eta_avx'
poly.c: 594 | ctr0 = rej_eta_avx(a0->coeffs, buf[0].coeffs);
poly.c: | ^~~~~~~~~~~
poly.c: <command-line>: note: referencing argument 2 of type 'const uint8_t *' {aka 'const unsigned char *'}
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: poly.c:594:10: note: in expansion of macro 'rej_eta_avx'
poly.c: 594 | ctr0 = rej_eta_avx(a0->coeffs, buf[0].coeffs);
poly.c: | ^~~~~~~~~~~
poly.c: <command-line>: note: in a call to function 'crypto_sign_dilithium5_avx2_constbranchindex_rej_eta_avx'
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: rejsample.h:25:14: note: in expansion of macro 'rej_eta_avx'
poly.c: 25 | unsigned int rej_eta_avx(int32_t *r, const uint8_t buf[REJ_UNIFORM_BUFLEN]);
poly.c: | ^~~~~~~~~~~
poly.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2