Implementation notes: amd64, kizomba, crypto_sign/dilithium3

Computer: kizomba
Microarchitecture: amd64; Kaby Lake (906e9)
Architecture: amd64
CPU ID: GenuineIntel-000906e9-1fc9cbf5
SUPERCOP version: 20240107
Operation: crypto_sign
Primitive: dilithium3
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
361828102998 64 0127076 896 1640avx2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
36235284230 64 0107956 896 1608avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
36741968198 64 088998 888 1640avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
37923665932 64 086596 896 1576avx2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
126763356149 0 079940 824 1640refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
133512340866 0 064494 784 1640refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
135547239697 0 063180 824 1608refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
138529821295 0 042862 784 1640refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
140183237165 0 060388 824 1576refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
141621318235 0 039542 816 1640refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
145434721609 0 042364 824 1576refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
152165719554 0 040678 784 1640refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
167086118569 0 038606 776 1608refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212

Compiler output

Implementation: avx2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
fips202x4.c: fips202x4.c:22:12: error: always_inline function '_mm256_setzero_si256' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: s[i] = _mm256_setzero_si256();
fips202x4.c: ^
fips202x4.c: fips202x4.c:22:12: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: fips202x4.c:24:9: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: idx = _mm256_set_epi64x((long long)in3, (long long)in2, (long long)in1, (long long)in0);
fips202x4.c: ^
fips202x4.c: fips202x4.c:24:9: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: fips202x4.c:27:11: error: '__builtin_ia32_gatherq_q256' needs target feature avx2
fips202x4.c: t = _mm256_i64gather_epi64((long long *)pos, idx, 1);
fips202x4.c: ^
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:13: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: always_inline function '_mm256_undefined_si256' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1140:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: ((__m256i)__builtin_ia32_gatherq_q256((__v4di)_mm256_undefined_si256(), \
fips202x4.c: ^
fips202x4.c: fips202x4.c:27:11: error: always_inline function '_mm256_set1_epi64x' requires target feature 'avx', but would be inlined into function 'keccakx4_absorb_once' that is compiled without support for 'avx'
fips202x4.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:1143:49: note: expanded from macro '_mm256_i64gather_epi64'
fips202x4.c: (__v4di)_mm256_set1_epi64x(-1), (s)))
fips202x4.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx2

Compiler output

Implementation: avx2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
poly.c: poly.c: In function 'crypto_sign_dilithium3_avx2_constbranchindex_poly_uniform_eta_4x':
poly.c: <command-line>: warning: 'crypto_sign_dilithium3_avx2_constbranchindex_rej_eta_avx' reading 840 bytes from a region of size 704 [-Wstringop-overread]
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: poly.c:596:10: note: in expansion of macro 'rej_eta_avx'
poly.c: 596 | ctr2 = rej_eta_avx(a2->coeffs, buf[2].coeffs);
poly.c: | ^~~~~~~~~~~
poly.c: <command-line>: note: referencing argument 2 of type 'const uint8_t *' {aka 'const unsigned char *'}
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: poly.c:596:10: note: in expansion of macro 'rej_eta_avx'
poly.c: 596 | ctr2 = rej_eta_avx(a2->coeffs, buf[2].coeffs);
poly.c: | ^~~~~~~~~~~
poly.c: <command-line>: note: in a call to function 'crypto_sign_dilithium3_avx2_constbranchindex_rej_eta_avx'
poly.c: <command-line>: note: in definition of macro 'CRYPTO_NAMESPACE'
poly.c: rejsample.h:24:21: note: in expansion of macro 'DILITHIUM_NAMESPACE'
poly.c: 24 | #define rej_eta_avx DILITHIUM_NAMESPACE(rej_eta_avx)
poly.c: | ^~~~~~~~~~~~~~~~~~~
poly.c: rejsample.h:25:14: note: in expansion of macro 'rej_eta_avx'
poly.c: 25 | unsigned int rej_eta_avx(int32_t *r, const uint8_t buf[REJ_UNIFORM_BUFLEN]);
poly.c: | ^~~~~~~~~~~
poly.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avx2