Implementation notes: amd64, comet, crypto_kem/ntskem1264

Computer: comet
Microarchitecture: amd64; Comet Lake (806ec)
Architecture: amd64
CPU ID: GenuineIntel-000806ec-bfebfbff
SUPERCOP version: 20240425
Operation: crypto_kem
Primitive: ntskem1264
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
37166398444 6228 16119259 7132 1728T:avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
408785163781 6228 16184131 7132 1824T:sse2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
40895892992 6228 16113923 7132 1728T:sse2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
430003113496 6228 16133758 7076 1792T:avx2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
44466866934 6228 1685398 7076 1792T:avx2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
44914762391 6228 1681009 7124 1824T:avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
46067565158 6228 1683054 7076 1792T:avx2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
47367684871 6228 16102787 7132 1728T:avx2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
47923058006 6228 1676625 7124 1824T:sse2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
55170480066 6228 1698027 7132 1728T:sse2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
556220141768 6228 16163691 7132 1824T:optclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
55685560305 6228 1677141 7068 1760T:avx2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
577934150971 6228 16171179 7132 1728T:sse2clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
641345101245 6228 16123126 7076 1792T:optgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
67308091202 6228 16113707 7132 1728T:optclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
69994556088 6228 1676158 7076 1792T:optgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
70939453374 6228 1673425 7124 1824T:optclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
728033145497 6228 16167059 7132 1728T:optclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
74409153881 6228 1673358 7076 1792T:optgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
76968849696 6228 1667981 7068 1760T:optgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
81570875587 6228 1695059 7132 1728T:optclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
601370963869 76 1685814 900 1792T:refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
603052138066 76 1660035 972 1728T:refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
607990152754 76 1675547 972 1728T:refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
618507558380 76 1681243 972 1824T:refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
669877324737 76 1644766 900 1792T:refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
678402229161 76 1648611 972 1728T:refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
707763422650 76 1642857 964 1824T:refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024050820240425
728079622358 76 1641838 900 1792T:refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425
752753619046 76 1637317 892 1760T:refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024050820240425

Test failure

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
error 111

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
bitslice_fft_256.c: bitslice_fft_256.c:87:25: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'bitslice_butterflies12_256' that is compiled without support for 'avx'
bitslice_fft_256.c: out[i][b] = _mm256_set_epi64x(-((in[0][b] >> reversal[4*i+3]) & 1),
bitslice_fft_256.c: ^
bitslice_fft_256.c: bitslice_fft_256.c:87:25: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
bitslice_fft_256.c: bitslice_fft_256.c:99:22: error: '__builtin_ia32_pshufd256' needs target feature avx2
bitslice_fft_256.c: vb = _mm256_shuffle_epi32(tmp[b], _MM_SHUFFLE(3, 2, 3, 2));
bitslice_fft_256.c: ^
bitslice_fft_256.c: /usr/lib/llvm-14/lib/clang/14.0.6/include/avx2intrin.h:470:13: note: expanded from macro '_mm256_shuffle_epi32'
bitslice_fft_256.c: ((__m256i)__builtin_ia32_pshufd256((__v8si)(__m256i)(a), (int)(imm)))
bitslice_fft_256.c: ^
bitslice_fft_256.c: bitslice_fft_256.c:100:22: error: '__builtin_ia32_pslldqi256_byteshift' needs target feature avx2
bitslice_fft_256.c: va = _mm256_slli_si256(out[k][b], 8);
bitslice_fft_256.c: ^
bitslice_fft_256.c: /usr/lib/llvm-14/lib/clang/14.0.6/include/avx2intrin.h:497:13: note: expanded from macro '_mm256_slli_si256'
bitslice_fft_256.c: ((__m256i)__builtin_ia32_pslldqi256_byteshift((__v4di)(__m256i)(a), (int)(imm)))
bitslice_fft_256.c: ^
bitslice_fft_256.c: bitslice_fft_256.c:101:22: error: always_inline function '_mm256_xor_si256' requires target feature 'avx2', but would be inlined into function 'bitslice_butterflies12_256' that is compiled without support for 'avx2'
bitslice_fft_256.c: vb = _mm256_xor_si256(va, vb);
bitslice_fft_256.c: ^
bitslice_fft_256.c: bitslice_fft_256.c:101:22: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
bitslice_fft_256.c: bitslice_fft_256.c:102:29: error: always_inline function '_mm256_xor_si256' requires target feature 'avx2', but would be inlined into function 'bitslice_butterflies12_256' that is compiled without support for 'avx2'
bitslice_fft_256.c: out[k][b] = _mm256_xor_si256(out[k][b], vb);
bitslice_fft_256.c: ^
bitslice_fft_256.c: bitslice_fft_256.c:102:29: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
bitslice_fft_256.c: bitslice_fft_256.c:112:22: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'bitslice_butterflies12_256' that is compiled without support for 'avx'
bitslice_fft_256.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:sse2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
m4r.c: m4r.c: In function 'zero_vector':
m4r.c: m4r.c:85:20: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c: 85 | *vec_ptr = _mm256_setzero_si256(); vec_ptr++;
m4r.c: | ^~~~~~~~~~~~~~~~~~~~
m4r.c: m4r.c:86:20: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c: 86 | *vec_ptr = _mm256_setzero_si256(); vec_ptr++;
m4r.c: | ^~~~~~~~~~~~~~~~~~~~
m4r.c: m4r.c: In function '_m4ri_make_table_rev':
m4r.c: m4r.c:147:12: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c: 147 | mask = _mm256_set_epi64x(v[3], v[2], v[1], v[0]);
m4r.c: | ^~~~~~~~~~~~~~~~~
m4r.c: m4r.c:196:46: error: incompatible type for argument 1 of '_mm256_and_si256'
m4r.c: 196 | S_ptr[nblocks-1] = _mm256_and_si256(S_ptr[nblocks-1], mask);
m4r.c: | ~~~~~^~~~~~~~~~~
m4r.c: | |
m4r.c: | vector {aka __m128i}
m4r.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/12/include/immintrin.h:47,
m4r.c: from bits.h:28,
m4r.c: from m4r.c:26:
m4r.c: /usr/lib/gcc/x86_64-linux-gnu/12/include/avx2intrin.h:179:27: note: expected '__m256i' but argument is of type 'vector' {aka '__m128i'}
m4r.c: 179 | _mm256_and_si256 (__m256i __A, __m256i __B)
m4r.c: | ~~~~~~~~^~~
m4r.c: m4r.c:196:59: error: incompatible type for argument 2 of '_mm256_and_si256'
m4r.c: 196 | S_ptr[nblocks-1] = _mm256_and_si256(S_ptr[nblocks-1], mask);
m4r.c: | ^~~~
m4r.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse2