Implementation notes: amd64, speed2supercop, crypto_kem/ntskem1264

Computer: speed2supercop
Microarchitecture: amd64; Haswell+AES (306c3)
Architecture: amd64
CPU ID: GenuineIntel-000306c3-1fc9cbf5
SUPERCOP version: 20240625
Operation: crypto_kem
Primitive: ntskem1264
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
473884129383 6228 16149003 7048 1632T:avx2gcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
48721266162 6228 1683811 7048 1632T:avx2gcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
50790864349 6228 1681703 7048 1632T:avx2gcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
525028146956 6228 16166359 7072 1568T:avx2clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
60040060252 6228 1676750 7040 1600T:avx2gcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
643192143357 6228 16162871 7072 1568T:sse2clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
64510092329 6228 16112575 7072 1568T:sse2clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
665956126604 6228 16147591 7072 1568T:optclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
69890055131 6228 1674507 7048 1632T:optgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
71694460786 6228 1678681 7064 1664T:sse2clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
721440116424 6228 16137699 7048 1632T:optgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
729048126347 6228 16146120 7072 1568T:sse2clang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
74974488918 6228 16110735 7072 1568T:optclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
75392069892 6228 1687168 7072 1568T:sse2clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
76688056281 6228 1675697 7064 1664T:optclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
78812452878 6228 1671815 7048 1632T:optgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
790932113285 6228 16134528 7072 1568T:optclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
83752849627 6228 1667526 7040 1600T:optgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
85844865805 6228 1684584 7072 1568T:optclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
648557279195 76 16100499 872 1632T:refgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
700283654840 76 1676783 912 1568T:refclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
707316048914 76 1670999 912 1568T:refclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
707835624550 76 1643883 872 1632T:refgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
736909636580 76 1658120 912 1568T:refclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
757715622153 76 1641071 872 1632T:refgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
811586018969 76 1636854 864 1600T:refgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
814012829455 76 1648216 912 1568T:refclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625
859272422744 76 1642273 904 1664T:refclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071220240625

Test failure


error 111

Number of similar (implementation,compiler) pairs: 3, namely:
ImplementationCompiler
T:avx2clang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:avx2clang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:avx2clang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))

Compiler output


nts_kem.c: nts_kem.c:112:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c:     priv->ff2m = ff_create(priv->m);
nts_kem.c:                           ^
nts_kem.c: nts_kem.c:255:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c:     priv->ff2m = ff_create(priv->m);
nts_kem.c:                           ^
nts_kem.c: 2 warnings generated.

Number of similar (implementation,compiler) pairs: 9, namely:
ImplementationCompiler
T:avx2clang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:avx2clang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:avx2clang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:avx2clang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:sse2clang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:sse2clang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:sse2clang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:sse2clang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:sse2clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))

Compiler output


bitslice_fft_256.c: bitslice_fft_256.c:87:25: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'bitslice_butterflies12_256' that is compiled without support for 'avx'
bitslice_fft_256.c:             out[i][b] = _mm256_set_epi64x(-((in[0][b] >> reversal[4*i+3]) & 1),
bitslice_fft_256.c:                         ^
bitslice_fft_256.c: bitslice_fft_256.c:87:25: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
bitslice_fft_256.c: bitslice_fft_256.c:99:22: error: '__builtin_ia32_pshufd256' needs target feature avx2
bitslice_fft_256.c:                 vb = _mm256_shuffle_epi32(tmp[b], _MM_SHUFFLE(3, 2, 3, 2));
bitslice_fft_256.c:                      ^
bitslice_fft_256.c: /usr/lib/llvm-16/lib/clang/16/include/avx2intrin.h:470:13: note: expanded from macro '_mm256_shuffle_epi32'
bitslice_fft_256.c:   ((__m256i)__builtin_ia32_pshufd256((__v8si)(__m256i)(a), (int)(imm)))
bitslice_fft_256.c:             ^
bitslice_fft_256.c: bitslice_fft_256.c:100:22: error: '__builtin_ia32_pslldqi256_byteshift' needs target feature avx2
bitslice_fft_256.c:                 va = _mm256_slli_si256(out[k][b], 8);
bitslice_fft_256.c:                      ^
bitslice_fft_256.c: /usr/lib/llvm-16/lib/clang/16/include/avx2intrin.h:497:13: note: expanded from macro '_mm256_slli_si256'
bitslice_fft_256.c:   ((__m256i)__builtin_ia32_pslldqi256_byteshift((__v4di)(__m256i)(a), (int)(imm)))
bitslice_fft_256.c:             ^
bitslice_fft_256.c: bitslice_fft_256.c:101:22: error: always_inline function '_mm256_xor_si256' requires target feature 'avx2', but would be inlined into function 'bitslice_butterflies12_256' that is compiled without support for 'avx2'
bitslice_fft_256.c:                 vb = _mm256_xor_si256(va, vb);
bitslice_fft_256.c:                      ^
bitslice_fft_256.c: bitslice_fft_256.c:101:22: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
bitslice_fft_256.c: bitslice_fft_256.c:102:29: error: always_inline function '_mm256_xor_si256' requires target feature 'avx2', but would be inlined into function 'bitslice_butterflies12_256' that is compiled without support for 'avx2'
bitslice_fft_256.c:                 out[k][b] = _mm256_xor_si256(out[k][b], vb);
bitslice_fft_256.c:                             ^
bitslice_fft_256.c: bitslice_fft_256.c:102:29: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
bitslice_fft_256.c: bitslice_fft_256.c:112:22: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'bitslice_butterflies12_256' that is compiled without support for 'avx'
bitslice_fft_256.c: ...

Number of similar (implementation,compiler) pairs: 1, namely:
ImplementationCompiler
T:avx2clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))

Compiler output


nts_kem.c: nts_kem.c:106:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c:     priv->ff2m = ff_create(priv->m);
nts_kem.c:                           ^
nts_kem.c: nts_kem.c:228:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c:     priv->ff2m = ff_create(priv->m);
nts_kem.c:                           ^
nts_kem.c: 2 warnings generated.

Number of similar (implementation,compiler) pairs: 5, namely:
ImplementationCompiler
T:optclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:optclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:optclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:optclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:optclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))

Compiler output


nts_kem.c: nts_kem.c:106:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c:     priv->ff2m = ff_create(priv->m);
nts_kem.c:                           ^
nts_kem.c: nts_kem.c:242:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c:     priv->ff2m = ff_create(priv->m);
nts_kem.c:                           ^
nts_kem.c: 2 warnings generated.

Number of similar (implementation,compiler) pairs: 5, namely:
ImplementationCompiler
T:refclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:refclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:refclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:refclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
T:refclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))

Compiler output


m4r.c: m4r.c: In function 'zero_vector':
m4r.c: m4r.c:85:20: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:    85 |         *vec_ptr = _mm256_setzero_si256(); vec_ptr++;
m4r.c:       |                    ^~~~~~~~~~~~~~~~~~~~
m4r.c: m4r.c:86:20: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:    86 |         *vec_ptr = _mm256_setzero_si256(); vec_ptr++;
m4r.c:       |                    ^~~~~~~~~~~~~~~~~~~~
m4r.c: m4r.c: In function '_m4ri_make_table_rev':
m4r.c: m4r.c:147:12: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:   147 |     mask = _mm256_set_epi64x(v[3], v[2], v[1], v[0]);
m4r.c:       |            ^~~~~~~~~~~~~~~~~
m4r.c: m4r.c:196:46: error: incompatible type for argument 1 of '_mm256_and_si256'
m4r.c:   196 |     S_ptr[nblocks-1] = _mm256_and_si256(S_ptr[nblocks-1], mask);
m4r.c:       |                                         ~~~~~^~~~~~~~~~~
m4r.c:       |                                              |
m4r.c:       |                                              vector {aka __m128i}
m4r.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/13/include/immintrin.h:51,
m4r.c:                  from bits.h:28,
m4r.c:                  from m4r.c:26:
m4r.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/avx2intrin.h:179:27: note: expected '__m256i' but argument is of type 'vector' {aka '__m128i'}
m4r.c:   179 | _mm256_and_si256 (__m256i __A, __m256i __B)
m4r.c:       |                   ~~~~~~~~^~~
m4r.c: m4r.c:196:59: error: incompatible type for argument 2 of '_mm256_and_si256'
m4r.c:   196 |     S_ptr[nblocks-1] = _mm256_and_si256(S_ptr[nblocks-1], mask);
m4r.c:       |                                                           ^~~~
m4r.c: ...

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:sse2gcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (13.3.0)
T:sse2gcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (13.3.0)
T:sse2gcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (13.3.0)
T:sse2gcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (13.3.0)