Implementation notes: amd64, rumba5, crypto_kem/ntskem13136

Computer: rumba5
Microarchitecture: amd64; Zen (800f11)
Architecture: amd64
CPU ID: AuthenticAMD-00800f11-178bfbff
SUPERCOP version: 20240625
Operation: crypto_kem
Primitive: ntskem13136
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
1872823148917 84 16169565 980 1728T:avx2clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
1969300100671 84 16121477 980 1728T:avx2clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
201081094297 84 16115213 980 1728T:sse2clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2017501107501 84 16125629 980 1728T:avx2clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2091590140100 84 16160925 980 1728T:sse2clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
211928375738 84 1694423 972 1824T:sse2clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2156533100688 84 16118821 980 1728T:sse2clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
227330687887 84 16105879 956 1792T:avx2gcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
227572181901 84 16100455 972 1824T:avx2clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2277394112435 84 16132295 956 1792T:avx2gcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
232553890447 84 16109191 956 1792T:avx2gcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2482750123609 84 16145941 980 1728T:optclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
287708483238 84 16100326 948 1760T:avx2gcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
307459975403 84 1695863 956 1792T:optgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
3129555165809 84 16187413 980 1728T:optclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
313388999967 84 16121535 956 1792T:optgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
318956292546 84 16114965 980 1728T:optclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
325237672052 84 1692007 972 1824T:optclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
326673072923 84 1692647 956 1792T:optgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
327933596659 84 16116317 980 1728T:optclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
380030969946 84 1688534 948 1760T:optgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2366135338672 76 1660581 964 1728T:refclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2471964940692 76 1663829 964 1728T:refclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2488963938064 76 1660789 964 1728T:refclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2667554522846 76 1643031 956 1824T:refclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2727875734265 76 1655839 924 1792T:refgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2732324629589 76 1649205 964 1728T:refclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2809008626942 76 1647351 924 1792T:refgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2818852321157 76 1639718 916 1760T:refgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2987648123869 76 1643559 924 1792T:refgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625

Compiler output


bitslice_bma_128.c: bitslice_bma_128.c:315:21: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'bitslice_bma' that is compiled without support for 'ssse3'
bitslice_bma_128.c:         out[1][j] = _mm_shuffle_epi8(out[0][j], mask);
bitslice_bma_128.c:                     ^
bitslice_bma_128.c: bitslice_bma_128.c:316:21: error: '__builtin_ia32_palignr128' needs target feature ssse3
bitslice_bma_128.c:         out[0][j] = _mm_alignr_epi8(out[0][j], psi[0][j], 15);
bitslice_bma_128.c:                     ^
bitslice_bma_128.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/tmmintrin.h:152:13: note: expanded from macro '_mm_alignr_epi8'
bitslice_bma_128.c:   ((__m128i)__builtin_ia32_palignr128((__v16qi)(__m128i)(a), \
bitslice_bma_128.c:             ^
bitslice_bma_128.c: 2 errors generated.

Number of similar (implementation,compiler) pairs: 2, namely:
ImplementationCompiler
T:avx2clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sse2clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


m4r.c: m4r.c: In function 'zero_vector':
m4r.c: m4r.c:85:20: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:    85 |         *vec_ptr = _mm256_setzero_si256(); vec_ptr++;
m4r.c:       |                    ^~~~~~~~~~~~~~~~~~~~
m4r.c: m4r.c:86:20: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:    86 |         *vec_ptr = _mm256_setzero_si256(); vec_ptr++;
m4r.c:       |                    ^~~~~~~~~~~~~~~~~~~~
m4r.c: m4r.c: In function '_m4ri_make_table_rev':
m4r.c: m4r.c:147:12: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:   147 |     mask = _mm256_set_epi64x(v[3], v[2], v[1], v[0]);
m4r.c:       |            ^~~~~~~~~~~~~~~~~
m4r.c: m4r.c:196:46: error: incompatible type for argument 1 of '_mm256_and_si256'
m4r.c:   196 |     S_ptr[nblocks-1] = _mm256_and_si256(S_ptr[nblocks-1], mask);
m4r.c:       |                                         ~~~~~^~~~~~~~~~~
m4r.c:       |                                              |
m4r.c:       |                                              vector {aka __m128i}
m4r.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
m4r.c:                  from bits.h:28,
m4r.c:                  from m4r.c:26:
m4r.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:179:27: note: expected '__m256i' but argument is of type 'vector' {aka '__m128i'}
m4r.c:   179 | _mm256_and_si256 (__m256i __A, __m256i __B)
m4r.c:       |                   ~~~~~~~~^~~
m4r.c: m4r.c:196:59: error: incompatible type for argument 2 of '_mm256_and_si256'
m4r.c:   196 |     S_ptr[nblocks-1] = _mm256_and_si256(S_ptr[nblocks-1], mask);
m4r.c:       |                                                           ^~~~
m4r.c: ...

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:sse2gcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sse2gcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sse2gcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sse2gcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)