Implementation notes: amd64, rumba7, crypto_kem/ntskem13136

Computer: rumba7
Microarchitecture: amd64; Zen (800f11)
Architecture: amd64
CPU ID: AuthenticAMD-00800f11-178bfbff
SUPERCOP version: 20240625
Operation: crypto_kem
Primitive: ntskem13136
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
1736007140100 84 16160869 980 1728T:sse2clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
1870820148917 84 16169573 980 1728T:avx2clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
1935694100671 84 16121485 980 1728T:avx2clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
1983826107501 84 16125573 980 1728T:avx2clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
198780081901 84 16100463 972 1824T:avx2clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
200391794297 84 16115221 980 1728T:sse2clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
210080675738 84 1694367 972 1824T:sse2clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2131824100688 84 16118829 980 1728T:sse2clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2272144112435 84 16132303 956 1792T:avx2gcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
233567190447 84 16109199 956 1792T:avx2gcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
238348587887 84 16105887 956 1792T:avx2gcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2488172123609 84 16145949 980 1728T:optclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
284613383238 84 16100334 948 1760T:avx2gcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
314595799967 84 16121543 956 1792T:optgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
319027175403 84 1695871 956 1792T:optgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
323996272923 84 1692655 956 1792T:optgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
325170372052 84 1692015 972 1824T:optclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
326607592546 84 16114973 980 1728T:optclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
326993696659 84 16116325 980 1728T:optclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
3336595165809 84 16187421 980 1728T:optclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
379153069946 84 1688542 948 1760T:optgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2362425338672 76 1660589 964 1728T:refclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2474399240692 76 1663837 964 1728T:refclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2495475538064 76 1660797 964 1728T:refclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2683056422846 76 1643039 956 1824T:refclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2727536734265 76 1655847 924 1792T:refgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2768593029589 76 1649213 964 1728T:refclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2811789226942 76 1647295 924 1792T:refgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
2821633421157 76 1639662 916 1760T:refgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625
3010257123869 76 1643503 924 1792T:refgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062720240625

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
bitslice_bma_128.c: bitslice_bma_128.c:315:21: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'bitslice_bma' that is compiled without support for 'ssse3'
bitslice_bma_128.c:         out[1][j] = _mm_shuffle_epi8(out[0][j], mask);
bitslice_bma_128.c:                     ^
bitslice_bma_128.c: bitslice_bma_128.c:316:21: error: '__builtin_ia32_palignr128' needs target feature ssse3
bitslice_bma_128.c:         out[0][j] = _mm_alignr_epi8(out[0][j], psi[0][j], 15);
bitslice_bma_128.c:                     ^
bitslice_bma_128.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/tmmintrin.h:152:13: note: expanded from macro '_mm_alignr_epi8'
bitslice_bma_128.c:   ((__m128i)__builtin_ia32_palignr128((__v16qi)(__m128i)(a), \
bitslice_bma_128.c:             ^
bitslice_bma_128.c: 2 errors generated.

Number of similar (implementation,compiler) pairs: 2, namely:
ImplementationCompiler
T:avx2clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sse2clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output

Implementation: T:sse2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
m4r.c: m4r.c: In function 'zero_vector':
m4r.c: m4r.c:85:20: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:    85 |         *vec_ptr = _mm256_setzero_si256(); vec_ptr++;
m4r.c:       |                    ^~~~~~~~~~~~~~~~~~~~
m4r.c: m4r.c:86:20: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:    86 |         *vec_ptr = _mm256_setzero_si256(); vec_ptr++;
m4r.c:       |                    ^~~~~~~~~~~~~~~~~~~~
m4r.c: m4r.c: In function '_m4ri_make_table_rev':
m4r.c: m4r.c:147:12: error: incompatible types when assigning to type 'vector' {aka '__m128i'} from type '__m256i'
m4r.c:   147 |     mask = _mm256_set_epi64x(v[3], v[2], v[1], v[0]);
m4r.c:       |            ^~~~~~~~~~~~~~~~~
m4r.c: m4r.c:196:46: error: incompatible type for argument 1 of '_mm256_and_si256'
m4r.c:   196 |     S_ptr[nblocks-1] = _mm256_and_si256(S_ptr[nblocks-1], mask);
m4r.c:       |                                         ~~~~~^~~~~~~~~~~
m4r.c:       |                                              |
m4r.c:       |                                              vector {aka __m128i}
m4r.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
m4r.c:                  from bits.h:28,
m4r.c:                  from m4r.c:26:
m4r.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:179:27: note: expected '__m256i' but argument is of type 'vector' {aka '__m128i'}
m4r.c:   179 | _mm256_and_si256 (__m256i __A, __m256i __B)
m4r.c:       |                   ~~~~~~~~^~~
m4r.c: m4r.c:196:59: error: incompatible type for argument 2 of '_mm256_and_si256'
m4r.c:   196 |     S_ptr[nblocks-1] = _mm256_and_si256(S_ptr[nblocks-1], mask);
m4r.c:       |                                                           ^~~~
m4r.c: ...

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:sse2gcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sse2gcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sse2gcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sse2gcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)