Implementation notes: amd64, bolero, crypto_kem/bikel3

Computer: bolero
Microarchitecture: amd64; Broadwell+AES (406f1)
Architecture: amd64
CPU ID: GenuineIntel-000406f1-1fc9cbf5
SUPERCOP version: 20240107
Operation: crypto_kem
Primitive: bikel3
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
6007684108664 72 4130900 904 1580T:ches2021clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
6186012148824 72 4171572 904 1580T:ches2021clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
6539764104749 72 4125974 864 1612T:ches2021gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
671415254498 72 473422 896 1644T:ches2021clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
685946061251 72 479852 904 1580T:ches2021clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
735311675394 72 494750 864 1612T:ches2021gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
769136072209 72 491134 864 1612T:ches2021gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
887177238226 64 461068 896 1580T:avx2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
891430855807 72 473750 856 1580T:ches2021gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
915474828890 64 451236 896 1580T:avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
934240054447 64 475646 856 1612T:avx2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
1045110040861 56 463636 888 1580T:aes-ni-and-pclmulclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
1303908416601 64 435758 888 1644T:avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
1531276417212 64 435932 896 1580T:avx2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
1616485234789 64 453630 856 1612T:avx2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
1620230435364 64 454702 856 1612T:avx2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
1760668831988 64 449894 848 1580T:avx2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
1840862829825 56 452292 888 1580T:aes-ni-and-pclmulclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
2100270072270 56 493582 848 1612T:aes-ni-and-pclmulgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
2494661215746 56 434758 880 1644T:aes-ni-and-pclmulclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
2790624815945 56 434660 888 1580T:aes-ni-and-pclmulclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
2903612043771 56 462998 848 1612T:aes-ni-and-pclmulgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
2939873243015 56 461814 848 1612T:aes-ni-and-pclmulgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
3001608030885 56 448758 840 1580T:aes-ni-and-pclmulgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
3637650440842 56 463612 888 1580T:aes-ni-onlyclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
3672331237772 48 461126 936 1580T:portableclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
4302216832817 48 454542 936 1580T:portableclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
4457700088094 56 4109406 848 1612T:aes-ni-onlygcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
4475115629806 56 452268 888 1580T:aes-ni-onlyclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
4491304027299 48 450342 936 1580T:portableclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
4554719284269 48 4106247 912 1612T:portablegcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
5241588813044 48 432720 928 1644T:portableclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
5281990815349 56 434390 880 1644T:aes-ni-onlyclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
5647076815411 56 434132 888 1580T:aes-ni-onlyclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
5786787613080 48 432366 936 1580T:portableclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121520231212
5864200040436 48 460359 912 1612T:portablegcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
5873220043283 56 462510 848 1612T:aes-ni-onlygcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
5925261230565 56 448358 840 1580T:aes-ni-onlygcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
5962972428222 48 446711 904 1580T:portablegcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
6059356439936 48 459431 912 1612T:portablegcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212
6106581642271 56 461070 848 1612T:aes-ni-onlygcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121520231212

Compiler output

Implementation: T:aes-ni-and-pclmul
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
aes.c: aes.c:9:4: error: "This code requries support for AES_NI and SSSE3"
aes.c: # error "This code requries support for AES_NI and SSSE3"
aes.c: ^
aes.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 6, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:aes-ni-and-pclmul T:aes-ni-only T:avx2 T:avx512 T:avx512-vpclmul T:ches2021

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:22:10: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'gf2x_mod_add' that is compiled without support for 'avx512f'
decode.c: va = LOAD(&a_qwords[i]);
decode.c: ^
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:22:10: error: AVX vector return of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled changes the ABI
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:23:10: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'gf2x_mod_add' that is compiled without support for 'avx512f'
decode.c: vb = LOAD(&b_qwords[i]);
decode.c: ^
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:23:10: error: AVX vector return of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled changes the ABI
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
decode.c: In file included from decode.c:39:
decode.c: gf2x.h: In function 'gf2x_mod_add':
decode.c: gf2x.h:22:8: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
decode.c: 22 | va = LOAD(&a_qwords[i]);
decode.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:49,
decode.c: from x86_64_intrinsic.h:20,
decode.c: from defs.h:103,
decode.c: from bike_defs.h:10,
decode.c: from types.h:13,
decode.c: from decode.h:10,
decode.c: from decode.c:37:
decode.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx512fintrin.h:6481:1: error: inlining failed in call to 'always_inline' '_mm512_storeu_si512': target specific option mismatch
decode.c: 6481 | _mm512_storeu_si512 (void *__P, __m512i __A)
decode.c: | ^~~~~~~~~~~~~~~~~~~
decode.c: In file included from defs.h:103,
decode.c: from bike_defs.h:10,
decode.c: from types.h:13,
decode.c: from decode.h:10,
decode.c: from decode.c:37:
decode.c: x86_64_intrinsic.h:41:27: note: called from here
decode.c: 41 | # define STORE(mem, reg) _mm512_storeu_si512((mem), (reg))
decode.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
decode.c: gf2x.h:25:5: note: in expansion of macro 'STORE'
decode.c: 25 | STORE(&c_qwords[i], va ^ vb);
decode.c: | ^~~~~
decode.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul