Implementation notes: amd64, nucnuc, crypto_kem/bikel3

Computer: nucnuc
Microarchitecture: amd64; Airmont (406c3)
Architecture: amd64
CPU ID: GenuineIntel-000406c3-bfebfbff
SUPERCOP version: 20240107
Operation: crypto_kem
Primitive: bikel3
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
4168991933855 56 453748 884 1724T:aes-ni-and-pclmulclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
5565596222315 56 440508 884 1724T:aes-ni-and-pclmulclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
5757383978540 56 498248 852 1756T:aes-ni-and-pclmulgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
5820168215378 56 431938 876 1724T:aes-ni-and-pclmulclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
6645754916627 56 434354 876 1724T:aes-ni-and-pclmulclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
6995075445273 56 463808 852 1756T:aes-ni-and-pclmulgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
7563745629382 56 445928 844 1724T:aes-ni-and-pclmulgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
7813672443618 56 461520 852 1756T:aes-ni-and-pclmulgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
9692267930237 48 450754 932 1724T:portableclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
9699346933433 56 453276 884 1724T:aes-ni-onlyclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
9803237632906 48 452994 932 1724T:portableclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
107096247100167 48 4120427 916 1756T:portablegcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
107129024104176 56 4123856 852 1756T:aes-ni-onlygcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
11079624719446 48 438306 932 1724T:portableclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
11090276221893 56 440100 884 1724T:aes-ni-onlyclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
14605865812129 48 429352 924 1724T:portableclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
14608913214354 56 430938 876 1724T:aes-ni-onlyclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
15693669114105 48 432480 924 1724T:portableclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
15717974816221 56 433954 876 1724T:aes-ni-onlyclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024012420231217
16000563241637 48 460763 916 1756T:portablegcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
16012187544389 56 462904 852 1756T:aes-ni-onlygcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
16712805829064 56 445504 844 1724T:aes-ni-onlygcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
16729553926856 48 443931 908 1724T:portablegcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
16911957540296 48 458875 916 1756T:portablegcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217
16918787242519 56 460424 852 1756T:aes-ni-onlygcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024012420231217

Compiler output

Implementation: T:aes-ni-and-pclmul
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
aes.c: aes.c:9:4: error: "This code requries support for AES_NI and SSSE3"
aes.c: # error "This code requries support for AES_NI and SSSE3"
aes.c: ^
aes.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 6, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:aes-ni-and-pclmul T:aes-ni-only T:avx2 T:avx512 T:avx512-vpclmul T:ches2021

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:22:10: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'gf2x_mod_add' that is compiled without support for 'avx'
decode.c: va = LOAD(&a_qwords[i]);
decode.c: ^
decode.c: ./x86_64_intrinsic.h:30:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm256_loadu_si256((const void *)(mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:22:10: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
decode.c: ./x86_64_intrinsic.h:30:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm256_loadu_si256((const void *)(mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:23:10: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'gf2x_mod_add' that is compiled without support for 'avx'
decode.c: vb = LOAD(&b_qwords[i]);
decode.c: ^
decode.c: ./x86_64_intrinsic.h:30:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm256_loadu_si256((const void *)(mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:23:10: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
decode.c: ./x86_64_intrinsic.h:30:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm256_loadu_si256((const void *)(mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
decode.c: In file included from decode.c:39:
decode.c: gf2x.h: In function 'gf2x_mod_add':
decode.c: gf2x.h:22:8: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
decode.c: 22 | va = LOAD(&a_qwords[i]);
decode.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:51,
decode.c: from x86_64_intrinsic.h:20,
decode.c: from defs.h:103,
decode.c: from bike_defs.h:10,
decode.c: from types.h:13,
decode.c: from decode.h:10,
decode.c: from decode.c:37:
decode.c: /usr/lib/gcc/x86_64-linux-gnu/10/include/avxintrin.h:926:1: error: inlining failed in call to 'always_inline' '_mm256_storeu_si256': target specific option mismatch
decode.c: 926 | _mm256_storeu_si256 (__m256i_u *__P, __m256i __A)
decode.c: | ^~~~~~~~~~~~~~~~~~~
decode.c: In file included from defs.h:103,
decode.c: from bike_defs.h:10,
decode.c: from types.h:13,
decode.c: from decode.h:10,
decode.c: from decode.c:37:
decode.c: x86_64_intrinsic.h:31:27: note: called from here
decode.c: 31 | # define STORE(mem, reg) _mm256_storeu_si256((void *)(mem), (reg))
decode.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
decode.c: gf2x.h:25:5: note: in expansion of macro 'STORE'
decode.c: 25 | STORE(&c_qwords[i], va ^ vb);
decode.c: | ^~~~~
decode.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:22:10: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'gf2x_mod_add' that is compiled without support for 'avx512f'
decode.c: va = LOAD(&a_qwords[i]);
decode.c: ^
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:22:10: error: AVX vector return of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled changes the ABI
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:23:10: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'gf2x_mod_add' that is compiled without support for 'avx512f'
decode.c: vb = LOAD(&b_qwords[i]);
decode.c: ^
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:23:10: error: AVX vector return of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled changes the ABI
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
decode.c: In file included from decode.c:39:
decode.c: gf2x.h: In function 'gf2x_mod_add':
decode.c: gf2x.h:22:8: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
decode.c: 22 | va = LOAD(&a_qwords[i]);
decode.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:55,
decode.c: from x86_64_intrinsic.h:20,
decode.c: from defs.h:103,
decode.c: from bike_defs.h:10,
decode.c: from types.h:13,
decode.c: from decode.h:10,
decode.c: from decode.c:37:
decode.c: /usr/lib/gcc/x86_64-linux-gnu/10/include/avx512fintrin.h:6429:1: error: inlining failed in call to 'always_inline' '_mm512_storeu_si512': target specific option mismatch
decode.c: 6429 | _mm512_storeu_si512 (void *__P, __m512i __A)
decode.c: | ^~~~~~~~~~~~~~~~~~~
decode.c: In file included from defs.h:103,
decode.c: from bike_defs.h:10,
decode.c: from types.h:13,
decode.c: from decode.h:10,
decode.c: from decode.c:37:
decode.c: x86_64_intrinsic.h:41:27: note: called from here
decode.c: 41 | # define STORE(mem, reg) _mm512_storeu_si512((mem), (reg))
decode.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
decode.c: gf2x.h:25:5: note: in expansion of macro 'STORE'
decode.c: 25 | STORE(&c_qwords[i], va ^ vb);
decode.c: | ^~~~~
decode.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul

Compiler output

Implementation: T:ches2021
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
decode.c: decode.c:303:20: error: '__builtin_ia32_vec_ext_v4di' needs target feature avx
decode.c: printf("%.16llX", _mm256_extract_epi64(v, 0));
decode.c: ^
decode.c: /usr/lib/llvm-11/lib/clang/11.0.1/include/avxintrin.h:2024:14: note: expanded from macro '_mm256_extract_epi64'
decode.c: (long long)__builtin_ia32_vec_ext_v4di((__v4di)(__m256i)(X), (int)(N))
decode.c: ^
decode.c: decode.c:304:21: error: '__builtin_ia32_vec_ext_v4di' needs target feature avx
decode.c: printf(" %.16llX", _mm256_extract_epi64(v, 1));
decode.c: ^
decode.c: /usr/lib/llvm-11/lib/clang/11.0.1/include/avxintrin.h:2024:14: note: expanded from macro '_mm256_extract_epi64'
decode.c: (long long)__builtin_ia32_vec_ext_v4di((__v4di)(__m256i)(X), (int)(N))
decode.c: ^
decode.c: decode.c:305:21: error: '__builtin_ia32_vec_ext_v4di' needs target feature avx
decode.c: printf(" %.16llX", _mm256_extract_epi64(v, 2));
decode.c: ^
decode.c: /usr/lib/llvm-11/lib/clang/11.0.1/include/avxintrin.h:2024:14: note: expanded from macro '_mm256_extract_epi64'
decode.c: (long long)__builtin_ia32_vec_ext_v4di((__v4di)(__m256i)(X), (int)(N))
decode.c: ^
decode.c: decode.c:306:23: error: '__builtin_ia32_vec_ext_v4di' needs target feature avx
decode.c: printf(" %.16llX\n", _mm256_extract_epi64(v, 3));
decode.c: ^
decode.c: /usr/lib/llvm-11/lib/clang/11.0.1/include/avxintrin.h:2024:14: note: expanded from macro '_mm256_extract_epi64'
decode.c: (long long)__builtin_ia32_vec_ext_v4di((__v4di)(__m256i)(X), (int)(N))
decode.c: ^
decode.c: 4 errors generated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ches2021
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ches2021
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ches2021
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ches2021

Compiler output

Implementation: T:ches2021
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
decode.c: decode.c: In function 'dump1':
decode.c: decode.c:301:6: note: the ABI for passing parameters with 32-byte alignment has changed in GCC 4.6
decode.c: 301 | void dump1(__m256i v)
decode.c: | ^~~~~
decode.c: decode.c: In function 'find_err1':
decode.c: decode.c:351:11: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
decode.c: 351 | buf[j] = LOAD(&syndrome->qw[4 * j]);
decode.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:51,
decode.c: from x86_64_intrinsic.h:20,
decode.c: from defs.h:106,
decode.c: from bike_defs.h:10,
decode.c: from types.h:15,
decode.c: from decode.h:17,
decode.c: from decode.c:39:
decode.c: gf2x.h: In function 'gf2x_mod_add':
decode.c: /usr/lib/gcc/x86_64-linux-gnu/10/include/avxintrin.h:926:1: error: inlining failed in call to 'always_inline' '_mm256_storeu_si256': target specific option mismatch
decode.c: 926 | _mm256_storeu_si256 (__m256i_u *__P, __m256i __A)
decode.c: | ^~~~~~~~~~~~~~~~~~~
decode.c: In file included from defs.h:106,
decode.c: from bike_defs.h:10,
decode.c: from types.h:15,
decode.c: from decode.h:17,
decode.c: from decode.c:39:
decode.c: x86_64_intrinsic.h:31:27: note: called from here
decode.c: 31 | # define STORE(mem, reg) _mm256_storeu_si256((void *)(mem), (reg))
decode.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ches2021
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ches2021
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ches2021
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ches2021