Implementation notes: amd64, hydra8, crypto_kem/bikel3

Computer: hydra8
Microarchitecture: amd64; Ivy Bridge+AES (306a9)
Architecture: amd64
CPU ID: GenuineIntel-000306a9-bfebfbff
SUPERCOP version: 20240107
Operation: crypto_kem
Primitive: bikel3
Time | Object size | Test size | Implementation | Compiler | Benchmark date | SUPERCOP version
18800331 | 45947 56 4 | 66312 932 1732 | T:aes-ni-and-pclmul | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
27091617 | 28133 56 4 | 46552 932 1732 | T:aes-ni-and-pclmul | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
32109908 | 86676 56 4 | 105093 876 1764 | T:aes-ni-and-pclmul | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
37290419 | 16373 56 4 | 32402 924 1732 | T:aes-ni-and-pclmul | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
38002068 | 16056 56 4 | 32824 932 1732 | T:aes-ni-and-pclmul | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
39291812 | 43078 56 4 | 59813 876 1764 | T:aes-ni-and-pclmul | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
39660263 | 43932 56 4 | 61109 876 1764 | T:aes-ni-and-pclmul | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
44380810 | 30897 56 4 | 46637 868 1732 | T:aes-ni-and-pclmul | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
45440182 | 42946 48 4 | 63938 980 1732 | T:portable | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
45766069 | 46000 56 4 | 66368 932 1732 | T:aes-ni-only | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
47367253 | 32817 48 4 | 52490 980 1732 | T:portable | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
53784943 | 25679 48 4 | 44722 980 1732 | T:portable | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
53910367 | 28186 56 4 | 46608 932 1732 | T:aes-ni-only | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
57957229 | 101884 56 4 | 120333 876 1764 | T:aes-ni-only | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
57986711 | 98019 48 4 | 117094 940 1764 | T:portable | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
82013371 | 15546 56 4 | 32272 932 1732 | T:aes-ni-only | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
82196456 | 13215 48 4 | 30546 980 1732 | T:portable | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
83532926 | 40653 48 4 | 58494 940 1764 | T:portable | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
83581218 | 13939 48 4 | 30588 972 1732 | T:portable | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
83582334 | 43500 56 4 | 60661 876 1764 | T:aes-ni-only | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
83679700 | 16116 56 4 | 32106 924 1732 | T:aes-ni-only | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20231215 | 20231212
85324876 | 28274 48 4 | 44654 932 1732 | T:portable | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
85557826 | 40044 48 4 | 57470 940 1764 | T:portable | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
86042022 | 42391 56 4 | 59109 876 1764 | T:aes-ni-only | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212
88632864 | 30627 56 4 | 46285 868 1732 | T:aes-ni-only | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231215 | 20231212

Compiler output

Implementation: T:aes-ni-and-pclmul
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
aes.c: aes.c:9:4: error: "This code requries support for AES_NI and SSSE3"
aes.c: # error "This code requries support for AES_NI and SSSE3"
aes.c: ^
aes.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:aes-ni-and-pclmul T:aes-ni-only T:avx2 T:avx512 T:avx512-vpclmul T:ches2021
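The `#error` in aes.c fires when the compiler has not enabled AES-NI and SSSE3 for the translation unit. The most likely cause is that clang does not use `-mcpu=native` to choose x86 target features. It reports the flag as an unused argument, and `-Qunused-arguments` silences that report. The build therefore falls back to the baseline x86-64 target. The portable implementation still builds with the same flags, as the table shows.

The sketch below is not part of the BIKE sources. It reports which relevant extensions a given command line enables, using the standard GCC/Clang predefined macros `__SSSE3__`, `__AES__`, `__PCLMUL__`, `__AVX2__` and `__AVX512F__`. The function names and flag constants are hypothetical.

```c
#include <stdio.h>

/* One flag per x86 extension that the failing BIKE variants depend on. */
enum {
    ISA_SSSE3   = 1 << 0,
    ISA_AES     = 1 << 1,
    ISA_PCLMUL  = 1 << 2,
    ISA_AVX2    = 1 << 3,
    ISA_AVX512F = 1 << 4
};

/* Reports which extensions the compiler was told to target, via its
 * predefined macros. It says nothing about what the host CPU supports. */
unsigned compiled_isa_mask(void)
{
    unsigned m = 0;
#ifdef __SSSE3__
    m |= ISA_SSSE3;
#endif
#ifdef __AES__
    m |= ISA_AES;
#endif
#ifdef __PCLMUL__
    m |= ISA_PCLMUL;
#endif
#ifdef __AVX2__
    m |= ISA_AVX2;
#endif
#ifdef __AVX512F__
    m |= ISA_AVX512F;
#endif
    return m;
}

void print_compiled_isa(void)
{
    unsigned m = compiled_isa_mask();
    printf("SSSE3=%d AES=%d PCLMUL=%d AVX2=%d AVX512F=%d\n",
           !!(m & ISA_SSSE3), !!(m & ISA_AES), !!(m & ISA_PCLMUL),
           !!(m & ISA_AVX2), !!(m & ISA_AVX512F));
}
```

Built with the clang `-mcpu=native` line above, this would presumably report AES=0 and SSSE3=0, which matches the `#error`. With `-march=native` on this Ivy Bridge host, SSSE3, AES and PCLMUL should be set while AVX2 and AVX512F stay clear.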

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
gf2x_ksqr_avx2.c: gf2x_ksqr_avx2.c:58:9: error: always_inline function '_mm256_sub_epi16' requires target feature 'avx2', but would be inlined into function 'generate_map' that is compiled without support for 'avx2'
gf2x_ksqr_avx2.c: inc = SUB_I16(inc, vr);
gf2x_ksqr_avx2.c: ^
gf2x_ksqr_avx2.c: ./x86_64_intrinsic.h:64:28: note: expanded from macro 'SUB_I16'
gf2x_ksqr_avx2.c: # define SUB_I16(a, b) _mm256_sub_epi16(a, b)
gf2x_ksqr_avx2.c: ^
gf2x_ksqr_avx2.c: gf2x_ksqr_avx2.c:67:17: error: always_inline function '_mm256_add_epi16' requires target feature 'avx2', but would be inlined into function 'generate_map' that is compiled without support for 'avx2'
gf2x_ksqr_avx2.c: vmap[j] = ADD_I16(vmap[j], inc);
gf2x_ksqr_avx2.c: ^
gf2x_ksqr_avx2.c: ./x86_64_intrinsic.h:63:28: note: expanded from macro 'ADD_I16'
gf2x_ksqr_avx2.c: # define ADD_I16(a, b) _mm256_add_epi16(a, b)
gf2x_ksqr_avx2.c: ^
gf2x_ksqr_avx2.c: gf2x_ksqr_avx2.c:68:17: error: always_inline function '_mm256_cmpgt_epi16' requires target feature 'avx2', but would be inlined into function 'generate_map' that is compiled without support for 'avx2'
gf2x_ksqr_avx2.c: vtmp[j] = CMPGT_I16(zero, vmap[j]);
gf2x_ksqr_avx2.c: ^
gf2x_ksqr_avx2.c: ./x86_64_intrinsic.h:70:27: note: expanded from macro 'CMPGT_I16'
gf2x_ksqr_avx2.c: # define CMPGT_I16(a, b) _mm256_cmpgt_epi16(a, b)
gf2x_ksqr_avx2.c: ^
gf2x_ksqr_avx2.c: gf2x_ksqr_avx2.c:69:17: error: always_inline function '_mm256_add_epi16' requires target feature 'avx2', but would be inlined into function 'generate_map' that is compiled without support for 'avx2'
gf2x_ksqr_avx2.c: vmap[j] = ADD_I16(vmap[j], vtmp[j] & vr);
gf2x_ksqr_avx2.c: ^
gf2x_ksqr_avx2.c: ./x86_64_intrinsic.h:63:28: note: expanded from macro 'ADD_I16'
gf2x_ksqr_avx2.c: # define ADD_I16(a, b) _mm256_add_epi16(a, b)
gf2x_ksqr_avx2.c: ^
gf2x_ksqr_avx2.c: 4 errors generated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
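Clang rejects the AVX2 intrinsics in `generate_map` because that function is compiled without support for `avx2`. Ivy Bridge does not implement AVX2, so `-march=native` does not enable it.

A common pattern keeps such code buildable on older hosts:

- mark only the vector function with GCC/Clang's `__attribute__((target("avx2")))`;
- choose between that function and a scalar fallback at run time with `__builtin_cpu_supports("avx2")`.

This is an assumption about one possible approach, not a description of what the BIKE code does. The names `add_i16`, `add_i16_avx2` and `add_i16_scalar` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Compiled for AVX2 even if the rest of the file is not; only called
 * after the runtime check below has confirmed AVX2 is available. */
__attribute__((target("avx2")))
static void add_i16_avx2(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) { /* 16 x 16-bit lanes per __m256i */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi16(va, vb));
    }
    for (; i < n; i++) /* scalar tail */
        dst[i] = (int16_t)(a[i] + b[i]);
}
#endif

static void add_i16_scalar(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
}

/* Dispatches to the AVX2 path only when the running CPU supports it. */
void add_i16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        add_i16_avx2(dst, a, b, n);
        return;
    }
#endif
    add_i16_scalar(dst, a, b, n);
}
```

On this Ivy Bridge host the runtime check would fail and the scalar loop would run. The file compiles either way because only one function is built for AVX2, rather than requiring AVX2 for the whole translation unit.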

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
gf2x_ksqr_avx2.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
gf2x_ksqr_avx2.c: from x86_64_intrinsic.h:20,
gf2x_ksqr_avx2.c: from defs.h:103,
gf2x_ksqr_avx2.c: from bike_defs.h:10,
gf2x_ksqr_avx2.c: from types.h:13,
gf2x_ksqr_avx2.c: from utilities.h:13,
gf2x_ksqr_avx2.c: from cleanup.h:10,
gf2x_ksqr_avx2.c: from gf2x_ksqr_avx2.c:13:
gf2x_ksqr_avx2.c: gf2x_ksqr_avx2.c: In function 'bytes_to_bin':
gf2x_ksqr_avx2.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:433:1: error: inlining failed in call to 'always_inline' '_mm256_movemask_epi8': target specific option mismatch
gf2x_ksqr_avx2.c: 433 | _mm256_movemask_epi8 (__m256i __A)
gf2x_ksqr_avx2.c: | ^~~~~~~~~~~~~~~~~~~~
gf2x_ksqr_avx2.c: In file included from defs.h:103,
gf2x_ksqr_avx2.c: from bike_defs.h:10,
gf2x_ksqr_avx2.c: from types.h:13,
gf2x_ksqr_avx2.c: from utilities.h:13,
gf2x_ksqr_avx2.c: from cleanup.h:10,
gf2x_ksqr_avx2.c: from gf2x_ksqr_avx2.c:13:
gf2x_ksqr_avx2.c: x86_64_intrinsic.h:79:23: note: called from here
gf2x_ksqr_avx2.c: 79 | # define MOVEMASK(a) _mm256_movemask_epi8(a)
gf2x_ksqr_avx2.c: | ^~~~~~~~~~~~~~~~~~~~~~~
gf2x_ksqr_avx2.c: gf2x_ksqr_avx2.c:85:17: note: in expansion of macro 'MOVEMASK'
gf2x_ksqr_avx2.c: 85 | bin32[i] = MOVEMASK(t);
gf2x_ksqr_avx2.c: | ^~~~~~~~
gf2x_ksqr_avx2.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
gf2x_ksqr_avx2.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
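In the gcc build, the first hard error is `_mm256_movemask_epi8`, which `bytes_to_bin` reaches through the `MOVEMASK` macro. The sketch below is a scalar model of that instruction's documented behaviour, for readers checking what it contributes. It is not code from the BIKE tree, and the function name is hypothetical.

```c
#include <stdint.h>

/* Scalar model of _mm256_movemask_epi8: bit i of the result is the most
 * significant bit of byte i of the 32-byte input. */
uint32_t movemask_epi8_scalar(const uint8_t in[32])
{
    uint32_t mask = 0;
    for (int i = 0; i < 32; i++)
        mask |= (uint32_t)(in[i] >> 7) << i;
    return mask;
}
```

Only the top bit of each byte matters. For example, bytes 0x80 at index 0 and 0xFF at index 31 give the mask 0x80000001, while a byte of 0x7F contributes nothing.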

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:22:10: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'gf2x_mod_add' that is compiled without support for 'avx512f'
decode.c: va = LOAD(&a_qwords[i]);
decode.c: ^
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:22:10: error: AVX vector return of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled changes the ABI
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:23:10: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'gf2x_mod_add' that is compiled without support for 'avx512f'
decode.c: vb = LOAD(&b_qwords[i]);
decode.c: ^
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ./gf2x.h:23:10: error: AVX vector return of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled changes the ABI
decode.c: ./x86_64_intrinsic.h:40:27: note: expanded from macro 'LOAD'
decode.c: # define LOAD(mem) _mm512_loadu_si512((mem))
decode.c: ^
decode.c: In file included from decode.c:39:
decode.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler  Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512-vpclmul

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
decode.c: In file included from decode.c:39:
decode.c: gf2x.h: In function 'gf2x_mod_add':
decode.c: gf2x.h:22:8: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
decode.c: 22 | va = LOAD(&a_qwords[i]);
decode.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:49,
decode.c: from x86_64_intrinsic.h:20,
decode.c: from defs.h:103,
decode.c: from bike_defs.h:10,
decode.c: from types.h:13,
decode.c: from decode.h:10,
decode.c: from decode.c:37:
decode.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx512fintrin.h:6481:1: error: inlining failed in call to 'always_inline' '_mm512_storeu_si512': target specific option mismatch
decode.c: 6481 | _mm512_storeu_si512 (void *__P, __m512i __A)
decode.c: | ^~~~~~~~~~~~~~~~~~~
decode.c: In file included from defs.h:103,
decode.c: from bike_defs.h:10,
decode.c: from types.h:13,
decode.c: from decode.h:10,
decode.c: from decode.c:37:
decode.c: x86_64_intrinsic.h:41:27: note: called from here
decode.c: 41 | # define STORE(mem, reg) _mm512_storeu_si512((mem), (reg))
decode.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
decode.c: gf2x.h:25:5: note: in expansion of macro 'STORE'
decode.c: 25 | STORE(&c_qwords[i], va ^ vb);
decode.c: | ^~~~~
decode.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512-vpclmul
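Both AVX-512 variants fail inside `gf2x_mod_add`, and the logs show the whole loop body: `va = LOAD(&a_qwords[i])`, `vb = LOAD(&b_qwords[i])`, `STORE(&c_qwords[i], va ^ vb)`. The failures include the inlining errors and the warning or error that returning an `__m512i` without AVX-512F enabled changes the ABI.

Adding two polynomials over GF(2) is just a word-wise XOR. The sketch below is a hypothetical portable equivalent, not the BIKE code, and the name `gf2x_add_qwords` is an assumption. It does the same work one 64-bit word at a time instead of eight words per 512-bit register.

```c
#include <stddef.h>
#include <stdint.h>

/* c = a + b over GF(2): each coefficient bit is added mod 2, i.e. XORed.
 * nqwords is the polynomial length in 64-bit words. In-place use
 * (c == a or c == b) is safe because each word is read before it is written. */
void gf2x_add_qwords(uint64_t *c, const uint64_t *a, const uint64_t *b, size_t nqwords)
{
    for (size_t i = 0; i < nqwords; i++)
        c[i] = a[i] ^ b[i];
}
```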

Compiler output

Implementation: T:ches2021
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
decode.c: decode.c:358:8: error: '__builtin_ia32_permti256' needs target feature avx2
decode.c: x = _mm256_permute2x128_si256(buf[j], buf[j+128], 0x20);
decode.c: ^
decode.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:821:13: note: expanded from macro '_mm256_permute2x128_si256'
decode.c: ((__m256i)__builtin_ia32_permti256((__m256i)(V1), (__m256i)(V2), (int)(M)))
decode.c: ^
decode.c: decode.c:359:8: error: '__builtin_ia32_permti256' needs target feature avx2
decode.c: y = _mm256_permute2x128_si256(buf[j], buf[j+128], 0x31);
decode.c: ^
decode.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:821:13: note: expanded from macro '_mm256_permute2x128_si256'
decode.c: ((__m256i)__builtin_ia32_permti256((__m256i)(V1), (__m256i)(V2), (int)(M)))
decode.c: ^
decode.c: decode.c:366:8: error: always_inline function '_mm256_unpacklo_epi64' requires target feature 'avx2', but would be inlined into function 'find_err1' that is compiled without support for 'avx2'
decode.c: x = _mm256_unpacklo_epi64(buf[j], buf[j+64]);
decode.c: ^
decode.c: decode.c:367:8: error: always_inline function '_mm256_unpackhi_epi64' requires target feature 'avx2', but would be inlined into function 'find_err1' that is compiled without support for 'avx2'
decode.c: y = _mm256_unpackhi_epi64(buf[j], buf[j+64]);
decode.c: ^
decode.c: decode.c:374:8: error: always_inline function '_mm256_unpacklo_epi64' requires target feature 'avx2', but would be inlined into function 'find_err1' that is compiled without support for 'avx2'
decode.c: x = _mm256_unpacklo_epi64(buf[j], buf[j+64]);
decode.c: ^
decode.c: decode.c:375:8: error: always_inline function '_mm256_unpackhi_epi64' requires target feature 'avx2', but would be inlined into function 'find_err1' that is compiled without support for 'avx2'
decode.c: y = _mm256_unpackhi_epi64(buf[j], buf[j+64]);
decode.c: ^
decode.c: decode.c:388:14: error: '__builtin_ia32_permdi256' needs target feature avx2
decode.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ches2021
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ches2021
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ches2021
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ches2021
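The ches2021 `find_err1` routine uses AVX2-only shuffles that Ivy Bridge lacks:

- `_mm256_permute2x128_si256` with immediates 0x20 and 0x31;
- `_mm256_unpacklo_epi64` and `_mm256_unpackhi_epi64`.

The sketch below is a scalar model of their documented semantics, meant as a reading aid rather than a proposed fix. The `v256` type and function names are hypothetical.

```c
#include <stdint.h>

/* A 256-bit register modelled as four 64-bit lanes, q[0] least significant. */
typedef struct { uint64_t q[4]; } v256;

/* One 128-bit half of _mm256_permute2x128_si256: ctl values 0..3 pick
 * a.lo, a.hi, b.lo, b.hi; setting bit 3 zeroes the half instead. */
static void select_half(uint64_t out[2], const v256 *a, const v256 *b, unsigned ctl)
{
    if (ctl & 8u) {
        out[0] = out[1] = 0;
        return;
    }
    const v256 *src = (ctl & 2u) ? b : a;
    unsigned base = (ctl & 1u) ? 2u : 0u;
    out[0] = src->q[base];
    out[1] = src->q[base + 1];
}

/* imm[3:0] fills the low half of the result, imm[7:4] the high half. */
v256 permute2x128(v256 a, v256 b, unsigned imm)
{
    v256 r;
    select_half(&r.q[0], &a, &b, imm & 0xFu);
    select_half(&r.q[2], &a, &b, (imm >> 4) & 0xFu);
    return r;
}

/* The epi64 unpacks interleave within each 128-bit lane separately. */
v256 unpacklo_epi64(v256 a, v256 b)
{
    v256 r = {{ a.q[0], b.q[0], a.q[2], b.q[2] }};
    return r;
}

v256 unpackhi_epi64(v256 a, v256 b)
{
    v256 r = {{ a.q[1], b.q[1], a.q[3], b.q[3] }};
    return r;
}
```

With a = {0,1,2,3} and b = {4,5,6,7}, the 0x20/0x31 pair used at decode.c:358-359 splits the inputs into {0,1,4,5} and {2,3,6,7}. The unpacks give {0,4,2,6} and {1,5,3,7}.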

Compiler output

Implementation: T:ches2021
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
decode.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
decode.c: from x86_64_intrinsic.h:20,
decode.c: from defs.h:106,
decode.c: from bike_defs.h:10,
decode.c: from types.h:15,
decode.c: from decode.h:17,
decode.c: from decode.c:39:
decode.c: decode.c: In function 'find_err1':
decode.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:1084:1: error: inlining failed in call to 'always_inline' '_mm256_permute2x128_si256': target specific option mismatch
decode.c: 1084 | _mm256_permute2x128_si256 (__m256i __X, __m256i __Y, const int __M)
decode.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~
decode.c: decode.c:359:29: note: called from here
decode.c: 359 | y = _mm256_permute2x128_si256(buf[j], buf[j+128], 0x31);
decode.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
decode.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
decode.c: from x86_64_intrinsic.h:20,
decode.c: from defs.h:106,
decode.c: from bike_defs.h:10,
decode.c: from types.h:15,
decode.c: from decode.h:17,
decode.c: from decode.c:39:
decode.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:1084:1: error: inlining failed in call to 'always_inline' '_mm256_permute2x128_si256': target specific option mismatch
decode.c: 1084 | _mm256_permute2x128_si256 (__m256i __X, __m256i __Y, const int __M)
decode.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~
decode.c: decode.c:358:29: note: called from here
decode.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ches2021
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ches2021
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ches2021
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ches2021