Implementation notes: amd64, wooden, crypto_sign/rainbow5ccyclicc963664

Computer: wooden
Microarchitecture: amd64; Goldmont (506c9)
Architecture: amd64
CPU ID: GenuineIntel-000506c9-1fc9cbf5
SUPERCOP version: 20240107
Operation: crypto_sign
Primitive: rainbow5ccyclicc963664
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
71145240194941 8 52244895 880 1664T:ssse3gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
7276879699166 8 52166982 896 1632T:ssse3clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
72903796121814 8 52188598 896 1632T:ssse3clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
7309164477781 8 52145524 888 1632T:ssse3clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
7316542880099 8 52148871 880 1664T:ssse3gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
73334316102180 8 52169967 880 1664T:ssse3gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
7396650661159 8 52134412 888 1632T:ssse3clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
7397733849514 8 52124455 872 1664T:ssse3gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
74666304164335 0 52245423 872 1664T:refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
74953136203100 0 52283927 872 1664T:amd64gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
7722845097262 0 52177030 888 1632T:amd64clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
77389146102719 0 52184702 888 1632T:amd64clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
77757660101563 0 52180414 888 1632T:refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
78016356109709 0 52190950 888 1632T:refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
78020452112896 0 52192854 888 1632T:refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
78066520105782 0 52186478 888 1632T:amd64clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
8070748276636 0 52156863 872 1664T:amd64gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
8153388863612 0 52143551 872 1664T:amd64gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
8206975860625 0 52139620 880 1632T:amd64clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
8799527648656 0 52128879 872 1664T:refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
8836669430543 0 52109719 864 1664T:amd64gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
8880195652401 0 52131380 880 1632T:refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
8903181647131 0 52127023 872 1664T:refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
9464048238677 0 52118852 880 1632T:amd64clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215
10190940027414 0 52106407 864 1664T:refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010920231215
11499582034127 0 52113988 880 1632T:refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010920231215

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blas_comm.c: In file included from blas_comm.c:6:
blas_comm.c: In file included from ./blas.h:25:
blas_comm.c: ./blas_avx2.h:88:17: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'gf256v_add_avx2' that is compiled without support for 'avx'
blas_comm.c: __m256i inp = _mm256_loadu_si256( (__m256i*) (a+i*32) );
blas_comm.c: ^
blas_comm.c: ./blas_avx2.h:88:17: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
blas_comm.c: ./blas_avx2.h:89:17: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'gf256v_add_avx2' that is compiled without support for 'avx'
blas_comm.c: __m256i out = _mm256_loadu_si256( (__m256i*) (accu_b+i*32) );
blas_comm.c: ^
blas_comm.c: ./blas_avx2.h:89:17: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
blas_comm.c: ./blas_avx2.h:91:3: error: always_inline function '_mm256_storeu_si256' requires target feature 'avx', but would be inlined into function 'gf256v_add_avx2' that is compiled without support for 'avx'
blas_comm.c: _mm256_storeu_si256( (__m256i*) (accu_b+i*32) , out );
blas_comm.c: ^
blas_comm.c: ./blas_avx2.h:91:3: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
blas_comm.c: 6 errors generated.

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blas_comm.c: In file included from blas_avx2.h:15,
blas_comm.c: from blas.h:25,
blas_comm.c: from blas_comm.c:6:
blas_comm.c: gf16_avx2.h: In function 'linear_transform_8x8_256b':
blas_comm.c: gf16_avx2.h:28:1: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
blas_comm.c: 28 | {
blas_comm.c: | ^
blas_comm.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:51,
blas_comm.c: from blas_avx2.h:10,
blas_comm.c: from blas.h:25,
blas_comm.c: from blas_comm.c:6:
blas_comm.c: blas_avx2.h: In function 'gf256v_add_avx2':
blas_comm.c: /usr/lib/gcc/x86_64-linux-gnu/10/include/avxintrin.h:926:1: error: inlining failed in call to 'always_inline' '_mm256_storeu_si256': target specific option mismatch
blas_comm.c: 926 | _mm256_storeu_si256 (__m256i_u *__P, __m256i __A)
blas_comm.c: | ^~~~~~~~~~~~~~~~~~~
blas_comm.c: In file included from blas.h:25,
blas_comm.c: from blas_comm.c:6:
blas_comm.c: blas_avx2.h:91:3: note: called from here
blas_comm.c: 91 | _mm256_storeu_si256( (__m256i*) (accu_b+i*32) , out );
blas_comm.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blas_comm.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:51,
blas_comm.c: from blas_avx2.h:10,
blas_comm.c: from blas.h:25,
blas_comm.c: from blas_comm.c:6:
blas_comm.c: /usr/lib/gcc/x86_64-linux-gnu/10/include/avxintrin.h:920:1: error: inlining failed in call to 'always_inline' '_mm256_loadu_si256': target specific option mismatch
blas_comm.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2

Compiler output

Implementation: T:ssse3
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blas_matrix_ref.c: In file included from blas_matrix_ref.c:6:
blas_matrix_ref.c: In file included from ./blas.h:25:
blas_matrix_ref.c: In file included from ./blas_sse.h:16:
blas_matrix_ref.c: ./gf16_sse.h:34:9: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'linear_transform_8x8_128b' that is compiled without support for 'ssse3'
blas_matrix_ref.c: return _mm_shuffle_epi8(tab_l,v&mask_f)^_mm_shuffle_epi8(tab_h,_mm_srli_epi16(v,4)&mask_f);
blas_matrix_ref.c: ^
blas_matrix_ref.c: ./gf16_sse.h:34:42: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'linear_transform_8x8_128b' that is compiled without support for 'ssse3'
blas_matrix_ref.c: return _mm_shuffle_epi8(tab_l,v&mask_f)^_mm_shuffle_epi8(tab_h,_mm_srli_epi16(v,4)&mask_f);
blas_matrix_ref.c: ^
blas_matrix_ref.c: 2 errors generated.

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ssse3