Implementation notes: amd64, margaux, crypto_sign/rainbow3ccyclicc683248

Computer: margaux
Microarchitecture: amd64; Core 2 65nm (6fb)
Architecture: amd64
CPU ID: GenuineIntel-000006fb-bfebfbff
SUPERCOP version: 20231107
Operation: crypto_sign
Primitive: rainbow3ccyclicc683248
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
169140833149198 8 52208940 956 1792T:ssse3clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
169546744108613 8 52173852 956 1792T:ssse3clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
169746281193068 8 52237372 940 1824T:ssse3gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17068857274908 8 52142486 948 1792T:ssse3clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
170689212159401 0 52237900 932 1824T:refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17154562285788 8 52153396 956 1792T:ssse3clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
17170328555354 8 52126732 932 1824T:ssse3gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
172173109101747 8 52166572 940 1824T:ssse3gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17234851167905 0 52144492 948 1792T:amd64clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
17242589291773 0 52168980 948 1792T:amd64clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
17274836575054 0 52152876 948 1792T:refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
17346207957474 0 52134660 948 1792T:refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
17375666284141 8 52149684 940 1824T:ssse3gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17388522471079 0 52148676 932 1824T:amd64gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17426313046780 0 52123660 948 1792T:amd64clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
17508574494289 0 52171556 948 1792T:amd64clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
175781011201488 0 52280028 932 1824T:amd64gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17628754740563 0 52116782 940 1792T:amd64clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
17781264841652 0 52118468 948 1792T:refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
17844940462240 0 52139452 932 1824T:amd64gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17874587433372 0 52109764 924 1824T:amd64gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17913281649863 0 52126972 932 1824T:refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
17917611272435 0 52150180 948 1792T:refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
18021534435547 0 52111710 940 1792T:refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023111920231107
18107287148708 0 52126284 932 1824T:refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107
18565623230262 0 52106508 924 1824T:refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023111920231107

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blas_comm.c: In file included from blas_comm.c:6:
blas_comm.c: In file included from ./blas.h:25:
blas_comm.c: ./blas_avx2.h:88:17: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'gf256v_add_avx2' that is compiled without support for 'avx'
blas_comm.c: __m256i inp = _mm256_loadu_si256( (__m256i*) (a+i*32) );
blas_comm.c: ^
blas_comm.c: ./blas_avx2.h:88:17: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
blas_comm.c: ./blas_avx2.h:89:17: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'gf256v_add_avx2' that is compiled without support for 'avx'
blas_comm.c: __m256i out = _mm256_loadu_si256( (__m256i*) (accu_b+i*32) );
blas_comm.c: ^
blas_comm.c: ./blas_avx2.h:89:17: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
blas_comm.c: ./blas_avx2.h:91:3: error: always_inline function '_mm256_storeu_si256' requires target feature 'avx', but would be inlined into function 'gf256v_add_avx2' that is compiled without support for 'avx'
blas_comm.c: _mm256_storeu_si256( (__m256i*) (accu_b+i*32) , out );
blas_comm.c: ^
blas_comm.c: ./blas_avx2.h:91:3: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
blas_comm.c: 6 errors generated.

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
blas_comm.c: In file included from blas_avx2.h:15,
blas_comm.c: from blas.h:25,
blas_comm.c: from blas_comm.c:6:
blas_comm.c: gf16_avx2.h: In function 'linear_transform_8x8_256b':
blas_comm.c: gf16_avx2.h:28:1: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
blas_comm.c: 28 | {
blas_comm.c: | ^
blas_comm.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:43,
blas_comm.c: from blas_avx2.h:10,
blas_comm.c: from blas.h:25,
blas_comm.c: from blas_comm.c:6:
blas_comm.c: blas_avx2.h: In function 'gf256v_add_avx2':
blas_comm.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avxintrin.h:933:1: error: inlining failed in call to 'always_inline' '_mm256_storeu_si256': target specific option mismatch
blas_comm.c: 933 | _mm256_storeu_si256 (__m256i_u *__P, __m256i __A)
blas_comm.c: | ^~~~~~~~~~~~~~~~~~~
blas_comm.c: In file included from blas.h:25,
blas_comm.c: from blas_comm.c:6:
blas_comm.c: blas_avx2.h:91:17: note: called from here
blas_comm.c: 91 | _mm256_storeu_si256( (__m256i*) (accu_b+i*32) , out );
blas_comm.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blas_comm.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:43,
blas_comm.c: from blas_avx2.h:10,
blas_comm.c: from blas.h:25,
blas_comm.c: from blas_comm.c:6:
blas_comm.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avxintrin.h:927:1: error: inlining failed in call to 'always_inline' '_mm256_loadu_si256': target specific option mismatch
blas_comm.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2

Compiler output

Implementation: T:ssse3
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
blas_matrix_ref.c: In file included from blas_matrix_ref.c:6:
blas_matrix_ref.c: In file included from ./blas.h:25:
blas_matrix_ref.c: In file included from ./blas_sse.h:16:
blas_matrix_ref.c: ./gf16_sse.h:34:9: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'linear_transform_8x8_128b' that is compiled without support for 'ssse3'
blas_matrix_ref.c: return _mm_shuffle_epi8(tab_l,v&mask_f)^_mm_shuffle_epi8(tab_h,_mm_srli_epi16(v,4)&mask_f);
blas_matrix_ref.c: ^
blas_matrix_ref.c: ./gf16_sse.h:34:42: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'linear_transform_8x8_128b' that is compiled without support for 'ssse3'
blas_matrix_ref.c: return _mm_shuffle_epi8(tab_l,v&mask_f)^_mm_shuffle_epi8(tab_h,_mm_srli_epi16(v,4)&mask_f);
blas_matrix_ref.c: ^
blas_matrix_ref.c: 2 errors generated.

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ssse3