Implementation notes: amd64, saber214, crypto_sign/luov863256

Computer: saber214
Microarchitecture: amd64; Bulldozer (600f20)
Architecture: amd64
CPU ID: AuthenticAMD-00600f20-1789c3f5
SUPERCOP version: 20240625
Operation: crypto_sign
Primitive: luov863256
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
18606777551845 0 069975 816 1632T:portablegcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024070220240625
20000108550581 0 068591 816 1632T:portablegcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024070220240625
21425478559002 0 076711 816 1632T:portablegcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024070220240625
129006633418352 36 0197527 816 1632T:refgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024070220240625
130750581412683 36 0193407 816 1632T:refgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024070220240625
138605541412003 36 0192543 816 1632T:refgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024070220240625

Test failure


error 111

Number of similar (implementation,compiler) pairs: 12, namely:
ImplementationCompiler
T:portableclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:portableclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:portableclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:portableclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:portableclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:portablegcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:refclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refgcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)

Compiler output


LUOV.c: LUOV.c:38:19: error: '__builtin_ia32_permdi256' needs target feature avx2
LUOV.c:                         __m256i rrrr = _mm256_permute4x64_epi64(_mm256_loadu_si256((__m256i *)&Q1[col++]),0);
LUOV.c:                                        ^
LUOV.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:818:13: note: expanded from macro '_mm256_permute4x64_epi64'
LUOV.c:   ((__m256i)__builtin_ia32_permdi256((__v4di)(__m256i)(V), (int)(M)))
LUOV.c:             ^
LUOV.c: LUOV.c:44:10: error: always_inline function '_mm256_slli_epi64' requires target feature 'avx2', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx2'
LUOV.c:                                 TJ = _mm256_slli_epi64(TJ,4);
LUOV.c:                                      ^
LUOV.c: LUOV.c:47:10: error: always_inline function '_mm256_slli_epi64' requires target feature 'avx2', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx2'
LUOV.c:                                 TJ = _mm256_slli_epi64(TJ,4);
LUOV.c:                                      ^
LUOV.c: LUOV.c:60:19: error: '__builtin_ia32_permdi256' needs target feature avx2
LUOV.c:                         __m256i rrrr = _mm256_permute4x64_epi64(_mm256_loadu_si256((__m256i *)&TempMat[j][i]),0);
LUOV.c:                                        ^
LUOV.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:818:13: note: expanded from macro '_mm256_permute4x64_epi64'
LUOV.c:   ((__m256i)__builtin_ia32_permdi256((__v4di)(__m256i)(V), (int)(M)))
LUOV.c:             ^
LUOV.c: LUOV.c:66:10: error: always_inline function '_mm256_slli_epi64' requires target feature 'avx2', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx2'
LUOV.c:                                 TJ = _mm256_slli_epi64(TJ,4);
LUOV.c:                                      ^
LUOV.c: LUOV.c:69:10: error: always_inline function '_mm256_slli_epi64' requires target feature 'avx2', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx2'
LUOV.c:                                 TJ = _mm256_slli_epi64(TJ,4);
LUOV.c:                                      ^
LUOV.c: 6 errors generated.

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:avx2clang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:avx2clang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:avx2clang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:avx2clang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


LUOV.c: LUOV.c:38:19: error: '__builtin_ia32_permdi256' needs target feature avx2
LUOV.c:                         __m256i rrrr = _mm256_permute4x64_epi64(_mm256_loadu_si256((__m256i *)&Q1[col++]),0);
LUOV.c:                                        ^
LUOV.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:818:13: note: expanded from macro '_mm256_permute4x64_epi64'
LUOV.c:   ((__m256i)__builtin_ia32_permdi256((__v4di)(__m256i)(V), (int)(M)))
LUOV.c:             ^
LUOV.c: LUOV.c:38:44: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx'
LUOV.c:                         __m256i rrrr = _mm256_permute4x64_epi64(_mm256_loadu_si256((__m256i *)&Q1[col++]),0);
LUOV.c:                                                                 ^
LUOV.c: LUOV.c:38:44: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
LUOV.c: LUOV.c:43:69: error: always_inline function '_mm256_setzero_pd' requires target feature 'avx', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx'
LUOV.c:                                 *((__m256i *)&TempMat[i][k*8+4]) ^=  (__m256i) _mm256_blendv_pd(_mm256_setzero_pd(),(__m256d) rrrr,(__m256d)TJ);
LUOV.c:                                                                                                 ^
LUOV.c: LUOV.c:43:69: error: AVX vector return of type '__m256d' (vector of 4 'double' values) without 'avx' enabled changes the ABI
LUOV.c: LUOV.c:43:52: error: always_inline function '_mm256_blendv_pd' requires target feature 'avx', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx'
LUOV.c:                                 *((__m256i *)&TempMat[i][k*8+4]) ^=  (__m256i) _mm256_blendv_pd(_mm256_setzero_pd(),(__m256d) rrrr,(__m256d)TJ);
LUOV.c:                                                                                ^
LUOV.c: LUOV.c:43:52: error: AVX vector argument of type '__m256d' (vector of 4 'double' values) without 'avx' enabled changes the ABI
LUOV.c: LUOV.c:44:10: error: always_inline function '_mm256_slli_epi64' requires target feature 'avx2', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx2'
LUOV.c:                                 TJ = _mm256_slli_epi64(TJ,4);
LUOV.c:                                      ^
LUOV.c: LUOV.c:44:10: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
LUOV.c: LUOV.c:46:66: error: always_inline function '_mm256_setzero_pd' requires target feature 'avx', but would be inlined into function 'calculateQ2' that is compiled without support for 'avx'
LUOV.c:                                 *((__m256i *)&TempMat[i][k*8]) ^= (__m256i) _mm256_blendv_pd(_mm256_setzero_pd(),(__m256d) rrrr,(__m256d)TJ);
LUOV.c:                                                                                              ^
LUOV.c: ...

Number of similar (implementation,compiler) pairs: 1, namely:
ImplementationCompiler
T:avx2clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


LUOV.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
LUOV.c:                  from LUOV.h:7,
LUOV.c:                  from LUOV.c:1:
LUOV.c: AVX_Operations.h: In function 'addScalarProductAVX':
LUOV.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:186:1: error: inlining failed in call to 'always_inline' '_mm256_andnot_si256': target specific option mismatch
LUOV.c:   186 | _mm256_andnot_si256 (__m256i __A, __m256i __B)
LUOV.c:       | ^~~~~~~~~~~~~~~~~~~
LUOV.c: In file included from LinearAlgebra.h:9,
LUOV.c:                  from LUOV.h:13,
LUOV.c:                  from LUOV.c:1:
LUOV.c: AVX_Operations.h:73:16: note: called from here
LUOV.c:    73 |         avx2 = _mm256_andnot_si256(avx2,aa);
LUOV.c:       |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
LUOV.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
LUOV.c:                  from LUOV.h:7,
LUOV.c:                  from LUOV.c:1:
LUOV.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:231:1: error: inlining failed in call to 'always_inline' '_mm256_cmpeq_epi8': target specific option mismatch
LUOV.c:   231 | _mm256_cmpeq_epi8 (__m256i __A, __m256i __B)
LUOV.c:       | ^~~~~~~~~~~~~~~~~
LUOV.c: In file included from LinearAlgebra.h:9,
LUOV.c:                  from LUOV.h:13,
LUOV.c:                  from LUOV.c:1:
LUOV.c: AVX_Operations.h:72:16: note: called from here
LUOV.c:    72 |         avx2 = _mm256_cmpeq_epi8(avx2,_mm256_setzero_si256());
LUOV.c:       |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LUOV.c: ...

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:avx2gcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:avx2gcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:avx2gcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:avx2gcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)

Compiler output


F64Field.c: F64Field.c: In function 'f64addInPlace':
F64Field.c: F64Field.c:43:11: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
F64Field.c:    43 |         *((uint64_t *) a->coef) ^= *((uint64_t *) b->coef);
F64Field.c:       |          ~^~~~~~~~~~~~~~~~~~~~~
F64Field.c: F64Field.c:43:38: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
F64Field.c:    43 |         *((uint64_t *) a->coef) ^= *((uint64_t *) b->coef);
F64Field.c:       |                                     ~^~~~~~~~~~~~~~~~~~~~~
F80Field.c: F80Field.c: In function 'f80addInPlace':
F80Field.c: F80Field.c:55:11: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
F80Field.c:    55 |         *((uint64_t *) a->coef) ^= *((uint64_t *) b->coef);
F80Field.c:       |          ~^~~~~~~~~~~~~~~~~~~~~
F80Field.c: F80Field.c:55:38: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
F80Field.c:    55 |         *((uint64_t *) a->coef) ^= *((uint64_t *) b->coef);
F80Field.c:       |                                     ~^~~~~~~~~~~~~~~~~~~~~

Number of similar (implementation,compiler) pairs: 3, namely:
ImplementationCompiler
T:portablegcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:portablegcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:portablegcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)