Implementation notes: amd64, h8bobcat, crypto_decode/1277x2627

Computer: h8bobcat
Microarchitecture: amd64; Bobcat (500f10)
Architecture: amd64
CPU ID: AuthenticAMD-00500f20-178bfbff
SUPERCOP version: 20240107
Operation: crypto_decode
Primitive: 1277x2627
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
335112719 0 013284 816 728int16clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
343242352 0 011342 808 728int16clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
377202545 0 012172 816 728int16clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
383042822 0 013085 768 800int16gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
386563699 0 015420 816 728int16clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
393163716 0 015652 816 728int16clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
407844481 0 016606 776 800int16gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
409262711 0 013334 776 800int16gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
415632678 0 011993 752 768int16gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
509727485 0 019436 816 728portableclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
510396034 0 016628 816 728portableclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
520517249 0 018980 816 728portableclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
659923395 0 013012 816 728portableclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
692313562 0 012566 808 728portableclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
717253836 0 015958 776 800portablegcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
754443280 0 013902 776 800portablegcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
2036801926 0 013982 776 800refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
2037841556 0 012132 816 728refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
2041212802 0 014740 816 728refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
2042982756 0 014476 816 728refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
2102072054 0 011385 752 768portablegcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
2102581618 0 012174 776 800refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
2147811184 0 010796 816 728refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
2163302098 0 012373 768 800portablegcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
2209791615 0 010582 808 728refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
2358521668 0 011853 768 800refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
2428581536 0 010729 752 768refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
decode.c: decode.c:243:15: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'crypto_decode_1277x2627_avx_constbranchindex' that is compiled without support for 'avx'
decode.c: A2 = A0 = _mm256_loadu_si256((__m256i *) &R6[i]);
decode.c: ^
decode.c: decode.c:243:15: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
decode.c: decode.c:244:10: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'crypto_decode_1277x2627_avx_constbranchindex' that is compiled without support for 'avx'
decode.c: S0 = _mm256_loadu_si256((__m256i *) (s+2*i));
decode.c: ^
decode.c: decode.c:244:10: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
decode.c: decode.c:245:10: error: always_inline function '_mm256_srli_epi16' requires target feature 'avx2', but would be inlined into function 'crypto_decode_1277x2627_avx_constbranchindex' that is compiled without support for 'avx2'
decode.c: S1 = _mm256_srli_epi16(S0,8);
decode.c: ^
decode.c: decode.c:245:10: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
decode.c: decode.c:246:11: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'crypto_decode_1277x2627_avx_constbranchindex' that is compiled without support for 'avx'
decode.c: S0 &= _mm256_set1_epi16(255);
decode.c: ^
decode.c: decode.c:246:11: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
decode.c: decode.c:247:14: warning: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI [-Wpsabi]
decode.c: A0 = sub(mulhiconst(A0,-259),mulhiconst(mulloconst(A0,-5225),3211)); /* -1671...1605 */
decode.c: ^
decode.c: decode.c:247:45: warning: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI [-Wpsabi]
decode.c: A0 = sub(mulhiconst(A0,-259),mulhiconst(mulloconst(A0,-5225),3211)); /* -1671...1605 */
decode.c: ^
decode.c: decode.c:247:34: warning: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI [-Wpsabi]
decode.c: A0 = sub(mulhiconst(A0,-259),mulhiconst(mulloconst(A0,-5225),3211)); /* -1671...1605 */
decode.c: ^
decode.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avx

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
decode.c: decode.c: In function 'add':
decode.c: decode.c:21:1: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
decode.c: 21 | {
decode.c: | ^
decode.c: decode.c: In function 'signedshiftrightconst':
decode.c: decode.c:35:23: note: the ABI for passing parameters with 32-byte alignment has changed in GCC 4.6
decode.c: 35 | static inline __m256i signedshiftrightconst(__m256i x,int16 y)
decode.c: | ^~~~~~~~~~~~~~~~~~~~~
decode.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
decode.c: from decode.c:3:
decode.c: decode.c: In function 'add':
decode.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:112:1: error: inlining failed in call to 'always_inline' '_mm256_add_epi16': target specific option mismatch
decode.c: 112 | _mm256_add_epi16 (__m256i __A, __m256i __B)
decode.c: | ^~~~~~~~~~~~~~~~
decode.c: decode.c:22:10: note: called from here
decode.c: 22 | return _mm256_add_epi16(x,y);
decode.c: | ^~~~~~~~~~~~~~~~~~~~~

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avx
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avx
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avx