Implementation notes: amd64, h8bobcat, crypto_hash/blake256

Computer: h8bobcat
Microarchitecture: amd64; Bobcat (500f10)
Architecture: amd64
CPU ID: AuthenticAMD-00500f20-178bfbff
SUPERCOP version: 20231107
Operation: crypto_hash
Primitive: blake256
Time | Object size | Test size | Implementation | Compiler | Benchmark date | SUPERCOP version
29165 | 24955 0 0 | 37758 776 800 | sphlib | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
29189 | 24645 0 0 | 35958 776 800 | sphlib | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
29934 | 24434 0 0 | 35516 816 728 | sphlib | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30195 | 24466 0 0 | 36692 816 728 | sphlib | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30215 | 24466 0 0 | 34708 816 728 | sphlib | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30234 | 10144 0 0 | 20949 768 800 | bswap | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
30447 | 25074 0 0 | 37220 816 728 | sphlib | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30552 | 9883 0 0 | 21988 816 728 | bswap | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30581 | 9507 0 0 | 19628 816 728 | regs | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30581 | 10104 0 0 | 19881 752 768 | regs | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
30600 | 10104 0 0 | 19881 752 768 | bswap | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
30609 | 9947 0 0 | 22052 816 728 | regs | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30614 | 10056 0 0 | 22084 816 728 | bswap | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30637 | 10072 0 0 | 22100 816 728 | regs | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30713 | 9451 0 0 | 20412 816 728 | bswap | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30733 | 9539 0 0 | 20500 816 728 | regs | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
30756 | 23506 0 0 | 34454 776 800 | sphlib | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
30799 | 9451 0 0 | 19572 816 728 | bswap | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
31398 | 11081 0 0 | 23822 776 800 | bswap | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
31407 | 10631 0 0 | 21878 776 800 | bswap | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
31407 | 10631 0 0 | 21878 776 800 | regs | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
31497 | 11690 0 0 | 24430 776 800 | regs | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
31911 | 10787 0 0 | 21589 768 800 | regs | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
32153 | 22923 0 0 | 32793 752 768 | sphlib | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
32371 | 24723 0 0 | 34342 808 728 | sphlib | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
32528 | 9816 0 0 | 19318 808 728 | bswap | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
32647 | 9882 0 0 | 19382 808 728 | regs | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
34637 | 7670 0 0 | 19700 816 728 | sse2 | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
35031 | 8914 0 0 | 21646 776 800 | ssse3 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
36395 | 8063 0 0 | 19310 776 800 | ssse3 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
36850 | 9714 0 0 | 22446 776 800 | sse2 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
36865 | 8157 0 0 | 18957 768 800 | ssse3 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
36969 | 8132 0 0 | 20164 816 728 | sse2-2 | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
37848 | 8847 0 0 | 20094 776 800 | sse2 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
38328 | 9104 0 0 | 19901 768 800 | sse2 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
39420 | 8165 0 0 | 18284 816 728 | sse2-2 | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
39886 | 7744 0 0 | 18708 816 728 | sse2-2 | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
40004 | 10482 0 0 | 23214 776 800 | sse2-2 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
40047 | 8152 0 0 | 20260 816 728 | sse2-2 | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
40052 | 7657 0 0 | 17158 808 728 | sse2-2 | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
40750 | 9567 0 0 | 20814 776 800 | sse2-2 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
41059 | 7577 0 0 | 17449 752 768 | sphlib-small | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
41207 | 7831 0 0 | 19940 816 728 | sse2 | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
41211 | 7423 0 0 | 18388 816 728 | sse2 | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
41216 | 7776 0 0 | 17900 816 728 | sse2 | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
41411 | 7336 0 0 | 16838 808 728 | sse2 | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
41415 | 9776 0 0 | 20573 768 800 | sse2-2 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
41791 | 8092 0 0 | 19070 776 800 | sphlib-small | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
42261 | 7027 0 0 | 16809 752 768 | ssse3 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
44075 | 8199 0 0 | 17977 752 768 | sse2-2 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
44503 | 7545 0 0 | 17321 752 768 | sse2 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
47614 | 2633 0 0 | 12409 752 768 | ref | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
49728 | 3048 0 0 | 13845 768 800 | ref | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
53912 | 3323 0 0 | 15348 816 728 | ref | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
55138 | 3128 0 0 | 15228 816 728 | ref | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
55148 | 8658 0 0 | 18916 816 728 | sphlib-small | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
56059 | 2608 0 0 | 13564 816 728 | ref | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
56093 | 8626 0 0 | 19724 816 728 | sphlib-small | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
56344 | 11013 0 0 | 22358 776 800 | sphlib-small | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
56620 | 8706 0 0 | 20948 816 728 | sphlib-small | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
56654 | 11419 0 0 | 24222 776 800 | sphlib-small | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
57052 | 8433 0 0 | 18070 808 728 | sphlib-small | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
57480 | 2716 0 0 | 12844 816 728 | ref | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
58297 | 8722 0 0 | 20884 816 728 | sphlib-small | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
60962 | 10328 0 0 | 22356 816 728 | sandy | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
61071 | 2624 0 0 | 12118 808 728 | ref | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
61418 | 5162 0 0 | 17886 776 800 | ref | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
61574 | 10891 0 0 | 20665 752 768 | sandy | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
62624 | 10290 0 0 | 20420 816 728 | sandy | clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
62657 | 10290 0 0 | 21260 816 728 | sandy | clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
62700 | 10722 0 0 | 22836 816 728 | sandy | clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
63113 | 11351 0 0 | 22598 776 800 | sandy | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
63489 | 11849 0 0 | 24590 776 800 | sandy | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
64301 | 3703 0 0 | 14950 776 800 | ref | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107
64329 | 10472 0 0 | 19974 808 728 | sandy | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20230522 | 20221122
66438 | 10906 0 0 | 21701 768 800 | sandy | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20231109 | 20231107

Test failure

Implementation: avxs
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
error 111

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avxs
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avxs
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avxs
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avxs
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE | avxs
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE | avxs
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE | avxs
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE | avxs

Compiler output

Implementation: avxs
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:155:61: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_final' that is compiled without support for 'ssse3'
hash.c: __m128i w0 = _mm_load_si128((__m128i*)(&S->h[0])); w0 = _mm_shuffle_epi8(w0, u32to8);
hash.c: ^
hash.c: hash.c:156:61: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_final' that is compiled without support for 'ssse3'
hash.c: __m128i w1 = _mm_load_si128((__m128i*)(&S->h[4])); w1 = _mm_shuffle_epi8(w1, u32to8);
hash.c: ^
hash.c: 2 errors generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avxs
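
Note on the 'requires target feature' errors above: the intrinsic is rejected because the enclosing function is compiled without SSSE3 enabled. One general way to keep such code buildable under any flag set, sketched here under assumptions and not taken from SUPERCOP or from the avxs source, is to enable the feature for a single function with a GCC/clang target attribute and to select it at run time. The function names below are hypothetical; the shuffle mask reverses the bytes of each 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 */
#define HAVE_X86_DISPATCH 1

/* Only this function is compiled with SSSE3 enabled. */
__attribute__((target("ssse3")))
static void bswap32x4_ssse3(uint32_t w[4])
{
    /* Byte i of the result is taken from the source byte listed here. */
    const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                      4, 5, 6, 7, 0, 1, 2, 3);
    __m128i v = _mm_loadu_si128((const __m128i *)w);
    _mm_storeu_si128((__m128i *)w, _mm_shuffle_epi8(v, mask));
}
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Portable fallback: reverse the bytes of each 32-bit word. */
static void bswap32x4_generic(uint32_t w[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t x = w[i];
        w[i] = (x >> 24) | ((x >> 8) & 0x0000ff00u) |
               ((x << 8) & 0x00ff0000u) | (x << 24);
    }
}

/* Hypothetical entry point: takes the SSSE3 path only if the CPU reports it. */
static void bswap32x4(uint32_t w[4])
{
    assert(w != NULL);
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("ssse3")) {
        bswap32x4_ssse3(w);
        return;
    }
#endif
    bswap32x4_generic(w);
}
```

Calling bswap32x4 on the words 0x01020304, 0xAABBCCDD gives 0x04030201, 0xDDCCBBAA on either path.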

Compiler output

Implementation: sse41
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:121:
hash.c: ./rounds.sse41.h:17:55: warning: implicit conversion from 'long' to 'int' changes value from 2242054355 to -2052912941 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3964562569, 698298832, 57701188, 2242054355);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:17:22: warning: implicit conversion from 'long' to 'int' changes value from 3964562569 to -330404727 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3964562569, 698298832, 57701188, 2242054355);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:20:33: warning: implicit conversion from 'long' to 'int' changes value from 2752067618 to -1542899678 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(137296536, 2752067618, 320440878, 608135816);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:47:34: warning: implicit conversion from 'long' to 'int' changes value from 3380367581 to -914599715 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3041331479, 3380367581, 887688300, 953160567);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:47:22: warning: implicit conversion from 'long' to 'int' changes value from 3041331479 to -1253635817 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3041331479, 3380367581, 887688300, 953160567);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:50:46: warning: implicit conversion from 'long' to 'int' changes value from 3193202383 to -1101764913 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(1065670069, 3232508343, 3193202383, 1160258022);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:50:34: warning: implicit conversion from 'long' to 'int' changes value from 3232508343 to -1062458953 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(1065670069, 3232508343, 3193202383, 1160258022);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:81:57: warning: implicit conversion from 'long' to 'int' changes value from 3193202383 to -1101764913 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(137296536, 3041331479, 1160258022, 3193202383);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41
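
Note on the -Wconstant-conversion warnings above: _mm_set_epi32 takes int arguments, so a decimal literal above 2147483647 is converted to a negative int. The reported value changes, but on gcc and clang the 32-bit pattern stored in the lane is the same. The sketch below, with hypothetical helper names, shows one way to make that conversion explicit; it is an illustration, not the author's fix.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_set_epi32, _mm_storeu_si128 */
#endif

/* Reinterpret an unsigned 32-bit constant as the signed value that
   _mm_set_epi32 expects, keeping the bit pattern. memcpy avoids the
   implementation-defined out-of-range cast. */
static int lane(uint32_t x)
{
    int32_t s;
    memcpy(&s, &x, sizeof s);
    return s;
}

/* Writes the four constants from the first warning, lowest lane first. */
static void first_warning_constants(uint32_t out[4])
{
    assert(out != NULL);
#if defined(__SSE2__)
    __m128i v = _mm_set_epi32(lane(3964562569u), 698298832,
                              57701188, lane(2242054355u));
    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Scalar stand-in when SSE2 is not available at compile time. */
    out[0] = 2242054355u;
    out[1] = 57701188u;
    out[2] = 698298832u;
    out[3] = 3964562569u;
#endif
}
```

Here lane(2242054355u) equals -2052912941, the value quoted in the warning, and the stored lanes read back as the original unsigned constants.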

Compiler output

Implementation: sse41
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from hash.c:5:
hash.c: rounds.sse41.h: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: 166 | _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: | ^~~~~~~~~~~~~~~
hash.c: In file included from hash.c:121:
hash.c: rounds.sse41.h:881:8: note: called from here
hash.c: 881 | tmp1 = _mm_blend_epi16(tmp0, m3, 0xC0);
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: In file included from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: 166 | _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: | ^~~~~~~~~~~~~~~
hash.c: In file included from hash.c:121:
hash.c: rounds.sse41.h:880:8: note: called from here
hash.c: 880 | tmp0 = _mm_blend_epi16(m0,m1,0x0F);
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: In file included from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: 166 | _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: | ^~~~~~~~~~~~~~~
hash.c: In file included from hash.c:121:
hash.c: rounds.sse41.h:852:8: note: called from here
hash.c: 852 | tmp6 = _mm_blend_epi16(tmp5, tmp4, 0xC0);
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE | sse41
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE | sse41
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE | sse41
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE | sse41
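
Note on the 'target specific option mismatch' errors above: gcc refuses to inline _mm_blend_epi16 because the flags in use on this machine do not enable SSE4.1. A common pattern, sketched here as an assumption and not as part of the sse41 implementation, is to guard the intrinsic with the compiler's __SSE4_1__ macro and supply a fallback. The function name is hypothetical; the mask 0x0F matches one of the calls quoted in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE4_1__)
#include <smmintrin.h> /* _mm_blend_epi16 */
#endif

/* Blend with mask 0x0F: 16-bit lanes 0..3 come from b, lanes 4..7 from a. */
static void blend_0x0f(const uint16_t a[8], const uint16_t b[8],
                       uint16_t out[8])
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE4_1__)
    /* Used only when the compiler flags enable SSE4.1. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_blend_epi16(va, vb, 0x0F));
#else
    /* Scalar fallback with the same lane selection rule. */
    for (int i = 0; i < 8; i++)
        out[i] = ((0x0F >> i) & 1) ? b[i] : a[i];
#endif
}
```

With a = 0..7 and b = 10..17 both paths produce 10, 11, 12, 13, 4, 5, 6, 7.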

Compiler output

Implementation: sse41-2
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:2:
hash.c: ./blake256.h:105:15: warning: '_mm_roti_epi32' macro redefined [-Wmacro-redefined]
hash.c: #define _mm_roti_epi32(r, c) ((8==-c) ? _mm_shuffle_epi8(r,r8) : ( (16==-c) ? _mm_shuffle_epi8(r,r16) : _mm_xor_si128(_mm_srli_epi32( (r), -(c) ),_mm_slli_epi32( (r), 32-(-c) )) ) )
hash.c: ^
hash.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/xopintrin.h:233:9: note: previous definition is here
hash.c: #define _mm_roti_epi32(A, N) \
hash.c: ^
hash.c: hash.c:116:3: error: '__builtin_ia32_pblendw128' needs target feature sse4.1
hash.c: ROUND( 1);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:186:1: note: expanded from here
hash.c: LOAD_MSG_1_1
hash.c: ^
hash.c: ./load.sse41.h:31:6: note: expanded from macro 'LOAD_MSG_1_1'
hash.c: t0 = _mm_blend_epi16(m1, m2, 0x0C); \
hash.c: ^
hash.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/smmintrin.h:520:14: note: expanded from macro '_mm_blend_epi16'
hash.c: ((__m128i) __builtin_ia32_pblendw128 ((__v8hi)(__m128i)(V1), \
hash.c: ^
hash.c: hash.c:116:3: error: '__builtin_ia32_pblendw128' needs target feature sse4.1
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41-2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41-2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41-2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41-2
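
Note on the -Wmacro-redefined warning above: blake256.h defines its own _mm_roti_epi32, which collides with the macro of the same name in clang's xopintrin.h. One way to avoid the collision, shown as a sketch with a hypothetical macro name (an #undef before the project's definition would be another), is to give the rotate helper a project-local name. The rotation here is a plain right-rotate built from SSE2 shifts, with a scalar fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_srli_epi32, _mm_slli_epi32, _mm_or_si128 */
/* Project-local name (hypothetical), so it cannot clash with the
   _mm_roti_epi32 macro defined by the compiler's XOP header. */
#define BLAKE_ROTR_EPI32(r, n) \
    _mm_or_si128(_mm_srli_epi32((r), (n)), _mm_slli_epi32((r), 32 - (n)))
#endif

/* Rotates each of the four 32-bit lanes right by n bits, 0 < n < 32. */
static void rotr_lanes(uint32_t v[4], int n)
{
    assert(v != NULL && n > 0 && n < 32);
#if defined(__SSE2__)
    __m128i r = _mm_loadu_si128((const __m128i *)v);
    _mm_storeu_si128((__m128i *)v, BLAKE_ROTR_EPI32(r, n));
#else
    for (int i = 0; i < 4; i++)
        v[i] = (v[i] >> n) | (v[i] << (32 - n));
#endif
}
```

For example, rotating 0x12345678 right by 7 gives 0xF02468AC on either path.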

Compiler output

Implementation: sse41-2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:2:
hash.c: ./blake256.h:105:15: warning: '_mm_roti_epi32' macro redefined [-Wmacro-redefined]
hash.c: #define _mm_roti_epi32(r, c) ((8==-c) ? _mm_shuffle_epi8(r,r8) : ( (16==-c) ? _mm_shuffle_epi8(r,r16) : _mm_xor_si128(_mm_srli_epi32( (r), -(c) ),_mm_slli_epi32( (r), 32-(-c) )) ) )
hash.c: ^
hash.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/xopintrin.h:233:9: note: previous definition is here
hash.c: #define _mm_roti_epi32(A, N) \
hash.c: ^
hash.c: hash.c:93:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m0 = _mm_shuffle_epi8(LOADU(datablock + 00), u8to32);
hash.c: ^
hash.c: hash.c:94:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m1 = _mm_shuffle_epi8(LOADU(datablock + 16), u8to32);
hash.c: ^
hash.c: hash.c:95:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m2 = _mm_shuffle_epi8(LOADU(datablock + 32), u8to32);
hash.c: ^
hash.c: hash.c:96:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m3 = _mm_shuffle_epi8(LOADU(datablock + 48), u8to32);
hash.c: ^
hash.c: hash.c:115:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:52:3: note: expanded from macro 'ROUND'
hash.c: G1(row1,row2,row3,row4,buf1); \
hash.c: ^
hash.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | sse41-2

Compiler output

Implementation: sse41-2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:39,
hash.c: from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:32,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: hash.c: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: 166 | _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: | ^~~~~~~~~~~~~~~
hash.c: In file included from rounds.h:45,
hash.c: from blake256.h:127,
hash.c: from hash.c:2:
hash.c: load.sse41.h:313:6: note: called from here
hash.c: 313 | t2 = _mm_blend_epi16(t0,t1,0x0F); \
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: rounds.h:58:3: note: in expansion of macro 'LOAD_MSG_9_4'
hash.c: 58 | LOAD_MSG_ ##r ##_4(buf4); \
hash.c: | ^~~~~~~~~
hash.c: hash.c:124:3: note: in expansion of macro 'ROUND'
hash.c: 124 | ROUND( 9);
hash.c: | ^~~~~
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:39,
hash.c: from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:32,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE | sse41-2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE | sse41-2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE | sse41-2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE | sse41-2

Compiler output

Implementation: ssse3
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:122:
hash.c: ./rounds.ssse3.h:3:55: warning: implicit conversion from 'long' to 'int' changes value from 2242054355 to -2052912941 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3964562569, 698298832, 57701188, 2242054355);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:3:22: warning: implicit conversion from 'long' to 'int' changes value from 3964562569 to -330404727 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3964562569, 698298832, 57701188, 2242054355);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:6:33: warning: implicit conversion from 'long' to 'int' changes value from 2752067618 to -1542899678 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(137296536, 2752067618, 320440878, 608135816);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:27:34: warning: implicit conversion from 'long' to 'int' changes value from 3380367581 to -914599715 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3041331479, 3380367581, 887688300, 953160567);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:27:22: warning: implicit conversion from 'long' to 'int' changes value from 3041331479 to -1253635817 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3041331479, 3380367581, 887688300, 953160567);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:30:46: warning: implicit conversion from 'long' to 'int' changes value from 3193202383 to -1101764913 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(1065670069, 3232508343, 3193202383, 1160258022);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:30:34: warning: implicit conversion from 'long' to 'int' changes value from 3232508343 to -1062458953 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(1065670069, 3232508343, 3193202383, 1160258022);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:51:57: warning: implicit conversion from 'long' to 'int' changes value from 3193202383 to -1101764913 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(137296536, 3041331479, 1160258022, 3193202383);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | ssse3
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | ssse3
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | ssse3
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | ssse3
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | ssse3

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:115:3: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake256_compress' that is compiled without support for 'xop'
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:178:1: note: expanded from here
hash.c: LOAD_MSG_0_1
hash.c: ^
hash.c: ./load.xop.h:19:6: note: expanded from macro 'LOAD_MSG_0_1'
hash.c: s0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) ); \
hash.c: ^
hash.c: hash.c:115:3: error: '__builtin_ia32_vprotdi' needs target feature xop
hash.c: ./rounds.h:52:3: note: expanded from macro 'ROUND'
hash.c: G1(row1,row2,row3,row4,buf1); \
hash.c: ^
hash.c: ./rounds.h:8:10: note: expanded from macro 'G1'
hash.c: row4 = _mm_roti_epi32(row4, -16); \
hash.c: ^
hash.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/xopintrin.h:234:13: note: expanded from macro '_mm_roti_epi32'
hash.c: ((__m128i)__builtin_ia32_vprotdi((__v4si)(__m128i)(A), (N)))
hash.c: ^
hash.c: hash.c:115:3: error: '__builtin_ia32_vprotdi' needs target feature xop
hash.c: ./rounds.h:52:3: note: expanded from macro 'ROUND'
hash.c: G1(row1,row2,row3,row4,buf1); \
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | xop
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | xop
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | xop
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | xop

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:93:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m0 = _mm_shuffle_epi8(LOADU(datablock + 00), u8to32);
hash.c: ^
hash.c: hash.c:94:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m1 = _mm_shuffle_epi8(LOADU(datablock + 16), u8to32);
hash.c: ^
hash.c: hash.c:95:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m2 = _mm_shuffle_epi8(LOADU(datablock + 32), u8to32);
hash.c: ^
hash.c: hash.c:96:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m3 = _mm_shuffle_epi8(LOADU(datablock + 48), u8to32);
hash.c: ^
hash.c: hash.c:115:3: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake256_compress' that is compiled without support for 'xop'
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:178:1: note: expanded from here
hash.c: LOAD_MSG_0_1
hash.c: ^
hash.c: ./load.xop.h:19:6: note: expanded from macro 'LOAD_MSG_0_1'
hash.c: s0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) ); \
hash.c: ^
hash.c: hash.c:115:3: error: '__builtin_ia32_vprotdi' needs target feature xop
hash.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | xop

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:38,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: hash.c: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/xopintrin.h:266:1: error: inlining failed in call to 'always_inline' '_mm_roti_epi32': target specific option mismatch
hash.c: 266 | _mm_roti_epi32(__m128i __A, const int __B)
hash.c: | ^~~~~~~~~~~~~~
hash.c: In file included from blake256.h:127,
hash.c: from hash.c:2:
hash.c: rounds.h:19:10: note: called from here
hash.c: 19 | row2 = _mm_roti_epi32(row2, -7); \
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~
hash.c: rounds.h:59:3: note: in expansion of macro 'G2'
hash.c: 59 | G2(row1,row2,row3,row4,buf4); \
hash.c: | ^~
hash.c: hash.c:128:3: note: in expansion of macro 'ROUND'
hash.c: 128 | ROUND(13);
hash.c: | ^~~~~
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:38,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/xopintrin.h:266:1: error: inlining failed in call to 'always_inline' '_mm_roti_epi32': target specific option mismatch
hash.c: 266 | _mm_roti_epi32(__m128i __A, const int __B)
hash.c: | ^~~~~~~~~~~~~~
hash.c: In file included from blake256.h:127,
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE | xop
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE | xop
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE | xop
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE | xop