Implementation notes: amd64, margaux, crypto_hash/blake256

Computer: margaux
Microarchitecture: amd64; Core 2 65nm (6fb)
Architecture: amd64
CPU ID: GenuineIntel-000006fb-bfebfbff
SUPERCOP version: 20230530
Operation: crypto_hash
Primitive: blake256
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
193117670 0 020163 844 896sse2clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
197888132 0 020627 844 896sse2-2clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
198248077 0 020787 844 896sse2-2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
198407582 0 017501 836 896sse2-2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
198477669 0 019267 844 896sse2-2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
202678122 0 018835 844 896sse2-2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2080910056 0 022547 844 896bswapclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
208159976 0 022691 844 896bswapclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2087510040 0 022755 844 896regsclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2093210104 0 020392 780 928regsgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
209339632 0 020339 844 896regsclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
209419632 0 021235 844 896regsclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2096910072 0 022563 844 896regsclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
209719576 0 020283 844 896bswapclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2107210104 0 020392 780 928bswapgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
210869544 0 021147 844 896bswapclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2146810150 0 021492 796 960bswapgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2155610647 0 022517 804 960bswapgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2159311117 0 024157 804 960bswapgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2161910647 0 022517 804 960regsgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2165511722 0 024765 804 960regsgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
219907186 0 017117 836 896sse2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
219919602 0 022645 804 960sse2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
221007689 0 018403 844 896sse2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2210725812 0 038643 844 896sphlibclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
221767681 0 020403 844 896sse2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2218625780 0 037499 844 896sphlibclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2220325074 0 037683 844 896sphlibclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2221510034 0 023077 804 960sse2-2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2222225812 0 036635 844 896sphlibclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2225024995 0 038109 804 960sphlibgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2227924805 0 036741 804 960sphlibgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
223129960 0 019885 836 896bswapclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2236110026 0 019949 836 896regsclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
223807273 0 018883 844 896sse2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
224988567 0 020437 804 960sse2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2257910793 0 022148 796 960regsgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
226387801 0 018088 780 928sse2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
226819037 0 020388 796 960sse2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
227339095 0 020965 804 960sse2-2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
227838362 0 018648 780 928sse2-2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
229567266 0 017560 780 928ssse3gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
230568145 0 019492 796 960ssse3gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
230778914 0 021957 804 960ssse3gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2329025456 0 035509 836 896sphlibclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
233158103 0 019973 804 960ssse3gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
234379701 0 021044 796 960sse2-2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2362523519 0 035013 804 960sphlibgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2484722923 0 033304 780 928sphlibgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2604110328 0 022819 844 896sandyclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2607610248 0 022963 844 896sandyclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
261729816 0 021419 844 896sandyclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
262909848 0 020555 844 896sandyclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2715410891 0 021176 780 928sandygcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2721210236 0 020157 836 896sandyclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
2780311367 0 023237 804 960sandygcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
2820811837 0 024877 804 960sandygcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
3003710964 0 022308 796 960sandygcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
320002633 0 012920 780 928refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
320208115 0 019645 804 960sphlib-smallgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
325267577 0 017960 780 928sphlib-smallgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
339163006 0 014356 796 960refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
391713291 0 016003 844 896refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
392463323 0 015811 844 896refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
397972754 0 014355 844 896refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
404029396 0 021131 844 896sphlib-smallclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
406029428 0 020267 844 896sphlib-smallclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
408449460 0 022307 844 896sphlib-smallclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
410602844 0 013555 844 896refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
410988497 0 018565 836 896sphlib-smallclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
411568722 0 021347 844 896sphlib-smallclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
4292411157 0 023125 804 960sphlib-smallgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
4296511363 0 024477 804 960sphlib-smallgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
432612713 0 012637 836 896refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023053120230530
440145162 0 018189 804 960refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530
473373735 0 015605 804 960refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023053120230530

Test failure

Implementation: avxs
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
error 111

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avxs
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avxs
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avxs
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avxs
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE avxs
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE avxs
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE avxs
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE avxs

Compiler output

Implementation: avxs
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:155:61: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_final' that is compiled without support for 'ssse3'
hash.c: __m128i w0 = _mm_load_si128((__m128i*)(&S->h[0])); w0 = _mm_shuffle_epi8(w0, u32to8);
hash.c: ^
hash.c: hash.c:156:61: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_final' that is compiled without support for 'ssse3'
hash.c: __m128i w1 = _mm_load_si128((__m128i*)(&S->h[4])); w1 = _mm_shuffle_epi8(w1, u32to8);
hash.c: ^
hash.c: 2 errors generated.

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE avxs

Compiler output

Implementation: sse41
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:121:
hash.c: ./rounds.sse41.h:17:55: warning: implicit conversion from 'long' to 'int' changes value from 2242054355 to -2052912941 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3964562569, 698298832, 57701188, 2242054355);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:17:22: warning: implicit conversion from 'long' to 'int' changes value from 3964562569 to -330404727 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3964562569, 698298832, 57701188, 2242054355);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:20:33: warning: implicit conversion from 'long' to 'int' changes value from 2752067618 to -1542899678 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(137296536, 2752067618, 320440878, 608135816);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:47:34: warning: implicit conversion from 'long' to 'int' changes value from 3380367581 to -914599715 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3041331479, 3380367581, 887688300, 953160567);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:47:22: warning: implicit conversion from 'long' to 'int' changes value from 3041331479 to -1253635817 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3041331479, 3380367581, 887688300, 953160567);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:50:46: warning: implicit conversion from 'long' to 'int' changes value from 3193202383 to -1101764913 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(1065670069, 3232508343, 3193202383, 1160258022);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:50:34: warning: implicit conversion from 'long' to 'int' changes value from 3232508343 to -1062458953 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(1065670069, 3232508343, 3193202383, 1160258022);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.sse41.h:81:57: warning: implicit conversion from 'long' to 'int' changes value from 3193202383 to -1101764913 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(137296536, 3041331479, 1160258022, 3193202383);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41

Compiler output

Implementation: sse41
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from hash.c:5:
hash.c: rounds.sse41.h: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: 166 | _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: | ^~~~~~~~~~~~~~~
hash.c: In file included from hash.c:121:
hash.c: rounds.sse41.h:881:8: note: called from here
hash.c: 881 | tmp1 = _mm_blend_epi16(tmp0, m3, 0xC0);
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: In file included from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: 166 | _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: | ^~~~~~~~~~~~~~~
hash.c: In file included from hash.c:121:
hash.c: rounds.sse41.h:880:8: note: called from here
hash.c: 880 | tmp0 = _mm_blend_epi16(m0,m1,0x0F);
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: In file included from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: 166 | _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: | ^~~~~~~~~~~~~~~
hash.c: In file included from hash.c:121:
hash.c: rounds.sse41.h:852:8: note: called from here
hash.c: 852 | tmp6 = _mm_blend_epi16(tmp5, tmp4, 0xC0);
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE sse41
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE sse41
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE sse41
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE sse41

Compiler output

Implementation: sse41-2
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:2:
hash.c: ./blake256.h:105:15: warning: '_mm_roti_epi32' macro redefined [-Wmacro-redefined]
hash.c: #define _mm_roti_epi32(r, c) ((8==-c) ? _mm_shuffle_epi8(r,r8) : ( (16==-c) ? _mm_shuffle_epi8(r,r16) : _mm_xor_si128(_mm_srli_epi32( (r), -(c) ),_mm_slli_epi32( (r), 32-(-c) )) ) )
hash.c: ^
hash.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/xopintrin.h:233:9: note: previous definition is here
hash.c: #define _mm_roti_epi32(A, N) \
hash.c: ^
hash.c: hash.c:116:3: error: '__builtin_ia32_pblendw128' needs target feature sse4.1
hash.c: ROUND( 1);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:186:1: note: expanded from here
hash.c: LOAD_MSG_1_1
hash.c: ^
hash.c: ./load.sse41.h:31:6: note: expanded from macro 'LOAD_MSG_1_1'
hash.c: t0 = _mm_blend_epi16(m1, m2, 0x0C); \
hash.c: ^
hash.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/smmintrin.h:520:14: note: expanded from macro '_mm_blend_epi16'
hash.c: ((__m128i) __builtin_ia32_pblendw128 ((__v8hi)(__m128i)(V1), \
hash.c: ^
hash.c: hash.c:116:3: error: '__builtin_ia32_pblendw128' needs target feature sse4.1
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41-2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41-2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41-2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41-2

Compiler output

Implementation: sse41-2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:2:
hash.c: ./blake256.h:105:15: warning: '_mm_roti_epi32' macro redefined [-Wmacro-redefined]
hash.c: #define _mm_roti_epi32(r, c) ((8==-c) ? _mm_shuffle_epi8(r,r8) : ( (16==-c) ? _mm_shuffle_epi8(r,r16) : _mm_xor_si128(_mm_srli_epi32( (r), -(c) ),_mm_slli_epi32( (r), 32-(-c) )) ) )
hash.c: ^
hash.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/xopintrin.h:233:9: note: previous definition is here
hash.c: #define _mm_roti_epi32(A, N) \
hash.c: ^
hash.c: hash.c:93:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m0 = _mm_shuffle_epi8(LOADU(datablock + 00), u8to32);
hash.c: ^
hash.c: hash.c:94:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m1 = _mm_shuffle_epi8(LOADU(datablock + 16), u8to32);
hash.c: ^
hash.c: hash.c:95:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m2 = _mm_shuffle_epi8(LOADU(datablock + 32), u8to32);
hash.c: ^
hash.c: hash.c:96:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m3 = _mm_shuffle_epi8(LOADU(datablock + 48), u8to32);
hash.c: ^
hash.c: hash.c:115:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:52:3: note: expanded from macro 'ROUND'
hash.c: G1(row1,row2,row3,row4,buf1); \
hash.c: ^
hash.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41-2

Compiler output

Implementation: sse41-2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:39,
hash.c: from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:32,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: hash.c: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: 166 | _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: | ^~~~~~~~~~~~~~~
hash.c: In file included from rounds.h:45,
hash.c: from blake256.h:127,
hash.c: from hash.c:2:
hash.c: load.sse41.h:313:6: note: called from here
hash.c: 313 | t2 = _mm_blend_epi16(t0,t1,0x0F); \
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: rounds.h:58:3: note: in expansion of macro 'LOAD_MSG_9_4'
hash.c: 58 | LOAD_MSG_ ##r ##_4(buf4); \
hash.c: | ^~~~~~~~~
hash.c: hash.c:124:3: note: in expansion of macro 'ROUND'
hash.c: 124 | ROUND( 9);
hash.c: | ^~~~~
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:39,
hash.c: from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:32,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:166:1: error: inlining failed in call to 'always_inline' '_mm_blend_epi16': target specific option mismatch
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE sse41-2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE sse41-2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE sse41-2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE sse41-2

Compiler output

Implementation: ssse3
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:122:
hash.c: ./rounds.ssse3.h:3:55: warning: implicit conversion from 'long' to 'int' changes value from 2242054355 to -2052912941 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3964562569, 698298832, 57701188, 2242054355);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:3:22: warning: implicit conversion from 'long' to 'int' changes value from 3964562569 to -330404727 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3964562569, 698298832, 57701188, 2242054355);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:6:33: warning: implicit conversion from 'long' to 'int' changes value from 2752067618 to -1542899678 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(137296536, 2752067618, 320440878, 608135816);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:27:34: warning: implicit conversion from 'long' to 'int' changes value from 3380367581 to -914599715 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3041331479, 3380367581, 887688300, 953160567);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:27:22: warning: implicit conversion from 'long' to 'int' changes value from 3041331479 to -1253635817 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(3041331479, 3380367581, 887688300, 953160567);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:30:46: warning: implicit conversion from 'long' to 'int' changes value from 3193202383 to -1101764913 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(1065670069, 3232508343, 3193202383, 1160258022);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:30:34: warning: implicit conversion from 'long' to 'int' changes value from 3232508343 to -1062458953 [-Wconstant-conversion]
hash.c: buf1 = _mm_set_epi32(1065670069, 3232508343, 3193202383, 1160258022);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ./rounds.ssse3.h:51:57: warning: implicit conversion from 'long' to 'int' changes value from 3193202383 to -1101764913 [-Wconstant-conversion]
hash.c: buf2 = _mm_set_epi32(137296536, 3041331479, 1160258022, 3193202383);
hash.c: ~~~~~~~~~~~~~ ^~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ssse3
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ssse3
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ssse3
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ssse3
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE ssse3

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:115:3: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake256_compress' that is compiled without support for 'xop'
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:178:1: note: expanded from here
hash.c: LOAD_MSG_0_1
hash.c: ^
hash.c: ./load.xop.h:19:6: note: expanded from macro 'LOAD_MSG_0_1'
hash.c: s0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) ); \
hash.c: ^
hash.c: hash.c:115:3: error: '__builtin_ia32_vprotdi' needs target feature xop
hash.c: ./rounds.h:52:3: note: expanded from macro 'ROUND'
hash.c: G1(row1,row2,row3,row4,buf1); \
hash.c: ^
hash.c: ./rounds.h:8:10: note: expanded from macro 'G1'
hash.c: row4 = _mm_roti_epi32(row4, -16); \
hash.c: ^
hash.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/xopintrin.h:234:13: note: expanded from macro '_mm_roti_epi32'
hash.c: ((__m128i)__builtin_ia32_vprotdi((__v4si)(__m128i)(A), (N)))
hash.c: ^
hash.c: hash.c:115:3: error: '__builtin_ia32_vprotdi' needs target feature xop
hash.c: ./rounds.h:52:3: note: expanded from macro 'ROUND'
hash.c: G1(row1,row2,row3,row4,buf1); \
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:93:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m0 = _mm_shuffle_epi8(LOADU(datablock + 00), u8to32);
hash.c: ^
hash.c: hash.c:94:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m1 = _mm_shuffle_epi8(LOADU(datablock + 16), u8to32);
hash.c: ^
hash.c: hash.c:95:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m2 = _mm_shuffle_epi8(LOADU(datablock + 32), u8to32);
hash.c: ^
hash.c: hash.c:96:22: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'blake256_compress' that is compiled without support for 'ssse3'
hash.c: const __m128i m3 = _mm_shuffle_epi8(LOADU(datablock + 48), u8to32);
hash.c: ^
hash.c: hash.c:115:3: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake256_compress' that is compiled without support for 'xop'
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:178:1: note: expanded from here
hash.c: LOAD_MSG_0_1
hash.c: ^
hash.c: ./load.xop.h:19:6: note: expanded from macro 'LOAD_MSG_0_1'
hash.c: s0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) ); \
hash.c: ^
hash.c: hash.c:115:3: error: '__builtin_ia32_vprotdi' needs target feature xop
hash.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:38,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: hash.c: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/xopintrin.h:266:1: error: inlining failed in call to 'always_inline' '_mm_roti_epi32': target specific option mismatch
hash.c: 266 | _mm_roti_epi32(__m128i __A, const int __B)
hash.c: | ^~~~~~~~~~~~~~
hash.c: In file included from blake256.h:127,
hash.c: from hash.c:2:
hash.c: rounds.h:19:10: note: called from here
hash.c: 19 | row2 = _mm_roti_epi32(row2, -7); \
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~
hash.c: rounds.h:59:3: note: in expansion of macro 'G2'
hash.c: 59 | G2(row1,row2,row3,row4,buf4); \
hash.c: | ^~
hash.c: hash.c:128:3: note: in expansion of macro 'ROUND'
hash.c: 128 | ROUND(13);
hash.c: | ^~~~~
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:38,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/xopintrin.h:266:1: error: inlining failed in call to 'always_inline' '_mm_roti_epi32': target specific option mismatch
hash.c: 266 | _mm_roti_epi32(__m128i __A, const int __B)
hash.c: | ^~~~~~~~~~~~~~
hash.c: In file included from blake256.h:127,
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE xop
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE xop