Implementation notes: amd64, h4atom, crypto_hash/blake256

Computer: h4atom
Architecture: amd64
CPU ID: GenuineIntel-000106ca-bfe9fbff
SUPERCOP version: 20160806
Operation: crypto_hash
Primitive: blake256
Time   Implementation   Compiler                                                                 Benchmark date  SUPERCOP version
21776  sse41            clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
22016  sse2             clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
22504  sse41-2          clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
24800  sse2-2           clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
25200  ssse3            clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
28608  vect128-mmxhack  gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
28664  sse2             gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
29400  sse2             gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
29440  sse2             gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
29520  vect128-mmxhack  gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
29648  vect128-mmxhack  gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
30592  sse2-2           gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
31184  ssse3            gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
31392  ssse3            gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
31416  sse2-2           gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
31440  sse2-2           gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
31456  ssse3            gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
32304  vect128-mmxhack  gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
32568  sse2             gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
32800  ssse3            gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
33528  sse2-2           gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
34120  vect128          gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
34968  sphlib           clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
35336  vect128          gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
35624  vect128          gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
36304  bswap            gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
36376  regs             gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
36456  sphlib           gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
36496  sphlib           gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
36752  bswap            gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
36768  bswap            gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
36784  regs             gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
36800  regs             gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
37016  vect128          gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
39056  bswap            clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
43144  sphlib           gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
44712  regs             clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
52632  sphlib           gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
53552  bswap            gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
56840  regs             gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
58032  sphlib-small     gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
62872  sphlib-small     clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
63440  ref              clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
69080  sphlib-small     gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
75008  sphlib-small     gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
75072  sphlib-small     gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
76064  ref              gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
81704  ref              gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806
86312  ref              gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
86944  sandy            gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160811        20160806
87272  ref              gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
92040  sandy            gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160811        20160806
92096  sandy            gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160811        20160806
93648  sandy            clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160811        20160806
96584  sandy            gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160811        20160806

Test failure

Implementation: crypto_hash/blake256/avxs
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
error 111

Number of similar (compiler,implementation) pairs: 9, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  avxs
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         avxicc avxs
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         avxicc avxs
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          avxicc avxs
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         avxicc avxs

Compiler output

Implementation: crypto_hash/blake256/sse41-2
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
hash.c: In file included from hash.c:2:
hash.c: ./blake256.h:105:15: warning: '_mm_roti_epi32' macro redefined [-Wmacro-redefined]
hash.c: #define _mm_roti_epi32(r, c) ((8==-c) ? _mm_shuffle_epi8(r,r8) : ( (16==-c) ? _mm_shuffle_epi8(r,r16) : _mm_xor_si128(_mm_srli_epi32( (r), -(c) ),_mm_slli_epi32( (r), 32-(-c) )) ) )
hash.c: ^
hash.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/xopintrin.h:246:9: note: previous definition is here
hash.c: #define _mm_roti_epi32(A, N) __extension__ ({ \
hash.c: ^
hash.c: 1 warning generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  sse41-2

Compiler output

Implementation: crypto_hash/blake256/xop
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
hash.c: hash.c:115:3: error: always_inline function '_mm_perm_epi8' requires target feature 'sse4a', but would be inlined into function 'blake256_compress' that is compiled without support for 'sse4a'
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:44:1: note: expanded from here
hash.c: LOAD_MSG_0_1
hash.c: ^
hash.c: ./load.xop.h:19:6: note: expanded from macro 'LOAD_MSG_0_1'
hash.c: s0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) ); \
hash.c: ^
hash.c: hash.c:115:3: error: '__builtin_ia32_vprotdi' needs target feature xop
hash.c: ./rounds.h:52:3: note: expanded from macro 'ROUND'
hash.c: G1(row1,row2,row3,row4,buf1); \
hash.c: ^
hash.c: ./rounds.h:8:10: note: expanded from macro 'G1'
hash.c: row4 = _mm_roti_epi32(row4, -16); \
hash.c: ^
hash.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/xopintrin.h:247:12: note: expanded from macro '_mm_roti_epi32'
hash.c: (__m128i)__builtin_ia32_vprotdi((__v4si)(__m128i)(A), (N)); })
hash.c: ^
hash.c: hash.c:115:3: error: '__builtin_ia32_vprotdi' needs target feature xop
hash.c: ./rounds.h:52:3: note: expanded from macro 'ROUND'
hash.c: G1(row1,row2,row3,row4,buf1); \
hash.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  xop

Compiler output

Implementation: crypto_hash/blake256/avxicc
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
hash.s: hash.s:217952:59: error: unexpected token in argument list
hash.s: vmovdqu xmm3, XMMWORD PTR .L_2il0floatpacket.113[rip] #287.3
hash.s: ^
hash.s: hash.s:217953:60: error: unexpected token in argument list
hash.s: vmovdqu xmm11, XMMWORD PTR .L_2il0floatpacket.114[rip] #287.3
hash.s: ^
hash.s: hash.s:217993:59: error: unexpected token in argument list
hash.s: vmovdqu xmm0, XMMWORD PTR .L_2il0floatpacket.115[rip] #288.3
hash.s: ^
hash.s: hash.s:217997:59: error: unexpected token in argument list
hash.s: vmovdqu xmm9, XMMWORD PTR .L_2il0floatpacket.116[rip] #288.3
hash.s: ^
hash.s: hash.s:217998:59: error: unexpected token in argument list
hash.s: vmovdqu xmm8, XMMWORD PTR .L_2il0floatpacket.117[rip] #288.3
hash.s: ^
hash.s: hash.s:218005:59: error: unexpected token in argument list
hash.s: vmovdqu xmm3, XMMWORD PTR .L_2il0floatpacket.118[rip] #288.3
hash.s: ^
hash.s: hash.s:218017:66: error: unexpected token in argument list
hash.s: vpxor xmm7, xmm10, XMMWORD PTR .L_2il0floatpacket.119[rip] #288.3
hash.s: ^
hash.s: hash.s:218023:65: error: unexpected token in argument list
hash.s: vpaddd xmm4, xmm0, XMMWORD PTR .L_2il0floatpacket.120[rip] #288.3
hash.s: ^
hash.s: hash.s:218028:66: error: unexpected token in argument list
hash.s: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  avxicc

Compiler output

Implementation: crypto_hash/blake256/vect128-mmxhack
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
vector.c: vector.c:203:8: error: use of unknown builtin '__builtin_ia32_pshufd' [-Wimplicit-function-declaration]
vector.c: M0 = v32_shufrot(M0,1);
vector.c: ^
vector.c: ./vector.h:151:26: note: expanded from macro 'v32_shufrot'
vector.c: #define v32_shufrot(x,s) v32_shuf(x,XCAT(SHUFROT_,s))
vector.c: ^
vector.c: ./vector.h:140:18: note: expanded from macro 'v32_shuf'
vector.c: #define v32_shuf __builtin_ia32_pshufd
vector.c: ^
vector.c: vector.c:203:8: note: did you mean '__builtin_ia32_psubd'?
vector.c: ./vector.h:151:26: note: expanded from macro 'v32_shufrot'
vector.c: #define v32_shufrot(x,s) v32_shuf(x,XCAT(SHUFROT_,s))
vector.c: ^
vector.c: ./vector.h:140:18: note: expanded from macro 'v32_shuf'
vector.c: #define v32_shuf __builtin_ia32_pshufd
vector.c: ^
vector.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/mmintrin.h:177:19: note: '__builtin_ia32_psubd' declared here
vector.c: return (__m64)__builtin_ia32_psubd((__v2si)__m1, (__v2si)__m2);
vector.c: ^
vector.c: vector.c:203:6: error: assigning to 'v32' (aka 'v4si') from incompatible type 'int'
vector.c: M0 = v32_shufrot(M0,1);
vector.c: ^ ~~~~~~~~~~~~~~~~~
vector.c: vector.c:205:6: error: assigning to 'v32' (aka 'v4si') from incompatible type 'int'
vector.c: M0 = v32_shufrot(M0,1);
vector.c: ^ ~~~~~~~~~~~~~~~~~
vector.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  vect128-mmxhack

Compiler output

Implementation: crypto_hash/blake256/vect128
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
vector.c: vector.c:389:3: error: use of unknown builtin '__builtin_ia32_punpckldq128' [-Wimplicit-function-declaration]
vector.c: v32_interleave_inplace(M0,M2);
vector.c: ^
vector.c: ./vector.h:1038:17: note: expanded from macro 'v32_interleave_inplace'
vector.c: v32 c__ = v32_interleavel (a__, b__); \
vector.c: ^
vector.c: ./vector.h:100:27: note: expanded from macro 'v32_interleavel'
vector.c: #define v32_interleavel __builtin_ia32_punpckldq128
vector.c: ^
vector.c: vector.c:389:3: error: initializing 'v32' (aka 'v4si') with an expression of incompatible type 'int'
vector.c: v32_interleave_inplace(M0,M2);
vector.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vector.c: ./vector.h:1038:11: note: expanded from macro 'v32_interleave_inplace'
vector.c: v32 c__ = v32_interleavel (a__, b__); \
vector.c: ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
vector.c: vector.c:389:3: error: use of unknown builtin '__builtin_ia32_punpckhdq128' [-Wimplicit-function-declaration]
vector.c: ./vector.h:1039:17: note: expanded from macro 'v32_interleave_inplace'
vector.c: v32 d__ = v32_interleaveh (a__, b__); \
vector.c: ^
vector.c: ./vector.h:101:27: note: expanded from macro 'v32_interleaveh'
vector.c: #define v32_interleaveh __builtin_ia32_punpckhdq128
vector.c: ^
vector.c: vector.c:389:3: note: did you mean '__builtin_ia32_punpckldq128'?
vector.c: ./vector.h:1039:17: note: expanded from macro 'v32_interleave_inplace'
vector.c: v32 d__ = v32_interleaveh (a__, b__); \
vector.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  vect128

Compiler output

Implementation: crypto_hash/blake256/sse41-2
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/x86intrin.h:41:0,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: hash.c: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:166:1: error: inlining failed in call to always_inline '_mm_blend_epi16': target specific option mismatch
hash.c: _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: ^
hash.c: In file included from rounds.h:45:0,
hash.c: from blake256.h:127,
hash.c: from hash.c:2:
hash.c: load.sse41.h:313:4: error: called from here
hash.c: t2 = _mm_blend_epi16(t0,t1,0x0F); \
hash.c: ^
hash.c: rounds.h:58:3: note: in expansion of macro 'LOAD_MSG_9_4'
hash.c: LOAD_MSG_ ##r ##_4(buf4); \
hash.c: ^
hash.c: hash.c:124:3: note: in expansion of macro 'ROUND'
hash.c: ROUND( 9);
hash.c: ^
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/x86intrin.h:41:0,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:166:1: error: inlining failed in call to always_inline '_mm_blend_epi16': target specific option mismatch
hash.c: _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: ^
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                                 Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         sse41-2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         sse41-2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          sse41-2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         sse41-2
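
Note on the error above: gcc reports "target specific option mismatch" when an always_inline intrinsic such as _mm_blend_epi16 is called from a function compiled without the target feature the intrinsic requires (here SSE4.1). The following is a minimal sketch, not taken from the blake256 sources, of one way to guard such a call at compile time. The helper names blend16_low4 and blend16_low4_selfcheck are hypothetical, and the scalar fallback is an assumption about what an acceptable substitute looks like; only the __SSE4_1__ predefined macro and the _mm_blend_epi16 semantics are standard.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 intrinsics, included only when enabled */
#endif

/* Hypothetical helper (not from the blake256 sources): blend eight 16-bit
 * lanes with the fixed mask 0x0F, so lanes 0-3 come from b and lanes 4-7
 * come from a, matching _mm_blend_epi16(a, b, 0x0F). */
void blend16_low4(uint16_t out[8], const uint16_t a[8], const uint16_t b[8])
{
#if defined(__SSE4_1__)
    /* Vector path: compiled only when the compiler targets SSE4.1. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_blend_epi16(va, vb, 0x0F));
#else
    /* Scalar fallback for targets without SSE4.1. */
    for (int i = 0; i < 8; i++)
        out[i] = (i < 4) ? b[i] : a[i];
#endif
}

/* Hypothetical self-check: both paths must give the same lanes. */
int blend16_low4_selfcheck(void)
{
    const uint16_t a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    const uint16_t b[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    uint16_t out[8];

    blend16_low4(out, a, b);
    for (int i = 0; i < 8; i++)
        assert(out[i] == (i < 4 ? b[i] : a[i]));
    return 1;
}
```

With a guard of this kind the translation unit still compiles when SSE4.1 is not enabled, at the cost of running the slower scalar path. Whether that trade-off is appropriate for a benchmark implementation named after a specific instruction set is a separate question.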

Compiler output

Implementation: crypto_hash/blake256/xop
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/x86intrin.h:52:0,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: hash.c: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/xopintrin.h:260:1: error: inlining failed in call to always_inline '_mm_roti_epi32': target specific option mismatch
hash.c: _mm_roti_epi32(__m128i __A, const int __B)
hash.c: ^
hash.c: In file included from blake256.h:127:0,
hash.c: from hash.c:2:
hash.c: rounds.h:19:8: error: called from here
hash.c: row2 = _mm_roti_epi32(row2, -7); \
hash.c: ^
hash.c: rounds.h:59:3: note: in expansion of macro 'G2'
hash.c: G2(row1,row2,row3,row4,buf4); \
hash.c: ^
hash.c: hash.c:128:3: note: in expansion of macro 'ROUND'
hash.c: ROUND(13);
hash.c: ^
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/x86intrin.h:52:0,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/xopintrin.h:260:1: error: inlining failed in call to always_inline '_mm_roti_epi32': target specific option mismatch
hash.c: _mm_roti_epi32(__m128i __A, const int __B)
hash.c: ^
hash.c: In file included from blake256.h:127:0,
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                                 Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         xop
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         xop
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          xop
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         xop

Compiler output

Implementation: crypto_hash/blake256/sse41
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
hash.c: In file included from hash.c:5:0:
hash.c: rounds.sse41.h: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:166:1: error: inlining failed in call to always_inline '_mm_blend_epi16': target specific option mismatch
hash.c: _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: ^
hash.c: In file included from hash.c:121:0:
hash.c: rounds.sse41.h:881:6: error: called from here
hash.c: tmp1 = _mm_blend_epi16(tmp0, m3, 0xC0);
hash.c: ^
hash.c: In file included from hash.c:5:0:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:166:1: error: inlining failed in call to always_inline '_mm_blend_epi16': target specific option mismatch
hash.c: _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: ^
hash.c: In file included from hash.c:121:0:
hash.c: rounds.sse41.h:880:6: error: called from here
hash.c: tmp0 = _mm_blend_epi16(m0,m1,0x0F);
hash.c: ^
hash.c: In file included from hash.c:5:0:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:166:1: error: inlining failed in call to always_inline '_mm_blend_epi16': target specific option mismatch
hash.c: _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M)
hash.c: ^
hash.c: In file included from hash.c:121:0:
hash.c: rounds.sse41.h:852:6: error: called from here
hash.c: tmp6 = _mm_blend_epi16(tmp5, tmp4, 0xC0);
hash.c: ^
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                                 Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         sse41
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         sse41
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          sse41
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         sse41