Implementation notes: amd64, waldorf, crypto_hash/blake256

Computer: waldorf
Architecture: amd64
CPU ID: GenuineIntel-000106e5-bfebfbff
SUPERCOP version: 20160715
Operation: crypto_hash
Primitive: blake256
Time   Implementation   Compiler                                                                 Benchmark date  SUPERCOP version
16512  sse41-2          clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
16776  sse41            clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
17192  sse41            gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
17464  sse41            gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
17464  sse41-2          gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
17920  sse41-2          gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
18084  vect128          gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
18172  sse41            gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
18188  sse41-2          gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
18220  vect128          gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
18704  vect128          gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
19088  ssse3            gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
19352  ssse3            gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
19392  ssse3            clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
19664  vect128-mmxhack  gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
19684  ssse3            gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
20008  vect128-mmxhack  gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
20276  vect128-mmxhack  gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
20352  sse2             clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
21484  sse2-2           clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
22484  sse2-2           gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
22724  sse2             gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
22748  sse2             gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
23240  sse2             gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
23284  sse2-2           gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
23872  sse2-2           gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
30352  sse41            gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
30444  sse41-2          gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
31440  vect128          gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
31448  ssse3            gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
31888  vect128-mmxhack  gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
39364  sphlib           gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
39964  sse2-2           gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
40740  sse2             gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
42148  bswap            gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
44680  bswap            gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
45016  regs             gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
45400  regs             gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
45408  sphlib           gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
45432  sphlib           gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
45668  regs             gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
45876  regs             clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
46428  sandy            gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
47548  sphlib           gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
47928  sandy            gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
48660  bswap            gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
48704  sandy            gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
48960  sphlib           clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
49064  bswap            gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
49208  regs             gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
49360  sphlib-small     gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
49844  sphlib-small     gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
50304  sandy            gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
50564  bswap            clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
50624  sandy            clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
50672  sphlib-small     gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
50972  ref              gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160718        20160715
55080  ref              gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160718        20160715
55812  sphlib-small     gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
56852  ref              gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160718        20160715
61280  ref              gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160718        20160715
62356  sphlib-small     clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715
64204  ref              clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160718        20160715

Test failure

Implementation: crypto_hash/blake256/avxs
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
error 111

Number of similar (compiler,implementation) pairs: 9, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  avxs
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         avxicc avxs
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         avxicc avxs
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          avxicc avxs
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         avxicc avxs

Compiler output

Implementation: crypto_hash/blake256/xop
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
hash.c: hash.c:115:3: warning: implicit declaration of function '_mm_perm_epi8' is invalid in C99 [-Wimplicit-function-declaration]
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:43:1: note: expanded from here
hash.c: LOAD_MSG_0_1
hash.c: ^
hash.c: ./load.xop.h:19:6: note: expanded from macro 'LOAD_MSG_0_1'
hash.c: s0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) ); \
hash.c: ^
hash.c: hash.c:115:3: error: assigning to '__m128i' (vector of 2 'long long' values) from incompatible type 'int'
hash.c: ROUND( 0);
hash.c: ^
hash.c: ./rounds.h:51:3: note: expanded from macro 'ROUND'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: <scratch space>:43:1: note: expanded from here
hash.c: LOAD_MSG_0_1
hash.c: ^
hash.c: ./load.xop.h:19:4: note: expanded from macro 'LOAD_MSG_0_1'
hash.c: s0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) ); \
hash.c: ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: hash.c:115:3: warning: implicit declaration of function '_mm_roti_epi32' is invalid in C99 [-Wimplicit-function-declaration]
hash.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  xop

Compiler output

Implementation: crypto_hash/blake256/avxicc
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
hash.s: hash.s:217952:59: error: unexpected token in argument list
hash.s: vmovdqu xmm3, XMMWORD PTR .L_2il0floatpacket.113[rip] #287.3
hash.s: ^
hash.s: hash.s:217953:60: error: unexpected token in argument list
hash.s: vmovdqu xmm11, XMMWORD PTR .L_2il0floatpacket.114[rip] #287.3
hash.s: ^
hash.s: hash.s:217993:59: error: unexpected token in argument list
hash.s: vmovdqu xmm0, XMMWORD PTR .L_2il0floatpacket.115[rip] #288.3
hash.s: ^
hash.s: hash.s:217997:59: error: unexpected token in argument list
hash.s: vmovdqu xmm9, XMMWORD PTR .L_2il0floatpacket.116[rip] #288.3
hash.s: ^
hash.s: hash.s:217998:59: error: unexpected token in argument list
hash.s: vmovdqu xmm8, XMMWORD PTR .L_2il0floatpacket.117[rip] #288.3
hash.s: ^
hash.s: hash.s:218005:59: error: unexpected token in argument list
hash.s: vmovdqu xmm3, XMMWORD PTR .L_2il0floatpacket.118[rip] #288.3
hash.s: ^
hash.s: hash.s:218017:66: error: unexpected token in argument list
hash.s: vpxor xmm7, xmm10, XMMWORD PTR .L_2il0floatpacket.119[rip] #288.3
hash.s: ^
hash.s: hash.s:218023:65: error: unexpected token in argument list
hash.s: vpaddd xmm4, xmm0, XMMWORD PTR .L_2il0floatpacket.120[rip] #288.3
hash.s: ^
hash.s: hash.s:218028:66: error: unexpected token in argument list
hash.s: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  avxicc

Compiler output

Implementation: crypto_hash/blake256/vect128-mmxhack
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
vector.c: vector.c:203:8: error: use of unknown builtin '__builtin_ia32_pshufd' [-Wimplicit-function-declaration]
vector.c: M0 = v32_shufrot(M0,1);
vector.c: ^
vector.c: ./vector.h:151:26: note: expanded from macro 'v32_shufrot'
vector.c: #define v32_shufrot(x,s) v32_shuf(x,XCAT(SHUFROT_,s))
vector.c: ^
vector.c: ./vector.h:140:18: note: expanded from macro 'v32_shuf'
vector.c: #define v32_shuf __builtin_ia32_pshufd
vector.c: ^
vector.c: vector.c:203:8: note: did you mean '__builtin_ia32_psubd'?
vector.c: ./vector.h:151:26: note: expanded from macro 'v32_shufrot'
vector.c: #define v32_shufrot(x,s) v32_shuf(x,XCAT(SHUFROT_,s))
vector.c: ^
vector.c: ./vector.h:140:18: note: expanded from macro 'v32_shuf'
vector.c: #define v32_shuf __builtin_ia32_pshufd
vector.c: ^
vector.c: /usr/include/clang/3.5.0/include/mmintrin.h:178:19: note: '__builtin_ia32_psubd' declared here
vector.c: return (__m64)__builtin_ia32_psubd((__v2si)__m1, (__v2si)__m2);
vector.c: ^
vector.c: vector.c:203:6: error: assigning to 'v32' (aka 'v4si') from incompatible type 'int'
vector.c: M0 = v32_shufrot(M0,1);
vector.c: ^ ~~~~~~~~~~~~~~~~~
vector.c: vector.c:205:6: error: assigning to 'v32' (aka 'v4si') from incompatible type 'int'
vector.c: M0 = v32_shufrot(M0,1);
vector.c: ^ ~~~~~~~~~~~~~~~~~
vector.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  vect128-mmxhack

Compiler output

Implementation: crypto_hash/blake256/vect128
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
vector.c: vector.c:389:3: error: use of unknown builtin '__builtin_ia32_punpckldq128' [-Wimplicit-function-declaration]
vector.c: v32_interleave_inplace(M0,M2);
vector.c: ^
vector.c: ./vector.h:1038:17: note: expanded from macro 'v32_interleave_inplace'
vector.c: v32 c__ = v32_interleavel (a__, b__); \
vector.c: ^
vector.c: ./vector.h:100:27: note: expanded from macro 'v32_interleavel'
vector.c: #define v32_interleavel __builtin_ia32_punpckldq128
vector.c: ^
vector.c: vector.c:389:3: error: initializing 'v32' (aka 'v4si') with an expression of incompatible type 'int'
vector.c: v32_interleave_inplace(M0,M2);
vector.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vector.c: ./vector.h:1038:11: note: expanded from macro 'v32_interleave_inplace'
vector.c: v32 c__ = v32_interleavel (a__, b__); \
vector.c: ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
vector.c: vector.c:389:3: error: use of unknown builtin '__builtin_ia32_punpckhdq128' [-Wimplicit-function-declaration]
vector.c: ./vector.h:1039:17: note: expanded from macro 'v32_interleave_inplace'
vector.c: v32 d__ = v32_interleaveh (a__, b__); \
vector.c: ^
vector.c: ./vector.h:101:27: note: expanded from macro 'v32_interleaveh'
vector.c: #define v32_interleaveh __builtin_ia32_punpckhdq128
vector.c: ^
vector.c: vector.c:389:3: note: did you mean '__builtin_ia32_punpckldq128'?
vector.c: ./vector.h:1039:17: note: expanded from macro 'v32_interleave_inplace'
vector.c: v32 d__ = v32_interleaveh (a__, b__); \
vector.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  vect128
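
Note on the two vector.c failures above: vect128 and vect128-mmxhack call GCC-specific builtins (`__builtin_ia32_pshufd`, `__builtin_ia32_punpckldq128`, `__builtin_ia32_punpckhdq128`) directly, and this clang reports them as unknown. One possible workaround, given here only as a sketch and not as the fix chosen by the implementation authors, is to use the documented SSE2 intrinsics from `<emmintrin.h>`, which both gcc and clang accept. The helper names `interleave_lo32`, `interleave_hi32`, and `rotate_lanes1` are hypothetical. The lane order in `rotate_lanes1` is illustrative only, since the `SHUFROT_` constants from vector.h do not appear in the log. The scalar branch exists only so the sketch also builds on targets without SSE2.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* documented SSE2 intrinsics, accepted by gcc and clang */
#endif

/* out = {a0, b0, a1, b1}, the operation performed by punpckldq.
   out must not alias a or b. */
static void interleave_lo32(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi32(va, vb));
#else
    out[0] = a[0]; out[1] = b[0]; out[2] = a[1]; out[3] = b[1];
#endif
}

/* out = {a2, b2, a3, b3}, the operation performed by punpckhdq.
   out must not alias a or b. */
static void interleave_hi32(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_unpackhi_epi32(va, vb));
#else
    out[0] = a[2]; out[1] = b[2]; out[2] = a[3]; out[3] = b[3];
#endif
}

/* out = {a1, a2, a3, a0}: a one-lane rotation done with pshufd.
   Assumed lane order, chosen for illustration. out must not alias a. */
static void rotate_lanes1(const uint32_t a[4], uint32_t out[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi32(va, _MM_SHUFFLE(0, 3, 2, 1)));
#else
    out[0] = a[1]; out[1] = a[2]; out[2] = a[3]; out[3] = a[0];
#endif
}
```

Calling `interleave_lo32` with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} stores {1, 5, 2, 6} in `out`. Going through the intrinsics trades the direct builtin names for calls that are specified by the header rather than by one compiler's internals.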

Compiler output

Implementation: crypto_hash/blake256/xop
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/x86intrin.h:52:0,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: hash.c: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/4.9/include/xopintrin.h:212:1: error: inlining failed in call to always_inline '_mm_perm_epi8': target specific option mismatch
hash.c: _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: ^
hash.c: In file included from rounds.h:43:0,
hash.c: from blake256.h:127,
hash.c: from hash.c:2:
hash.c: load.xop.h:19:4: error: called from here
hash.c: s0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) ); \
hash.c: ^
hash.c: rounds.h:51:3: note: in expansion of macro 'LOAD_MSG_0_1'
hash.c: LOAD_MSG_ ##r ##_1(buf1); \
hash.c: ^
hash.c: hash.c:115:3: note: in expansion of macro 'ROUND'
hash.c: ROUND( 0);
hash.c: ^
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/x86intrin.h:52:0,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/4.9/include/xopintrin.h:260:1: error: inlining failed in call to always_inline '_mm_roti_epi32': target specific option mismatch
hash.c: _mm_roti_epi32(__m128i __A, const int __B)
hash.c: ^
hash.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                                 Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         xop
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         xop
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          xop
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         xop
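
Note on the xop failures above: gcc's `target specific option mismatch` means `_mm_perm_epi8` and `_mm_roti_epi32` were called in a build where XOP code generation is not enabled. XOP is an AMD extension and this machine reports a GenuineIntel CPU, so `-march=native` would not be expected to turn it on. A common pattern, sketched below as an assumption and not as a description of how the xop implementation is organized, is to select the XOP path only when the compiler defines `__XOP__` and to fall back to SSE2 shifts otherwise. `ROTR32` and `rotr12_x4` are hypothetical names; 12 is one of the rotation counts used by BLAKE-256.

```c
#include <stdint.h>

#if defined(__XOP__)
#include <x86intrin.h>  /* XOP intrinsics, only usable when the target enables XOP */
/* _mm_roti_epi32 rotates left, so a right rotation by n is a left rotation by 32 - n. */
#define ROTR32(x, n) _mm_roti_epi32((x), 32 - (n))
#elif defined(__SSE2__)
#include <emmintrin.h>
/* SSE2 has no vector rotate: combine a right shift and a left shift. */
#define ROTR32(x, n) \
    _mm_or_si128(_mm_srli_epi32((x), (n)), _mm_slli_epi32((x), 32 - (n)))
#endif

/* Rotate each of the four 32-bit words in w right by 12 bits, in place. */
static void rotr12_x4(uint32_t w[4])
{
#if defined(__XOP__) || defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)w);
    v = ROTR32(v, 12);
    _mm_storeu_si128((__m128i *)w, v);
#else
    /* Scalar fallback so the sketch builds on targets without SSE2. */
    for (int i = 0; i < 4; i++)
        w[i] = (w[i] >> 12) | (w[i] << 20);
#endif
}
```

With a guard of this kind, a build without XOP compiles the SSE2 branch and never references the XOP intrinsics at all.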