Implementation notes: amd64, par, crypto_hash/blake512

Computer: par
Architecture: amd64
CPU ID: GenuineIntel-000406c3-bfebfbff
SUPERCOP version: 20161026
Operation: crypto_hash
Primitive: blake512
Time   Implementation   Compiler                                           Benchmark date  SUPERCOP version
22400  bswap            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
22500  bswap            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
22520  bswap            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
22540  regs             gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
22640  regs             gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
22680  regs             gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
22680  bswap            gcc -march=native -mcpu=native -Os                 20161214        20161026
22800  regs             gcc -march=native -mcpu=native -Os                 20161214        20161026
22960  bswap            gcc -march=native -mcpu=native -O3                 20161214        20161026
23000  bswap            gcc -march=native -mcpu=native -O2                 20161214        20161026
23020  regs             gcc -march=native -mcpu=native -O3                 20161214        20161026
23060  sphlib           gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
23100  sphlib           gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
23160  regs             gcc -march=native -mcpu=native -O2                 20161214        20161026
23740  sphlib           gcc -march=native -mcpu=native -O2                 20161214        20161026
23740  sphlib           gcc -march=native -mcpu=native -O3                 20161214        20161026
24120  sphlib           gcc -march=native -mcpu=native -Os                 20161214        20161026
24300  sphlib           gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
28880  sphlib-small     gcc -march=native -mcpu=native -Os                 20161214        20161026
28960  sphlib-small     gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
33940  ref              gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
34040  ref              gcc -march=native -mcpu=native -Os                 20161214        20161026
43860  sphlib-small     gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
43860  sse2             gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
44000  sphlib-small     gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
44620  sse2             gcc -march=native -mcpu=native -Os                 20161214        20161026
45040  sse2             gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
45180  sse2             gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
45320  sphlib-small     gcc -march=native -mcpu=native -O2                 20161214        20161026
45600  sse2             gcc -march=native -mcpu=native -O3                 20161214        20161026
45640  sphlib-small     gcc -march=native -mcpu=native -O3                 20161214        20161026
45740  sse2             gcc -march=native -mcpu=native -O2                 20161214        20161026
46340  ref              gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
46360  sse2s            gcc -march=native -mcpu=native -Os                 20161214        20161026
46440  ref              gcc -march=native -mcpu=native -O3                 20161214        20161026
47200  sse2s            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
47240  ref              gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
47800  sse2s            gcc -march=native -mcpu=native -O3                 20161214        20161026
47880  sse2s            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
47920  sse2s            gcc -march=native -mcpu=native -O2                 20161214        20161026
48020  sse2s            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
48860  ref              gcc -march=native -mcpu=native -O2                 20161214        20161026
49140  sse41            gcc -march=native -mcpu=native -Os                 20161214        20161026
49280  sse41            gcc -march=native -mcpu=native -O3                 20161214        20161026
49380  sse41            gcc -march=native -mcpu=native -O2                 20161214        20161026
49380  vect128          gcc -march=native -mcpu=native -Os                 20161214        20161026
49460  vect128          gcc -march=native -mcpu=native -O3                 20161214        20161026
49580  vect128          gcc -march=native -mcpu=native -O2                 20161214        20161026
49960  sse41            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
50020  sse41            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
50140  sse41            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
50640  vect128-inplace  gcc -march=native -mcpu=native -O3                 20161214        20161026
50760  vect128-inplace  gcc -march=native -mcpu=native -O2                 20161214        20161026
50840  vect128          gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
50860  vect128-inplace  gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
50880  vect128          gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
50900  vect128-inplace  gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
51120  vect128-inplace  gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
51140  vect128          gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
51660  ssse3            gcc -march=native -mcpu=native -Os                 20161214        20161026
51740  vect128-inplace  gcc -march=native -mcpu=native -Os                 20161214        20161026
51940  ssse3            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
52060  ssse3            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
52640  ssse3            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
52700  ssse3            gcc -march=native -mcpu=native -O3                 20161214        20161026
52900  ssse3            gcc -march=native -mcpu=native -O2                 20161214        20161026
89520  sandy            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
89940  sandy            gcc -march=native -mcpu=native -Os                 20161214        20161026
91700  sandy            gcc -march=native -mcpu=native -O3                 20161214        20161026
91740  sandy            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
91860  sandy            gcc -march=native -mcpu=native -O2                 20161214        20161026
91880  sandy            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026

Test failure

Implementation: crypto_hash/blake512/avxicc
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
error 111

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  avxicc
gcc -funroll-loops -march=native -mcpu=native -O3  avxicc
gcc -funroll-loops -march=native -mcpu=native -Os  avxicc
gcc -march=native -mcpu=native -O2                 avxicc
gcc -march=native -mcpu=native -O3                 avxicc
gcc -march=native -mcpu=native -Os                 avxicc

Compiler output

Implementation: crypto_hash/blake512/sphlib
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
blake.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 12, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  sphlib sphlib-small
gcc -funroll-loops -march=native -mcpu=native -O3  sphlib sphlib-small
gcc -funroll-loops -march=native -mcpu=native -Os  sphlib sphlib-small
gcc -march=native -mcpu=native -O2                 sphlib sphlib-small
gcc -march=native -mcpu=native -O3                 sphlib sphlib-small
gcc -march=native -mcpu=native -Os                 sphlib sphlib-small

Compiler output

Implementation: crypto_hash/blake512/xop-2
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
hash.c: from hash.c:5:
hash.c: hash.c: In function 'blake512_compress':
hash.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:212:1: error: inlining failed in call to always_inline '_mm_perm_epi8': target specific option mismatch
hash.c: _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: ^~~~~~~~~~~~~
hash.c: In file included from hash.c:8:0:
hash.c: rounds.h:15:21: note: called from here
hash.c: #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: hash.c:99:15: note: in expansion of macro 'BSWAP64'
hash.c: m.u128[7] = BSWAP64(m.u128[7]);
hash.c: ^~~~~~~
hash.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
hash.c: from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:212:1: error: inlining failed in call to always_inline '_mm_perm_epi8': target specific option mismatch
hash.c: _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: ^~~~~~~~~~~~~
hash.c: In file included from hash.c:8:0:
hash.c: rounds.h:15:21: note: called from here
hash.c: #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: hash.c:98:15: note: in expansion of macro 'BSWAP64'
hash.c: m.u128[6] = BSWAP64(m.u128[6]);
hash.c: ...

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  xop-2
gcc -funroll-loops -march=native -mcpu=native -O3  xop-2
gcc -funroll-loops -march=native -mcpu=native -Os  xop-2
gcc -march=native -mcpu=native -O2                 xop-2
gcc -march=native -mcpu=native -O3                 xop-2
gcc -march=native -mcpu=native -Os                 xop-2
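
Note on the xop-2 output: judging by its name, the BSWAP64 macro reverses the byte order of each 64-bit word, and xop-2 expresses this through the XOP intrinsic _mm_perm_epi8. XOP is an AMD-only extension and the CPU ID above reports GenuineIntel, so -march=native presumably does not enable it on this machine, which would explain the "target specific option mismatch" errors. The following is only a minimal sketch of the same operation on a single word in plain C, assuming that reading of the macro is right. The name bswap64_portable is hypothetical and is not taken from any SUPERCOP implementation.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of one 64-bit word
   using only shifts and masks, so no target-specific option is needed. */
static uint64_t bswap64_portable(uint64_t x)
{
    /* Swap the two 32-bit halves. */
    x = ((x & 0x00000000FFFFFFFFULL) << 32) | (x >> 32);
    /* Swap the 16-bit quarters inside each half. */
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    /* Swap the bytes inside each 16-bit quarter. */
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x >> 8) & 0x00FF00FF00FF00FFULL);
    return x;
}
```

Applying the function twice returns the original word, which is a quick sanity check for any byte-swap routine.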

Compiler output

Implementation: crypto_hash/blake512/xop
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
hash.c: from hash.c:5:
hash.c: hash.c: In function 'blake512_compress':
hash.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:266:1: error: inlining failed in call to always_inline '_mm_roti_epi64': target specific option mismatch
hash.c: _mm_roti_epi64(__m128i __A, const int __B)
hash.c: ^~~~~~~~~~~~~~
hash.c: In file included from hash.c:8:0:
hash.c: rounds.h:825:9: note: called from here
hash.c: row2h = _mm_roti_epi64(row2h, -11); \
hash.c: ^
hash.c: rounds.h:867:3: note: in expansion of macro 'G2'
hash.c: G2(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
hash.c: ^~
hash.c: hash.c:132:3: note: in expansion of macro 'ROUND'
hash.c: ROUND(15);
hash.c: ^~~~~
hash.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
hash.c: from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:266:1: error: inlining failed in call to always_inline '_mm_roti_epi64': target specific option mismatch
hash.c: _mm_roti_epi64(__m128i __A, const int __B)
hash.c: ^~~~~~~~~~~~~~
hash.c: In file included from hash.c:8:0:
hash.c: rounds.h:824:9: note: called from here
hash.c: row2l = _mm_roti_epi64(row2l, -11); \
hash.c: ...

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  xop
gcc -funroll-loops -march=native -mcpu=native -O3  xop
gcc -funroll-loops -march=native -mcpu=native -Os  xop
gcc -march=native -mcpu=native -O2                 xop
gcc -march=native -mcpu=native -O3                 xop
gcc -march=native -mcpu=native -Os                 xop
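
Note on the xop output: GCC reports "target specific option mismatch" when an always_inline intrinsic such as _mm_roti_epi64 is called from a function compiled without the matching target option (here XOP). One common way to keep such code buildable is to guard the intrinsic with the __XOP__ macro, which GCC predefines when XOP code generation is enabled, and to fall back to plain C otherwise. The following is a sketch under that assumption, not the approach taken by SUPERCOP or by the xop implementation. The name rotr11_pair is hypothetical.

```c
#include <stdint.h>

#ifdef __XOP__
#include <x86intrin.h>
#endif

/* Hypothetical helper: rotate both 64-bit words in v right by 11 bits. */
static void rotr11_pair(uint64_t v[2])
{
#ifdef __XOP__
    /* XOP path: a negative count makes _mm_roti_epi64 rotate right. */
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    x = _mm_roti_epi64(x, -11);
    _mm_storeu_si128((__m128i *)v, x);
#else
    /* Fallback path: plain C, needs no target-specific option. */
    for (int i = 0; i < 2; i++)
        v[i] = (v[i] >> 11) | (v[i] << (64 - 11));
#endif
}
```

With this shape the same source compiles with or without XOP support. Whether the fallback is competitive with the other implementations in the table is not something this report measures.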

Compiler output

Implementation: crypto_hash/blake512/vect128
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
nist.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
vector.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 12, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  vect128 vect128-inplace
gcc -funroll-loops -march=native -mcpu=native -O3  vect128 vect128-inplace
gcc -funroll-loops -march=native -mcpu=native -Os  vect128 vect128-inplace
gcc -march=native -mcpu=native -O2                 vect128 vect128-inplace
gcc -march=native -mcpu=native -O3                 vect128 vect128-inplace
gcc -march=native -mcpu=native -Os                 vect128 vect128-inplace

Compiler output

Implementation: crypto_hash/blake512/vect128-xop
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
nist.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
vector.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
vector.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
vector.c: from vector.h:29,
vector.c: from vector.c:7:
vector.c: vector.c: In function 'round512':
vector.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:266:1: error: inlining failed in call to always_inline '_mm_roti_epi64': target specific option mismatch
vector.c: _mm_roti_epi64(__m128i __A, const int __B)
vector.c: ^~~~~~~~~~~~~~
vector.c: vector.c:745:8: note: called from here
vector.c: B1 = v64_rotate(B1, 64-11); \
vector.c:
vector.c: vector.c:756:36: note: in expansion of macro 'ROUND'
vector.c: ROUND(12); ROUND(13); ROUND(14); ROUND(15);
vector.c: ^~~~~
vector.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
vector.c: from vector.h:29,
vector.c: from vector.c:7:
vector.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:266:1: error: inlining failed in call to always_inline '_mm_roti_epi64': target specific option mismatch
vector.c: _mm_roti_epi64(__m128i __A, const int __B)
vector.c: ^~~~~~~~~~~~~~
vector.c: vector.c:744:8: note: called from here
vector.c: B0 = v64_rotate(B0, 64-11); \
vector.c:
vector.c: vector.c:756:36: note: in expansion of macro 'ROUND'
vector.c: ROUND(12); ROUND(13); ROUND(14); ROUND(15);
vector.c: ...

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  vect128-xop
gcc -funroll-loops -march=native -mcpu=native -O3  vect128-xop
gcc -funroll-loops -march=native -mcpu=native -Os  vect128-xop
gcc -march=native -mcpu=native -O2                 vect128-xop
gcc -march=native -mcpu=native -O3                 vect128-xop
gcc -march=native -mcpu=native -Os                 vect128-xop
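
Note on the vect128-xop output: the failing call is written as v64_rotate(B1, 64-11), while the xop implementation writes the same step as _mm_roti_epi64(row2h, -11). Assuming v64_rotate is a left rotate (which its expansion into _mm_roti_epi64 suggests), the two spellings agree, because rotating a 64-bit word left by 64-11 bits equals rotating it right by 11 bits. A small sketch of that identity in plain C, with hypothetical helper names rotl64 and rotr64:

```c
#include <stdint.h>

/* Hypothetical helpers; n must be in the range 1..63. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

static uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}
```

For any 64-bit x, rotl64(x, 64 - 11) and rotr64(x, 11) return the same value.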

Compiler output

Implementation: crypto_hash/blake512/bswap
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 48, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  bswap ref regs sandy sse2 sse2s sse41 ssse3
gcc -funroll-loops -march=native -mcpu=native -O3  bswap ref regs sandy sse2 sse2s sse41 ssse3
gcc -funroll-loops -march=native -mcpu=native -Os  bswap ref regs sandy sse2 sse2s sse41 ssse3
gcc -march=native -mcpu=native -O2                 bswap ref regs sandy sse2 sse2s sse41 ssse3
gcc -march=native -mcpu=native -O3                 bswap ref regs sandy sse2 sse2s sse41 ssse3
gcc -march=native -mcpu=native -Os                 bswap ref regs sandy sse2 sse2s sse41 ssse3

Compiler output

Implementation: crypto_hash/blake512/avxicc
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  avxicc
gcc -funroll-loops -march=native -mcpu=native -O3  avxicc
gcc -funroll-loops -march=native -mcpu=native -Os  avxicc
gcc -march=native -mcpu=native -O2                 avxicc
gcc -march=native -mcpu=native -O3                 avxicc
gcc -march=native -mcpu=native -Os                 avxicc