Implementation notes: amd64, par, crypto_hash/blake256

Computer: par
Architecture: amd64
CPU ID: GenuineIntel-000406c3-bfebfbff
SUPERCOP version: 20161026
Operation: crypto_hash
Primitive: blake256
Time   Implementation   Compiler                                           Benchmark date  SUPERCOP version
24400  sse2             gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
24460  sse2             gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
25180  sse2             gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
25180  vect128-mmxhack  gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
25300  vect128-mmxhack  gcc -march=native -mcpu=native -Os                 20161214        20161026
25340  vect128-mmxhack  gcc -march=native -mcpu=native -O3                 20161214        20161026
25440  vect128-mmxhack  gcc -march=native -mcpu=native -O2                 20161214        20161026
25460  vect128-mmxhack  gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
25520  vect128-mmxhack  gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
25600  sse2             gcc -march=native -mcpu=native -O2                 20161214        20161026
25600  sse2             gcc -march=native -mcpu=native -O3                 20161214        20161026
26520  sse2-2           gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
26580  sse2-2           gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
26860  sse2-2           gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
26860  ssse3            gcc -march=native -mcpu=native -Os                 20161214        20161026
26920  sse2-2           gcc -march=native -mcpu=native -O3                 20161214        20161026
26940  sse2-2           gcc -march=native -mcpu=native -O2                 20161214        20161026
27140  sse2             gcc -march=native -mcpu=native -Os                 20161214        20161026
27140  sse41-2          gcc -march=native -mcpu=native -Os                 20161214        20161026
27220  ssse3            gcc -march=native -mcpu=native -O3                 20161214        20161026
27280  ssse3            gcc -march=native -mcpu=native -O2                 20161214        20161026
27300  sse41            gcc -march=native -mcpu=native -Os                 20161214        20161026
27780  sse41-2          gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
27920  sse41            gcc -march=native -mcpu=native -O3                 20161214        20161026
27960  sse41            gcc -march=native -mcpu=native -O2                 20161214        20161026
27980  sse41-2          gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
28100  sse41-2          gcc -march=native -mcpu=native -O3                 20161214        20161026
28140  sse2-2           gcc -march=native -mcpu=native -Os                 20161214        20161026
28180  sse41-2          gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
28360  sse41-2          gcc -march=native -mcpu=native -O2                 20161214        20161026
28640  ssse3            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
28960  sse41            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
29020  sse41            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
29580  sse41            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
29700  ssse3            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
29720  vect128          gcc -march=native -mcpu=native -Os                 20161214        20161026
29740  ssse3            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
30160  vect128          gcc -march=native -mcpu=native -O3                 20161214        20161026
30300  vect128          gcc -march=native -mcpu=native -O2                 20161214        20161026
30760  vect128          gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
31860  vect128          gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
31920  vect128          gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
37920  bswap            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
37960  bswap            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
37960  bswap            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
37980  regs             gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
38020  regs             gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
38020  regs             gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
38300  bswap            gcc -march=native -mcpu=native -Os                 20161214        20161026
38360  regs             gcc -march=native -mcpu=native -Os                 20161214        20161026
38480  sphlib           gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
38500  sphlib           gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
39020  sandy            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
39100  bswap            gcc -march=native -mcpu=native -O3                 20161214        20161026
39120  regs             gcc -march=native -mcpu=native -O2                 20161214        20161026
39140  regs             gcc -march=native -mcpu=native -O3                 20161214        20161026
39200  bswap            gcc -march=native -mcpu=native -O2                 20161214        20161026
39560  sphlib           gcc -march=native -mcpu=native -O2                 20161214        20161026
39560  sphlib           gcc -march=native -mcpu=native -O3                 20161214        20161026
39900  sandy            gcc -march=native -mcpu=native -Os                 20161214        20161026
39980  sphlib           gcc -march=native -mcpu=native -Os                 20161214        20161026
40320  sphlib           gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
41340  sandy            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
41420  sandy            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
42200  sandy            gcc -march=native -mcpu=native -O3                 20161214        20161026
42220  sandy            gcc -march=native -mcpu=native -O2                 20161214        20161026
48220  sphlib-small     gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
48480  sphlib-small     gcc -march=native -mcpu=native -Os                 20161214        20161026
54280  ref              gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
54900  ref              gcc -march=native -mcpu=native -Os                 20161214        20161026
70700  sphlib-small     gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
71620  sphlib-small     gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
72860  ref              gcc -march=native -mcpu=native -O3                 20161214        20161026
73080  sphlib-small     gcc -march=native -mcpu=native -O2                 20161214        20161026
73080  sphlib-small     gcc -march=native -mcpu=native -O3                 20161214        20161026
74440  ref              gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
75440  ref              gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
81460  ref              gcc -march=native -mcpu=native -O2                 20161214        20161026
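
The table above lists one row per (implementation, compiler) pair, so the best result for an implementation is the lowest time among its rows. The following is a minimal sketch of that reduction, using six rows copied from the table. The helper names best_per_implementation and speedup_over are hypothetical and not part of SUPERCOP, and choosing ref as the baseline is only an illustration.

```python
# A few (time, implementation, compiler) rows copied from the table above.
ROWS = [
    (24400, "sse2", "gcc -funroll-loops -march=native -mcpu=native -O3"),
    (25600, "sse2", "gcc -march=native -mcpu=native -O2"),
    (37920, "bswap", "gcc -funroll-loops -march=native -mcpu=native -O3"),
    (39200, "bswap", "gcc -march=native -mcpu=native -O2"),
    (54280, "ref", "gcc -funroll-loops -march=native -mcpu=native -Os"),
    (81460, "ref", "gcc -march=native -mcpu=native -O2"),
]


def best_per_implementation(rows):
    """Return {implementation: (time, compiler)}, keeping the lowest time."""
    best = {}
    for time, impl, compiler in rows:
        if impl not in best or time < best[impl][0]:
            best[impl] = (time, compiler)
    return best


def speedup_over(best, baseline):
    """Return {implementation: baseline_time / implementation_time}."""
    base_time = best[baseline][0]
    return {impl: base_time / time for impl, (time, _) in best.items()}


best = best_per_implementation(ROWS)
speedups = speedup_over(best, "ref")

# Print fastest first, with the ratio relative to the ref implementation.
for impl, (time, compiler) in sorted(best.items(), key=lambda kv: kv[1][0]):
    print(f"{impl:8s} {time:6d}  {speedups[impl]:.2f}x  {compiler}")
```

On these six rows the sketch keeps 24400 for sse2, 37920 for bswap, and 54280 for ref. Feeding it the full table would give the per-implementation minimum in the same way.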

Test failure

Implementation: crypto_hash/blake256/avxicc
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
error 111

Number of similar (compiler,implementation) pairs: 12, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  avxicc avxs
gcc -funroll-loops -march=native -mcpu=native -O3  avxicc avxs
gcc -funroll-loops -march=native -mcpu=native -Os  avxicc avxs
gcc -march=native -mcpu=native -O2                 avxicc avxs
gcc -march=native -mcpu=native -O3                 avxicc avxs
gcc -march=native -mcpu=native -Os                 avxicc avxs
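
The pair count reported above is the number of compiler rows times the number of implementations listed on each row. As a rough sketch, the rows can be expanded into explicit (compiler, implementation) pairs. The function expand_pairs is a hypothetical helper, and it assumes that the compiler is the command name followed by tokens starting with "-", with every remaining token naming an implementation.

```python
# Rows copied from the "similar pairs" table of the test failure above.
FAILING_ROWS = [
    "gcc -funroll-loops -march=native -mcpu=native -O2 avxicc avxs",
    "gcc -funroll-loops -march=native -mcpu=native -O3 avxicc avxs",
    "gcc -funroll-loops -march=native -mcpu=native -Os avxicc avxs",
    "gcc -march=native -mcpu=native -O2 avxicc avxs",
    "gcc -march=native -mcpu=native -O3 avxicc avxs",
    "gcc -march=native -mcpu=native -Os avxicc avxs",
]


def expand_pairs(rows):
    """Split each row into a compiler command and its implementations."""
    pairs = []
    for row in rows:
        tokens = row.split()
        # Assumption: the compiler is the command name plus every
        # following token that starts with "-" (its flags).
        split_at = 1
        while split_at < len(tokens) and tokens[split_at].startswith("-"):
            split_at += 1
        compiler = " ".join(tokens[:split_at])
        pairs.extend((compiler, impl) for impl in tokens[split_at:])
    return pairs


pairs = expand_pairs(FAILING_ROWS)
print(len(pairs))
for compiler, impl in pairs:
    print(f"{impl:8s} {compiler}")
```

For the six rows above this yields 12 pairs, matching the reported count. Because the rows are split on whitespace, the same sketch also applies to the other "similar pairs" tables in this report.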

Compiler output

Implementation: crypto_hash/blake256/sphlib
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
blake.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 12, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  sphlib sphlib-small
gcc -funroll-loops -march=native -mcpu=native -O3  sphlib sphlib-small
gcc -funroll-loops -march=native -mcpu=native -Os  sphlib sphlib-small
gcc -march=native -mcpu=native -O2                 sphlib sphlib-small
gcc -march=native -mcpu=native -O3                 sphlib sphlib-small
gcc -march=native -mcpu=native -Os                 sphlib sphlib-small

Compiler output

Implementation: crypto_hash/blake256/avxs
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
b256.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  avxs
gcc -funroll-loops -march=native -mcpu=native -O3  avxs
gcc -funroll-loops -march=native -mcpu=native -Os  avxs
gcc -march=native -mcpu=native -O2                 avxs
gcc -march=native -mcpu=native -O3                 avxs
gcc -march=native -mcpu=native -Os                 avxs

Compiler output

Implementation: crypto_hash/blake256/xop
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: hash.c: In function 'blake256_compress':
hash.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:260:1: error: inlining failed in call to always_inline '_mm_roti_epi32': target specific option mismatch
hash.c: _mm_roti_epi32(__m128i __A, const int __B)
hash.c: ^~~~~~~~~~~~~~
hash.c: In file included from blake256.h:127:0,
hash.c: from hash.c:2:
hash.c: rounds.h:19:8: note: called from here
hash.c: row2 = _mm_roti_epi32(row2, -7); \
hash.c: ^
hash.c: rounds.h:59:3: note: in expansion of macro 'G2'
hash.c: G2(row1,row2,row3,row4,buf4); \
hash.c: ^~
hash.c: hash.c:128:3: note: in expansion of macro 'ROUND'
hash.c: ROUND(13);
hash.c: ^~~~~
hash.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
hash.c: from blake256.h:7,
hash.c: from hash.c:2:
hash.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:260:1: error: inlining failed in call to always_inline '_mm_roti_epi32': target specific option mismatch
hash.c: _mm_roti_epi32(__m128i __A, const int __B)
hash.c: ^~~~~~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  xop
gcc -funroll-loops -march=native -mcpu=native -O3  xop
gcc -funroll-loops -march=native -mcpu=native -Os  xop
gcc -march=native -mcpu=native -O2                 xop
gcc -march=native -mcpu=native -O3                 xop
gcc -march=native -mcpu=native -Os                 xop

Compiler output

Implementation: crypto_hash/blake256/vect128
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
nist.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
vector.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 12, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  vect128 vect128-mmxhack
gcc -funroll-loops -march=native -mcpu=native -O3  vect128 vect128-mmxhack
gcc -funroll-loops -march=native -mcpu=native -Os  vect128 vect128-mmxhack
gcc -march=native -mcpu=native -O2                 vect128 vect128-mmxhack
gcc -march=native -mcpu=native -O3                 vect128 vect128-mmxhack
gcc -march=native -mcpu=native -Os                 vect128 vect128-mmxhack

Compiler output

Implementation: crypto_hash/blake256/bswap
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 54, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  bswap ref regs sandy sse2 sse2-2 sse41 sse41-2 ssse3
gcc -funroll-loops -march=native -mcpu=native -O3  bswap ref regs sandy sse2 sse2-2 sse41 sse41-2 ssse3
gcc -funroll-loops -march=native -mcpu=native -Os  bswap ref regs sandy sse2 sse2-2 sse41 sse41-2 ssse3
gcc -march=native -mcpu=native -O2                 bswap ref regs sandy sse2 sse2-2 sse41 sse41-2 ssse3
gcc -march=native -mcpu=native -O3                 bswap ref regs sandy sse2 sse2-2 sse41 sse41-2 ssse3
gcc -march=native -mcpu=native -Os                 bswap ref regs sandy sse2 sse2-2 sse41 sse41-2 ssse3

Compiler output

Implementation: crypto_hash/blake256/avxicc
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
hash.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  avxicc
gcc -funroll-loops -march=native -mcpu=native -O3  avxicc
gcc -funroll-loops -march=native -mcpu=native -Os  avxicc
gcc -march=native -mcpu=native -O2                 avxicc
gcc -march=native -mcpu=native -O3                 avxicc
gcc -march=native -mcpu=native -Os                 avxicc