Implementation notes: amd64, par, crypto_hash/keccakc768

Computer: par
Architecture: amd64
CPU ID: GenuineIntel-000406c3-bfebfbff
SUPERCOP version: 20161026
Operation: crypto_hash
Primitive: keccakc768
Time    Implementation   Compiler                                           Benchmark date  SUPERCOP version
46600   opt64lcu24       gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
46820   opt64lcu24       gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
46900   opt64lcu24       gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
47060   x86_64_asm       gcc -march=native -mcpu=native -O2                 20161214        20161026
47060   opt64lcu24       gcc -march=native -mcpu=native -Os                 20161214        20161026
47100   x86_64_asm       gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
47140   x86_64_asm       gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
47140   x86_64_asm       gcc -march=native -mcpu=native -O3                 20161214        20161026
47240   opt64lcu6        gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
47260   x86_64_asm       gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
47320   x86_64_asm       gcc -march=native -mcpu=native -Os                 20161214        20161026
47480   opt64lcu6        gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
47940   opt64lcu6        gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
48140   opt64lcu6        gcc -march=native -mcpu=native -Os                 20161214        20161026
49120   opt64lcu24       gcc -march=native -mcpu=native -O2                 20161214        20161026
49140   opt64lcu24       gcc -march=native -mcpu=native -O3                 20161214        20161026
49260   opt64lcu6        gcc -march=native -mcpu=native -O2                 20161214        20161026
49580   opt64lcu6        gcc -march=native -mcpu=native -O3                 20161214        20161026
51280   opt64u6          gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
51840   opt64u6          gcc -march=native -mcpu=native -Os                 20161214        20161026
52240   inplace          gcc -march=native -mcpu=native -Os                 20161214        20161026
52320   simple           gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
52360   inplace          gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
52440   opt64u6          gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
52640   simple           gcc -march=native -mcpu=native -Os                 20161214        20161026
53240   simple           gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
53440   opt64u6          gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
53740   simple           gcc -march=native -mcpu=native -O2                 20161214        20161026
53940   inplace          gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
54020   inplace          gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
54520   inplace          gcc -march=native -mcpu=native -O3                 20161214        20161026
54640   simple           gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
54680   opt64u6          gcc -march=native -mcpu=native -O2                 20161214        20161026
55320   opt64u6          gcc -march=native -mcpu=native -O3                 20161214        20161026
55880   inplace          gcc -march=native -mcpu=native -O2                 20161214        20161026
56740   simple           gcc -march=native -mcpu=native -O3                 20161214        20161026
66960   sseu2            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
68220   sseu2            gcc -march=native -mcpu=native -Os                 20161214        20161026
69920   sseu2            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
70320   sseu2            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
73600   sseu2            gcc -march=native -mcpu=native -O3                 20161214        20161026
73660   sseu2            gcc -march=native -mcpu=native -O2                 20161214        20161026
80460   mmxu1            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
80580   mmxu1            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
81060   mmxu1            gcc -march=native -mcpu=native -Os                 20161214        20161026
81820   mmxu1            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
87020   compact          gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
87040   mmxu1            gcc -march=native -mcpu=native -O3                 20161214        20161026
88660   mmxu1            gcc -march=native -mcpu=native -O2                 20161214        20161026
93800   compact          gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
103540  opt32bi-s2lcu4   gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
105860  opt32biT-s2lcu4  gcc -march=native -mcpu=native -Os                 20161214        20161026
105900  opt32biT-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
107680  opt32bi-s2lcu4   gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
107820  opt32bi-s2lcu4   gcc -march=native -mcpu=native -O3                 20161214        20161026
108140  opt32bi-s2lcu4   gcc -march=native -mcpu=native -Os                 20161214        20161026
108200  opt32biT-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
110300  opt32biT-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
111480  opt32bi-s2lcu4   gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
112020  opt32biT-s2lcu4  gcc -march=native -mcpu=native -O3                 20161214        20161026
113160  opt32biT-s2lcu4  gcc -march=native -mcpu=native -O2                 20161214        20161026
114960  opt32bi-rvku2    gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
115860  opt32bi-rvku2    gcc -march=native -mcpu=native -Os                 20161214        20161026
115920  simple32bi       gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
116080  opt32bi-s2lcu4   gcc -march=native -mcpu=native -O2                 20161214        20161026
116840  compact          gcc -march=native -mcpu=native -O3                 20161214        20161026
117860  opt32bi-rvku2    gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
118200  simple32bi       gcc -march=native -mcpu=native -Os                 20161214        20161026
119100  simple32bi       gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
119520  inplace32bi      gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
120180  opt32bi-rvku2    gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
121200  inplace32bi      gcc -march=native -mcpu=native -Os                 20161214        20161026
123060  inplace32bi      gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
124720  simple32bi       gcc -march=native -mcpu=native -O3                 20161214        20161026
124900  opt32bi-rvku2    gcc -march=native -mcpu=native -O3                 20161214        20161026
125060  simple32bi       gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
126800  opt32bi-rvku2    gcc -march=native -mcpu=native -O2                 20161214        20161026
128460  inplace32bi      gcc -march=native -mcpu=native -O3                 20161214        20161026
129640  simple32bi       gcc -march=native -mcpu=native -O2                 20161214        20161026
131040  inplace32bi      gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
135440  inplace32bi      gcc -march=native -mcpu=native -O2                 20161214        20161026
150400  opt64lcu24shld   gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
150520  x86_64_shld      gcc -march=native -mcpu=native -O3                 20161214        20161026
150560  x86_64_shld      gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
150560  x86_64_shld      gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
150560  opt64lcu24shld   gcc -march=native -mcpu=native -Os                 20161214        20161026
150580  x86_64_shld      gcc -march=native -mcpu=native -O2                 20161214        20161026
150680  x86_64_shld      gcc -march=native -mcpu=native -Os                 20161214        20161026
150720  x86_64_shld      gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
151360  opt64lcu24shld   gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
151380  opt64lcu24shld   gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
152600  opt64lcu24shld   gcc -march=native -mcpu=native -O3                 20161214        20161026
152640  opt64lcu24shld   gcc -march=native -mcpu=native -O2                 20161214        20161026
215240  compact          gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
218000  compact          gcc -march=native -mcpu=native -Os                 20161214        20161026
220580  compact          gcc -march=native -mcpu=native -O2                 20161214        20161026
327860  compact8         gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
352720  compact8         gcc -march=native -mcpu=native -O3                 20161214        20161026
384440  compact8         gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
441960  compact8         gcc -march=native -mcpu=native -O2                 20161214        20161026
491100  compact8         gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
497320  compact8         gcc -march=native -mcpu=native -Os                 20161214        20161026

Compiler output

Implementation: crypto_hash/keccakc768/compact
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-compact.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 compact
gcc -funroll-loops -march=native -mcpu=native -O3 compact
gcc -funroll-loops -march=native -mcpu=native -Os compact
gcc -march=native -mcpu=native -O2 compact
gcc -march=native -mcpu=native -O3 compact
gcc -march=native -mcpu=native -Os compact

Compiler output

Implementation: crypto_hash/keccakc768/compact8
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-compact8.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 compact8
gcc -funroll-loops -march=native -mcpu=native -O3 compact8
gcc -funroll-loops -march=native -mcpu=native -Os compact8
gcc -march=native -mcpu=native -O2 compact8
gcc -march=native -mcpu=native -O3 compact8
gcc -march=native -mcpu=native -Os compact8

Compiler output

Implementation: crypto_hash/keccakc768/inplace
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-inplace.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 inplace
gcc -funroll-loops -march=native -mcpu=native -O3 inplace
gcc -funroll-loops -march=native -mcpu=native -Os inplace
gcc -march=native -mcpu=native -O2 inplace
gcc -march=native -mcpu=native -O3 inplace
gcc -march=native -mcpu=native -Os inplace

Compiler output

Implementation: crypto_hash/keccakc768/inplace32bi
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-inplace32BI.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 inplace32bi
gcc -funroll-loops -march=native -mcpu=native -O3 inplace32bi
gcc -funroll-loops -march=native -mcpu=native -Os inplace32bi
gcc -march=native -mcpu=native -O2 inplace32bi
gcc -march=native -mcpu=native -O3 inplace32bi
gcc -march=native -mcpu=native -Os inplace32bi

Compiler output

Implementation: crypto_hash/keccakc768/simple
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-simple.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 simple
gcc -funroll-loops -march=native -mcpu=native -O3 simple
gcc -funroll-loops -march=native -mcpu=native -Os simple
gcc -march=native -mcpu=native -O2 simple
gcc -march=native -mcpu=native -O3 simple
gcc -march=native -mcpu=native -Os simple

Compiler output

Implementation: crypto_hash/keccakc768/simple32bi
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-simple32BI.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 simple32bi
gcc -funroll-loops -march=native -mcpu=native -O3 simple32bi
gcc -funroll-loops -march=native -mcpu=native -Os simple32bi
gcc -march=native -mcpu=native -O2 simple32bi
gcc -march=native -mcpu=native -O3 simple32bi
gcc -march=native -mcpu=native -Os simple32bi

Compiler output

Implementation: crypto_hash/keccakc768/opt32bi-rvku2
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-opt32.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakSponge.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 18, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -funroll-loops -march=native -mcpu=native -O3 opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -funroll-loops -march=native -mcpu=native -Os opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -march=native -mcpu=native -O2 opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -march=native -mcpu=native -O3 opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -march=native -mcpu=native -Os opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4

Compiler output

Implementation: crypto_hash/keccakc768/xopu24
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-opt64.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakF-1600-opt64.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
KeccakF-1600-opt64.c: from KeccakF-1600-opt64.c:74:
KeccakF-1600-opt64.c: KeccakF-1600-opt64.c: In function 'KeccakPermutationOnWords':
KeccakF-1600-opt64.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:266:1: error: inlining failed in call to always_inline '_mm_roti_epi64': target specific option mismatch
KeccakF-1600-opt64.c: _mm_roti_epi64(__m128i __A, const int __B)
KeccakF-1600-opt64.c: ^~~~~~~~~~~~~~
KeccakF-1600-opt64.c: In file included from KeccakF-1600-opt64.c:130:0:
KeccakF-1600-opt64.c: KeccakF-1600-xop.macros:103:11: note: called from here
KeccakF-1600-opt64.c: Bsusa = ROL6464same(Bsusa, 2); \
KeccakF-1600-opt64.c:
KeccakF-1600-opt64.c: KeccakF-1600-xop.macros:123:36: note: in expansion of macro 'thetaRhoPiChiIotaPrepareTheta'
KeccakF-1600-opt64.c: #define thetaRhoPiChiIota(i, A, E) thetaRhoPiChiIotaPrepareTheta(i, A, E)
KeccakF-1600-opt64.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
KeccakF-1600-opt64.c: KeccakF-1600-unrolling.macros:40:5: note: in expansion of macro 'thetaRhoPiChiIota'
KeccakF-1600-opt64.c: thetaRhoPiChiIota(23, E, A) \
KeccakF-1600-opt64.c: ^~~~~~~~~~~~~~~~~
KeccakF-1600-opt64.c: KeccakF-1600-opt64.c:185:5: note: in expansion of macro 'rounds'
KeccakF-1600-opt64.c: rounds
KeccakF-1600-opt64.c: ^~~~~~
KeccakF-1600-opt64.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
KeccakF-1600-opt64.c: from KeccakF-1600-opt64.c:74:
KeccakF-1600-opt64.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:239:1: error: inlining failed in call to always_inline '_mm_rot_epi64': target specific option mismatch
KeccakF-1600-opt64.c: _mm_rot_epi64(__m128i __A, __m128i __B)
KeccakF-1600-opt64.c: ^~~~~~~~~~~~~
KeccakF-1600-opt64.c: ...

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 xopu24
gcc -funroll-loops -march=native -mcpu=native -O3 xopu24
gcc -funroll-loops -march=native -mcpu=native -Os xopu24
gcc -march=native -mcpu=native -O2 xopu24
gcc -march=native -mcpu=native -O3 xopu24
gcc -march=native -mcpu=native -Os xopu24
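
Note on the xopu24 failure: GCC reports "target specific option mismatch" when an always_inline intrinsic such as _mm_roti_epi64 is called from code compiled without the instruction-set extension it requires (here XOP). XOP is an AMD extension, so -march=native on this GenuineIntel machine presumably does not enable it. The following is a minimal sketch of one common way to keep such code buildable, by compiling the XOP path only when GCC predefines __XOP__ and falling back to plain C otherwise. The helper names rol64 and rol64x2_by2 are hypothetical and are not taken from the Keccak sources; the sketch is not the implementation benchmarked here.

```c
#include <stdint.h>

#if defined(__XOP__)
#include <x86intrin.h>   /* declares _mm_roti_epi64 when XOP is enabled */
#endif

/* Portable rotate-left of one 64-bit lane; the count is reduced modulo 64
 * so that a shift by 64 (undefined behaviour in C) can never occur. */
static inline uint64_t rol64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Hypothetical helper: rotate two 64-bit lanes left by 2 bits.
 * Uses the XOP intrinsic only if the compiler was told to target XOP
 * (for example with -mxop); otherwise uses the portable fallback. */
static inline void rol64x2_by2(uint64_t lanes[2])
{
#if defined(__XOP__)
    __m128i v = _mm_loadu_si128((const __m128i *)lanes);
    v = _mm_roti_epi64(v, 2);
    _mm_storeu_si128((__m128i *)lanes, v);
#else
    lanes[0] = rol64(lanes[0], 2);
    lanes[1] = rol64(lanes[1], 2);
#endif
}
```

With a guard of this kind the file compiles under either set of options, but a binary built with XOP enabled would still need an XOP-capable CPU to run.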

Compiler output

Implementation: crypto_hash/keccakc768/mmxu1
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-opt64.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakSponge.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 36, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -funroll-loops -march=native -mcpu=native -O3 mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -funroll-loops -march=native -mcpu=native -Os mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -march=native -mcpu=native -O2 mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -march=native -mcpu=native -O3 mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -march=native -mcpu=native -Os mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2

Compiler output

Implementation: crypto_hash/keccakc768/x86_64_asm
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-x86-64-asm.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakSponge.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakF-1600-x86-64-gas.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 x86_64_asm
gcc -funroll-loops -march=native -mcpu=native -O3 x86_64_asm
gcc -funroll-loops -march=native -mcpu=native -Os x86_64_asm
gcc -march=native -mcpu=native -O2 x86_64_asm
gcc -march=native -mcpu=native -O3 x86_64_asm
gcc -march=native -mcpu=native -Os x86_64_asm

Compiler output

Implementation: crypto_hash/keccakc768/x86_64_shld
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-x86-64-asm.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakSponge.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakF-1600-x86-64-shld-gas.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler  Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 x86_64_shld
gcc -funroll-loops -march=native -mcpu=native -O3 x86_64_shld
gcc -funroll-loops -march=native -mcpu=native -Os x86_64_shld
gcc -march=native -mcpu=native -O2 x86_64_shld
gcc -march=native -mcpu=native -O3 x86_64_shld
gcc -march=native -mcpu=native -Os x86_64_shld