Implementation notes: amd64, par, crypto_hash/keccak

Computer: par
Architecture: amd64
CPU ID: GenuineIntel-000406c3-bfebfbff
SUPERCOP version: 20161026
Operation: crypto_hash
Primitive: keccak
Time  Implementation  Compiler  Benchmark date  SUPERCOP version
40420  opt64lcu24  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
40700  opt64lcu24  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
40740  opt64lcu24  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
40760  opt64lcu6  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
40760  x86_64_asm  gcc -march=native -mcpu=native -O3  20161214  20161026
40780  x86_64_asm  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
40800  opt64lcu24  gcc -march=native -mcpu=native -Os  20161214  20161026
40820  x86_64_asm  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
40840  x86_64_asm  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
40840  x86_64_asm  gcc -march=native -mcpu=native -O2  20161214  20161026
40860  x86_64_asm  gcc -march=native -mcpu=native -Os  20161214  20161026
41140  opt64lcu6  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
41380  opt64lcu6  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
42100  opt64lcu6  gcc -march=native -mcpu=native -Os  20161214  20161026
42560  opt64lcu6  gcc -march=native -mcpu=native -O2  20161214  20161026
42760  opt64lcu24  gcc -march=native -mcpu=native -O2  20161214  20161026
42760  opt64lcu24  gcc -march=native -mcpu=native -O3  20161214  20161026
42980  opt64lcu6  gcc -march=native -mcpu=native -O3  20161214  20161026
44380  opt64u6  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
44840  opt64u6  gcc -march=native -mcpu=native -Os  20161214  20161026
45440  inplace  gcc -march=native -mcpu=native -Os  20161214  20161026
45480  opt64u6  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
45540  simple  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
45660  inplace  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
45820  simple  gcc -march=native -mcpu=native -Os  20161214  20161026
46500  simple  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
46640  inplace  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
46660  simple  gcc -march=native -mcpu=native -O2  20161214  20161026
46840  inplace  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
46980  opt64u6  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
47440  inplace  gcc -march=native -mcpu=native -O3  20161214  20161026
47480  opt64u6  gcc -march=native -mcpu=native -O2  20161214  20161026
47820  simple  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
48360  opt64u6  gcc -march=native -mcpu=native -O3  20161214  20161026
48720  inplace  gcc -march=native -mcpu=native -O2  20161214  20161026
49400  simple  gcc -march=native -mcpu=native -O3  20161214  20161026
57740  sseu2  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
58720  sseu2  gcc -march=native -mcpu=native -Os  20161214  20161026
60460  sseu2  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
60940  sseu2  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
63540  sseu2  gcc -march=native -mcpu=native -O2  20161214  20161026
64100  sseu2  gcc -march=native -mcpu=native -O3  20161214  20161026
67420  mmxu1  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
68640  mmxu1  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
69180  mmxu1  gcc -march=native -mcpu=native -Os  20161214  20161026
69920  mmxu1  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
75460  compact  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
75460  mmxu1  gcc -march=native -mcpu=native -O3  20161214  20161026
75640  mmxu1  gcc -march=native -mcpu=native -O2  20161214  20161026
83500  compact  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
89960  opt32bi-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
92980  opt32biT-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
93180  opt32biT-s2lcu4  gcc -march=native -mcpu=native -Os  20161214  20161026
95180  opt32bi-s2lcu4  gcc -march=native -mcpu=native -O3  20161214  20161026
95520  opt32biT-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
95540  opt32bi-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
96080  opt32bi-s2lcu4  gcc -march=native -mcpu=native -Os  20161214  20161026
96700  opt32biT-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
98800  opt32bi-s2lcu4  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
98920  opt32biT-s2lcu4  gcc -march=native -mcpu=native -O3  20161214  20161026
100440  opt32biT-s2lcu4  gcc -march=native -mcpu=native -O2  20161214  20161026
101580  compact  gcc -march=native -mcpu=native -O3  20161214  20161026
102040  opt32bi-rvku2  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
102660  opt32bi-rvku2  gcc -march=native -mcpu=native -Os  20161214  20161026
102740  simple32bi  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
102860  simple32bi  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
103520  opt32bi-s2lcu4  gcc -march=native -mcpu=native -O2  20161214  20161026
103600  opt32bi-rvku2  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
105080  simple32bi  gcc -march=native -mcpu=native -Os  20161214  20161026
106100  inplace32bi  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
106500  inplace32bi  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
107760  opt32bi-rvku2  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
108240  inplace32bi  gcc -march=native -mcpu=native -Os  20161214  20161026
108340  opt32bi-rvku2  gcc -march=native -mcpu=native -O3  20161214  20161026
108460  simple32bi  gcc -march=native -mcpu=native -O3  20161214  20161026
111040  simple32bi  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
111380  inplace32bi  gcc -march=native -mcpu=native -O3  20161214  20161026
112840  opt32bi-rvku2  gcc -march=native -mcpu=native -O2  20161214  20161026
115480  simple32bi  gcc -march=native -mcpu=native -O2  20161214  20161026
116180  inplace32bi  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
121100  inplace32bi  gcc -march=native -mcpu=native -O2  20161214  20161026
129920  opt64lcu24shld  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
130400  x86_64_shld  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
130440  x86_64_shld  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
130440  x86_64_shld  gcc -march=native -mcpu=native -O2  20161214  20161026
130440  x86_64_shld  gcc -march=native -mcpu=native -O3  20161214  20161026
130500  x86_64_shld  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
130520  x86_64_shld  gcc -march=native -mcpu=native -Os  20161214  20161026
130840  opt64lcu24shld  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
130880  opt64lcu24shld  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
131160  opt64lcu24shld  gcc -march=native -mcpu=native -Os  20161214  20161026
132380  opt64lcu24shld  gcc -march=native -mcpu=native -O3  20161214  20161026
132420  opt64lcu24shld  gcc -march=native -mcpu=native -O2  20161214  20161026
186320  compact  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026
186920  compact  gcc -march=native -mcpu=native -Os  20161214  20161026
191280  compact  gcc -march=native -mcpu=native -O2  20161214  20161026
283420  compact8  gcc -funroll-loops -march=native -mcpu=native -O3  20161214  20161026
306100  compact8  gcc -march=native -mcpu=native -O3  20161214  20161026
331500  compact8  gcc -funroll-loops -march=native -mcpu=native -O2  20161214  20161026
383380  compact8  gcc -march=native -mcpu=native -O2  20161214  20161026
433660  compact8  gcc -march=native -mcpu=native -Os  20161214  20161026
434180  compact8  gcc -funroll-loops -march=native -mcpu=native -Os  20161214  20161026

Compiler output

Implementation: crypto_hash/keccak/compact
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-compact.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 compact
gcc -funroll-loops -march=native -mcpu=native -O3 compact
gcc -funroll-loops -march=native -mcpu=native -Os compact
gcc -march=native -mcpu=native -O2 compact
gcc -march=native -mcpu=native -O3 compact
gcc -march=native -mcpu=native -Os compact

Compiler output

Implementation: crypto_hash/keccak/compact8
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-compact8.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 compact8
gcc -funroll-loops -march=native -mcpu=native -O3 compact8
gcc -funroll-loops -march=native -mcpu=native -Os compact8
gcc -march=native -mcpu=native -O2 compact8
gcc -march=native -mcpu=native -O3 compact8
gcc -march=native -mcpu=native -Os compact8

Compiler output

Implementation: crypto_hash/keccak/inplace
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-inplace.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 inplace
gcc -funroll-loops -march=native -mcpu=native -O3 inplace
gcc -funroll-loops -march=native -mcpu=native -Os inplace
gcc -march=native -mcpu=native -O2 inplace
gcc -march=native -mcpu=native -O3 inplace
gcc -march=native -mcpu=native -Os inplace

Compiler output

Implementation: crypto_hash/keccak/inplace32bi
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-inplace32BI.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 inplace32bi
gcc -funroll-loops -march=native -mcpu=native -O3 inplace32bi
gcc -funroll-loops -march=native -mcpu=native -Os inplace32bi
gcc -march=native -mcpu=native -O2 inplace32bi
gcc -march=native -mcpu=native -O3 inplace32bi
gcc -march=native -mcpu=native -Os inplace32bi

Compiler output

Implementation: crypto_hash/keccak/simple
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-simple.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 simple
gcc -funroll-loops -march=native -mcpu=native -O3 simple
gcc -funroll-loops -march=native -mcpu=native -Os simple
gcc -march=native -mcpu=native -O2 simple
gcc -march=native -mcpu=native -O3 simple
gcc -march=native -mcpu=native -Os simple

Compiler output

Implementation: crypto_hash/keccak/simple32bi
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
Keccak-simple32BI.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 simple32bi
gcc -funroll-loops -march=native -mcpu=native -O3 simple32bi
gcc -funroll-loops -march=native -mcpu=native -Os simple32bi
gcc -march=native -mcpu=native -O2 simple32bi
gcc -march=native -mcpu=native -O3 simple32bi
gcc -march=native -mcpu=native -Os simple32bi

Compiler output

Implementation: crypto_hash/keccak/opt32bi-rvku2
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-opt32.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakSponge.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 18, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -funroll-loops -march=native -mcpu=native -O3 opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -funroll-loops -march=native -mcpu=native -Os opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -march=native -mcpu=native -O2 opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -march=native -mcpu=native -O3 opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4
gcc -march=native -mcpu=native -Os opt32bi-rvku2 opt32bi-s2lcu4 opt32biT-s2lcu4

Compiler output

Implementation: crypto_hash/keccak/xopu24
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-opt64.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakF-1600-opt64.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
KeccakF-1600-opt64.c: from KeccakF-1600-opt64.c:74:
KeccakF-1600-opt64.c: KeccakF-1600-opt64.c: In function 'KeccakPermutationOnWords':
KeccakF-1600-opt64.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:266:1: error: inlining failed in call to always_inline '_mm_roti_epi64': target specific option mismatch
KeccakF-1600-opt64.c: _mm_roti_epi64(__m128i __A, const int __B)
KeccakF-1600-opt64.c: ^~~~~~~~~~~~~~
KeccakF-1600-opt64.c: In file included from KeccakF-1600-opt64.c:130:0:
KeccakF-1600-opt64.c: KeccakF-1600-xop.macros:103:11: note: called from here
KeccakF-1600-opt64.c: Bsusa = ROL6464same(Bsusa, 2); \
KeccakF-1600-opt64.c:
KeccakF-1600-opt64.c: KeccakF-1600-xop.macros:123:36: note: in expansion of macro 'thetaRhoPiChiIotaPrepareTheta'
KeccakF-1600-opt64.c: #define thetaRhoPiChiIota(i, A, E) thetaRhoPiChiIotaPrepareTheta(i, A, E)
KeccakF-1600-opt64.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
KeccakF-1600-opt64.c: KeccakF-1600-unrolling.macros:40:5: note: in expansion of macro 'thetaRhoPiChiIota'
KeccakF-1600-opt64.c: thetaRhoPiChiIota(23, E, A) \
KeccakF-1600-opt64.c: ^~~~~~~~~~~~~~~~~
KeccakF-1600-opt64.c: KeccakF-1600-opt64.c:185:5: note: in expansion of macro 'rounds'
KeccakF-1600-opt64.c: rounds
KeccakF-1600-opt64.c: ^~~~~~
KeccakF-1600-opt64.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/x86intrin.h:54:0,
KeccakF-1600-opt64.c: from KeccakF-1600-opt64.c:74:
KeccakF-1600-opt64.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/xopintrin.h:239:1: error: inlining failed in call to always_inline '_mm_rot_epi64': target specific option mismatch
KeccakF-1600-opt64.c: _mm_rot_epi64(__m128i __A, __m128i __B)
KeccakF-1600-opt64.c: ^~~~~~~~~~~~~
KeccakF-1600-opt64.c: ...

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 xopu24
gcc -funroll-loops -march=native -mcpu=native -O3 xopu24
gcc -funroll-loops -march=native -mcpu=native -Os xopu24
gcc -march=native -mcpu=native -O2 xopu24
gcc -march=native -mcpu=native -O3 xopu24
gcc -march=native -mcpu=native -Os xopu24
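
Note on the xopu24 failure: GCC reports "target specific option mismatch" when an always_inline intrinsic needs an instruction-set extension that is not enabled for the translation unit. XOP is an AMD extension, so on this GenuineIntel machine -march=native presumably left it disabled. The following is a minimal sketch, not the code from KeccakF-1600-xop.macros, of how a build could guard the XOP path on GCC's predefined __XOP__ macro and fall back to a scalar rotate. The helper names rol64 and rol64x2 are hypothetical and do not appear in the Keccak sources.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__XOP__)
#include <x86intrin.h>   /* provides the XOP intrinsics when XOP is enabled */
#endif

/* Hypothetical helper: rotate one 64-bit lane left by n bits, 0 < n < 64. */
static uint64_t rol64(uint64_t x, unsigned n)
{
    assert(n > 0 && n < 64);
    return (x << n) | (x >> (64 - n));
}

/* Hypothetical helper: rotate two 64-bit lanes left by the same amount.
 * The XOP branch is compiled only when the compiler targets XOP, so the
 * intrinsic is never referenced on a target that lacks it. */
static void rol64x2(uint64_t lanes[2], unsigned n)
{
#if defined(__XOP__)
    __m128i v = _mm_loadu_si128((const __m128i *)lanes);
    /* _mm_rot_epi64 rotates each lane left by the count in the same lane. */
    v = _mm_rot_epi64(v, _mm_set1_epi64x((long long)n));
    _mm_storeu_si128((__m128i *)lanes, v);
#else
    lanes[0] = rol64(lanes[0], n);
    lanes[1] = rol64(lanes[1], n);
#endif
}
```

With a guard of this kind the file still compiles on a CPU without XOP, at the cost of measuring the fallback rather than the XOP code. SUPERCOP instead lets the implementation fail to build, which is why xopu24 has no timings in the table above.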

Compiler output

Implementation: crypto_hash/keccak/mmxu1
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-opt64.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakSponge.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 36, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -funroll-loops -march=native -mcpu=native -O3 mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -funroll-loops -march=native -mcpu=native -Os mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -march=native -mcpu=native -O2 mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -march=native -mcpu=native -O3 mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2
gcc -march=native -mcpu=native -Os mmxu1 opt64lcu24 opt64lcu24shld opt64lcu6 opt64u6 sseu2

Compiler output

Implementation: crypto_hash/keccak/x86_64_asm
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-x86-64-asm.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakSponge.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakF-1600-x86-64-gas.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 x86_64_asm
gcc -funroll-loops -march=native -mcpu=native -O3 x86_64_asm
gcc -funroll-loops -march=native -mcpu=native -Os x86_64_asm
gcc -march=native -mcpu=native -O2 x86_64_asm
gcc -march=native -mcpu=native -O3 x86_64_asm
gcc -march=native -mcpu=native -Os x86_64_asm

Compiler output

Implementation: crypto_hash/keccak/x86_64_shld
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
KeccakF-1600-x86-64-asm.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakSponge.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
hash.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
KeccakF-1600-x86-64-shld-gas.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 x86_64_shld
gcc -funroll-loops -march=native -mcpu=native -O3 x86_64_shld
gcc -funroll-loops -march=native -mcpu=native -Os x86_64_shld
gcc -march=native -mcpu=native -O2 x86_64_shld
gcc -march=native -mcpu=native -O3 x86_64_shld
gcc -march=native -mcpu=native -Os x86_64_shld