Implementation notes: amd64, genji291, crypto_dh/k298

Computer: genji291
Architecture: amd64
CPU ID: GenuineIntel-00050671-bfebfbff
SUPERCOP version: 20180818
Operation: crypto_dh
Primitive: k298
Time    Object size  Test size  Implementation  Compiler                                      Benchmark date  SUPERCOP version
400638  ? ? ?        ? ? ?      ref             icc_-xAVX_-O3_-fomit-frame-pointer            20180820        20180818
400736  ? ? ?        ? ? ?      ref             icc_-xCORE-AVX2_-O3_-fomit-frame-pointer      20180820        20180818
410074  ? ? ?        ? ? ?      ref             icc_-xCORE-AVX-I_-O3_-fomit-frame-pointer     20180820        20180818
420308  ? ? ?        ? ? ?      ref             icc_-xCORE-AVX2_-O2_-fomit-frame-pointer      20180820        20180818
421232  ? ? ?        ? ? ?      ref             icc_-xCOMMON-AVX512_-O3_-fomit-frame-pointer  20180820        20180818
423990  ? ? ?        ? ? ?      ref             icc_-xAVX_-O2_-fomit-frame-pointer            20180820        20180818
426552  ? ? ?        ? ? ?      ref             icc_-xMIC-AVX512_-O2_-fomit-frame-pointer     20180820        20180818
426692  ? ? ?        ? ? ?      ref             icc_-xMIC-AVX512_-O3_-fomit-frame-pointer     20180820        20180818
428624  ? ? ?        ? ? ?      ref             icc_-xCOMMON-AVX512_-O2_-fomit-frame-pointer  20180820        20180818
429618  ? ? ?        ? ? ?      ref             icc_-xCORE-AVX-I_-O2_-fomit-frame-pointer     20180820        20180818
508368  ? ? ?        ? ? ?      ref             icc_-xSSE4.1_-O3_-fomit-frame-pointer         20180820        20180818
515158  ? ? ?        ? ? ?      ref             icc_-xSSE4.2_-O3_-fomit-frame-pointer         20180820        20180818
539000  ? ? ?        ? ? ?      ref             icc_-xSSE4.2_-O2_-fomit-frame-pointer         20180820        20180818
539938  ? ? ?        ? ? ?      ref             icc_-xSSE4.1_-O2_-fomit-frame-pointer         20180820        20180818
547750  ? ? ?        ? ? ?      ref             icc_-no-vec                                   20180820        20180818
551068  ? ? ?        ? ? ?      ref             icc                                           20180820        20180818

Compiler output

Implementation: ref
Security model: unknown
Compiler: cc
dh.c: In file included from dh.c:6:0:
dh.c: ffa.h: In function 'ffa_red_149':
dh.c: ffa.h:18:10: error: incompatible types when assigning to type '__m128i' from type 'int'
dh.c: tp_2 = _mm_clmulepi64_si128(p_149_0, tp_0, 0x00);\
dh.c: ^
dh.c: ffa.h:47:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ffa_red_149_stp(a_00, a_01, tp_0, tp_1, tp_2, p_149_0, p_149_1);
dh.c: ^
dh.c: ffa.h:19:10: error: incompatible types when assigning to type '__m128i' from type 'int'
dh.c: tp_1 = _mm_clmulepi64_si128(p_149_0, tp_0, 0x01);\
dh.c: ^
dh.c: ffa.h:47:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ffa_red_149_stp(a_00, a_01, tp_0, tp_1, tp_2, p_149_0, p_149_1);
dh.c: ^
dh.c: ffa.h:20:10: error: incompatible types when assigning to type '__m128i' from type 'int'
dh.c: tp_0 = _mm_clmulepi64_si128(p_149_1, tp_0, 0x00);\
dh.c: ^
dh.c: ffa.h:47:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ffa_red_149_stp(a_00, a_01, tp_0, tp_1, tp_2, p_149_0, p_149_1);
dh.c: ^
dh.c: ffa.h:18:10: error: incompatible types when assigning to type '__m128i' from type 'int'
dh.c: tp_2 = _mm_clmulepi64_si128(p_149_0, tp_0, 0x00);\
dh.c: ^
dh.c: ffa.h:48:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ffa_red_149_stp(a_00, a_01, tp_0, tp_1, tp_2, p_149_0, p_149_1);
dh.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
cc ref

Compiler output

Implementation: ref
Security model: unknown
Compiler: gcc
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:43,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: smu.h: In function 'smu_3nf_ltr':
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/smmintrin.h:268:1: error: inlining failed in call to always_inline '_mm_cmpeq_epi64': target specific option mismatch
dh.c: _mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
dh.c: ^~~~~~~~~~~~~~~
dh.c: In file included from dh.c:8:
dh.c: smu.h:337:19: note: called from here
dh.c: mask_lps[7] = _mm_cmpeq_epi64(digits[7], dig_sse);
dh.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:43,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/smmintrin.h:268:1: error: inlining failed in call to always_inline '_mm_cmpeq_epi64': target specific option mismatch
dh.c: _mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
dh.c: ^~~~~~~~~~~~~~~
dh.c: In file included from dh.c:8:
dh.c: smu.h:336:19: note: called from here
dh.c: mask_lps[6] = _mm_cmpeq_epi64(digits[6], dig_sse);
dh.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:43,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/smmintrin.h:268:1: error: inlining failed in call to always_inline '_mm_cmpeq_epi64': target specific option mismatch
dh.c: ...

Number of similar (compiler,implementation) pairs: 2, namely:
Compiler Implementations
gcc ref
gcc -funroll-loops ref
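
The `target specific option mismatch` errors in this group arise because `_mm_cmpeq_epi64` is an SSE4.1 intrinsic, and neither `gcc` nor `gcc -funroll-loops` enables SSE4.1 here. As an illustration only (this is not code from the k298 `ref` implementation), the C sketch below shows one way a 64-bit lane equality mask could be built from SSE2 instructions alone, with a scalar fallback on other targets. The function name `cmpeq64x2` and its array-based interface are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 only; no SSE4.1 header is required */
#endif

/* Hypothetical helper: mask[i] is all ones if a[i] == b[i], otherwise 0. */
void cmpeq64x2(const uint64_t a[2], const uint64_t b[2], uint64_t mask[2])
{
    assert(a && b && mask);
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* Compare 32-bit halves first. */
    __m128i eq32 = _mm_cmpeq_epi32(va, vb);
    /* Swap the two halves inside each 64-bit lane, then AND:
       a lane is all ones only if both of its halves matched. */
    __m128i swapped = _mm_shuffle_epi32(eq32, _MM_SHUFFLE(2, 3, 0, 1));
    _mm_storeu_si128((__m128i *)mask, _mm_and_si128(eq32, swapped));
#else
    /* Scalar fallback for targets without SSE2. */
    for (int i = 0; i < 2; i++)
        mask[i] = (a[i] == b[i]) ? UINT64_MAX : 0;
#endif
}
```

The alternative is to leave the source unchanged and add `-msse4.1` (or a `-march` value that implies it) to the compiler options.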

Compiler output

Implementation: ref
Security model: unknown
Compiler: gcc -O2 -fomit-frame-pointer
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:45,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: ffa.h: In function 'ffa_red_149':
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/wmmintrin.h:116:1: error: inlining failed in call to always_inline '_mm_clmulepi64_si128': target specific option mismatch
dh.c: _mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
dh.c: ^~~~~~~~~~~~~~~~~~~~
dh.c: In file included from dh.c:6:
dh.c: ffa.h:20:12: note: called from here
dh.c: tp_0 = _mm_clmulepi64_si128(p_149_1, tp_0, 0x00);\
dh.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dh.c: ffa.h:82:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ffa_red_149_stp(b_00, b_01, tp_0, tp_1, tp_2, p_149_0, p_149_1);
dh.c: ^~~~~~~~~~~~~~~
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:45,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/wmmintrin.h:116:1: error: inlining failed in call to always_inline '_mm_clmulepi64_si128': target specific option mismatch
dh.c: _mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
dh.c: ^~~~~~~~~~~~~~~~~~~~
dh.c: In file included from dh.c:6:
dh.c: ffa.h:19:12: note: called from here
dh.c: tp_1 = _mm_clmulepi64_si128(p_149_0, tp_0, 0x01);\
dh.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dh.c: ffa.h:82:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ...

Number of similar (compiler,implementation) pairs: 84, namely:
Compiler Implementations
gcc -O2 -fomit-frame-pointer ref
gcc -O3 -fomit-frame-pointer ref
gcc -O -fomit-frame-pointer ref
gcc -Os -fomit-frame-pointer ref
gcc -fno-schedule-insns -O2 -fomit-frame-pointer ref
gcc -fno-schedule-insns -O3 -fomit-frame-pointer ref
gcc -fno-schedule-insns -O -fomit-frame-pointer ref
gcc -fno-schedule-insns -Os -fomit-frame-pointer ref
gcc -funroll-loops -O2 -fomit-frame-pointer ref
gcc -funroll-loops -O3 -fomit-frame-pointer ref
gcc -funroll-loops -O -fomit-frame-pointer ref
gcc -funroll-loops -Os -fomit-frame-pointer ref
gcc -funroll-loops -fno-schedule-insns -O2 -fomit-frame-pointer ref
gcc -funroll-loops -fno-schedule-insns -O3 -fomit-frame-pointer ref
gcc -funroll-loops -fno-schedule-insns -O -fomit-frame-pointer ref
gcc -funroll-loops -fno-schedule-insns -Os -fomit-frame-pointer ref
gcc -funroll-loops -m64 -O2 -fomit-frame-pointer ref
gcc -funroll-loops -m64 -O3 -fomit-frame-pointer ref
gcc -funroll-loops -m64 -O -fomit-frame-pointer ref
gcc -funroll-loops -m64 -Os -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=barcelona -O2 -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=barcelona -O3 -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=barcelona -O -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=barcelona -Os -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=k8 -O2 -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=k8 -O3 -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=k8 -O -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=k8 -Os -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=nocona -O2 -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=nocona -O3 -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=nocona -O -fomit-frame-pointer ref
gcc -funroll-loops -m64 -march=nocona -Os -fomit-frame-pointer ref
gcc -funroll-loops -march=barcelona -O2 -fomit-frame-pointer ref
gcc -funroll-loops -march=barcelona -O3 -fomit-frame-pointer ref
gcc -funroll-loops -march=barcelona -O -fomit-frame-pointer ref
gcc -funroll-loops -march=barcelona -Os -fomit-frame-pointer ref
gcc -funroll-loops -march=k8 -O2 -fomit-frame-pointer ref
gcc -funroll-loops -march=k8 -O3 -fomit-frame-pointer ref
gcc -funroll-loops -march=k8 -O -fomit-frame-pointer ref
gcc -funroll-loops -march=k8 -Os -fomit-frame-pointer ref
gcc -funroll-loops -march=nocona -O2 -fomit-frame-pointer ref
gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer ref
gcc -funroll-loops -march=nocona -O -fomit-frame-pointer ref
gcc -funroll-loops -march=nocona -Os -fomit-frame-pointer ref
gcc -m64 -O2 -fomit-frame-pointer ref
gcc -m64 -O3 -fomit-frame-pointer ref
gcc -m64 -O -fomit-frame-pointer ref
gcc -m64 -Os -fomit-frame-pointer ref
gcc -m64 -march=core2 -O2 -fomit-frame-pointer ref
gcc -m64 -march=core2 -O3 -fomit-frame-pointer ref
gcc -m64 -march=core2 -O -fomit-frame-pointer ref
gcc -m64 -march=core2 -Os -fomit-frame-pointer ref
gcc -m64 -march=core2 -msse4.1 -O2 -fomit-frame-pointer ref
gcc -m64 -march=core2 -msse4.1 -O3 -fomit-frame-pointer ref
gcc -m64 -march=core2 -msse4.1 -O -fomit-frame-pointer ref
gcc -m64 -march=core2 -msse4.1 -Os -fomit-frame-pointer ref
gcc -m64 -march=core2 -msse4 -O2 -fomit-frame-pointer ref
gcc -m64 -march=core2 -msse4 -O3 -fomit-frame-pointer ref
gcc -m64 -march=core2 -msse4 -O -fomit-frame-pointer ref
gcc -m64 -march=core2 -msse4 -Os -fomit-frame-pointer ref
gcc -m64 -march=corei7 -O2 -fomit-frame-pointer ref
gcc -m64 -march=corei7 -O3 -fomit-frame-pointer ref
gcc -m64 -march=corei7 -O -fomit-frame-pointer ref
gcc -m64 -march=corei7 -Os -fomit-frame-pointer ref
gcc -m64 -march=k8 -O2 -fomit-frame-pointer ref
gcc -m64 -march=k8 -O3 -fomit-frame-pointer ref
gcc -m64 -march=k8 -O -fomit-frame-pointer ref
gcc -m64 -march=k8 -Os -fomit-frame-pointer ref
gcc -m64 -march=nocona -O2 -fomit-frame-pointer ref
gcc -m64 -march=nocona -O3 -fomit-frame-pointer ref
gcc -m64 -march=nocona -O -fomit-frame-pointer ref
gcc -m64 -march=nocona -Os -fomit-frame-pointer ref
gcc -march=barcelona -O2 -fomit-frame-pointer ref
gcc -march=barcelona -O3 -fomit-frame-pointer ref
gcc -march=barcelona -O -fomit-frame-pointer ref
gcc -march=barcelona -Os -fomit-frame-pointer ref
gcc -march=k8 -O2 -fomit-frame-pointer ref
gcc -march=k8 -O3 -fomit-frame-pointer ref
gcc -march=k8 -O -fomit-frame-pointer ref
gcc -march=k8 -Os -fomit-frame-pointer ref
gcc -march=nocona -O2 -fomit-frame-pointer ref
gcc -march=nocona -O3 -fomit-frame-pointer ref
gcc -march=nocona -O -fomit-frame-pointer ref
gcc -march=nocona -Os -fomit-frame-pointer ref
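
Every compiler line in this group fails the same way: `_mm_clmulepi64_si128` needs the PCLMUL extension, and none of the listed option sets (the default target, `-march=k8`, `-march=nocona`, `-march=barcelona`, `-march=core2` with or without `-msse4`/`-msse4.1`, `-march=corei7`) enables it. The following is a minimal sketch of one possible workaround, assuming GCC 4.9 or later (or a recent Clang) on x86-64: PCLMUL is enabled for a single function with a `target("pclmul")` attribute, and a runtime CPU check selects between it and a portable loop. The names `clmul_lo`, `clmul_lo_pclmul`, and `clmul_lo_portable` are hypothetical and do not come from `ffa.h`.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_PCLMUL_PATH 1
#endif

/* Portable reference: low 64 bits of the carry-less product a*b. */
static uint64_t clmul_lo_portable(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

#ifdef HAVE_PCLMUL_PATH
/* PCLMUL is enabled for this one function only, so the rest of the
   file can still be built without -mpclmul. */
__attribute__((target("pclmul")))
static uint64_t clmul_lo_pclmul(uint64_t a, uint64_t b)
{
    __m128i x = _mm_cvtsi64_si128((long long)a);
    __m128i y = _mm_cvtsi64_si128((long long)b);
    /* Selector 0x00 multiplies the low 64-bit halves of x and y. */
    return (uint64_t)_mm_cvtsi128_si64(_mm_clmulepi64_si128(x, y, 0x00));
}
#endif

/* Dispatch at run time: use the instruction only if the CPU has it. */
uint64_t clmul_lo(uint64_t a, uint64_t b)
{
#ifdef HAVE_PCLMUL_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("pclmul")) {
        uint64_t r = clmul_lo_pclmul(a, b);
        assert(r == clmul_lo_portable(a, b)); /* cross-check in debug builds */
        return r;
    }
#endif
    return clmul_lo_portable(a, b);
}
```

The other common route is to pass `-mpclmul` on the command line, which changes the compiler options rather than the source.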

Compiler output

Implementation: ref
Security model: unknown
Compiler: gcc -m64 -march=barcelona -O2 -fomit-frame-pointer
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:45,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: ffa.h: In function 'ffa_red_149':
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/wmmintrin.h:116:1: error: inlining failed in call to always_inline '_mm_clmulepi64_si128': target specific option mismatch
dh.c: _mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
dh.c: ^~~~~~~~~~~~~~~~~~~~
dh.c: In file included from dh.c:6:
dh.c: ffa.h:20:12: note: called from here
dh.c: tp_0 = _mm_clmulepi64_si128(p_149_1, tp_0, 0x00);\
dh.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dh.c: ffa.h:82:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ffa_red_149_stp(b_00, b_01, tp_0, tp_1, tp_2, p_149_0, p_149_1);
dh.c: ^~~~~~~~~~~~~~~
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:45,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/wmmintrin.h:116:1: error: inlining failed in call to always_inline '_mm_clmulepi64_si128': target specific option mismatch
dh.c: _mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
dh.c: ^~~~~~~~~~~~~~~~~~~~
dh.c: In file included from dh.c:6:
dh.c: ffa.h:19:12: note: called from here
dh.c: tp_1 = _mm_clmulepi64_si128(p_149_0, tp_0, 0x01);\
dh.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dh.c: ffa.h:82:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ...
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:45,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: ffa.h: In function 'ffa_red_149':
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/wmmintrin.h:116:1: error: inlining failed in call to always_inline '_mm_clmulepi64_si128': target specific option mismatch
dh.c: _mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
dh.c: ^~~~~~~~~~~~~~~~~~~~
dh.c: In file included from dh.c:6:
dh.c: ffa.h:20:12: note: called from here
dh.c: tp_0 = _mm_clmulepi64_si128(p_149_1, tp_0, 0x00);\
dh.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dh.c: ffa.h:82:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ffa_red_149_stp(b_00, b_01, tp_0, tp_1, tp_2, p_149_0, p_149_1);
dh.c: ^~~~~~~~~~~~~~~
dh.c: In file included from /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/x86intrin.h:45,
dh.c: from lib.h:2,
dh.c: from dh.c:2:
dh.c: /home_nfs_robin_ib/bdolbeaur/gcc-8.2.0-full+isl/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/wmmintrin.h:116:1: error: inlining failed in call to always_inline '_mm_clmulepi64_si128': target specific option mismatch
dh.c: _mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
dh.c: ^~~~~~~~~~~~~~~~~~~~
dh.c: In file included from dh.c:6:
dh.c: ffa.h:19:12: note: called from here
dh.c: tp_1 = _mm_clmulepi64_si128(p_149_0, tp_0, 0x01);\
dh.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dh.c: ffa.h:82:5: note: in expansion of macro 'ffa_red_149_stp'
dh.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -m64 -march=barcelona -O2 -fomit-frame-pointer ref
gcc -m64 -march=barcelona -O3 -fomit-frame-pointer ref
gcc -m64 -march=barcelona -O -fomit-frame-pointer ref
gcc -m64 -march=barcelona -Os -fomit-frame-pointer ref

Compiler output

Implementation: ref
Security model: unknown
Compiler: gcc -m64 -march=core-avx-i -O2 -fomit-frame-pointer
try.c: /scratch_lustre_DDN7k/bdolbeaur/supercop-20180818/supercop-data/genji291/amd64/lib/knownrandombytes.o: In function `randombytes':
try.c: knownrandombytes.c:(.text+0x...): undefined reference to `_intel_fast_memcpy'
try.c: knownrandombytes.c:(.text+0x...): undefined reference to `_intel_fast_memset'
try.c: /scratch_lustre_DDN7k/bdolbeaur/supercop-20180818/supercop-data/genji291/amd64/lib/libsupercop.a(crypto_stream_chacha20_dolbeau_amd64_avx2-api.o): In function `crypto_stream_chacha20_dolbeau_amd64_avx2':
try.c: api.c:(.text+0x...): undefined reference to `__intel_mic_avx512f_memset'
try.c: /scratch_lustre_DDN7k/bdolbeaur/supercop-20180818/supercop-data/genji291/amd64/lib/libsupercop.a(crypto_stream_chacha20_dolbeau_amd64_avx2-chacha.o): In function `crypto_stream_chacha20_dolbeau_amd64_avx2_ECRYPT_keystream_bytes':
try.c: chacha.c:(.text+0x...): undefined reference to `__intel_mic_avx512f_memset'
try.c: collect2: error: ld returned 1 exit status

Number of similar (compiler,implementation) pairs: 20, namely:
Compiler Implementations
gcc -m64 -march=core-avx-i -O2 -fomit-frame-pointer ref
gcc -m64 -march=core-avx-i -O3 -fomit-frame-pointer ref
gcc -m64 -march=core-avx-i -O -fomit-frame-pointer ref
gcc -m64 -march=core-avx-i -Os -fomit-frame-pointer ref
gcc -m64 -march=core-avx2 -O2 -fomit-frame-pointer ref
gcc -m64 -march=core-avx2 -O3 -fomit-frame-pointer ref
gcc -m64 -march=core-avx2 -O -fomit-frame-pointer ref
gcc -m64 -march=core-avx2 -Os -fomit-frame-pointer ref
gcc -m64 -march=corei7-avx -O2 -fomit-frame-pointer ref
gcc -m64 -march=corei7-avx -O3 -fomit-frame-pointer ref
gcc -m64 -march=corei7-avx -O -fomit-frame-pointer ref
gcc -m64 -march=corei7-avx -Os -fomit-frame-pointer ref
gcc -m64 -march=native -mtune=native -O2 -fomit-frame-pointer ref
gcc -m64 -march=native -mtune=native -O3 -fomit-frame-pointer ref
gcc -m64 -march=native -mtune=native -O -fomit-frame-pointer ref
gcc -m64 -march=native -mtune=native -Os -fomit-frame-pointer ref
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv ref