Implementation notes: amd64, intelnuci7, crypto_kem/firesaber2

Computer: intelnuci7
Architecture: amd64
CPU ID: GenuineIntel-000806e9-bfebfbff
SUPERCOP version: 20211108
Operation: crypto_kem
Primitive: firesaber2
Time     Object size  Test size        Implementation  Compiler                                                                             Benchmark date  SUPERCOP version
272012   77154 0 0    100606 784 1984  T:avx2_nttmul   gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108
298302   104247 32 0  127542 824 2144  T:avx2          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108
316548   29436 0 0    49790 784 1984   T:avx2_nttmul   gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108
318678   27989 0 0    48110 784 1984   T:avx2_nttmul   gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20211123        20211108
324118   27578 0 0    46526 776 1952   T:avx2_nttmul   gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108
333288   29069 32 0   49390 824 2144   T:avx2          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108
336678   54226 0 0    75985 784 1952   T:avx2_nttmul   clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20211123        20211108
337090   69166 0 0    91049 784 1952   T:avx2_nttmul   clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
337204   54226 0 0    75985 784 1952   T:avx2_nttmul   clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
341840   28814 32 0   48910 824 2144   T:avx2          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20211123        20211108
345170   27844 32 0   46782 816 2112   T:avx2          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108
373606   93214 32 0   114337 824 2112  T:avx2          clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
373632   82055 32 0   102649 824 2112  T:avx2          clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
374090   82055 32 0   102649 824 2112  T:avx2          clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20211123        20211108
382242   27872 0 0    46423 776 1952   T:avx2_nttmul   clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
405638   29569 32 0   48143 816 2112   T:avx2          clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
546130   78171 0 0    99120 792 1624   T:ref           clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
553904   66292 0 0    87192 792 1624   T:ref           clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20211123        20211108
599974   66292 0 0    87192 792 1624   T:ref           clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
638572   79677 0 0    103133 792 1656  T:ref           gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108
646150   89971 0 0    111608 792 1624  T:ref           clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20211123        20211108
1971366  14194 0 0    32782 784 1624   T:ref           clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20211123        20211108
2113604  13347 0 0    33525 792 1656   T:ref           gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20211123        20211108
2205504  14778 0 0    35197 792 1656   T:ref           gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108
2632420  13201 0 0    32221 784 1624   T:ref           gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20211123        20211108

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
SABER_indcpa.c: In file included from SABER_indcpa.c:9:
SABER_indcpa.c: In file included from ././polymul/toom-cook_4way.c:6:
SABER_indcpa.c: ././polymul/scm_avx.c:43:9: error: always_inline function '_mm256_mullo_epi16' requires target feature 'avx2', but would be inlined into function 'schoolbook_avx_new3_acc' that is compiled without support for 'avx2'
SABER_indcpa.c: temp = _mm256_mullo_epi16 (a0, b1);
SABER_indcpa.c: ^
SABER_indcpa.c: ././polymul/scm_avx.c:45:13: error: always_inline function '_mm256_add_epi16' requires target feature 'avx2', but would be inlined into function 'schoolbook_avx_new3_acc' that is compiled without support for 'avx2'
SABER_indcpa.c: c_avx[1] = _mm256_add_epi16(temp, c_avx[1]);
SABER_indcpa.c: ^
SABER_indcpa.c: ././polymul/scm_avx.c:48:9: error: always_inline function '_mm256_mullo_epi16' requires target feature 'avx2', but would be inlined into function 'schoolbook_avx_new3_acc' that is compiled without support for 'avx2'
SABER_indcpa.c: temp = _mm256_mullo_epi16 (a0, b2);
SABER_indcpa.c: ^
SABER_indcpa.c: ././polymul/scm_avx.c:51:13: error: always_inline function '_mm256_add_epi16' requires target feature 'avx2', but would be inlined into function 'schoolbook_avx_new3_acc' that is compiled without support for 'avx2'
SABER_indcpa.c: c_avx[2] = _mm256_add_epi16(temp, c_avx[2]);
SABER_indcpa.c: ^
SABER_indcpa.c: ././polymul/scm_avx.c:54:9: error: always_inline function '_mm256_mullo_epi16' requires target feature 'avx2', but would be inlined into function 'schoolbook_avx_new3_acc' that is compiled without support for 'avx2'
SABER_indcpa.c: temp = _mm256_mullo_epi16 (a0, b3);
SABER_indcpa.c: ^
SABER_indcpa.c: ././polymul/scm_avx.c:58:13: error: always_inline function '_mm256_add_epi16' requires target feature 'avx2', but would be inlined into function 'schoolbook_avx_new3_acc' that is compiled without support for 'avx2'
SABER_indcpa.c: c_avx[3] = _mm256_add_epi16(temp, c_avx[3]);
SABER_indcpa.c: ^
SABER_indcpa.c: ././polymul/scm_avx.c:60:9: error: always_inline function '_mm256_mullo_epi16' requires target feature 'avx2', but would be inlined into function 'schoolbook_avx_new3_acc' that is compiled without support for 'avx2'
SABER_indcpa.c: temp = _mm256_mullo_epi16 (a0, b4);
SABER_indcpa.c: ^
SABER_indcpa.c: ././polymul/scm_avx.c:65:13: error: always_inline function '_mm256_add_epi16' requires target feature 'avx2', but would be inlined into function 'schoolbook_avx_new3_acc' that is compiled without support for 'avx2'
SABER_indcpa.c: c_avx[4] = _mm256_add_epi16(temp, c_avx[4]);
SABER_indcpa.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2_nttmul
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
poly.c: poly.c:31:26: error: always_inline function '_mm256_set1_epi16' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: const __m256i u_pinv = _mm256_set1_epi16(CRT_U_PINV);
poly.c: ^
poly.c: poly.c:32:21: error: always_inline function '_mm256_set1_epi16' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: const __m256i u = _mm256_set1_epi16(CRT_U);
poly.c: ^
poly.c: poly.c:33:22: error: always_inline function '_mm256_load_si256' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: const __m256i p0 = _mm256_load_si256((__m256i *)&PDATA0[_16XP]);
poly.c: ^
poly.c: poly.c:34:22: error: always_inline function '_mm256_load_si256' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: const __m256i p1 = _mm256_load_si256((__m256i *)&PDATA1[_16XP]);
poly.c: ^
poly.c: poly.c:35:23: error: always_inline function '_mm256_set1_epi16' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: const __m256i mod = _mm256_set1_epi16(KEM_Q-1);
poly.c: ^
poly.c: poly.c:36:30: error: always_inline function '_mm256_load_si256' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: const __m256i mont0_pinv = _mm256_load_si256((__m256i *)&PDATA0[_16XMONT_PINV]);
poly.c: ^
poly.c: poly.c:37:25: error: always_inline function '_mm256_load_si256' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: const __m256i mont0 = _mm256_load_si256((__m256i *)&PDATA0[_16XMONT]);
poly.c: ^
poly.c: poly.c:40:10: error: always_inline function '_mm256_load_si256' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: f0 = _mm256_load_si256((__m256i *)&a->coeffs[16*i]);
poly.c: ^
poly.c: poly.c:41:10: error: always_inline function '_mm256_load_si256' requires target feature 'sse4.2', but would be inlined into function 'nttmul_poly_crt' that is compiled without support for 'sse4.2'
poly.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2_nttmul

Namespace violations

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
SABER_indcpa.o BS2POLq T
SABER_indcpa.o GenMatrix T
SABER_indcpa.o GenSecret T
SABER_indcpa.o H1_avx C
SABER_indcpa.o H2_avx C
SABER_indcpa.o KARA_eval T
SABER_indcpa.o KARA_interpol T
SABER_indcpa.o POL2MSG T
SABER_indcpa.o TC_eval T
SABER_indcpa.o TC_interpol T
SABER_indcpa.o batch_64coefficient_multiplications_new T
SABER_indcpa.o clock1 C
SABER_indcpa.o clock2 C
SABER_indcpa.o clock_arith C
SABER_indcpa.o clock_load C
SABER_indcpa.o clock_matrix C
SABER_indcpa.o clock_matrix_vec C
SABER_indcpa.o clock_mul C
SABER_indcpa.o clock_samp C
SABER_indcpa.o clock_secret C
SABER_indcpa.o count_mul C
SABER_indcpa.o floor_round C
SABER_indcpa.o indcpa_kem_dec T
SABER_indcpa.o indcpa_kem_enc T
SABER_indcpa.o indcpa_kem_keypair T
SABER_indcpa.o int0_avx C
SABER_indcpa.o int30_avx C
SABER_indcpa.o int45_avx C
SABER_indcpa.o inv15_avx C
SABER_indcpa.o inv3_avx C
SABER_indcpa.o inv9_avx C
SABER_indcpa.o load_values T
SABER_indcpa.o mask C
SABER_indcpa.o mask_ar D
SABER_indcpa.o mask_load C
SABER_indcpa.o matrix_vec_count C
SABER_indcpa.o matrix_vector_mul T
SABER_indcpa.o schoolbook_avx_new2 T
SABER_indcpa.o schoolbook_avx_new3_acc T
SABER_indcpa.o toom_cook_4way_avx_n1 T
SABER_indcpa.o transpose_n1 T
SABER_indcpa.o vector_vector_mul T
cbd.o cbd T
fips202.o KeccakF1600_StatePermute T
fips202.o cshake128_simple T
fips202.o cshake128_simple_absorb T
fips202.o cshake128_simple_squeezeblocks T
fips202.o sha3_256 T
fips202.o sha3_512 T
fips202.o shake128 T
kem.o clock1 C
kem.o clock2 C
kem.o clock_arith C
kem.o clock_load C
kem.o clock_matrix C
kem.o clock_matrix_vec C
kem.o clock_mul C
kem.o clock_samp C
kem.o clock_secret C
kem.o count_mul C
kem.o int0_avx C
kem.o int30_avx C
kem.o int45_avx C
kem.o inv15_avx C
kem.o inv3_avx C
kem.o inv9_avx C
kem.o mask C
kem.o matrix_vec_count C
pack_unpack.o BS2POLVEC T
pack_unpack.o BS2POLVECp T
pack_unpack.o BS2POLVECq T
pack_unpack.o POLVEC2BS T
pack_unpack.o POLVECp2BS T
pack_unpack.o POLVECq2BS T
pack_unpack.o SABER_pack10bit T
pack_unpack.o SABER_pack11bit T
pack_unpack.o SABER_pack13bit T
pack_unpack.o SABER_pack14bit T
pack_unpack.o SABER_pack_3bit T
pack_unpack.o SABER_pack_4bit T
pack_unpack.o SABER_pack_6bit T
pack_unpack.o SABER_poly_un_pack13bit T
pack_unpack.o SABER_un_pack10bit T
pack_unpack.o SABER_un_pack11bit T
pack_unpack.o SABER_un_pack13bit T
pack_unpack.o SABER_un_pack14bit T
pack_unpack.o SABER_un_pack3bit T
pack_unpack.o SABER_un_pack4bit T
pack_unpack.o SABER_un_pack6bit T
poly.o clock_matrix C
poly.o clock_matrix_vec C
poly.o clock_mul C
poly.o clock_secret C
poly.o count_mul C
poly.o matrix_vec_count C
poly.o poly_getnoise T
verify.o cmov T
verify.o verify T

Number of similar (compiler, implementation) pairs: 8, namely:
Compiler  Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2

Namespace violations

Implementation: T:avx2_nttmul
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
SABER_indcpa.o GenMatrix T
SABER_indcpa.o GenSecret T
SABER_indcpa.o clock_dec_kex C
SABER_indcpa.o clock_enc_kex C
SABER_indcpa.o clock_kp_kex C
SABER_indcpa.o clock_kp_temp C
SABER_indcpa.o clock_matrix C
SABER_indcpa.o clock_mul C
SABER_indcpa.o clock_mv_vv_mul C
SABER_indcpa.o clock_secret C
SABER_indcpa.o count_enc C
SABER_indcpa.o count_mul C
SABER_indcpa.o indcpa_kem_dec T
SABER_indcpa.o indcpa_kem_enc T
SABER_indcpa.o indcpa_kem_keypair T
SABER_indcpa.o int0_avx C
SABER_indcpa.o int30_avx C
SABER_indcpa.o int45_avx C
SABER_indcpa.o inv15_avx C
SABER_indcpa.o inv3_avx C
SABER_indcpa.o inv9_avx C
SABER_indcpa.o mask C
basemul256x1.o nttmul_poly_basemul_montgomery T
basemul256x1.o nttmul_polyvec_basemul_acc_montgomery T
cbd.o cbd T
cbd.o clock_matrix C
cbd.o clock_mul C
cbd.o clock_mv_vv_mul C
cbd.o clock_secret C
cbd.o count_enc C
cbd.o count_mul C
consts256n10753.o nttmul_pdata10753 R
consts256n7681.o nttmul_pdata7681 R
fips202.o KeccakF1600_StatePermute T
fips202.o cshake128_simple T
fips202.o cshake128_simple_absorb T
fips202.o cshake128_simple_squeezeblocks T
fips202.o sha3_256 T
fips202.o sha3_512 T
fips202.o shake128 T
invntt256n.o nttmul_poly_invntt_tomont T
kem.o clock_dec_kex C
kem.o clock_enc_kex C
kem.o clock_kp_kex C
kem.o clock_kp_temp C
kem.o clock_matrix C
kem.o clock_mul C
kem.o clock_mv_vv_mul C
kem.o clock_secret C
kem.o count_enc C
kem.o count_mul C
kem.o int0_avx C
kem.o int30_avx C
kem.o int45_avx C
kem.o inv15_avx C
kem.o inv3_avx C
kem.o inv9_avx C
kem.o mask C
ntt256n.o nttmul_poly_ntt T
pack_unpack.o BS2POLT T
pack_unpack.o BS2POLVEC T
pack_unpack.o BS2POLVECp T
pack_unpack.o BS2POLVECq T
pack_unpack.o BS2POLq T
pack_unpack.o POL2MSG T
pack_unpack.o POLT2BS T
pack_unpack.o POLVEC2BS T
pack_unpack.o POLVECp2BS T
pack_unpack.o POLVECq2BS T
pack_unpack.o clock_matrix C
pack_unpack.o clock_mul C
pack_unpack.o clock_mv_vv_mul C
pack_unpack.o clock_secret C
pack_unpack.o count_enc C
pack_unpack.o count_mul C
poly.o clock_matrix C
poly.o clock_mul C
poly.o clock_mv_vv_mul C
poly.o clock_secret C
poly.o count_enc C
poly.o count_mul C
poly.o nttmul_poly_add T
poly.o nttmul_poly_crt T
poly.o nttmul_poly_mul T
poly.o nttmul_poly_sub T
polyvec.o clock_matrix C
polyvec.o clock_mul C
polyvec.o clock_mv_vv_mul C
polyvec.o clock_secret C
polyvec.o count_enc C
polyvec.o count_mul C
polyvec.o nttmul_polyvec_crt T
polyvec.o nttmul_polyvec_invntt_tomont T
polyvec.o nttmul_polyvec_iprod T
polyvec.o nttmul_polyvec_iprod2 T
polyvec.o nttmul_polyvec_matrix_vector_mul T
polyvec.o nttmul_polyvec_matrix_vector_mul2 T
polyvec.o nttmul_polyvec_ntt T
verify.o cmov T
verify.o verify T

Number of similar (compiler, implementation) pairs: 8, namely:
Compiler  Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2_nttmul
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2_nttmul
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2_nttmul
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2_nttmul
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2_nttmul
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2_nttmul
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2_nttmul
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2_nttmul

Namespace violations

Implementation: T:ref
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
SABER_indcpa.o GenMatrix T
SABER_indcpa.o InnerProd T
SABER_indcpa.o MatrixVectorMul T
SABER_indcpa.o POL2MSG T
SABER_indcpa.o clock1 C
SABER_indcpa.o clock2 C
SABER_indcpa.o clock_cl_mv C
SABER_indcpa.o clock_cl_sm C
SABER_indcpa.o clock_kp_mv C
SABER_indcpa.o clock_kp_sm C
SABER_indcpa.o indcpa_kem_dec T
SABER_indcpa.o indcpa_kem_enc T
SABER_indcpa.o indcpa_kem_keypair T
SABER_indcpa.o karatsuba_simple T
SABER_indcpa.o pol_mul T
SABER_indcpa.o print_poly2 T
SABER_indcpa.o reduce T
SABER_indcpa.o toom_cook_4way T
cbd.o cbd T
fips202.o KeccakF1600_StatePermute T
fips202.o cshake128_simple T
fips202.o cshake128_simple_absorb T
fips202.o cshake128_simple_squeezeblocks T
fips202.o sha3_256 T
fips202.o sha3_512 T
fips202.o shake128 T
kem.o clock1 C
kem.o clock2 C
kem.o clock_cl_mv C
kem.o clock_cl_sm C
kem.o clock_kp_mv C
kem.o clock_kp_sm C
pack_unpack.o BS2POL T
pack_unpack.o BS2POLVEC T
pack_unpack.o BS2POLVECp T
pack_unpack.o BS2POLVECq T
pack_unpack.o POLVEC2BS T
pack_unpack.o POLVECp2BS T
pack_unpack.o POLVECq2BS T
pack_unpack.o SABER_pack_3bit T
pack_unpack.o SABER_pack_4bit T
pack_unpack.o SABER_pack_6bit T
pack_unpack.o SABER_un_pack3bit T
pack_unpack.o SABER_un_pack4bit T
pack_unpack.o SABER_un_pack6bit T
poly.o GenSecret T
verify.o cmov T
verify.o verify T

Number of similar (compiler, implementation) pairs: 9, namely:
Compiler  Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref