Implementation notes: amd64, skylake, crypto_sign/lattisigns512

Computer: skylake
Architecture: amd64
CPU ID: GenuineIntel-000506e3-bfebfbff
SUPERCOP version: 20161026
Operation: crypto_sign
Primitive: lattisigns512
Time  Implementation  Compiler  Benchmark date  SUPERCOP version
330774  avx  gcc -m64 -march=native -mtune=native -O3 -fomit-frame-pointer  20161217  20161026
337278  avx  gcc -m64 -march=core-avx2 -O3 -fomit-frame-pointer  20161217  20161026
343056  avx  gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv  20161217  20161026
359192  avx  gcc -m64 -march=core-avx2 -O2 -fomit-frame-pointer  20161217  20161026
362232  avx  gcc -m64 -march=corei7-avx -O3 -fomit-frame-pointer  20161217  20161026
365766  avx  gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv  20161217  20161026
368252  avx  gcc -m64 -march=native -mtune=native -O2 -fomit-frame-pointer  20161217  20161026
373586  avx  clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20161217  20161026
374096  avx  clang -O3 -fwrapv -mavx -fomit-frame-pointer -Qunused-arguments  20161217  20161026
374280  avx  clang -O3 -fwrapv -march=x86-64 -mcpu=core-avx2 -mavx2 -maes -mpclmul -fomit-frame-pointer -Qunused-arguments  20161217  20161026
379722  avx  clang -O3 -fwrapv -mavx -maes -mpclmul -fomit-frame-pointer -Qunused-arguments  20161217  20161026
379862  avx  gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv  20161217  20161026
380396  avx  clang -O3 -fwrapv -march=native -fomit-frame-pointer -Qunused-arguments  20161217  20161026
380858  avx  clang -O3 -fwrapv -mavx2 -fomit-frame-pointer -Qunused-arguments  20161217  20161026
381106  avx  gcc -m64 -march=core-avx2 -Os -fomit-frame-pointer  20161217  20161026
383928  avx  gcc -m64 -march=native -mtune=native -Os -fomit-frame-pointer  20161217  20161026
393406  avx  gcc -m64 -march=core-avx-i -O -fomit-frame-pointer  20161217  20161026
393986  avx  gcc -m64 -march=native -mtune=native -O -fomit-frame-pointer  20161217  20161026
394280  avx  gcc -m64 -march=corei7-avx -O2 -fomit-frame-pointer  20161217  20161026
397302  avx  gcc -m64 -march=core-avx-i -O2 -fomit-frame-pointer  20161217  20161026
401204  avx  gcc -m64 -march=corei7-avx -O -fomit-frame-pointer  20161217  20161026
403002  avx  gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv  20161217  20161026
404274  avx  gcc -m64 -march=core-avx-i -O3 -fomit-frame-pointer  20161217  20161026
408784  avx  gcc -m64 -march=core-avx2 -O -fomit-frame-pointer  20161217  20161026
419414  avx  gcc -m64 -march=core-avx-i -Os -fomit-frame-pointer  20161217  20161026
420732  avx  gcc -m64 -march=corei7-avx -Os -fomit-frame-pointer  20161217  20161026

Compiler output

Implementation: crypto_sign/lattisigns512/avx
Compiler: cc
ntt_transform.c: ntt_transform.c: In function 'ntt_transform':
ntt_transform.c: ntt_transform.c:27:9: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
ntt_transform.c: vpinv = _mm256_set_pd(PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE);
ntt_transform.c: ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avxintrin.h:834:1: error: inlining failed in call to always_inline '_mm256_load_pd': target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^~~~~~~~~~~~~~
ntt_transform.c: ntt_transform.c:35:8: note: called from here
ntt_transform.c: neg4 = _mm256_load_pd(_neg4);
ntt_transform.c: ~~~~~^~~~~~~~~~~~~~~~~~~~~~~
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avxintrin.h:834:1: error: inlining failed in call to always_inline '_mm256_load_pd': target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^~~~~~~~~~~~~~
ntt_transform.c: ntt_transform.c:34:8: note: called from here
ntt_transform.c: neg2 = _mm256_load_pd(_neg2);
ntt_transform.c: ~~~~~^~~~~~~~~~~~~~~~~~~~~~~
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avxintrin.h:834:1: error: inlining failed in call to always_inline '_mm256_load_pd': target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^~~~~~~~~~~~~~
ntt_transform.c: ...

Number of similar (compiler,implementation) pairs: 87, namely:
Compiler  Implementations
cc  avx
gcc  avx
gcc -O2 -fomit-frame-pointer  avx
gcc -O3 -fomit-frame-pointer  avx
gcc -O -fomit-frame-pointer  avx
gcc -Os -fomit-frame-pointer  avx
gcc -fno-schedule-insns -O2 -fomit-frame-pointer  avx
gcc -fno-schedule-insns -O3 -fomit-frame-pointer  avx
gcc -fno-schedule-insns -O -fomit-frame-pointer  avx
gcc -fno-schedule-insns -Os -fomit-frame-pointer  avx
gcc -funroll-loops  avx
gcc -funroll-loops -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -O -fomit-frame-pointer  avx
gcc -funroll-loops -Os -fomit-frame-pointer  avx
gcc -funroll-loops -fno-schedule-insns -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -fno-schedule-insns -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -fno-schedule-insns -O -fomit-frame-pointer  avx
gcc -funroll-loops -fno-schedule-insns -Os -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -O -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -Os -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=barcelona -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=barcelona -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=barcelona -O -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=barcelona -Os -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=k8 -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=k8 -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=k8 -O -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=k8 -Os -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=nocona -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=nocona -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=nocona -O -fomit-frame-pointer  avx
gcc -funroll-loops -m64 -march=nocona -Os -fomit-frame-pointer  avx
gcc -funroll-loops -march=barcelona -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -march=barcelona -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -march=barcelona -O -fomit-frame-pointer  avx
gcc -funroll-loops -march=barcelona -Os -fomit-frame-pointer  avx
gcc -funroll-loops -march=k8 -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -march=k8 -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -march=k8 -O -fomit-frame-pointer  avx
gcc -funroll-loops -march=k8 -Os -fomit-frame-pointer  avx
gcc -funroll-loops -march=nocona -O2 -fomit-frame-pointer  avx
gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer  avx
gcc -funroll-loops -march=nocona -O -fomit-frame-pointer  avx
gcc -funroll-loops -march=nocona -Os -fomit-frame-pointer  avx
gcc -m64 -O2 -fomit-frame-pointer  avx
gcc -m64 -O3 -fomit-frame-pointer  avx
gcc -m64 -O -fomit-frame-pointer  avx
gcc -m64 -Os -fomit-frame-pointer  avx
gcc -m64 -march=core2 -O2 -fomit-frame-pointer  avx
gcc -m64 -march=core2 -O3 -fomit-frame-pointer  avx
gcc -m64 -march=core2 -O -fomit-frame-pointer  avx
gcc -m64 -march=core2 -Os -fomit-frame-pointer  avx
gcc -m64 -march=core2 -msse4.1 -O2 -fomit-frame-pointer  avx
gcc -m64 -march=core2 -msse4.1 -O3 -fomit-frame-pointer  avx
gcc -m64 -march=core2 -msse4.1 -O -fomit-frame-pointer  avx
gcc -m64 -march=core2 -msse4.1 -Os -fomit-frame-pointer  avx
gcc -m64 -march=core2 -msse4 -O2 -fomit-frame-pointer  avx
gcc -m64 -march=core2 -msse4 -O3 -fomit-frame-pointer  avx
gcc -m64 -march=core2 -msse4 -O -fomit-frame-pointer  avx
gcc -m64 -march=core2 -msse4 -Os -fomit-frame-pointer  avx
gcc -m64 -march=corei7 -O2 -fomit-frame-pointer  avx
gcc -m64 -march=corei7 -O3 -fomit-frame-pointer  avx
gcc -m64 -march=corei7 -O -fomit-frame-pointer  avx
gcc -m64 -march=corei7 -Os -fomit-frame-pointer  avx
gcc -m64 -march=k8 -O2 -fomit-frame-pointer  avx
gcc -m64 -march=k8 -O3 -fomit-frame-pointer  avx
gcc -m64 -march=k8 -O -fomit-frame-pointer  avx
gcc -m64 -march=k8 -Os -fomit-frame-pointer  avx
gcc -m64 -march=nocona -O2 -fomit-frame-pointer  avx
gcc -m64 -march=nocona -O3 -fomit-frame-pointer  avx
gcc -m64 -march=nocona -O -fomit-frame-pointer  avx
gcc -m64 -march=nocona -Os -fomit-frame-pointer  avx
gcc -march=barcelona -O2 -fomit-frame-pointer  avx
gcc -march=barcelona -O3 -fomit-frame-pointer  avx
gcc -march=barcelona -O -fomit-frame-pointer  avx
gcc -march=barcelona -Os -fomit-frame-pointer  avx
gcc -march=k8 -O2 -fomit-frame-pointer  avx
gcc -march=k8 -O3 -fomit-frame-pointer  avx
gcc -march=k8 -O -fomit-frame-pointer  avx
gcc -march=k8 -Os -fomit-frame-pointer  avx
gcc -march=nocona -O2 -fomit-frame-pointer  avx
gcc -march=nocona -O3 -fomit-frame-pointer  avx
gcc -march=nocona -O -fomit-frame-pointer  avx
gcc -march=nocona -Os -fomit-frame-pointer  avx
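
The gcc errors above arise because ntt_transform.c calls AVX intrinsics such as _mm256_load_pd while none of the listed option sets enables AVX. As a minimal sketch (not the lattisigns512 code), a compile-time guard on the predefined macro __AVX__ lets one source file build under both kinds of option sets. The helper name scale4 and its scalar fallback are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __AVX__
#include <immintrin.h>   /* only pulled in when the compiler enables AVX */
#endif

/* Hypothetical helper: out[i] = in[i] * factor for four doubles. */
void scale4(double *out, const double *in, double factor)
{
    assert(out != NULL && in != NULL);
#ifdef __AVX__
    /* AVX path: compiled only with options such as -mavx or -march=native
       on an AVX-capable machine. Unaligned load/store avoids any alignment
       requirement on the caller's arrays. */
    __m256d vf = _mm256_set1_pd(factor);
    __m256d vx = _mm256_loadu_pd(in);
    _mm256_storeu_pd(out, _mm256_mul_pd(vx, vf));
#else
    /* Scalar fallback: used when AVX is not enabled at compile time. */
    for (size_t i = 0; i < 4; i++)
        out[i] = in[i] * factor;
#endif
}
```

Calling scale4(out, in, 2.0) with in = {1, 2, 3, 4} fills out with the doubled values on either path. Whether a fallback is appropriate for a benchmark implementation that is meant to be AVX-only is a separate design question; the sketch only shows why the build succeeds or fails depending on the options.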

Compiler output

Implementation: crypto_sign/lattisigns512/avx
Compiler: clang -O3 -fomit-frame-pointer -Qunused-arguments
ntt_transform.c: ntt_transform.c:27:11: error: always_inline function '_mm256_set_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vpinv = _mm256_set_pd(PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:28:11: error: always_inline function '_mm256_set_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vp = _mm256_set_pd(8383489., 8383489., 8383489., 8383489.);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:32:10: error: always_inline function '_mm256_load_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vo10 = _mm256_load_pd(o+pos);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:33:10: error: always_inline function '_mm256_load_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vo20 = _mm256_load_pd(o+pos+4);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:34:10: error: always_inline function '_mm256_load_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: neg2 = _mm256_load_pd(_neg2);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:35:10: error: always_inline function '_mm256_load_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: neg4 = _mm256_load_pd(_neg4);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:41:11: error: always_inline function '_mm256_load_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vx0 = _mm256_load_pd(out+s);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:42:10: error: always_inline function '_mm256_mul_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vt = _mm256_mul_pd(vx0,neg2);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:43:11: error: always_inline function '_mm256_hadd_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
clang -O3 -fomit-frame-pointer -Qunused-arguments  avx
clang -mcpu=cortex-a8 -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  avx
clang -mcpu=cortex-a9 -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  avx
clang -mcpu=native -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  avx

Compiler output

Implementation: crypto_sign/lattisigns512/avx
Compiler: gcc -m64 -march=barcelona -O2 -fomit-frame-pointer
ntt_transform.c: ntt_transform.c: In function 'ntt_transform':
ntt_transform.c: ntt_transform.c:27:9: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
ntt_transform.c: vpinv = _mm256_set_pd(PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE);
ntt_transform.c: ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avxintrin.h:834:1: error: inlining failed in call to always_inline '_mm256_load_pd': target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^~~~~~~~~~~~~~
ntt_transform.c: ntt_transform.c:35:8: note: called from here
ntt_transform.c: neg4 = _mm256_load_pd(_neg4);
ntt_transform.c: ~~~~~^~~~~~~~~~~~~~~~~~~~~~~
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avxintrin.h:834:1: error: inlining failed in call to always_inline '_mm256_load_pd': target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^~~~~~~~~~~~~~
ntt_transform.c: ntt_transform.c:34:8: note: called from here
ntt_transform.c: neg2 = _mm256_load_pd(_neg2);
ntt_transform.c: ~~~~~^~~~~~~~~~~~~~~~~~~~~~~
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avxintrin.h:834:1: error: inlining failed in call to always_inline '_mm256_load_pd': target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^~~~~~~~~~~~~~
ntt_transform.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -m64 -march=barcelona -O2 -fomit-frame-pointer  avx
gcc -m64 -march=barcelona -O3 -fomit-frame-pointer  avx
gcc -m64 -march=barcelona -O -fomit-frame-pointer  avx
gcc -m64 -march=barcelona -Os -fomit-frame-pointer  avx