Implementation notes: amd64, cryptothinkx, crypto_sign/lattisigns512

Computer: cryptothinkx
Architecture: amd64
CPU ID: GenuineIntel-00040651-bfebfbff
SUPERCOP version: 20170105
Operation: crypto_sign
Primitive: lattisigns512
Time    Object size  Test size  Implementation  Compiler  Benchmark date  SUPERCOP version
317700  ? ? ?        ? ? ?      avx             gcc_-m64_-march=native_-mtune=native_-O3_-fomit-frame-pointer  20170215  20170105
318174  ? ? ?        ? ? ?      avx             gcc_-m64_-march=core-avx2_-O3_-fomit-frame-pointer  20170215  20170105
322578  ? ? ?        ? ? ?      avx             gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv  20170215  20170105
342219  ? ? ?        ? ? ?      avx             gcc_-m64_-march=native_-mtune=native_-O2_-fomit-frame-pointer  20170215  20170105
343200  ? ? ?        ? ? ?      avx             gcc_-m64_-march=corei7-avx_-O3_-fomit-frame-pointer  20170215  20170105
343842  ? ? ?        ? ? ?      avx             gcc_-m64_-march=core-avx-i_-O3_-fomit-frame-pointer  20170215  20170105
343884  ? ? ?        ? ? ?      avx             gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv  20170215  20170105
353820  ? ? ?        ? ? ?      avx             clang_-O3_-fwrapv_-march=native_-fomit-frame-pointer_-Qunused-arguments  20170215  20170105
353940  ? ? ?        ? ? ?      avx             clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments  20170215  20170105
354600  ? ? ?        ? ? ?      avx             clang_-O3_-fwrapv_-mavx_-maes_-mpclmul_-fomit-frame-pointer_-Qunused-arguments  20170215  20170105
354660  ? ? ?        ? ? ?      avx             clang_-O3_-fwrapv_-mavx_-fomit-frame-pointer_-Qunused-arguments  20170215  20170105
355326  ? ? ?        ? ? ?      avx             clang_-O3_-fwrapv_-march=x86-64_-mcpu=core-avx2_-mavx2_-maes_-mpclmul_-fomit-frame-pointer_-Qunused-arguments  20170215  20170105
366420  ? ? ?        ? ? ?      avx             clang_-O3_-fwrapv_-mavx2_-fomit-frame-pointer_-Qunused-arguments  20170215  20170105
367761  ? ? ?        ? ? ?      avx             gcc_-m64_-march=corei7-avx_-O2_-fomit-frame-pointer  20170215  20170105
375123  ? ? ?        ? ? ?      avx             gcc_-m64_-march=core-avx2_-Os_-fomit-frame-pointer  20170215  20170105
376023  ? ? ?        ? ? ?      avx             gcc_-m64_-march=core-avx2_-O_-fomit-frame-pointer  20170215  20170105
379167  ? ? ?        ? ? ?      avx             gcc_-m64_-march=corei7-avx_-O_-fomit-frame-pointer  20170215  20170105
380632  ? ? ?        ? ? ?      avx             gcc_-m64_-march=core-avx2_-O2_-fomit-frame-pointer  20170215  20170105
382848  ? ? ?        ? ? ?      avx             gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv  20170215  20170105
388434  ? ? ?        ? ? ?      avx             gcc_-m64_-march=native_-mtune=native_-Os_-fomit-frame-pointer  20170215  20170105
390204  ? ? ?        ? ? ?      avx             gcc_-m64_-march=core-avx-i_-O_-fomit-frame-pointer  20170215  20170105
393996  ? ? ?        ? ? ?      avx             gcc_-m64_-march=core-avx-i_-O2_-fomit-frame-pointer  20170215  20170105
398577  ? ? ?        ? ? ?      avx             gcc_-m64_-march=corei7-avx_-Os_-fomit-frame-pointer  20170215  20170105
399888  ? ? ?        ? ? ?      avx             gcc_-m64_-march=core-avx-i_-Os_-fomit-frame-pointer  20170215  20170105
433504  ? ? ?        ? ? ?      avx             gcc_-m64_-march=native_-mtune=native_-O_-fomit-frame-pointer  20170215  20170105
450760  ? ? ?        ? ? ?      avx             gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv  20170215  20170105

Compiler output

Implementation: crypto_sign/lattisigns512/avx
Compiler: cc
ntt_transform.c: ntt_transform.c: In function ‘ntt_transform’:
ntt_transform.c: ntt_transform.c:27:9: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
ntt_transform.c: vpinv = _mm256_set_pd(PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE);
ntt_transform.c: ^
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:834:1: error: inlining failed in call to always_inline ‘_mm256_load_pd’: target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:35:8: error: called from here
ntt_transform.c: ...
ntt_transform.c: vt = _mm256_mul_pd(vx1, vo0);
ntt_transform.c: ^
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:834:1: error: inlining failed in call to always_inline ‘_mm256_load_pd’: target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:256:11: error: called from here
ntt_transform.c: vx1 = _mm256_load_pd(out+offset+s+64);
ntt_transform.c: ^

Number of similar (compiler,implementation) pairs: 87, namely:
Compiler  Implementations
cc avx
gcc avx
gcc -O2 -fomit-frame-pointer avx
gcc -O3 -fomit-frame-pointer avx
gcc -O -fomit-frame-pointer avx
gcc -Os -fomit-frame-pointer avx
gcc -fno-schedule-insns -O2 -fomit-frame-pointer avx
gcc -fno-schedule-insns -O3 -fomit-frame-pointer avx
gcc -fno-schedule-insns -O -fomit-frame-pointer avx
gcc -fno-schedule-insns -Os -fomit-frame-pointer avx
gcc -funroll-loops avx
gcc -funroll-loops -O2 -fomit-frame-pointer avx
gcc -funroll-loops -O3 -fomit-frame-pointer avx
gcc -funroll-loops -O -fomit-frame-pointer avx
gcc -funroll-loops -Os -fomit-frame-pointer avx
gcc -funroll-loops -fno-schedule-insns -O2 -fomit-frame-pointer avx
gcc -funroll-loops -fno-schedule-insns -O3 -fomit-frame-pointer avx
gcc -funroll-loops -fno-schedule-insns -O -fomit-frame-pointer avx
gcc -funroll-loops -fno-schedule-insns -Os -fomit-frame-pointer avx
gcc -funroll-loops -m64 -O2 -fomit-frame-pointer avx
gcc -funroll-loops -m64 -O3 -fomit-frame-pointer avx
gcc -funroll-loops -m64 -O -fomit-frame-pointer avx
gcc -funroll-loops -m64 -Os -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=barcelona -O2 -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=barcelona -O3 -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=barcelona -O -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=barcelona -Os -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=k8 -O2 -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=k8 -O3 -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=k8 -O -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=k8 -Os -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=nocona -O2 -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=nocona -O3 -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=nocona -O -fomit-frame-pointer avx
gcc -funroll-loops -m64 -march=nocona -Os -fomit-frame-pointer avx
gcc -funroll-loops -march=barcelona -O2 -fomit-frame-pointer avx
gcc -funroll-loops -march=barcelona -O3 -fomit-frame-pointer avx
gcc -funroll-loops -march=barcelona -O -fomit-frame-pointer avx
gcc -funroll-loops -march=barcelona -Os -fomit-frame-pointer avx
gcc -funroll-loops -march=k8 -O2 -fomit-frame-pointer avx
gcc -funroll-loops -march=k8 -O3 -fomit-frame-pointer avx
gcc -funroll-loops -march=k8 -O -fomit-frame-pointer avx
gcc -funroll-loops -march=k8 -Os -fomit-frame-pointer avx
gcc -funroll-loops -march=nocona -O2 -fomit-frame-pointer avx
gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer avx
gcc -funroll-loops -march=nocona -O -fomit-frame-pointer avx
gcc -funroll-loops -march=nocona -Os -fomit-frame-pointer avx
gcc -m64 -O2 -fomit-frame-pointer avx
gcc -m64 -O3 -fomit-frame-pointer avx
gcc -m64 -O -fomit-frame-pointer avx
gcc -m64 -Os -fomit-frame-pointer avx
gcc -m64 -march=core2 -O2 -fomit-frame-pointer avx
gcc -m64 -march=core2 -O3 -fomit-frame-pointer avx
gcc -m64 -march=core2 -O -fomit-frame-pointer avx
gcc -m64 -march=core2 -Os -fomit-frame-pointer avx
gcc -m64 -march=core2 -msse4.1 -O2 -fomit-frame-pointer avx
gcc -m64 -march=core2 -msse4.1 -O3 -fomit-frame-pointer avx
gcc -m64 -march=core2 -msse4.1 -O -fomit-frame-pointer avx
gcc -m64 -march=core2 -msse4.1 -Os -fomit-frame-pointer avx
gcc -m64 -march=core2 -msse4 -O2 -fomit-frame-pointer avx
gcc -m64 -march=core2 -msse4 -O3 -fomit-frame-pointer avx
gcc -m64 -march=core2 -msse4 -O -fomit-frame-pointer avx
gcc -m64 -march=core2 -msse4 -Os -fomit-frame-pointer avx
gcc -m64 -march=corei7 -O2 -fomit-frame-pointer avx
gcc -m64 -march=corei7 -O3 -fomit-frame-pointer avx
gcc -m64 -march=corei7 -O -fomit-frame-pointer avx
gcc -m64 -march=corei7 -Os -fomit-frame-pointer avx
gcc -m64 -march=k8 -O2 -fomit-frame-pointer avx
gcc -m64 -march=k8 -O3 -fomit-frame-pointer avx
gcc -m64 -march=k8 -O -fomit-frame-pointer avx
gcc -m64 -march=k8 -Os -fomit-frame-pointer avx
gcc -m64 -march=nocona -O2 -fomit-frame-pointer avx
gcc -m64 -march=nocona -O3 -fomit-frame-pointer avx
gcc -m64 -march=nocona -O -fomit-frame-pointer avx
gcc -m64 -march=nocona -Os -fomit-frame-pointer avx
gcc -march=barcelona -O2 -fomit-frame-pointer avx
gcc -march=barcelona -O3 -fomit-frame-pointer avx
gcc -march=barcelona -O -fomit-frame-pointer avx
gcc -march=barcelona -Os -fomit-frame-pointer avx
gcc -march=k8 -O2 -fomit-frame-pointer avx
gcc -march=k8 -O3 -fomit-frame-pointer avx
gcc -march=k8 -O -fomit-frame-pointer avx
gcc -march=k8 -Os -fomit-frame-pointer avx
gcc -march=nocona -O2 -fomit-frame-pointer avx
gcc -march=nocona -O3 -fomit-frame-pointer avx
gcc -march=nocona -O -fomit-frame-pointer avx
gcc -march=nocona -Os -fomit-frame-pointer avx
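
Note on the errors above: "target specific option mismatch" is what gcc reports when an always_inline AVX intrinsic such as _mm256_load_pd is used in a function compiled without AVX enabled, which is the case for every compiler line in this list (none of them pass -mavx or an AVX-capable -march). One common way to keep such a file buildable is a compile-time guard on the predefined __AVX__ macro. The following is a minimal sketch under that assumption; mul_pd_array is a hypothetical helper for illustration and is not taken from ntt_transform.c.

```c
#include <assert.h>
#include <stddef.h>

/* Only pull in the AVX intrinsics when the compiler was told to target AVX
 * (e.g. via -mavx or an AVX-capable -march); gcc and clang define __AVX__
 * in that case. */
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the lattisigns512 source): element-wise
 * product of two arrays of doubles. n must be a multiple of 4. */
void mul_pd_array(double *out, const double *a, const double *b, size_t n)
{
    assert(n % 4 == 0);
#ifdef __AVX__
    /* AVX path: four doubles per iteration, unaligned loads and stores. */
    for (size_t i = 0; i < n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_mul_pd(va, vb));
    }
#else
    /* Portable fallback used when AVX is not enabled at compile time. */
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
#endif
}
```

With this pattern the same source compiles under plain "cc" as well as under "gcc -mavx"; only the second build uses the vector path. Whether a scalar fallback is appropriate for an implementation directory named avx is a design decision for the implementer and is not implied by the log.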

Compiler output

Implementation: crypto_sign/lattisigns512/avx
Compiler: clang -O3 -fomit-frame-pointer -Qunused-arguments
ntt_transform.c: ntt_transform.c:27:11: error: always_inline function '_mm256_set_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vpinv = _mm256_set_pd(PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:28:11: error: always_inline function '_mm256_set_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vp = _mm256_set_pd(8383489., 8383489., 8383489., 8383489.);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:32:10: error: always_inline function '_mm256_load_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vo10 = _mm256_load_pd(o+pos);
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:33:10: error: always_inline function '_mm256_load_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: ...
ntt_transform.c: vt = _mm256_permute2f128_pd (vx0, vx0, 0x01); // now contains x2,x3,x0,x1
ntt_transform.c: ^
ntt_transform.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/avxintrin.h:297:12: note: expanded from macro '_mm256_permute2f128_pd'
ntt_transform.c: (__m256d)__builtin_ia32_vperm2f128_pd256((__v4df)(__m256d)(V1), ^
ntt_transform.c: ntt_transform.c:55:11: error: always_inline function '_mm256_mul_pd' requires target feature 'sse4.2', but would be inlined into function 'ntt_transform' that is compiled without support for 'sse4.2'
ntt_transform.c: vx0 = _mm256_mul_pd(vx0, neg4);
ntt_transform.c: ^
ntt_transform.c: fatal error: too many errors emitted, stopping now [-ferror-limit=]
ntt_transform.c: 20 errors generated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
clang -O3 -fomit-frame-pointer -Qunused-arguments avx
clang -mcpu=cortex-a8 -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments avx
clang -mcpu=cortex-a9 -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments avx
clang -mcpu=native -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments avx
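
Note on the clang errors above: clang refuses to inline an always_inline intrinsic into a function that lacks the required target feature. An alternative to global flags is to enable AVX per function with the target attribute and choose the code path at run time. The sketch below assumes a gcc or clang recent enough to accept __attribute__((target("avx"))) and __builtin_cpu_supports on x86; the names scale, scale_avx and scale_scalar are hypothetical and do not appear in the lattisigns512 source.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_TARGET 1

/* AVX is enabled for this one function only, so the intrinsics can be
 * inlined here even if the rest of the file is built without -mavx. */
__attribute__((target("avx")))
static void scale_avx(double *x, double factor, size_t n)
{
    __m256d vf = _mm256_set1_pd(factor);
    for (size_t i = 0; i < n; i += 4)
        _mm256_storeu_pd(x + i, _mm256_mul_pd(_mm256_loadu_pd(x + i), vf));
}
#endif

/* Portable fallback for CPUs or compilers without AVX support. */
static void scale_scalar(double *x, double factor, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= factor;
}

/* Hypothetical entry point: multiplies n doubles in place by factor.
 * n must be a multiple of 4. */
void scale(double *x, double factor, size_t n)
{
    assert(n % 4 == 0);
#ifdef HAVE_AVX_TARGET
    /* Run-time check, so the binary still works on CPUs without AVX. */
    if (__builtin_cpu_supports("avx")) {
        scale_avx(x, factor, n);
        return;
    }
#endif
    scale_scalar(x, factor, n);
}
```

This keeps the compiler command line unchanged at the cost of a branch per call. It does not help for the -mcpu=cortex-a8, -mcpu=cortex-a9 and -mfpu=neon lines, which are ARM options and are not expected to produce a working AVX build on amd64.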

Compiler output

Implementation: crypto_sign/lattisigns512/avx
Compiler: gcc -m64 -march=barcelona -O2 -fomit-frame-pointer
ntt_transform.c: ntt_transform.c: In function ‘ntt_transform’:
ntt_transform.c: ntt_transform.c:27:9: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
ntt_transform.c: vpinv = _mm256_set_pd(PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE, PARAM_APPROX_P_INVERSE);
ntt_transform.c: ^
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:834:1: error: inlining failed in call to always_inline ‘_mm256_load_pd’: target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:35:8: error: called from here
ntt_transform.c: ...
ntt_transform.c: vt = _mm256_mul_pd(vx1, vo0);
ntt_transform.c: ^
ntt_transform.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h:41:0,
ntt_transform.c: from ntt_transform.c:9:
ntt_transform.c: /usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:834:1: error: inlining failed in call to always_inline ‘_mm256_load_pd’: target specific option mismatch
ntt_transform.c: _mm256_load_pd (double const *__P)
ntt_transform.c: ^
ntt_transform.c: ntt_transform.c:256:11: error: called from here
ntt_transform.c: vx1 = _mm256_load_pd(out+offset+s+64);
ntt_transform.c: ^

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -m64 -march=barcelona -O2 -fomit-frame-pointer avx
gcc -m64 -march=barcelona -O3 -fomit-frame-pointer avx
gcc -m64 -march=barcelona -O -fomit-frame-pointer avx
gcc -m64 -march=barcelona -Os -fomit-frame-pointer avx