Implementation notes: amd64, icelake2, crypto_core/invsntrup761

Computer: icelake2
Architecture: amd64
CPU ID: GenuineIntel-000706e5-bfebfbff
SUPERCOP version: 20221005
Operation: crypto_core
Primitive: invsntrup761

Time	Object size	Test size	Implementation	Compiler	Benchmark date	SUPERCOP version
468500	227549 0 0	235034 772 960	`jumpdivsteps`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
477939	239865 0 0	247426 772 1024	`jumpdivsteps`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
482400	229699 0 0	234002 772 928	`jumpdivsteps`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
484974	234932 0 0	248210 764 992	`jumpdivsteps`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
494791	201207 0 0	213108 764 1024	`jumpdivsteps`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
503573	257440 0 0	268930 764 992	`jumpdivsteps`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
543387	3531 0 0	16906 764 992	`avx`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
567556	1766 0 0	13322 764 992	`avx`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
568141	1740 0 0	12897 756 992	`avx`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
589639	193106 0 0	203421 748 960	`jumpdivsteps`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
597071	1354 0 0	11517 748 960	`avx`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
597381	4097 0 0	18426 772 1024	`avx`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
609067	3153 0 0	17410 772 960	`avx`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
646899	259793 0 0	271098 764 992	`jumpdivsteps`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
661901	2266 0 0	14140 764 1024	`avx`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
686962	1677 0 0	12746 772 928	`avx`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
7157627	3995 0 0	17436 772 992	`ref`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
20118131	2906 0 0	17236 780 960	`ref`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
20184339	3722 0 0	18124 780 1024	`ref`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
21805436	2194 0 0	14126 772 1024	`ref`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
24791073	1022 0 0	12644 772 992	`ref`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
28350122	1170 0 0	12268 780 928	`ref`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
29459750	2885 0 0	15972 780 928	`ref`	`clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20221003	20220506
31218706	1084 0 0	12233 756 992	`ref`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506
36125310	893 0 0	11087 756 960	`ref`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20221003	20220506

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE

recip.c: recip.c:94:19: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx'
recip.c: __m256i f0vec = _mm256_set1_epi16(f0);
recip.c: ^
recip.c: recip.c:94:19: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
recip.c: recip.c:95:19: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx'
recip.c: __m256i g0vec = _mm256_set1_epi16(g0);
recip.c: ^
recip.c: recip.c:95:19: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
recip.c: recip.c:96:48: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx'
recip.c: __m256i f0vecqinv = _mm256_mullo_epi16(f0vec,qinvvec);
recip.c: ^
recip.c: recip.c:80:17: note: expanded from macro 'qinvvec'
recip.c: #define qinvvec _mm256_set1_epi16(qinv)
recip.c: ^
recip.c: recip.c:96:48: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
recip.c: recip.c:80:17: note: expanded from macro 'qinvvec'
recip.c: #define qinvvec _mm256_set1_epi16(qinv)
recip.c: ^
recip.c: recip.c:96:23: error: always_inline function '_mm256_mullo_epi16' requires target feature 'avx2', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx2'
recip.c: __m256i f0vecqinv = _mm256_mullo_epi16(f0vec,qinvvec);
recip.c: ^
recip.c: recip.c:96:23: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
recip.c: recip.c:97:48: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx'
recip.c: __m256i g0vecqinv = _mm256_mullo_epi16(g0vec,qinvvec);
recip.c: ^
recip.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:

Compiler	Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE	avx

Compiler output

Implementation: jumpdivsteps
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE

avx-768.c: avx-768.c:544:36: error: invalid output size for constraint '+x'
avx-768.c: __asm__("vpsubw %1,%0,%0" : "+x"(a),"+x"(b));
avx-768.c: ^
avx-768.c: avx-768.c:550:36: error: invalid output size for constraint '+x'
avx-768.c: __asm__("vpaddw %1,%0,%0" : "+x"(a),"+x"(b));
avx-768.c: ^
avx-768.c: 2 errors generated.

Number of similar (compiler,implementation) pairs: 1, namely:

Compiler	Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE	jumpdivsteps