Implementation notes: amd64, hiphop, crypto_core/invsntrup761

Computer: hiphop
Microarchitecture: amd64; Haswell+AES (306c3)
Architecture: amd64
CPU ID: GenuineIntel-000306c3-bfebfbff
SUPERCOP version: 20231107
Operation: crypto_core
Primitive: invsntrup761

Time	Object size	Test size	Implementation	Compiler	Benchmark date	SUPERCOP version
658141	242222 0 0	250440 860 960	`jumpdivsteps`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
662369	230843 0 0	238784 860 960	`jumpdivsteps`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
668439	204988 0 0	217466 852 1024	`jumpdivsteps`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
673235	233564 0 0	238472 860 928	`jumpdivsteps`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
708707	242100 0 0	255837 804 992	`jumpdivsteps`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
728649	256817 0 0	268621 804 992	`jumpdivsteps`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
749538	3281 0 0	17904 860 960	`avx`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
754279	4225 0 0	19128 860 960	`avx`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
798876	202442 0 0	213092 788 960	`jumpdivsteps`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
855519	1841 0 0	14213 804 992	`avx`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
858937	266001 0 0	277597 804 992	`jumpdivsteps`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
865456	1526 0 0	13858 852 1024	`avx`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
878117	1693 0 0	13256 860 928	`avx`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
977364	3837 0 0	18205 804 992	`avx`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
985885	1451 0 0	12524 788 960	`avx`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
989587	1775 0 0	13860 796 992	`avx`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
9235189	4075 0 0	18519 812 992	`ref`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
25491126	4096 0 0	19082 868 960	`ref`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
25633312	3088 0 0	17794 868 960	`ref`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
33605837	2885 0 0	16482 868 928	`ref`	`clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
35634750	1170 0 0	12778 868 928	`ref`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
35721243	1061 0 0	13452 860 1024	`ref`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20230530	20230530
36244017	1031 0 0	13431 812 992	`ref`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
38889635	1143 0 0	13188 796 992	`ref`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107
41874781	939 0 0	12030 796 960	`ref`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231107	20231107

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE

recip.c: recip.c:94:19: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx'
recip.c: __m256i f0vec = _mm256_set1_epi16(f0);
recip.c: ^
recip.c: recip.c:94:19: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
recip.c: recip.c:95:19: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx'
recip.c: __m256i g0vec = _mm256_set1_epi16(g0);
recip.c: ^
recip.c: recip.c:95:19: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
recip.c: recip.c:96:48: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx'
recip.c: __m256i f0vecqinv = _mm256_mullo_epi16(f0vec,qinvvec);
recip.c: ^
recip.c: recip.c:80:17: note: expanded from macro 'qinvvec'
recip.c: #define qinvvec _mm256_set1_epi16(qinv)
recip.c: ^
recip.c: recip.c:96:48: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
recip.c: recip.c:80:17: note: expanded from macro 'qinvvec'
recip.c: #define qinvvec _mm256_set1_epi16(qinv)
recip.c: ^
recip.c: recip.c:96:23: error: always_inline function '_mm256_mullo_epi16' requires target feature 'avx2', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx2'
recip.c: __m256i f0vecqinv = _mm256_mullo_epi16(f0vec,qinvvec);
recip.c: ^
recip.c: recip.c:96:23: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
recip.c: recip.c:97:48: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'vectormodq_swapeliminate' that is compiled without support for 'avx'
recip.c: __m256i g0vecqinv = _mm256_mullo_epi16(g0vec,qinvvec);
recip.c: ^
recip.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:

Compiler	Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE	avx

Compiler output

Implementation: jumpdivsteps
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE

avx-768.c: avx-768.c:544:36: error: invalid output size for constraint '+x'
avx-768.c: __asm__("vpsubw %1,%0,%0" : "+x"(a),"+x"(b));
avx-768.c: ^
avx-768.c: avx-768.c:550:36: error: invalid output size for constraint '+x'
avx-768.c: __asm__("vpaddw %1,%0,%0" : "+x"(a),"+x"(b));
avx-768.c: ^
avx-768.c: 2 errors generated.

Number of similar (compiler,implementation) pairs: 1, namely:

Compiler	Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE	jumpdivsteps