Implementation notes: amd64, shoe, crypto_core/mult3sntrup857

Computer: shoe
Microarchitecture: amd64; Broadwell+AES (306d4)
Architecture: amd64
CPU ID: GenuineIntel-000306d4-bfebfbff
SUPERCOP version: 20240107
Operation: crypto_core
Primitive: mult3sntrup857

Time	Object size	Test size	Implementation	Compiler	Benchmark date	SUPERCOP version
12048	19128 0 0	33728 812 952	`avx800`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
12057	19603 0 0	31736 812 952	`avx`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
12080	21751 0 0	33992 812 952	`avx`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
12150	16996 0 0	31488 812 952	`avx800`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
12357	17247 0 0	30400 812 952	`round2`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
12364	19379 0 0	32640 812 952	`round2`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
13731	15913 0 0	29680 780 984	`avx800`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
14530	17058 0 0	30776 780 984	`avx`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
15030	16910 0 0	30672 780 984	`round2`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
16721	15932 0 0	25166 804 920	`avx`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
16797	14013 0 0	25262 804 920	`avx800`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
17154	11813 0 0	22526 804 920	`round2`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
19476	14945 0 0	26438 804 920	`avx800`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
20357	14206 0 0	24835 756 952	`avx800`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
20718	14933 0 0	25491 756 952	`avx`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
21434	14797 0 0	26800 780 984	`avx800`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
21730	11979 0 0	22611 756 952	`round2`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
22286	15942 0 0	27896 780 984	`avx`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
22490	14835 0 0	26511 772 984	`avx800`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
22924	13724 0 0	25728 780 984	`round2`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
23083	11914 0 0	23615 772 984	`round2`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
23339	15925 0 0	27543 772 984	`avx`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
47014	27346 0 0	36718 804 920	`avx`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
47545	17074 0 0	28262 804 920	`round2`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
290487	1897 0 0	16424 812 952	`ref`	`clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
293140	4249 0 0	18856 812 952	`ref`	`clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
309115	5136 0 0	18872 780 984	`ref`	`gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
508290	2718 0 0	16568 812 920	`ref`	`clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
1378296	471 0 0	11710 804 920	`ref`	`clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
1388136	559 0 0	12030 804 920	`ref`	`clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE`	20231214	20231212
1518355	573 0 0	12544 780 984	`ref`	`gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
2649312	611 0 0	12255 772 984	`ref`	`gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212
3159987	482 0 0	11067 756 952	`ref`	`gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE`	20231214	20231212

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE

mult1024.c: mult1024.c:212:7: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_avx_constbranchindex' that is compiled without support for 'avx'
mult1024.c: x = const_x16(0);
mult1024.c: ^
mult1024.c: mult1024.c:10:19: note: expanded from macro 'const_x16'
mult1024.c: #define const_x16 _mm256_set1_epi16
mult1024.c: ^
mult1024.c: mult1024.c:212:7: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
mult1024.c: mult1024.c:10:19: note: expanded from macro 'const_x16'
mult1024.c: #define const_x16 _mm256_set1_epi16
mult1024.c: ^
mult1024.c: mult1024.c:213:36: error: always_inline function '_mm256_storeu_si256' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_avx_constbranchindex' that is compiled without support for 'avx'
mult1024.c: for (i = p&~15;i < 1024;i += 16) store_x16(&f[i],x);
mult1024.c: ^
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ^
mult1024.c: mult1024.c:213:36: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ^
mult1024.c: mult1024.c:214:36: error: always_inline function '_mm256_storeu_si256' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_avx_constbranchindex' that is compiled without support for 'avx'
mult1024.c: for (i = p&~15;i < 1024;i += 16) store_x16(&g[i],x);
mult1024.c: ^
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:

Compiler	Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE	avx

Compiler output

Implementation: avx800
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE

mult1024.c: mult1024.c:212:7: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_avx800_constbranchindex' that is compiled without support for 'avx'
mult1024.c: x = const_x16(0);
mult1024.c: ^
mult1024.c: mult1024.c:10:19: note: expanded from macro 'const_x16'
mult1024.c: #define const_x16 _mm256_set1_epi16
mult1024.c: ^
mult1024.c: mult1024.c:212:7: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
mult1024.c: mult1024.c:10:19: note: expanded from macro 'const_x16'
mult1024.c: #define const_x16 _mm256_set1_epi16
mult1024.c: ^
mult1024.c: mult1024.c:213:36: error: always_inline function '_mm256_storeu_si256' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_avx800_constbranchindex' that is compiled without support for 'avx'
mult1024.c: for (i = p&~15;i < 1024;i += 16) store_x16(&f[i],x);
mult1024.c: ^
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ^
mult1024.c: mult1024.c:213:36: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ^
mult1024.c: mult1024.c:214:36: error: always_inline function '_mm256_storeu_si256' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_avx800_constbranchindex' that is compiled without support for 'avx'
mult1024.c: for (i = p&~15;i < 1024;i += 16) store_x16(&g[i],x);
mult1024.c: ^
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:

Compiler	Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE	avx800

Compiler output

Implementation: round2
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE

mult1024.c: mult1024.c:212:7: error: always_inline function '_mm256_set1_epi16' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_round2_constbranchindex' that is compiled without support for 'avx'
mult1024.c: x = const_x16(0);
mult1024.c: ^
mult1024.c: mult1024.c:10:19: note: expanded from macro 'const_x16'
mult1024.c: #define const_x16 _mm256_set1_epi16
mult1024.c: ^
mult1024.c: mult1024.c:212:7: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
mult1024.c: mult1024.c:10:19: note: expanded from macro 'const_x16'
mult1024.c: #define const_x16 _mm256_set1_epi16
mult1024.c: ^
mult1024.c: mult1024.c:213:36: error: always_inline function '_mm256_storeu_si256' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_round2_constbranchindex' that is compiled without support for 'avx'
mult1024.c: for (i = p&~15;i < 1024;i += 16) store_x16(&f[i],x);
mult1024.c: ^
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ^
mult1024.c: mult1024.c:213:36: error: AVX vector argument of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ^
mult1024.c: mult1024.c:214:36: error: always_inline function '_mm256_storeu_si256' requires target feature 'avx', but would be inlined into function 'crypto_core_mult3sntrup857_round2_constbranchindex' that is compiled without support for 'avx'
mult1024.c: for (i = p&~15;i < 1024;i += 16) store_x16(&g[i],x);
mult1024.c: ^
mult1024.c: mult1024.c:9:24: note: expanded from macro 'store_x16'
mult1024.c: #define store_x16(p,v) _mm256_storeu_si256((int16x16 *) (p),(v))
mult1024.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:

Compiler	Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE	round2