Implementation notes: amd64, cel02, crypto_core/weightsntrup857

Computer: cel02
Architecture: amd64
CPU ID: GenuineIntel-00050657-bfebfbff
SUPERCOP version: 20201130
Operation: crypto_core
Primitive: weightsntrup857
Time | Object size | Test size | Implementation | Compiler | Benchmark date | SUPERCOP version
80 | 254 0 0 | 11092 792 760 | avx | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20201211 | 20201130
102 | 331 0 0 | 12412 816 800 | avx | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
122 | 331 0 0 | 15637 824 864 | avx | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
142 | 323 0 0 | 12148 816 800 | avx | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
162 | 321 0 0 | 11016 800 800 | avx | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
236 | 1298 0 0 | 16645 824 864 | ref | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
244 | 191 0 0 | 12882 800 760 | ref | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20201211 | 20201130
1434 | 106 0 0 | 11900 816 800 | ref | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
2046 | 97 0 0 | 10752 800 800 | ref | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
2372 | 100 0 0 | 10924 792 760 | ref | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20201211 | 20201130
4234 | 103 0 0 | 12164 816 800 | ref | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55a94e6bd010: v4i64 = X86ISD::VTRUNC 0x55a94e6bcee0
try.c: 0x55a94e6bcee0: v16i32 = vselect 0x55a94e6a6ee0, 0x55a94e646a50, 0x55a94e6bcdb0
try.c: 0x55a94e6a6ee0: v4i1 = X86ISD::PCMPGTM 0x55a94e69f670, 0x55a94e69b200
try.c: 0x55a94e69f670: v4i64 = X86ISD::VBROADCAST 0x55a94e642a30
try.c: 0x55a94e642a30: i64,ch = load<LD8[%lsr.iv6971]> 0x55a94e5b0950, 0x55a94e692770, undef:i64
try.c: 0x55a94e692770: i64,ch = CopyFromReg 0x55a94e5b0950, Register:i64 %vreg50
try.c: 0x55a94e69b460: i64 = Register %vreg50
try.c: 0x55a94e643f00: i64 = undef
try.c: 0x55a94e69b200: v4i64,ch = CopyFromReg 0x55a94e5b0950, Register:v4i64 %vreg13
try.c: 0x55a94e69fec0: v4i64 = Register %vreg13
try.c: 0x55a94e646a50: v16i32 = X86ISD::VBROADCAST 0x55a94e69f8d0
try.c: 0x55a94e69f8d0: i32,ch = load<LD4[ConstantPool]> 0x55a94e5b0950, 0x55a94e63f3d0, undef:i64
try.c: 0x55a94e63f3d0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55a94e667480: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55a94e643f00: i64 = undef
try.c: 0x55a94e6bcdb0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: 0x55a94e6bcc80: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avx

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55e1338b26c0: v4i64 = X86ISD::VTRUNC 0x55e1338b2590
try.c: 0x55e1338b2590: v16i32 = vselect 0x55e133892db0, 0x55e133827430, 0x55e1338b2460
try.c: 0x55e133892db0: v4i1 = X86ISD::PCMPGTM 0x55e13388bf40, 0x55e133888ae0
try.c: 0x55e13388bf40: v4i64 = X86ISD::VBROADCAST 0x55e1338278f0
try.c: 0x55e1338278f0: i64,ch = load<LD8[%lsr.iv6971]> 0x55e133786a10, 0x55e13382d350, undef:i64
try.c: 0x55e13382d350: i64,ch = CopyFromReg 0x55e133786a10, Register:i64 %vreg50
try.c: 0x55e133888d40: i64 = Register %vreg50
try.c: 0x55e133821050: i64 = undef
try.c: 0x55e133888ae0: v4i64,ch = CopyFromReg 0x55e133786a10, Register:v4i64 %vreg13
try.c: 0x55e13388c790: v4i64 = Register %vreg13
try.c: 0x55e133827430: v16i32 = X86ISD::VBROADCAST 0x55e13388c1a0
try.c: 0x55e13388c1a0: i32,ch = load<LD4[ConstantPool]> 0x55e133786a10, 0x55e133829dd0, undef:i64
try.c: 0x55e133829dd0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55e1338219d0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55e133821050: i64 = undef
try.c: 0x55e1338b2460: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: 0x55e1338b2330: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avx

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5602b25c11a0: v4i64 = X86ISD::VTRUNC 0x5602b25c1070
try.c: 0x5602b25c1070: v16i32 = vselect 0x5602b25c8c50, 0x5602b2562350, 0x5602b25c0f40
try.c: 0x5602b25c8c50: v4i1 = X86ISD::PCMPGTM 0x5602b25a6820, 0x5602b25a23b0
try.c: 0x5602b25a6820: v4i64 = X86ISD::VBROADCAST 0x5602b2546aa0
try.c: 0x5602b2546aa0: i64,ch = load<LD8[%lsr.iv6971]> 0x5602b24b7950, 0x5602b2592150, undef:i64
try.c: 0x5602b2592150: i64,ch = CopyFromReg 0x5602b24b7950, Register:i64 %vreg50
try.c: 0x5602b25a2610: i64 = Register %vreg50
try.c: 0x5602b25609c0: i64 = undef
try.c: 0x5602b25a23b0: v4i64,ch = CopyFromReg 0x5602b24b7950, Register:v4i64 %vreg13
try.c: 0x5602b25a7070: v4i64 = Register %vreg13
try.c: 0x5602b2562350: v16i32 = X86ISD::VBROADCAST 0x5602b25a6a80
try.c: 0x5602b25a6a80: i32,ch = load<LD4[ConstantPool]> 0x5602b24b7950, 0x5602b2546080, undef:i64
try.c: 0x5602b2546080: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5602b2529780: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5602b25609c0: i64 = undef
try.c: 0x5602b25c0f40: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: 0x5602b25c0e10: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avx

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
weight.c: weight.c:20:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup857_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sum = _mm256_loadu_si256((__m256i *) (in+p-32));
weight.c: ^
weight.c: weight.c:21:10: error: always_inline function '_mm256_set_epi8' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup857_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sum &= endingmask;
weight.c: ^
weight.c: ./params.h:2:20: note: expanded from macro 'endingmask'
weight.c: #define endingmask _mm256_set_epi8(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0)
weight.c: ^
weight.c: weight.c:24:20: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup857_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: __m256i bits = _mm256_loadu_si256((__m256i *) in);
weight.c: ^
weight.c: weight.c:25:13: error: always_inline function '_mm256_set1_epi8' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup857_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: bits &= _mm256_set1_epi8(1);
weight.c: ^
weight.c: weight.c:26:11: error: always_inline function '_mm256_add_epi8' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup857_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sum = _mm256_add_epi8(sum,bits);
weight.c: ^
weight.c: weight.c:31:11: error: always_inline function '_mm256_srli_epi16' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup857_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sumhi = _mm256_srli_epi16(sum,8);
weight.c: ^
weight.c: weight.c:32:10: error: always_inline function '_mm256_set1_epi16' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup857_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sum &= _mm256_set1_epi16(0xff);
weight.c: ^
weight.c: weight.c:33:9: error: always_inline function '_mm256_add_epi16' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup857_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | avx
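
The weight.c fragments quoted in the errors above (load 32 bytes, mask each byte with 1, add bytewise) suggest that the routine counts the input bytes whose lowest bit is set. A minimal portable sketch of that idea follows. It is not the SUPERCOP ref or avx source: the names sketch_weight and SKETCH_P are hypothetical, the length 857 is assumed from the primitive name, and the output encoding used by the real crypto_core function is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: 857 is assumed from the primitive name
   "weightsntrup857"; the log itself does not state the input length. */
#define SKETCH_P 857

/* Count how many of the n input bytes have their lowest bit set.
   The loop bound depends only on the public length n, and no branch or
   array index depends on the data, in the spirit of the
   "constbranchindex" security model named in the log.
   The 16-bit accumulator is sufficient as long as n <= 65535. */
static uint16_t sketch_weight(const unsigned char *in, size_t n)
{
    uint16_t sum = 0;
    size_t i;

    assert(in != NULL);
    for (i = 0; i < n; ++i)
        sum = (uint16_t)(sum + (in[i] & 1)); /* add the low bit of each byte */
    return sum;
}
```

A scalar loop like this uses no intrinsics, so it would not trigger the always_inline target-feature errors shown for weight.c. Calling sketch_weight(buf, SKETCH_P) on an 857-byte buffer returns the number of bytes with an odd value.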

Compiler output

Implementation: ref
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x555edf936040: v4i64 = X86ISD::VTRUNC 0x555edf935f10
try.c: 0x555edf935f10: v16i32 = vselect 0x555edf9298f0, 0x555edf8d71d0, 0x555edf935de0
try.c: 0x555edf9298f0: v4i1 = X86ISD::PCMPGTM 0x555edf92cea0, 0x555edf928420
try.c: 0x555edf92cea0: v4i64 = X86ISD::VBROADCAST 0x555edf8d4370
try.c: 0x555edf8d4370: i64,ch = load<LD8[%lsr.iv6971]> 0x555edf83d950, 0x555edf916a80, undef:i64
try.c: 0x555edf916a80: i64,ch = CopyFromReg 0x555edf83d950, Register:i64 %vreg50
try.c: 0x555edf928680: i64 = Register %vreg50
try.c: 0x555edf8d5840: i64 = undef
try.c: 0x555edf928420: v4i64,ch = CopyFromReg 0x555edf83d950, Register:v4i64 %vreg13
try.c: 0x555edf92d6f0: v4i64 = Register %vreg13
try.c: 0x555edf8d71d0: v16i32 = X86ISD::VBROADCAST 0x555edf92d100
try.c: 0x555edf92d100: i32,ch = load<LD4[ConstantPool]> 0x555edf83d950, 0x555edf8d3950, undef:i64
try.c: 0x555edf8d3950: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x555edf88ddf0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x555edf8d5840: i64 = undef
try.c: 0x555edf935de0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: 0x555edf935cb0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | ref

Compiler output

Implementation: ref
Security model: constbranchindex
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55a8c6b87bf0: v4i64 = X86ISD::VTRUNC 0x55a8c6b87ac0
try.c: 0x55a8c6b87ac0: v16i32 = vselect 0x55a8c6b73390, 0x55a8c6b1abf0, 0x55a8c6b87990
try.c: 0x55a8c6b73390: v4i1 = X86ISD::PCMPGTM 0x55a8c6b6e400, 0x55a8c6b6ade0
try.c: 0x55a8c6b6e400: v4i64 = X86ISD::VBROADCAST 0x55a8c6b1b0b0
try.c: 0x55a8c6b1b0b0: i64,ch = load<LD8[%lsr.iv6971]> 0x55a8c6a68a30, 0x55a8c6b0a880, undef:i64
try.c: 0x55a8c6b0a880: i64,ch = CopyFromReg 0x55a8c6a68a30, Register:i64 %vreg50
try.c: 0x55a8c6b6b040: i64 = Register %vreg50
try.c: 0x55a8c6b030c0: i64 = undef
try.c: 0x55a8c6b6ade0: v4i64,ch = CopyFromReg 0x55a8c6a68a30, Register:v4i64 %vreg13
try.c: 0x55a8c6b6ec50: v4i64 = Register %vreg13
try.c: 0x55a8c6b1abf0: v16i32 = X86ISD::VBROADCAST 0x55a8c6b6e660
try.c: 0x55a8c6b6e660: i32,ch = load<LD4[ConstantPool]> 0x55a8c6a68a30, 0x55a8c6b08e50, undef:i64
try.c: 0x55a8c6b08e50: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55a8c6b03a40: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55a8c6b030c0: i64 = undef
try.c: 0x55a8c6b87990: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: 0x55a8c6b87860: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | ref

Compiler output

Implementation: ref
Security model: constbranchindex
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55624289b470: v4i64 = X86ISD::VTRUNC 0x55624289b340
try.c: 0x55624289b340: v16i32 = vselect 0x556242887670, 0x556242839dc0, 0x55624289b210
try.c: 0x556242887670: v4i1 = X86ISD::PCMPGTM 0x55624287fae0, 0x55624287b670
try.c: 0x55624287fae0: v4i64 = X86ISD::VBROADCAST 0x556242827570
try.c: 0x556242827570: i64,ch = load<LD8[%lsr.iv6971]> 0x556242790960, 0x5562428764d0, undef:i64
try.c: 0x5562428764d0: i64,ch = CopyFromReg 0x556242790960, Register:i64 %vreg50
try.c: 0x55624287b8d0: i64 = Register %vreg50
try.c: 0x556242828a40: i64 = undef
try.c: 0x55624287b670: v4i64,ch = CopyFromReg 0x556242790960, Register:v4i64 %vreg13
try.c: 0x556242880330: v4i64 = Register %vreg13
try.c: 0x556242839dc0: v16i32 = X86ISD::VBROADCAST 0x55624287fd40
try.c: 0x55624287fd40: i32,ch = load<LD4[ConstantPool]> 0x556242790960, 0x55624281dbd0, undef:i64
try.c: 0x55624281dbd0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x556242826460: i64 = TargetConstantPool<i32 1> 0
try.c: 0x556242828a40: i64 = undef
try.c: 0x55624289b210: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: 0x55624289b0e0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | ref