Implementation notes: amd64, cel02, crypto_core/weightsntrup761

Computer: cel02
Architecture: amd64
CPU ID: GenuineIntel-00050657-bfebfbff
SUPERCOP version: 20201130
Operation: crypto_core
Primitive: weightsntrup761
Time  Object size  Test size      Implementation  Compiler                                                                             Benchmark date  SUPERCOP version
66    331 0 0      15637 824 864  avx             gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
76    254 0 0      11092 792 760  avx             clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20201211        20201130
94    323 0 0      12148 816 800  avx             gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20201211        20201130
116   331 0 0      12412 816 800  avx             gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
128   321 0 0      11016 800 800  avx             gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
158   263 0 0      12962 800 760  ref             clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20201211        20201130
2021  226 0 0      16565 824 864  ref             gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
2108  106 0 0      11900 816 800  ref             gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20201211        20201130
2118  100 0 0      10924 792 760  ref             clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20201211        20201130
2320  103 0 0      12164 816 800  ref             gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
2336  97 0 0       10752 800 800  ref             gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x564775abd640: v4i64 = X86ISD::VTRUNC 0x564775abd510
try.c: 0x564775abd510: v16i32 = vselect 0x564775ab6410, 0x564775a3cc70, 0x564775abd3e0
try.c: 0x564775ab6410: v4i1 = X86ISD::PCMPGTM 0x564775a9e090, 0x564775a99c20
try.c: 0x564775a9e090: v4i64 = X86ISD::VBROADCAST 0x564775a40a90
try.c: 0x564775a40a90: i64,ch = load<LD8[%lsr.iv6971]> 0x5647759ae950, 0x564775a91500, undef:i64
try.c: 0x564775a91500: i64,ch = CopyFromReg 0x5647759ae950, Register:i64 %vreg50
try.c: 0x564775a99e80: i64 = Register %vreg50
try.c: 0x564775a41f60: i64 = undef
try.c: 0x564775a99c20: v4i64,ch = CopyFromReg 0x5647759ae950, Register:v4i64 %vreg13
try.c: 0x564775a9e8e0: v4i64 = Register %vreg13
try.c: 0x564775a3cc70: v16i32 = X86ISD::VBROADCAST 0x564775a9e2f0
try.c: 0x564775a9e2f0: i32,ch = load<LD4[ConstantPool]> 0x5647759ae950, 0x564775a45bf0, undef:i64
try.c: 0x564775a45bf0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x564775a83f00: i64 = TargetConstantPool<i32 1> 0
try.c: 0x564775a41f60: i64 = undef
try.c: 0x564775abd3e0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: 0x564775abd2b0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  avx

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5624f3e91e40: v4i64 = X86ISD::VTRUNC 0x5624f3e91d10
try.c: 0x5624f3e91d10: v16i32 = vselect 0x5624f3ea2570, 0x5624f3e43cc0, 0x5624f3e91be0
try.c: 0x5624f3ea2570: v4i1 = X86ISD::PCMPGTM 0x5624f3e8a000, 0x5624f3e85580
try.c: 0x5624f3e8a000: v4i64 = X86ISD::VBROADCAST 0x5624f3e44180
try.c: 0x5624f3e44180: i64,ch = load<LD8[%lsr.iv6971]> 0x5624f3d83a30, 0x5624f3e1d660, undef:i64
try.c: 0x5624f3e1d660: i64,ch = CopyFromReg 0x5624f3d83a30, Register:i64 %vreg50
try.c: 0x5624f3e857e0: i64 = Register %vreg50
try.c: 0x5624f3e21780: i64 = undef
try.c: 0x5624f3e85580: v4i64,ch = CopyFromReg 0x5624f3d83a30, Register:v4i64 %vreg13
try.c: 0x5624f3e8a850: v4i64 = Register %vreg13
try.c: 0x5624f3e43cc0: v16i32 = X86ISD::VBROADCAST 0x5624f3e8a260
try.c: 0x5624f3e8a260: i32,ch = load<LD4[ConstantPool]> 0x5624f3d83a30, 0x5624f3e46660, undef:i64
try.c: 0x5624f3e46660: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5624f3e22100: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5624f3e21780: i64 = undef
try.c: 0x5624f3e91be0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: 0x5624f3e91ab0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  avx

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5604c37a50f0: v4i64 = X86ISD::VTRUNC 0x5604c37a4fc0
try.c: 0x5604c37a4fc0: v16i32 = vselect 0x5604c37a1ad0, 0x5604c3732080, 0x5604c37a4e90
try.c: 0x5604c37a1ad0: v4i1 = X86ISD::PCMPGTM 0x5604c378a760, 0x5604c37862f0
try.c: 0x5604c378a760: v4i64 = X86ISD::VBROADCAST 0x5604c3734670
try.c: 0x5604c3734670: i64,ch = load<LD8[%lsr.iv6971]> 0x5604c369b950, 0x5604c37744d0, undef:i64
try.c: 0x5604c37744d0: i64,ch = CopyFromReg 0x5604c369b950, Register:i64 %vreg50
try.c: 0x5604c3786550: i64 = Register %vreg50
try.c: 0x5604c37306f0: i64 = undef
try.c: 0x5604c37862f0: v4i64,ch = CopyFromReg 0x5604c369b950, Register:v4i64 %vreg13
try.c: 0x5604c378afb0: v4i64 = Register %vreg13
try.c: 0x5604c3732080: v16i32 = X86ISD::VBROADCAST 0x5604c378a9c0
try.c: 0x5604c378a9c0: i32,ch = load<LD4[ConstantPool]> 0x5604c369b950, 0x5604c3733c50, undef:i64
try.c: 0x5604c3733c50: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5604c3738380: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5604c37306f0: i64 = undef
try.c: 0x5604c37a4e90: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: 0x5604c37a4d60: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                            Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  avx

Compiler output

Implementation: avx
Security model: constbranchindex
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
weight.c: weight.c:20:9: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup761_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sum = _mm256_loadu_si256((__m256i *) (in+p-32));
weight.c: ^
weight.c: weight.c:21:10: error: always_inline function '_mm256_set_epi8' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup761_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sum &= endingmask;
weight.c: ^
weight.c: ./params.h:2:20: note: expanded from macro 'endingmask'
weight.c: #define endingmask _mm256_set_epi8(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0)
weight.c: ^
weight.c: weight.c:24:20: error: always_inline function '_mm256_loadu_si256' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup761_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: __m256i bits = _mm256_loadu_si256((__m256i *) in);
weight.c: ^
weight.c: weight.c:25:13: error: always_inline function '_mm256_set1_epi8' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup761_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: bits &= _mm256_set1_epi8(1);
weight.c: ^
weight.c: weight.c:26:11: error: always_inline function '_mm256_add_epi8' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup761_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sum = _mm256_add_epi8(sum,bits);
weight.c: ^
weight.c: weight.c:31:11: error: always_inline function '_mm256_srli_epi16' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup761_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sumhi = _mm256_srli_epi16(sum,8);
weight.c: ^
weight.c: weight.c:32:10: error: always_inline function '_mm256_set1_epi16' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup761_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: sum &= _mm256_set1_epi16(0xff);
weight.c: ^
weight.c: weight.c:33:9: error: always_inline function '_mm256_add_epi16' requires target feature 'sse4.2', but would be inlined into function 'crypto_core_weightsntrup761_avx_constbranchindex' that is compiled without support for 'sse4.2'
weight.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                            Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  avx
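
Note on the computation: the weight.c lines quoted in the errors above load the input in 32-byte blocks, mask each byte with 1, and add the results, which suggests that the primitive counts the input bytes whose lowest bit is set. A minimal portable C sketch of that computation, under that assumption, is given here. The function name weight_low_bits and the explicit length parameter are hypothetical and are not taken from SUPERCOP; the real implementation's input length and output encoding are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not SUPERCOP's API): count the bytes in
   in[0..len-1] whose lowest bit is set. */
static uint16_t weight_low_bits(const unsigned char *in, size_t len)
{
    uint16_t weight = 0;
    size_t i;

    /* A 16-bit counter cannot hold more than 65535 set bits. */
    assert(len <= UINT16_MAX);

    /* Mask and add every byte: no data-dependent branch or index. */
    for (i = 0; i < len; ++i)
        weight += in[i] & 1;

    return weight;
}
```

For this primitive the length would presumably be 761, as the name suggests. The loop touches every byte in the same order regardless of its value, in keeping with the constbranchindex security model listed above.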

Compiler output

Implementation: ref
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x56140ae78fa0: v4i64 = X86ISD::VTRUNC 0x56140ae78e70
try.c: 0x56140ae78e70: v16i32 = vselect 0x56140ae80a90, 0x56140ae181e0, 0x56140ae78d40
try.c: 0x56140ae80a90: v4i1 = X86ISD::PCMPGTM 0x56140ae73970, 0x56140ae6f500
try.c: 0x56140ae73970: v4i64 = X86ISD::VBROADCAST 0x56140ae13660
try.c: 0x56140ae13660: i64,ch = load<LD8[%lsr.iv6971]> 0x56140ad84960, 0x56140ae6a360, undef:i64
try.c: 0x56140ae6a360: i64,ch = CopyFromReg 0x56140ad84960, Register:i64 %vreg50
try.c: 0x56140ae6f760: i64 = Register %vreg50
try.c: 0x56140ae16850: i64 = undef
try.c: 0x56140ae6f500: v4i64,ch = CopyFromReg 0x56140ad84960, Register:v4i64 %vreg13
try.c: 0x56140ae741c0: v4i64 = Register %vreg13
try.c: 0x56140ae181e0: v16i32 = X86ISD::VBROADCAST 0x56140ae73bd0
try.c: 0x56140ae73bd0: i32,ch = load<LD4[ConstantPool]> 0x56140ad84960, 0x56140ae12c40, undef:i64
try.c: 0x56140ae12c40: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x56140ae59350: i64 = TargetConstantPool<i32 1> 0
try.c: 0x56140ae16850: i64 = undef
try.c: 0x56140ae78d40: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: 0x56140ae78c10: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  ref

Compiler output

Implementation: ref
Security model: constbranchindex
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x556188754da0: v4i64 = X86ISD::VTRUNC 0x556188754c70
try.c: 0x556188754c70: v16i32 = vselect 0x556188743800, 0x5561886d8320, 0x556188754b40
try.c: 0x556188743800: v4i1 = X86ISD::PCMPGTM 0x55618873d7b0, 0x556188739340
try.c: 0x55618873d7b0: v4i64 = X86ISD::VBROADCAST 0x5561886d87e0
try.c: 0x5561886d87e0: i64,ch = load<LD8[%lsr.iv6971]> 0x556188636a30, 0x5561886d08e0, undef:i64
try.c: 0x5561886d08e0: i64,ch = CopyFromReg 0x556188636a30, Register:i64 %vreg50
try.c: 0x5561887395a0: i64 = Register %vreg50
try.c: 0x5561886e51d0: i64 = undef
try.c: 0x556188739340: v4i64,ch = CopyFromReg 0x556188636a30, Register:v4i64 %vreg13
try.c: 0x55618873e000: v4i64 = Register %vreg13
try.c: 0x5561886d8320: v16i32 = X86ISD::VBROADCAST 0x55618873da10
try.c: 0x55618873da10: i32,ch = load<LD4[ConstantPool]> 0x556188636a30, 0x5561886dacc0, undef:i64
try.c: 0x5561886dacc0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5561886e5b50: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5561886e51d0: i64 = undef
try.c: 0x556188754b40: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: 0x556188754a10: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  ref

Compiler output

Implementation: ref
Security model: constbranchindex
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55ac09e77210: v4i64 = X86ISD::VTRUNC 0x55ac09e770e0
try.c: 0x55ac09e770e0: v16i32 = vselect 0x55ac09e7f4d0, 0x55ac09e2ec30, 0x55ac09e76fb0
try.c: 0x55ac09e7f4d0: v4i1 = X86ISD::PCMPGTM 0x55ac09e73c00, 0x55ac09e6f790
try.c: 0x55ac09e73c00: v4i64 = X86ISD::VBROADCAST 0x55ac09e13f70
try.c: 0x55ac09e13f70: i64,ch = load<LD8[%lsr.iv6971]> 0x55ac09d84900, 0x55ac09e5edf0, undef:i64
try.c: 0x55ac09e5edf0: i64,ch = CopyFromReg 0x55ac09d84900, Register:i64 %vreg50
try.c: 0x55ac09e6f9f0: i64 = Register %vreg50
try.c: 0x55ac09e2d2a0: i64 = undef
try.c: 0x55ac09e6f790: v4i64,ch = CopyFromReg 0x55ac09d84900, Register:v4i64 %vreg13
try.c: 0x55ac09e74450: v4i64 = Register %vreg13
try.c: 0x55ac09e2ec30: v16i32 = X86ISD::VBROADCAST 0x55ac09e73e60
try.c: 0x55ac09e73e60: i32,ch = load<LD4[ConstantPool]> 0x55ac09d84900, 0x55ac09e13550, undef:i64
try.c: 0x55ac09e13550: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55ac09e57d30: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55ac09e2d2a0: i64 = undef
try.c: 0x55ac09e76fb0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: 0x55ac09e76e80: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                            Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  ref