Implementation notes: amd64, cel02, crypto_stream/simon6496ctr

Computer: cel02
Architecture: amd64
CPU ID: GenuineIntel-00050657-bfebfbff
SUPERCOP version: 20201130
Operation: crypto_stream
Primitive: simon6496ctr
Time   Object size  Test size       Implementation  Compiler                                                                             Benchmark date  SUPERCOP version
6068   127786 0 0   144413 824 888  T:avx2          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
8562   88191 0 0    101548 816 856  T:avx2          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
8626   87587 0 0    99480 800 824   T:avx2          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
8640   104994 0 0   116956 792 800  T:avx2          clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20201211        20201130
8782   88742 0 0    101748 816 856  T:avx2          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20201211        20201130
9006   80112 0 0    96701 824 888   T:sse4          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
14564  102451 0 0   114252 792 800  T:sse4          clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20201211        20201130
15868  80602 0 0    93564 816 856   T:sse4          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20201211        20201130
16060  79938 0 0    93268 816 856   T:sse4          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
17230  80549 0 0    92400 800 824   T:sse4          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x56035c17d070: v4i64 = X86ISD::VTRUNC 0x56035c17cf40
try.c: 0x56035c17cf40: v16i32 = vselect 0x56035c183390, 0x56035c117db0, 0x56035c17ce10
try.c: 0x56035c183390: v4i1 = X86ISD::PCMPGTM 0x56035c177a40, 0x56035c1735d0
try.c: 0x56035c177a40: v4i64 = X86ISD::VBROADCAST 0x56035c13e140
try.c: 0x56035c13e140: i64,ch = load<LD8[%lsr.iv6971]> 0x56035c088950, 0x56035c162050, undef:i64
try.c: 0x56035c162050: i64,ch = CopyFromReg 0x56035c088950, Register:i64 %vreg50
try.c: 0x56035c173830: i64 = Register %vreg50
try.c: 0x56035c116420: i64 = undef
try.c: 0x56035c1735d0: v4i64,ch = CopyFromReg 0x56035c088950, Register:v4i64 %vreg13
try.c: 0x56035c178290: v4i64 = Register %vreg13
try.c: 0x56035c117db0: v16i32 = X86ISD::VBROADCAST 0x56035c177ca0
try.c: 0x56035c177ca0: i32,ch = load<LD4[ConstantPool]> 0x56035c088950, 0x56035c13d720, undef:i64
try.c: 0x56035c13d720: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x56035c165d20: i64 = TargetConstantPool<i32 1> 0
try.c: 0x56035c116420: i64 = undef
try.c: 0x56035c17ce10: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: 0x56035c17cce0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x555db3a996b0: v4i64 = X86ISD::VTRUNC 0x555db3a99580
try.c: 0x555db3a99580: v16i32 = vselect 0x555db3a88e60, 0x555db3a1dc80, 0x555db3a99450
try.c: 0x555db3a88e60: v4i1 = X86ISD::PCMPGTM 0x555db3a74690, 0x555db3a70220
try.c: 0x555db3a74690: v4i64 = X86ISD::VBROADCAST 0x555db3a1e140
try.c: 0x555db3a1e140: i64,ch = load<LD8[%lsr.iv6971]> 0x555db396da30, 0x555db3a08770, undef:i64
try.c: 0x555db3a08770: i64,ch = CopyFromReg 0x555db396da30, Register:i64 %vreg50
try.c: 0x555db3a70480: i64 = Register %vreg50
try.c: 0x555db3a21c00: i64 = undef
try.c: 0x555db3a70220: v4i64,ch = CopyFromReg 0x555db396da30, Register:v4i64 %vreg13
try.c: 0x555db3a74ee0: v4i64 = Register %vreg13
try.c: 0x555db3a1dc80: v16i32 = X86ISD::VBROADCAST 0x555db3a748f0
try.c: 0x555db3a748f0: i32,ch = load<LD4[ConstantPool]> 0x555db396da30, 0x555db3a06d40, undef:i64
try.c: 0x555db3a06d40: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x555db3a22580: i64 = TargetConstantPool<i32 1> 0
try.c: 0x555db3a21c00: i64 = undef
try.c: 0x555db3a99450: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: 0x555db3a99320: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55e72c356af0: v4i64 = X86ISD::VTRUNC 0x55e72c3569c0
try.c: 0x55e72c3569c0: v16i32 = vselect 0x55e72c36bf80, 0x55e72c2ef640, 0x55e72c356890
try.c: 0x55e72c36bf80: v4i1 = X86ISD::PCMPGTM 0x55e72c34e890, 0x55e72c34a420
try.c: 0x55e72c34e890: v4i64 = X86ISD::VBROADCAST 0x55e72c319ee0
try.c: 0x55e72c319ee0: i64,ch = load<LD8[%lsr.iv6971]> 0x55e72c25f960, 0x55e72c338650, undef:i64
try.c: 0x55e72c338650: i64,ch = CopyFromReg 0x55e72c25f960, Register:i64 %vreg50
try.c: 0x55e72c34a680: i64 = Register %vreg50
try.c: 0x55e72c31b3b0: i64 = undef
try.c: 0x55e72c34a420: v4i64,ch = CopyFromReg 0x55e72c25f960, Register:v4i64 %vreg13
try.c: 0x55e72c34f0e0: v4i64 = Register %vreg13
try.c: 0x55e72c2ef640: v16i32 = X86ISD::VBROADCAST 0x55e72c34eaf0
try.c: 0x55e72c34eaf0: i32,ch = load<LD4[ConstantPool]> 0x55e72c25f960, 0x55e72c2fdc50, undef:i64
try.c: 0x55e72c2fdc50: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55e72c3330c0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55e72c31b3b0: i64 = undef
try.c: 0x55e72c356890: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: 0x55e72c356760: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE   T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:148:3: error: always_inline function '_mm256_set_epi32' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: SET1(X[0],nonce[1]); SET8(Y[0],nonce[0]);
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:25:22: note: expanded from macro 'SET1'
stream.c: #define SET1(X,c) (X=SET(c,c,c,c,c,c,c,c))
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:24:13: note: expanded from macro 'SET'
stream.c: #define SET _mm256_set_epi32
stream.c: ^
stream.c: stream.c:148:24: error: always_inline function '_mm256_set_epi32' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: SET1(X[0],nonce[1]); SET8(Y[0],nonce[0]);
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:26:22: note: expanded from macro 'SET8'
stream.c: #define SET8(X,c) (X=SET(c,c,c,c,c,c,c,c), X=ADD(X,_q))
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:24:13: note: expanded from macro 'SET'
stream.c: #define SET _mm256_set_epi32
stream.c: ^
stream.c: stream.c:148:24: error: always_inline function '_mm256_add_epi32' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: ./Intrinsics_AVX2_64block.h:26:46: note: expanded from macro 'SET8'
stream.c: #define SET8(X,c) (X=SET(c,c,c,c,c,c,c,c), X=ADD(X,_q))
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:17:13: note: expanded from macro 'ADD'
stream.c: #define ADD _mm256_add_epi32
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE   T:avx2

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5601abec24f0: v4i64 = X86ISD::VTRUNC 0x5601abec23c0
try.c: 0x5601abec23c0: v16i32 = vselect 0x5601abeae7d0, 0x5601abe4c310, 0x5601abec2290
try.c: 0x5601abeae7d0: v4i1 = X86ISD::PCMPGTM 0x5601abea7b70, 0x5601abea3700
try.c: 0x5601abea7b70: v4i64 = X86ISD::VBROADCAST 0x5601abe51b30
try.c: 0x5601abe51b30: i64,ch = load<LD8[%lsr.iv6971]> 0x5601abdb8950, 0x5601abe9e560, undef:i64
try.c: 0x5601abe9e560: i64,ch = CopyFromReg 0x5601abdb8950, Register:i64 %vreg50
try.c: 0x5601abea3960: i64 = Register %vreg50
try.c: 0x5601abe4a980: i64 = undef
try.c: 0x5601abea3700: v4i64,ch = CopyFromReg 0x5601abdb8950, Register:v4i64 %vreg13
try.c: 0x5601abea83c0: v4i64 = Register %vreg13
try.c: 0x5601abe4c310: v16i32 = X86ISD::VBROADCAST 0x5601abea7dd0
try.c: 0x5601abea7dd0: i32,ch = load<LD4[ConstantPool]> 0x5601abdb8950, 0x5601abe51110, undef:i64
try.c: 0x5601abe51110: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5601abe8d070: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5601abe4a980: i64 = undef
try.c: 0x5601abec2290: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: 0x5601abec2160: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x555fbbf6d010: v4i64 = X86ISD::VTRUNC 0x555fbbf6cee0
try.c: 0x555fbbf6cee0: v16i32 = vselect 0x555fbbf5c720, 0x555fbbedc300, 0x555fbbf6cdb0
try.c: 0x555fbbf5c720: v4i1 = X86ISD::PCMPGTM 0x555fbbf47f40, 0x555fbbf44ae0
try.c: 0x555fbbf47f40: v4i64 = X86ISD::VBROADCAST 0x555fbbedc7c0
try.c: 0x555fbbedc7c0: i64,ch = load<LD8[%lsr.iv6971]> 0x555fbbe42a30, 0x555fbbef4220, undef:i64
try.c: 0x555fbbef4220: i64,ch = CopyFromReg 0x555fbbe42a30, Register:i64 %vreg50
try.c: 0x555fbbf44d40: i64 = Register %vreg50
try.c: 0x555fbbef7640: i64 = undef
try.c: 0x555fbbf44ae0: v4i64,ch = CopyFromReg 0x555fbbe42a30, Register:v4i64 %vreg13
try.c: 0x555fbbf48790: v4i64 = Register %vreg13
try.c: 0x555fbbedc300: v16i32 = X86ISD::VBROADCAST 0x555fbbf481a0
try.c: 0x555fbbf481a0: i32,ch = load<LD4[ConstantPool]> 0x555fbbe42a30, 0x555fbbef27f0, undef:i64
try.c: 0x555fbbef27f0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x555fbbef7fc0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x555fbbef7640: i64 = undef
try.c: 0x555fbbf6cdb0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: 0x555fbbf6cc80: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55ee4d094300: v4i64 = X86ISD::VTRUNC 0x55ee4d0941d0
try.c: 0x55ee4d0941d0: v16i32 = vselect 0x55ee4d074b80, 0x55ee4d02e940, 0x55ee4d0940a0
try.c: 0x55ee4d074b80: v4i1 = X86ISD::PCMPGTM 0x55ee4d071b50, 0x55ee4d06d6e0
try.c: 0x55ee4d071b50: v4i64 = X86ISD::VBROADCAST 0x55ee4d02bae0
try.c: 0x55ee4d02bae0: i64,ch = load<LD8[%lsr.iv6971]> 0x55ee4cf82940, 0x55ee4d068540, undef:i64
try.c: 0x55ee4d068540: i64,ch = CopyFromReg 0x55ee4cf82940, Register:i64 %vreg50
try.c: 0x55ee4d06d940: i64 = Register %vreg50
try.c: 0x55ee4d02cfb0: i64 = undef
try.c: 0x55ee4d06d6e0: v4i64,ch = CopyFromReg 0x55ee4cf82940, Register:v4i64 %vreg13
try.c: 0x55ee4d0723a0: v4i64 = Register %vreg13
try.c: 0x55ee4d02e940: v16i32 = X86ISD::VBROADCAST 0x55ee4d071db0
try.c: 0x55ee4d071db0: i32,ch = load<LD4[ConstantPool]> 0x55ee4cf82940, 0x55ee4d016160, undef:i64
try.c: 0x55ee4d016160: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55ee4d057ca0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55ee4d02cfb0: i64 = undef
try.c: 0x55ee4d0940a0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: 0x55ee4d093f70: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE   T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:341:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'ExpandKeyBS' that is compiled without support for 'ssse3'
stream.c: EKBS(rk);
stream.c: ^
stream.c: ./Simon6496SSE4.h:59:19: note: expanded from macro 'EKBS'
stream.c: #define EKBS(rk) (RKBS(rk,3,_D), RKBS(rk,4,_C), RKBS(rk,5,_D), RKBS(rk,6,_C), RKBS(rk,7,_D), RKBS(rk,8,_D), \
stream.c: ^
stream.c: ./Simon6496SSE4.h:50:52: note: expanded from macro 'RKBS'
stream.c: #define RKBS(rk,r,_V) (rk[r][7]= _D ^ rk[r-3][7] ^ ROR8(rk[r-1][2] ^ rk[r-1][3]), \
stream.c: ^
stream.c: ./Intrinsics_SSE4_64block.h:40:19: note: expanded from macro 'ROR8'
stream.c: #define ROR8(X) (SHFL(X,R8))
stream.c: ^
stream.c: ./Intrinsics_SSE4_64block.h:35:14: note: expanded from macro 'SHFL'
stream.c: #define SHFL _mm_shuffle_epi8
stream.c: ^
stream.c: stream.c:341:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'ExpandKeyBS' that is compiled without support for 'ssse3'
stream.c: ./Simon6496SSE4.h:59:19: note: expanded from macro 'EKBS'
stream.c: #define EKBS(rk) (RKBS(rk,3,_D), RKBS(rk,4,_C), RKBS(rk,5,_D), RKBS(rk,6,_C), RKBS(rk,7,_D), RKBS(rk,8,_D), \
stream.c: ^
stream.c: ./Simon6496SSE4.h:51:52: note: expanded from macro 'RKBS'
stream.c: rk[r][6]= _D ^ rk[r-3][6] ^ ROR8(rk[r-1][1] ^ rk[r-1][2]), \
stream.c: ^
stream.c: ./Intrinsics_SSE4_64block.h:40:19: note: expanded from macro 'ROR8'
stream.c: #define ROR8(X) (SHFL(X,R8))
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                             Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE   T:sse4