Implementation notes: amd64, cel02, crypto_stream/simon128128ctr

Computer: cel02
Architecture: amd64
CPU ID: GenuineIntel-00050657-bfebfbff
SUPERCOP version: 20201130
Operation: crypto_stream
Primitive: simon128128ctr
Time  | Object size | Test size      | Implementation | Compiler | Benchmark date | SUPERCOP version
 7194 | 121519 0 0  | 134540 816 856 | T:avx2 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
 7272 | 181876 0 0  | 198525 824 888 | T:avx2 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
 9380 | 121095 0 0  | 134476 816 856 | T:avx2 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
12408 | 115103 0 0  | 131701 824 888 | T:sse4 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
13348 | 121179 0 0  | 133112 800 824 | T:avx2 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
14418 | 147063 0 0  | 158908 792 800 | T:sse4 | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20201211 | 20201130
14628 | 148022 0 0  | 160028 792 800 | T:avx2 | clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20201211 | 20201130
22114 | 115098 0 0  | 128452 816 856 | T:sse4 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
24860 | 126297 0 0  | 139292 816 856 | T:sse4 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130
26862 | 120613 0 0  | 132512 800 824 | T:sse4 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20201211 | 20201130

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55ff87293f60: v4i64 = X86ISD::VTRUNC 0x55ff87293e30
try.c: 0x55ff87293e30: v16i32 = vselect 0x55ff872ae2c0, 0x55ff87235a00, 0x55ff87293d00
try.c: 0x55ff872ae2c0: v4i1 = X86ISD::PCMPGTM 0x55ff8728d930, 0x55ff872894c0
try.c: 0x55ff8728d930: v4i64 = X86ISD::VBROADCAST 0x55ff8722d9d0
try.c: 0x55ff8722d9d0: i64,ch = load<LD8[%lsr.iv6971]> 0x55ff8719e950, 0x55ff87277570, undef:i64
try.c: 0x55ff87277570: i64,ch = CopyFromReg 0x55ff8719e950, Register:i64 %vreg50
try.c: 0x55ff87289720: i64 = Register %vreg50
try.c: 0x55ff87234070: i64 = undef
try.c: 0x55ff872894c0: v4i64,ch = CopyFromReg 0x55ff8719e950, Register:v4i64 %vreg13
try.c: 0x55ff8728e180: v4i64 = Register %vreg13
try.c: 0x55ff87235a00: v16i32 = X86ISD::VBROADCAST 0x55ff8728db90
try.c: 0x55ff8728db90: i32,ch = load<LD4[ConstantPool]> 0x55ff8719e950, 0x55ff8722cfb0, undef:i64
try.c: 0x55ff8722cfb0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55ff87272360: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55ff87234070: i64 = undef
try.c: 0x55ff87293d00: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: 0x55ff87293bd0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55bc9f593d40: v4i64 = X86ISD::VTRUNC 0x55bc9f593c10
try.c: 0x55bc9f593c10: v16i32 = vselect 0x55bc9f5a3090, 0x55bc9f521b30, 0x55bc9f593ae0
try.c: 0x55bc9f5a3090: v4i1 = X86ISD::PCMPGTM 0x55bc9f58b130, 0x55bc9f5872c0
try.c: 0x55bc9f58b130: v4i64 = X86ISD::VBROADCAST 0x55bc9f521ff0
try.c: 0x55bc9f521ff0: i64,ch = load<LD8[%lsr.iv6971]> 0x55bc9f484a30, 0x55bc9f5274e0, undef:i64
try.c: 0x55bc9f5274e0: i64,ch = CopyFromReg 0x55bc9f484a30, Register:i64 %vreg50
try.c: 0x55bc9f587520: i64 = Register %vreg50
try.c: 0x55bc9f536ed0: i64 = undef
try.c: 0x55bc9f5872c0: v4i64,ch = CopyFromReg 0x55bc9f484a30, Register:v4i64 %vreg13
try.c: 0x55bc9f58b980: v4i64 = Register %vreg13
try.c: 0x55bc9f521b30: v16i32 = X86ISD::VBROADCAST 0x55bc9f58b390
try.c: 0x55bc9f58b390: i32,ch = load<LD4[ConstantPool]> 0x55bc9f484a30, 0x55bc9f525ab0, undef:i64
try.c: 0x55bc9f525ab0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55bc9f537850: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55bc9f536ed0: i64 = undef
try.c: 0x55bc9f593ae0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: 0x55bc9f5939b0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55b189aa0840: v4i64 = X86ISD::VTRUNC 0x55b189aa0710
try.c: 0x55b189aa0710: v16i32 = vselect 0x55b189a78410, 0x55b189a2b010, 0x55b189aa05e0
try.c: 0x55b189a78410: v4i1 = X86ISD::PCMPGTM 0x55b189a7d7d0, 0x55b189a79360
try.c: 0x55b189a7d7d0: v4i64 = X86ISD::VBROADCAST 0x55b189a24dc0
try.c: 0x55b189a24dc0: i64,ch = load<LD8[%lsr.iv6971]> 0x55b18998e930, 0x55b189a68020, undef:i64
try.c: 0x55b189a68020: i64,ch = CopyFromReg 0x55b18998e930, Register:i64 %vreg50
try.c: 0x55b189a795c0: i64 = Register %vreg50
try.c: 0x55b189a26290: i64 = undef
try.c: 0x55b189a79360: v4i64,ch = CopyFromReg 0x55b18998e930, Register:v4i64 %vreg13
try.c: 0x55b189a7e020: v4i64 = Register %vreg13
try.c: 0x55b189a2b010: v16i32 = X86ISD::VBROADCAST 0x55b189a7da30
try.c: 0x55b189a7da30: i32,ch = load<LD4[ConstantPool]> 0x55b18998e930, 0x55b189a44da0, undef:i64
try.c: 0x55b189a44da0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55b189a1ddd0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55b189a26290: i64 = undef
try.c: 0x55b189aa05e0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: 0x55b189aa04b0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:130:3: error: always_inline function '_mm256_set_epi64x' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: SET1(X[0],nonce[1]); SET4(Y[0],nonce[0]);
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:25:22: note: expanded from macro 'SET1'
stream.c: #define SET1(X,c) (X=SET(c,c,c,c))
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:24:13: note: expanded from macro 'SET'
stream.c: #define SET _mm256_set_epi64x
stream.c: ^
stream.c: stream.c:130:24: error: always_inline function '_mm256_set_epi64x' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: SET1(X[0],nonce[1]); SET4(Y[0],nonce[0]);
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:26:22: note: expanded from macro 'SET4'
stream.c: #define SET4(X,c) (X=SET(c,c,c,c), X=ADD(X,_q))
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:24:13: note: expanded from macro 'SET'
stream.c: #define SET _mm256_set_epi64x
stream.c: ^
stream.c: stream.c:130:24: error: always_inline function '_mm256_add_epi64' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: ./Intrinsics_AVX2_128block.h:26:38: note: expanded from macro 'SET4'
stream.c: #define SET4(X,c) (X=SET(c,c,c,c), X=ADD(X,_q))
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:17:13: note: expanded from macro 'ADD'
stream.c: #define ADD _mm256_add_epi64
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | T:avx2
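
Note on the -mcpu=native failure above: the compiler reports that `_mm256_set_epi64x` and `_mm256_add_epi64` would be inlined into a function compiled without the target feature they require. One common way to make such a call site compile with GCC or Clang, independent of the command-line flags, is a per-function target attribute combined with a runtime CPU check. The following is only a sketch of that technique and not the code of this implementation: `add4`, `add4_avx2` and `add4_scalar` are hypothetical names, and a GCC-compatible compiler is assumed.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2 is enabled for this one function only, so the always_inline
   intrinsics can be inlined here even if the file is built without -mavx2. */
__attribute__((target("avx2")))
static void add4_avx2(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
    __m256i x = _mm256_loadu_si256((const __m256i *)a);
    __m256i y = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_add_epi64(x, y));
}
#endif

/* Portable fallback with the same wrap-around (mod 2^64) semantics. */
static void add4_scalar(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: takes the AVX2 path only if the CPU reports it. */
void add4(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
    assert(out && a && b);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        add4_avx2(out, a, b);
        return;
    }
#endif
    add4_scalar(out, a, b);
}
```

Calling `add4(out, a, b)` adds the four 64-bit lanes element-wise. Whether this approach suits the benchmarked code is a separate question, since the gcc and clang -Os builds in the table compile the implementation unchanged.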

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x556dfbf34180: v4i64 = X86ISD::VTRUNC 0x556dfbf34050
try.c: 0x556dfbf34050: v16i32 = vselect 0x556dfbf4a6f0, 0x556dfbecf8f0, 0x556dfbf33f20
try.c: 0x556dfbf4a6f0: v4i1 = X86ISD::PCMPGTM 0x556dfbf2fb60, 0x556dfbf2b6f0
try.c: 0x556dfbf2fb60: v4i64 = X86ISD::VBROADCAST 0x556dfbedab50
try.c: 0x556dfbedab50: i64,ch = load<LD8[%lsr.iv6971]> 0x556dfbe40950, 0x556dfbf1b2b0, undef:i64
try.c: 0x556dfbf1b2b0: i64,ch = CopyFromReg 0x556dfbe40950, Register:i64 %vreg50
try.c: 0x556dfbf2b950: i64 = Register %vreg50
try.c: 0x556dfbedc020: i64 = undef
try.c: 0x556dfbf2b6f0: v4i64,ch = CopyFromReg 0x556dfbe40950, Register:v4i64 %vreg13
try.c: 0x556dfbf303b0: v4i64 = Register %vreg13
try.c: 0x556dfbecf8f0: v16i32 = X86ISD::VBROADCAST 0x556dfbf2fdc0
try.c: 0x556dfbf2fdc0: i32,ch = load<LD4[ConstantPool]> 0x556dfbe40950, 0x556dfbed9990, undef:i64
try.c: 0x556dfbed9990: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x556dfbed3b90: i64 = TargetConstantPool<i32 1> 0
try.c: 0x556dfbedc020: i64 = undef
try.c: 0x556dfbf33f20: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: 0x556dfbf33df0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5613f7e7a000: v4i64 = X86ISD::VTRUNC 0x5613f7e79ed0
try.c: 0x5613f7e79ed0: v16i32 = vselect 0x5613f7e769e0, 0x5613f7e0c390, 0x5613f7e79da0
try.c: 0x5613f7e769e0: v4i1 = X86ISD::PCMPGTM 0x5613f7e62210, 0x5613f7e5d5e0
try.c: 0x5613f7e62210: v4i64 = X86ISD::VBROADCAST 0x5613f7e0c850
try.c: 0x5613f7e0c850: i64,ch = load<LD8[%lsr.iv6971]> 0x5613f7d5ba30, 0x5613f7df7710, undef:i64
try.c: 0x5613f7df7710: i64,ch = CopyFromReg 0x5613f7d5ba30, Register:i64 %vreg50
try.c: 0x5613f7e5d840: i64 = Register %vreg50
try.c: 0x5613f7df0a80: i64 = undef
try.c: 0x5613f7e5d5e0: v4i64,ch = CopyFromReg 0x5613f7d5ba30, Register:v4i64 %vreg13
try.c: 0x5613f7e62a60: v4i64 = Register %vreg13
try.c: 0x5613f7e0c390: v16i32 = X86ISD::VBROADCAST 0x5613f7e62470
try.c: 0x5613f7e62470: i32,ch = load<LD4[ConstantPool]> 0x5613f7d5ba30, 0x5613f7df5ce0, undef:i64
try.c: 0x5613f7df5ce0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5613f7df1400: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5613f7df0a80: i64 = undef
try.c: 0x5613f7e79da0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: 0x5613f7e79c70: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x559b81408070: v4i64 = X86ISD::VTRUNC 0x559b81407f40
try.c: 0x559b81407f40: v16i32 = vselect 0x559b81421910, 0x559b813b15d0, 0x559b81407e10
try.c: 0x559b81421910: v4i1 = X86ISD::PCMPGTM 0x559b81402a40, 0x559b813fe5d0
try.c: 0x559b81402a40: v4i64 = X86ISD::VBROADCAST 0x559b813a5d40
try.c: 0x559b813a5d40: i64,ch = load<LD8[%lsr.iv6971]> 0x559b81313930, 0x559b813ec5c0, undef:i64
try.c: 0x559b813ec5c0: i64,ch = CopyFromReg 0x559b81313930, Register:i64 %vreg50
try.c: 0x559b813fe830: i64 = Register %vreg50
try.c: 0x559b813a7210: i64 = undef
try.c: 0x559b813fe5d0: v4i64,ch = CopyFromReg 0x559b81313930, Register:v4i64 %vreg13
try.c: 0x559b81403290: v4i64 = Register %vreg13
try.c: 0x559b813b15d0: v16i32 = X86ISD::VBROADCAST 0x559b81402ca0
try.c: 0x559b81402ca0: i32,ch = load<LD4[ConstantPool]> 0x559b81313930, 0x559b813af870, undef:i64
try.c: 0x559b813af870: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x559b813e7900: i64 = TargetConstantPool<i32 1> 0
try.c: 0x559b813a7210: i64 = undef
try.c: 0x559b81407e10: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: 0x559b81407ce0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:306:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'ExpandKeyBS' that is compiled without support for 'ssse3'
stream.c: EKBS(rk);
stream.c: ^
stream.c: ./Simon128128SSE4.h:62:19: note: expanded from macro 'EKBS'
stream.c: #define EKBS(rk) (RKBS(rk,2,_D), RKBS(rk,3,_C), RKBS(rk,4,_D), RKBS(rk,5,_C), RKBS(rk,6,_D), RKBS(rk,7,_D), RKBS(rk,8,_D), RKBS(rk,9,_D), \
stream.c: ^
stream.c: ./Simon128128SSE4.h:53:52: note: expanded from macro 'RKBS'
stream.c: #define RKBS(rk,r,_V) (rk[r][7]= _D ^ rk[r-2][7] ^ ROR8(rk[r-1][2] ^ rk[r-1][3]), \
stream.c: ^
stream.c: ./Intrinsics_SSE4_128block.h:39:19: note: expanded from macro 'ROR8'
stream.c: #define ROR8(X) (SHFL(X,R8))
stream.c: ^
stream.c: ./Intrinsics_SSE4_128block.h:35:14: note: expanded from macro 'SHFL'
stream.c: #define SHFL _mm_shuffle_epi8
stream.c: ^
stream.c: stream.c:306:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'ExpandKeyBS' that is compiled without support for 'ssse3'
stream.c: ./Simon128128SSE4.h:62:19: note: expanded from macro 'EKBS'
stream.c: #define EKBS(rk) (RKBS(rk,2,_D), RKBS(rk,3,_C), RKBS(rk,4,_D), RKBS(rk,5,_C), RKBS(rk,6,_D), RKBS(rk,7,_D), RKBS(rk,8,_D), RKBS(rk,9,_D), \
stream.c: ^
stream.c: ./Simon128128SSE4.h:54:38: note: expanded from macro 'RKBS'
stream.c: rk[r][6]= _D ^ rk[r-2][6] ^ ROR8(rk[r-1][1] ^ rk[r-1][2]), \
stream.c: ^
stream.c: ./Intrinsics_SSE4_128block.h:39:19: note: expanded from macro 'ROR8'
stream.c: #define ROR8(X) (SHFL(X,R8))
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE | T:sse4
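
Note on the ROR8 macro in the log above: the expansion shows `ROR8(X)` defined as `SHFL(X,R8)`, that is, `_mm_shuffle_epi8` applied with a constant `R8` whose value does not appear in the output. A byte shuffle can rotate each 64-bit lane right by 8 bits in one instruction, which is presumably what the macro does. The sketch below illustrates that technique under this assumption. The mask is a guess at what `R8` might contain, and `ror8_pair`, `ror8_pair_ssse3` and `ror8_scalar` are hypothetical names rather than part of the benchmarked code.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: rotate one 64-bit word right by 8 bits. */
static uint64_t ror8_scalar(uint64_t x)
{
    return (x >> 8) | (x << 56);
}

#if defined(__x86_64__) || defined(__i386__)
#include <tmmintrin.h>

/* SSSE3 is enabled for this function only via the target attribute. */
__attribute__((target("ssse3")))
static void ror8_pair_ssse3(uint64_t w[2])
{
    /* Assumed mask: output byte i of each 64-bit lane takes input byte
       (i + 1) mod 8 of the same lane. _mm_set_epi8 lists bytes 15 down to 0. */
    const __m128i mask = _mm_set_epi8(8, 15, 14, 13, 12, 11, 10, 9,
                                      0, 7, 6, 5, 4, 3, 2, 1);
    __m128i x = _mm_loadu_si128((const __m128i *)w);
    _mm_storeu_si128((__m128i *)w, _mm_shuffle_epi8(x, mask));
}
#endif

/* Rotates both words in place, using the shuffle when the CPU supports it. */
void ror8_pair(uint64_t w[2])
{
    assert(w);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("ssse3")) {
        ror8_pair_ssse3(w);
        return;
    }
#endif
    w[0] = ror8_scalar(w[0]);
    w[1] = ror8_scalar(w[1]);
}
```

The scalar function doubles as a reference for checking the shuffle mask: both paths should give the same result for any input pair.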