Implementation notes: amd64, cel02, crypto_stream/simon64128ctr

Computer: cel02
Architecture: amd64
CPU ID: GenuineIntel-00050657-bfebfbff
SUPERCOP version: 20201130
Operation: crypto_stream
Primitive: simon64128ctr
Time   Object size  Test size       Implementation  Benchmark date  SUPERCOP version  Compiler
3700   142500 0 0   159125 824 888  T:avx2          20201211        20201130          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE
5252   97945 0 0    111316 816 856  T:avx2          20201211        20201130          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE
9130   97066 0 0    108960 800 824  T:avx2          20201211        20201130          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE
9786   98102 0 0    111100 816 856  T:avx2          20201211        20201130          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE
10588  90589 0 0    107181 824 888  T:sse4          20201211        20201130          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE
11182  117285 0 0   129244 792 800  T:avx2          20201211        20201130          clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE
16992  90188 0 0    103156 816 856  T:sse4          20201211        20201130          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE
17524  89877 0 0    101736 800 824  T:sse4          20201211        20201130          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE
17732  114432 0 0   126220 792 800  T:sse4          20201211        20201130          clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE
17792  91055 0 0    104396 816 856  T:sse4          20201211        20201130          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5652174f21b0: v4i64 = X86ISD::VTRUNC 0x5652174f2080
try.c: 0x5652174f2080: v16i32 = vselect 0x5652174f83f0, 0x565217495880, 0x5652174f1f50
try.c: 0x5652174f83f0: v4i1 = X86ISD::PCMPGTM 0x5652174da850, 0x5652174d63e0
try.c: 0x5652174da850: v4i64 = X86ISD::VBROADCAST 0x56521747da60
try.c: 0x56521747da60: i64,ch = load<LD8[%lsr.iv6971]> 0x5652173eb950, 0x5652174cd060, undef:i64
try.c: 0x5652174cd060: i64,ch = CopyFromReg 0x5652173eb950, Register:i64 %vreg50
try.c: 0x5652174d6640: i64 = Register %vreg50
try.c: 0x56521747ef30: i64 = undef
try.c: 0x5652174d63e0: v4i64,ch = CopyFromReg 0x5652173eb950, Register:v4i64 %vreg13
try.c: 0x5652174db0a0: v4i64 = Register %vreg13
try.c: 0x565217495880: v16i32 = X86ISD::VBROADCAST 0x5652174daab0
try.c: 0x5652174daab0: i32,ch = load<LD4[ConstantPool]> 0x5652173eb950, 0x5652174914f0, undef:i64
try.c: 0x5652174914f0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5652174bee70: i64 = TargetConstantPool<i32 1> 0
try.c: 0x56521747ef30: i64 = undef
try.c: 0x5652174f1f50: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: 0x5652174f1e20: i32 = Constant<0>
try.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5604a7f0d930: v4i64 = X86ISD::VTRUNC 0x5604a7f0d800
try.c: 0x5604a7f0d800: v16i32 = vselect 0x5604a7f0a310, 0x5604a7e98f00, 0x5604a7f0d6d0
try.c: 0x5604a7f0a310: v4i1 = X86ISD::PCMPGTM 0x5604a7f04ad0, 0x5604a7f00660
try.c: 0x5604a7f04ad0: v4i64 = X86ISD::VBROADCAST 0x5604a7e993c0
try.c: 0x5604a7e993c0: i64,ch = load<LD8[%lsr.iv6971]> 0x5604a7dfea30, 0x5604a7ea1550, undef:i64
try.c: 0x5604a7ea1550: i64,ch = CopyFromReg 0x5604a7dfea30, Register:i64 %vreg50
try.c: 0x5604a7f008c0: i64 = Register %vreg50
try.c: 0x5604a7ea6c30: i64 = undef
try.c: 0x5604a7f00660: v4i64,ch = CopyFromReg 0x5604a7dfea30, Register:v4i64 %vreg13
try.c: 0x5604a7f05320: v4i64 = Register %vreg13
try.c: 0x5604a7e98f00: v16i32 = X86ISD::VBROADCAST 0x5604a7f04d30
try.c: 0x5604a7f04d30: i32,ch = load<LD4[ConstantPool]> 0x5604a7dfea30, 0x5604a7e9b8a0, undef:i64
try.c: 0x5604a7e9b8a0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5604a7ea75b0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5604a7ea6c30: i64 = undef
try.c: 0x5604a7f0d6d0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: 0x5604a7f0d5a0: i32 = Constant<0>
try.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x556e13e4ebd0: v4i64 = X86ISD::VTRUNC 0x556e13e4eaa0
try.c: 0x556e13e4eaa0: v16i32 = vselect 0x556e13e64550, 0x556e13e01220, 0x556e13e4e970
try.c: 0x556e13e64550: v4i1 = X86ISD::PCMPGTM 0x556e13e46970, 0x556e13e42500
try.c: 0x556e13e46970: v4i64 = X86ISD::VBROADCAST 0x556e13deede0
try.c: 0x556e13deede0: i64,ch = load<LD8[%lsr.iv6971]> 0x556e13d57950, 0x556e13e3d360, undef:i64
try.c: 0x556e13e3d360: i64,ch = CopyFromReg 0x556e13d57950, Register:i64 %vreg50
try.c: 0x556e13e42760: i64 = Register %vreg50
try.c: 0x556e13dff890: i64 = undef
try.c: 0x556e13e42500: v4i64,ch = CopyFromReg 0x556e13d57950, Register:v4i64 %vreg13
try.c: 0x556e13e471c0: v4i64 = Register %vreg13
try.c: 0x556e13e01220: v16i32 = X86ISD::VBROADCAST 0x556e13e46bd0
try.c: 0x556e13e46bd0: i32,ch = load<LD4[ConstantPool]> 0x556e13d57950, 0x556e13dee3c0, undef:i64
try.c: 0x556e13dee3c0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x556e13e2bee0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x556e13dff890: i64 = undef
try.c: 0x556e13e4e970: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: 0x556e13e4e840: i32 = Constant<0>
try.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:147:3: error: always_inline function '_mm256_set_epi32' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: SET1(X[0],nonce[1]); SET8(Y[0],nonce[0]);
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:24:22: note: expanded from macro 'SET1'
stream.c: #define SET1(X,c) (X=SET(c,c,c,c,c,c,c,c))
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:23:13: note: expanded from macro 'SET'
stream.c: #define SET _mm256_set_epi32
stream.c: ^
stream.c: stream.c:147:24: error: always_inline function '_mm256_set_epi32' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: SET1(X[0],nonce[1]); SET8(Y[0],nonce[0]);
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:25:22: note: expanded from macro 'SET8'
stream.c: #define SET8(X,c) (X=SET(c,c,c,c,c,c,c,c), X=ADD(X,_q))
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:23:13: note: expanded from macro 'SET'
stream.c: #define SET _mm256_set_epi32
stream.c: ^
stream.c: stream.c:147:24: error: always_inline function '_mm256_add_epi32' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: ./Intrinsics_AVX2_64block.h:25:46: note: expanded from macro 'SET8'
stream.c: #define SET8(X,c) (X=SET(c,c,c,c,c,c,c,c), X=ADD(X,_q))
stream.c: ^
stream.c: ./Intrinsics_AVX2_64block.h:16:13: note: expanded from macro 'ADD'
stream.c: #define ADD _mm256_add_epi32
stream.c: ^
stream.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:avx2
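The errors above say that clang refused to inline the AVX2 intrinsics behind the `SET` and `ADD` macros into `Encrypt`, because `Encrypt` was compiled without the required target features; with `-mcpu=native`, those features evidently were not enabled. As a minimal sketch (assuming GCC or clang on x86-64; the function names and the lane offsets 0..7 standing in for the real `_q` constant are hypothetical), a single function can opt in to AVX2 with `__attribute__((target("avx2")))` and be chosen at run time with `__builtin_cpu_supports`. Here it builds eight 32-bit counter lanes the way `SET8` broadcasts a value and then adds per-lane offsets:

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical sketch: build eight consecutive 32-bit counter lanes
 * (c, c+1, ..., c+7), in the spirit of the SET8 macro. The offsets
 * 0..7 are an assumption standing in for the real _q constant. */

/* Portable fallback for CPUs (or builds) without AVX2. */
static void counters8_scalar(uint32_t c, uint32_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = c + (uint32_t)i;   /* unsigned wrap-around, like vpaddd */
}

/* The target attribute lets this one function use AVX2 intrinsics even
 * when the translation unit is compiled without -mavx2 or -march=native. */
__attribute__((target("avx2")))
static void counters8_avx2(uint32_t c, uint32_t out[8]) {
    __m256i x = _mm256_set1_epi32((int)c);                 /* broadcast c */
    __m256i q = _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0);  /* lane i gets offset i */
    x = _mm256_add_epi32(x, q);                            /* per-lane add, wraps mod 2^32 */
    _mm256_storeu_si256((__m256i *)out, x);
}

/* Runtime dispatch: take the AVX2 path only if the CPU supports it. */
void counters8(uint32_t c, uint32_t out[8]) {
    if (__builtin_cpu_supports("avx2"))
        counters8_avx2(c, out);
    else
        counters8_scalar(c, out);
}
```

For example, `counters8(41, v)` fills `v` with 41, 42, ..., 48 on either path. Calling (rather than inlining) the AVX2 function from baseline code is permitted, and the scalar fallback keeps the same binary usable on CPUs without AVX2.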

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55a9d8b64240: v4i64 = X86ISD::VTRUNC 0x55a9d8b64110
try.c: 0x55a9d8b64110: v16i32 = vselect 0x55a9d8b40ce0, 0x55a9d8af64c0, 0x55a9d8b63fe0
try.c: 0x55a9d8b40ce0: v4i1 = X86ISD::PCMPGTM 0x55a9d8b488b0, 0x55a9d8b44440
try.c: 0x55a9d8b488b0: v4i64 = X86ISD::VBROADCAST 0x55a9d8af3660
try.c: 0x55a9d8af3660: i64,ch = load<LD8[%lsr.iv6971]> 0x55a9d8a59950, 0x55a9d8b32a00, undef:i64
try.c: 0x55a9d8b32a00: i64,ch = CopyFromReg 0x55a9d8a59950, Register:i64 %vreg50
try.c: 0x55a9d8b446a0: i64 = Register %vreg50
try.c: 0x55a9d8af4b30: i64 = undef
try.c: 0x55a9d8b44440: v4i64,ch = CopyFromReg 0x55a9d8a59950, Register:v4i64 %vreg13
try.c: 0x55a9d8b49100: v4i64 = Register %vreg13
try.c: 0x55a9d8af64c0: v16i32 = X86ISD::VBROADCAST 0x55a9d8b48b10
try.c: 0x55a9d8b48b10: i32,ch = load<LD4[ConstantPool]> 0x55a9d8a59950, 0x55a9d8b033c0, undef:i64
try.c: 0x55a9d8b033c0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55a9d8b33c00: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55a9d8af4b30: i64 = undef
try.c: 0x55a9d8b63fe0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: 0x55a9d8b63eb0: i32 = Constant<0>
try.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5555ec16a370: v4i64 = X86ISD::VTRUNC 0x5555ec16a240
try.c: 0x5555ec16a240: v16i32 = vselect 0x5555ec150580, 0x5555ec0fa930, 0x5555ec16a110
try.c: 0x5555ec150580: v4i1 = X86ISD::PCMPGTM 0x5555ec162260, 0x5555ec15d630
try.c: 0x5555ec162260: v4i64 = X86ISD::VBROADCAST 0x5555ec0fadf0
try.c: 0x5555ec0fadf0: i64,ch = load<LD8[%lsr.iv6971]> 0x5555ec05ba30, 0x5555ec10bf70, undef:i64
try.c: 0x5555ec10bf70: i64,ch = CopyFromReg 0x5555ec05ba30, Register:i64 %vreg50
try.c: 0x5555ec15d890: i64 = Register %vreg50
try.c: 0x5555ec0fe380: i64 = undef
try.c: 0x5555ec15d630: v4i64,ch = CopyFromReg 0x5555ec05ba30, Register:v4i64 %vreg13
try.c: 0x5555ec162ab0: v4i64 = Register %vreg13
try.c: 0x5555ec0fa930: v16i32 = X86ISD::VBROADCAST 0x5555ec1624c0
try.c: 0x5555ec1624c0: i32,ch = load<LD4[ConstantPool]> 0x5555ec05ba30, 0x5555ec0fd2d0, undef:i64
try.c: 0x5555ec0fd2d0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5555ec0fed00: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5555ec0fe380: i64 = undef
try.c: 0x5555ec16a110: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: 0x5555ec169fe0: i32 = Constant<0>
try.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55a9e1026f00: v4i64 = X86ISD::VTRUNC 0x55a9e1026dd0
try.c: 0x55a9e1026dd0: v16i32 = vselect 0x55a9e103c460, 0x55a9e0fc9b20, 0x55a9e1026ca0
try.c: 0x55a9e103c460: v4i1 = X86ISD::PCMPGTM 0x55a9e10218d0, 0x55a9e101d460
try.c: 0x55a9e10218d0: v4i64 = X86ISD::VBROADCAST 0x55a9e0fd8130
try.c: 0x55a9e0fd8130: i64,ch = load<LD8[%lsr.iv6971]> 0x55a9e0f32940, 0x55a9e100c820, undef:i64
try.c: 0x55a9e100c820: i64,ch = CopyFromReg 0x55a9e0f32940, Register:i64 %vreg50
try.c: 0x55a9e101d6c0: i64 = Register %vreg50
try.c: 0x55a9e0fc8190: i64 = undef
try.c: 0x55a9e101d460: v4i64,ch = CopyFromReg 0x55a9e0f32940, Register:v4i64 %vreg13
try.c: 0x55a9e1022120: v4i64 = Register %vreg13
try.c: 0x55a9e0fc9b20: v16i32 = X86ISD::VBROADCAST 0x55a9e1021b30
try.c: 0x55a9e1021b30: i32,ch = load<LD4[ConstantPool]> 0x55a9e0f32940, 0x55a9e0fd7710, undef:i64
try.c: 0x55a9e0fd7710: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55a9e100af00: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55a9e0fc8190: i64 = undef
try.c: 0x55a9e1026ca0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: 0x55a9e1026b70: i32 = Constant<0>
try.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:340:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'ExpandKeyBS' that is compiled without support for 'ssse3'
stream.c: EKBS(rk);
stream.c: ^
stream.c: ./Simon64128SSE4.h:64:19: note: expanded from macro 'EKBS'
stream.c: #define EKBS(rk) (RKBS(rk,4,_D), RKBS(rk,5,_D), RKBS(rk,6,_C), RKBS(rk,7,_D), RKBS(rk,8,_D), RKBS(rk,9,_C), RKBS(rk,10,_D), RKBS(rk,11,_D), \
stream.c: ^
stream.c: ./Simon64128SSE4.h:53:52: note: expanded from macro 'RKBS'
stream.c: #define RKBS(rk,r,_V) (rk[r][7]= _D ^ rk[r-4][7] ^ ROR8(rk[r-1][2]) ^ rk[r-3][7] ^ ROR8(rk[r-1][3]) ^ ROR8(rk[r-3][0]), \
stream.c: ^
stream.c: ./Intrinsics_SSE4_64block.h:39:19: note: expanded from macro 'ROR8'
stream.c: #define ROR8(X) (SHFL(X,R8))
stream.c: ^
stream.c: ./Intrinsics_SSE4_64block.h:34:14: note: expanded from macro 'SHFL'
stream.c: #define SHFL _mm_shuffle_epi8
stream.c: ^
stream.c: stream.c:340:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'ExpandKeyBS' that is compiled without support for 'ssse3'
stream.c: ./Simon64128SSE4.h:64:19: note: expanded from macro 'EKBS'
stream.c: #define EKBS(rk) (RKBS(rk,4,_D), RKBS(rk,5,_D), RKBS(rk,6,_C), RKBS(rk,7,_D), RKBS(rk,8,_D), RKBS(rk,9,_C), RKBS(rk,10,_D), RKBS(rk,11,_D), \
stream.c: ^
stream.c: ./Simon64128SSE4.h:53:85: note: expanded from macro 'RKBS'
stream.c: #define RKBS(rk,r,_V) (rk[r][7]= _D ^ rk[r-4][7] ^ ROR8(rk[r-1][2]) ^ rk[r-3][7] ^ ROR8(rk[r-1][3]) ^ ROR8(rk[r-3][0]), \
stream.c: ^
stream.c: ./Intrinsics_SSE4_64block.h:39:19: note: expanded from macro 'ROR8'
stream.c: #define ROR8(X) (SHFL(X,R8))
stream.c: ^
stream.c: ...

Number of similar (compiler, implementation) pairs: 1, namely:
Compiler  Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:sse4
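In this build the missing feature is SSSE3: `ROR8` expands to `_mm_shuffle_epi8` (the `pshufb` instruction) applied with a constant mask `R8`. Rotating a 32-bit word right by 8 bits only moves whole bytes, so one byte shuffle can rotate every lane of a vector at once. The following is a minimal sketch, assuming little-endian 32-bit lanes and GCC or clang on x86-64. The mask and the function names are illustrative assumptions, not the actual `R8` constant or data layout in `Intrinsics_SSE4_64block.h`.

```c
#include <immintrin.h>
#include <stdint.h>

/* Scalar reference: rotate a 32-bit word right by 8 bits. */
static uint32_t ror8_scalar(uint32_t x) {
    return (x >> 8) | (x << 24);
}

/* Rotate each 32-bit lane right by 8 with one byte shuffle (pshufb).
 * Output byte j of each lane takes input byte (j + 1) % 4 of the same
 * lane (little-endian). The mask is an illustrative assumption, not the
 * R8 constant from the original header. Requires SSSE3. */
__attribute__((target("ssse3")))
static __m128i ror8_epi32(__m128i x) {
    const __m128i r8 = _mm_setr_epi8(1, 2, 3, 0, 5, 6, 7, 4,
                                     9, 10, 11, 8, 13, 14, 15, 12);
    return _mm_shuffle_epi8(x, r8);
}

/* Apply ROR8 to four words, using pshufb only when the CPU has SSSE3. */
void ror8_x4(const uint32_t in[4], uint32_t out[4]) {
    if (__builtin_cpu_supports("ssse3")) {
        __m128i v = _mm_loadu_si128((const __m128i *)in);  /* SSE2 load */
        _mm_storeu_si128((__m128i *)out, ror8_epi32(v));
    } else {
        for (int i = 0; i < 4; i++)
            out[i] = ror8_scalar(in[i]);
    }
}
```

For instance, `0x11223344` becomes `0x44112233` on either path. Without SSSE3, each lane needs a separate shift-and-OR sequence, which is what the scalar fallback does.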