Implementation notes: amd64, cel02, crypto_stream/simon128256ctr

Computer: cel02
Architecture: amd64
CPU ID: GenuineIntel-00050657-bfebfbff
SUPERCOP version: 20201130
Operation: crypto_stream
Primitive: simon128256ctr
Time   Object size  Test size       Implementation  Compiler                                                                             Benchmark date  SUPERCOP version
8754   138273 0 0   150208 800 824  T:avx2          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
9214   196492 0 0   213173 824 888  T:avx2          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
14696  136932 0 0   153565 824 888  T:sse4          gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
15668  138240 0 0   151300 816 856  T:avx2          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20201211        20201130
15728  168356 0 0   180204 792 800  T:sse4          clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20201211        20201130
17032  169534 0 0   181532 792 800  T:avx2          clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20201211        20201130
18060  137924 0 0   151332 816 856  T:avx2          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
24246  145112 0 0   158132 816 856  T:sse4          gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20201211        20201130
26392  137785 0 0   149688 800 824  T:sse4          gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130
28094  136998 0 0   150380 816 856  T:sse4          gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20201211        20201130

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x563d5d132eb0: v4i64 = X86ISD::VTRUNC 0x563d5d132d80
try.c: 0x563d5d132d80: v16i32 = vselect 0x563d5d127cc0, 0x563d5d0d7d70, 0x563d5d132c50
try.c: 0x563d5d127cc0: v4i1 = X86ISD::PCMPGTM 0x563d5d1318b0, 0x563d5d12d440
try.c: 0x563d5d1318b0: v4i64 = X86ISD::VBROADCAST 0x563d5d0d4f10
try.c: 0x563d5d0d4f10: i64,ch = load<LD8[%lsr.iv6971]> 0x563d5d042950, 0x563d5d1247f0, undef:i64
try.c: 0x563d5d1247f0: i64,ch = CopyFromReg 0x563d5d042950, Register:i64 %vreg50
try.c: 0x563d5d12d6a0: i64 = Register %vreg50
try.c: 0x563d5d0d63e0: i64 = undef
try.c: 0x563d5d12d440: v4i64,ch = CopyFromReg 0x563d5d042950, Register:v4i64 %vreg13
try.c: 0x563d5d132100: v4i64 = Register %vreg13
try.c: 0x563d5d0d7d70: v16i32 = X86ISD::VBROADCAST 0x563d5d131b10
try.c: 0x563d5d131b10: i32,ch = load<LD4[ConstantPool]> 0x563d5d042950, 0x563d5d0ee430, undef:i64
try.c: 0x563d5d0ee430: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x563d5d0ef730: i64 = TargetConstantPool<i32 1> 0
try.c: 0x563d5d0d63e0: i64 = undef
try.c: 0x563d5d132c50: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: 0x563d5d132b20: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55674817fbf0: v4i64 = X86ISD::VTRUNC 0x55674817fac0
try.c: 0x55674817fac0: v16i32 = vselect 0x55674817c5d0, 0x556748101e90, 0x55674817f990
try.c: 0x55674817c5d0: v4i1 = X86ISD::PCMPGTM 0x5567481663f0, 0x556748162dd0
try.c: 0x5567481663f0: v4i64 = X86ISD::VBROADCAST 0x556748102350
try.c: 0x556748102350: i64,ch = load<LD8[%lsr.iv6971]> 0x556748060a40, 0x556748114430, undef:i64
try.c: 0x556748114430: i64,ch = CopyFromReg 0x556748060a40, Register:i64 %vreg50
try.c: 0x556748163030: i64 = Register %vreg50
try.c: 0x5567480fc660: i64 = undef
try.c: 0x556748162dd0: v4i64,ch = CopyFromReg 0x556748060a40, Register:v4i64 %vreg13
try.c: 0x556748166c40: v4i64 = Register %vreg13
try.c: 0x556748101e90: v16i32 = X86ISD::VBROADCAST 0x556748166650
try.c: 0x556748166650: i32,ch = load<LD4[ConstantPool]> 0x556748060a40, 0x556748104830, undef:i64
try.c: 0x556748104830: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5567480fcfe0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5567480fc660: i64 = undef
try.c: 0x55674817f990: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: 0x55674817f860: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55812919d2c0: v4i64 = X86ISD::VTRUNC 0x55812919d190
try.c: 0x55812919d190: v16i32 = vselect 0x5581291a4930, 0x558129125840, 0x55812919d060
try.c: 0x5581291a4930: v4i1 = X86ISD::PCMPGTM 0x558129185960, 0x5581291814f0
try.c: 0x558129185960: v4i64 = X86ISD::VBROADCAST 0x55812912c7e0
try.c: 0x55812912c7e0: i64,ch = load<LD8[%lsr.iv6971]> 0x558129096950, 0x5581291784b0, undef:i64
try.c: 0x5581291784b0: i64,ch = CopyFromReg 0x558129096950, Register:i64 %vreg50
try.c: 0x558129181750: i64 = Register %vreg50
try.c: 0x55812912dcb0: i64 = undef
try.c: 0x5581291814f0: v4i64,ch = CopyFromReg 0x558129096950, Register:v4i64 %vreg13
try.c: 0x5581291861b0: v4i64 = Register %vreg13
try.c: 0x558129125840: v16i32 = X86ISD::VBROADCAST 0x558129185bc0
try.c: 0x558129185bc0: i32,ch = load<LD4[ConstantPool]> 0x558129096950, 0x55812912bdc0, undef:i64
try.c: 0x55812912bdc0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x558129165da0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55812912dcb0: i64 = undef
try.c: 0x55812919d060: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: 0x55812919cf30: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:128:3: error: always_inline function '_mm256_set_epi64x' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: SET1(X[0],nonce[1]); SET4(Y[0],nonce[0]);
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:25:22: note: expanded from macro 'SET1'
stream.c: #define SET1(X,c) (X=SET(c,c,c,c))
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:24:13: note: expanded from macro 'SET'
stream.c: #define SET _mm256_set_epi64x
stream.c: ^
stream.c: stream.c:128:24: error: always_inline function '_mm256_set_epi64x' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: SET1(X[0],nonce[1]); SET4(Y[0],nonce[0]);
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:26:22: note: expanded from macro 'SET4'
stream.c: #define SET4(X,c) (X=SET(c,c,c,c), X=ADD(X,_q))
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:24:13: note: expanded from macro 'SET'
stream.c: #define SET _mm256_set_epi64x
stream.c: ^
stream.c: stream.c:128:24: error: always_inline function '_mm256_add_epi64' requires target feature 'sse4.2', but would be inlined into function 'Encrypt' that is compiled without support for 'sse4.2'
stream.c: ./Intrinsics_AVX2_128block.h:26:38: note: expanded from macro 'SET4'
stream.c: #define SET4(X,c) (X=SET(c,c,c,c), X=ADD(X,_q))
stream.c: ^
stream.c: ./Intrinsics_AVX2_128block.h:17:13: note: expanded from macro 'ADD'
stream.c: #define ADD _mm256_add_epi64
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
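
Note: the diagnostics above say that `_mm256_set_epi64x` and `_mm256_add_epi64` would be inlined into a function compiled without the required target feature. This suggests that this flag set (`-mcpu=native` in place of `-march=native`) did not enable AVX2 for stream.c. As a minimal sketch, assuming a GCC- or clang-compatible compiler on x86-64, the feature can instead be enabled per function with the `target` attribute. The names `counter_lanes` and `try_counter_lanes` and the lane offsets 0..3 are illustrative assumptions; the log does not show how `_q` is defined in Intrinsics_AVX2_128block.h.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the benchmarked source.
   Broadcasts c into four 64-bit lanes and adds a per-lane offset,
   in the spirit of the SET4 macro quoted in the diagnostics.
   The target attribute enables AVX2 for this function only, so the
   always_inline intrinsics can be inlined even when the translation
   unit as a whole is built without AVX2. */
__attribute__((target("avx2")))
static void counter_lanes(uint64_t c, uint64_t out[4])
{
    long long v = (long long)c;
    __m256i x = _mm256_set_epi64x(v, v, v, v);
    /* Assumed offsets; arguments run from the highest lane to the lowest. */
    __m256i q = _mm256_set_epi64x(3, 2, 1, 0);
    x = _mm256_add_epi64(x, q);
    _mm256_storeu_si256((__m256i *)out, x);
}

/* Runtime guard: returns 1 and fills out[] if the CPU reports AVX2,
   otherwise returns 0 and leaves out[] untouched. */
static int try_counter_lanes(uint64_t c, uint64_t out[4])
{
    if (!__builtin_cpu_supports("avx2"))
        return 0;
    counter_lanes(c, out);
    return 1;
}
```

Calling `try_counter_lanes(10, out)` from `main` on an AVX2-capable CPU fills `out` with 10, 11, 12, 13. Whether a per-function attribute or a corrected command-line flag is the better fix for this benchmark is a judgment call that the log does not settle.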

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55af850797b0: v4i64 = X86ISD::VTRUNC 0x55af85079680
try.c: 0x55af85079680: v16i32 = vselect 0x55af8508e510, 0x55af8502ae10, 0x55af85079550
try.c: 0x55af8508e510: v4i1 = X86ISD::PCMPGTM 0x55af85075190, 0x55af85070d20
try.c: 0x55af85075190: v4i64 = X86ISD::VBROADCAST 0x55af8502fa00
try.c: 0x55af8502fa00: i64,ch = load<LD8[%lsr.iv6971]> 0x55af84f859d0, 0x55af85068270, undef:i64
try.c: 0x55af85068270: i64,ch = CopyFromReg 0x55af84f859d0, Register:i64 %vreg50
try.c: 0x55af85070f80: i64 = Register %vreg50
try.c: 0x55af85030ed0: i64 = undef
try.c: 0x55af85070d20: v4i64,ch = CopyFromReg 0x55af84f859d0, Register:v4i64 %vreg13
try.c: 0x55af850759e0: v4i64 = Register %vreg13
try.c: 0x55af8502ae10: v16i32 = X86ISD::VBROADCAST 0x55af850753f0
try.c: 0x55af850753f0: i32,ch = load<LD4[ConstantPool]> 0x55af84f859d0, 0x55af8502efe0, undef:i64
try.c: 0x55af8502efe0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55af85014c60: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55af85030ed0: i64 = undef
try.c: 0x55af85079550: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: 0x55af85079420: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x5609a0527940: v4i64 = X86ISD::VTRUNC 0x5609a0527810
try.c: 0x5609a0527810: v16i32 = vselect 0x5609a0517ea0, 0x5609a04ac960, 0x5609a05276e0
try.c: 0x5609a0517ea0: v4i1 = X86ISD::PCMPGTM 0x5609a0503d70, 0x5609a04ff2f0
try.c: 0x5609a0503d70: v4i64 = X86ISD::VBROADCAST 0x5609a04ace20
try.c: 0x5609a04ace20: i64,ch = load<LD8[%lsr.iv6971]> 0x5609a03fda30, 0x5609a049f300, undef:i64
try.c: 0x5609a049f300: i64,ch = CopyFromReg 0x5609a03fda30, Register:i64 %vreg50
try.c: 0x5609a04ff550: i64 = Register %vreg50
try.c: 0x5609a0499a60: i64 = undef
try.c: 0x5609a04ff2f0: v4i64,ch = CopyFromReg 0x5609a03fda30, Register:v4i64 %vreg13
try.c: 0x5609a05045c0: v4i64 = Register %vreg13
try.c: 0x5609a04ac960: v16i32 = X86ISD::VBROADCAST 0x5609a0503fd0
try.c: 0x5609a0503fd0: i32,ch = load<LD4[ConstantPool]> 0x5609a03fda30, 0x5609a04af300, undef:i64
try.c: 0x5609a04af300: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x5609a049a3e0: i64 = TargetConstantPool<i32 1> 0
try.c: 0x5609a0499a60: i64 = undef
try.c: 0x5609a05276e0: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: 0x5609a05275b0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: fatal error: error in backend: Cannot select: 0x55bd34e97f60: v4i64 = X86ISD::VTRUNC 0x55bd34e97e30
try.c: 0x55bd34e97e30: v16i32 = vselect 0x55bd34e92930, 0x55bd34e3d660, 0x55bd34e97d00
try.c: 0x55bd34e92930: v4i1 = X86ISD::PCMPGTM 0x55bd34e91920, 0x55bd34e8d4b0
try.c: 0x55bd34e91920: v4i64 = X86ISD::VBROADCAST 0x55bd34e349b0
try.c: 0x55bd34e349b0: i64,ch = load<LD8[%lsr.iv6971]> 0x55bd34da2950, 0x55bd34e78ed0, undef:i64
try.c: 0x55bd34e78ed0: i64,ch = CopyFromReg 0x55bd34da2950, Register:i64 %vreg50
try.c: 0x55bd34e8d710: i64 = Register %vreg50
try.c: 0x55bd34e35e80: i64 = undef
try.c: 0x55bd34e8d4b0: v4i64,ch = CopyFromReg 0x55bd34da2950, Register:v4i64 %vreg13
try.c: 0x55bd34e92170: v4i64 = Register %vreg13
try.c: 0x55bd34e3d660: v16i32 = X86ISD::VBROADCAST 0x55bd34e91b80
try.c: 0x55bd34e91b80: i32,ch = load<LD4[ConstantPool]> 0x55bd34da2950, 0x55bd34e391e0, undef:i64
try.c: 0x55bd34e391e0: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<i32 1> 0
try.c: 0x55bd34e7cb90: i64 = TargetConstantPool<i32 1> 0
try.c: 0x55bd34e35e80: i64 = undef
try.c: 0x55bd34e97d00: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: 0x55bd34e97bd0: i32 = Constant<0>
try.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse4

Compiler output

Implementation: T:sse4
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
stream.c: stream.c:307:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'ExpandKeyBS' that is compiled without support for 'ssse3'
stream.c: EKBS(rk);
stream.c: ^
stream.c: ./Simon128256SSE4.h:67:19: note: expanded from macro 'EKBS'
stream.c: #define EKBS(rk) (RKBS(rk,4,_D), RKBS(rk,5,_D), RKBS(rk,6,_C), RKBS(rk,7,_D), RKBS(rk,8,_C), RKBS(rk,9,_C), RKBS(rk,10,_C), RKBS(rk,11,_D), \
stream.c: ^
stream.c: ./Simon128256SSE4.h:57:52: note: expanded from macro 'RKBS'
stream.c: #define RKBS(rk,r,_V) (rk[r][7]= _D ^ rk[r-4][7] ^ ROR8(rk[r-1][2]) ^ rk[r-3][7] ^ ROR8(rk[r-1][3]) ^ ROR8(rk[r-3][0]), \
stream.c: ^
stream.c: ./Intrinsics_SSE4_128block.h:39:19: note: expanded from macro 'ROR8'
stream.c: #define ROR8(X) (SHFL(X,R8))
stream.c: ^
stream.c: ./Intrinsics_SSE4_128block.h:35:14: note: expanded from macro 'SHFL'
stream.c: #define SHFL _mm_shuffle_epi8
stream.c: ^
stream.c: stream.c:307:3: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'ExpandKeyBS' that is compiled without support for 'ssse3'
stream.c: ./Simon128256SSE4.h:67:19: note: expanded from macro 'EKBS'
stream.c: #define EKBS(rk) (RKBS(rk,4,_D), RKBS(rk,5,_D), RKBS(rk,6,_C), RKBS(rk,7,_D), RKBS(rk,8,_C), RKBS(rk,9,_C), RKBS(rk,10,_C), RKBS(rk,11,_D), \
stream.c: ^
stream.c: ./Simon128256SSE4.h:57:85: note: expanded from macro 'RKBS'
stream.c: #define RKBS(rk,r,_V) (rk[r][7]= _D ^ rk[r-4][7] ^ ROR8(rk[r-1][2]) ^ rk[r-3][7] ^ ROR8(rk[r-1][3]) ^ ROR8(rk[r-3][0]), \
stream.c: ^
stream.c: ./Intrinsics_SSE4_128block.h:39:19: note: expanded from macro 'ROR8'
stream.c: #define ROR8(X) (SHFL(X,R8))
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse4