Implementation notes: amd64, skylake, crypto_aead/scream12v3

Computer: skylake
Architecture: amd64
CPU ID: GenuineIntel-000506e3-bfebfbff
SUPERCOP version: 20161026
Operation: crypto_aead
Primitive: scream12v3
Time | Implementation | Compiler | Benchmark date | SUPERCOP version
70138 | sse | gcc -m64 -march=native -mtune=native -O3 -fomit-frame-pointer | 20161216 | 20161026
70516 | sse | gcc -m64 -march=core-avx2 -O3 -fomit-frame-pointer | 20161216 | 20161026
70952 | sse | gcc -m64 -march=core-avx-i -O3 -fomit-frame-pointer | 20161216 | 20161026
71834 | sse | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20161216 | 20161026
73496 | sse | gcc -m64 -march=corei7-avx -O3 -fomit-frame-pointer | 20161216 | 20161026
74850 | sse | gcc -m64 -march=corei7 -O3 -fomit-frame-pointer | 20161216 | 20161026
75006 | sse | gcc -m64 -march=core2 -O3 -fomit-frame-pointer | 20161216 | 20161026
75026 | sse | gcc -m64 -march=core2 -msse4 -O3 -fomit-frame-pointer | 20161216 | 20161026
75948 | sse | gcc -m64 -march=core2 -msse4.1 -O3 -fomit-frame-pointer | 20161216 | 20161026
76452 | sse | gcc -m64 -march=native -mtune=native -O2 -fomit-frame-pointer | 20161216 | 20161026
76924 | sse | clang -O3 -fwrapv -mavx2 -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
77174 | sse | gcc -m64 -march=corei7-avx -O2 -fomit-frame-pointer | 20161216 | 20161026
77468 | sse | gcc -m64 -march=core-avx-i -O2 -fomit-frame-pointer | 20161216 | 20161026
77484 | sse | clang -O3 -fwrapv -march=native -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
77800 | sse | gcc -m64 -march=core-avx2 -Os -fomit-frame-pointer | 20161216 | 20161026
77812 | sse | gcc -m64 -march=native -mtune=native -Os -fomit-frame-pointer | 20161216 | 20161026
77932 | sse | clang -O3 -fwrapv -mavx -maes -mpclmul -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
78008 | sse | gcc -m64 -march=core-avx-i -Os -fomit-frame-pointer | 20161216 | 20161026
78038 | sse | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20161216 | 20161026
78140 | sse | clang -O3 -fwrapv -mavx -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
78384 | sse | gcc -m64 -march=core-avx2 -O2 -fomit-frame-pointer | 20161216 | 20161026
78636 | sse | gcc -m64 -march=corei7-avx -Os -fomit-frame-pointer | 20161216 | 20161026
81818 | sse | gcc -m64 -march=corei7 -O2 -fomit-frame-pointer | 20161216 | 20161026
81910 | sse | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20161216 | 20161026
82200 | sse | gcc -m64 -march=core2 -msse4 -O2 -fomit-frame-pointer | 20161216 | 20161026
82270 | sse | gcc -m64 -march=core2 -O2 -fomit-frame-pointer | 20161216 | 20161026
82368 | sse | gcc -m64 -march=core2 -msse4.1 -O2 -fomit-frame-pointer | 20161216 | 20161026
82768 | sse | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20161216 | 20161026
83196 | sse | gcc -m64 -march=core-avx2 -O -fomit-frame-pointer | 20161216 | 20161026
83224 | sse | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20161216 | 20161026
83648 | sse | gcc -m64 -march=native -mtune=native -O -fomit-frame-pointer | 20161216 | 20161026
83786 | sse | gcc -m64 -march=corei7-avx -O -fomit-frame-pointer | 20161216 | 20161026
84638 | sse | gcc -m64 -march=core-avx-i -O -fomit-frame-pointer | 20161216 | 20161026
86968 | sse | clang -O3 -fwrapv -march=x86-64 -mcpu=core-avx2 -mavx2 -maes -mpclmul -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
88176 | sse | gcc -m64 -march=corei7 -O -fomit-frame-pointer | 20161216 | 20161026
88248 | sse | gcc -m64 -march=core2 -msse4.1 -O -fomit-frame-pointer | 20161216 | 20161026
88266 | sse | gcc -m64 -march=core2 -msse4 -O -fomit-frame-pointer | 20161216 | 20161026
89308 | sse | gcc -m64 -march=core2 -O -fomit-frame-pointer | 20161216 | 20161026
120918 | sse | gcc -m64 -march=core2 -Os -fomit-frame-pointer | 20161216 | 20161026
121070 | sse | gcc -m64 -march=corei7 -Os -fomit-frame-pointer | 20161216 | 20161026
121332 | sse | gcc -m64 -march=core2 -msse4.1 -Os -fomit-frame-pointer | 20161216 | 20161026
122126 | sse | gcc -m64 -march=core2 -msse4 -Os -fomit-frame-pointer | 20161216 | 20161026
321062 | ref | gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer | 20161216 | 20161026
321320 | ref | gcc -funroll-loops -m64 -march=nocona -O3 -fomit-frame-pointer | 20161216 | 20161026
324510 | ref | gcc -march=nocona -O3 -fomit-frame-pointer | 20161216 | 20161026
325662 | ref | gcc -m64 -march=nocona -O3 -fomit-frame-pointer | 20161216 | 20161026
326204 | ref | gcc -m64 -march=core2 -msse4 -O3 -fomit-frame-pointer | 20161216 | 20161026
327012 | ref | gcc -m64 -march=core2 -O3 -fomit-frame-pointer | 20161216 | 20161026
327346 | ref | gcc -m64 -march=corei7 -O3 -fomit-frame-pointer | 20161216 | 20161026
328286 | ref | gcc -m64 -march=core-avx-i -O3 -fomit-frame-pointer | 20161216 | 20161026
328500 | ref | gcc -m64 -march=core2 -msse4.1 -O3 -fomit-frame-pointer | 20161216 | 20161026
328530 | ref | gcc -m64 -march=native -mtune=native -O3 -fomit-frame-pointer | 20161216 | 20161026
329642 | ref | gcc -m64 -march=barcelona -O3 -fomit-frame-pointer | 20161216 | 20161026
330120 | ref | gcc -march=k8 -O3 -fomit-frame-pointer | 20161216 | 20161026
330666 | ref | gcc -m64 -march=core-avx2 -O3 -fomit-frame-pointer | 20161216 | 20161026
330730 | ref | gcc -funroll-loops -m64 -march=barcelona -O2 -fomit-frame-pointer | 20161216 | 20161026
331192 | ref | gcc -funroll-loops -m64 -march=k8 -O2 -fomit-frame-pointer | 20161216 | 20161026
331522 | ref | gcc -funroll-loops -march=barcelona -O2 -fomit-frame-pointer | 20161216 | 20161026
331698 | ref | gcc -march=barcelona -O3 -fomit-frame-pointer | 20161216 | 20161026
331862 | ref | gcc -funroll-loops -march=nocona -O2 -fomit-frame-pointer | 20161216 | 20161026
332104 | ref | gcc -funroll-loops -O2 -fomit-frame-pointer | 20161216 | 20161026
332126 | ref | gcc -funroll-loops -m64 -march=nocona -O2 -fomit-frame-pointer | 20161216 | 20161026
332298 | ref | gcc -funroll-loops -m64 -O2 -fomit-frame-pointer | 20161216 | 20161026
332450 | ref | gcc -funroll-loops -march=k8 -O2 -fomit-frame-pointer | 20161216 | 20161026
332964 | ref | gcc -m64 -march=k8 -O3 -fomit-frame-pointer | 20161216 | 20161026
333096 | ref | gcc -funroll-loops -march=barcelona -O3 -fomit-frame-pointer | 20161216 | 20161026
333310 | ref | gcc -funroll-loops -m64 -march=barcelona -O3 -fomit-frame-pointer | 20161216 | 20161026
333622 | ref | gcc -funroll-loops -march=k8 -O3 -fomit-frame-pointer | 20161216 | 20161026
333892 | ref | gcc -funroll-loops -fno-schedule-insns -O2 -fomit-frame-pointer | 20161216 | 20161026
334942 | ref | gcc -m64 -march=corei7-avx -O3 -fomit-frame-pointer | 20161216 | 20161026
336386 | ref | gcc -funroll-loops -m64 -march=k8 -O3 -fomit-frame-pointer | 20161216 | 20161026
336632 | ref | gcc -funroll-loops -march=barcelona -O -fomit-frame-pointer | 20161216 | 20161026
336716 | ref | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20161216 | 20161026
337128 | ref | gcc -funroll-loops -m64 -O -fomit-frame-pointer | 20161216 | 20161026
337506 | ref | gcc -funroll-loops -m64 -march=barcelona -O -fomit-frame-pointer | 20161216 | 20161026
337526 | ref | gcc -funroll-loops -m64 -march=nocona -O -fomit-frame-pointer | 20161216 | 20161026
337610 | ref | gcc -funroll-loops -fno-schedule-insns -O -fomit-frame-pointer | 20161216 | 20161026
338116 | ref | gcc -funroll-loops -march=k8 -O -fomit-frame-pointer | 20161216 | 20161026
338658 | ref | gcc -funroll-loops -m64 -march=k8 -O -fomit-frame-pointer | 20161216 | 20161026
338682 | ref | gcc -funroll-loops -O -fomit-frame-pointer | 20161216 | 20161026
338956 | ref | gcc -funroll-loops -march=nocona -O -fomit-frame-pointer | 20161216 | 20161026
342148 | ref | gcc -funroll-loops -m64 -O3 -fomit-frame-pointer | 20161216 | 20161026
342878 | ref | gcc -funroll-loops -fno-schedule-insns -O3 -fomit-frame-pointer | 20161216 | 20161026
345298 | ref | gcc -O3 -fomit-frame-pointer | 20161216 | 20161026
345428 | ref | gcc -fno-schedule-insns -O3 -fomit-frame-pointer | 20161216 | 20161026
346316 | ref | gcc -m64 -O3 -fomit-frame-pointer | 20161216 | 20161026
347136 | ref | gcc -funroll-loops -O3 -fomit-frame-pointer | 20161216 | 20161026
485758 | ref | clang -O3 -fwrapv -mavx -maes -mpclmul -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
485760 | ref | clang -O3 -fwrapv -march=x86-64 -mcpu=core-avx2 -mavx2 -maes -mpclmul -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
485996 | ref | clang -O3 -fwrapv -mavx -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
487420 | ref | clang -O3 -fwrapv -mavx2 -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
487550 | ref | clang -O3 -fwrapv -march=native -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
491844 | ref | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20161216 | 20161026
500022 | ref | gcc -march=barcelona -O2 -fomit-frame-pointer | 20161216 | 20161026
502732 | ref | gcc -m64 -march=barcelona -O2 -fomit-frame-pointer | 20161216 | 20161026
503110 | ref | gcc -m64 -march=k8 -O2 -fomit-frame-pointer | 20161216 | 20161026
503518 | ref | gcc -m64 -march=nocona -O2 -fomit-frame-pointer | 20161216 | 20161026
504096 | ref | gcc -march=k8 -O2 -fomit-frame-pointer | 20161216 | 20161026
504172 | ref | gcc -march=nocona -O2 -fomit-frame-pointer | 20161216 | 20161026
504396 | ref | gcc -m64 -march=corei7 -O2 -fomit-frame-pointer | 20161216 | 20161026
504746 | ref | gcc -m64 -march=core2 -O2 -fomit-frame-pointer | 20161216 | 20161026
505490 | ref | gcc -m64 -march=core-avx-i -O2 -fomit-frame-pointer | 20161216 | 20161026
505732 | ref | gcc -m64 -march=core2 -msse4 -O2 -fomit-frame-pointer | 20161216 | 20161026
506992 | ref | gcc -m64 -march=core2 -msse4.1 -O2 -fomit-frame-pointer | 20161216 | 20161026
507004 | ref | gcc -m64 -march=core-avx2 -O2 -fomit-frame-pointer | 20161216 | 20161026
507346 | ref | gcc -m64 -march=corei7-avx -O2 -fomit-frame-pointer | 20161216 | 20161026
508506 | ref | gcc -m64 -O2 -fomit-frame-pointer | 20161216 | 20161026
508586 | ref | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20161216 | 20161026
509484 | ref | gcc -O2 -fomit-frame-pointer | 20161216 | 20161026
509632 | ref | gcc -m64 -march=native -mtune=native -O2 -fomit-frame-pointer | 20161216 | 20161026
512182 | ref | gcc -m64 -march=k8 -O -fomit-frame-pointer | 20161216 | 20161026
512424 | ref | gcc -m64 -march=native -mtune=native -O -fomit-frame-pointer | 20161216 | 20161026
512482 | ref | gcc -fno-schedule-insns -O2 -fomit-frame-pointer | 20161216 | 20161026
513000 | ref | gcc -m64 -march=core2 -msse4 -O -fomit-frame-pointer | 20161216 | 20161026
513070 | ref | gcc -fno-schedule-insns -O -fomit-frame-pointer | 20161216 | 20161026
513420 | ref | gcc -march=barcelona -O -fomit-frame-pointer | 20161216 | 20161026
513572 | ref | gcc -m64 -march=core-avx2 -O -fomit-frame-pointer | 20161216 | 20161026
513618 | ref | gcc -m64 -march=core-avx-i -O -fomit-frame-pointer | 20161216 | 20161026
513864 | ref | gcc -march=k8 -O -fomit-frame-pointer | 20161216 | 20161026
513940 | ref | gcc -O -fomit-frame-pointer | 20161216 | 20161026
513988 | ref | gcc -m64 -march=corei7 -O -fomit-frame-pointer | 20161216 | 20161026
514164 | ref | gcc -m64 -march=barcelona -O -fomit-frame-pointer | 20161216 | 20161026
515060 | ref | gcc -m64 -march=core2 -O -fomit-frame-pointer | 20161216 | 20161026
515778 | ref | gcc -m64 -march=corei7-avx -O -fomit-frame-pointer | 20161216 | 20161026
516572 | ref | gcc -m64 -march=core2 -msse4.1 -O -fomit-frame-pointer | 20161216 | 20161026
516640 | ref | gcc -m64 -O -fomit-frame-pointer | 20161216 | 20161026
518358 | ref | gcc -m64 -march=nocona -O -fomit-frame-pointer | 20161216 | 20161026
518482 | ref | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20161216 | 20161026
519620 | ref | gcc -march=nocona -O -fomit-frame-pointer | 20161216 | 20161026
531610 | ref | clang -mcpu=native -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20161216 | 20161026
531710 | ref | clang -mcpu=cortex-a9 -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20161216 | 20161026
532062 | ref | clang -O3 -fomit-frame-pointer -Qunused-arguments | 20161216 | 20161026
532202 | ref | clang -mcpu=cortex-a8 -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20161216 | 20161026
796078 | ref | gcc -m64 -Os -fomit-frame-pointer | 20161216 | 20161026
798408 | ref | gcc -fno-schedule-insns -Os -fomit-frame-pointer | 20161216 | 20161026
800294 | ref | gcc -march=k8 -Os -fomit-frame-pointer | 20161216 | 20161026
800982 | ref | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20161216 | 20161026
802052 | ref | gcc -m64 -march=core2 -msse4.1 -Os -fomit-frame-pointer | 20161216 | 20161026
802060 | ref | gcc -m64 -march=k8 -Os -fomit-frame-pointer | 20161216 | 20161026
802566 | ref | gcc -Os -fomit-frame-pointer | 20161216 | 20161026
802726 | ref | gcc -march=nocona -Os -fomit-frame-pointer | 20161216 | 20161026
803004 | ref | gcc -m64 -march=core-avx-i -Os -fomit-frame-pointer | 20161216 | 20161026
804368 | ref | gcc -m64 -march=core2 -msse4 -Os -fomit-frame-pointer | 20161216 | 20161026
804796 | ref | gcc -m64 -march=barcelona -Os -fomit-frame-pointer | 20161216 | 20161026
805692 | ref | gcc -m64 -march=native -mtune=native -Os -fomit-frame-pointer | 20161216 | 20161026
807134 | ref | gcc -m64 -march=core2 -Os -fomit-frame-pointer | 20161216 | 20161026
807414 | ref | gcc -m64 -march=core-avx2 -Os -fomit-frame-pointer | 20161216 | 20161026
808878 | ref | gcc -march=barcelona -Os -fomit-frame-pointer | 20161216 | 20161026
809374 | ref | gcc -m64 -march=corei7-avx -Os -fomit-frame-pointer | 20161216 | 20161026
809782 | ref | gcc -m64 -march=corei7 -Os -fomit-frame-pointer | 20161216 | 20161026
815500 | ref | gcc -m64 -march=nocona -Os -fomit-frame-pointer | 20161216 | 20161026
1020446 | ref | gcc -funroll-loops -Os -fomit-frame-pointer | 20161216 | 20161026
1020626 | ref | gcc -funroll-loops -m64 -march=k8 -Os -fomit-frame-pointer | 20161216 | 20161026
1022112 | ref | gcc -funroll-loops -march=nocona -Os -fomit-frame-pointer | 20161216 | 20161026
1022188 | ref | gcc -funroll-loops -m64 -Os -fomit-frame-pointer | 20161216 | 20161026
1022366 | ref | gcc -funroll-loops -m64 -march=nocona -Os -fomit-frame-pointer | 20161216 | 20161026
1024502 | ref | gcc -funroll-loops -march=barcelona -Os -fomit-frame-pointer | 20161216 | 20161026
1024712 | ref | gcc -funroll-loops -fno-schedule-insns -Os -fomit-frame-pointer | 20161216 | 20161026
1025756 | ref | gcc -funroll-loops -march=k8 -Os -fomit-frame-pointer | 20161216 | 20161026
1026146 | ref | gcc -funroll-loops -m64 -march=barcelona -Os -fomit-frame-pointer | 20161216 | 20161026
1741290 | ref | gcc | 20161216 | 20161026
1752960 | ref | gcc -funroll-loops | 20161216 | 20161026
1758210 | ref | cc | 20161216 | 20161026

Compiler output

Implementation: crypto_aead/scream12v3/sse
Compiler: cc
scream.c: scream.c: In function 'LBox16P':
scream.c: scream.c:202:10: warning: implicit declaration of function '__builtin_ia32_pshufb128' [-Wimplicit-function-declaration]
scream.c: A = __builtin_ia32_pshufb128(table, t0);
scream.c: ^~~~~~~~~~~~~~~~~~~~~~~~
scream.c: scream.c:202:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: A = __builtin_ia32_pshufb128(table, t0);
scream.c: ^
scream.c: scream.c:203:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: C = __builtin_ia32_pshufb128(table, t1);
scream.c: ^
scream.c: scream.c:207:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: B = __builtin_ia32_pshufb128(table, t0);
scream.c: ^
scream.c: scream.c:208:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: D = __builtin_ia32_pshufb128(table, t1);
scream.c: ^
scream.c: scream.c:215:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: A ^= __builtin_ia32_pshufb128(table, in[0]);
scream.c: ^~
scream.c: scream.c:216:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: C ^= __builtin_ia32_pshufb128(table, in[2]);
scream.c: ^~
scream.c: scream.c:220:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: B ^= __builtin_ia32_pshufb128(table, in[0]);
scream.c: ^~
scream.c: ...

Number of similar (compiler,implementation) pairs: 71, namely:
Compiler Implementations
cc sse
gcc sse
gcc -O2 -fomit-frame-pointer sse
gcc -O3 -fomit-frame-pointer sse
gcc -O -fomit-frame-pointer sse
gcc -Os -fomit-frame-pointer sse
gcc -fno-schedule-insns -O2 -fomit-frame-pointer sse
gcc -fno-schedule-insns -O3 -fomit-frame-pointer sse
gcc -fno-schedule-insns -O -fomit-frame-pointer sse
gcc -fno-schedule-insns -Os -fomit-frame-pointer sse
gcc -funroll-loops sse
gcc -funroll-loops -O2 -fomit-frame-pointer sse
gcc -funroll-loops -O3 -fomit-frame-pointer sse
gcc -funroll-loops -O -fomit-frame-pointer sse
gcc -funroll-loops -Os -fomit-frame-pointer sse
gcc -funroll-loops -fno-schedule-insns -O2 -fomit-frame-pointer sse
gcc -funroll-loops -fno-schedule-insns -O3 -fomit-frame-pointer sse
gcc -funroll-loops -fno-schedule-insns -O -fomit-frame-pointer sse
gcc -funroll-loops -fno-schedule-insns -Os -fomit-frame-pointer sse
gcc -funroll-loops -m64 -O2 -fomit-frame-pointer sse
gcc -funroll-loops -m64 -O3 -fomit-frame-pointer sse
gcc -funroll-loops -m64 -O -fomit-frame-pointer sse
gcc -funroll-loops -m64 -Os -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=barcelona -O2 -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=barcelona -O3 -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=barcelona -O -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=barcelona -Os -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=k8 -O2 -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=k8 -O3 -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=k8 -O -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=k8 -Os -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=nocona -O2 -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=nocona -O3 -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=nocona -O -fomit-frame-pointer sse
gcc -funroll-loops -m64 -march=nocona -Os -fomit-frame-pointer sse
gcc -funroll-loops -march=barcelona -O2 -fomit-frame-pointer sse
gcc -funroll-loops -march=barcelona -O3 -fomit-frame-pointer sse
gcc -funroll-loops -march=barcelona -O -fomit-frame-pointer sse
gcc -funroll-loops -march=barcelona -Os -fomit-frame-pointer sse
gcc -funroll-loops -march=k8 -O2 -fomit-frame-pointer sse
gcc -funroll-loops -march=k8 -O3 -fomit-frame-pointer sse
gcc -funroll-loops -march=k8 -O -fomit-frame-pointer sse
gcc -funroll-loops -march=k8 -Os -fomit-frame-pointer sse
gcc -funroll-loops -march=nocona -O2 -fomit-frame-pointer sse
gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer sse
gcc -funroll-loops -march=nocona -O -fomit-frame-pointer sse
gcc -funroll-loops -march=nocona -Os -fomit-frame-pointer sse
gcc -m64 -O2 -fomit-frame-pointer sse
gcc -m64 -O3 -fomit-frame-pointer sse
gcc -m64 -O -fomit-frame-pointer sse
gcc -m64 -Os -fomit-frame-pointer sse
gcc -m64 -march=k8 -O2 -fomit-frame-pointer sse
gcc -m64 -march=k8 -O3 -fomit-frame-pointer sse
gcc -m64 -march=k8 -O -fomit-frame-pointer sse
gcc -m64 -march=k8 -Os -fomit-frame-pointer sse
gcc -m64 -march=nocona -O2 -fomit-frame-pointer sse
gcc -m64 -march=nocona -O3 -fomit-frame-pointer sse
gcc -m64 -march=nocona -O -fomit-frame-pointer sse
gcc -m64 -march=nocona -Os -fomit-frame-pointer sse
gcc -march=barcelona -O2 -fomit-frame-pointer sse
gcc -march=barcelona -O3 -fomit-frame-pointer sse
gcc -march=barcelona -O -fomit-frame-pointer sse
gcc -march=barcelona -Os -fomit-frame-pointer sse
gcc -march=k8 -O2 -fomit-frame-pointer sse
gcc -march=k8 -O3 -fomit-frame-pointer sse
gcc -march=k8 -O -fomit-frame-pointer sse
gcc -march=k8 -Os -fomit-frame-pointer sse
gcc -march=nocona -O2 -fomit-frame-pointer sse
gcc -march=nocona -O3 -fomit-frame-pointer sse
gcc -march=nocona -O -fomit-frame-pointer sse
gcc -march=nocona -Os -fomit-frame-pointer sse
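
Note on the failures above: none of the listed flag sets enables SSSE3 (cc and gcc with no -march, or -march=k8, nocona, barcelona). gcc therefore does not declare __builtin_ia32_pshufb128, treats the call as an implicit function returning int, and rejects the assignment to a v16qi. The sse builds that do appear in the timing table all use flags that enable SSSE3 (core2 or later, native, -mavx, -mavx2). The sketch below is not taken from scream.c; shuffle_bytes and shuffle_bytes_ref are hypothetical helpers showing one way such a call could be guarded at compile time, with a scalar loop that follows the per-byte PSHUFB semantics as the fallback.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSSE3__)
#include <tmmintrin.h>  /* _mm_shuffle_epi8, the intrinsic form of PSHUFB */
#endif

/* Hypothetical scalar model of PSHUFB: each output byte is table[idx & 15],
 * or 0 when bit 7 of the index byte is set. */
void shuffle_bytes_ref(uint8_t out[16], const uint8_t table[16],
                       const uint8_t idx[16])
{
    uint8_t tmp[16];
    int k;
    for (k = 0; k < 16; k++)
        tmp[k] = (idx[k] & 0x80) ? 0 : table[idx[k] & 0x0f];
    memcpy(out, tmp, 16);  /* copy last so out may alias table or idx */
}

/* Hypothetical wrapper: uses the SSSE3 instruction only when the compiler
 * was asked to enable it, otherwise falls back to the scalar model. */
void shuffle_bytes(uint8_t out[16], const uint8_t table[16],
                   const uint8_t idx[16])
{
#if defined(__SSSE3__)
    __m128i t = _mm_loadu_si128((const __m128i *)table);
    __m128i i = _mm_loadu_si128((const __m128i *)idx);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(t, i));
#else
    shuffle_bytes_ref(out, table, idx);
#endif
}
```

With a guard of this kind, a build without SSSE3 compiles the scalar path instead of failing, at the cost of the speed the sse implementation is meant to provide.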

Compiler output

Implementation: crypto_aead/scream12v3/sse
Compiler: clang -O3 -fomit-frame-pointer -Qunused-arguments
scream.c: scream.c:202:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: A = __builtin_ia32_pshufb128(table, t0);
scream.c: ^
scream.c: scream.c:203:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: C = __builtin_ia32_pshufb128(table, t1);
scream.c: ^
scream.c: scream.c:207:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: B = __builtin_ia32_pshufb128(table, t0);
scream.c: ^
scream.c: scream.c:208:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: D = __builtin_ia32_pshufb128(table, t1);
scream.c: ^
scream.c: scream.c:215:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: A ^= __builtin_ia32_pshufb128(table, in[0]);
scream.c: ^
scream.c: scream.c:216:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: C ^= __builtin_ia32_pshufb128(table, in[2]);
scream.c: ^
scream.c: scream.c:220:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: B ^= __builtin_ia32_pshufb128(table, in[0]);
scream.c: ^
scream.c: scream.c:221:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: D ^= __builtin_ia32_pshufb128(table, in[2]);
scream.c: ^
scream.c: scream.c:228:10: error: '__builtin_ia32_pshufb128' needs target feature ssse3
scream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
clang -O3 -fomit-frame-pointer -Qunused-arguments sse
clang -mcpu=cortex-a8 -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments sse
clang -mcpu=cortex-a9 -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments sse
clang -mcpu=native -mfpu=neon -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments sse
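
Note on the clang failures above: clang recognises the builtin but rejects each call because SSSE3 is not enabled for the enclosing function. Besides adding a flag such as -mssse3, one possible approach is to enable the feature for a single function and select it at run time. The following is a minimal sketch under that assumption, not the method used by scream.c; lookup16, lookup16_ssse3 and lookup16_scalar are hypothetical names. It relies on the GCC/clang extensions __attribute__((target(...))) and __builtin_cpu_supports.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <tmmintrin.h>
#define LOOKUP16_X86 1
#else
#define LOOKUP16_X86 0
#endif

/* Portable fallback with PSHUFB's per-byte semantics. */
void lookup16_scalar(uint8_t out[16], const uint8_t table[16],
                     const uint8_t idx[16])
{
    uint8_t tmp[16];
    int k;
    for (k = 0; k < 16; k++)
        tmp[k] = (idx[k] & 0x80) ? 0 : table[idx[k] & 0x0f];
    memcpy(out, tmp, 16);
}

#if LOOKUP16_X86
/* SSSE3 is enabled for this function only, independent of command-line flags. */
__attribute__((target("ssse3")))
void lookup16_ssse3(uint8_t out[16], const uint8_t table[16],
                    const uint8_t idx[16])
{
    __m128i t = _mm_loadu_si128((const __m128i *)table);
    __m128i i = _mm_loadu_si128((const __m128i *)idx);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(t, i));
}
#endif

/* Hypothetical dispatcher: take the SSSE3 path only if the running CPU
 * reports the feature, otherwise use the scalar loop. */
void lookup16(uint8_t out[16], const uint8_t table[16], const uint8_t idx[16])
{
#if LOOKUP16_X86
    if (__builtin_cpu_supports("ssse3")) {
        lookup16_ssse3(out, table, idx);
        return;
    }
#endif
    lookup16_scalar(out, table, idx);
}
```

The run-time check adds a branch per call, so in a benchmarked inner loop it would more likely be hoisted out and done once; the sketch keeps it inline for brevity.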

Compiler output

Implementation: crypto_aead/scream12v3/sse
Compiler: gcc -m64 -march=barcelona -O2 -fomit-frame-pointer
scream.c: scream.c: In function 'LBox16P':
scream.c: scream.c:202:10: warning: implicit declaration of function '__builtin_ia32_pshufb128' [-Wimplicit-function-declaration]
scream.c: A = __builtin_ia32_pshufb128(table, t0);
scream.c: ^~~~~~~~~~~~~~~~~~~~~~~~
scream.c: scream.c:202:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: A = __builtin_ia32_pshufb128(table, t0);
scream.c: ^
scream.c: scream.c:203:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: C = __builtin_ia32_pshufb128(table, t1);
scream.c: ^
scream.c: scream.c:207:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: B = __builtin_ia32_pshufb128(table, t0);
scream.c: ^
scream.c: scream.c:208:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: D = __builtin_ia32_pshufb128(table, t1);
scream.c: ^
scream.c: scream.c:215:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: A ^= __builtin_ia32_pshufb128(table, in[0]);
scream.c: ^~
scream.c: scream.c:216:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: C ^= __builtin_ia32_pshufb128(table, in[2]);
scream.c: ^~
scream.c: scream.c:220:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: B ^= __builtin_ia32_pshufb128(table, in[0]);
scream.c: ^~
scream.c: ...
scream.c: scream.c: In function 'LBox16P':
scream.c: scream.c:202:10: warning: implicit declaration of function '__builtin_ia32_pshufb128' [-Wimplicit-function-declaration]
scream.c: A = __builtin_ia32_pshufb128(table, t0);
scream.c: ^~~~~~~~~~~~~~~~~~~~~~~~
scream.c: scream.c:202:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: A = __builtin_ia32_pshufb128(table, t0);
scream.c: ^
scream.c: scream.c:203:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: C = __builtin_ia32_pshufb128(table, t1);
scream.c: ^
scream.c: scream.c:207:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: B = __builtin_ia32_pshufb128(table, t0);
scream.c: ^
scream.c: scream.c:208:8: error: incompatible types when assigning to type 'v16qi {aka __vector(16) char}' from type 'int'
scream.c: D = __builtin_ia32_pshufb128(table, t1);
scream.c: ^
scream.c: scream.c:215:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: A ^= __builtin_ia32_pshufb128(table, in[0]);
scream.c: ^~
scream.c: scream.c:216:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: C ^= __builtin_ia32_pshufb128(table, in[2]);
scream.c: ^~
scream.c: scream.c:220:7: error: conversion of scalar 'int' to vector 'v16qi {aka __vector(16) char}' involves truncation
scream.c: B ^= __builtin_ia32_pshufb128(table, in[0]);
scream.c: ^~
scream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -m64 -march=barcelona -O2 -fomit-frame-pointer sse
gcc -m64 -march=barcelona -O3 -fomit-frame-pointer sse
gcc -m64 -march=barcelona -O -fomit-frame-pointer sse
gcc -m64 -march=barcelona -Os -fomit-frame-pointer sse