Implementation notes: x86, samba, crypto_kem/kyber512

Computer: samba
Architecture: x86
CPU ID: GenuineIntel-000506e3-bfebfbff
SUPERCOP version: 20190803
Operation: crypto_kem
Primitive: kyber512
Time  Implementation  Compiler  Benchmark date  SUPERCOP version
590882  ref  gcc -funroll-loops -m32 -march=pentium-m -O3 -fomit-frame-pointer  20190810  20190803
596422  ref  gcc -m32 -march=pentium-m -O3 -fomit-frame-pointer  20190810  20190803
600929  ref  gcc -funroll-loops -m32 -march=pentium-m -O2 -fomit-frame-pointer  20190810  20190803
602562  ref  gcc -m32 -march=core-avx2 -O3 -fomit-frame-pointer  20190810  20190803
607836  ref  gcc -m32 -march=corei7 -O3 -fomit-frame-pointer  20190810  20190803
614379  ref  gcc -m32 -march=core2 -O3 -fomit-frame-pointer  20190810  20190803
615282  ref  gcc -m32 -march=native -mtune=native -O3 -fomit-frame-pointer  20190810  20190803
620980  ref  gcc -m32 -march=core2 -msse4.1 -O3 -fomit-frame-pointer  20190810  20190803
624504  ref  gcc -m32 -march=core2 -msse4 -O3 -fomit-frame-pointer  20190810  20190803
631575  ref  gcc -funroll-loops -m32 -march=k8 -O -fomit-frame-pointer  20190810  20190803
634365  ref  gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer  20190810  20190803
637873  ref  gcc -m32 -march=corei7-avx -O2 -fomit-frame-pointer  20190810  20190803
638419  ref  gcc -funroll-loops -m32 -march=pentium-m -O -fomit-frame-pointer  20190810  20190803
642926  ref  gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer  20190810  20190803
644857  ref  gcc -m32 -march=native -mtune=native -O2 -fomit-frame-pointer  20190810  20190803
646015  ref  gcc -funroll-loops -m32 -march=barcelona -O -fomit-frame-pointer  20190810  20190803
647964  ref  gcc -m32 -march=corei7 -O2 -fomit-frame-pointer  20190810  20190803
655186  ref  gcc -m32 -march=core2 -msse4 -O2 -fomit-frame-pointer  20190810  20190803
656897  ref  gcc -m32 -march=pentium-m -O2 -fomit-frame-pointer  20190810  20190803
657060  ref  gcc -m32 -march=core-avx-i -O3 -fomit-frame-pointer  20190810  20190803
658232  ref  gcc -m32 -march=core2 -msse4.1 -O2 -fomit-frame-pointer  20190810  20190803
660926  ref  gcc -m32 -march=corei7-avx -O3 -fomit-frame-pointer  20190810  20190803
662327  ref  gcc -m32 -march=core-avx2 -Os -fomit-frame-pointer  20190810  20190803
664351  ref  gcc -m32 -march=native -mtune=native -Os -fomit-frame-pointer  20190810  20190803
674466  ref  gcc -m32 -march=core2 -msse4 -Os -fomit-frame-pointer  20190810  20190803
674653  ref  gcc -m32 -march=core2 -O2 -fomit-frame-pointer  20190810  20190803
676641  ref  gcc -m32 -march=core2 -msse4.1 -Os -fomit-frame-pointer  20190810  20190803
678133  ref  gcc -m32 -march=core-avx2 -O -fomit-frame-pointer  20190810  20190803
678425  ref  gcc -m32 -march=core-avx-i -Os -fomit-frame-pointer  20190810  20190803
680196  ref  gcc -m32 -march=corei7-avx -Os -fomit-frame-pointer  20190810  20190803
683107  ref  gcc -m32 -march=corei7 -Os -fomit-frame-pointer  20190810  20190803
685336  ref  gcc -m32 -march=pentium-m -O -fomit-frame-pointer  20190810  20190803
686042  ref  gcc -m32 -march=native -mtune=native -O -fomit-frame-pointer  20190810  20190803
686728  ref  gcc -funroll-loops -m32 -march=prescott -O3 -fomit-frame-pointer  20190810  20190803
688623  ref  gcc -m32 -march=k8 -O -fomit-frame-pointer  20190810  20190803
688774  ref  gcc -funroll-loops -m32 -march=pentium-m -Os -fomit-frame-pointer  20190810  20190803
690600  ref  gcc -m32 -march=nocona -Os -fomit-frame-pointer  20190810  20190803
690657  ref  gcc -m32 -march=core2 -Os -fomit-frame-pointer  20190810  20190803
691100  ref  gcc -m32 -march=pentium4 -Os -fomit-frame-pointer  20190810  20190803
692204  ref  gcc -m32 -march=prescott -Os -fomit-frame-pointer  20190810  20190803
695413  ref  gcc -m32 -march=pentium-m -Os -fomit-frame-pointer  20190810  20190803
695984  ref  gcc -funroll-loops -m32 -march=nocona -Os -fomit-frame-pointer  20190810  20190803
700359  ref  gcc -m32 -march=core2 -msse4 -O -fomit-frame-pointer  20190810  20190803
700651  ref  gcc -funroll-loops -m32 -march=athlon -O3 -fomit-frame-pointer  20190810  20190803
701528  ref  gcc -funroll-loops -m32 -march=prescott -O -fomit-frame-pointer  20190810  20190803
704288  ref  gcc -funroll-loops -m32 -march=pentium3 -O -fomit-frame-pointer  20190810  20190803
704421  ref  gcc -funroll-loops -m32 -march=pentiumpro -O -fomit-frame-pointer  20190810  20190803
704469  ref  gcc -funroll-loops -m32 -march=prescott -Os -fomit-frame-pointer  20190810  20190803
704619  ref  gcc -funroll-loops -m32 -march=nocona -O3 -fomit-frame-pointer  20190810  20190803
704734  ref  gcc -funroll-loops -m32 -march=nocona -O2 -fomit-frame-pointer  20190810  20190803
705299  ref  gcc -m32 -march=core2 -O -fomit-frame-pointer  20190810  20190803
705504  ref  gcc -funroll-loops -m32 -march=pentium4 -Os -fomit-frame-pointer  20190810  20190803
710399  ref  gcc -m32 -march=core2 -msse4.1 -O -fomit-frame-pointer  20190810  20190803
713143  ref  gcc -funroll-loops -m32 -O3 -fomit-frame-pointer  20190810  20190803
713807  ref  gcc -funroll-loops -m32 -march=nocona -O -fomit-frame-pointer  20190810  20190803
713982  ref  gcc -funroll-loops -m32 -march=prescott -O2 -fomit-frame-pointer  20190810  20190803
714280  ref  gcc -m32 -march=barcelona -O -fomit-frame-pointer  20190810  20190803
714893  ref  gcc -m32 -march=corei7-avx -O -fomit-frame-pointer  20190810  20190803
715789  ref  gcc -funroll-loops -m32 -march=pentium2 -O -fomit-frame-pointer  20190810  20190803
718583  ref  gcc -m32 -march=prescott -O3 -fomit-frame-pointer  20190810  20190803
721550  ref  gcc -funroll-loops -m32 -march=pentium3 -O3 -fomit-frame-pointer  20190810  20190803
721740  ref  gcc -m32 -march=corei7 -O -fomit-frame-pointer  20190810  20190803
722232  ref  gcc -m32 -march=core-avx-i -O -fomit-frame-pointer  20190810  20190803
722992  ref  gcc -funroll-loops -m32 -march=athlon -O2 -fomit-frame-pointer  20190810  20190803
724603  ref  gcc -m32 -march=athlon -O3 -fomit-frame-pointer  20190810  20190803
729012  ref  gcc -funroll-loops -m32 -march=pentiumpro -O3 -fomit-frame-pointer  20190810  20190803
729675  ref  gcc -funroll-loops -m32 -march=k6-2 -O -fomit-frame-pointer  20190810  20190803
729861  ref  gcc -m32 -march=nocona -O3 -fomit-frame-pointer  20190810  20190803
731292  ref  gcc -funroll-loops -m32 -march=athlon -O -fomit-frame-pointer  20190810  20190803
732311  ref  gcc -funroll-loops -m32 -march=pentium2 -O3 -fomit-frame-pointer  20190810  20190803
737276  ref  gcc -funroll-loops -m32 -march=k6-3 -O -fomit-frame-pointer  20190810  20190803
740729  ref  gcc -funroll-loops -m32 -march=k6 -O -fomit-frame-pointer  20190810  20190803
742152  ref  gcc -funroll-loops -m32 -march=pentium3 -O2 -fomit-frame-pointer  20190810  20190803
743283  ref  gcc -funroll-loops -m32 -O2 -fomit-frame-pointer  20190810  20190803
743401  ref  gcc -funroll-loops -m32 -O -fomit-frame-pointer  20190810  20190803
746082  ref  gcc -funroll-loops -m32 -march=pentiumpro -O2 -fomit-frame-pointer  20190810  20190803
746670  ref  gcc -funroll-loops -m32 -march=pentium4 -O3 -fomit-frame-pointer  20190810  20190803
751119  ref  gcc -funroll-loops -m32 -march=pentium2 -O2 -fomit-frame-pointer  20190810  20190803
755121  ref  gcc -funroll-loops -m32 -march=k6-3 -O2 -fomit-frame-pointer  20190810  20190803
755308  ref  gcc -funroll-loops -m32 -march=k6-2 -O2 -fomit-frame-pointer  20190810  20190803
756088  ref  gcc -funroll-loops -m32 -march=k6 -O3 -fomit-frame-pointer  20190810  20190803
756379  ref  gcc -m32 -march=nocona -O -fomit-frame-pointer  20190810  20190803
756789  ref  gcc -funroll-loops -m32 -march=pentium4 -O2 -fomit-frame-pointer  20190810  20190803
757962  ref  gcc -m32 -march=pentium3 -O3 -fomit-frame-pointer  20190810  20190803
758923  ref  gcc -m32 -O3 -fomit-frame-pointer  20190810  20190803
761272  ref  gcc -m32 -march=prescott -O -fomit-frame-pointer  20190810  20190803
761556  ref  gcc -funroll-loops -m32 -march=k6-2 -O3 -fomit-frame-pointer  20190810  20190803
762373  ref  gcc -funroll-loops -m32 -march=k6-3 -O3 -fomit-frame-pointer  20190810  20190803
762633  ref  gcc -m32 -march=k6-3 -O3 -fomit-frame-pointer  20190810  20190803
763201  ref  gcc -funroll-loops -m32 -march=pentium4 -O -fomit-frame-pointer  20190810  20190803
765014  ref  gcc -m32 -march=pentium2 -O -fomit-frame-pointer  20190810  20190803
765893  ref  gcc -m32 -march=pentiumpro -O3 -fomit-frame-pointer  20190810  20190803
766913  ref  gcc -m32 -march=pentiumpro -O -fomit-frame-pointer  20190810  20190803
768323  ref  gcc -m32 -march=pentium3 -O -fomit-frame-pointer  20190810  20190803
768611  ref  gcc -m32 -march=pentium2 -O3 -fomit-frame-pointer  20190810  20190803
774466  ref  gcc -funroll-loops -m32 -march=k6 -O2 -fomit-frame-pointer  20190810  20190803
776282  ref  gcc -m32 -march=pentium4 -O3 -fomit-frame-pointer  20190810  20190803
776744  ref  gcc -funroll-loops -m32 -march=k6 -Os -fomit-frame-pointer  20190810  20190803
777883  ref  gcc -m32 -march=athlon -O -fomit-frame-pointer  20190810  20190803
778895  ref  gcc -m32 -march=prescott -O2 -fomit-frame-pointer  20190810  20190803
779252  ref  gcc -funroll-loops -m32 -march=k6-3 -Os -fomit-frame-pointer  20190810  20190803
781038  ref  gcc -m32 -march=k6-2 -O3 -fomit-frame-pointer  20190810  20190803
781758  ref  gcc -m32 -march=k6 -Os -fomit-frame-pointer  20190810  20190803
781827  ref  gcc -m32 -march=k6-2 -Os -fomit-frame-pointer  20190810  20190803
784655  ref  gcc -funroll-loops -m32 -march=k6-2 -Os -fomit-frame-pointer  20190810  20190803
784952  ref  gcc -m32 -march=k6-3 -Os -fomit-frame-pointer  20190810  20190803
785262  ref  gcc -m32 -march=nocona -O2 -fomit-frame-pointer  20190810  20190803
789171  ref  gcc -funroll-loops -m32 -march=i386 -Os -fomit-frame-pointer  20190810  20190803
789587  ref  gcc -m32 -march=athlon -O2 -fomit-frame-pointer  20190810  20190803
790855  ref  gcc -m32 -Os -fomit-frame-pointer  20190810  20190803
794421  ref  gcc -m32 -march=pentium3 -Os -fomit-frame-pointer  20190810  20190803
795095  ref  gcc -m32 -march=pentium2 -Os -fomit-frame-pointer  20190810  20190803
795574  ref  gcc -m32 -march=i486 -Os -fomit-frame-pointer  20190810  20190803
795692  ref  gcc -m32 -march=pentium2 -O2 -fomit-frame-pointer  20190810  20190803
796014  ref  gcc -m32 -march=i386 -Os -fomit-frame-pointer  20190810  20190803
796034  ref  gcc -m32 -march=athlon -Os -fomit-frame-pointer  20190810  20190803
796641  ref  gcc -m32 -march=pentium -Os -fomit-frame-pointer  20190810  20190803
797861  ref  gcc -funroll-loops -m32 -Os -fomit-frame-pointer  20190810  20190803
798535  ref  gcc -m32 -march=k6 -O3 -fomit-frame-pointer  20190810  20190803
798738  ref  gcc -funroll-loops -m32 -march=pentium -Os -fomit-frame-pointer  20190810  20190803
798980  ref  gcc -funroll-loops -m32 -march=athlon -Os -fomit-frame-pointer  20190810  20190803
799948  ref  gcc -funroll-loops -m32 -march=pentium-mmx -Os -fomit-frame-pointer  20190810  20190803
801326  ref  gcc -funroll-loops -m32 -march=pentium3 -Os -fomit-frame-pointer  20190810  20190803
801899  ref  gcc -funroll-loops -m32 -march=i486 -Os -fomit-frame-pointer  20190810  20190803
803093  ref  gcc -m32 -march=pentiumpro -Os -fomit-frame-pointer  20190810  20190803
803863  ref  gcc -m32 -march=pentium-mmx -Os -fomit-frame-pointer  20190810  20190803
805167  ref  gcc -m32 -O -fomit-frame-pointer  20190810  20190803
806046  ref  gcc -m32 -O2 -fomit-frame-pointer  20190810  20190803
806145  ref  gcc -funroll-loops -m32 -march=pentiumpro -Os -fomit-frame-pointer  20190810  20190803
807201  ref  gcc -funroll-loops -m32 -march=pentium2 -Os -fomit-frame-pointer  20190810  20190803
807667  ref  gcc -m32 -march=pentium3 -O2 -fomit-frame-pointer  20190810  20190803
816024  ref  gcc -m32 -march=pentiumpro -O2 -fomit-frame-pointer  20190810  20190803
819016  ref  gcc -m32 -march=k6 -O -fomit-frame-pointer  20190810  20190803
821625  ref  gcc -m32 -march=pentium4 -O2 -fomit-frame-pointer  20190810  20190803
824992  ref  gcc -m32 -march=pentium4 -O -fomit-frame-pointer  20190810  20190803
825464  ref  gcc -m32 -march=k6-2 -O -fomit-frame-pointer  20190810  20190803
827411  ref  gcc -funroll-loops -m32 -march=pentium-mmx -O -fomit-frame-pointer  20190810  20190803
829425  ref  gcc -m32 -march=k6-3 -O -fomit-frame-pointer  20190810  20190803
838978  ref  gcc -funroll-loops -m32 -march=i486 -O -fomit-frame-pointer  20190810  20190803
839380  ref  gcc -funroll-loops -m32 -march=pentium -O -fomit-frame-pointer  20190810  20190803
843718  ref  gcc -funroll-loops -m32 -march=i386 -O -fomit-frame-pointer  20190810  20190803
846844  ref  gcc -m32 -march=k6-3 -O2 -fomit-frame-pointer  20190810  20190803
847899  ref  gcc -funroll-loops -m32 -march=i386 -O3 -fomit-frame-pointer  20190810  20190803
851571  ref  gcc -funroll-loops -m32 -march=i486 -O2 -fomit-frame-pointer  20190810  20190803
854245  ref  gcc -m32 -march=k6 -O2 -fomit-frame-pointer  20190810  20190803
854373  ref  gcc -funroll-loops -m32 -march=i486 -O3 -fomit-frame-pointer  20190810  20190803
859488  ref  gcc -m32 -march=k6-2 -O2 -fomit-frame-pointer  20190810  20190803
859837  ref  gcc -funroll-loops -m32 -march=i386 -O2 -fomit-frame-pointer  20190810  20190803
884692  ref  gcc -m32 -march=i386 -O3 -fomit-frame-pointer  20190810  20190803
897863  ref  gcc -m32 -march=pentium -O -fomit-frame-pointer  20190810  20190803
900782  ref  gcc -m32 -march=i486 -O3 -fomit-frame-pointer  20190810  20190803
904030  ref  gcc -m32 -march=pentium-mmx -O -fomit-frame-pointer  20190810  20190803
905032  ref  gcc -m32 -march=i386 -O -fomit-frame-pointer  20190810  20190803
923137  ref  gcc -m32 -march=i486 -O -fomit-frame-pointer  20190810  20190803
929562  ref  gcc -m32 -march=i386 -O2 -fomit-frame-pointer  20190810  20190803
947081  ref  gcc -m32 -march=i486 -O2 -fomit-frame-pointer  20190810  20190803
952185  ref  gcc -funroll-loops -m32 -march=pentium-mmx -O3 -fomit-frame-pointer  20190810  20190803
968827  ref  gcc -funroll-loops -m32 -march=pentium -O3 -fomit-frame-pointer  20190810  20190803
983922  ref  gcc -funroll-loops -m32 -march=pentium -O2 -fomit-frame-pointer  20190810  20190803
994184  ref  gcc -m32 -march=pentium -O3 -fomit-frame-pointer  20190810  20190803
995402  ref  gcc -funroll-loops -m32 -march=pentium-mmx -O2 -fomit-frame-pointer  20190810  20190803
998839  ref  gcc -m32 -march=pentium-mmx -O3 -fomit-frame-pointer  20190810  20190803
1025795  ref  gcc -m32 -march=pentium -O2 -fomit-frame-pointer  20190810  20190803
1031358  ref  gcc -m32 -march=pentium-mmx -O2 -fomit-frame-pointer  20190810  20190803
1305036  ref  gcc -funroll-loops -m32 -march=barcelona -O3 -fomit-frame-pointer  20190810  20190803
1335768  ref  gcc -m32 -march=barcelona -O3 -fomit-frame-pointer  20190810  20190803
1352629  ref  gcc -funroll-loops -m32 -march=k8 -O2 -fomit-frame-pointer  20190810  20190803
1361556  ref  gcc -funroll-loops -m32 -march=barcelona -O2 -fomit-frame-pointer  20190810  20190803
1380278  ref  gcc -m32 -march=k8 -O3 -fomit-frame-pointer  20190810  20190803
1384811  ref  gcc -funroll-loops -m32 -march=k8 -O3 -fomit-frame-pointer  20190810  20190803
1413492  ref  gcc -m32 -march=barcelona -Os -fomit-frame-pointer  20190810  20190803
1417060  ref  gcc -funroll-loops -m32 -march=barcelona -Os -fomit-frame-pointer  20190810  20190803
1447592  ref  gcc -m32 -march=barcelona -O2 -fomit-frame-pointer  20190810  20190803
1462325  ref  gcc -m32 -march=k8 -Os -fomit-frame-pointer  20190810  20190803
1462371  ref  gcc -funroll-loops -m32 -march=k8 -Os -fomit-frame-pointer  20190810  20190803
1462604  ref  gcc -m32 -march=k8 -O2 -fomit-frame-pointer  20190810  20190803

Compiler output

Implementation: crypto_kem/kyber512/avx2
Compiler: gcc -funroll-loops -m32 -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 156, namely:
Compiler Implementations
gcc -funroll-loops -m32 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -Os -fomit-frame-pointer avx2
gcc -m32 -O2 -fomit-frame-pointer avx2
gcc -m32 -O3 -fomit-frame-pointer avx2
gcc -m32 -O -fomit-frame-pointer avx2
gcc -m32 -Os -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O2 -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O3 -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O -fomit-frame-pointer avx2
gcc -m32 -march=athlon -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -Os -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -Os -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O -fomit-frame-pointer avx2
gcc -m32 -march=i386 -Os -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O -fomit-frame-pointer avx2
gcc -m32 -march=i486 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O -fomit-frame-pointer avx2
gcc -m32 -march=k8 -Os -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O2 -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O3 -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O -fomit-frame-pointer avx2
gcc -m32 -march=nocona -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -Os -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O2 -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O3 -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O -fomit-frame-pointer avx2
gcc -m32 -march=prescott -Os -fomit-frame-pointer avx2
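
Note on the error above: GCC declares `_mm256_xor_si256` as an always_inline function that requires the AVX2 target, so any caller compiled without AVX2 enabled (none of the option sets listed here enables it) fails with "target specific option mismatch". The following is a minimal sketch of one common way to avoid that class of error, not the approach taken by the avx2 implementation itself: AVX2 is enabled per function with GCC's `target` attribute and selected at run time with `__builtin_cpu_supports`. The names `xor4`, `xor4_avx2` and `xor4_portable` are hypothetical and do not appear in KeccakP-1600-times4-SIMD256.c.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* The target attribute enables AVX2 for this one function, so the
   always_inline intrinsics can be inlined even without -mavx2. */
__attribute__((target("avx2")))
static void xor4_avx2(uint64_t *state, const uint64_t *data)
{
    __m256i s = _mm256_loadu_si256((const __m256i *)state);
    __m256i d = _mm256_loadu_si256((const __m256i *)data);
    _mm256_storeu_si256((__m256i *)state, _mm256_xor_si256(s, d));
}
#endif

/* Plain C fallback: XOR four 64-bit lanes one at a time. */
static void xor4_portable(uint64_t *state, const uint64_t *data)
{
    for (int i = 0; i < 4; i++)
        state[i] ^= data[i];
}

/* Hypothetical dispatcher: use AVX2 only when the CPU reports it. */
static void xor4(uint64_t *state, const uint64_t *data)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        xor4_avx2(state, data);
        return;
    }
#endif
    xor4_portable(state, data);
}
```

Calling `xor4(state, data)` XORs four 64-bit lanes of `data` into `state` on either path. Because both helpers take pointers rather than `__m256i` values, this sketch also avoids the vector-passing situation that the `-Wpsabi` warning in the log describes.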

Compiler output

Implementation: crypto_kem/kyber512/avx2
Compiler: gcc -m32 -march=barcelona -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -m32 -march=barcelona -O2 -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -O3 -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -O -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -Os -fomit-frame-pointer avx2

Compiler output

Implementation: crypto_kem/kyber512/avx2
Compiler: gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:142:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+2], lanes2 ),\
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler Implementations
gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -O -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -Os -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O2 -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O3 -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -Os -fomit-frame-pointer avx2

Compiler output

Implementation: crypto_kem/kyber512/avx2
Compiler: gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer
basemul.S: basemul.S: Assembler messages:
basemul.S: basemul.S:79: Error: bad register name `%rip)'
basemul.S: basemul.S:80: Error: bad register name `%rip)'
basemul.S: basemul.S:81: Error: bad register name `%rcx)'
basemul.S: basemul.S:84: Error: bad register name `%rsi)'
basemul.S: basemul.S:84: Error: bad register name `%rdx)'
basemul.S: basemul.S:84: Error: bad register name `%rsi)'
basemul.S: basemul.S:84: Error: bad register name `%rdx)'
basemul.S: basemul.S:84: Error: bad register name `%ymm8'
basemul.S: basemul.S:84: Error: bad register name `%ymm8'
basemul.S: basemul.S:84: Error: bad register name `%ymm10'
basemul.S: basemul.S:84: Error: bad register name `%ymm10'
basemul.S: basemul.S:84: Error: bad register name `%ymm9'
basemul.S: basemul.S:84: Error: bad register name `%ymm9'
basemul.S: basemul.S:84: Error: bad register name `%ymm9'
basemul.S: basemul.S:84: Error: bad register name `%ymm9'
basemul.S: basemul.S:84: Error: bad register name `%ymm11'
basemul.S: basemul.S:84: Error: bad register name `%ymm11'
basemul.S: basemul.S:84: Error: bad register name `%ymm11'
basemul.S: basemul.S:84: Error: bad register name `%ymm11'
basemul.S: basemul.S:84: Error: bad register name `%ymm11'
basemul.S: basemul.S:84: Error: bad register name `%ymm13'
basemul.S: basemul.S:84: Error: bad register name `%ymm13'
basemul.S: basemul.S:84: Error: bad register name `%ymm8'
basemul.S: basemul.S:84: Error: bad register name `%ymm8'
basemul.S: ...

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler Implementations
gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -O -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O2 -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O3 -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -Os -fomit-frame-pointer avx2
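
Note on the assembler errors above: with `-m32` the assembler runs in 32-bit mode, where the 64-bit registers (`%rip`, `%rcx`, `%rsi`, `%rdx`) do not exist and only `%ymm0` through `%ymm7` are addressable, so basemul.S cannot be assembled even though AVX2 itself is enabled by `-march=core-avx2`. As a minimal sketch, assuming a build wanted to detect this case up front, the compiler's predefined macros `__x86_64__` and `__AVX2__` can be tested together. The macro `CAN_BUILD_AVX2_ASM64` and the function `pick_implementation` are hypothetical and are not part of SUPERCOP or Kyber.

```c
/* Hypothetical feature macro: true only when the compiler targets
   64-bit x86 and has AVX2 enabled. Under -m32, __x86_64__ is not
   defined, so this evaluates to 0. */
#if defined(__x86_64__) && defined(__AVX2__)
#define CAN_BUILD_AVX2_ASM64 1
#else
#define CAN_BUILD_AVX2_ASM64 0
#endif

/* Returns the name of the implementation a build could select
   under this assumption: "avx2" or the portable "ref". */
static const char *pick_implementation(void)
{
    return CAN_BUILD_AVX2_ASM64 ? "avx2" : "ref";
}
```

With any of the `-m32` option sets listed above, `pick_implementation()` would return "ref", which matches the timing table: only ref produced measurements on this machine.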