Implementation notes: x86, samba, crypto_sign/dilithium4

Computer: samba
Architecture: x86
CPU ID: GenuineIntel-000506e3-bfebfbff
SUPERCOP version: 20190803
Operation: crypto_sign
Primitive: dilithium4
Time | Implementation | Compiler | Benchmark date | SUPERCOP version
4330048 | ref | gcc -m32 -march=corei7 -O3 -fomit-frame-pointer | 20190805 | 20190803
4330397 | ref | gcc -m32 -march=core2 -msse4 -O3 -fomit-frame-pointer | 20190805 | 20190803
4395395 | ref | gcc -funroll-loops -m32 -march=pentium4 -O3 -fomit-frame-pointer | 20190805 | 20190803
4396316 | ref | gcc -m32 -march=core2 -O3 -fomit-frame-pointer | 20190805 | 20190803
4420927 | ref | gcc -m32 -march=core-avx2 -O3 -fomit-frame-pointer | 20190805 | 20190803
4430885 | ref | gcc -m32 -march=core2 -msse4.1 -O3 -fomit-frame-pointer | 20190805 | 20190803
4465730 | ref | gcc -m32 -march=native -mtune=native -O3 -fomit-frame-pointer | 20190805 | 20190803
4480242 | ref | gcc -funroll-loops -m32 -march=pentium4 -O2 -fomit-frame-pointer | 20190805 | 20190803
4507071 | ref | gcc -m32 -march=pentium4 -O3 -fomit-frame-pointer | 20190805 | 20190803
4516356 | ref | gcc -funroll-loops -m32 -march=pentium4 -O -fomit-frame-pointer | 20190805 | 20190803
4563596 | ref | gcc -funroll-loops -m32 -march=pentium-m -O -fomit-frame-pointer | 20190805 | 20190803
4566540 | ref | gcc -funroll-loops -m32 -march=k8 -O -fomit-frame-pointer | 20190805 | 20190803
4591240 | ref | gcc -funroll-loops -m32 -march=barcelona -O -fomit-frame-pointer | 20190805 | 20190803
4594936 | ref | gcc -m32 -march=native -mtune=native -O2 -fomit-frame-pointer | 20190805 | 20190803
4620953 | ref | gcc -m32 -march=corei7-avx -O2 -fomit-frame-pointer | 20190805 | 20190803
4627351 | ref | gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer | 20190805 | 20190803
4629758 | ref | gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer | 20190805 | 20190803
4632028 | ref | gcc -m32 -march=core-avx-i -O3 -fomit-frame-pointer | 20190805 | 20190803
4637581 | ref | gcc -funroll-loops -m32 -march=athlon -O3 -fomit-frame-pointer | 20190805 | 20190803
4658782 | ref | gcc -m32 -march=corei7-avx -O3 -fomit-frame-pointer | 20190805 | 20190803
4661439 | ref | gcc -m32 -march=corei7 -O2 -fomit-frame-pointer | 20190805 | 20190803
4668486 | ref | gcc -m32 -march=core2 -msse4 -O2 -fomit-frame-pointer | 20190805 | 20190803
4677631 | ref | gcc -m32 -march=core2 -msse4.1 -O2 -fomit-frame-pointer | 20190805 | 20190803
4728563 | ref | gcc -m32 -march=athlon -O3 -fomit-frame-pointer | 20190805 | 20190803
4737594 | ref | gcc -m32 -march=core-avx2 -Os -fomit-frame-pointer | 20190805 | 20190803
4743073 | ref | gcc -m32 -march=native -mtune=native -Os -fomit-frame-pointer | 20190805 | 20190803
4761563 | ref | gcc -m32 -march=core-avx-i -Os -fomit-frame-pointer | 20190805 | 20190803
4774721 | ref | gcc -m32 -march=core2 -msse4.1 -Os -fomit-frame-pointer | 20190805 | 20190803
4782402 | ref | gcc -m32 -march=corei7-avx -Os -fomit-frame-pointer | 20190805 | 20190803
4782498 | ref | gcc -m32 -march=corei7 -Os -fomit-frame-pointer | 20190805 | 20190803
4790679 | ref | gcc -m32 -march=core2 -msse4 -Os -fomit-frame-pointer | 20190805 | 20190803
4814304 | ref | gcc -m32 -march=core2 -O2 -fomit-frame-pointer | 20190805 | 20190803
4825806 | ref | gcc -funroll-loops -m32 -march=prescott -O2 -fomit-frame-pointer | 20190805 | 20190803
4828205 | ref | gcc -funroll-loops -m32 -march=nocona -O2 -fomit-frame-pointer | 20190805 | 20190803
4843199 | ref | gcc -m32 -march=pentium4 -O2 -fomit-frame-pointer | 20190805 | 20190803
4843920 | ref | gcc -funroll-loops -m32 -march=prescott -O3 -fomit-frame-pointer | 20190805 | 20190803
4844044 | ref | gcc -funroll-loops -m32 -O3 -fomit-frame-pointer | 20190805 | 20190803
4845361 | ref | gcc -funroll-loops -m32 -march=athlon -O2 -fomit-frame-pointer | 20190805 | 20190803
4855164 | ref | gcc -funroll-loops -m32 -march=nocona -O -fomit-frame-pointer | 20190805 | 20190803
4864345 | ref | gcc -funroll-loops -m32 -march=pentium2 -O -fomit-frame-pointer | 20190805 | 20190803
4870891 | ref | gcc -funroll-loops -m32 -march=k6-2 -O3 -fomit-frame-pointer | 20190805 | 20190803
4874939 | ref | gcc -funroll-loops -m32 -march=pentiumpro -O -fomit-frame-pointer | 20190805 | 20190803
4876740 | ref | gcc -m32 -march=prescott -Os -fomit-frame-pointer | 20190805 | 20190803
4881667 | ref | gcc -m32 -march=nocona -Os -fomit-frame-pointer | 20190805 | 20190803
4885573 | ref | gcc -funroll-loops -m32 -march=k6 -O3 -fomit-frame-pointer | 20190805 | 20190803
4885665 | ref | gcc -funroll-loops -m32 -O2 -fomit-frame-pointer | 20190805 | 20190803
4887081 | ref | gcc -m32 -march=prescott -O3 -fomit-frame-pointer | 20190805 | 20190803
4891076 | ref | gcc -funroll-loops -m32 -march=k6-3 -O3 -fomit-frame-pointer | 20190805 | 20190803
4898046 | ref | gcc -funroll-loops -m32 -march=prescott -O -fomit-frame-pointer | 20190805 | 20190803
4899665 | ref | gcc -m32 -march=pentium4 -Os -fomit-frame-pointer | 20190805 | 20190803
4900170 | ref | gcc -funroll-loops -m32 -march=k6-3 -O2 -fomit-frame-pointer | 20190805 | 20190803
4904930 | ref | gcc -m32 -march=core2 -Os -fomit-frame-pointer | 20190805 | 20190803
4910556 | ref | gcc -funroll-loops -m32 -march=nocona -O3 -fomit-frame-pointer | 20190805 | 20190803
4919577 | ref | gcc -funroll-loops -m32 -march=nocona -Os -fomit-frame-pointer | 20190805 | 20190803
4927429 | ref | gcc -funroll-loops -m32 -march=pentium3 -O -fomit-frame-pointer | 20190805 | 20190803
4928647 | ref | gcc -funroll-loops -m32 -march=athlon -O -fomit-frame-pointer | 20190805 | 20190803
4935074 | ref | gcc -funroll-loops -m32 -march=pentium4 -Os -fomit-frame-pointer | 20190805 | 20190803
4935964 | ref | gcc -funroll-loops -m32 -march=k6 -O2 -fomit-frame-pointer | 20190805 | 20190803
4943861 | ref | gcc -funroll-loops -m32 -march=prescott -Os -fomit-frame-pointer | 20190805 | 20190803
4945774 | ref | gcc -m32 -march=nocona -O3 -fomit-frame-pointer | 20190805 | 20190803
4961718 | ref | gcc -funroll-loops -m32 -march=pentium3 -O3 -fomit-frame-pointer | 20190805 | 20190803
4966661 | ref | gcc -funroll-loops -m32 -march=pentium-m -Os -fomit-frame-pointer | 20190805 | 20190803
4976879 | ref | gcc -m32 -march=k6-3 -O3 -fomit-frame-pointer | 20190805 | 20190803
4987417 | ref | gcc -funroll-loops -m32 -march=pentium2 -O2 -fomit-frame-pointer | 20190805 | 20190803
4988798 | ref | gcc -m32 -march=pentium-m -Os -fomit-frame-pointer | 20190805 | 20190803
4988949 | ref | gcc -funroll-loops -m32 -march=k6-2 -O2 -fomit-frame-pointer | 20190805 | 20190803
4989070 | ref | gcc -funroll-loops -m32 -march=pentium2 -O3 -fomit-frame-pointer | 20190805 | 20190803
4989968 | ref | gcc -m32 -march=k6-2 -O3 -fomit-frame-pointer | 20190805 | 20190803
4994683 | ref | gcc -funroll-loops -m32 -march=pentiumpro -O2 -fomit-frame-pointer | 20190805 | 20190803
5016170 | ref | gcc -funroll-loops -m32 -O -fomit-frame-pointer | 20190805 | 20190803
5021993 | ref | gcc -funroll-loops -m32 -march=pentium3 -O2 -fomit-frame-pointer | 20190805 | 20190803
5025066 | ref | gcc -m32 -march=core-avx2 -O -fomit-frame-pointer | 20190805 | 20190803
5035851 | ref | gcc -m32 -O3 -fomit-frame-pointer | 20190805 | 20190803
5037635 | ref | gcc -funroll-loops -m32 -march=pentiumpro -O3 -fomit-frame-pointer | 20190805 | 20190803
5050925 | ref | gcc -funroll-loops -m32 -march=k6-3 -O -fomit-frame-pointer | 20190805 | 20190803
5062545 | ref | gcc -m32 -march=pentium-m -O -fomit-frame-pointer | 20190805 | 20190803
5080234 | ref | gcc -funroll-loops -m32 -march=k6 -O -fomit-frame-pointer | 20190805 | 20190803
5083536 | ref | gcc -funroll-loops -m32 -march=k6-2 -O -fomit-frame-pointer | 20190805 | 20190803
5083953 | ref | gcc -m32 -march=pentium3 -O3 -fomit-frame-pointer | 20190805 | 20190803
5094405 | ref | gcc -m32 -march=native -mtune=native -O -fomit-frame-pointer | 20190805 | 20190803
5111554 | ref | gcc -m32 -march=k8 -O -fomit-frame-pointer | 20190805 | 20190803
5111845 | ref | gcc -m32 -march=k6 -O3 -fomit-frame-pointer | 20190805 | 20190803
5111871 | ref | gcc -m32 -march=pentium2 -O3 -fomit-frame-pointer | 20190805 | 20190803
5127990 | ref | gcc -m32 -march=pentium4 -O -fomit-frame-pointer | 20190805 | 20190803
5205786 | ref | gcc -m32 -march=pentium2 -O -fomit-frame-pointer | 20190805 | 20190803
5211940 | ref | gcc -m32 -march=pentium3 -O -fomit-frame-pointer | 20190805 | 20190803
5221909 | ref | gcc -m32 -march=pentiumpro -O -fomit-frame-pointer | 20190805 | 20190803
5223394 | ref | gcc -m32 -march=pentiumpro -O3 -fomit-frame-pointer | 20190805 | 20190803
5224530 | ref | gcc -m32 -march=barcelona -O -fomit-frame-pointer | 20190805 | 20190803
5241707 | ref | gcc -m32 -march=nocona -O2 -fomit-frame-pointer | 20190805 | 20190803
5242518 | ref | gcc -m32 -march=core2 -msse4.1 -O -fomit-frame-pointer | 20190805 | 20190803
5245561 | ref | gcc -m32 -march=core2 -msse4 -O -fomit-frame-pointer | 20190805 | 20190803
5246072 | ref | gcc -m32 -march=corei7-avx -O -fomit-frame-pointer | 20190805 | 20190803
5250175 | ref | gcc -m32 -march=prescott -O2 -fomit-frame-pointer | 20190805 | 20190803
5251413 | ref | gcc -m32 -march=core-avx-i -O -fomit-frame-pointer | 20190805 | 20190803
5258608 | ref | gcc -m32 -march=corei7 -O -fomit-frame-pointer | 20190805 | 20190803
5260274 | ref | gcc -m32 -march=core2 -O -fomit-frame-pointer | 20190805 | 20190803
5272940 | ref | gcc -m32 -march=athlon -O2 -fomit-frame-pointer | 20190805 | 20190803
5317798 | ref | gcc -m32 -march=k6-3 -Os -fomit-frame-pointer | 20190805 | 20190803
5319590 | ref | gcc -m32 -march=k6 -Os -fomit-frame-pointer | 20190805 | 20190803
5320963 | ref | gcc -m32 -march=k6-2 -Os -fomit-frame-pointer | 20190805 | 20190803
5327402 | ref | gcc -funroll-loops -m32 -march=k6 -Os -fomit-frame-pointer | 20190805 | 20190803
5327563 | ref | gcc -funroll-loops -m32 -march=k6-2 -Os -fomit-frame-pointer | 20190805 | 20190803
5370155 | ref | gcc -m32 -march=nocona -O -fomit-frame-pointer | 20190805 | 20190803
5378861 | ref | gcc -m32 -march=pentium -Os -fomit-frame-pointer | 20190805 | 20190803
5379292 | ref | gcc -funroll-loops -m32 -march=k6-3 -Os -fomit-frame-pointer | 20190805 | 20190803
5380330 | ref | gcc -m32 -march=pentium-mmx -Os -fomit-frame-pointer | 20190805 | 20190803
5395227 | ref | gcc -funroll-loops -m32 -Os -fomit-frame-pointer | 20190805 | 20190803
5397728 | ref | gcc -m32 -march=prescott -O -fomit-frame-pointer | 20190805 | 20190803
5407740 | ref | gcc -m32 -march=i486 -Os -fomit-frame-pointer | 20190805 | 20190803
5411924 | ref | gcc -m32 -march=i386 -Os -fomit-frame-pointer | 20190805 | 20190803
5415938 | ref | gcc -m32 -march=athlon -O -fomit-frame-pointer | 20190805 | 20190803
5444250 | ref | gcc -m32 -march=athlon -Os -fomit-frame-pointer | 20190805 | 20190803
5446181 | ref | gcc -funroll-loops -m32 -march=athlon -Os -fomit-frame-pointer | 20190805 | 20190803
5449554 | ref | gcc -m32 -Os -fomit-frame-pointer | 20190805 | 20190803
5450629 | ref | gcc -funroll-loops -m32 -march=pentium -Os -fomit-frame-pointer | 20190805 | 20190803
5465551 | ref | gcc -funroll-loops -m32 -march=i386 -Os -fomit-frame-pointer | 20190805 | 20190803
5471103 | ref | gcc -funroll-loops -m32 -march=i486 -Os -fomit-frame-pointer | 20190805 | 20190803
5473365 | ref | gcc -m32 -march=pentiumpro -O2 -fomit-frame-pointer | 20190805 | 20190803
5485905 | ref | gcc -m32 -march=pentium2 -O2 -fomit-frame-pointer | 20190805 | 20190803
5494169 | ref | gcc -funroll-loops -m32 -march=pentium-mmx -Os -fomit-frame-pointer | 20190805 | 20190803
5496595 | ref | gcc -funroll-loops -m32 -march=i486 -O2 -fomit-frame-pointer | 20190805 | 20190803
5500470 | ref | gcc -m32 -march=pentiumpro -Os -fomit-frame-pointer | 20190805 | 20190803
5503821 | ref | gcc -m32 -march=k6-2 -O2 -fomit-frame-pointer | 20190805 | 20190803
5505000 | ref | gcc -m32 -march=k6-3 -O2 -fomit-frame-pointer | 20190805 | 20190803
5505098 | ref | gcc -m32 -march=pentium2 -Os -fomit-frame-pointer | 20190805 | 20190803
5507197 | ref | gcc -m32 -march=pentium3 -Os -fomit-frame-pointer | 20190805 | 20190803
5510677 | ref | gcc -m32 -march=k6 -O2 -fomit-frame-pointer | 20190805 | 20190803
5530465 | ref | gcc -funroll-loops -m32 -march=i486 -O3 -fomit-frame-pointer | 20190805 | 20190803
5531127 | ref | gcc -funroll-loops -m32 -march=pentium2 -Os -fomit-frame-pointer | 20190805 | 20190803
5534204 | ref | gcc -funroll-loops -m32 -march=pentium3 -Os -fomit-frame-pointer | 20190805 | 20190803
5535319 | ref | gcc -m32 -march=pentium3 -O2 -fomit-frame-pointer | 20190805 | 20190803
5547673 | ref | gcc -funroll-loops -m32 -march=pentiumpro -Os -fomit-frame-pointer | 20190805 | 20190803
5558081 | ref | gcc -m32 -march=k6-2 -O -fomit-frame-pointer | 20190805 | 20190803
5575342 | ref | gcc -m32 -march=k6-3 -O -fomit-frame-pointer | 20190805 | 20190803
5575901 | ref | gcc -m32 -O2 -fomit-frame-pointer | 20190805 | 20190803
5619246 | ref | gcc -m32 -march=k6 -O -fomit-frame-pointer | 20190805 | 20190803
5631214 | ref | gcc -funroll-loops -m32 -march=pentium-mmx -O2 -fomit-frame-pointer | 20190805 | 20190803
5658351 | ref | gcc -m32 -O -fomit-frame-pointer | 20190805 | 20190803
5681405 | ref | gcc -funroll-loops -m32 -march=pentium-mmx -O -fomit-frame-pointer | 20190805 | 20190803
5684625 | ref | gcc -funroll-loops -m32 -march=pentium -O3 -fomit-frame-pointer | 20190805 | 20190803
5707769 | ref | gcc -funroll-loops -m32 -march=pentium -O -fomit-frame-pointer | 20190805 | 20190803
5712796 | ref | gcc -funroll-loops -m32 -march=pentium-mmx -O3 -fomit-frame-pointer | 20190805 | 20190803
5754323 | ref | gcc -funroll-loops -m32 -march=pentium -O2 -fomit-frame-pointer | 20190805 | 20190803
5792514 | ref | gcc -m32 -march=i486 -O3 -fomit-frame-pointer | 20190805 | 20190803
5794502 | ref | gcc -funroll-loops -m32 -march=i486 -O -fomit-frame-pointer | 20190805 | 20190803
5985149 | ref | gcc -m32 -march=pentium -O3 -fomit-frame-pointer | 20190805 | 20190803
5985187 | ref | gcc -funroll-loops -m32 -march=i386 -O -fomit-frame-pointer | 20190805 | 20190803
6004451 | ref | gcc -m32 -march=pentium-mmx -O3 -fomit-frame-pointer | 20190805 | 20190803
6053439 | ref | gcc -m32 -march=i486 -O2 -fomit-frame-pointer | 20190805 | 20190803
6053871 | ref | gcc -funroll-loops -m32 -march=i386 -O3 -fomit-frame-pointer | 20190805 | 20190803
6166464 | ref | gcc -m32 -march=i486 -O -fomit-frame-pointer | 20190805 | 20190803
6181603 | ref | gcc -funroll-loops -m32 -march=i386 -O2 -fomit-frame-pointer | 20190805 | 20190803
6229845 | ref | gcc -m32 -march=pentium -O -fomit-frame-pointer | 20190805 | 20190803
6245034 | ref | gcc -m32 -march=pentium-mmx -O -fomit-frame-pointer | 20190805 | 20190803
6343760 | ref | gcc -m32 -march=i386 -O -fomit-frame-pointer | 20190805 | 20190803
6389473 | ref | gcc -m32 -march=i386 -O3 -fomit-frame-pointer | 20190805 | 20190803
6429000 | ref | gcc -m32 -march=pentium-mmx -O2 -fomit-frame-pointer | 20190805 | 20190803
6489974 | ref | gcc -m32 -march=pentium -O2 -fomit-frame-pointer | 20190805 | 20190803
6744502 | ref | gcc -m32 -march=i386 -O2 -fomit-frame-pointer | 20190805 | 20190803
8614752 | ref | gcc -funroll-loops -m32 -march=pentium-m -O3 -fomit-frame-pointer | 20190805 | 20190803
8690333 | ref | gcc -funroll-loops -m32 -march=pentium-m -O2 -fomit-frame-pointer | 20190805 | 20190803
8744087 | ref | gcc -m32 -march=pentium-m -O3 -fomit-frame-pointer | 20190805 | 20190803
8962924 | ref | gcc -funroll-loops -m32 -march=barcelona -O3 -fomit-frame-pointer | 20190805 | 20190803
8972936 | ref | gcc -m32 -march=pentium-m -O2 -fomit-frame-pointer | 20190805 | 20190803
9057254 | ref | gcc -funroll-loops -m32 -march=barcelona -O2 -fomit-frame-pointer | 20190805 | 20190803
9121405 | ref | gcc -m32 -march=barcelona -O3 -fomit-frame-pointer | 20190805 | 20190803
9148125 | ref | gcc -funroll-loops -m32 -march=k8 -O2 -fomit-frame-pointer | 20190805 | 20190803
9338434 | ref | gcc -m32 -march=barcelona -Os -fomit-frame-pointer | 20190805 | 20190803
9434455 | ref | gcc -funroll-loops -m32 -march=barcelona -Os -fomit-frame-pointer | 20190805 | 20190803
9565841 | ref | gcc -m32 -march=k8 -O3 -fomit-frame-pointer | 20190805 | 20190803
9615068 | ref | gcc -funroll-loops -m32 -march=k8 -O3 -fomit-frame-pointer | 20190805 | 20190803
9685766 | ref | gcc -m32 -march=barcelona -O2 -fomit-frame-pointer | 20190805 | 20190803
9702351 | ref | gcc -funroll-loops -m32 -march=k8 -Os -fomit-frame-pointer | 20190805 | 20190803
9785339 | ref | gcc -m32 -march=k8 -Os -fomit-frame-pointer | 20190805 | 20190803
9846774 | ref | gcc -m32 -march=k8 -O2 -fomit-frame-pointer | 20190805 | 20190803

Compiler output

Implementation: crypto_sign/dilithium4/avx2
Compiler: gcc -funroll-loops -m32 -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 156, namely:
Compiler Implementations
gcc -funroll-loops -m32 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -Os -fomit-frame-pointer avx2
gcc -m32 -O2 -fomit-frame-pointer avx2
gcc -m32 -O3 -fomit-frame-pointer avx2
gcc -m32 -O -fomit-frame-pointer avx2
gcc -m32 -Os -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O2 -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O3 -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O -fomit-frame-pointer avx2
gcc -m32 -march=athlon -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -Os -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -Os -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O -fomit-frame-pointer avx2
gcc -m32 -march=i386 -Os -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O -fomit-frame-pointer avx2
gcc -m32 -march=i486 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O -fomit-frame-pointer avx2
gcc -m32 -march=k8 -Os -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O2 -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O3 -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O -fomit-frame-pointer avx2
gcc -m32 -march=nocona -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -Os -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O2 -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O3 -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O -fomit-frame-pointer avx2
gcc -m32 -march=prescott -Os -fomit-frame-pointer avx2
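
The "target specific option mismatch" error in the compiler output above is what GCC reports when an always_inline AVX2 intrinsic such as `_mm256_xor_si256` is called from code compiled without AVX2 enabled, which is the case for every flag set in this list. A minimal sketch of one common way to guard such code follows. It assumes a hypothetical helper `xor_lanes` that is not part of the dilithium4 or Keccak sources, and it relies on GCC defining `__AVX2__` only when AVX2 code generation is on (for example with `-mavx2` or `-march=core-avx2`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper (illustration only): XOR n 64-bit lanes of data into state. */
static void xor_lanes(uint64_t *state, const uint64_t *data, size_t n)
{
    size_t i = 0;

    assert(state != NULL && data != NULL);

#if defined(__AVX2__)
    /* Compiled only when the compiler targets AVX2, so the intrinsic can be inlined. */
    for (; i + 4 <= n; i += 4) {
        __m256i s = _mm256_loadu_si256((const __m256i *)(state + i));
        __m256i d = _mm256_loadu_si256((const __m256i *)(data + i));
        _mm256_storeu_si256((__m256i *)(state + i), _mm256_xor_si256(s, d));
    }
#endif

    /* Portable path: handles the tail, or every lane when AVX2 is not enabled. */
    for (; i < n; i++)
        state[i] ^= data[i];
}
```

Built with any of the flag sets listed above, this sketch would take the portable path only. The avx2 implementation in this report has no such fallback, so it fails to compile instead.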

Compiler output

Implementation: crypto_sign/dilithium4/avx2
Compiler: gcc -m32 -march=barcelona -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -m32 -march=barcelona -O2 -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -O3 -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -O -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -Os -fomit-frame-pointer avx2

Compiler output

Implementation: crypto_sign/dilithium4/avx2
Compiler: gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:142:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+2], lanes2 ),\
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler Implementations
gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -O -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -Os -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O2 -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O3 -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -Os -fomit-frame-pointer avx2

Compiler output

Implementation: crypto_sign/dilithium4/avx2
Compiler: gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer
invntt.s: invntt.s: Assembler messages:
invntt.s: invntt.s:47: Error: bad register name `%rip)'
invntt.s: invntt.s:48: Error: bad register name `%rip)'
invntt.s: invntt.s:49: Error: bad register name `%rip)'
invntt.s: invntt.s:52: Error: bad register name `%rsi)'
invntt.s: invntt.s:53: Error: bad register name `%rsi)'
invntt.s: invntt.s:54: Error: bad register name `%rsi)'
invntt.s: invntt.s:55: Error: bad register name `%rsi)'
invntt.s: invntt.s:58: Error: bad register name `%ymm8'
invntt.s: invntt.s:59: Error: bad register name `%ymm10'
invntt.s: invntt.s:59: Error: bad register name `%ymm10'
invntt.s: invntt.s:61: Error: bad register name `%ymm8'
invntt.s: invntt.s:61: Error: bad register name `%ymm8'
invntt.s: invntt.s:62: Error: bad register name `%ymm10'
invntt.s: invntt.s:62: Error: bad register name `%ymm10'
invntt.s: invntt.s:66: Error: bad register name `%ymm8'
invntt.s: invntt.s:67: Error: bad register name `%ymm10'
invntt.s: invntt.s:70: Error: bad register name `%rdx)'
invntt.s: invntt.s:71: Error: bad register name `%rdx)'
invntt.s: invntt.s:72: Error: bad register name `%ymm12'
invntt.s: invntt.s:73: Error: bad register name `%ymm13'
invntt.s: invntt.s:74: Error: bad register name `%ymm8'
invntt.s: invntt.s:76: Error: bad register name `%ymm12'
invntt.s: invntt.s:77: Error: bad register name `%ymm13'
invntt.s: invntt.s:78: Error: bad register name `%ymm9'
invntt.s: ...

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler Implementations
gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -O -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O2 -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O3 -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -Os -fomit-frame-pointer avx2
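
The assembler messages above are consistent with invntt.s being written for 64-bit mode: `%rip`-relative addressing, `%rsi` and `%rdx`, and `%ymm8` and higher exist only in 64-bit mode, so assembling with `-m32` fails even when `-march=core-avx2` enables AVX2. The sketch below shows how a build could check both conditions before choosing an implementation. The names `select_impl` and `select_impl_for_this_build` are hypothetical and not part of SUPERCOP, which compiles every implementation with every compiler and records the failures, as this page shows.

```c
#include <assert.h>

/* Hypothetical labels for the two implementations named in this report. */
enum impl { IMPL_REF, IMPL_AVX2 };

/* Pick avx2 only when the target is both 64-bit x86 and AVX2-capable. */
static enum impl select_impl(int is_x86_64, int has_avx2)
{
    assert(is_x86_64 == 0 || is_x86_64 == 1);
    assert(has_avx2 == 0 || has_avx2 == 1);
    return (is_x86_64 && has_avx2) ? IMPL_AVX2 : IMPL_REF;
}

/* Derive the two flags from macros that GCC predefines for the current target. */
static enum impl select_impl_for_this_build(void)
{
#if defined(__x86_64__)
    const int is_x86_64 = 1;   /* not defined under -m32 */
#else
    const int is_x86_64 = 0;
#endif
#if defined(__AVX2__)
    const int has_avx2 = 1;    /* defined under -mavx2 or -march=core-avx2 */
#else
    const int has_avx2 = 0;
#endif
    return select_impl(is_x86_64, has_avx2);
}
```

Under `gcc -m32 -march=core-avx2`, `__AVX2__` is defined but `__x86_64__` is not, so this selector would return `IMPL_REF`, matching the table at the top of this page, where only ref produced timings.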