Implementation notes: x86, titan0, crypto_sign/dilithium4

Computer: titan0
Architecture: x86
CPU ID: GenuineIntel-000306c3-bfebfbff
SUPERCOP version: 20190803
Operation: crypto_sign
Primitive: dilithium4
Time  Implementation  Compiler  Benchmark date  SUPERCOP version
4850792  ref  gcc -m32 -march=corei7 -O3 -fomit-frame-pointer  20190805  20190803
4881272  ref  gcc -funroll-loops -m32 -march=pentium4 -O3 -fomit-frame-pointer  20190805  20190803
4883672  ref  gcc -m32 -march=core2 -O3 -fomit-frame-pointer  20190805  20190803
4884532  ref  gcc -m32 -march=core2 -msse4.1 -O3 -fomit-frame-pointer  20190805  20190803
4897972  ref  gcc -m32 -march=core2 -msse4 -O3 -fomit-frame-pointer  20190805  20190803
4968764  ref  gcc -funroll-loops -m32 -march=pentium4 -O2 -fomit-frame-pointer  20190805  20190803
4982692  ref  gcc -m32 -march=core-avx2 -O3 -fomit-frame-pointer  20190805  20190803
5060052  ref  gcc -m32 -march=pentium4 -O3 -fomit-frame-pointer  20190805  20190803
5066624  ref  gcc -funroll-loops -m32 -march=pentium4 -O -fomit-frame-pointer  20190805  20190803
5078592  ref  gcc -m32 -march=native -mtune=native -O3 -fomit-frame-pointer  20190805  20190803
5132176  ref  gcc -funroll-loops -m32 -march=pentium-m -O -fomit-frame-pointer  20190805  20190803
5145644  ref  gcc -funroll-loops -m32 -march=k8 -O -fomit-frame-pointer  20190805  20190803
5146176  ref  gcc -funroll-loops -m32 -march=athlon -O3 -fomit-frame-pointer  20190805  20190803
5183496  ref  gcc -funroll-loops -m32 -march=barcelona -O -fomit-frame-pointer  20190805  20190803
5204708  ref  gcc -m32 -march=athlon -O3 -fomit-frame-pointer  20190805  20190803
5228372  ref  gcc -m32 -march=core-avx-i -O3 -fomit-frame-pointer  20190805  20190803
5232836  ref  gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer  20190805  20190803
5235260  ref  gcc -m32 -march=native -mtune=native -O2 -fomit-frame-pointer  20190805  20190803
5264244  ref  gcc -m32 -march=corei7-avx -O3 -fomit-frame-pointer  20190805  20190803
5274180  ref  gcc -funroll-loops -m32 -march=athlon -O2 -fomit-frame-pointer  20190805  20190803
5314536  ref  gcc -funroll-loops -m32 -O3 -fomit-frame-pointer  20190805  20190803
5335048  ref  gcc -m32 -march=corei7-avx -O2 -fomit-frame-pointer  20190805  20190803
5348052  ref  gcc -m32 -march=core2 -msse4.1 -O2 -fomit-frame-pointer  20190805  20190803
5356628  ref  gcc -m32 -march=corei7 -O2 -fomit-frame-pointer  20190805  20190803
5363972  ref  gcc -m32 -march=core2 -msse4 -O2 -fomit-frame-pointer  20190805  20190803
5367104  ref  gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer  20190805  20190803
5407808  ref  gcc -m32 -march=k6-3 -O3 -fomit-frame-pointer  20190805  20190803
5421132  ref  gcc -funroll-loops -m32 -march=k6 -O3 -fomit-frame-pointer  20190805  20190803
5425940  ref  gcc -m32 -march=k6-2 -O3 -fomit-frame-pointer  20190805  20190803
5442432  ref  gcc -m32 -march=core-avx2 -Os -fomit-frame-pointer  20190805  20190803
5443008  ref  gcc -funroll-loops -m32 -march=prescott -O2 -fomit-frame-pointer  20190805  20190803
5451892  ref  gcc -funroll-loops -m32 -march=nocona -O2 -fomit-frame-pointer  20190805  20190803
5453316  ref  gcc -m32 -march=native -mtune=native -Os -fomit-frame-pointer  20190805  20190803
5453552  ref  gcc -funroll-loops -m32 -march=pentium2 -O -fomit-frame-pointer  20190805  20190803
5458820  ref  gcc -funroll-loops -m32 -march=pentiumpro -O -fomit-frame-pointer  20190805  20190803
5459668  ref  gcc -funroll-loops -m32 -O2 -fomit-frame-pointer  20190805  20190803
5459828  ref  gcc -funroll-loops -m32 -march=pentium3 -O -fomit-frame-pointer  20190805  20190803
5463044  ref  gcc -funroll-loops -m32 -march=athlon -O -fomit-frame-pointer  20190805  20190803
5463648  ref  gcc -funroll-loops -m32 -march=k6-2 -O3 -fomit-frame-pointer  20190805  20190803
5469244  ref  gcc -m32 -march=pentium4 -O2 -fomit-frame-pointer  20190805  20190803
5469956  ref  gcc -funroll-loops -m32 -march=pentiumpro -O3 -fomit-frame-pointer  20190805  20190803
5474308  ref  gcc -funroll-loops -m32 -march=pentium2 -O3 -fomit-frame-pointer  20190805  20190803
5477496  ref  gcc -funroll-loops -m32 -march=prescott -O3 -fomit-frame-pointer  20190805  20190803
5481388  ref  gcc -m32 -march=core2 -O2 -fomit-frame-pointer  20190805  20190803
5486384  ref  gcc -m32 -O3 -fomit-frame-pointer  20190805  20190803
5488640  ref  gcc -funroll-loops -m32 -march=k6-3 -O3 -fomit-frame-pointer  20190805  20190803
5489460  ref  gcc -funroll-loops -m32 -march=pentium3 -O3 -fomit-frame-pointer  20190805  20190803
5494872  ref  gcc -funroll-loops -m32 -march=k6-3 -O2 -fomit-frame-pointer  20190805  20190803
5501028  ref  gcc -m32 -march=corei7-avx -Os -fomit-frame-pointer  20190805  20190803
5515744  ref  gcc -funroll-loops -m32 -march=k6 -O2 -fomit-frame-pointer  20190805  20190803
5522204  ref  gcc -funroll-loops -m32 -march=k6-2 -O2 -fomit-frame-pointer  20190805  20190803
5528224  ref  gcc -funroll-loops -m32 -march=pentiumpro -O2 -fomit-frame-pointer  20190805  20190803
5529436  ref  gcc -funroll-loops -m32 -march=pentium3 -O2 -fomit-frame-pointer  20190805  20190803
5530200  ref  gcc -funroll-loops -m32 -march=nocona -O3 -fomit-frame-pointer  20190805  20190803
5538180  ref  gcc -m32 -march=nocona -O3 -fomit-frame-pointer  20190805  20190803
5542220  ref  gcc -m32 -march=prescott -O3 -fomit-frame-pointer  20190805  20190803
5543904  ref  gcc -funroll-loops -m32 -march=pentium2 -O2 -fomit-frame-pointer  20190805  20190803
5548780  ref  gcc -m32 -march=core-avx-i -Os -fomit-frame-pointer  20190805  20190803
5574748  ref  gcc -m32 -march=pentiumpro -O3 -fomit-frame-pointer  20190805  20190803
5577684  ref  gcc -m32 -march=pentium3 -O3 -fomit-frame-pointer  20190805  20190803
5586720  ref  gcc -m32 -march=pentium2 -O3 -fomit-frame-pointer  20190805  20190803
5596540  ref  gcc -funroll-loops -m32 -O -fomit-frame-pointer  20190805  20190803
5623688  ref  gcc -m32 -march=pentium-m -O -fomit-frame-pointer  20190805  20190803
5627176  ref  gcc -funroll-loops -m32 -march=nocona -O -fomit-frame-pointer  20190805  20190803
5633180  ref  gcc -m32 -march=pentium4 -O -fomit-frame-pointer  20190805  20190803
5639324  ref  gcc -funroll-loops -m32 -march=prescott -O -fomit-frame-pointer  20190805  20190803
5646636  ref  gcc -m32 -march=native -mtune=native -O -fomit-frame-pointer  20190805  20190803
5657092  ref  gcc -funroll-loops -m32 -march=k6-3 -O -fomit-frame-pointer  20190805  20190803
5672504  ref  gcc -m32 -march=core-avx2 -O -fomit-frame-pointer  20190805  20190803
5681588  ref  gcc -m32 -march=k6 -O3 -fomit-frame-pointer  20190805  20190803
5684108  ref  gcc -m32 -march=k8 -O -fomit-frame-pointer  20190805  20190803
5686236  ref  gcc -funroll-loops -m32 -march=k6-2 -O -fomit-frame-pointer  20190805  20190803
5698884  ref  gcc -funroll-loops -m32 -march=k6 -O -fomit-frame-pointer  20190805  20190803
5870432  ref  gcc -m32 -march=pentium3 -O -fomit-frame-pointer  20190805  20190803
5887424  ref  gcc -m32 -march=pentiumpro -O -fomit-frame-pointer  20190805  20190803
5895536  ref  gcc -m32 -march=pentium2 -O -fomit-frame-pointer  20190805  20190803
5909404  ref  gcc -m32 -march=core-avx-i -O -fomit-frame-pointer  20190805  20190803
5911428  ref  gcc -m32 -march=core2 -O -fomit-frame-pointer  20190805  20190803
5911492  ref  gcc -m32 -march=corei7-avx -O -fomit-frame-pointer  20190805  20190803
5914100  ref  gcc -m32 -march=corei7 -O -fomit-frame-pointer  20190805  20190803
5930828  ref  gcc -m32 -march=core2 -msse4.1 -O -fomit-frame-pointer  20190805  20190803
5957596  ref  gcc -m32 -march=core2 -msse4 -O -fomit-frame-pointer  20190805  20190803
5961328  ref  gcc -m32 -march=athlon -O2 -fomit-frame-pointer  20190805  20190803
5964812  ref  gcc -m32 -march=barcelona -O -fomit-frame-pointer  20190805  20190803
5993096  ref  gcc -m32 -march=athlon -O -fomit-frame-pointer  20190805  20190803
6015440  ref  gcc -m32 -march=pentiumpro -O2 -fomit-frame-pointer  20190805  20190803
6021412  ref  gcc -m32 -march=prescott -O2 -fomit-frame-pointer  20190805  20190803
6023200  ref  gcc -m32 -march=nocona -O2 -fomit-frame-pointer  20190805  20190803
6037484  ref  gcc -m32 -march=pentium3 -O2 -fomit-frame-pointer  20190805  20190803
6042044  ref  gcc -funroll-loops -m32 -march=k6 -Os -fomit-frame-pointer  20190805  20190803
6049432  ref  gcc -funroll-loops -m32 -march=k6-2 -Os -fomit-frame-pointer  20190805  20190803
6061568  ref  gcc -m32 -march=pentium2 -O2 -fomit-frame-pointer  20190805  20190803
6078468  ref  gcc -funroll-loops -m32 -march=k6-3 -Os -fomit-frame-pointer  20190805  20190803
6113052  ref  gcc -m32 -march=k6 -Os -fomit-frame-pointer  20190805  20190803
6113144  ref  gcc -m32 -march=core2 -msse4 -Os -fomit-frame-pointer  20190805  20190803
6115664  ref  gcc -m32 -march=k6-2 -Os -fomit-frame-pointer  20190805  20190803
6116108  ref  gcc -m32 -march=corei7 -Os -fomit-frame-pointer  20190805  20190803
6116140  ref  gcc -m32 -march=k6-3 -Os -fomit-frame-pointer  20190805  20190803
6126112  ref  gcc -m32 -march=core2 -Os -fomit-frame-pointer  20190805  20190803
6128100  ref  gcc -m32 -march=core2 -msse4.1 -Os -fomit-frame-pointer  20190805  20190803
6137100  ref  gcc -m32 -march=pentium -Os -fomit-frame-pointer  20190805  20190803
6150160  ref  gcc -funroll-loops -m32 -march=athlon -Os -fomit-frame-pointer  20190805  20190803
6150972  ref  gcc -m32 -march=k6-3 -O2 -fomit-frame-pointer  20190805  20190803
6162224  ref  gcc -m32 -march=pentium-mmx -Os -fomit-frame-pointer  20190805  20190803
6165532  ref  gcc -funroll-loops -m32 -Os -fomit-frame-pointer  20190805  20190803
6166940  ref  gcc -m32 -march=pentium4 -Os -fomit-frame-pointer  20190805  20190803
6170076  ref  gcc -m32 -march=prescott -Os -fomit-frame-pointer  20190805  20190803
6172540  ref  gcc -m32 -march=k6 -O2 -fomit-frame-pointer  20190805  20190803
6173904  ref  gcc -m32 -march=nocona -Os -fomit-frame-pointer  20190805  20190803
6177520  ref  gcc -funroll-loops -m32 -march=pentium -Os -fomit-frame-pointer  20190805  20190803
6178480  ref  gcc -funroll-loops -m32 -march=pentium-m -Os -fomit-frame-pointer  20190805  20190803
6183008  ref  gcc -funroll-loops -m32 -march=pentium-mmx -Os -fomit-frame-pointer  20190805  20190803
6184680  ref  gcc -m32 -march=i386 -Os -fomit-frame-pointer  20190805  20190803
6193724  ref  gcc -m32 -O -fomit-frame-pointer  20190805  20190803
6199940  ref  gcc -m32 -march=i486 -Os -fomit-frame-pointer  20190805  20190803
6224688  ref  gcc -funroll-loops -m32 -march=pentium4 -Os -fomit-frame-pointer  20190805  20190803
6225628  ref  gcc -m32 -O2 -fomit-frame-pointer  20190805  20190803
6229004  ref  gcc -funroll-loops -m32 -march=nocona -Os -fomit-frame-pointer  20190805  20190803
6239404  ref  gcc -m32 -march=k6-2 -O -fomit-frame-pointer  20190805  20190803
6239692  ref  gcc -funroll-loops -m32 -march=prescott -Os -fomit-frame-pointer  20190805  20190803
6242396  ref  gcc -m32 -march=nocona -O -fomit-frame-pointer  20190805  20190803
6244736  ref  gcc -m32 -march=k6 -O -fomit-frame-pointer  20190805  20190803
6244788  ref  gcc -m32 -march=prescott -O -fomit-frame-pointer  20190805  20190803
6248576  ref  gcc -m32 -march=pentium-m -Os -fomit-frame-pointer  20190805  20190803
6252584  ref  gcc -m32 -march=k6-2 -O2 -fomit-frame-pointer  20190805  20190803
6257120  ref  gcc -funroll-loops -m32 -march=i386 -Os -fomit-frame-pointer  20190805  20190803
6268592  ref  gcc -m32 -Os -fomit-frame-pointer  20190805  20190803
6268740  ref  gcc -m32 -march=pentium2 -Os -fomit-frame-pointer  20190805  20190803
6269416  ref  gcc -m32 -march=pentiumpro -Os -fomit-frame-pointer  20190805  20190803
6271444  ref  gcc -funroll-loops -m32 -march=pentiumpro -Os -fomit-frame-pointer  20190805  20190803
6281824  ref  gcc -m32 -march=pentium3 -Os -fomit-frame-pointer  20190805  20190803
6283276  ref  gcc -funroll-loops -m32 -march=i486 -Os -fomit-frame-pointer  20190805  20190803
6288480  ref  gcc -funroll-loops -m32 -march=pentium2 -Os -fomit-frame-pointer  20190805  20190803
6295676  ref  gcc -m32 -march=k6-3 -O -fomit-frame-pointer  20190805  20190803
6296288  ref  gcc -funroll-loops -m32 -march=i486 -O2 -fomit-frame-pointer  20190805  20190803
6298056  ref  gcc -funroll-loops -m32 -march=pentium3 -Os -fomit-frame-pointer  20190805  20190803
6307024  ref  gcc -m32 -march=athlon -Os -fomit-frame-pointer  20190805  20190803
6307924  ref  gcc -funroll-loops -m32 -march=i486 -O3 -fomit-frame-pointer  20190805  20190803
6434680  ref  gcc -funroll-loops -m32 -march=pentium-mmx -O2 -fomit-frame-pointer  20190805  20190803
6446964  ref  gcc -funroll-loops -m32 -march=pentium-mmx -O3 -fomit-frame-pointer  20190805  20190803
6456712  ref  gcc -funroll-loops -m32 -march=pentium -O2 -fomit-frame-pointer  20190805  20190803
6475340  ref  gcc -funroll-loops -m32 -march=pentium -O3 -fomit-frame-pointer  20190805  20190803
6538004  ref  gcc -funroll-loops -m32 -march=pentium -O -fomit-frame-pointer  20190805  20190803
6545712  ref  gcc -funroll-loops -m32 -march=pentium-mmx -O -fomit-frame-pointer  20190805  20190803
6605940  ref  gcc -m32 -march=i486 -O3 -fomit-frame-pointer  20190805  20190803
6619380  ref  gcc -funroll-loops -m32 -march=i486 -O -fomit-frame-pointer  20190805  20190803
6756288  ref  gcc -m32 -march=pentium -O3 -fomit-frame-pointer  20190805  20190803
6817360  ref  gcc -funroll-loops -m32 -march=i386 -O -fomit-frame-pointer  20190805  20190803
6825280  ref  gcc -m32 -march=pentium-mmx -O3 -fomit-frame-pointer  20190805  20190803
6884728  ref  gcc -m32 -march=i486 -O2 -fomit-frame-pointer  20190805  20190803
6998240  ref  gcc -funroll-loops -m32 -march=i386 -O3 -fomit-frame-pointer  20190805  20190803
7000852  ref  gcc -m32 -march=i486 -O -fomit-frame-pointer  20190805  20190803
7033080  ref  gcc -funroll-loops -m32 -march=i386 -O2 -fomit-frame-pointer  20190805  20190803
7155500  ref  gcc -m32 -march=pentium-mmx -O -fomit-frame-pointer  20190805  20190803
7171152  ref  gcc -m32 -march=pentium -O -fomit-frame-pointer  20190805  20190803
7185348  ref  gcc -m32 -march=i386 -O -fomit-frame-pointer  20190805  20190803
7222600  ref  gcc -m32 -march=i386 -O3 -fomit-frame-pointer  20190805  20190803
7241212  ref  gcc -m32 -march=pentium -O2 -fomit-frame-pointer  20190805  20190803
7274556  ref  gcc -m32 -march=pentium-mmx -O2 -fomit-frame-pointer  20190805  20190803
7600520  ref  gcc -m32 -march=i386 -O2 -fomit-frame-pointer  20190805  20190803
8145060  ref  gcc -funroll-loops -m32 -march=pentium-m -O3 -fomit-frame-pointer  20190805  20190803
8168464  ref  gcc -funroll-loops -m32 -march=pentium-m -O2 -fomit-frame-pointer  20190805  20190803
8200244  ref  gcc -m32 -march=pentium-m -O3 -fomit-frame-pointer  20190805  20190803
8503184  ref  gcc -m32 -march=pentium-m -O2 -fomit-frame-pointer  20190805  20190803
9426952  ref  gcc -funroll-loops -m32 -march=barcelona -O3 -fomit-frame-pointer  20190805  20190803
9461628  ref  gcc -funroll-loops -m32 -march=barcelona -O2 -fomit-frame-pointer  20190805  20190803
9550588  ref  gcc -m32 -march=barcelona -O3 -fomit-frame-pointer  20190805  20190803
9645260  ref  gcc -funroll-loops -m32 -march=k8 -O2 -fomit-frame-pointer  20190805  20190803
9936516  ref  gcc -funroll-loops -m32 -march=k8 -O3 -fomit-frame-pointer  20190805  20190803
10034304  ref  gcc -funroll-loops -m32 -march=barcelona -Os -fomit-frame-pointer  20190805  20190803
10045292  ref  gcc -m32 -march=k8 -O3 -fomit-frame-pointer  20190805  20190803
10115792  ref  gcc -m32 -march=barcelona -Os -fomit-frame-pointer  20190805  20190803
10200652  ref  gcc -m32 -march=barcelona -O2 -fomit-frame-pointer  20190805  20190803
10239724  ref  gcc -funroll-loops -m32 -march=k8 -Os -fomit-frame-pointer  20190805  20190803
10275744  ref  gcc -m32 -march=k8 -O2 -fomit-frame-pointer  20190805  20190803
10369232  ref  gcc -m32 -march=k8 -Os -fomit-frame-pointer  20190805  20190803

Compiler output

Implementation: crypto_sign/dilithium4/avx2
Compiler: gcc -funroll-loops -m32 -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 156, namely:
Compiler  Implementations
gcc -funroll-loops -m32 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=athlon -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=barcelona -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i386 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=i486 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-2 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6-3 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k6 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=k8 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=nocona -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-m -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium-mmx -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium2 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium3 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium4 -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentium -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=pentiumpro -Os -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O2 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O3 -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -O -fomit-frame-pointer avx2
gcc -funroll-loops -m32 -march=prescott -Os -fomit-frame-pointer avx2
gcc -m32 -O2 -fomit-frame-pointer avx2
gcc -m32 -O3 -fomit-frame-pointer avx2
gcc -m32 -O -fomit-frame-pointer avx2
gcc -m32 -Os -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O2 -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O3 -fomit-frame-pointer avx2
gcc -m32 -march=athlon -O -fomit-frame-pointer avx2
gcc -m32 -march=athlon -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4.1 -Os -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -O -fomit-frame-pointer avx2
gcc -m32 -march=core2 -msse4 -Os -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -O -fomit-frame-pointer avx2
gcc -m32 -march=corei7 -Os -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=i386 -O -fomit-frame-pointer avx2
gcc -m32 -march=i386 -Os -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=i486 -O -fomit-frame-pointer avx2
gcc -m32 -march=i486 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6-2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6-3 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k6 -O -fomit-frame-pointer avx2
gcc -m32 -march=k6 -Os -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=k8 -O -fomit-frame-pointer avx2
gcc -m32 -march=k8 -Os -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O2 -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O3 -fomit-frame-pointer avx2
gcc -m32 -march=nocona -O -fomit-frame-pointer avx2
gcc -m32 -march=nocona -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium-m -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium-mmx -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium3 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium4 -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentium -O -fomit-frame-pointer avx2
gcc -m32 -march=pentium -Os -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O2 -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O3 -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -O -fomit-frame-pointer avx2
gcc -m32 -march=pentiumpro -Os -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O2 -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O3 -fomit-frame-pointer avx2
gcc -m32 -march=prescott -O -fomit-frame-pointer avx2
gcc -m32 -march=prescott -Os -fomit-frame-pointer avx2
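
Note on the errors in this group: none of the compiler lines listed above enables AVX2, yet KeccakP-1600-times4-SIMD256.c calls the AVX2 intrinsic _mm256_xor_si256, which GCC declares as always_inline and will not inline into code compiled without that target feature. This is what "target specific option mismatch" reports. The following is a minimal sketch, not taken from the dilithium4 or Keccak sources, of how such a call is commonly guarded so that a file still compiles without AVX2. The function name xor_lanes and the scalar fallback are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not part of SUPERCOP or the Keccak code):
 * XORs n 64-bit lanes of src into dst. n must be a multiple of 4. */
void xor_lanes(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(n % 4 == 0);
#if defined(__AVX2__)
    /* Compiled only when the compiler flags enable AVX2 (for example
     * -mavx2 or -march=core-avx2), so the always_inline intrinsic
     * matches the target options of its caller. */
    for (size_t i = 0; i < n; i += 4) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(dst + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_xor_si256(a, b));
    }
#else
    /* Portable fallback used when AVX2 is not enabled at compile time. */
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
#endif
}
```

With a guard of this kind the same source builds under every flag set in the table. SUPERCOP itself takes a different route: it simply records the failed (compiler, implementation) pair and benchmarks the implementations that do compile, here ref.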

Compiler output

Implementation: crypto_sign/dilithium4/avx2
Compiler: gcc -m32 -march=barcelona -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:135:40: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
KeccakP-1600-times4-SIMD256.c: #define Xor_In4( argIndex ) lanes0 = LOAD256u( curData0[argIndex]),\
KeccakP-1600-times4-SIMD256.c: ^
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:146:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 0 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler  Implementations
gcc -m32 -march=barcelona -O2 -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -O3 -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -O -fomit-frame-pointer avx2
gcc -m32 -march=barcelona -Os -fomit-frame-pointer avx2

Compiler output

Implementation: crypto_sign/dilithium4/avx2
Compiler: gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c: In function 'KeccakP1600times4_AddLanesAll':
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:143:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+3], lanes3 )
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:149:9: note: in expansion of macro 'Xor_In4'
KeccakP-1600-times4-SIMD256.c: Xor_In4( 12 );
KeccakP-1600-times4-SIMD256.c: ^~~~~~~
KeccakP-1600-times4-SIMD256.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
KeccakP-1600-times4-SIMD256.c: from KeccakP-1600-times4-SIMD256.c:21:
KeccakP-1600-times4-SIMD256.c: /usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline '_mm256_xor_si256': target specific option mismatch
KeccakP-1600-times4-SIMD256.c: _mm256_xor_si256 (__m256i __A, __m256i __B)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:55:41: note: called from here
KeccakP-1600-times4-SIMD256.c: #define XOReq256(a, b) a = _mm256_xor_si256(a, b)
KeccakP-1600-times4-SIMD256.c: ^~~~~~~~~~~~~~~~~~~~~~
KeccakP-1600-times4-SIMD256.c: KeccakP-1600-times4-SIMD256.c:142:33: note: in expansion of macro 'XOReq256'
KeccakP-1600-times4-SIMD256.c: XOReq256( stateAsLanes[argIndex+2], lanes2 ),\
KeccakP-1600-times4-SIMD256.c: ...

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler  Implementations
gcc -m32 -march=core-avx-i -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -O -fomit-frame-pointer avx2
gcc -m32 -march=core-avx-i -Os -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O2 -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O3 -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -O -fomit-frame-pointer avx2
gcc -m32 -march=corei7-avx -Os -fomit-frame-pointer avx2
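
Note on this group: -march=core-avx-i and -march=corei7-avx enable AVX but not AVX2. That is consistent with the excerpt above, where the -Wpsabi warning about AVX vector returns is absent but the AVX2 intrinsic _mm256_xor_si256 still fails to inline. A small sketch of how the distinction can be checked at compile time with GCC's predefined macros follows. The enum and function names are hypothetical and do not come from SUPERCOP.

```c
#include <assert.h>

/* Hypothetical classification of what the current compiler flags enable. */
enum avx_level { AVX_NONE = 0, AVX_ONLY = 1, AVX_AND_AVX2 = 2 };

/* Reports the level from the predefined macros __AVX__ and __AVX2__. */
enum avx_level compiled_avx_level(void)
{
#if defined(__AVX2__)
    return AVX_AND_AVX2;
#elif defined(__AVX__)
    return AVX_ONLY;
#else
    return AVX_NONE;
#endif
}

/* 256-bit integer XOR (_mm256_xor_si256) needs AVX2; AVX alone is not
 * enough, which is the situation for the compilers in this group. */
int can_use_mm256_xor_si256(void)
{
    enum avx_level level = compiled_avx_level();
    assert(level >= AVX_NONE && level <= AVX_AND_AVX2);
    return level == AVX_AND_AVX2;
}
```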

Compiler output

Implementation: crypto_sign/dilithium4/avx2
Compiler: gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer
invntt.s: invntt.s: Assembler messages:
invntt.s: invntt.s:47: Error: bad register name `%rip)'
invntt.s: invntt.s:48: Error: bad register name `%rip)'
invntt.s: invntt.s:49: Error: bad register name `%rip)'
invntt.s: invntt.s:52: Error: bad register name `%rsi)'
invntt.s: invntt.s:53: Error: bad register name `%rsi)'
invntt.s: invntt.s:54: Error: bad register name `%rsi)'
invntt.s: invntt.s:55: Error: bad register name `%rsi)'
invntt.s: invntt.s:58: Error: bad register name `%ymm8'
invntt.s: invntt.s:59: Error: bad register name `%ymm10'
invntt.s: invntt.s:59: Error: bad register name `%ymm10'
invntt.s: invntt.s:61: Error: bad register name `%ymm8'
invntt.s: invntt.s:61: Error: bad register name `%ymm8'
invntt.s: invntt.s:62: Error: bad register name `%ymm10'
invntt.s: invntt.s:62: Error: bad register name `%ymm10'
invntt.s: invntt.s:66: Error: bad register name `%ymm8'
invntt.s: invntt.s:67: Error: bad register name `%ymm10'
invntt.s: invntt.s:70: Error: bad register name `%rdx)'
invntt.s: invntt.s:71: Error: bad register name `%rdx)'
invntt.s: invntt.s:72: Error: bad register name `%ymm12'
invntt.s: invntt.s:73: Error: bad register name `%ymm13'
invntt.s: invntt.s:74: Error: bad register name `%ymm8'
invntt.s: invntt.s:76: Error: bad register name `%ymm12'
invntt.s: invntt.s:77: Error: bad register name `%ymm13'
invntt.s: invntt.s:78: Error: bad register name `%ymm9'
invntt.s: ...

Number of similar (compiler,implementation) pairs: 8, namely:
Compiler  Implementations
gcc -m32 -march=core-avx2 -O2 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -O3 -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -O -fomit-frame-pointer avx2
gcc -m32 -march=core-avx2 -Os -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O2 -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O3 -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -O -fomit-frame-pointer avx2
gcc -m32 -march=native -mtune=native -Os -fomit-frame-pointer avx2
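
Note on this group: these compilers do enable AVX2, so the C files are no longer the obstacle. The failure moves to invntt.s, which is written for 64-bit mode. With -m32 the assembler has no %rip-relative addressing, no 64-bit registers such as %rsi and %rdx, and only %ymm0 through %ymm7, hence the "bad register name" errors for %ymm8 and above. The sketch below shows one way a build could detect this at compile time using the GCC/Clang predefined macros __x86_64__ and __i386__. The function names are hypothetical and are not part of SUPERCOP.

```c
#include <assert.h>

/* Hypothetical helper: number of %ymm registers that AVX/AVX2
 * instructions can name on the current target.
 * 16 in 64-bit mode, 8 in 32-bit mode, 0 when the target is not x86. */
int addressable_ymm_registers(void)
{
#if defined(__x86_64__)
    return 16;
#elif defined(__i386__)
    return 8;
#else
    return 0;
#endif
}

/* True only when registers such as %rsi, %rip and %ymm8..%ymm15 exist,
 * i.e. when assembly written for x86-64 has a chance of assembling. */
int x86_64_only_registers_available(void)
{
    int n = addressable_ymm_registers();
    assert(n == 0 || n == 8 || n == 16);
    return n == 16;
}
```

Under any of the -m32 command lines listed above, x86_64_only_registers_available() would return 0, which matches the assembler output for this group.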