Implementation notes: amd64, par, crypto_stream/chacha12

Computer: par
Architecture: amd64
CPU ID: GenuineIntel-000406c3-bfebfbff
SUPERCOP version: 20161026
Operation: crypto_stream
Primitive: chacha12
Time   Implementation      Compiler                                           Benchmark date  SUPERCOP version
5320   moon/sse2/64        gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
5320   moon/sse2/64        gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
5320   moon/sse2/64        gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
5320   moon/sse2/64        gcc -march=native -mcpu=native -Os                 20161214        20161026
5340   moon/sse2/64        gcc -march=native -mcpu=native -O2                 20161214        20161026
5340   moon/sse2/64        gcc -march=native -mcpu=native -O3                 20161214        20161026
7200   krovetz/vec128      gcc -march=native -mcpu=native -Os                 20161214        20161026
7220   krovetz/vec128      gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
7620   e/amd64-xmm6        gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
7620   e/amd64-xmm6        gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
7620   e/amd64-xmm6        gcc -march=native -mcpu=native -O2                 20161214        20161026
7620   e/amd64-xmm6        gcc -march=native -mcpu=native -O3                 20161214        20161026
7640   e/amd64-xmm6        gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
7640   krovetz/vec128      gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
7640   e/amd64-xmm6        gcc -march=native -mcpu=native -Os                 20161214        20161026
7680   krovetz/vec128      gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
7680   dolbeau/amd64-avx2  gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
7680   krovetz/vec128      gcc -march=native -mcpu=native -O2                 20161214        20161026
7680   krovetz/vec128      gcc -march=native -mcpu=native -O3                 20161214        20161026
7940   dolbeau/amd64-avx2  gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
7960   dolbeau/amd64-avx2  gcc -march=native -mcpu=native -Os                 20161214        20161026
8520   moon/ssse3/64       gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
8520   moon/ssse3/64       gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
8520   moon/ssse3/64       gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
8520   moon/ssse3/64       gcc -march=native -mcpu=native -O2                 20161214        20161026
8520   moon/ssse3/64       gcc -march=native -mcpu=native -O3                 20161214        20161026
8520   moon/ssse3/64       gcc -march=native -mcpu=native -Os                 20161214        20161026
8600   amd64-ssse3         gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
8600   amd64-ssse3         gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
8600   amd64-ssse3         gcc -march=native -mcpu=native -O2                 20161214        20161026
8600   amd64-ssse3         gcc -march=native -mcpu=native -O3                 20161214        20161026
8620   amd64-ssse3         gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
8620   amd64-ssse3         gcc -march=native -mcpu=native -Os                 20161214        20161026
9020   dolbeau/amd64-avx2  gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
9100   dolbeau/amd64-avx2  gcc -march=native -mcpu=native -O3                 20161214        20161026
9120   dolbeau/amd64-avx2  gcc -march=native -mcpu=native -O2                 20161214        20161026
11720  e/amd64-3           gcc -march=native -mcpu=native -O2                 20161214        20161026
11720  e/amd64-3           gcc -march=native -mcpu=native -Os                 20161214        20161026
11740  e/amd64-3           gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
11740  e/amd64-3           gcc -march=native -mcpu=native -O3                 20161214        20161026
11760  e/amd64-3           gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
11780  e/amd64-3           gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
13280  e/merged            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
13840  e/merged            gcc -march=native -mcpu=native -O3                 20161214        20161026
14080  e/merged            gcc -march=native -mcpu=native -O2                 20161214        20161026
14160  e/regs              gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
14420  e/ref               gcc -march=native -mcpu=native -O3                 20161214        20161026
14500  e/ref               gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
14620  e/regs              gcc -march=native -mcpu=native -O3                 20161214        20161026
14700  e/merged            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
14780  e/merged            gcc -march=native -mcpu=native -Os                 20161214        20161026
15320  e/merged            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
17320  e/regs              gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
17540  e/ref               gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
17780  e/regs              gcc -march=native -mcpu=native -O2                 20161214        20161026
23120  e/regs              gcc -march=native -mcpu=native -Os                 20161214        20161026
23420  e/regs              gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
24400  e/ref               gcc -march=native -mcpu=native -O2                 20161214        20161026
27420  e/ref               gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
31680  e/ref               gcc -march=native -mcpu=native -Os                 20161214        20161026
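
Readers who want to compare implementations rather than individual compiler settings can reduce the table to the best Time value per implementation. The following is a minimal sketch, not part of SUPERCOP: the names `ROWS` and `fastest_per_implementation` are chosen here for illustration, and `ROWS` holds only a hand-copied subset of the rows above.

```python
# Subset of rows copied from the table above: (time, implementation, compiler).
# The full table has 60 rows; extend this list as needed.
ROWS = [
    (5320, "moon/sse2/64", "gcc -funroll-loops -march=native -mcpu=native -O2"),
    (5340, "moon/sse2/64", "gcc -march=native -mcpu=native -O2"),
    (7200, "krovetz/vec128", "gcc -march=native -mcpu=native -Os"),
    (7620, "e/amd64-xmm6", "gcc -funroll-loops -march=native -mcpu=native -O2"),
    (14420, "e/ref", "gcc -march=native -mcpu=native -O3"),
    (31680, "e/ref", "gcc -march=native -mcpu=native -Os"),
]


def fastest_per_implementation(rows):
    """Return {implementation: (best_time, compiler)} keeping the lowest time."""
    best = {}
    for time, impl, compiler in rows:
        if impl not in best or time < best[impl][0]:
            best[impl] = (time, compiler)
    return best


if __name__ == "__main__":
    summary = fastest_per_implementation(ROWS)
    # Print implementations from fastest to slowest.
    for impl, (time, compiler) in sorted(summary.items(), key=lambda kv: kv[1][0]):
        print(f"{time:>6}  {impl:<16}  {compiler}")
```

On this subset, the spread within a single implementation (e/ref at 14420 versus 31680) is larger than the gap between several distinct implementations, which is why the compiler column is kept alongside each best time.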

Test failure

Implementation: crypto_stream/chacha12/moon/avx/64
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
error 111

Number of similar (compiler,implementation) pairs: 18, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  moon/avx/64 moon/avx2/64 moon/xop/64
gcc -funroll-loops -march=native -mcpu=native -O3  moon/avx/64 moon/avx2/64 moon/xop/64
gcc -funroll-loops -march=native -mcpu=native -Os  moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -O2                 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -O3                 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -Os                 moon/avx/64 moon/avx2/64 moon/xop/64
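
The count of 18 is consistent with the table above: six compiler configurations, each paired with three implementations. A small sketch of that arithmetic follows; the variable names are illustrative and not taken from SUPERCOP.

```python
from itertools import product

# The six compiler configurations listed in the table above:
# with and without -funroll-loops, at each of three optimization levels.
OPT_LEVELS = ["-O2", "-O3", "-Os"]
PREFIXES = ["gcc -funroll-loops", "gcc"]
COMPILERS = [
    f"{prefix} -march=native -mcpu=native {opt}"
    for prefix in PREFIXES
    for opt in OPT_LEVELS
]

# The three implementations that share this test failure.
IMPLEMENTATIONS = ["moon/avx/64", "moon/avx2/64", "moon/xop/64"]

# Every (compiler, implementation) pair.
PAIRS = list(product(COMPILERS, IMPLEMENTATIONS))

if __name__ == "__main__":
    print(len(PAIRS))  # 6 compilers x 3 implementations = 18
```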

Compiler output

Implementation: crypto_stream/chacha12/dolbeau/ppc-altivec
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
api.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: chacha.c:11:21: fatal error: altivec.h: No such file or directory
chacha.c: #include <altivec.h>
chacha.c: ^
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  dolbeau/ppc-altivec
gcc -funroll-loops -march=native -mcpu=native -O3  dolbeau/ppc-altivec
gcc -funroll-loops -march=native -mcpu=native -Os  dolbeau/ppc-altivec
gcc -march=native -mcpu=native -O2                 dolbeau/ppc-altivec
gcc -march=native -mcpu=native -O3                 dolbeau/ppc-altivec
gcc -march=native -mcpu=native -Os                 dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha12/dolbeau/mipsel-msa
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
api.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: chacha.c:11:22: fatal error: arm_neon.h: No such file or directory
chacha.c: #include <arm_neon.h>
chacha.c: ^
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  dolbeau/mipsel-msa
gcc -funroll-loops -march=native -mcpu=native -O3  dolbeau/mipsel-msa
gcc -funroll-loops -march=native -mcpu=native -Os  dolbeau/mipsel-msa
gcc -march=native -mcpu=native -O2                 dolbeau/mipsel-msa
gcc -march=native -mcpu=native -O3                 dolbeau/mipsel-msa
gcc -march=native -mcpu=native -Os                 dolbeau/mipsel-msa

Compiler output

Implementation: crypto_stream/chacha12/dolbeau/amd64-avx2
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
api.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 24, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -funroll-loops -march=native -mcpu=native -O3  dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -funroll-loops -march=native -mcpu=native -Os  dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -march=native -mcpu=native -O2                 dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -march=native -mcpu=native -O3                 dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -march=native -mcpu=native -Os                 dolbeau/amd64-avx2 e/merged e/ref e/regs

Compiler output

Implementation: crypto_stream/chacha12/amd64-ssse3
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
api.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 18, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -funroll-loops -march=native -mcpu=native -O3  amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -funroll-loops -march=native -mcpu=native -Os  amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -march=native -mcpu=native -O2                 amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -march=native -mcpu=native -O3                 amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -march=native -mcpu=native -Os                 amd64-ssse3 e/amd64-3 e/amd64-xmm6

Compiler output

Implementation: crypto_stream/chacha12/moon/avx/64
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
crypto_stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.S: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 18, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  moon/avx/64 moon/avx2/64 moon/xop/64
gcc -funroll-loops -march=native -mcpu=native -O3  moon/avx/64 moon/avx2/64 moon/xop/64
gcc -funroll-loops -march=native -mcpu=native -Os  moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -O2                 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -O3                 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -Os                 moon/avx/64 moon/avx2/64 moon/xop/64

Compiler output

Implementation: crypto_stream/chacha12/moon/sse2/64
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
crypto_stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.S: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 12, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  moon/sse2/64 moon/ssse3/64
gcc -funroll-loops -march=native -mcpu=native -O3  moon/sse2/64 moon/ssse3/64
gcc -funroll-loops -march=native -mcpu=native -Os  moon/sse2/64 moon/ssse3/64
gcc -march=native -mcpu=native -O2                 moon/sse2/64 moon/ssse3/64
gcc -march=native -mcpu=native -O3                 moon/sse2/64 moon/ssse3/64
gcc -march=native -mcpu=native -Os                 moon/sse2/64 moon/ssse3/64

Compiler output

Implementation: crypto_stream/chacha12/krovetz/avx2
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
stream.c: stream.c: In function 'crypto_stream_chacha12_krovetz_avx2_xor':
stream.c: stream.c:58:13: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
stream.c: __m256i s0 = _mm256_broadcastsi128_si256(*(__m128i *)sigma);
stream.c: ^~
stream.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avx2intrin.h:574:1: error: inlining failed in call to always_inline '_mm256_or_si256': target specific option mismatch
stream.c: _mm256_or_si256 (__m256i __A, __m256i __B)
stream.c: ^~~~~~~~~~~~~~~
stream.c: stream.c:63:13: note: called from here
stream.c: __m256i s3 = _mm256_or_si256(
stream.c: ^~
stream.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avx2intrin.h:655:1: error: inlining failed in call to always_inline '_mm256_slli_si256': target specific option mismatch
stream.c: _mm256_slli_si256 (__m256i __A, const int __N)
stream.c: ^~~~~~~~~~~~~~~~~
stream.c: stream.c:63:18: note: called from here
stream.c: __m256i s3 = _mm256_or_si256(
stream.c: ^~~~~~~~~~~~~~~~
stream.c: _mm256_slli_si256(_mm256_broadcastq_epi64(*(__m128i *)n), 8),
stream.c: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: _mm256_set_epi32(0,0,0,1,0,0,0,0)
stream.c: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: ...

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  krovetz/avx2
gcc -funroll-loops -march=native -mcpu=native -O3  krovetz/avx2
gcc -funroll-loops -march=native -mcpu=native -Os  krovetz/avx2
gcc -march=native -mcpu=native -O2                 krovetz/avx2
gcc -march=native -mcpu=native -O3                 krovetz/avx2
gcc -march=native -mcpu=native -Os                 krovetz/avx2

Compiler output

Implementation: crypto_stream/chacha12/goll_gueron
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
stream.c: stream.c:126:2: error: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: ^~~~~

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  goll_gueron
gcc -funroll-loops -march=native -mcpu=native -O3  goll_gueron
gcc -funroll-loops -march=native -mcpu=native -Os  goll_gueron
gcc -march=native -mcpu=native -O2                 goll_gueron
gcc -march=native -mcpu=native -O3                 goll_gueron
gcc -march=native -mcpu=native -Os                 goll_gueron

Compiler output

Implementation: crypto_stream/chacha12/krovetz/vec128
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler                                           Implementations
gcc -funroll-loops -march=native -mcpu=native -O2  krovetz/vec128
gcc -funroll-loops -march=native -mcpu=native -O3  krovetz/vec128
gcc -funroll-loops -march=native -mcpu=native -Os  krovetz/vec128
gcc -march=native -mcpu=native -O2                 krovetz/vec128
gcc -march=native -mcpu=native -O3                 krovetz/vec128
gcc -march=native -mcpu=native -Os                 krovetz/vec128