Implementation notes: amd64, par, crypto_stream/chacha8

Computer: par
Architecture: amd64
CPU ID: GenuineIntel-000406c3-bfebfbff
SUPERCOP version: 20161026
Operation: crypto_stream
Primitive: chacha8
Time   Implementation      Compiler                                           Benchmark date  SUPERCOP version
3800   moon/sse2/64        gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
3800   moon/sse2/64        gcc -march=native -mcpu=native -O3                 20161214        20161026
3800   moon/sse2/64        gcc -march=native -mcpu=native -Os                 20161214        20161026
3820   moon/sse2/64        gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
3820   moon/sse2/64        gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
3820   moon/sse2/64        gcc -march=native -mcpu=native -O2                 20161214        20161026
4940   krovetz/vec128      gcc -march=native -mcpu=native -Os                 20161214        20161026
4980   krovetz/vec128      gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
5220   krovetz/vec128      gcc -march=native -mcpu=native -O2                 20161214        20161026
5220   krovetz/vec128      gcc -march=native -mcpu=native -O3                 20161214        20161026
5240   krovetz/vec128      gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
5240   krovetz/vec128      gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
5400   dolbeau/amd64-avx2  gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
5540   dolbeau/amd64-avx2  gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
5540   e/amd64-xmm6        gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
5540   e/amd64-xmm6        gcc -march=native -mcpu=native -O3                 20161214        20161026
5540   dolbeau/amd64-avx2  gcc -march=native -mcpu=native -Os                 20161214        20161026
5560   e/amd64-xmm6        gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
5560   e/amd64-xmm6        gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
5560   e/amd64-xmm6        gcc -march=native -mcpu=native -O2                 20161214        20161026
5560   e/amd64-xmm6        gcc -march=native -mcpu=native -Os                 20161214        20161026
5960   moon/ssse3/64       gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
5960   moon/ssse3/64       gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
5960   moon/ssse3/64       gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
5960   moon/ssse3/64       gcc -march=native -mcpu=native -O2                 20161214        20161026
5960   moon/ssse3/64       gcc -march=native -mcpu=native -O3                 20161214        20161026
5960   moon/ssse3/64       gcc -march=native -mcpu=native -Os                 20161214        20161026
6100   dolbeau/amd64-avx2  gcc -march=native -mcpu=native -O2                 20161214        20161026
6120   dolbeau/amd64-avx2  gcc -march=native -mcpu=native -O3                 20161214        20161026
6200   amd64-ssse3         gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
6200   dolbeau/amd64-avx2  gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
6200   amd64-ssse3         gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
6200   amd64-ssse3         gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
6200   amd64-ssse3         gcc -march=native -mcpu=native -O2                 20161214        20161026
6200   amd64-ssse3         gcc -march=native -mcpu=native -O3                 20161214        20161026
6200   amd64-ssse3         gcc -march=native -mcpu=native -Os                 20161214        20161026
8400   e/amd64-3           gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
8400   e/amd64-3           gcc -march=native -mcpu=native -O2                 20161214        20161026
8400   e/amd64-3           gcc -march=native -mcpu=native -O3                 20161214        20161026
8400   e/amd64-3           gcc -march=native -mcpu=native -Os                 20161214        20161026
8420   e/amd64-3           gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
8420   e/amd64-3           gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
10160  e/merged            gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
10220  e/ref               gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
10240  e/merged            gcc -march=native -mcpu=native -O2                 20161214        20161026
10240  e/merged            gcc -march=native -mcpu=native -O3                 20161214        20161026
10240  e/ref               gcc -march=native -mcpu=native -O3                 20161214        20161026
10240  e/regs              gcc -march=native -mcpu=native -O3                 20161214        20161026
10340  e/regs              gcc -funroll-loops -march=native -mcpu=native -O3  20161214        20161026
10500  e/merged            gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
10700  e/merged            gcc -march=native -mcpu=native -Os                 20161214        20161026
11020  e/merged            gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
14120  e/regs              gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
14320  e/regs              gcc -march=native -mcpu=native -O2                 20161214        20161026
14540  e/ref               gcc -funroll-loops -march=native -mcpu=native -O2  20161214        20161026
18660  e/ref               gcc -march=native -mcpu=native -O2                 20161214        20161026
19400  e/regs              gcc -march=native -mcpu=native -Os                 20161214        20161026
19540  e/regs              gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
23520  e/ref               gcc -funroll-loops -march=native -mcpu=native -Os  20161214        20161026
28080  e/ref               gcc -march=native -mcpu=native -Os                 20161214        20161026

Test failure

Implementation: crypto_stream/chacha8/moon/avx/64
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
error 111

Number of similar (compiler,implementation) pairs: 18, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -funroll-loops -march=native -mcpu=native -O3 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -funroll-loops -march=native -mcpu=native -Os moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -O2 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -O3 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -Os moon/avx/64 moon/avx2/64 moon/xop/64

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/ppc-altivec
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
api.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: chacha.c:11:21: fatal error: altivec.h: No such file or directory
chacha.c: #include <altivec.h>
chacha.c: ^
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 dolbeau/ppc-altivec
gcc -funroll-loops -march=native -mcpu=native -O3 dolbeau/ppc-altivec
gcc -funroll-loops -march=native -mcpu=native -Os dolbeau/ppc-altivec
gcc -march=native -mcpu=native -O2 dolbeau/ppc-altivec
gcc -march=native -mcpu=native -O3 dolbeau/ppc-altivec
gcc -march=native -mcpu=native -Os dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/mipsel-msa
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
api.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: chacha.c:11:22: fatal error: arm_neon.h: No such file or directory
chacha.c: #include <arm_neon.h>
chacha.c: ^
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 dolbeau/mipsel-msa
gcc -funroll-loops -march=native -mcpu=native -O3 dolbeau/mipsel-msa
gcc -funroll-loops -march=native -mcpu=native -Os dolbeau/mipsel-msa
gcc -march=native -mcpu=native -O2 dolbeau/mipsel-msa
gcc -march=native -mcpu=native -O3 dolbeau/mipsel-msa
gcc -march=native -mcpu=native -Os dolbeau/mipsel-msa

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/amd64-avx2
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
api.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 24, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -funroll-loops -march=native -mcpu=native -O3 dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -funroll-loops -march=native -mcpu=native -Os dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -march=native -mcpu=native -O2 dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -march=native -mcpu=native -O3 dolbeau/amd64-avx2 e/merged e/ref e/regs
gcc -march=native -mcpu=native -Os dolbeau/amd64-avx2 e/merged e/ref e/regs

Compiler output

Implementation: crypto_stream/chacha8/amd64-ssse3
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
api.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.s: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 18, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -funroll-loops -march=native -mcpu=native -O3 amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -funroll-loops -march=native -mcpu=native -Os amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -march=native -mcpu=native -O2 amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -march=native -mcpu=native -O3 amd64-ssse3 e/amd64-3 e/amd64-xmm6
gcc -march=native -mcpu=native -Os amd64-ssse3 e/amd64-3 e/amd64-xmm6

Compiler output

Implementation: crypto_stream/chacha8/moon/avx/64
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
crypto_stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.S: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 18, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -funroll-loops -march=native -mcpu=native -O3 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -funroll-loops -march=native -mcpu=native -Os moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -O2 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -O3 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mcpu=native -Os moon/avx/64 moon/avx2/64 moon/xop/64

Compiler output

Implementation: crypto_stream/chacha8/moon/sse2/64
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
crypto_stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
chacha.S: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 12, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 moon/sse2/64 moon/ssse3/64
gcc -funroll-loops -march=native -mcpu=native -O3 moon/sse2/64 moon/ssse3/64
gcc -funroll-loops -march=native -mcpu=native -Os moon/sse2/64 moon/ssse3/64
gcc -march=native -mcpu=native -O2 moon/sse2/64 moon/ssse3/64
gcc -march=native -mcpu=native -O3 moon/sse2/64 moon/ssse3/64
gcc -march=native -mcpu=native -Os moon/sse2/64 moon/ssse3/64

Compiler output

Implementation: crypto_stream/chacha8/krovetz/avx2
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
stream.c: stream.c: In function 'crypto_stream_chacha8_krovetz_avx2_xor':
stream.c: stream.c:58:13: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
stream.c: __m256i s0 = _mm256_broadcastsi128_si256(*(__m128i *)sigma);
stream.c: ^~
stream.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avx2intrin.h:574:1: error: inlining failed in call to always_inline '_mm256_or_si256': target specific option mismatch
stream.c: _mm256_or_si256 (__m256i __A, __m256i __B)
stream.c: ^~~~~~~~~~~~~~~
stream.c: stream.c:63:13: note: called from here
stream.c: __m256i s3 = _mm256_or_si256(
stream.c: ^~
stream.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.1/include/avx2intrin.h:655:1: error: inlining failed in call to always_inline '_mm256_slli_si256': target specific option mismatch
stream.c: _mm256_slli_si256 (__m256i __A, const int __N)
stream.c: ^~~~~~~~~~~~~~~~~
stream.c: stream.c:63:18: note: called from here
stream.c: __m256i s3 = _mm256_or_si256(
stream.c: ^~~~~~~~~~~~~~~~
stream.c: _mm256_slli_si256(_mm256_broadcastq_epi64(*(__m128i *)n), 8),
stream.c: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: _mm256_set_epi32(0,0,0,1,0,0,0,0)
stream.c: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: ...

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 krovetz/avx2
gcc -funroll-loops -march=native -mcpu=native -O3 krovetz/avx2
gcc -funroll-loops -march=native -mcpu=native -Os krovetz/avx2
gcc -march=native -mcpu=native -O2 krovetz/avx2
gcc -march=native -mcpu=native -O3 krovetz/avx2
gcc -march=native -mcpu=native -Os krovetz/avx2

Compiler output

Implementation: crypto_stream/chacha8/goll_gueron
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
stream.c: stream.c:126:2: error: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: ^~~~~

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 goll_gueron
gcc -funroll-loops -march=native -mcpu=native -O3 goll_gueron
gcc -funroll-loops -march=native -mcpu=native -Os goll_gueron
gcc -march=native -mcpu=native -O2 goll_gueron
gcc -march=native -mcpu=native -O3 goll_gueron
gcc -march=native -mcpu=native -Os goll_gueron
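
Note on the AVX2 build failures: the goll_gueron "#error" and the krovetz/avx2 "target specific option mismatch" messages above both indicate that the compiler was not generating AVX2 code for this build. The following is a minimal sketch, not taken from either implementation, of a compile-time guard that selects an AVX2 path only when GCC defines __AVX2__ and otherwise falls back to portable C instead of aborting the build. The names xor_bytes and XOR_BACKEND are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
/* AVX2 is enabled for this translation unit (e.g. via -mavx2),
   so the 256-bit intrinsics can be inlined without a target mismatch. */
#include <immintrin.h>
#define XOR_BACKEND "avx2"

static void xor_bytes(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
    /* Process 32 bytes per iteration with unaligned loads and stores. */
    for (; i + 32 <= n; i += 32) {
        __m256i x = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i y = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_xor_si256(x, y));
    }
    /* Handle the remaining tail bytes one at a time. */
    for (; i < n; i++)
        out[i] = a[i] ^ b[i];
}
#else
/* AVX2 is not enabled: use plain C rather than failing with #error. */
#define XOR_BACKEND "portable"

static void xor_bytes(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] ^ b[i];
}
#endif
```

Both branches compute the same result, so a caller can use xor_bytes without knowing which one was compiled. Whether a fallback is appropriate depends on the implementation; a benchmark suite may prefer a hard "#error" so that an AVX2-only implementation is never measured through a slower path.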

Compiler output

Implementation: crypto_stream/chacha8/krovetz/vec128
Compiler: gcc -funroll-loops -march=native -mcpu=native -O2
stream.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
try.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
measure.c: gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead

Number of similar (compiler,implementation) pairs: 6, namely:
Compiler Implementations
gcc -funroll-loops -march=native -mcpu=native -O2 krovetz/vec128
gcc -funroll-loops -march=native -mcpu=native -O3 krovetz/vec128
gcc -funroll-loops -march=native -mcpu=native -Os krovetz/vec128
gcc -march=native -mcpu=native -O2 krovetz/vec128
gcc -march=native -mcpu=native -O3 krovetz/vec128
gcc -march=native -mcpu=native -Os krovetz/vec128