Implementation notes: amd64, waldorf, crypto_stream/chacha20

Computer: waldorf
Architecture: amd64
CPU ID: GenuineIntel-000106e5-bfebfbff
SUPERCOP version: 20160715
Operation: crypto_stream
Primitive: chacha20
Time | Implementation | Compiler | Benchmark date | SUPERCOP version
6968 | krovetz/vec128 | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7428 | krovetz/vec128 | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7464 | dolbeau/amd64-avx2 | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7568 | amd64-ssse3 | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7588 | dolbeau/amd64-avx2 | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20160718 | 20160715
7708 | moon/sse2/64 | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20160718 | 20160715
7760 | moon/ssse3/64 | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20160718 | 20160715
7760 | moon/ssse3/64 | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7848 | dolbeau/amd64-avx2 | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7880 | moon/ssse3/64 | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7896 | moon/ssse3/64 | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7944 | amd64-ssse3 | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7976 | moon/ssse3/64 | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
7996 | amd64-ssse3 | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8016 | amd64-ssse3 | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8052 | dolbeau/amd64-avx2 | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8148 | krovetz/vec128 | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8236 | e/amd64-xmm6 | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8316 | e/amd64-xmm6 | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8504 | moon/sse2/64 | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8536 | moon/sse2/64 | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8828 | moon/sse2/64 | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
8844 | krovetz/vec128 | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20160718 | 20160715
8900 | moon/sse2/64 | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
9152 | e/amd64-xmm6 | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
9324 | e/amd64-xmm6 | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
10672 | dolbeau/amd64-avx2 | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
13220 | krovetz/vec128 | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
18344 | e/merged | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
21124 | e/amd64-3 | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20160718 | 20160715
21740 | e/amd64-3 | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
21800 | e/amd64-3 | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
21844 | e/amd64-3 | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
22000 | e/amd64-3 | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
22944 | e/merged | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20160718 | 20160715
24048 | e/regs | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
25160 | e/merged | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
25160 | e/ref | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
25564 | e/regs | gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
26492 | e/merged | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
26540 | e/merged | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
30468 | e/regs | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20160718 | 20160715
30736 | e/regs | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
31348 | e/ref | clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | 20160718 | 20160715
34364 | e/ref | gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | 20160718 | 20160715
35400 | e/ref | gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | 20160718 | 20160715
36532 | e/regs | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715
39756 | e/ref | gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | 20160718 | 20160715

Test failure

Implementation: crypto_stream/chacha20/amd64-ssse3
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
error 111

Number of similar (compiler,implementation) pairs: 17, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | amd64-ssse3 e/amd64-xmm6 moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | moon/avx/64 moon/avx2/64 moon/xop/64
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | moon/avx/64 moon/avx2/64 moon/xop/64

Compiler output

Implementation: crypto_stream/chacha20/dolbeau/ppc-altivec
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/include/clang/3.5.0/include/altivec.h:27:2: error: "AltiVec support not enabled"
chacha.c: #error "AltiVec support not enabled"
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/altivec.h:39:8: error: unknown type name 'vector'
chacha.c: static vector signed char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/altivec.h:39:15: error: expected identifier or '('
chacha.c: static vector signed char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/altivec.h:42:8: error: unknown type name 'vector'
chacha.c: static vector unsigned char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/altivec.h:42:15: error: expected identifier or '('
chacha.c: static vector unsigned char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/altivec.h:47:8: error: unknown type name 'vector'
chacha.c: static vector bool char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/altivec.h:47:19: error: expected ';' after top level declarator
chacha.c: static vector bool char __ATTRS_o_ai
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/altivec.h:48:10: error: unknown type name 'vector'
chacha.c: vec_perm(vector bool char __a, vector bool char __b, vector unsigned char __c);
chacha.c: ^
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha20/dolbeau/mipsel-msa
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/include/clang/3.5.0/include/arm_neon.h:28:2: error: "NEON support not enabled"
chacha.c: #error "NEON support not enabled"
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/arm_neon.h:48:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/arm_neon.h:49:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(16))) int8_t int8x16_t;
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/arm_neon.h:50:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(4))) int16_t int16x4_t;
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/arm_neon.h:51:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(8))) int16_t int16x8_t;
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/arm_neon.h:52:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(2))) int32_t int32x2_t;
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/arm_neon.h:53:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(4))) int32_t int32x4_t;
chacha.c: ^
chacha.c: /usr/include/clang/3.5.0/include/arm_neon.h:54:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(1))) int64_t int64x1_t;
chacha.c: ^
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | dolbeau/mipsel-msa

Compiler output

Implementation: crypto_stream/chacha20/goll_gueron
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
stream.c: stream.c:126:2: error: -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: ^
stream.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | goll_gueron

Compiler output

Implementation: crypto_stream/chacha20/krovetz/avx2
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
stream.c: stream.c:54:5: error: use of undeclared identifier '__m256i'
stream.c: __m256i v0,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10,v11;
stream.c: ^
stream.c: stream.c:56:5: error: use of undeclared identifier '__m256i'
stream.c: __m256i s0 = _mm_broadcastsi128_si256((__m128i *)sigma);
stream.c: ^
stream.c: stream.c:60:5: error: use of undeclared identifier '__m256i'
stream.c: __m256i s1 = _mm256_loadu_si256((__m256i *)k);
stream.c: ^
stream.c: stream.c:61:5: error: use of undeclared identifier '__m256i'
stream.c: __m256i s2 = _mm256_permute2x128_si256(s1,s1,0x11);
stream.c: ^
stream.c: stream.c:62:5: error: use of undeclared identifier 's1'
stream.c: s1 = _mm256_permute2x128_si256(s1,s1,0x00);
stream.c: ^
stream.c: stream.c:62:10: warning: implicit declaration of function '_mm256_permute2x128_si256' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: s1 = _mm256_permute2x128_si256(s1,s1,0x00);
stream.c: ^
stream.c: stream.c:62:36: error: use of undeclared identifier 's1'
stream.c: s1 = _mm256_permute2x128_si256(s1,s1,0x00);
stream.c: ^
stream.c: stream.c:62:39: error: use of undeclared identifier 's1'
stream.c: s1 = _mm256_permute2x128_si256(s1,s1,0x00);
stream.c: ^
stream.c: stream.c:63:5: error: use of undeclared identifier '__m256i'
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler | Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments | krovetz/avx2

Compiler output

Implementation: crypto_stream/chacha20/dolbeau/ppc-altivec
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
chacha.c: chacha.c:11:21: fatal error: altivec.h: No such file or directory
chacha.c: #include <altivec.h>
chacha.c: ^
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | dolbeau/ppc-altivec
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | dolbeau/ppc-altivec
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | dolbeau/ppc-altivec
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha20/dolbeau/mipsel-msa
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
chacha.c: chacha.c:11:22: fatal error: arm_neon.h: No such file or directory
chacha.c: #include <arm_neon.h>
chacha.c: ^
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | dolbeau/mipsel-msa
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | dolbeau/mipsel-msa
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | dolbeau/mipsel-msa
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | dolbeau/mipsel-msa

Compiler output

Implementation: crypto_stream/chacha20/krovetz/avx2
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
stream.c: stream.c: In function 'crypto_stream_chacha20_krovetz_avx2_xor':
stream.c: stream.c:58:13: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
stream.c: __m256i s0 = _mm256_broadcastsi128_si256(*(__m128i *)sigma);
stream.c: ^
stream.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/x86_64-linux-gnu/4.9/include/avx2intrin.h:933:1: error: inlining failed in call to always_inline '_mm256_broadcastsi128_si256': target specific option mismatch
stream.c: _mm256_broadcastsi128_si256 (__m128i __X)
stream.c: ^
stream.c: stream.c:58:13: error: called from here
stream.c: __m256i s0 = _mm256_broadcastsi128_si256(*(__m128i *)sigma);
stream.c: ^
stream.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/immintrin.h:41:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/x86_64-linux-gnu/4.9/include/avxintrin.h:890:1: error: inlining failed in call to always_inline '_mm256_loadu_si256': target specific option mismatch
stream.c: _mm256_loadu_si256 (__m256i const *__P)
stream.c: ^
stream.c: stream.c:60:13: error: called from here
stream.c: __m256i s1 = _mm256_loadu_si256((__m256i *)k);
stream.c: ^
stream.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/x86_64-linux-gnu/4.9/include/avx2intrin.h:1066:1: error: inlining failed in call to always_inline '_mm256_permute2x128_si256': target specific option mismatch
stream.c: _mm256_permute2x128_si256 (__m256i __X, __m256i __Y, const int __M)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | krovetz/avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | krovetz/avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | krovetz/avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | krovetz/avx2
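
Note on the errors above: GCC reports "target specific option mismatch" when an always_inline AVX/AVX2 intrinsic is called from code compiled without that instruction set enabled, which suggests that -march=native did not turn on AVX2 on this machine. The following is a minimal sketch, not taken from the krovetz/avx2 source, of how such code can be guarded with the compiler-defined __AVX2__ macro and fall back to portable C. The helper name xor_bytes is hypothetical and used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __AVX2__
#include <immintrin.h>   /* AVX2 intrinsics, used only when AVX2 is enabled */
#endif

/* Hypothetical helper: dst[i] = src[i] ^ ks[i] for i in [0, n).
 * Assumption: ks holds n bytes of already-generated key stream. */
void xor_bytes(uint8_t *dst, const uint8_t *src, const uint8_t *ks, size_t n)
{
    size_t i = 0;

#ifdef __AVX2__
    /* Vector path: 32 bytes per iteration, unaligned loads and stores. */
    for (; i + 32 <= n; i += 32) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(src + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(ks + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_xor_si256(a, b));
    }
#endif

    /* Portable path: the whole buffer without AVX2, or the tail with it. */
    for (; i < n; i++)
        dst[i] = (uint8_t)(src[i] ^ ks[i]);
}
```

Built with the flags listed above on a CPU without AVX2, only the scalar loop is compiled, so no AVX2 intrinsic is referenced and the mismatch error cannot arise. An implementation that requires AVX2 unconditionally may instead prefer to stop the build early with an #error directive.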

Compiler output

Implementation: crypto_stream/chacha20/goll_gueron
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
stream.c: stream.c:126:2: error: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: ^

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler | Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv | goll_gueron
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv | goll_gueron
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv | goll_gueron
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv | goll_gueron