Implementation notes: x86, thoth, crypto_stream/chacha8

Computer: thoth
Architecture: x86
CPU ID: AuthenticAMD-00000622-0183f9ff
SUPERCOP version: 20160806
Operation: crypto_stream
Primitive: chacha8
Time   Implementation  Compiler                                                                 Benchmark date  SUPERCOP version
8353   e/x86-1         gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160726        20160724
8359   e/x86-1         gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160726        20160724
8360   e/x86-1         clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160726        20160724
8365   e/x86-1         gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160726        20160724
8369   e/x86-1         gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160726        20160724
9712   e/merged        clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160726        20160724
9757   e/merged        gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160726        20160724
9955   e/merged        gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160726        20160724
9961   e/merged        gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160726        20160724
10111  e/regs          gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160726        20160724
10411  e/ref           gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160726        20160724
10732  e/merged        gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160726        20160724
11629  e/x86-mmx       gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160726        20160724
11634  e/x86-mmx       gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160726        20160724
11635  e/x86-mmx       gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         20160726        20160724
11664  e/x86-mmx       clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160726        20160724
11673  e/x86-mmx       gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160726        20160724
13372  e/regs          clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160726        20160724
14020  e/ref           clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  20160726        20160724
14895  e/regs          gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160726        20160724
15179  e/regs          gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160726        20160724
19359  e/ref           gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          20160726        20160724
19365  e/ref           gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         20160726        20160724
19693  e/regs          gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160726        20160724
23592  e/ref           gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         20160726        20160724

Test failure

Implementation: crypto_stream/chacha8/moon/avx/32
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
error 111

Number of similar (compiler,implementation) pairs: 15, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  moon/avx/32 moon/avx2/32 moon/xop/32
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         moon/avx/32 moon/avx2/32 moon/xop/32
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         moon/avx/32 moon/avx2/32 moon/xop/32
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          moon/avx/32 moon/avx2/32 moon/xop/32
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         moon/avx/32 moon/avx2/32 moon/xop/32

Test failure

Implementation: crypto_stream/chacha8/e/x86-xmm
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
error 111
crypto_stream_xor does not match crypto_stream

Number of similar (compiler,implementation) pairs: 20, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  e/x86-xmm e/x86-xmm2 e/x86-xmm5 e/x86-xmm6
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv         e/x86-xmm e/x86-xmm2 e/x86-xmm5 e/x86-xmm6
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv         e/x86-xmm e/x86-xmm2 e/x86-xmm5 e/x86-xmm6
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv          e/x86-xmm e/x86-xmm2 e/x86-xmm5 e/x86-xmm6
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv         e/x86-xmm e/x86-xmm2 e/x86-xmm5 e/x86-xmm6
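
The message "crypto_stream_xor does not match crypto_stream" suggests that the two entry points disagreed about the keystream. The sketch below illustrates that property under one assumption: that the check amounts to comparing crypto_stream_xor(m) with m XOR crypto_stream. How SUPERCOP's own test performs the comparison is not shown in this report. The generator toy_keystream_byte and the functions toy_stream, toy_stream_xor and xor_matches_stream are hypothetical stand-ins; they are not ChaCha8 and not part of SUPERCOP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy keystream generator, NOT ChaCha8. It only stands in
   for a real cipher so that the check below can run on its own. */
static uint8_t toy_keystream_byte(const uint8_t *k, const uint8_t *n, size_t i)
{
    uint32_t x = (uint32_t)i * 2654435761u;
    x ^= k[i % 32];                    /* mix in one of 32 key bytes   */
    x ^= (uint32_t)n[i % 8] << 8;      /* mix in one of 8 nonce bytes  */
    x ^= x >> 15;
    return (uint8_t)x;
}

/* Same argument order as crypto_stream: write len keystream bytes to c. */
int toy_stream(uint8_t *c, size_t len, const uint8_t *n, const uint8_t *k)
{
    for (size_t i = 0; i < len; i++)
        c[i] = toy_keystream_byte(k, n, i);
    return 0;
}

/* Same argument order as crypto_stream_xor: c = m XOR keystream. */
int toy_stream_xor(uint8_t *c, const uint8_t *m, size_t len,
                   const uint8_t *n, const uint8_t *k)
{
    for (size_t i = 0; i < len; i++)
        c[i] = (uint8_t)(m[i] ^ toy_keystream_byte(k, n, i));
    return 0;
}

/* Returns 1 when stream_xor(m) equals m XOR stream for every byte, else 0.
   Handles at most 256 bytes to keep the sketch free of allocation. */
int xor_matches_stream(const uint8_t *m, size_t len,
                       const uint8_t *n, const uint8_t *k)
{
    uint8_t ks[256], ct[256];
    assert(len <= sizeof ks);
    toy_stream(ks, len, n, k);
    toy_stream_xor(ct, m, len, n, k);
    for (size_t i = 0; i < len; i++)
        if (ct[i] != (uint8_t)(m[i] ^ ks[i]))
            return 0;
    return 1;
}
```

An implementation that produced different keystream bytes in its two entry points would make xor_matches_stream return 0, which is the kind of disagreement the reported message points to.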

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/ppc-altivec
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/altivec.h:27:2: error: "AltiVec support not enabled"
chacha.c: #error "AltiVec support not enabled"
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/altivec.h:39:8: error: unknown type name 'vector'
chacha.c: static vector signed char __ATTRS_o_ai vec_perm(vector signed char __a,
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/altivec.h:39:15: error: expected identifier or '('
chacha.c: static vector signed char __ATTRS_o_ai vec_perm(vector signed char __a,
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/altivec.h:43:8: error: unknown type name 'vector'
chacha.c: static vector unsigned char __ATTRS_o_ai vec_perm(vector unsigned char __a,
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/altivec.h:43:15: error: expected identifier or '('
chacha.c: static vector unsigned char __ATTRS_o_ai vec_perm(vector unsigned char __a,
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/altivec.h:47:8: error: unknown type name 'vector'
chacha.c: static vector bool char __ATTRS_o_ai vec_perm(vector bool char __a,
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/altivec.h:47:19: error: expected ';' after top level declarator
chacha.c: static vector bool char __ATTRS_o_ai vec_perm(vector bool char __a,
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/altivec.h:47:47: error: unknown type name 'vector'
chacha.c: static vector bool char __ATTRS_o_ai vec_perm(vector bool char __a,
chacha.c: ^
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/mipsel-msa
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
chacha.c: In file included from chacha.c:11:
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/arm_neon.h:28:2: error: "NEON support not enabled"
chacha.c: #error "NEON support not enabled"
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/arm_neon.h:48:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/arm_neon.h:49:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(16))) int8_t int8x16_t;
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/arm_neon.h:50:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(4))) int16_t int16x4_t;
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/arm_neon.h:51:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(8))) int16_t int16x8_t;
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/arm_neon.h:52:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(2))) int32_t int32x2_t;
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/arm_neon.h:53:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(4))) int32_t int32x4_t;
chacha.c: ^
chacha.c: /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/arm_neon.h:54:24: error: 'neon_vector_type' attribute is not supported for this target
chacha.c: typedef __attribute__((neon_vector_type(1))) int64_t int64x1_t;
chacha.c: ^
chacha.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  dolbeau/mipsel-msa
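
The two failures above come from including a vector header (altivec.h, arm_neon.h) on a target that does not provide that instruction set. A common way to avoid such errors is to include each header only when the compiler predefines the matching feature macro. The sketch below shows the pattern; the macro SIMD_BACKEND and the function simd_backend are hypothetical names chosen for illustration, and nothing here is taken from the dolbeau sources.

```c
/* Include each vector header only when the compiler reports support
   for the corresponding instruction set on the current target. */
#if defined(__ALTIVEC__)
#include <altivec.h>
#define SIMD_BACKEND "altivec"
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SIMD_BACKEND "neon"
#elif defined(__SSE2__)
#include <emmintrin.h>
#define SIMD_BACKEND "sse2"
#else
/* No known vector extension: fall back to plain C. */
#define SIMD_BACKEND "portable"
#endif

/* Hypothetical helper: names the code path selected at compile time. */
const char *simd_backend(void)
{
    return SIMD_BACKEND;
}
```

With guards of this kind, a build for an unsupported target takes the portable branch instead of stopping at an #error inside a system header.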

Compiler output

Implementation: crypto_stream/chacha8/amd64-ssse3
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
chacha.s: chacha.s:22:5: error: register %rsp is only available in 64-bit mode
chacha.s: mov %rsp,%r11
chacha.s: ^~~~
chacha.s: chacha.s:23:9: error: register %r11 is only available in 64-bit mode
chacha.s: and $31,%r11
chacha.s: ^~~~
chacha.s: chacha.s:24:10: error: register %r11 is only available in 64-bit mode
chacha.s: add $384,%r11
chacha.s: ^~~~
chacha.s: chacha.s:25:5: error: register %r11 is only available in 64-bit mode
chacha.s: sub %r11,%rsp
chacha.s: ^~~~
chacha.s: chacha.s:26:6: error: register %rdi is only available in 64-bit mode
chacha.s: mov %rdi,%r8
chacha.s: ^~~~
chacha.s: chacha.s:27:6: error: register %rsi is only available in 64-bit mode
chacha.s: mov %rsi,%rsi
chacha.s: ^~~~
chacha.s: chacha.s:28:6: error: register %rsi is only available in 64-bit mode
chacha.s: mov %rsi,%rdi
chacha.s: ^~~~
chacha.s: chacha.s:29:6: error: register %rdx is only available in 64-bit mode
chacha.s: mov %rdx,%rdx
chacha.s: ^~~~
chacha.s: chacha.s:30:9: error: register %rdx is only available in 64-bit mode
chacha.s: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  amd64-ssse3
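
The assembler errors above arise because chacha.s uses 64-bit registers (%rsp, %r11, %rdi, ...) while this machine builds 32-bit x86 code. As a rough illustration, C code can detect the 64-bit x86 target at compile time before selecting such a code path. TARGET_IS_AMD64, target_is_amd64 and pick_chacha_path are hypothetical names, and this is not how the amd64-ssse3 implementation or SUPERCOP selects code.

```c
/* 1 when compiling for 64-bit x86, where registers such as %rsp and
   %r11 exist; 0 on every other target, including 32-bit x86. */
#if defined(__x86_64__) || defined(_M_X64)
#define TARGET_IS_AMD64 1
#else
#define TARGET_IS_AMD64 0
#endif

int target_is_amd64(void)
{
    return TARGET_IS_AMD64;
}

/* Hypothetical dispatcher: names the code path a build could select. */
const char *pick_chacha_path(void)
{
    return target_is_amd64() ? "amd64 assembly" : "32-bit or portable C";
}
```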

Compiler output

Implementation: crypto_stream/chacha8/goll_gueron
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
stream.c: stream.c:126:2: error: -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: ^
stream.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  goll_gueron

Compiler output

Implementation: crypto_stream/chacha8/krovetz/avx2
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
stream.c: stream.c:56:18: warning: implicit declaration of function '_mm_broadcastsi128_si256' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: __m256i s0 = _mm_broadcastsi128_si256((__m128i *)sigma);
stream.c: ^
stream.c: stream.c:56:13: error: initializing '__m256i' (vector of 4 'long long' values) with an expression of incompatible type 'int'
stream.c: __m256i s0 = _mm_broadcastsi128_si256((__m128i *)sigma);
stream.c: ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream.c: 1 warning and 1 error generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  krovetz/avx2

Compiler output

Implementation: crypto_stream/chacha8/krovetz/vec128
Compiler: clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments
stream.c: stream.c:80:2: error: -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:151:9: error: initializing 'vec' (vector of 4 'unsigned int' values) with an expression of incompatible type 'int'
stream.c: vec s3 = NONCE(np);
stream.c: ^ ~~~~~~~~~
stream.c: stream.c:152:36: error: use of undeclared identifier 'VBPI'
stream.c: for (iters = 0; iters
stream.c: ^
stream.c: stream.c:91:19: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: error: use of undeclared identifier 'GPR_TOO'
stream.c: stream.c:91:26: note: expanded from macro 'BPI'
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:155:19: error: use of undeclared identifier 'ONE'
stream.c: v7 = v3 + ONE;
stream.c: ^
stream.c: stream.c:176:13: warning: implicit declaration of function 'ROTW16' is invalid in C99 [-Wimplicit-function-declaration]
stream.c: DQROUND_VECTORS(v0,v1,v2,v3)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                 Implementations
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments  krovetz/vec128

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/ppc-altivec
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
chacha.c: chacha.c:11:21: fatal error: altivec.h: No such file or directory
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                          Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv  dolbeau/ppc-altivec
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv  dolbeau/ppc-altivec
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv   dolbeau/ppc-altivec
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv  dolbeau/ppc-altivec

Compiler output

Implementation: crypto_stream/chacha8/dolbeau/mipsel-msa
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
chacha.c: chacha.c:11:22: fatal error: arm_neon.h: No such file or directory
chacha.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                          Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv  dolbeau/mipsel-msa
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv  dolbeau/mipsel-msa
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv   dolbeau/mipsel-msa
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv  dolbeau/mipsel-msa

Compiler output

Implementation: crypto_stream/chacha8/amd64-ssse3
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
chacha.s: chacha.s: Assembler messages:
chacha.s: chacha.s:22: Error: bad register name `%rsp'
chacha.s: chacha.s:23: Error: bad register name `%r11'
chacha.s: chacha.s:24: Error: bad register name `%r11'
chacha.s: chacha.s:25: Error: bad register name `%r11'
chacha.s: chacha.s:26: Error: bad register name `%rdi'
chacha.s: chacha.s:27: Error: bad register name `%rsi'
chacha.s: chacha.s:28: Error: bad register name `%rsi'
chacha.s: chacha.s:29: Error: bad register name `%rdx'
chacha.s: chacha.s:30: Error: bad register name `%rdx'
chacha.s: chacha.s:34: Error: bad register name `%rax'
chacha.s: chacha.s:36: Error: bad register name `%rdx'
chacha.s: chacha.s:40: Error: bad register name `%rdx'
chacha.s: chacha.s:50: Error: bad register name `%rsp'
chacha.s: chacha.s:51: Error: bad register name `%r11'
chacha.s: chacha.s:52: Error: bad register name `%r11'
chacha.s: chacha.s:53: Error: bad register name `%r11'
chacha.s: chacha.s:55: Error: bad register name `%rdi'
chacha.s: chacha.s:57: Error: bad register name `%rsi'
chacha.s: chacha.s:59: Error: bad register name `%rdx'
chacha.s: chacha.s:61: Error: bad register name `%rcx'
chacha.s: chacha.s:63: Error: bad register name `%rdx'
chacha.s: chacha.s:75: Error: bad register name `%rsp'
chacha.s: chacha.s:76: Error: bad register name `%r11'
chacha.s: chacha.s:77: Error: bad register name `%r11'
chacha.s: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                          Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv  amd64-ssse3
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv  amd64-ssse3
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv   amd64-ssse3
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv  amd64-ssse3

Compiler output

Implementation: crypto_stream/chacha8/krovetz/avx2
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
stream.c: stream.c: In function 'crypto_stream_chacha8_krovetz_avx2_xor':
stream.c: stream.c:58:13: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
stream.c: __m256i s0 = _mm256_broadcastsi128_si256(*(__m128i *)sigma);
stream.c: ^
stream.c: In file included from /usr/lib/gcc/i686-linux-gnu/5/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/i686-linux-gnu/5/include/avx2intrin.h:574:1: error: inlining failed in call to always_inline '_mm256_or_si256': target specific option mismatch
stream.c: _mm256_or_si256 (__m256i __A, __m256i __B)
stream.c: ^
stream.c: stream.c:63:13: error: called from here
stream.c: __m256i s3 = _mm256_or_si256(
stream.c: ^
stream.c: In file included from /usr/lib/gcc/i686-linux-gnu/5/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/i686-linux-gnu/5/include/avx2intrin.h:655:1: error: inlining failed in call to always_inline '_mm256_slli_si256': target specific option mismatch
stream.c: _mm256_slli_si256 (__m256i __A, const int __N)
stream.c: ^
stream.c: stream.c:63:18: error: called from here
stream.c: __m256i s3 = _mm256_or_si256(
stream.c: ^
stream.c: In file included from /usr/lib/gcc/i686-linux-gnu/5/include/immintrin.h:43:0,
stream.c: from stream.c:8:
stream.c: /usr/lib/gcc/i686-linux-gnu/5/include/avx2intrin.h:1006:1: error: inlining failed in call to always_inline '_mm256_broadcastq_epi64': target specific option mismatch
stream.c: _mm256_broadcastq_epi64 (__m128i __X)
stream.c: ^
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                          Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv  krovetz/avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv  krovetz/avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv   krovetz/avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv  krovetz/avx2
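
GCC reports "target specific option mismatch" when an always_inline intrinsic such as _mm256_or_si256 is called from code compiled without the matching instruction set enabled, which is what -march=native yields on a CPU without AVX2. One way to keep such code buildable is to guard the intrinsics with the predefined __AVX2__ macro and supply a scalar fallback. The sketch below shows that pattern on a 256-bit OR; the function or256 is a hypothetical example and is not taken from the krovetz/avx2 source.

```c
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* dst = a | b over four 64-bit lanes (256 bits in total). */
void or256(uint64_t dst[4], const uint64_t a[4], const uint64_t b[4])
{
#if defined(__AVX2__)
    /* AVX2 path: unaligned loads, one vector OR, unaligned store. */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)dst, _mm256_or_si256(va, vb));
#else
    /* Scalar fallback for targets without AVX2. */
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] | b[i];
#endif
}
```

Both branches compute the same result, so the function compiles and behaves identically whether or not the compiler enables AVX2 for the target.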

Compiler output

Implementation: crypto_stream/chacha8/goll_gueron
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
stream.c: stream.c:126:2: error: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: #error -- Implementation supports only microarchitectures with support for Advanced Vector Extensions (AVX2 or AVX512).
stream.c: ^

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                          Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv  goll_gueron
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv  goll_gueron
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv   goll_gueron
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv  goll_gueron

Compiler output

Implementation: crypto_stream/chacha8/krovetz/vec128
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv
stream.c: stream.c:80:2: error: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: #error -- Implementation supports only machines with neon, altivec or SSE2
stream.c: ^
stream.c: stream.c: In function 'crypto_stream_chacha8_krovetz_vec128_xor':
stream.c: stream.c:151:14: warning: implicit declaration of function 'NONCE' [-Wimplicit-function-declaration]
stream.c: vec s3 = NONCE(np);
stream.c: ^
stream.c: stream.c:151:14: error: incompatible types when initializing type 'vec {aka __vector(4) unsigned int}' using type 'int'
stream.c: stream.c:91:19: error: 'VBPI' undeclared (first use in this function)
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters
stream.c: ^
stream.c: stream.c:91:19: note: each undeclared identifier is reported only once for each function it appears in
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters
stream.c: ^
stream.c: stream.c:91:26: error: 'GPR_TOO' undeclared (first use in this function)
stream.c: #define BPI (VBPI + GPR_TOO) /* Blocks computed per loop iteration */
stream.c: ^
stream.c: stream.c:152:36: note: in expansion of macro 'BPI'
stream.c: for (iters = 0; iters
stream.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                          Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv  krovetz/vec128
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv  krovetz/vec128
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv   krovetz/vec128
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv  krovetz/vec128