Implementation notes: amd64, zen3, crypto_aead/hs1sivlov2

Computer: zen3
Architecture: amd64
CPU ID: AuthenticAMD-00a20f10-178bfbff
SUPERCOP version: 20211108
Operation: crypto_aead
Primitive: hs1sivlov2
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
360710238 0 031000 876 1016T:dolbeau/amd64-avx2clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
378921190 0 049968 876 1048T:fasterclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
380310854 0 034576 852 1080T:fastergcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
384815715 0 039408 852 1080T:dolbeau/amd64-avx2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
39638956 0 030752 852 1080T:fastergcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
403412641 0 034432 852 1080T:dolbeau/amd64-avx2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
407712878 0 033711 844 1080T:dolbeau/amd64-avx2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
41037933 0 028694 868 1016T:fasterclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
41518029 0 027263 828 1048T:fastergcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
41588673 0 029487 844 1080T:fastergcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
417922630 0 051184 876 1048T:fasterclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
42879991 0 030502 868 1016T:dolbeau/amd64-avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
434823360 0 051760 876 1048T:dolbeau/amd64-avx2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
438923344 0 051984 876 1048T:dolbeau/amd64-avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
52468622 0 029360 876 1016T:dolbeau/amd64-sseclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
567318314 0 041992 852 1080T:dolbeau/amd64-ssegcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
57198480 0 028990 868 1016T:dolbeau/amd64-sseclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
574310385 0 031199 844 1080T:dolbeau/amd64-ssegcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
577120440 0 049104 876 1048T:dolbeau/amd64-sseclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
580420456 0 048880 876 1048T:dolbeau/amd64-sseclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
580910369 0 032144 852 1080T:dolbeau/amd64-ssegcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
123817795 0 031368 876 1016T:refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
1412412376 0 040984 876 1048T:refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
1429114792 0 043640 876 1048T:refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
219346464 0 030208 852 1080T:refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
219546032 0 027832 852 1080T:refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108
237355194 0 026040 876 1016T:refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
260134809 0 025486 868 1016T:refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2022020620211108
351565996 0 026920 852 1080T:refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2022020620211108

Test failure

Implementation: T:dolbeau/amd64-avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE
error 111
crypto_aead_decrypt returns nonzero

Number of similar (compiler,implementation) pairs: 2, namely:
CompilerImplementations
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx2 T:dolbeau/amd64-sse

Test failure

Implementation: T:faster
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
error 111
crypto_aead_decrypt allows trivial forgeries

Number of similar (compiler,implementation) pairs: 2, namely:
CompilerImplementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:faster
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref

Test failure

Implementation: T:faster
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
error 111

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:faster

Compiler output

Implementation: T:dolbeau/amd64-avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
encrypt.c: encrypt.c:90:2: error: "This code requires AVX2 to work"
encrypt.c: #error "This code requires AVX2 to work"
encrypt.c: ^
encrypt.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx2

Compiler output

Implementation: T:dolbeau/amd64-avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
encrypt.c: encrypt.c:85: warning: "_bswap64" redefined
encrypt.c: 85 | #define _bswap64(a) __builtin_bswap64(a)
encrypt.c: |
encrypt.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/x86gprintrin.h:27,
encrypt.c: from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/immintrin.h:27,
encrypt.c: from encrypt.c:54:
encrypt.c: /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/ia32intrin.h:263: note: this is the location of the previous definition
encrypt.c: 263 | #define _bswap64(a) __bswapq(a)
encrypt.c: |
encrypt.c: encrypt.c:86: warning: "_bswap" redefined
encrypt.c: 86 | #define _bswap(a) __builtin_bswap(a)
encrypt.c: |
encrypt.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/x86gprintrin.h:27,
encrypt.c: from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/immintrin.h:27,
encrypt.c: from encrypt.c:54:
encrypt.c: /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/ia32intrin.h:297: note: this is the location of the previous definition
encrypt.c: 297 | #define _bswap(a) __bswapd(a)
encrypt.c: |

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx2

Compiler output

Implementation: T:dolbeau/amd64-avx512
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
encrypt.c: encrypt.c:90:2: error: "This code requires AVX512F to work"
encrypt.c: #error "This code requires AVX512F to work"
encrypt.c: ^
encrypt.c: encrypt.c:317:20: error: conflicting types for '_mm512_reduce_add_epi64'
encrypt.c: unsigned long long _mm512_reduce_add_epi64 (__m512i a) {
encrypt.c: ^
encrypt.c: /usr/lib/clang/13.0.0/include/avx512fintrin.h:9314:51: note: previous definition is here
encrypt.c: static __inline__ long long __DEFAULT_FN_ATTRS512 _mm512_reduce_add_epi64(__m512i __W) {
encrypt.c: ^
encrypt.c: encrypt.c:330:21: error: invalid input size for constraint 'Yz'
encrypt.c: : [a] "Yz" (a)
encrypt.c: ^
encrypt.c: 3 errors generated.

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512

Compiler output

Implementation: T:dolbeau/amd64-avx512
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
encrypt.c: encrypt.c:85: warning: "_bswap64" redefined
encrypt.c: 85 | #define _bswap64(a) __builtin_bswap64(a)
encrypt.c: |
encrypt.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/x86gprintrin.h:27,
encrypt.c: from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/immintrin.h:27,
encrypt.c: from encrypt.c:54:
encrypt.c: /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/ia32intrin.h:263: note: this is the location of the previous definition
encrypt.c: 263 | #define _bswap64(a) __bswapq(a)
encrypt.c: |
encrypt.c: encrypt.c:86: warning: "_bswap" redefined
encrypt.c: 86 | #define _bswap(a) __builtin_bswap(a)
encrypt.c: |
encrypt.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/x86gprintrin.h:27,
encrypt.c: from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/immintrin.h:27,
encrypt.c: from encrypt.c:54:
encrypt.c: /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/ia32intrin.h:297: note: this is the location of the previous definition
encrypt.c: 297 | #define _bswap(a) __bswapd(a)
encrypt.c: |
encrypt.c: encrypt.c:90:2: error: #error "This code requires AVX512F to work"
encrypt.c: 90 | #error "This code requires AVX512F to work"
encrypt.c: | ^~~~~
encrypt.c: encrypt.c:317:20: error: conflicting types for ‘_mm512_reduce_add_epi64’; have ‘long long unsigned int(__m512i)’ {aka ‘long long unsigned int(__vector(8) long long int)’}
encrypt.c: 317 | unsigned long long _mm512_reduce_add_epi64 (__m512i a) {
encrypt.c: | ^~~~~~~~~~~~~~~~~~~~~~~
encrypt.c: In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/immintrin.h:49,
encrypt.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx512
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx512
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx512
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx512

Compiler output

Implementation: T:dolbeau/amd64-sse
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
encrypt.c: In file included from encrypt.c:190:
encrypt.c: ./c128.h:99:7: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'chacha_noxor128' that is compiled without support for 'ssse3'
encrypt.c: VEC4_QUARTERROUND( 0, 4, 8,12);
encrypt.c: ^
encrypt.c: ./c128.h:17:36: note: expanded from macro 'VEC4_QUARTERROUND'
encrypt.c: #define VEC4_QUARTERROUND(a,b,c,d) VEC4_QUARTERROUND_SHUFFLE(a,b,c,d)
encrypt.c: ^
encrypt.c: ./c128.h:12:86: note: expanded from macro 'VEC4_QUARTERROUND_SHUFFLE'
encrypt.c: x_##a = _mm_add_epi32(x_##a, x_##b); t_##a = _mm_xor_si128(x_##d, x_##a); x_##d = _mm_shuffle_epi8(t_##a, rot16); \
encrypt.c: ^
encrypt.c: ./c128.h:99:7: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'chacha_noxor128' that is compiled without support for 'ssse3'
encrypt.c: ./c128.h:17:36: note: expanded from macro 'VEC4_QUARTERROUND'
encrypt.c: #define VEC4_QUARTERROUND(a,b,c,d) VEC4_QUARTERROUND_SHUFFLE(a,b,c,d)
encrypt.c: ^
encrypt.c: ./c128.h:14:86: note: expanded from macro 'VEC4_QUARTERROUND_SHUFFLE'
encrypt.c: x_##a = _mm_add_epi32(x_##a, x_##b); t_##a = _mm_xor_si128(x_##d, x_##a); x_##d = _mm_shuffle_epi8(t_##a, rot8); \
encrypt.c: ^
encrypt.c: ./c128.h:100:7: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'chacha_noxor128' that is compiled without support for 'ssse3'
encrypt.c: VEC4_QUARTERROUND( 1, 5, 9,13);
encrypt.c: ^
encrypt.c: ./c128.h:17:36: note: expanded from macro 'VEC4_QUARTERROUND'
encrypt.c: #define VEC4_QUARTERROUND(a,b,c,d) VEC4_QUARTERROUND_SHUFFLE(a,b,c,d)
encrypt.c: ^
encrypt.c: ./c128.h:12:86: note: expanded from macro 'VEC4_QUARTERROUND_SHUFFLE'
encrypt.c: x_##a = _mm_add_epi32(x_##a, x_##b); t_##a = _mm_xor_si128(x_##d, x_##a); x_##d = _mm_shuffle_epi8(t_##a, rot16); \
encrypt.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-sse