Implementation notes: amd64, cezanne, crypto_aead/hs1sivv2

Computer: cezanne
Microarchitecture: amd64; Zen 3 (a50f00)
Architecture: amd64
CPU ID: AuthenticAMD-00a50f00-178bfbff
SUPERCOP version: 20240107
Operation: crypto_aead
Primitive: hs1sivv2
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
603429055 0 051904 796 1080T:dolbeau/amd64-avx2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
603814370 0 034911 788 1080T:dolbeau/amd64-avx2gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
613314042 0 035320 796 1080T:dolbeau/amd64-avx2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
618911900 0 034864 796 1080T:fastergcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
63158991 0 030336 796 1080T:fastergcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
63857976 0 027027 772 1048T:fastergcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
64047830 0 027646 820 1016T:fasterclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
663112856 0 035664 828 1048T:dolbeau/amd64-avx2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
664610432 0 033088 828 1048T:dolbeau/amd64-avx2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
66599956 0 029582 820 1016T:dolbeau/amd64-avx2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
67098545 0 029232 796 1080T:fastergcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
682411154 0 030187 772 1048T:dolbeau/amd64-avx2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
69448440 0 028574 820 1016T:fasterclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
876511091 0 031623 788 1080T:dolbeau/amd64-ssegcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
884710786 0 032064 796 1080T:dolbeau/amd64-ssegcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
886427862 0 050736 796 1080T:dolbeau/amd64-ssegcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
91218809 0 031512 828 1048T:dolbeau/amd64-sseclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
913111217 0 034024 828 1048T:dolbeau/amd64-sseclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
91798925 0 027939 772 1048T:dolbeau/amd64-ssegcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
91888411 0 028062 820 1016T:dolbeau/amd64-sseclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
188697810 0 030216 828 1016T:refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
216398442 0 031232 828 1048T:refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
220586821 0 029584 828 1048T:refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
264725871 0 028888 796 1080T:refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
287945719 0 027024 796 1080T:refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
342214175 0 023235 772 1048T:refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
366914193 0 023958 820 1016T:refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212
446835616 0 026280 796 1080T:refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121320231212
820946016 0 026094 820 1016T:refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121320231212

Test failure

Implementation: T:dolbeau/amd64-avx2
Security model: timingleaks
Compiler: clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
error 111

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx2 T:dolbeau/amd64-sse
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:faster
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:faster
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:faster

Compiler output

Implementation: T:dolbeau/amd64-avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
encrypt.c: encrypt.c:90:2: error: "This code requires AVX2 to work"
encrypt.c: #error "This code requires AVX2 to work"
encrypt.c: ^
encrypt.c: 1 error generated.

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx2

Compiler output

Implementation: T:dolbeau/amd64-avx512
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
encrypt.c: encrypt.c:90:2: error: "This code requires AVX512F to work"
encrypt.c: #error "This code requires AVX512F to work"
encrypt.c: ^
encrypt.c: encrypt.c:322:20: error: conflicting types for '_mm512_reduce_add_epi64'
encrypt.c: unsigned long long _mm512_reduce_add_epi64 (__m512i a) {
encrypt.c: ^
encrypt.c: /usr/lib/llvm-11/lib/clang/11.0.1/include/avx512fintrin.h:9319:51: note: previous definition is here
encrypt.c: static __inline__ long long __DEFAULT_FN_ATTRS512 _mm512_reduce_add_epi64(__m512i __W) {
encrypt.c: ^
encrypt.c: encrypt.c:335:21: error: invalid input size for constraint 'Yz'
encrypt.c: : [a] "Yz" (a)
encrypt.c: ^
encrypt.c: 3 errors generated.

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-avx512

Compiler output

Implementation: T:dolbeau/amd64-avx512
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
encrypt.c: encrypt.c:90:2: error: #error "This code requires AVX512F to work"
encrypt.c: 90 | #error "This code requires AVX512F to work"
encrypt.c: | ^~~~~
encrypt.c: encrypt.c:322:20: error: conflicting types for '_mm512_reduce_add_epi64'
encrypt.c: 322 | unsigned long long _mm512_reduce_add_epi64 (__m512i a) {
encrypt.c: | ^~~~~~~~~~~~~~~~~~~~~~~
encrypt.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:55,
encrypt.c: from encrypt.c:54:
encrypt.c: /usr/lib/gcc/x86_64-linux-gnu/10/include/avx512fintrin.h:16018:1: note: previous definition of '_mm512_reduce_add_epi64' was here
encrypt.c: 16018 | _mm512_reduce_add_epi64 (__m512i __A)
encrypt.c: | ^~~~~~~~~~~~~~~~~~~~~~~
encrypt.c: encrypt.c: In function '_mm512_reduce_add_epi64':
encrypt.c: encrypt.c:322:20: note: the ABI for passing parameters with 64-byte alignment has changed in GCC 4.6
encrypt.c: 322 | unsigned long long _mm512_reduce_add_epi64 (__m512i a) {
encrypt.c: | ^~~~~~~~~~~~~~~~~~~~~~~
encrypt.c: encrypt.c: In function 'prf_hash2_2':
encrypt.c: encrypt.c:483:19: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
encrypt.c: 483 | __m512i kv0 = _mm512_loadu_si512((const __m512i*)(nhkey+ 0)); // 1
encrypt.c: | ^~~

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx512
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx512
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx512
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:dolbeau/amd64-avx512

Compiler output

Implementation: T:dolbeau/amd64-sse
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
encrypt.c: In file included from encrypt.c:190:
encrypt.c: ./c176.h:99:7: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'chacha_noxor176' that is compiled without support for 'ssse3'
encrypt.c: VEC4_QUARTERROUND( 0, 4, 8,12);
encrypt.c: ^
encrypt.c: ./c176.h:17:36: note: expanded from macro 'VEC4_QUARTERROUND'
encrypt.c: #define VEC4_QUARTERROUND(a,b,c,d) VEC4_QUARTERROUND_SHUFFLE(a,b,c,d)
encrypt.c: ^
encrypt.c: ./c176.h:12:86: note: expanded from macro 'VEC4_QUARTERROUND_SHUFFLE'
encrypt.c: x_##a = _mm_add_epi32(x_##a, x_##b); t_##a = _mm_xor_si128(x_##d, x_##a); x_##d = _mm_shuffle_epi8(t_##a, rot16); \
encrypt.c: ^
encrypt.c: ./c176.h:99:7: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'chacha_noxor176' that is compiled without support for 'ssse3'
encrypt.c: ./c176.h:17:36: note: expanded from macro 'VEC4_QUARTERROUND'
encrypt.c: #define VEC4_QUARTERROUND(a,b,c,d) VEC4_QUARTERROUND_SHUFFLE(a,b,c,d)
encrypt.c: ^
encrypt.c: ./c176.h:14:86: note: expanded from macro 'VEC4_QUARTERROUND_SHUFFLE'
encrypt.c: x_##a = _mm_add_epi32(x_##a, x_##b); t_##a = _mm_xor_si128(x_##d, x_##a); x_##d = _mm_shuffle_epi8(t_##a, rot8); \
encrypt.c: ^
encrypt.c: ./c176.h:100:7: error: always_inline function '_mm_shuffle_epi8' requires target feature 'ssse3', but would be inlined into function 'chacha_noxor176' that is compiled without support for 'ssse3'
encrypt.c: VEC4_QUARTERROUND( 1, 5, 9,13);
encrypt.c: ^
encrypt.c: ./c176.h:17:36: note: expanded from macro 'VEC4_QUARTERROUND'
encrypt.c: #define VEC4_QUARTERROUND(a,b,c,d) VEC4_QUARTERROUND_SHUFFLE(a,b,c,d)
encrypt.c: ^
encrypt.c: ./c176.h:12:86: note: expanded from macro 'VEC4_QUARTERROUND_SHUFFLE'
encrypt.c: x_##a = _mm_add_epi32(x_##a, x_##b); t_##a = _mm_xor_si128(x_##d, x_##a); x_##d = _mm_shuffle_epi8(t_##a, rot16); \
encrypt.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:dolbeau/amd64-sse