Implementation notes: amd64, speed2supercop, crypto_aead/aezv3

Computer: speed2supercop
Microarchitecture: amd64; Haswell+AES (306c3)
Architecture: amd64
CPU ID: GenuineIntel-000306c3-1fc9cbf5
SUPERCOP version: 20240625
Operation: crypto_aead
Primitive: aezv3
Time     Object size  Test size      Implementation  Compiler                                                                        Benchmark date  SUPERCOP version
3740     8335 0 0     26496 728 896  T:aesni         gcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240703        20240625
3744     9787 0 0     29997 752 928  T:aesni         gcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240703        20240625
3752     16745 0 0    38941 752 928  T:aesni         gcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240703        20240625
3772     10942 0 0    34126 792 872  T:aesni         clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240703        20240625
3776     10910 0 0    33902 792 872  T:aesni         clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240703        20240625
3824     10629 0 0    29558 792 856  T:aesni         clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall   20240703        20240625
3824     9638 0 0     29695 784 920  T:aesni         clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240703        20240625
3864     10222 0 0    29964 744 928  T:aesni         gcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall          20240703        20240625
750180   33214 0 0    55422 808 856  T:ref           clang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall   20240703        20240625
781148   32467 0 0    56134 808 872  T:ref           clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240703        20240625
784240   30796 0 0    54254 808 872  T:ref           clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240703        20240625
968768   24674 0 0    45085 768 928  T:ref           gcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240703        20240625
976804   52136 0 0    74589 768 928  T:ref           gcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240703        20240625
1151480  23763 0 0    43174 808 856  T:ref           clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall   20240703        20240625
1210712  24948 0 0    44917 768 928  T:ref           gcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall          20240703        20240625
1225520  22234 0 0    42583 800 920  T:ref           clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240703        20240625
1415176  21926 0 0    40232 744 896  T:ref           gcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240703        20240625

Compiler output


aez_ni.c: aez_ni.c:146:22: error: '__builtin_ia32_vec_set_v16qi' needs target feature sse4.1
aez_ni.c:         __m128i i1 = _mm_insert_epi8(zero, 1, 7);
aez_ni.c:                      ^
aez_ni.c: /usr/lib/llvm-16/lib/clang/16/include/smmintrin.h:923:13: note: expanded from macro '_mm_insert_epi8'
aez_ni.c:   ((__m128i)__builtin_ia32_vec_set_v16qi((__v16qi)(__m128i)(X), (int)(I),      \
aez_ni.c:             ^
aez_ni.c: aez_ni.c:147:22: error: '__builtin_ia32_vec_set_v16qi' needs target feature sse4.1
aez_ni.c:         __m128i i2 = _mm_insert_epi8(zero, 2, 7);
aez_ni.c:                      ^
aez_ni.c: /usr/lib/llvm-16/lib/clang/16/include/smmintrin.h:923:13: note: expanded from macro '_mm_insert_epi8'
aez_ni.c:   ((__m128i)__builtin_ia32_vec_set_v16qi((__v16qi)(__m128i)(X), (int)(I),      \
aez_ni.c:             ^
aez_ni.c: aez_ni.c:148:22: error: '__builtin_ia32_vec_set_v16qi' needs target feature sse4.1
aez_ni.c:         __m128i i3 = _mm_insert_epi8(zero, 3, 7);
aez_ni.c:                      ^
aez_ni.c: /usr/lib/llvm-16/lib/clang/16/include/smmintrin.h:923:13: note: expanded from macro '_mm_insert_epi8'
aez_ni.c:   ((__m128i)__builtin_ia32_vec_set_v16qi((__v16qi)(__m128i)(X), (int)(I),      \
aez_ni.c:             ^
aez_ni.c: aez_ni.c:149:26: error: '__builtin_ia32_vec_set_v16qi' needs target feature sse4.1
aez_ni.c:         __m128i j, one = _mm_insert_epi8(zero, 1, 15);
aez_ni.c:                          ^
aez_ni.c: /usr/lib/llvm-16/lib/clang/16/include/smmintrin.h:923:13: note: expanded from macro '_mm_insert_epi8'
aez_ni.c:   ((__m128i)__builtin_ia32_vec_set_v16qi((__v16qi)(__m128i)(X), (int)(I),      \
aez_ni.c:             ^
aez_ni.c: 4 errors generated.

Number of similar (implementation,compiler) pairs: 1, namely:
Implementation  Compiler
T:aesni         clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_16.0.6_(27+b1))
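All four errors come from `_mm_insert_epi8`, which maps to the SSE4.1 instruction PINSRB. With `-mcpu=native` this clang evidently did not enable SSE4.1. The `-march=native` clang builds of `aesni` in the table compiled successfully. A minimal sketch of one workaround builds the same constants with the SSE2 intrinsic `_mm_setr_epi8` instead. It assumes that `zero` in the logged lines is `_mm_setzero_si128()`, which the log does not show. The helper names `byte7`, `byte15`, `only_byte` and `check_constants` are hypothetical and do not come from aez_ni.c.

```c
#include <assert.h>     /* assert(), used by check_constants() */
#include <emmintrin.h>  /* SSE2 only: _mm_setr_epi8, _mm_storeu_si128 */

/* Hypothetical SSE2 stand-in for _mm_insert_epi8(zero, v, 7), assuming
 * zero == _mm_setzero_si128(). _mm_setr_epi8 lists bytes 0..15 in order. */
static __m128i byte7(unsigned char v)
{
    return _mm_setr_epi8(0, 0, 0, 0, 0, 0, 0, (char)v,
                         0, 0, 0, 0, 0, 0, 0, 0);
}

/* Same idea for _mm_insert_epi8(zero, v, 15): v in byte 15 only. */
static __m128i byte15(unsigned char v)
{
    return _mm_setr_epi8(0, 0, 0, 0, 0, 0, 0, 0,
                         0, 0, 0, 0, 0, 0, 0, (char)v);
}

/* Returns 1 if x holds v at byte position pos and zero in every other byte. */
static int only_byte(__m128i x, int pos, unsigned char v)
{
    unsigned char b[16];
    _mm_storeu_si128((__m128i *)b, x);
    for (int k = 0; k < 16; k++)
        if (b[k] != (k == pos ? v : 0))
            return 0;
    return 1;
}

/* Checks the replacements for the constants at aez_ni.c:146-149. */
static void check_constants(void)
{
    assert(only_byte(byte7(1), 7, 1));    /* i1  */
    assert(only_byte(byte7(2), 7, 2));    /* i2  */
    assert(only_byte(byte7(3), 7, 3));    /* i3  */
    assert(only_byte(byte15(1), 15, 1));  /* one */
}
```

The values are compile-time constants, so a compiler can typically load them from a constant pool rather than emit insert instructions. Leaving the source unchanged and building with `-msse4.1`, or with `-march=native` as the other clang builds do, also avoids the error.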

Compiler output


aez_ni.c: In file included from aez_ni.c:37:
aez_ni.c: In function '_mm_loadu_si128',
aez_ni.c:     inlined from 'zero_pad' at aez_ni.c:67:12,
aez_ni.c:     inlined from 'cipher_aez_tiny' at aez_ni.c:498:18,
aez_ni.c:     inlined from 'aez_encrypt' at aez_ni.c:588:9,
aez_ni.c:     inlined from 'crypto_aead_aezv3_aesni_timingleaks_encrypt' at aez_ni.c:637:5:
aez_ni.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/emmintrin.h:706:10: warning: array subscript '__m128i_u[2]' is partly outside array bounds of 'const unsigned char[48]' [-Warray-bounds=]
aez_ni.c:   706 |   return *__P;
aez_ni.c:       |          ^~~~
aez_ni.c: aez_ni.c: In function 'crypto_aead_aezv3_aesni_timingleaks_encrypt':
aez_ni.c: aez_ni.c:59:28: note: at offset [33, 48] into object 'pad' of size 48
aez_ni.c:    59 | static const unsigned char pad[] = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
aez_ni.c:       |                            ^~~
aez_ni.c: In function 'load_partial',
aez_ni.c:     inlined from 'load_partial' at aez_ni.c:119:16,
aez_ni.c:     inlined from 'cipher_aez_tiny' at aez_ni.c:498:18,
aez_ni.c:     inlined from 'aez_encrypt' at aez_ni.c:588:9,
aez_ni.c:     inlined from 'crypto_aead_aezv3_aesni_timingleaks_encrypt' at aez_ni.c:637:5:
aez_ni.c: aez_ni.c:123:46: warning: '__builtin_memcpy' forming offset [16, 4294967263] is out of the bounds [0, 16] of object 'tmp' with type '__m128i' [-Warray-bounds=]
aez_ni.c:   123 |         for (i=0; i<n; i++) ((char*)&tmp)[i] = ((char*)p)[i];
aez_ni.c:       |                             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
aez_ni.c: aez_ni.c: In function 'crypto_aead_aezv3_aesni_timingleaks_encrypt':
aez_ni.c: aez_ni.c:122:17: note: 'tmp' declared here
aez_ni.c:   122 |         __m128i tmp; unsigned i;
aez_ni.c:       |                 ^~~

Number of similar (implementation,compiler) pairs: 1, namely:
Implementation  Compiler
T:aesni         gcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (13.3.0)
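Both gcc diagnostics are `-Warray-bounds` warnings that appear after inlining into `crypto_aead_aezv3_aesni_timingleaks_encrypt`. gcc still built this configuration, since the `-O3` gcc `aesni` row appears in the table. The log alone does not show whether either access actually goes out of bounds at run time.

- The first warning says a 16-byte `_mm_loadu_si128` from the 48-byte `pad` may start at an offset in [33, 48]. That is past 32, the last start offset that stays in bounds.
- The second warning says gcc could not bound the byte count `n` in the `load_partial` copy loop by the 16-byte size of `tmp`.

A minimal sketch of defensive versions follows. The names `load_partial_clamped` and `load16_checked` are hypothetical, and the clamp-and-zero behaviour is an assumption, not the original code's semantics.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128 */
#include <string.h>     /* memcpy, size_t */

/* Hypothetical defensive version of the load_partial() copy loop
 * (aez_ni.c:122-123): copies at most 16 bytes from p into a zeroed
 * buffer, so no value of n can write past the 16-byte temporary. */
static __m128i load_partial_clamped(const void *p, unsigned n)
{
    unsigned char tmp[16] = {0};        /* unused tail stays zero */
    if (n > sizeof tmp)
        n = sizeof tmp;                 /* explicit bound on the copy */
    if (n != 0)
        memcpy(tmp, p, n);              /* skip memcpy entirely when n == 0 */
    return _mm_loadu_si128((const __m128i *)tmp);
}

/* Hypothetical guard for the zero_pad() case: a 16-byte unaligned load
 * from a len-byte table stays in bounds only if off + 16 <= len, i.e.
 * off <= 32 for a 48-byte table such as pad[]. */
static __m128i load16_checked(const unsigned char *tab, size_t len, size_t off)
{
    assert(len >= 16 && off <= len - 16);
    return _mm_loadu_si128((const __m128i *)(tab + off));
}
```

Zero-filling `tmp` also makes the unused bytes of the result deterministic rather than leftover stack contents. Whether the original code relies on those bytes is not visible in the log.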