Implementation notes: amd64, comet, crypto_aead/grain128aeadv2

Computer: comet
Microarchitecture: amd64; Comet Lake (806ec)
Architecture: amd64
CPU ID: GenuineIntel-000806ec-bfebfbff
SUPERCOP version: 20240107
Operation: crypto_aead
Primitive: grain128aeadv2
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
15612310574 0 024486 884 1024T:sseclang++_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010620231222
15640810315 0 025052 876 1088T:sseclang++_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010620231222
15650220366 0 038350 884 1088T:sseclang++_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010620231222
15704720350 0 038038 884 1056T:sseclang++_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010620231222
16832413673 0 027924 780 1120T:sseg++_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010620231222
16848817239 0 033364 780 1120T:sseg++_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010620231222
16880012051 0 025859 772 1120T:sseg++_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010620231222
1710534203 0 016831 756 1088T:sseg++_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010620231222
31044510050 0 028134 884 1088T:x64clang++_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010620231222
31048210050 0 027838 884 1056T:x64clang++_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010620231222
3119507057 0 021300 780 1120T:x64g++_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010620231222
3138316613 0 020395 772 1120T:x64g++_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010620231222
3201775544 0 019494 884 1024T:x64clang++_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010620231222
3212375078 0 019836 876 1088T:x64clang++_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010620231222
3918032440 0 015031 756 1088T:x64g++_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010620231222

Checksum failure

Implementation: T:x64
Security model: timingleaks
Compiler: g++ -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE
d0dd6234d89dbd95039a8c1c4e9e8f6fa0d58228afa4fda263447f76c3102a36
Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
g++ -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:x64

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: clang++ -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:236:6: error: '__builtin_ia32_pternlogq128_mask' needs target feature avx512vl
grain128aead-v2_opt.cpp: T = _xorand3(_mm_srli_epi16(X, 1), X, _mm_set1_epi16(0x2222));
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:125:27: note: expanded from macro '_xorand3'
grain128aead-v2_opt.cpp: #define _xorand3(a, b, c) _mm_ternarylogic_epi64(a, b, c, 0x28)
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: /usr/lib/llvm-14/lib/clang/14.0.6/include/avx512vlintrin.h:6565:13: note: expanded from macro '_mm_ternarylogic_epi64'
grain128aead-v2_opt.cpp: ((__m128i)__builtin_ia32_pternlogq128_mask((__v2di)(__m128i)(A), \
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:237:6: error: '__builtin_ia32_pternlogq128_mask' needs target feature avx512vl
grain128aead-v2_opt.cpp: X = _xor3(_mm_slli_epi16(T, 1), X, T);
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:122:25: note: expanded from macro '_xor3'
grain128aead-v2_opt.cpp: #define _xor3(a, b, c) _mm_ternarylogic_epi64(a, b, c, 0x96)
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: /usr/lib/llvm-14/lib/clang/14.0.6/include/avx512vlintrin.h:6565:13: note: expanded from macro '_mm_ternarylogic_epi64'
grain128aead-v2_opt.cpp: ((__m128i)__builtin_ia32_pternlogq128_mask((__v2di)(__m128i)(A), \
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:238:6: error: '__builtin_ia32_pternlogq128_mask' needs target feature avx512vl
grain128aead-v2_opt.cpp: T = _xorand3(_mm_srli_epi16(X, 2), X, _mm_set1_epi16(0x0c0c));
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:125:27: note: expanded from macro '_xorand3'
grain128aead-v2_opt.cpp: #define _xorand3(a, b, c) _mm_ternarylogic_epi64(a, b, c, 0x28)
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: /usr/lib/llvm-14/lib/clang/14.0.6/include/avx512vlintrin.h:6565:13: note: expanded from macro '_mm_ternarylogic_epi64'
grain128aead-v2_opt.cpp: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
clang++ -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang++ -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang++ -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512
clang++ -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx512

Compiler output

Implementation: T:avx512
Security model: timingleaks
Compiler: g++ -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
grain128aead-v2_opt.cpp: In file included from /usr/lib/gcc/x86_64-linux-gnu/12/include/immintrin.h:57,
grain128aead-v2_opt.cpp: from /usr/lib/gcc/x86_64-linux-gnu/12/include/x86intrin.h:32,
grain128aead-v2_opt.cpp: from grain128aead-v2_opt.h:41,
grain128aead-v2_opt.cpp: from grain128aead-v2_opt.cpp:10:
grain128aead-v2_opt.cpp: /usr/lib/gcc/x86_64-linux-gnu/12/include/avx512vlintrin.h: In function 'u64 grain_keystream64(grain_ctx*)':
grain128aead-v2_opt.cpp: /usr/lib/gcc/x86_64-linux-gnu/12/include/avx512vlintrin.h:10657:1: error: inlining failed in call to 'always_inline' '__m128i _mm_ternarylogic_epi64(__m128i, __m128i, __m128i, int)': target specific option mismatch
grain128aead-v2_opt.cpp: 10657 | _mm_ternarylogic_epi64 (__m128i __A, __m128i __B, __m128i __C,
grain128aead-v2_opt.cpp: | ^~~~~~~~~~~~~~~~~~~~~~
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:207:39: note: called from here
grain128aead-v2_opt.cpp: 207 | u64 y = ys ^ _mm_cvtsi128_si64(_xor2(shr8(_xor3(_andxor3(s4, s7, s5), _and3(ts, b7, s6), b1), 7), shr8(_andxor3(shr8(b7, 2), s2, b1), 5)));
grain128aead-v2_opt.cpp: | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
grain128aead-v2_opt.cpp: /usr/lib/gcc/x86_64-linux-gnu/12/include/avx512vlintrin.h:10657:1: error: inlining failed in call to 'always_inline' '__m128i _mm_ternarylogic_epi64(__m128i, __m128i, __m128i, int)': target specific option mismatch
grain128aead-v2_opt.cpp: 10657 | _mm_ternarylogic_epi64 (__m128i __A, __m128i __B, __m128i __C,
grain128aead-v2_opt.cpp: | ^~~~~~~~~~~~~~~~~~~~~~
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:207:39: note: called from here
grain128aead-v2_opt.cpp: 207 | u64 y = ys ^ _mm_cvtsi128_si64(_xor2(shr8(_xor3(_andxor3(s4, s7, s5), _and3(ts, b7, s6), b1), 7), shr8(_andxor3(shr8(b7, 2), s2, b1), 5)));
grain128aead-v2_opt.cpp: | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
grain128aead-v2_opt.cpp: /usr/lib/gcc/x86_64-linux-gnu/12/include/avx512vlintrin.h:10657:1: error: inlining failed in call to 'always_inline' '__m128i _mm_ternarylogic_epi64(__m128i, __m128i, __m128i, int)': target specific option mismatch
grain128aead-v2_opt.cpp: 10657 | _mm_ternarylogic_epi64 (__m128i __A, __m128i __B, __m128i __C,
grain128aead-v2_opt.cpp: | ^~~~~~~~~~~~~~~~~~~~~~
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:207:39: note: called from here
grain128aead-v2_opt.cpp: 207 | u64 y = ys ^ _mm_cvtsi128_si64(_xor2(shr8(_xor3(_andxor3(s4, s7, s5), _and3(ts, b7, s6), b1), 7), shr8(_andxor3(shr8(b7, 2), s2, b1), 5)));
grain128aead-v2_opt.cpp: | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
grain128aead-v2_opt.cpp: /usr/lib/gcc/x86_64-linux-gnu/12/include/avx512vlintrin.h:10657:1: error: inlining failed in call to 'always_inline' '__m128i _mm_ternarylogic_epi64(__m128i, __m128i, __m128i, int)': target specific option mismatch
grain128aead-v2_opt.cpp: 10657 | _mm_ternarylogic_epi64 (__m128i __A, __m128i __B, __m128i __C,
grain128aead-v2_opt.cpp: ...

Number of similar (compiler,implementation) pairs: 8, namely:
CompilerImplementations
g++ -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
g++ -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
g++ -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
g++ -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx512
g++ -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:gf2
g++ -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:gf2
g++ -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:gf2
g++ -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:gf2

Compiler output

Implementation: T:gf2
Security model: timingleaks
Compiler: clang++ -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:234:6: error: '__builtin_ia32_vgf2p8affineqb_v16qi' needs target feature gfni
grain128aead-v2_opt.cpp: X = _mm_gf2p8affine_epi64_epi8(X, _mm_set1_epi64x(0x0104104002082080ULL), 0);
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: /usr/lib/llvm-14/lib/clang/14.0.6/include/gfniintrin.h:36:13: note: expanded from macro '_mm_gf2p8affine_epi64_epi8'
grain128aead-v2_opt.cpp: ((__m128i)__builtin_ia32_vgf2p8affineqb_v16qi((__v16qi)(__m128i)(A), \
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:241:6: error: '__builtin_ia32_pternlogq128_mask' needs target feature avx512vl
grain128aead-v2_opt.cpp: T = _xorand3(_mm_srli_epi16(X, 4), X, _mm_set1_epi16(0x00f0));
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:125:27: note: expanded from macro '_xorand3'
grain128aead-v2_opt.cpp: #define _xorand3(a, b, c) _mm_ternarylogic_epi64(a, b, c, 0x28)
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: /usr/lib/llvm-14/lib/clang/14.0.6/include/avx512vlintrin.h:6565:13: note: expanded from macro '_mm_ternarylogic_epi64'
grain128aead-v2_opt.cpp: ((__m128i)__builtin_ia32_pternlogq128_mask((__v2di)(__m128i)(A), \
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:242:6: error: '__builtin_ia32_pternlogq128_mask' needs target feature avx512vl
grain128aead-v2_opt.cpp: X = _xor3(_mm_slli_epi16(T, 4), X, T);
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:122:25: note: expanded from macro '_xor3'
grain128aead-v2_opt.cpp: #define _xor3(a, b, c) _mm_ternarylogic_epi64(a, b, c, 0x96)
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: /usr/lib/llvm-14/lib/clang/14.0.6/include/avx512vlintrin.h:6565:13: note: expanded from macro '_mm_ternarylogic_epi64'
grain128aead-v2_opt.cpp: ((__m128i)__builtin_ia32_pternlogq128_mask((__v2di)(__m128i)(A), \
grain128aead-v2_opt.cpp: ^
grain128aead-v2_opt.cpp: grain128aead-v2_opt.cpp:155:30: error: '__builtin_ia32_pternlogq128_mask' needs target feature avx512vl
grain128aead-v2_opt.cpp: ...

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
clang++ -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:gf2
clang++ -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:gf2
clang++ -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:gf2
clang++ -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:gf2