Implementation notes: amd64, jasper, crypto_aead/norx6441v1

Computer: jasper
Microarchitecture: amd64; Tremont (906c0)
Architecture: amd64
CPU ID: GenuineIntel-000906c0-20-bfebfbff
SUPERCOP version: 20240625
Operation: crypto_aead
Primitive: norx6441v1
Time   Object size  Test size       Implementation  Compiler                                                                        Benchmark date  SUPERCOP version
32530  15319 0 0    35543 772 1080  T:xmm           gcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall          20240627        20240625
32565  12718 0 0    34656 780 1080  T:xmm           gcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240627        20240625
32608  12718 0 0    33488 780 1080  T:xmm           gcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240627        20240625
32737  10865 0 0    29643 756 1048  T:xmm           gcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240627        20240625
33334  13010 0 0    32326 804 1016  T:xmm           clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240627        20240625
33420  13413 0 0    33166 804 1016  T:xmm           clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall   20240627        20240625
33447  14147 0 0    37640 812 1016  T:xmm           clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240627        20240625
33543  14147 0 0    35336 812 1016  T:xmm           clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240627        20240625
37513  15015 0 0    37152 812 1016  T:xmm           clang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall   20240627        20240625
43264  3568 8 0     23142 812 1016  T:ref           clang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240627        20240625
44307  5142 8 0     28776 820 1016  T:ref           clang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240627        20240625
44407  5286 8 0     26616 820 1016  T:ref           clang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall  20240627        20240625
45256  3380 8 0     22292 764 1048  T:ref           gcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240627        20240625
46620  5979 8 0     28028 788 1080  T:ref           gcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240627        20240625
46882  4739 8 0     25604 788 1080  T:ref           gcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall         20240627        20240625
46914  5688 8 0     27936 820 1016  T:ref           clang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall   20240627        20240625
51723  3944 8 0     24256 780 1080  T:ref           gcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall          20240627        20240625
54096  4559 8 0     24670 812 1016  T:ref           clang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall   20240627        20240625

Compiler output


norx.c: norx.c:350:24: error: always_inline function '_mm256_loadu_si256' requires target feature 'avx', but would be inlined into function 'crypto_aead_norx6441v1_ymm_timingleaks_encrypt' that is compiled without support for 'avx'
norx.c:     const __m256i K  = LOADU(k + 0);
norx.c:                        ^
norx.c: norx.c:47:19: note: expanded from macro 'LOADU'
norx.c: #define LOADU(in) _mm256_loadu_si256((__m256i*)(in))
norx.c:                   ^
norx.c: norx.c:350:24: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
norx.c: norx.c:47:19: note: expanded from macro 'LOADU'
norx.c: #define LOADU(in) _mm256_loadu_si256((__m256i*)(in))
norx.c:                   ^
norx.c: norx.c:355:5: error: always_inline function '_mm256_castsi128_si256' requires target feature 'avx', but would be inlined into function 'crypto_aead_norx6441v1_ymm_timingleaks_encrypt' that is compiled without support for 'avx'
norx.c:     INITIALIZE(A, B, C, D, N, K);
norx.c:     ^
norx.c: norx.c:270:9: note: expanded from macro 'INITIALIZE'
norx.c:     A = _mm256_castsi128_si256(N);                                          \
norx.c:         ^
norx.c: norx.c:355:5: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
norx.c: norx.c:270:9: note: expanded from macro 'INITIALIZE'
norx.c:     A = _mm256_castsi128_si256(N);                                          \
norx.c:         ^
norx.c: norx.c:355:5: error: '__builtin_ia32_insert128i256' needs target feature avx2
norx.c: norx.c:271:9: note: expanded from macro 'INITIALIZE'
norx.c:     A = _mm256_inserti128_si256(A, _mm_set_epi64x(U1, U0), 1);              \
norx.c:         ^
norx.c: /usr/lib/llvm-11/lib/clang/11.0.1/include/avx2intrin.h:827:12: note: expanded from macro '_mm256_inserti128_si256'
norx.c: ...

Number of similar (implementation,compiler) pairs: 5, namely:
Implementation  Compiler
T:ymm           clang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_11.0.1)
T:ymm           clang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_11.0.1)
T:ymm           clang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_11.0.1)
T:ymm           clang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_11.0.1)
T:ymm           clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Debian_Clang_11.0.1)
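
The clang errors above say that the ymm code needs the 'avx' and 'avx2' target features, which -march=native did not enable on this machine. As a rough illustration only (not part of SUPERCOP or of the norx sources), a program could ask the running CPU about AVX2 before choosing a variant. The helper names cpu_has_avx2 and pick_norx_variant are hypothetical; __builtin_cpu_init and __builtin_cpu_supports are GCC/Clang builtins for x86 targets, and the sketch assumes that reporting "no AVX2" is acceptable on any other target.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the running CPU reports AVX2, else 0.
   Uses the GCC/Clang x86 builtins; on other compilers or targets this
   sketch conservatively reports 0. */
static int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical helper: maps the detection result to the implementation
   names used in the listings above ("ymm" needs AVX2, "xmm" does not). */
static const char *pick_norx_variant(int has_avx2)
{
    assert(has_avx2 == 0 || has_avx2 == 1);
    return has_avx2 ? "ymm" : "xmm";
}
```

Calling pick_norx_variant(cpu_has_avx2()) would be expected to yield "xmm" on a CPU without AVX2, which matches the fact that only the xmm and ref implementations appear in the timing table.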

Compiler output


norx.c: norx.c: In function 'crypto_aead_norx6441v1_ymm_timingleaks_encrypt':
norx.c: norx.c:350:19: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
norx.c:   350 |     const __m256i K  = LOADU(k + 0);
norx.c:       |                   ^
norx.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:51,
norx.c:                  from /usr/lib/gcc/x86_64-linux-gnu/10/include/x86intrin.h:32,
norx.c:                  from norx.c:26:
norx.c: norx.c: In function 'block_copy':
norx.c: /usr/lib/gcc/x86_64-linux-gnu/10/include/avxintrin.h:926:1: error: inlining failed in call to 'always_inline' '_mm256_storeu_si256': target specific option mismatch
norx.c:   926 | _mm256_storeu_si256 (__m256i_u *__P, __m256i __A)
norx.c:       | ^~~~~~~~~~~~~~~~~~~
norx.c: norx.c:48:24: note: called from here
norx.c:    48 | #define STOREU(out, x) _mm256_storeu_si256((__m256i*)(out), (x))
norx.c:       |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
norx.c: norx.c:303:9: note: in expansion of macro 'STOREU'
norx.c:   303 |         STOREU(out + 32, LOADU(in + 32));
norx.c:       |         ^~~~~~
norx.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:51,
norx.c:                  from /usr/lib/gcc/x86_64-linux-gnu/10/include/x86intrin.h:32,
norx.c:                  from norx.c:26:
norx.c: /usr/lib/gcc/x86_64-linux-gnu/10/include/avxintrin.h:920:1: error: inlining failed in call to 'always_inline' '_mm256_loadu_si256': target specific option mismatch
norx.c:   920 | _mm256_loadu_si256 (__m256i_u const *__P)
norx.c:       | ^~~~~~~~~~~~~~~~~~
norx.c: norx.c:48:24: note: called from here
norx.c:    48 | #define STOREU(out, x) _mm256_storeu_si256((__m256i*)(out), (x))
norx.c: ...

Number of similar (implementation,compiler) pairs: 4, namely:
Implementation  Compiler
T:ymm           gcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (10.2.1_20210110)
T:ymm           gcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (10.2.1_20210110)
T:ymm           gcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (10.2.1_20210110)
T:ymm           gcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (10.2.1_20210110)
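
The gcc errors above come from 256-bit load/store intrinsics being used in block_copy while AVX is not enabled for the build. One common way to keep such code compiling everywhere is a compile-time guard on the compiler's feature macros. The following is a minimal sketch of that idea, not the norx.c code: copy32 is a hypothetical helper that copies one 32-byte chunk, using the AVX intrinsics only when __AVX__ is defined, two 128-bit SSE2 moves when only __SSE2__ is defined, and memcpy otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__AVX__)
#include <immintrin.h>   /* _mm256_loadu_si256, _mm256_storeu_si256 */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical helper: copy exactly 32 bytes from in to out.
   The path is selected at compile time from the target features
   the compiler was told to use (for example via -march=native). */
static void copy32(unsigned char *out, const unsigned char *in)
{
    assert(out != NULL && in != NULL);
#if defined(__AVX__)
    /* One unaligned 256-bit load and store. */
    _mm256_storeu_si256((__m256i *)out,
                        _mm256_loadu_si256((const __m256i *)in));
#elif defined(__SSE2__)
    /* Two unaligned 128-bit loads and stores. */
    _mm_storeu_si128((__m128i *)out,
                     _mm_loadu_si128((const __m128i *)in));
    _mm_storeu_si128((__m128i *)(out + 16),
                     _mm_loadu_si128((const __m128i *)(in + 16)));
#else
    /* Portable fallback for targets without SSE2. */
    memcpy(out, in, 32);
#endif
}
```

With a guard like this, a build without AVX would take the SSE2 or memcpy path instead of failing with "target specific option mismatch". Whether such a fallback is appropriate inside an implementation that is specifically labelled ymm is a separate design question.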