Implementation notes: amd64, hydra8, crypto_aead/norx6461v2

Computer: hydra8
Microarchitecture: amd64; Ivy Bridge+AES (306a9)
Architecture: amd64
CPU ID: GenuineIntel-000306a9-bfebfbff
SUPERCOP version: 20240107
Operation: crypto_aead
Primitive: norx6461v2
Time   Object size  Test size       Implementation  Compiler                                                                             Benchmark date  SUPERCOP version
29839  14903 8 0    37828 820 1088  T:xmm           gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20231213        20231212
30182  14987 8 0    36155 812 1088  T:xmm           gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20231213        20231212
30848  15161 8 0    34663 796 1056  T:xmm           gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20231213        20231212
30928  16045 8 0    37732 820 1088  T:xmm           gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20231213        20231212
33818  14746 8 0    38320 868 1024  T:xmm           clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231213        20231212
33855  14730 8 0    36360 868 1024  T:xmm           clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231213        20231212
34608  15199 8 0    34570 860 1024  T:xmm           clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231213        20231212
37429  19906 8 0    42160 868 1024  T:xmm           clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20231213        20231212
38676  16070 8 0    35784 868 1024  T:xmm           clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20231213        20231212
43820  5415 16 0    27160 876 1024  T:ref           clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231213        20231212
43840  5319 16 0    29008 876 1024  T:ref           clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231213        20231212
44335  3926 16 0    23498 868 1024  T:ref           clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE  20231213        20231212
45128  5067 16 0    27376 876 1024  T:ref           clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20231213        20231212
49217  4549 16 0    26316 828 1088  T:ref           gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20231213        20231212
50761  3936 16 0    25188 820 1088  T:ref           gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE          20231213        20231212
51692  3699 16 0    23344 804 1056  T:ref           gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20231213        20231212
51944  9296 16 0    32292 828 1088  T:ref           gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE         20231213        20231212
52367  3879 16 0    23808 876 1024  T:ref           clang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE   20231213        20231212

Compiler output

Implementation: T:ymm
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
norx.c: norx.c:388:5: error: '__builtin_ia32_pblendd256' needs target feature avx2
norx.c: INITIALISE(A, B, C, D, nonce, key);
norx.c: ^
norx.c: norx.c:289:9: note: expanded from macro 'INITIALISE'
norx.c: A = _mm256_blend_epi32(_mm256_set_epi64x(U3, U2, 0, 0), \
norx.c: ^
norx.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:750:13: note: expanded from macro '_mm256_blend_epi32'
norx.c: ((__m256i)__builtin_ia32_pblendd256((__v8si)(__m256i)(V1), \
norx.c: ^
norx.c: norx.c:388:5: error: always_inline function '_mm256_xor_si256' requires target feature 'avx2', but would be inlined into function 'norx_aead_encrypt' that is compiled without support for 'avx2'
norx.c: norx.c:294:9: note: expanded from macro 'INITIALISE'
norx.c: D = XOR(D, _mm256_set_epi64x(NORX_T, NORX_P, NORX_L, NORX_W)); \
norx.c: ^
norx.c: norx.c:72:19: note: expanded from macro 'XOR'
norx.c: #define XOR(A, B) _mm256_xor_si256((A), (B))
norx.c: ^
norx.c: norx.c:388:5: error: always_inline function '_mm256_xor_si256' requires target feature 'avx2', but would be inlined into function 'norx_aead_encrypt' that is compiled without support for 'avx2'
norx.c: norx.c:295:5: note: expanded from macro 'INITIALISE'
norx.c: PERMUTE(A, B, C, D); \
norx.c: ^
norx.c: norx.c:209:9: note: expanded from macro 'PERMUTE'
norx.c: F(A, B, C, D); \
norx.c: ^
norx.c: norx.c:197:5: note: expanded from macro 'F'
norx.c: G(A, B, C, D); \
norx.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                                             Implementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:ymm
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:ymm
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE   T:ymm
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:ymm
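
Note on the errors above: the CPU ID line identifies hydra8 as Ivy Bridge (306a9), a microarchitecture that provides AVX but not AVX2, so -march=native leaves AVX2 disabled and clang rejects the 256-bit integer intrinsics used by T:ymm. The following is a minimal sketch, not taken from norx.c, of how such code can be guarded at compile time with the predefined __AVX2__ macro. The function name xor256 and the scalar fallback are hypothetical and only illustrate the pattern.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not from norx.c): XOR two 256-bit blocks,
   each stored as four 64-bit lanes. */
static void xor256(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
#if defined(__AVX2__)
    /* This branch is only compiled when the flags enable AVX2. */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_xor_si256(va, vb));
#else
    /* Portable fallback for targets without AVX2, such as Ivy Bridge. */
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] ^ b[i];
#endif
}
```

With a guard of this kind the file compiles under every flag set in the table, and the AVX2 branch is used only where the compiler defines __AVX2__.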

Compiler output

Implementation: T:ymm
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
norx.c: norx.c:388:5: error: '__builtin_ia32_pblendd256' needs target feature avx2
norx.c: INITIALISE(A, B, C, D, nonce, key);
norx.c: ^
norx.c: norx.c:289:9: note: expanded from macro 'INITIALISE'
norx.c: A = _mm256_blend_epi32(_mm256_set_epi64x(U3, U2, 0, 0), \
norx.c: ^
norx.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/avx2intrin.h:750:13: note: expanded from macro '_mm256_blend_epi32'
norx.c: ((__m256i)__builtin_ia32_pblendd256((__v8si)(__m256i)(V1), \
norx.c: ^
norx.c: norx.c:388:5: error: always_inline function '_mm256_set_epi64x' requires target feature 'avx', but would be inlined into function 'norx_aead_encrypt' that is compiled without support for 'avx'
norx.c: norx.c:289:28: note: expanded from macro 'INITIALISE'
norx.c: A = _mm256_blend_epi32(_mm256_set_epi64x(U3, U2, 0, 0), \
norx.c: ^
norx.c: norx.c:388:5: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
norx.c: norx.c:289:28: note: expanded from macro 'INITIALISE'
norx.c: A = _mm256_blend_epi32(_mm256_set_epi64x(U3, U2, 0, 0), \
norx.c: ^
norx.c: norx.c:388:5: error: always_inline function '_mm256_castsi128_si256' requires target feature 'avx', but would be inlined into function 'norx_aead_encrypt' that is compiled without support for 'avx'
norx.c: norx.c:290:28: note: expanded from macro 'INITIALISE'
norx.c: _mm256_castsi128_si256(LOADU128(NONCE)), 0x0F); \
norx.c: ^
norx.c: norx.c:388:5: error: AVX vector return of type '__m256i' (vector of 4 'long long' values) without 'avx' enabled changes the ABI
norx.c: norx.c:290:28: note: expanded from macro 'INITIALISE'
norx.c: _mm256_castsi128_si256(LOADU128(NONCE)), 0x0F); \
norx.c: ^
norx.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler                                                                            Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE  T:ymm
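
In this build clang reports that even the plain 'avx' feature is missing, which suggests that -mcpu=native did not enable the host's vector extensions here. That is an inference from the log, not a documented statement about the flag. One way to check what a flag set enables is to look at the compiler's predefined feature macros. A small sketch follows; enabled_simd_level is a hypothetical helper name.

```c
/* Hypothetical diagnostic: name the widest x86 SIMD level the compiler
   was allowed to use, judged only by its predefined macros. */
static const char *enabled_simd_level(void)
{
#if defined(__AVX2__)
    return "avx2";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "none";
#endif
}
```

Building this with each flag set from the table and printing the result would show whether the compiler was allowed to emit AVX or AVX2 code. The same macros can usually be listed without a source file, for example with clang -march=native -dM -E -x c /dev/null.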

Compiler output

Implementation: T:ymm
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
norx.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
norx.c: from norx.c:25:
norx.c: norx.c: In function 'norx_aead_encrypt':
norx.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:913:1: error: inlining failed in call to 'always_inline' '_mm256_xor_si256': target specific option mismatch
norx.c: 913 | _mm256_xor_si256 (__m256i __A, __m256i __B)
norx.c: | ^~~~~~~~~~~~~~~~
norx.c: norx.c:72:19: note: called from here
norx.c: 72 | #define XOR(A, B) _mm256_xor_si256((A), (B))
norx.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~
norx.c: norx.c:294:9: note: in expansion of macro 'XOR'
norx.c: 294 | D = XOR(D, _mm256_set_epi64x(NORX_T, NORX_P, NORX_L, NORX_W)); \
norx.c: | ^~~
norx.c: norx.c:388:5: note: in expansion of macro 'INITIALISE'
norx.c: 388 | INITIALISE(A, B, C, D, nonce, key);
norx.c: | ^~~~~~~~~~
norx.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
norx.c: from norx.c:25:
norx.c: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:974:1: error: inlining failed in call to 'always_inline' '_mm256_blend_epi32': target specific option mismatch
norx.c: 974 | _mm256_blend_epi32 (__m256i __X, __m256i __Y, const int __M)
norx.c: | ^~~~~~~~~~~~~~~~~~
norx.c: norx.c:289:9: note: called from here
norx.c: 289 | A = _mm256_blend_epi32(_mm256_set_epi64x(U3, U2, 0, 0), \
norx.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
norx.c: 290 | _mm256_castsi128_si256(LOADU128(NONCE)), 0x0F); \
norx.c: | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
norx.c: ...

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler                                                                      Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE  T:ymm
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE  T:ymm
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE   T:ymm
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE  T:ymm
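
GCC's 'target specific option mismatch' has the same cause as the clang errors: the AVX2 intrinsic is marked always_inline, but the calling function is compiled without AVX2. Where one binary has to run on CPUs with and without AVX2, a common pattern is to enable the feature per function and choose a path at run time. The sketch below assumes GCC or clang on x86. It is not how norx.c is organised, and the names xor256_avx2, xor256_portable, and xor256_dispatch are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2 is enabled for this one function, whatever -march selects. */
static __attribute__((target("avx2")))
void xor256_avx2(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_xor_si256(va, vb));
}
#endif

/* Scalar version that runs on any CPU. */
static void xor256_portable(uint64_t out[4], const uint64_t a[4],
                            const uint64_t b[4])
{
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] ^ b[i];
}

/* Pick the AVX2 version only when the running CPU reports AVX2. */
static void xor256_dispatch(uint64_t out[4], const uint64_t a[4],
                            const uint64_t b[4])
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        xor256_avx2(out, a, b);
        return;
    }
#endif
    xor256_portable(out, a, b);
}
```

On an Ivy Bridge machine such as hydra8 the dispatcher would take the portable path, so this pattern lets the file compile but does not make an AVX2-only implementation usable there.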