Implementation notes: aarch64, pi4b, crypto_sign/picnic2l1fs

Computer: pi4b
Microarchitecture: aarch64; Cortex-A72 (410fd083)
Architecture: aarch64
CPU ID: 410fd083
SUPERCOP version: 20240107
Operation: crypto_sign
Primitive: picnic2l1fs
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
129446232092467 1504 10132223 2408 1600T:optimizedct/neongcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
129965261486875 1504 10125695 2408 1584T:optimizedct/neongcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
130524799380857 1504 10118719 2392 1568T:optimizedct/neongcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
131429446289075 1504 10129590 2440 1584T:optimizedct/cclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023123020231222
131641731786795 1504 10125655 2408 1584T:optimizedct/neongcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
135185293383819 1504 10123303 2408 1600T:optimizedct/cgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
135789221979543 1504 10118095 2408 1584T:optimizedct/cgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
137312621774177 1504 10111767 2392 1568T:optimizedct/cgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
143054424179491 1504 10118047 2408 1584T:optimizedct/cgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
41782473051054790 4 01092425 896 1584T:refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
41868243711047074 4 01083809 896 1568T:refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023123020231222
44260889841067017 4 01105465 920 1568T:refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023123020231222

Compiler output

Implementation: T:optimizedct/neon
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
picnic2_simulate_mul.c: picnic2_simulate_mul.c:295:57: error: argument to '__builtin_neon_vshrq_n_v' must be a constant integer
picnic2_simulate_mul.c: out128[i1 / 2] = mm128_xor(mm128_and(t1, mask), mm128_sr_u64(mm128_and(t2, mask), width));
picnic2_simulate_mul.c: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
picnic2_simulate_mul.c: ./simd.h:109:28: note: expanded from macro 'mm128_sr_u64'
picnic2_simulate_mul.c: #define mm128_sr_u64(x, s) vshrq_n_u64((x), (s))
picnic2_simulate_mul.c: ^
picnic2_simulate_mul.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_neon.h:25260:24: note: expanded from macro 'vshrq_n_u64'
picnic2_simulate_mul.c: __ret = (uint64x2_t) __builtin_neon_vshrq_n_v((int8x16_t)__s0, __p1, 51); \
picnic2_simulate_mul.c: ^
picnic2_simulate_mul.c: ./simd.h:103:41: note: expanded from macro 'mm128_xor'
picnic2_simulate_mul.c: #define mm128_xor(l, r) veorq_u64((l), (r))
picnic2_simulate_mul.c: ^
picnic2_simulate_mul.c: picnic2_simulate_mul.c:296:58: error: argument to '__builtin_neon_vshlq_n_v' must be a constant integer
picnic2_simulate_mul.c: out128[i2 / 2] = mm128_xor(mm128_nand(mask, t2), mm128_sl_u64(mm128_nand(mask, t1), width));
picnic2_simulate_mul.c: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
picnic2_simulate_mul.c: ./simd.h:108:28: note: expanded from macro 'mm128_sl_u64'
picnic2_simulate_mul.c: #define mm128_sl_u64(x, s) vshlq_n_u64((x), (s))
picnic2_simulate_mul.c: ^
picnic2_simulate_mul.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_neon.h:24852:24: note: expanded from macro 'vshlq_n_u64'
picnic2_simulate_mul.c: __ret = (uint64x2_t) __builtin_neon_vshlq_n_v((int8x16_t)__s0, __p1, 51); \
picnic2_simulate_mul.c: ^
picnic2_simulate_mul.c: ./simd.h:103:41: note: expanded from macro 'mm128_xor'
picnic2_simulate_mul.c: #define mm128_xor(l, r) veorq_u64((l), (r))
picnic2_simulate_mul.c: ^
picnic2_simulate_mul.c: 2 errors generated.

Number of similar (compiler,implementation) pairs: 1, namely:
CompilerImplementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:optimizedct/neon

Compiler output

Implementation: T:ref
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE
picnic_impl.c: picnic_impl.c: In function 'mpc_LowMC_verify':
picnic_impl.c: picnic_impl.c:469:5: warning: 'mpc_matrix_mul' accessing 24 bytes in a region of size 16 [-Wstringop-overflow=]
picnic_impl.c: 469 | mpc_matrix_mul(roundKey, keyShares, KMatrix(0, params), params, 2);
picnic_impl.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
picnic_impl.c: picnic_impl.c:469:5: note: referencing argument 1 of type 'uint32_t **' {aka 'unsigned int **'}
picnic_impl.c: picnic_impl.c:469:5: warning: 'mpc_matrix_mul' accessing 24 bytes in a region of size 16 [-Wstringop-overflow=]
picnic_impl.c: picnic_impl.c:469:5: note: referencing argument 2 of type 'uint32_t **' {aka 'unsigned int **'}
picnic_impl.c: picnic_impl.c:437:6: note: in a call to function 'mpc_matrix_mul'
picnic_impl.c: 437 | void mpc_matrix_mul(uint32_t* output[3], uint32_t* state[3], const uint32_t* matrix,
picnic_impl.c: | ^~~~~~~~~~~~~~
picnic_impl.c: picnic_impl.c:470:5: warning: 'mpc_xor' accessing 24 bytes in a region of size 16 [-Wstringop-overflow=]
picnic_impl.c: 470 | mpc_xor(state, roundKey, params->stateSizeWords, 2);
picnic_impl.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
picnic_impl.c: picnic_impl.c:470:5: note: referencing argument 1 of type 'uint32_t **' {aka 'unsigned int **'}
picnic_impl.c: picnic_impl.c:470:5: warning: 'mpc_xor' accessing 24 bytes in a region of size 16 [-Wstringop-overflow=]
picnic_impl.c: picnic_impl.c:470:5: note: referencing argument 2 of type 'uint32_t **' {aka 'unsigned int **'}
picnic_impl.c: picnic_impl.c:190:6: note: in a call to function 'mpc_xor'
picnic_impl.c: 190 | void mpc_xor(uint32_t* state[3], uint32_t* in[3], uint32_t len, int players)
picnic_impl.c: | ^~~~~~~
picnic_impl.c: picnic_impl.c:473:9: warning: 'mpc_matrix_mul' accessing 24 bytes in a region of size 16 [-Wstringop-overflow=]
picnic_impl.c: 473 | mpc_matrix_mul(roundKey, keyShares, KMatrix(r, params), params, 2);
picnic_impl.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
picnic_impl.c: picnic_impl.c:473:9: note: referencing argument 1 of type 'uint32_t **' {aka 'unsigned int **'}
picnic_impl.c: picnic_impl.c:473:9: warning: 'mpc_matrix_mul' accessing 24 bytes in a region of size 16 [-Wstringop-overflow=]
picnic_impl.c: picnic_impl.c:473:9: note: referencing argument 2 of type 'uint32_t **' {aka 'unsigned int **'}
picnic_impl.c: ...

Number of similar (compiler,implementation) pairs: 2, namely:
CompilerImplementations
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref