Implementation notes: aarch64, pi4b, crypto_kem/firesaber2

Computer: pi4b
Microarchitecture: aarch64; Cortex-A72 (410fd083)
Architecture: aarch64
CPU ID: 410fd083
SUPERCOP version: 20240107
Operation: crypto_kem
Primitive: firesaber2
Time | Object size | Test size | Implementation | Compiler | Benchmark date | SUPERCOP version
614914 | 53053 0 0 | 70387 864 1552 | T:neon2 | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240102 | 20231222
632323 | 42249 0 0 | 59307 832 1568 | T:neon2 | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240102 | 20231222
716972 | 20193 0 0 | 36211 832 1552 | T:neon2 | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240102 | 20231222
723857 | 19485 0 0 | 35603 832 1552 | T:neon2 | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240102 | 20231222
726273 | 17981 0 0 | 32859 816 1536 | T:neon2 | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240102 | 20231222
748033 | 52369 0 2464 | 69596 856 4016 | T:neon | clang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE | 20240102 | 20231222
786211 | 53261 0 2464 | 70244 824 4032 | T:neon | gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240102 | 20231222
860100 | 18957 0 2464 | 33748 808 4000 | T:neon | gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240102 | 20231222
860231 | 23297 0 2464 | 39204 824 4016 | T:neon | gcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240102 | 20231222
879610 | 22993 0 2464 | 38988 824 4016 | T:neon | gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE | 20240102 | 20231222

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
SABER_indcpa.c: In file included from SABER_indcpa.c:5:
SABER_indcpa.c: In file included from ./SABER_indcpa.h:4:
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
SABER_indcpa.c: #error "This header is only meant to be used on x86 and x64 architecture"
SABER_indcpa.c: ^
SABER_indcpa.c: In file included from SABER_indcpa.c:5:
SABER_indcpa.c: In file included from ./SABER_indcpa.h:4:
SABER_indcpa.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:17:
SABER_indcpa.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/x86gprintrin.h:15:
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
SABER_indcpa.c: __asm__ ("hreset $0" :: "a"(__eax));
SABER_indcpa.c: ^
SABER_indcpa.c: In file included from SABER_indcpa.c:5:
SABER_indcpa.c: In file included from ./SABER_indcpa.h:4:
SABER_indcpa.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:21:
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
SABER_indcpa.c: #error "This header is only meant to be used on x86 and x64 architecture"
SABER_indcpa.c: ^
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
SABER_indcpa.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
SABER_indcpa.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
SABER_indcpa.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
SABER_indcpa.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
SABER_indcpa.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
SABER_indcpa.c: In file included from SABER_indcpa.c:5:
SABER_indcpa.c: SABER_indcpa.h:4:10: fatal error: immintrin.h: No such file or directory
SABER_indcpa.c: 4 | #include <immintrin.h>
SABER_indcpa.c: | ^~~~~~~~~~~~~
SABER_indcpa.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2_nttmul
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
SABER_indcpa.c: In file included from SABER_indcpa.c:20:
SABER_indcpa.c: In file included from ./SABER_indcpa.h:4:
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
SABER_indcpa.c: #error "This header is only meant to be used on x86 and x64 architecture"
SABER_indcpa.c: ^
SABER_indcpa.c: In file included from SABER_indcpa.c:20:
SABER_indcpa.c: In file included from ./SABER_indcpa.h:4:
SABER_indcpa.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:17:
SABER_indcpa.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/x86gprintrin.h:15:
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
SABER_indcpa.c: __asm__ ("hreset $0" :: "a"(__eax));
SABER_indcpa.c: ^
SABER_indcpa.c: In file included from SABER_indcpa.c:20:
SABER_indcpa.c: In file included from ./SABER_indcpa.h:4:
SABER_indcpa.c: In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/immintrin.h:21:
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
SABER_indcpa.c: #error "This header is only meant to be used on x86 and x64 architecture"
SABER_indcpa.c: ^
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
SABER_indcpa.c: return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
SABER_indcpa.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
SABER_indcpa.c: return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
SABER_indcpa.c: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SABER_indcpa.c: /usr/lib/llvm-14/lib/clang/14.0.0/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
SABER_indcpa.c: ...

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2_nttmul

Compiler output

Implementation: T:avx2_nttmul
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
SABER_indcpa.c: In file included from SABER_indcpa.c:20:
SABER_indcpa.c: SABER_indcpa.h:4:10: fatal error: immintrin.h: No such file or directory
SABER_indcpa.c: 4 | #include <immintrin.h>
SABER_indcpa.c: | ^~~~~~~~~~~~~
SABER_indcpa.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2_nttmul
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2_nttmul
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2_nttmul
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2_nttmul
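
Note on the T:avx2 and T:avx2_nttmul failures: both logs stop at `#include <immintrin.h>`, an x86-only header, so these implementations cannot build on aarch64. The sketch below shows one common way to keep a source file portable by selecting the intrinsics header per architecture with preprocessor guards. It is an illustration only, not the SABER code: the function `add8_u16` and the macro `ADD8_BACKEND` are hypothetical names, and SSE2 is used on x86 only because it needs no extra compiler flags.

```c
#include <assert.h>
#include <stdint.h>

/* Pick an intrinsics header that exists on the target; never include
 * <immintrin.h> unconditionally. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__SSE2__)
#  include <emmintrin.h>
#  define ADD8_BACKEND "sse2"
#elif defined(__aarch64__)
#  include <arm_neon.h>
#  define ADD8_BACKEND "neon"
#else
#  define ADD8_BACKEND "portable"
#endif

/* Hypothetical helper: lane-wise addition of eight 16-bit values,
 * wrapping modulo 2^16 on every backend. */
static void add8_u16(uint16_t out[8], const uint16_t a[8], const uint16_t b[8])
{
    assert(out && a && b);
#if (defined(__x86_64__) || defined(__i386__)) && defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
#elif defined(__aarch64__)
    vst1q_u16(out, vaddq_u16(vld1q_u16(a), vld1q_u16(b)));
#else
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

With guards of this kind, the same file compiles on the pi4b with either compiler and takes the NEON path there.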

Compiler output

Implementation: T:neon
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
poly.c: poly.c:19:3: warning: implicit declaration of function 'cshake128_simple' is invalid in C99 [-Wimplicit-function-declaration]
poly.c: cshake128_simple(buf,SABER_N,nonce,seed,SABER_NOISESEEDBYTES);
poly.c: ^
poly.c: poly.c:34:3: warning: implicit declaration of function 'cshake128_simple' is invalid in C99 [-Wimplicit-function-declaration]
poly.c: cshake128_simple(buf0,SABER_N,nonce0,seed,SABER_NOISESEEDBYTES);
poly.c: ^
poly.c: 2 warnings generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon

Compiler output

Implementation: T:neon
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
poly.c: poly.c: In function 'poly_getnoise':
poly.c: poly.c:19:3: warning: implicit declaration of function 'cshake128_simple' [-Wimplicit-function-declaration]
poly.c: 19 | cshake128_simple(buf,SABER_N,nonce,seed,SABER_NOISESEEDBYTES);
poly.c: | ^~~~~~~~~~~~~~~~

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon
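
Note on the `-Wimplicit-function-declaration` warnings: they mean that no prototype for `cshake128_simple` is visible at the call sites in poly.c, so the compiler has to guess the return type and cannot check or convert the arguments. The usual remedy is to include the header that declares the function, or to declare it before first use. The sketch below shows that pattern with a stand-in. The log does not show the real signature of `cshake128_simple`, so the function `xof_simple`, its parameter types, and `getnoise_demo` are assumptions made for illustration, and the stand-in body is not cSHAKE and has no cryptographic value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Prototype visible before any call; this would normally live in a header.
 * Hypothetical signature, NOT taken from the SABER sources. */
void xof_simple(unsigned char *out, size_t outlen, uint16_t nonce,
                const unsigned char *seed, size_t seedlen);

/* Caller: with the prototype in scope the compiler checks the argument
 * count and converts each argument to the declared parameter type. */
static void getnoise_demo(unsigned char *buf, size_t buflen,
                          const unsigned char *seed, size_t seedlen,
                          uint16_t nonce)
{
    xof_simple(buf, buflen, nonce, seed, seedlen);
}

/* Stand-in definition so the sketch links. This is NOT cSHAKE: it only
 * mixes the seed, nonce and index so the output is deterministic. */
void xof_simple(unsigned char *out, size_t outlen, uint16_t nonce,
                const unsigned char *seed, size_t seedlen)
{
    assert(out && seed && seedlen > 0);
    for (size_t i = 0; i < outlen; i++)
        out[i] = (unsigned char)(seed[i % seedlen] ^ (unsigned char)nonce
                                 ^ (unsigned char)i);
}
```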

Compiler output

Implementation: T:neon2
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
SABER_indcpa.c: In file included from SABER_indcpa.c:22:
SABER_indcpa.c: ./rq_mul/neon_poly_rq_mul.c:272:15: warning: initializing 'uint16_t *' (aka 'unsigned short *') with an expression of type 'const uint16_t *' (aka 'const unsigned short *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
SABER_indcpa.c: uint16_t *c0 = poly,
SABER_indcpa.c: ^ ~~~~
SABER_indcpa.c: ./rq_mul/neon_poly_rq_mul.c:273:15: warning: initializing 'uint16_t *' (aka 'unsigned short *') with an expression of type 'const uint16_t *' (aka 'const unsigned short *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
SABER_indcpa.c: *c1 = &poly[1 * SB1],
SABER_indcpa.c: ^ ~~~~~~~~~~~~~~
SABER_indcpa.c: ./rq_mul/neon_poly_rq_mul.c:274:15: warning: initializing 'uint16_t *' (aka 'unsigned short *') with an expression of type 'const uint16_t *' (aka 'const unsigned short *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
SABER_indcpa.c: *c2 = &poly[2 * SB1],
SABER_indcpa.c: ^ ~~~~~~~~~~~~~~
SABER_indcpa.c: ./rq_mul/neon_poly_rq_mul.c:275:15: warning: initializing 'uint16_t *' (aka 'unsigned short *') with an expression of type 'const uint16_t *' (aka 'const unsigned short *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
SABER_indcpa.c: *c3 = &poly[3 * SB1],
SABER_indcpa.c: ^ ~~~~~~~~~~~~~~
SABER_indcpa.c: 4 warnings generated.
poly.c: poly.c:19:3: warning: implicit declaration of function 'cshake128_simple' is invalid in C99 [-Wimplicit-function-declaration]
poly.c: cshake128_simple(buf,SABER_N,nonce,seed,SABER_NOISESEEDBYTES);
poly.c: ^
poly.c: poly.c:34:3: warning: implicit declaration of function 'cshake128_simple' is invalid in C99 [-Wimplicit-function-declaration]
poly.c: cshake128_simple(buf0,SABER_N,nonce0,seed,SABER_NOISESEEDBYTES);
poly.c: ^
poly.c: 2 warnings generated.

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:neon2

Compiler output

Implementation: T:neon2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
SABER_indcpa.c: In file included from SABER_indcpa.c:22:
SABER_indcpa.c: rq_mul/neon_poly_rq_mul.c: In function 'tc4_evaluate_neon_SB1':
SABER_indcpa.c: rq_mul/neon_poly_rq_mul.c:272:20: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
SABER_indcpa.c: 272 | uint16_t *c0 = poly,
SABER_indcpa.c: | ^~~~
SABER_indcpa.c: rq_mul/neon_poly_rq_mul.c:273:20: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
SABER_indcpa.c: 273 | *c1 = &poly[1 * SB1],
SABER_indcpa.c: | ^
SABER_indcpa.c: rq_mul/neon_poly_rq_mul.c:274:20: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
SABER_indcpa.c: 274 | *c2 = &poly[2 * SB1],
SABER_indcpa.c: | ^
SABER_indcpa.c: rq_mul/neon_poly_rq_mul.c:275:20: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
SABER_indcpa.c: 275 | *c3 = &poly[3 * SB1],
SABER_indcpa.c: | ^
poly.c: poly.c: In function 'poly_getnoise':
poly.c: poly.c:19:3: warning: implicit declaration of function 'cshake128_simple' [-Wimplicit-function-declaration]
poly.c: 19 | cshake128_simple(buf,SABER_N,nonce,seed,SABER_NOISESEEDBYTES);
poly.c: | ^~~~~~~~~~~~~~~~

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:neon2
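
Note on the discarded-qualifier warnings: lines 272 to 275 of rq_mul/neon_poly_rq_mul.c initialize plain `uint16_t *` variables from `poly`, which the messages show to be a `const uint16_t *`. If the function only reads through `c0` to `c3`, declaring them `const uint16_t *` silences the warning without a cast. The sketch below assumes exactly that read-only use. The value of `SB1` and the function `sum_block_heads` are hypothetical and chosen only to make the example runnable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB1 4 /* hypothetical block length; the real value is not in the log */

/* Hypothetical reader: adds the first coefficient of each of four blocks.
 * The pointers keep the const qualifier of the parameter they alias. */
static uint32_t sum_block_heads(const uint16_t *poly)
{
    assert(poly != NULL);
    const uint16_t *c0 = poly,
                   *c1 = &poly[1 * SB1],
                   *c2 = &poly[2 * SB1],
                   *c3 = &poly[3 * SB1];
    return (uint32_t)c0[0] + c1[0] + c2[0] + c3[0];
}
```

If the original function does write through these pointers, the parameter itself would need to lose its `const` instead; the log alone does not say which case applies.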

Compiler output

Implementation: T:ref
Security model: timingleaks
Compiler: clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x0): multiple definition of `clock1'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x0): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x8): multiple definition of `clock2'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x8): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x10): multiple definition of `clock_kp_mv'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x10): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x18): multiple definition of `clock_cl_mv'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x18): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x20): multiple definition of `clock_kp_sm'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x20): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x28): multiple definition of `clock_cl_sm'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x28): first defined here
try.c: clang: error: linker command failed with exit code 1 (use -v to see invocation)

Number of similar (compiler,implementation) pairs: 1, namely:
Compiler Implementations
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref

Compiler output

Implementation: T:ref
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x0): multiple definition of `clock_cl_sm'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x0): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x8): multiple definition of `clock_kp_sm'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x8): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x10): multiple definition of `clock_cl_mv'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x10): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x18): multiple definition of `clock_kp_mv'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x18): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x20): multiple definition of `clock2'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x20): first defined here
try.c: /usr/bin/ld: libcrypto_kem_firesaber2.a(SABER_indcpa.o):(.bss+0x28): multiple definition of `clock1'; libcrypto_kem_firesaber2.a(kem.o):(.bss+0x28): first defined here
try.c: collect2: error: ld returned 1 exit status

Number of similar (compiler,implementation) pairs: 4, namely:
Compiler Implementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:ref
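
Note on the T:ref link failures: the linker finds `clock1`, `clock2`, `clock_kp_mv`, `clock_cl_mv`, `clock_kp_sm` and `clock_cl_sm` defined in both kem.o and SABER_indcpa.o. A common cause of this message is a global variable defined without `extern` in a header that several .c files include; recent GCC and Clang releases default to `-fno-common`, which turns such duplicates into link errors. Whether that is what happens here cannot be confirmed from the log. The sketch below shows the conventional layout in a single file: an `extern` declaration where a shared header would be, and one definition. The header name `timing.h`, the `uint64_t` type (guessed from the 8-byte spacing of the `.bss` offsets) and the helper `elapsed` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Part 1: what a shared header (hypothetical name timing.h) would hold.
 * Declarations only, so including it from many files creates no storage. */
extern uint64_t clock1, clock2;

/* Part 2: what exactly one .c file would hold: the single definition. */
uint64_t clock1 = 0;
uint64_t clock2 = 0;

/* Part 3: any file that includes the header can use the counters.
 * Hypothetical helper returning the measured interval. */
static uint64_t elapsed(void)
{
    assert(clock2 >= clock1);
    return clock2 - clock1;
}
```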