Implementation notes: aarch64, gcc185, crypto_kem/ntskem13136

Computer: gcc185
Microarchitecture: aarch64; Skylark (503f0002)
Architecture: aarch64
CPU ID: 503f0002
SUPERCOP version: 20240107
Operation: crypto_kem
Primitive: ntskem13136
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
397372580565 84 1698412 928 1568T:optclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
3999075131957 84 16151420 928 1584T:optclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
4257375131957 84 16151420 928 1584T:optclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
441667563377 84 1679350 920 1568T:optclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
463027576965 84 1693364 928 1568T:optclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
695040079784 84 1697928 936 1600T:optgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010320231212
699187564356 84 1681352 936 1584T:optgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010320231212
1088212558813 84 1674615 920 1568T:optgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010320231212
1204792564188 84 1680928 936 1584T:optgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010320231212
3690862533620 76 1651712 936 1600T:refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010320231212
3763642526249 76 1642636 920 1568T:refclang_-march=native_-O_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
3887062524220 76 1641176 928 1584T:refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010320231212
4200735023608 76 1640352 928 1584T:refgcc_-march=native_-mtune=native_-O_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010320231212
4380135020481 76 1636247 912 1568T:refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2024010320231212
5302665032321 76 1651876 920 1584T:refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
5304832532321 76 1651876 920 1584T:refclang_-mcpu=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
5487712529953 76 1647884 920 1568T:refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212
5772405021085 76 1637118 912 1568T:refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024010320231212

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
bitslice_bma_128.c: In file included from bitslice_bma_128.c:18:
bitslice_bma_128.c: In file included from ./bitslice_bma_128.h:18:
bitslice_bma_128.c: /usr/bin/../lib/clang/17/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
bitslice_bma_128.c: 14 | #error "This header is only meant to be used on x86 and x64 architecture"
bitslice_bma_128.c: | ^
bitslice_bma_128.c: In file included from bitslice_bma_128.c:18:
bitslice_bma_128.c: In file included from ./bitslice_bma_128.h:18:
bitslice_bma_128.c: In file included from /usr/bin/../lib/clang/17/include/immintrin.h:17:
bitslice_bma_128.c: In file included from /usr/bin/../lib/clang/17/include/x86gprintrin.h:15:
bitslice_bma_128.c: /usr/bin/../lib/clang/17/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
bitslice_bma_128.c: 42 | __asm__ ("hreset $0" :: "a"(__eax));
bitslice_bma_128.c: | ^
bitslice_bma_128.c: In file included from bitslice_bma_128.c:18:
bitslice_bma_128.c: In file included from ./bitslice_bma_128.h:18:
bitslice_bma_128.c: In file included from /usr/bin/../lib/clang/17/include/immintrin.h:21:
bitslice_bma_128.c: /usr/bin/../lib/clang/17/include/mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
bitslice_bma_128.c: 14 | #error "This header is only meant to be used on x86 and x64 architecture"
bitslice_bma_128.c: | ^
bitslice_bma_128.c: /usr/bin/../lib/clang/17/include/mmintrin.h:54:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
bitslice_bma_128.c: 54 | return (__m64)__builtin_ia32_vec_init_v2si(__i, 0);
bitslice_bma_128.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bitslice_bma_128.c: /usr/bin/../lib/clang/17/include/mmintrin.h:133:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
bitslice_bma_128.c: 133 | return (__m64)__builtin_ia32_packsswb((__v4hi)__m1, (__v4hi)__m2);
bitslice_bma_128.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bitslice_bma_128.c: /usr/bin/../lib/clang/17/include/mmintrin.h:163:12: error: invalid conversion between vector type '__m64' (vector of 1 'long long' value) and integer type 'int' of different size
bitslice_bma_128.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:avx2

Compiler output

Implementation: T:avx2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
bitslice_bma_128.c: In file included from bitslice_bma_128.c:18:
bitslice_bma_128.c: bitslice_bma_128.h:18:10: fatal error: immintrin.h: No such file or directory
bitslice_bma_128.c: #include <immintrin.h>
bitslice_bma_128.c: ^~~~~~~~~~~~~
bitslice_bma_128.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:avx2

Compiler output

Implementation: T:opt
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
nts_kem.c: nts_kem.c:106:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c: 106 | priv->ff2m = ff_create(priv->m);
nts_kem.c: | ^
nts_kem.c: nts_kem.c:228:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c: 228 | priv->ff2m = ff_create(priv->m);
nts_kem.c: | ^
nts_kem.c: 2 warnings generated.

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:opt
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:opt
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:opt
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:opt
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:opt

Compiler output

Implementation: T:ref
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
nts_kem.c: nts_kem.c:106:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c: 106 | priv->ff2m = ff_create(priv->m);
nts_kem.c: | ^
nts_kem.c: nts_kem.c:242:27: warning: passing arguments to 'ff_create' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
nts_kem.c: 242 | priv->ff2m = ff_create(priv->m);
nts_kem.c: | ^
nts_kem.c: 2 warnings generated.

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:ref

Compiler output

Implementation: T:sse2
Security model: timingleaks
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
bitslice_bma_128.c: In file included from bitslice_bma_128.c:17:
bitslice_bma_128.c: ./bits.h:47:9: error: unknown type name '__m128i'
bitslice_bma_128.c: 47 | typedef __m128i vector;
bitslice_bma_128.c: | ^
bitslice_bma_128.c: ./bits.h:98:11: error: unknown type name '__m128i'
bitslice_bma_128.c: 98 | const __m128i a_hi = _mm_unpackhi_epi64(a, a);
bitslice_bma_128.c: | ^
bitslice_bma_128.c: ./bits.h:98:26: error: call to undeclared function '_mm_unpackhi_epi64'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
bitslice_bma_128.c: 98 | const __m128i a_hi = _mm_unpackhi_epi64(a, a);
bitslice_bma_128.c: | ^
bitslice_bma_128.c: ./bits.h:99:21: error: call to undeclared function '_mm_cvtsi128_si64'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
bitslice_bma_128.c: 99 | return popcount(_mm_cvtsi128_si64(a_hi)) + popcount(_mm_cvtsi128_si64(a));
bitslice_bma_128.c: | ^
bitslice_bma_128.c: In file included from bitslice_bma_128.c:18:
bitslice_bma_128.c: In file included from ./bitslice_bma_128.h:18:
bitslice_bma_128.c: /usr/bin/../lib/clang/17/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
bitslice_bma_128.c: 14 | #error "This header is only meant to be used on x86 and x64 architecture"
bitslice_bma_128.c: | ^
bitslice_bma_128.c: In file included from bitslice_bma_128.c:18:
bitslice_bma_128.c: In file included from ./bitslice_bma_128.h:18:
bitslice_bma_128.c: In file included from /usr/bin/../lib/clang/17/include/immintrin.h:17:
bitslice_bma_128.c: In file included from /usr/bin/../lib/clang/17/include/x86gprintrin.h:15:
bitslice_bma_128.c: /usr/bin/../lib/clang/17/include/hresetintrin.h:42:27: error: invalid input constraint 'a' in asm
bitslice_bma_128.c: 42 | __asm__ ("hreset $0" :: "a"(__eax));
bitslice_bma_128.c: | ^
bitslice_bma_128.c: ...

Number of similar (compiler,implementation) pairs: 5, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse2
clang -march=native -O -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse2
clang -mcpu=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE T:sse2

Compiler output

Implementation: T:sse2
Security model: timingleaks
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
bitslice_bma_128.c: In file included from bitslice_bma_128.c:17:
bitslice_bma_128.c: bits.h:47:9: error: unknown type name '__m128i'
bitslice_bma_128.c: typedef __m128i vector;
bitslice_bma_128.c: ^~~~~~~
bitslice_bma_128.c: bits.h: In function 'vector_popcount':
bitslice_bma_128.c: bits.h:98:11: error: unknown type name '__m128i'
bitslice_bma_128.c: const __m128i a_hi = _mm_unpackhi_epi64(a, a);
bitslice_bma_128.c: ^~~~~~~
bitslice_bma_128.c: bits.h:98:26: warning: implicit declaration of function '_mm_unpackhi_epi64' [-Wimplicit-function-declaration]
bitslice_bma_128.c: const __m128i a_hi = _mm_unpackhi_epi64(a, a);
bitslice_bma_128.c: ^~~~~~~~~~~~~~~~~~
bitslice_bma_128.c: bits.h:99:21: warning: implicit declaration of function '_mm_cvtsi128_si64' [-Wimplicit-function-declaration]
bitslice_bma_128.c: return popcount(_mm_cvtsi128_si64(a_hi)) + popcount(_mm_cvtsi128_si64(a));
bitslice_bma_128.c: ^~~~~~~~~~~~~~~~~
bitslice_bma_128.c: bits.h:93:42: note: in definition of macro 'popcount'
bitslice_bma_128.c: #define popcount(x) __builtin_popcountll(x)
bitslice_bma_128.c: ^
bitslice_bma_128.c: In file included from bitslice_bma_128.c:18:
bitslice_bma_128.c: bitslice_bma_128.h: At top level:
bitslice_bma_128.c: bitslice_bma_128.h:18:10: fatal error: immintrin.h: No such file or directory
bitslice_bma_128.c: #include <immintrin.h>
bitslice_bma_128.c: ^~~~~~~~~~~~~
bitslice_bma_128.c: compilation terminated.

Number of similar (compiler,implementation) pairs: 4, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse2
gcc -march=native -mtune=native -O -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE T:sse2