Implementation notes: amd64, kizomba, crypto_hash/simd512

Computer: kizomba
Microarchitecture: amd64; Kaby Lake (906e9)
Architecture: amd64
CPU ID: GenuineIntel-000906e9-1fc9cbf5
SUPERCOP version: 20240716
Operation: crypto_hash
Primitive: simd512
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
3726666823 416 080671 1224 800T:optgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
5056248564 0 059654 808 792T:sphlibclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
5266360243 0 073854 776 800T:sphlibgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
5343762602 0 076476 816 792T:sphlibclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
5379062170 0 075748 816 744T:sphlibclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
5582952167 0 064644 816 728T:sphlibclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
6368650551 0 060772 816 728T:sphlibclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
6387550798 0 062286 776 800T:sphlibgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
6401231003 0 044638 776 800T:sphlib-smallgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
6454250426 0 061470 776 800T:sphlibgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
6808827047 0 038078 808 792T:sphlib-smallclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
7026047868 0 061732 816 792T:sphlib-smallclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
7052447068 0 060636 816 744T:sphlib-smallclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
7088331884 0 044412 816 728T:sphlib-smallclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
7125436611 388 049799 1236 792T:optclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
7256133768 388 045311 1236 728T:optclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
7599630796 388 044647 1236 744T:optclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
7628819023 388 030017 1228 792T:optclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
8128446945 0 056873 752 768T:sphlibgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
8137528385 0 038652 816 728T:sphlib-smallclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
8413029518 0 041030 776 800T:sphlib-smallgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
8901628610 0 039670 776 800T:sphlib-smallgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
10208914841 416 026535 1224 800T:optgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
10229626728 0 036681 752 768T:sphlib-smallgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
10949417122 388 027551 1236 728T:optclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
11591315414 416 026647 1224 800T:optgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
11900013390 416 023498 1200 768T:optgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
63947895898 416 017527 1224 800T:refgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
642904012296 416 026103 1224 800T:refgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
66320325548 416 016711 1224 800T:refgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
678681328072 388 042487 1236 792T:refclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
678954917624 388 031751 1236 744T:refclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
68426235250 388 016369 1228 792T:refclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
68672148800 388 019191 1236 728T:refclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
689816316183 388 028903 1236 728T:refclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716
150857604899 416 014954 1200 768T:refgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024071720240716

Compiler output


optimized.c: optimized.c:437:9: warning: unused variable 'j' [-Wunused-variable]
optimized.c:   int i,j;
optimized.c:         ^
optimized.c: 1 warning generated.

Number of similar (implementation,compiler) pairs: 5, namely:
ImplementationCompiler
T:optclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


optimized.c: optimized.c: In function 'SIMD_Compress':
optimized.c: optimized.c:437:9: warning: unused variable 'j' [-Wunused-variable]
optimized.c:   437 |   int i,j;
optimized.c:       |         ^

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:optgcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)

Compiler output


reference.c: reference.c:69:82: warning: expression result unused [-Wunused-value]
reference.c:     state->A[j] = state->D[j] + w[j] + F(state->A[j], state->B[j], state->C[j]), s;
reference.c:                                                                                  ^
reference.c: 1 warning generated.

Number of similar (implementation,compiler) pairs: 5, namely:
ImplementationCompiler
T:refclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


vector.c: vector.c:73:9: warning: 'X' macro redefined [-Wmacro-redefined]
vector.c: #define X(i) X##i
vector.c:         ^
vector.c: vector.c:68:9: note: previous definition is here
vector.c: #define X(i) A[i]
vector.c:         ^
vector.c: vector.c:129:3: error: use of unknown builtin '__builtin_ia32_pcmpgtw128' [-Wimplicit-function-declaration]
vector.c:   DO_REDUCE_FULL_S(0);
vector.c:   ^
vector.c: vector.c:56:12: note: expanded from macro 'DO_REDUCE_FULL_S'
vector.c:     X(i) = EXTRA_REDUCE_S(X(i));                \
vector.c:            ^
vector.c: vector.c:42:32: note: expanded from macro 'EXTRA_REDUCE_S'
vector.c:   v16_sub(x, v16_and(V257.v16, v16_cmp(x, V128.v16)))
vector.c:                                ^
vector.c: ./vector.h:92:22: note: expanded from macro 'v16_cmp'
vector.c: #define v16_cmp      __builtin_ia32_pcmpgtw128
vector.c:                      ^
vector.c: vector.c:129:3: error: cannot convert between scalar type 'int' and vector type 'v16' (aka 'v8hi') as implicit conversion would cause truncation
vector.c: vector.c:56:12: note: expanded from macro 'DO_REDUCE_FULL_S'
vector.c:     X(i) = EXTRA_REDUCE_S(X(i));                \
vector.c:            ^
vector.c: vector.c:42:14: note: expanded from macro 'EXTRA_REDUCE_S'
vector.c:   v16_sub(x, v16_and(V257.v16, v16_cmp(x, V128.v16)))
vector.c:              ^
vector.c: ...

Number of similar (implementation,compiler) pairs: 5, namely:
ImplementationCompiler
T:vect128clang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:vect128clang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:vect128clang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:vect128clang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:vect128clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


vector.c: vector.c: In function 'fft64':
vector.c: vector.c:73: warning: "X" redefined
vector.c:    73 | #define X(i) X##i
vector.c:       |
vector.c: vector.c:68: note: this is the location of the previous definition
vector.c:    68 | #define X(i) A[i]
vector.c:       |
vector.c: vector.c: In function 'fft128_msg_final':
vector.c: vector.c:326:7: warning: unused variable 'i' [-Wunused-variable]
vector.c:   326 |   int i;
vector.c:       |       ^
vector.c: vector.c: In function 'rounds512':
vector.c: vector.c:796: warning: "STEP_1" redefined
vector.c:   796 | #define STEP_1(a,b,c,d,w,fun,r,s,z)             \
vector.c:       |
vector.c: vector.c:542: note: this is the location of the previous definition
vector.c:   542 | #define STEP_1(a,b,c,d,w,fun,r,s,z)                             \
vector.c:       |
vector.c: vector.c:805: warning: "STEP_2" redefined
vector.c:   805 | #define STEP_2(a,b,c,d,w,fun,r,s)              \
vector.c:       |
vector.c: vector.c:566: note: this is the location of the previous definition
vector.c:   566 | #define STEP_2(a,b,c,d,w,fun,r,s)                               \
vector.c:       |
vector.c: vector.c:808: warning: "STEP" redefined
vector.c: ...

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:vect128gcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:vect128gcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:vect128gcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:vect128gcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)

Namespace violations


nist.o Final T
nist.o Hash T
nist.o IV_224 D
nist.o IV_256 D
nist.o IV_384 D
nist.o IV_512 D
nist.o IncreaseCounter T
nist.o Init T
nist.o InitIV T
nist.o Update T
optimized.o FFT_128_full T
optimized.o FFT_128_halfzero T
optimized.o FFT_256_halfzero T
optimized.o FFT_64 T
optimized.o RequiredAlignment T
optimized.o Round4 T
optimized.o Round8 T
optimized.o SIMD_Compress T
optimized.o SupportedLength T
optimized.o VERSION T
optimized.o fft128_natural T
optimized.o fft256_natural T
optimized.o p8_xor R
optimized.o revbin T

Number of similar (implementation,compiler) pairs: 9, namely:
ImplementationCompiler
T:optclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optgcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)

Namespace violations


nist.o Final T
nist.o Hash T
nist.o IV_224 D
nist.o IV_256 D
nist.o IV_384 D
nist.o IV_512 D
nist.o IncreaseCounter T
nist.o Init T
nist.o InitIV T
nist.o Update T
reference.o IF T
reference.o MAJ T
reference.o P R
reference.o RequiredAlignment T
reference.o Round T
reference.o SIMD_Compress T
reference.o Step T
reference.o SupportedLength T
reference.o VERSION T
reference.o message_expansion T
reference.o p4 R
reference.o p8 R

Number of similar (implementation,compiler) pairs: 9, namely:
ImplementationCompiler
T:refclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refgcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:refgcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:refgcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:refgcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)

Namespace violations


simd.o sph_simd224 T
simd.o sph_simd224_addbits_and_close T
simd.o sph_simd224_close T
simd.o sph_simd224_init T
simd.o sph_simd256 T
simd.o sph_simd256_addbits_and_close T
simd.o sph_simd256_close T
simd.o sph_simd256_init T
simd.o sph_simd384 T
simd.o sph_simd384_addbits_and_close T
simd.o sph_simd384_close T
simd.o sph_simd384_init T
simd.o sph_simd512 T
simd.o sph_simd512_addbits_and_close T
simd.o sph_simd512_close T
simd.o sph_simd512_init T

Number of similar (implementation,compiler) pairs: 18, namely:
ImplementationCompiler
T:sphlibclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlibclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlibclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlibclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlibclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlibgcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sphlibgcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sphlibgcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sphlibgcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sphlib-smallclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlib-smallclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlib-smallclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlib-smallclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlib-smallclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:sphlib-smallgcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sphlib-smallgcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sphlib-smallgcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:sphlib-smallgcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)