Implementation notes: amd64, samba, crypto_hash/simd256

Computer: samba
Microarchitecture: amd64; Skylake (506e3)
Architecture: amd64
CPU ID: GenuineIntel-000506e3-bfebfbff
SUPERCOP version: 20240625
Operation: crypto_hash
Primitive: simd256
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
4435848564 0 060061 836 960T:sphlibclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
4463962170 0 076155 844 928T:sphlibclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
4483962602 0 076883 844 960T:sphlibclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
4562052167 0 065115 844 896T:sphlibclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
4623066823 416 081078 1252 960T:optgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
4627860243 0 074261 804 960T:sphlibgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
5539450551 0 061179 844 896T:sphlibclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
5541950426 0 061957 804 960T:sphlibgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
5773150798 0 062677 804 960T:sphlibgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
5819647068 0 061043 844 928T:sphlib-smallclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
5931527047 0 038485 836 960T:sphlib-smallclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
5951331003 0 045045 804 960T:sphlib-smallgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
5967831884 0 044883 844 896T:sphlib-smallclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
6100347868 0 062139 844 960T:sphlib-smallclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
7212228385 0 039059 844 896T:sphlib-smallclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
7453729518 0 041421 804 960T:sphlib-smallgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
7485728610 0 040157 804 960T:sphlib-smallgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
8475546945 0 057360 780 928T:sphlibgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
8785533768 388 045782 1264 896T:optclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
8798636611 388 050206 1264 960T:optclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
8863726728 0 037168 780 928T:sphlib-smallgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
8868730796 388 045054 1264 928T:optclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
9924519023 388 030424 1256 960T:optclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
10114017122 388 027958 1264 896T:optclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
10696714841 416 026926 1252 960T:optgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
12440715414 416 027134 1252 960T:optgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
12516213390 416 023985 1228 928T:optgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
31154755898 416 017918 1252 960T:refgcc_-march=native_-mtune=native_-O2_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
312845412296 416 026510 1252 960T:refgcc_-march=native_-mtune=native_-O3_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
32535495548 416 017198 1252 960T:refgcc_-march=native_-mtune=native_-O_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
327496828072 388 042894 1264 960T:refclang_-march=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
328929017624 388 032158 1264 928T:refclang_-march=native_-O2_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
32936708800 388 019598 1264 896T:refclang_-march=native_-O_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
32955505250 388 016776 1256 960T:refclang_-march=native_-Os_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
329581116183 388 029374 1264 896T:refclang_-mcpu=native_-O3_-fwrapv_-Qunused-arguments_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625
72483914899 416 015441 1228 928T:refgcc_-march=native_-mtune=native_-Os_-fwrapv_-fPIC_-fPIE_-gdwarf-4_-Wall2024062620240625

Compiler output


optimized.c: optimized.c:437:9: warning: unused variable 'j' [-Wunused-variable]
optimized.c:   int i,j;
optimized.c:         ^
optimized.c: 1 warning generated.

Number of similar (implementation,compiler) pairs: 5, namely:
ImplementationCompiler
T:optclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:optclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


optimized.c: optimized.c: In function 'SIMD_Compress':
optimized.c: optimized.c:437:9: warning: unused variable 'j' [-Wunused-variable]
optimized.c:   437 |   int i,j;
optimized.c:       |         ^

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:optgcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:optgcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)

Compiler output


reference.c: reference.c:69:82: warning: expression result unused [-Wunused-value]
reference.c:     state->A[j] = state->D[j] + w[j] + F(state->A[j], state->B[j], state->C[j]), s;
reference.c:                                                                                  ^
reference.c: 1 warning generated.

Number of similar (implementation,compiler) pairs: 5, namely:
ImplementationCompiler
T:refclang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:refclang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


vector.c: vector.c:73:9: warning: 'X' macro redefined [-Wmacro-redefined]
vector.c: #define X(i) X##i
vector.c:         ^
vector.c: vector.c:68:9: note: previous definition is here
vector.c: #define X(i) A[i]
vector.c:         ^
vector.c: vector.c:129:3: error: use of unknown builtin '__builtin_ia32_pcmpgtw128' [-Wimplicit-function-declaration]
vector.c:   DO_REDUCE_FULL_S(0);
vector.c:   ^
vector.c: vector.c:56:12: note: expanded from macro 'DO_REDUCE_FULL_S'
vector.c:     X(i) = EXTRA_REDUCE_S(X(i));                \
vector.c:            ^
vector.c: vector.c:42:32: note: expanded from macro 'EXTRA_REDUCE_S'
vector.c:   v16_sub(x, v16_and(V257.v16, v16_cmp(x, V128.v16)))
vector.c:                                ^
vector.c: ./vector.h:92:22: note: expanded from macro 'v16_cmp'
vector.c: #define v16_cmp      __builtin_ia32_pcmpgtw128
vector.c:                      ^
vector.c: vector.c:129:3: error: cannot convert between scalar type 'int' and vector type 'v16' (aka 'v8hi') as implicit conversion would cause truncation
vector.c: vector.c:56:12: note: expanded from macro 'DO_REDUCE_FULL_S'
vector.c:     X(i) = EXTRA_REDUCE_S(X(i));                \
vector.c:            ^
vector.c: vector.c:42:14: note: expanded from macro 'EXTRA_REDUCE_S'
vector.c:   v16_sub(x, v16_and(V257.v16, v16_cmp(x, V128.v16)))
vector.c:              ^
vector.c: ...

Number of similar (implementation,compiler) pairs: 5, namely:
ImplementationCompiler
T:vect128clang -march=native -O2 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:vect128clang -march=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:vect128clang -march=native -O -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:vect128clang -march=native -Os -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)
T:vect128clang -mcpu=native -O3 -fwrapv -Qunused-arguments -fPIC -fPIE -gdwarf-4 -Wall (Ubuntu_Clang_14.0.0)

Compiler output


vector.c: vector.c: In function 'fft64':
vector.c: vector.c:73: warning: "X" redefined
vector.c:    73 | #define X(i) X##i
vector.c:       |
vector.c: vector.c:68: note: this is the location of the previous definition
vector.c:    68 | #define X(i) A[i]
vector.c:       |
vector.c: vector.c: In function 'fft128_msg_final':
vector.c: vector.c:326:7: warning: unused variable 'i' [-Wunused-variable]
vector.c:   326 |   int i;
vector.c:       |       ^
vector.c: vector.c: In function 'rounds512':
vector.c: vector.c:796: warning: "STEP_1" redefined
vector.c:   796 | #define STEP_1(a,b,c,d,w,fun,r,s,z)             \
vector.c:       |
vector.c: vector.c:542: note: this is the location of the previous definition
vector.c:   542 | #define STEP_1(a,b,c,d,w,fun,r,s,z)                             \
vector.c:       |
vector.c: vector.c:805: warning: "STEP_2" redefined
vector.c:   805 | #define STEP_2(a,b,c,d,w,fun,r,s)              \
vector.c:       |
vector.c: vector.c:566: note: this is the location of the previous definition
vector.c:   566 | #define STEP_2(a,b,c,d,w,fun,r,s)                               \
vector.c:       |
vector.c: vector.c:808: warning: "STEP" redefined
vector.c: ...

Number of similar (implementation,compiler) pairs: 4, namely:
ImplementationCompiler
T:vect128gcc -march=native -mtune=native -O2 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:vect128gcc -march=native -mtune=native -O3 -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:vect128gcc -march=native -mtune=native -O -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)
T:vect128gcc -march=native -mtune=native -Os -fwrapv -fPIC -fPIE -gdwarf-4 -Wall (11.4.0)