Implementation notes: amd64, hertz, crypto_hash/blake512

Computer: hertz
Microarchitecture: amd64; Zen 4 (a60f12)
Architecture: amd64
CPU ID: AuthenticAMD-00a60f12-178bfbff
SUPERCOP version: 20240425
Operation: crypto_hash
Primitive: blake512
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
857014556 0 024800 780 936regsgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
858215383 0 033055 828 968bswapclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
859714556 0 024800 780 936bswapgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
860514583 0 026168 820 968bswapclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
861216854 0 028597 804 968regsgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
867715415 0 033199 828 968bswapclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
868815418 0 029237 804 1032bswapgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
870810982 0 028758 828 968sse41clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
871415195 0 026941 804 968bswapgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
872810181 0 021792 820 968sse41clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
873510966 0 028630 828 968sse41clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
883729513 0 043405 804 1032sphlibgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
885626059 0 043911 828 968sphlibclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
892628962 0 040797 804 968sphlibgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
897226098 0 037744 820 968sphlibclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
899416893 0 030709 804 1032regsgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
918811516 0 029254 828 968ssse3clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
920611484 0 029110 828 968ssse3clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
923510685 0 022272 820 968ssse3clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
934626059 0 043799 828 968sphlibclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
936711518 0 029190 828 968sse2sclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
936811550 0 029334 828 968sse2sclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
938610719 0 022352 820 968sse2sclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
949911234 0 028886 828 968sse2clang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
953411266 0 029030 828 968sse2clang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
957010435 0 022032 820 968sse2clang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
96334861 0 022614 828 968refclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
96504829 0 022470 828 968refclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
999113910 0 027709 804 1032ssse3gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1009913886 0 025677 804 968ssse3gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1015011521 0 025293 804 1032sse41gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1015711505 0 023261 804 968sse41gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1018211147 0 021408 780 936sse41gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1026813782 0 027549 804 1032sse2sgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1027013758 0 025517 804 968sse2sgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1054118600 0 036358 828 968regsclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
1070218568 0 036214 828 968regsclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
1070917769 0 029376 820 968regsclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
1073224536 0 034872 780 936sphlibgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1086614990 0 026749 804 968sse2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1094615014 0 028781 804 1032sse2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1122113399 0 023656 780 936sse2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1137712557 0 022808 780 936ssse3gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1139512656 0 022920 780 936sse2sgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
115127650 0 017976 780 936sphlib-smallgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
118737258 0 018928 820 968sphlib-smallclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
118779039 0 026839 828 968sphlib-smallclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
119479039 0 026951 828 968sphlib-smallclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
121433706 0 015312 820 968refclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
127403339 0 013576 780 936refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1400911074 0 022909 804 968sphlib-smallgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1404111609 0 025501 804 1032sphlib-smallgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
144636942 0 018717 804 968refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
144937029 0 020813 804 1032refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1614114958 0 032639 828 968sandyclang_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
1614814990 0 032783 828 968sandyclang_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
1618914158 0 025752 820 968sandyclang_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2024042920240425
1650215022 0 025264 780 936sandygcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1670615642 0 029461 804 1032sandygcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1672315419 0 027165 804 968sandygcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217

Compiler output

Implementation: sse41
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:8:
hash.c: ./rounds.h:8:10: warning: '_mm_roti_epi64' macro redefined [-Wmacro-redefined]
hash.c: 8 | #define _mm_roti_epi64(x, c) \
hash.c: | ^
hash.c: /usr/lib/llvm-18/lib/clang/18/include/xopintrin.h:236:9: note: previous definition is here
hash.c: 236 | #define _mm_roti_epi64(A, N) \
hash.c: | ^
hash.c: 1 warning generated.

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:81:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 81 | m0 = BSWAP64(m0);
hash.c: | ^
hash.c: ./rounds.h:13:21: note: expanded from macro 'BSWAP64'
hash.c: 13 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:82:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 82 | m1 = BSWAP64(m1);
hash.c: | ^
hash.c: ./rounds.h:13:21: note: expanded from macro 'BSWAP64'
hash.c: 13 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:83:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 83 | m2 = BSWAP64(m2);
hash.c: | ^
hash.c: ./rounds.h:13:21: note: expanded from macro 'BSWAP64'
hash.c: 13 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:84:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 84 | m3 = BSWAP64(m3);
hash.c: | ^
hash.c: ./rounds.h:13:21: note: expanded from macro 'BSWAP64'
hash.c: 13 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:85:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: ...

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/13/include/x86intrin.h:38,
hash.c: from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h: In function 'blake512_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:272:1: error: inlining failed in call to 'always_inline' '_mm_roti_epi64': target specific option mismatch
hash.c: 272 | _mm_roti_epi64(__m128i __A, const int __B)
hash.c: | ^~~~~~~~~~~~~~
hash.c: In file included from hash.c:8:
hash.c: rounds.h:825:11: note: called from here
hash.c: 825 | row2h = _mm_roti_epi64(row2h, -11); \
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: rounds.h:867:3: note: in expansion of macro 'G2'
hash.c: 867 | G2(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
hash.c: | ^~
hash.c: hash.c:132:3: note: in expansion of macro 'ROUND'
hash.c: 132 | ROUND(15);
hash.c: | ^~~~~
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:272:1: error: inlining failed in call to 'always_inline' '_mm_roti_epi64': target specific option mismatch
hash.c: 272 | _mm_roti_epi64(__m128i __A, const int __B)
hash.c: | ^~~~~~~~~~~~~~
hash.c: rounds.h:824:11: note: called from here
hash.c: 824 | row2l = _mm_roti_epi64(row2l, -11); \
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: rounds.h:867:3: note: in expansion of macro 'G2'
hash.c: 867 | G2(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
hash.c: | ^~
hash.c: ...

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE xop

Compiler output

Implementation: xop-2
Security model: constbranchindex
Compiler: clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:92:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 92 | m.u128[0] = BSWAP64(m.u128[0]);
hash.c: | ^
hash.c: ./rounds.h:15:21: note: expanded from macro 'BSWAP64'
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:93:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 93 | m.u128[1] = BSWAP64(m.u128[1]);
hash.c: | ^
hash.c: ./rounds.h:15:21: note: expanded from macro 'BSWAP64'
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:94:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 94 | m.u128[2] = BSWAP64(m.u128[2]);
hash.c: | ^
hash.c: ./rounds.h:15:21: note: expanded from macro 'BSWAP64'
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:95:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 95 | m.u128[3] = BSWAP64(m.u128[3]);
hash.c: | ^
hash.c: ./rounds.h:15:21: note: expanded from macro 'BSWAP64'
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:96:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: ...

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
clang -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop-2
clang -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop-2
clang -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop-2

Compiler output

Implementation: xop-2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/13/include/x86intrin.h:38,
hash.c: from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h: In function 'blake512_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:218:1: error: inlining failed in call to 'always_inline' '_mm_perm_epi8': target specific option mismatch
hash.c: 218 | _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: | ^~~~~~~~~~~~~
hash.c: In file included from hash.c:8:
hash.c: rounds.h:15:28: note: called from here
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: hash.c:99:15: note: in expansion of macro 'BSWAP64'
hash.c: 99 | m.u128[7] = BSWAP64(m.u128[7]);
hash.c: | ^~~~~~~
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:218:1: error: inlining failed in call to 'always_inline' '_mm_perm_epi8': target specific option mismatch
hash.c: 218 | _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: | ^~~~~~~~~~~~~
hash.c: rounds.h:15:28: note: called from here
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: hash.c:98:15: note: in expansion of macro 'BSWAP64'
hash.c: 98 | m.u128[6] = BSWAP64(m.u128[6]);
hash.c: | ^~~~~~~
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:218:1: error: inlining failed in call to 'always_inline' '_mm_perm_epi8': target specific option mismatch
hash.c: 218 | _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: | ^~~~~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop-2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop-2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE xop-2