Implementation notes: amd64, hertz, crypto_hash/blake512

Computer: hertz
Microarchitecture: amd64; Zen 4 (a60f12)
Architecture: amd64
CPU ID: AuthenticAMD-00a60f12-178bfbff
SUPERCOP version: 20240107
Operation: crypto_hash
Primitive: blake512
TimeObject sizeTest sizeImplementationCompilerBenchmark dateSUPERCOP version
857014556 0 024800 780 936regsgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
859714556 0 024800 780 936bswapgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
861216854 0 028597 804 968regsgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
868815418 0 029237 804 1032bswapgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
869911025 0 025498 844 968sse41clang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
871011009 0 025354 844 968sse41clang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
871415195 0 026941 804 968bswapgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
872910231 0 021868 836 968sse41clang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
883729513 0 043405 804 1032sphlibgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
892628962 0 040797 804 968sphlibgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
899416893 0 030709 804 1032regsgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
912011832 0 023484 836 968bswapclang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
922012632 0 026970 844 968bswapclang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
923511518 0 025962 844 968ssse3clang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
924611486 0 025802 844 968ssse3clang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
925610687 0 022284 836 968ssse3clang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
926312664 0 027130 844 968bswapclang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
937911381 0 025754 844 968sse2sclang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
941511413 0 025914 844 968sse2sclang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
948410582 0 022236 836 968sse2sclang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
999113910 0 027709 804 1032ssse3gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
100163214 0 014828 836 968refclang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1003825043 0 039642 844 968sphlibclang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1007124969 0 036668 836 968sphlibclang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1009913886 0 025677 804 968ssse3gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1015011521 0 025293 804 1032sse41gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1015711505 0 023261 804 968sse41gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1018211147 0 021408 780 936sse41gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1026813782 0 027549 804 1032sse2sgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1027013758 0 025517 804 968sse2sgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1048525043 0 039514 844 968sphlibclang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1073224536 0 034872 780 936sphlibgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1086614990 0 026749 804 968sse2gcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1094615014 0 028781 804 1032sse2gcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1122113399 0 023656 780 936sse2gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1137712557 0 022808 780 936ssse3gcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1139512656 0 022920 780 936sse2sgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
115127650 0 017976 780 936sphlib-smallgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1185216708 0 031162 844 968regsclang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1188816676 0 031002 844 968regsclang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1192015815 0 027436 836 968regsclang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
122934247 0 018586 844 968refclang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
123374279 0 018746 844 968refclang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
126656549 0 018220 836 968sphlib-smallclang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
127056743 0 021338 844 968sphlib-smallclang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
127403339 0 013576 780 936refgcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
127406743 0 021210 844 968sphlib-smallclang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1341721368 0 035866 844 968sse2clang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1392820537 0 032156 836 968sse2clang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1400911074 0 022909 804 968sphlib-smallgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1404111609 0 025501 804 1032sphlib-smallgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1445021336 0 035706 844 968sse2clang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
144636942 0 018717 804 968refgcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
144937029 0 020813 804 1032refgcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1609214776 0 029267 844 968sandyclang-17_-march=native_-O3_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1609714744 0 029107 844 968sandyclang-17_-march=native_-O2_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1614513944 0 025540 836 968sandyclang-17_-march=native_-Os_-fomit-frame-pointer_-fwrapv_-Qunused-arguments_-fPIC_-fPIE2023121920231217
1650215022 0 025264 780 936sandygcc_-march=native_-mtune=native_-Os_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1670615642 0 029461 804 1032sandygcc_-march=native_-mtune=native_-O3_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217
1672315419 0 027165 804 968sandygcc_-march=native_-mtune=native_-O2_-fomit-frame-pointer_-fwrapv_-fPIC_-fPIE2023121920231217

Compiler output

Implementation: sse41
Security model: constbranchindex
Compiler: clang-17 -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: In file included from hash.c:8:
hash.c: ./rounds.h:8:10: warning: '_mm_roti_epi64' macro redefined [-Wmacro-redefined]
hash.c: 8 | #define _mm_roti_epi64(x, c) \
hash.c: | ^
hash.c: /usr/lib/llvm-17/lib/clang/17/include/xopintrin.h:236:9: note: previous definition is here
hash.c: 236 | #define _mm_roti_epi64(A, N) \
hash.c: | ^
hash.c: 1 warning generated.

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
clang-17 -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41
clang-17 -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41
clang-17 -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE sse41

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: clang-17 -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:81:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 81 | m0 = BSWAP64(m0);
hash.c: | ^
hash.c: ./rounds.h:13:21: note: expanded from macro 'BSWAP64'
hash.c: 13 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:82:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 82 | m1 = BSWAP64(m1);
hash.c: | ^
hash.c: ./rounds.h:13:21: note: expanded from macro 'BSWAP64'
hash.c: 13 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:83:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 83 | m2 = BSWAP64(m2);
hash.c: | ^
hash.c: ./rounds.h:13:21: note: expanded from macro 'BSWAP64'
hash.c: 13 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:84:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 84 | m3 = BSWAP64(m3);
hash.c: | ^
hash.c: ./rounds.h:13:21: note: expanded from macro 'BSWAP64'
hash.c: 13 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:85:8: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: ...

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
clang-17 -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop
clang-17 -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop
clang-17 -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop

Compiler output

Implementation: xop
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/13/include/x86intrin.h:38,
hash.c: from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h: In function 'blake512_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:272:1: error: inlining failed in call to 'always_inline' '_mm_roti_epi64': target specific option mismatch
hash.c: 272 | _mm_roti_epi64(__m128i __A, const int __B)
hash.c: | ^~~~~~~~~~~~~~
hash.c: In file included from hash.c:8:
hash.c: rounds.h:825:11: note: called from here
hash.c: 825 | row2h = _mm_roti_epi64(row2h, -11); \
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: rounds.h:867:3: note: in expansion of macro 'G2'
hash.c: 867 | G2(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
hash.c: | ^~
hash.c: hash.c:132:3: note: in expansion of macro 'ROUND'
hash.c: 132 | ROUND(15);
hash.c: | ^~~~~
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:272:1: error: inlining failed in call to 'always_inline' '_mm_roti_epi64': target specific option mismatch
hash.c: 272 | _mm_roti_epi64(__m128i __A, const int __B)
hash.c: | ^~~~~~~~~~~~~~
hash.c: rounds.h:824:11: note: called from here
hash.c: 824 | row2l = _mm_roti_epi64(row2l, -11); \
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: rounds.h:867:3: note: in expansion of macro 'G2'
hash.c: 867 | G2(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
hash.c: | ^~
hash.c: ...

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE xop

Compiler output

Implementation: xop-2
Security model: constbranchindex
Compiler: clang-17 -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE
hash.c: hash.c:92:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 92 | m.u128[0] = BSWAP64(m.u128[0]);
hash.c: | ^
hash.c: ./rounds.h:15:21: note: expanded from macro 'BSWAP64'
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:93:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 93 | m.u128[1] = BSWAP64(m.u128[1]);
hash.c: | ^
hash.c: ./rounds.h:15:21: note: expanded from macro 'BSWAP64'
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:94:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 94 | m.u128[2] = BSWAP64(m.u128[2]);
hash.c: | ^
hash.c: ./rounds.h:15:21: note: expanded from macro 'BSWAP64'
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:95:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: 95 | m.u128[3] = BSWAP64(m.u128[3]);
hash.c: | ^
hash.c: ./rounds.h:15:21: note: expanded from macro 'BSWAP64'
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^
hash.c: hash.c:96:15: error: always_inline function '_mm_perm_epi8' requires target feature 'xop', but would be inlined into function 'blake512_compress' that is compiled without support for 'xop'
hash.c: ...

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
clang-17 -march=native -O2 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop-2
clang-17 -march=native -O3 -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop-2
clang-17 -march=native -Os -fomit-frame-pointer -fwrapv -Qunused-arguments -fPIC -fPIE xop-2

Compiler output

Implementation: xop-2
Security model: constbranchindex
Compiler: gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE
hash.c: In file included from /usr/lib/gcc/x86_64-linux-gnu/13/include/x86intrin.h:38,
hash.c: from hash.c:5:
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h: In function 'blake512_compress':
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:218:1: error: inlining failed in call to 'always_inline' '_mm_perm_epi8': target specific option mismatch
hash.c: 218 | _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: | ^~~~~~~~~~~~~
hash.c: In file included from hash.c:8:
hash.c: rounds.h:15:28: note: called from here
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: hash.c:99:15: note: in expansion of macro 'BSWAP64'
hash.c: 99 | m.u128[7] = BSWAP64(m.u128[7]);
hash.c: | ^~~~~~~
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:218:1: error: inlining failed in call to 'always_inline' '_mm_perm_epi8': target specific option mismatch
hash.c: 218 | _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: | ^~~~~~~~~~~~~
hash.c: rounds.h:15:28: note: called from here
hash.c: 15 | #define BSWAP64(x) _mm_perm_epi8((x),(x),u8to64)
hash.c: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hash.c: hash.c:98:15: note: in expansion of macro 'BSWAP64'
hash.c: 98 | m.u128[6] = BSWAP64(m.u128[6]);
hash.c: | ^~~~~~~
hash.c: /usr/lib/gcc/x86_64-linux-gnu/13/include/xopintrin.h:218:1: error: inlining failed in call to 'always_inline' '_mm_perm_epi8': target specific option mismatch
hash.c: 218 | _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
hash.c: | ^~~~~~~~~~~~~
hash.c: ...

Number of similar (compiler,implementation) pairs: 3, namely:
CompilerImplementations
gcc -march=native -mtune=native -O2 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop-2
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE xop-2
gcc -march=native -mtune=native -Os -fomit-frame-pointer -fwrapv -fPIC -fPIE xop-2