
Commit a6f55ee

Merge #6796: crypto: assume true 64-bit target, add benchmarks for constituent hashes, apply stronger optimizations
44eebe7 chore: resolve `constVariable` linter error (Kittywhiskers Van Gogh)
30bf19d crypto: isolate sphlib sources and apply stronger optimizations to it (Kittywhiskers Van Gogh)
3ec483e bench: more final proof of work benchmark to `pow_hash` (Kittywhiskers Van Gogh)
fb54ce0 bench: add benchmarks for constituent hash algorithms for proof of work (Kittywhiskers Van Gogh)
ee6efaa refactor: remove unused macros and function definitions (Kittywhiskers Van Gogh)
32156b2 build: assume true 64-bit target, assume `SPH_64` and related macros (Kittywhiskers Van Gogh)

Pull request description:

## Additional Information

* sphlib requires platforms to be `SPH_64` in order to build variants like Blake512 ([source](https://github.com/dashpay/dash/blob/489c5f0127891c2d058b65e00750ff3100d6d7b8/src/crypto/x11/blake.c#L632-L663)) and Bmw512 ([source](https://github.com/dashpay/dash/blob/489c5f0127891c2d058b65e00750ff3100d6d7b8/src/crypto/x11/bmw.c#L545-L576)). While other variants, like Groestl ([source](https://github.com/dashpay/dash/blob/489c5f0127891c2d058b65e00750ff3100d6d7b8/src/crypto/x11/groestl.c#L46-L56)), have alternate implementations for non-`SPH_64` platforms, non-`SPH_64` platforms _cannot_ build Blake512 or Bmw512. As X11 (and by extension, Dash) requires both, we can safely assume that Dash Core doesn't support non-`SPH_64` targets and can remove the fallback code.
* To inform decisions when optimizing X11, this pull request introduces benchmarks for all constituent hash algorithms across the same set of data sizes (32b, 80b, 128b, 512b, 1024b, 2048b and 1M). All tests can be run with `bench_dash --filter="Pow(.*)"`, and results can be filtered by data size with `bench_dash --filter="Pow(.*)1024b(.*)"` (see below).

<details>
<summary>Benchmarks:</summary>

```
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 1.60 | 625,125,455.46 | 0.3% | 25.60 | 7.10 | 3.603 | 0.13 | 0.0% | 0.01 | `Pow_Blake512_1024b`
| 0.75 | 1,333,435,941.89 | 0.2% | 17.06 | 3.33 | 5.124 | 0.12 | 0.0% | 0.01 | `Pow_Bmw512_1024b`
| 6.24 | 160,132,129.57 | 0.0% | 118.36 | 27.74 | 4.267 | 0.25 | 0.4% | 0.01 | `Pow_Cubehash512_1024b`
| 14.64 | 68,308,839.95 | 0.0% | 261.17 | 65.02 | 4.017 | 1.63 | 0.6% | 0.01 | `Pow_Echo512_1024b`
| 8.72 | 114,678,066.51 | 0.1% | 169.16 | 38.73 | 4.368 | 0.25 | 0.4% | 0.01 | `Pow_Groestl512_1024b`
| 8.87 | 112,774,410.39 | 0.1% | 153.89 | 39.38 | 3.907 | 0.15 | 0.6% | 0.01 | `Pow_Jh512_1024b`
| 3.79 | 264,078,320.88 | 0.1% | 81.14 | 16.81 | 4.826 | 0.23 | 0.0% | 0.01 | `Pow_Keccak512_1024b`
| 6.87 | 145,603,041.03 | 0.1% | 131.55 | 30.46 | 4.319 | 1.11 | 0.1% | 0.01 | `Pow_Luffa512_1024b`
| 7.54 | 132,578,659.24 | 0.1% | 119.56 | 33.48 | 3.572 | 0.26 | 0.4% | 0.01 | `Pow_Shavite512_1024b`
| 7.24 | 138,072,734.74 | 0.5% | 129.58 | 32.09 | 4.037 | 2.38 | 0.0% | 0.01 | `Pow_Simd512_1024b`
| 1.17 | 857,609,484.77 | 0.0% | 20.67 | 5.18 | 3.992 | 0.15 | 0.0% | 0.01 | `Pow_Skein512_1024b`
| 11.45 | 87,351,960.89 | 0.0% | 195.23 | 50.85 | 3.840 | 1.30 | 0.1% | 0.01 | `Pow_X11_1024b`
```

</details>

* This pull request also makes changes that affect performance. The baseline measurement is given below.

Build parameters: `LDFLAGS="-Wl,--as-needed -Wl,-O2" CC=clang-16 CXX=clang++-16 ./configure --prefix=$(pwd)/depends/x86_64-linux-gnu --enable-reduce-exports --disable-tests --disable-gui-tests --disable-fuzz --disable-fuzz-binary --disable-ccache --disable-maintainer-mode --disable-dependency-tracking --without-gui`

<details>
<summary>Baseline benchmarks (489c5f0):</summary>

```
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 321.49 | 3,110,502.97 | 0.1% | 5,527.82 | 1,427.78 | 3.872 | 38.97 | 0.1% | 0.01 | `X11_0032b_single`
| 128.63 | 7,774,397.65 | 0.1% | 2,211.03 | 571.22 | 3.871 | 15.56 | 0.1% | 0.01 | `X11_0080b_single`
| 81.97 | 12,199,203.11 | 0.1% | 1,404.79 | 364.00 | 3.859 | 9.80 | 0.1% | 0.01 | `X11_0128b_single`
| 21.55 | 46,394,313.29 | 0.1% | 368.03 | 95.73 | 3.844 | 2.51 | 0.1% | 0.01 | `X11_0512b_single`
| 11.48 | 87,085,396.93 | 0.1% | 195.23 | 51.00 | 3.828 | 1.30 | 0.1% | 0.01 | `X11_1024b_single`
| 1.43 | 700,791,473.89 | 0.1% | 22.61 | 6.33 | 3.572 | 0.09 | 0.0% | 0.02 | `X11_1M`
| 6.45 | 155,097,633.83 | 0.0% | 108.83 | 28.64 | 3.801 | 0.69 | 0.1% | 0.01 | `X11_2048b_single`
```

</details>

Enablement of small footprint for JH and CubeHash (~10% improvement):

<details>
<summary>Benchmarks (30bf19d):</summary>

```
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 291.68 | 3,428,360.28 | 0.1% | 5,135.01 | 1,294.73 | 3.966 | 41.91 | 0.1% | 0.01 | `Pow_X11_0032b`
| 117.46 | 8,513,845.07 | 0.1% | 2,053.90 | 518.66 | 3.960 | 16.74 | 0.1% | 0.01 | `Pow_X11_0080b`
| 75.22 | 13,293,803.11 | 0.1% | 1,305.23 | 332.11 | 3.930 | 10.53 | 0.1% | 0.01 | `Pow_X11_0128b`
| 19.65 | 50,880,118.90 | 0.0% | 342.12 | 86.79 | 3.942 | 2.70 | 0.1% | 0.01 | `Pow_X11_0512b`
| 10.46 | 95,567,763.92 | 0.1% | 181.60 | 46.21 | 3.930 | 1.39 | 0.1% | 0.01 | `Pow_X11_1024b`
| 1.20 | 835,966,628.21 | 0.4% | 21.24 | 5.28 | 4.023 | 0.09 | 0.0% | 0.01 | `Pow_X11_1M`
| 5.83 | 171,499,428.31 | 0.2% | 101.34 | 25.75 | 3.936 | 0.74 | 0.1% | 0.01 | `Pow_X11_2048b`
```

</details>

## Breaking Changes

None expected.

## Checklist:

- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added or updated relevant unit/integration/functional/e2e tests
- [x] I have made corresponding changes to the documentation **(note: N/A)**
- [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
UdjinM6:
utACK 44eebe7

Tree-SHA512: 26afe6b005b76285f8d7032e516d70daaa257e64066d77a4626f9da17f5b7208d5ca345975074fb539bcd7984ba2a923d02c73c2e9cdbe440d2fb7175df8b542
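To compare the two X11 tables above: byte/s is 10^9 divided by ns/byte, and the gain of 30bf19d over the 489c5f0 baseline can be computed per input size from the ns/byte column. The sketch below is an illustration, not part of the PR; it hard-codes the ns/byte values from those tables, and `Row`, `kRows`, `bytes_per_second`, `improvement_pct` and `print_report` are hypothetical names. Worked by hand from those figures, the reduction in time per byte is roughly 8 to 10% for the fixed-size inputs and about 16% for the 1M case.

```cpp
#include <array>
#include <cstdio>

// ns/byte values copied from the baseline (489c5f0) and 30bf19d tables.
// The benchmark names differ between the tables (X11_*_single vs. Pow_X11_*)
// because this PR moved the X11 benchmarks; the input sizes are the same.
struct Row {
    const char* size;
    double baseline_ns;  // ns/byte at 489c5f0
    double after_ns;     // ns/byte at 30bf19d
};

constexpr std::array<Row, 7> kRows{{
    {"32b", 321.49, 291.68},
    {"80b", 128.63, 117.46},
    {"128b", 81.97, 75.22},
    {"512b", 21.55, 19.65},
    {"1024b", 11.48, 10.46},
    {"2048b", 6.45, 5.83},
    {"1M", 1.43, 1.20},
}};

// byte/s is the reciprocal of ns/byte, scaled by 1e9 ns per second.
constexpr double bytes_per_second(double ns_per_byte) { return 1e9 / ns_per_byte; }

// Relative reduction in time per byte, in percent (positive means faster).
constexpr double improvement_pct(double before, double after) {
    return (before - after) / before * 100.0;
}

// Prints one line per input size; call it from main() or a test.
inline void print_report() {
    for (const Row& r : kRows) {
        std::printf("%-6s %5.2f%% less time per byte (%.0f -> %.0f byte/s)\n", r.size,
                    improvement_pct(r.baseline_ns, r.after_ns),
                    bytes_per_second(r.baseline_ns), bytes_per_second(r.after_ns));
    }
}
```

Byte/s values recomputed this way differ slightly from the tables (for example 625,000,000 vs. 625,125,455.46 for `Pow_Blake512_1024b`) because nanobench rounds ns/byte to two decimals.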
2 parents 73e56f0 + 44eebe7

23 files changed: +379 / -1933 lines changed

configure.ac

Lines changed: 4 additions & 0 deletions
@@ -378,6 +378,9 @@ if test "$enable_debug" = "yes"; then
 AX_CHECK_PREPROC_FLAG([-DABORT_ON_FAILED_ASSUME], [DEBUG_CPPFLAGS="$DEBUG_CPPFLAGS -DABORT_ON_FAILED_ASSUME"], [], [$CXXFLAG_WERROR])
 AX_CHECK_COMPILE_FLAG([-ftrapv], [DEBUG_CXXFLAGS="$DEBUG_CXXFLAGS -ftrapv"], [], [$CXXFLAG_WERROR])
 else
+dnl If not debugging, enable more aggressive optimizations for sphlib sources
+AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CFLAGS="$SPHLIB_CFLAGS -O3"], [], [$CXXFLAG_WERROR])
+
 # We always enable at at least -g1 debug info to support proper stacktraces in crash infos
 # Stacktraces will be suboptimal due to optimization, but better than nothing. Also, -fno-omit-frame-pointer
 # mitigates this a little bit
@@ -1874,6 +1877,7 @@ AC_SUBST(PIC_FLAGS)
 AC_SUBST(PIE_FLAGS)
 AC_SUBST(SANITIZER_CXXFLAGS)
 AC_SUBST(SANITIZER_LDFLAGS)
+AC_SUBST(SPHLIB_CFLAGS)
 AC_SUBST(SSE42_CXXFLAGS)
 AC_SUBST(SSE41_CXXFLAGS)
 AC_SUBST(CLMUL_CXXFLAGS)

src/Makefile.am

Lines changed: 13 additions & 3 deletions
@@ -73,6 +73,8 @@ LIBBITCOIN_WALLET_TOOL=libbitcoin_wallet_tool.a
 endif

 LIBBITCOIN_CRYPTO = $(LIBBITCOIN_CRYPTO_BASE)
+LIBBITCOIN_CRYPTO_SPH = crypto/libbitcoin_crypto_sph.la
+LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_SPH)
 if ENABLE_SSE41
 LIBBITCOIN_CRYPTO_SSE41 = crypto/libbitcoin_crypto_sse41.la
 LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_SSE41)
@@ -717,8 +719,16 @@ crypto_libbitcoin_crypto_avx2_la_CXXFLAGS += $(AVX2_CXXFLAGS)
 crypto_libbitcoin_crypto_avx2_la_CPPFLAGS += -DENABLE_AVX2
 crypto_libbitcoin_crypto_avx2_la_SOURCES = crypto/sha256_avx2.cpp

-# x11
-crypto_libbitcoin_crypto_base_la_SOURCES += \
+# See explanation for -static in crypto_libbitcoin_crypto_base_la's LDFLAGS and
+# CXXFLAGS above
+crypto_libbitcoin_crypto_sph_la_LDFLAGS = $(AM_LDFLAGS) -static
+crypto_libbitcoin_crypto_sph_la_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS) -static
+crypto_libbitcoin_crypto_sph_la_CPPFLAGS = $(AM_CPPFLAGS)
+crypto_libbitcoin_crypto_sph_la_CFLAGS = $(SPHLIB_CFLAGS)
+crypto_libbitcoin_crypto_sph_la_CPPFLAGS += \
+-DSPH_SMALL_FOOTPRINT_CUBEHASH=1 \
+-DSPH_SMALL_FOOTPRINT_JH=1
+crypto_libbitcoin_crypto_sph_la_SOURCES = \
 crypto/x11/aes_helper.c \
 crypto/x11/blake.c \
 crypto/x11/bmw.c \
@@ -1008,7 +1018,7 @@ endif
 # dashconsensus library #
 if BUILD_BITCOIN_LIBS
 include_HEADERS = script/bitcoinconsensus.h
-libdashconsensus_la_SOURCES = support/cleanse.cpp $(crypto_libbitcoin_crypto_base_la_SOURCES) $(libbitcoin_consensus_a_SOURCES)
+libdashconsensus_la_SOURCES = support/cleanse.cpp $(crypto_libbitcoin_crypto_base_la_SOURCES) $(crypto_libbitcoin_crypto_sph_la_SOURCES) $(libbitcoin_consensus_a_SOURCES)

 libdashconsensus_la_LDFLAGS = $(AM_LDFLAGS) -no-undefined $(RELDFLAGS)
 libdashconsensus_la_LIBADD = $(LIBDASHBLS) $(LIBSECP256K1) $(GMP_LIBS)
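The new sph library target is compiled with `-DSPH_SMALL_FOOTPRINT_CUBEHASH=1` and `-DSPH_SMALL_FOOTPRINT_JH=1`, and, outside debug builds, with `-O3` through `SPHLIB_CFLAGS`. In sphlib, `SPH_SMALL_FOOTPRINT_*` switches select a smaller-code implementation of the corresponding hash at compile time. The sketch below only illustrates that preprocessor pattern: `TOY_SMALL_FOOTPRINT`, `toy_round` and the `toy_permute*` functions are hypothetical and are not CubeHash or JH code. The PR does not say why the compact variants measured faster here; lower instruction-cache pressure is one plausible explanation, but that is an assumption.

```cpp
#include <cstdint>

// Hypothetical switch, modelled on sphlib's SPH_SMALL_FOOTPRINT_* macros (not a real sphlib macro).
#ifndef TOY_SMALL_FOOTPRINT
#define TOY_SMALL_FOOTPRINT 1
#endif

// Toy rotate-and-add round keyed by the round index r; NOT the CubeHash or JH round.
constexpr std::uint32_t toy_round(std::uint32_t x, std::uint32_t r) {
    return ((x << 7) | (x >> 25)) + (x ^ r);
}

// Compact variant: one round body executed in a loop (less machine code).
constexpr std::uint32_t toy_permute_looped(std::uint32_t x) {
    for (std::uint32_t r = 0; r < 8; ++r) {
        x = toy_round(x, r);
    }
    return x;
}

// Unrolled variant: the same eight rounds written out (more machine code, no loop counter).
constexpr std::uint32_t toy_permute_unrolled(std::uint32_t x) {
    x = toy_round(x, 0);
    x = toy_round(x, 1);
    x = toy_round(x, 2);
    x = toy_round(x, 3);
    x = toy_round(x, 4);
    x = toy_round(x, 5);
    x = toy_round(x, 6);
    x = toy_round(x, 7);
    return x;
}

// The switch picks one implementation at compile time; callers only see toy_permute().
constexpr std::uint32_t toy_permute(std::uint32_t x) {
#if TOY_SMALL_FOOTPRINT
    return toy_permute_looped(x);
#else
    return toy_permute_unrolled(x);
#endif
}
```

Building with `-DTOY_SMALL_FOOTPRINT=0` selects the unrolled path. Both paths compute the same function, which is what allows such a switch to be flipped purely for performance, as the Makefile does for CubeHash and JH.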

src/Makefile.bench.include

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ bench_bench_dash_SOURCES = \
 bench/nanobench.cpp \
 bench/peer_eviction.cpp \
 bench/pool.cpp \
+bench/pow_hash.cpp \
 bench/rpc_blockchain.cpp \
 bench/rpc_mempool.cpp \
 bench/strencodings.cpp \

src/bench/crypto_hash.cpp

Lines changed: 0 additions & 72 deletions
@@ -295,15 +295,6 @@ static void DSHA256_1M(benchmark::Bench& bench)
     });
 }

-static void X11_1M(benchmark::Bench& bench)
-{
-    uint256 hash;
-    std::vector<uint8_t> in(BUFFER_SIZE,0);
-    bench.batch(in.size()).unit("byte").run([&] {
-        hash = HashX11(in.begin(), in.end());
-    });
-}
-
 /* Hash different number of bytes via DSHA256 */
 static void DSHA256_0032b_single(benchmark::Bench& bench)
 {
@@ -359,74 +350,11 @@ static void DSHA256_2048b_single(benchmark::Bench& bench)
     });
 }

-/* Hash different number of bytes via X11 */
-static void X11_0032b_single(benchmark::Bench& bench)
-{
-    uint256 hash;
-    std::vector<uint8_t> in(32,0);
-    bench.batch(in.size()).unit("byte").run([&] {
-        hash = HashX11(in.begin(), in.end());
-    });
-}
-
-static void X11_0080b_single(benchmark::Bench& bench)
-{
-    uint256 hash;
-    std::vector<uint8_t> in(80,0);
-    bench.batch(in.size()).unit("byte").run([&] {
-        hash = HashX11(in.begin(), in.end());
-    });
-}
-
-static void X11_0128b_single(benchmark::Bench& bench)
-{
-    uint256 hash;
-    std::vector<uint8_t> in(128,0);
-    bench.batch(in.size()).unit("byte").run([&] {
-        hash = HashX11(in.begin(), in.end());
-    });
-}
-
-static void X11_0512b_single(benchmark::Bench& bench)
-{
-    uint256 hash;
-    std::vector<uint8_t> in(512,0);
-    bench.batch(in.size()).unit("byte").run([&] {
-        hash = HashX11(in.begin(), in.end());
-    });
-}
-
-static void X11_1024b_single(benchmark::Bench& bench)
-{
-    uint256 hash;
-    std::vector<uint8_t> in(1024,0);
-    bench.batch(in.size()).unit("byte").run([&] {
-        hash = HashX11(in.begin(), in.end());
-    });
-}
-
-static void X11_2048b_single(benchmark::Bench& bench)
-{
-    uint256 hash;
-    std::vector<uint8_t> in(2048,0);
-    bench.batch(in.size()).unit("byte").run([&] {
-        hash = HashX11(in.begin(), in.end());
-    });
-}
-
 BENCHMARK(DSHA256_1M);
-BENCHMARK(X11_1M);

 BENCHMARK(DSHA256_0032b_single);
 BENCHMARK(DSHA256_0080b_single);
 BENCHMARK(DSHA256_0128b_single);
 BENCHMARK(DSHA256_0512b_single);
 BENCHMARK(DSHA256_1024b_single);
 BENCHMARK(DSHA256_2048b_single);
-
-BENCHMARK(X11_0032b_single);
-BENCHMARK(X11_0080b_single);
-BENCHMARK(X11_0128b_single);
-BENCHMARK(X11_0512b_single);
-BENCHMARK(X11_1024b_single);
-BENCHMARK(X11_2048b_single);