Commit 84e62f0

Add Demucs v3/hdemucs_mmi inference (#17)
1 parent fedd352 commit 84e62f0

23 files changed, 6871 insertions(+), 254 deletions(-)

.github/PERFORMANCE.md

Lines changed: 7 additions & 0 deletions
@@ -54,3 +54,10 @@ sys 3m28.465s
 ```

 More than 2x faster for 4 threads. This is inspired by the parallelism strategy used in <https://freemusicdemixer.com>.
+
+V3 is a faster algorithm and the mt variant (with 4 threads) runs in 2.5 min:
+```
+real 2m35.737s
+user 10m28.019s
+sys 2m42.292s
+```
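
As a rough reading of the v3 timing added in this hunk, dividing `user` by `real` estimates how many cores were busy in user mode on average. This is a minimal sketch, assuming the `NmS.SSSs` duration format printed by the shell's `time`. The helper names `parse_time` and `avg_busy_cores` are hypothetical and not part of this codebase.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '2m35.737s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def avg_busy_cores(real: str, user: str) -> float:
    """Ratio of user-mode CPU time to wall-clock time (hypothetical helper)."""
    return parse_time(user) / parse_time(real)


# Values copied from the v3 multi-threaded measurement in the hunk above.
ratio = avg_busy_cores(real="2m35.737s", user="10m28.019s")
print(f"user/real = {ratio:.2f}")
```

The ratio comes out at roughly 4, which is consistent with the stated 4 threads. It ignores `sys` time, so treat it as an estimate only.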

.github/SDR_scores.md

Lines changed: 38 additions & 12 deletions
@@ -13,10 +13,10 @@ other ==> SDR: 7.421 SIR: 11.289 ISR: 14.241 SAR: 8.179
 ```
 CPP inference (this codebase):
 ```
-vocals ==> SDR: 8.339 SIR: 18.276 ISR: 15.836 SAR: 8.346
-drums ==> SDR: 10.058 SIR: 18.596 ISR: 17.019 SAR: 10.810
-bass ==> SDR: 3.919 SIR: 12.436 ISR: 6.931 SAR: 3.182
-other ==> SDR: 7.421 SIR: 11.286 ISR: 14.252 SAR: 8.183
+vocals ==> SDR: 8.370 SIR: 18.188 ISR: 15.924 SAR: 8.475
+drums ==> SDR: 10.002 SIR: 18.571 ISR: 17.027 SAR: 10.645
+bass ==> SDR: 4.021 SIR: 12.407 ISR: 7.031 SAR: 3.223
+other ==> SDR: 7.469 SIR: 11.367 ISR: 14.186 SAR: 8.182
 ```
 *n.b.* for the above results, the random shift in the beginning of the song was fixed to 1337 in both PyTorch and C++.

@@ -33,10 +33,10 @@ other ==> SDR: 0.168 SIR: 11.449 ISR: 0.411 SAR: -2.720
 ```
 CPP inference (this codebase):
 ```
-vocals ==> SDR: 8.395 SIR: 18.699 ISR: 16.076 SAR: 8.576
-drums ==> SDR: 9.927 SIR: 17.921 ISR: 17.518 SAR: 10.635
-bass ==> SDR: 4.519 SIR: 10.458 ISR: 8.606 SAR: 4.370
-other ==> SDR: 0.164 SIR: 11.443 ISR: 0.409 SAR: -2.713
+vocals ==> SDR: 8.395 SIR: 18.581 ISR: 16.101 SAR: 8.579
+drums ==> SDR: 9.922 SIR: 18.013 ISR: 17.477 SAR: 10.669
+bass ==> SDR: 4.523 SIR: 10.482 ISR: 8.567 SAR: 4.336
+other ==> SDR: 0.167 SIR: 11.145 ISR: 0.448 SAR: -1.238
 ```

 *n.b.* the "other" score will be artificially low because of the extra guitar + piano separation where there are no stems to compare to
@@ -54,10 +54,36 @@ other ==> SDR: 7.384 SIR: 12.812 ISR: 12.977 SAR: 7.798
 ```
 CPP inference (this codebase, `demucs_ft.cpp`)
 ```
-vocals ==> SDR: 8.594 SIR: 19.045 ISR: 16.313 SAR: 8.617
-drums ==> SDR: 10.463 SIR: 19.782 ISR: 17.144 SAR: 11.132
-bass ==> SDR: 4.584 SIR: 9.359 ISR: 9.068 SAR: 4.885
-other ==> SDR: 7.426 SIR: 12.793 ISR: 12.975 SAR: 7.830
+vocals ==> SDR: 8.679 SIR: 18.861 ISR: 16.611 SAR: 8.664
+drums ==> SDR: 10.480 SIR: 19.898 ISR: 17.125 SAR: 11.053
+bass ==> SDR: 4.590 SIR: 9.516 ISR: 9.102 SAR: 4.935
+other ==> SDR: 7.370 SIR: 12.853 ISR: 12.926 SAR: 7.805
+```
+
+### Performance of v3 (hdemucs_mmi) model
+
+Track 'Zeno - Signs' from MUSDB18-HQ test set
+
+PyTorch inference (using v3-mmi default segment length + LSTM max length of 200):
+```
+vocals ==> SDR: 8.328 SIR: 18.943 ISR: 16.097 SAR: 8.563
+drums ==> SDR: 9.284 SIR: 18.123 ISR: 16.230 SAR: 10.125
+bass ==> SDR: 3.612 SIR: 10.313 ISR: 6.958 SAR: 3.077
+other ==> SDR: 7.122 SIR: 11.391 ISR: 14.363 SAR: 7.910
+```
+PyTorch inference (using v4 7.8s segment length + LSTM max length of 336):
+```
+vocals ==> SDR: 8.304 SIR: 18.916 ISR: 16.087 SAR: 8.557
+drums ==> SDR: 9.279 SIR: 18.149 ISR: 16.203 SAR: 10.109
+bass ==> SDR: 3.601 SIR: 10.350 ISR: 6.971 SAR: 3.076
+other ==> SDR: 7.123 SIR: 11.373 ISR: 14.373 SAR: 7.907
+```
+CPP inference (this codebase, `demucs_v3.cpp`):
+```
+vocals ==> SDR: 8.332 SIR: 18.889 ISR: 16.083 SAR: 8.557
+drums ==> SDR: 9.285 SIR: 18.242 ISR: 16.194 SAR: 10.140
+bass ==> SDR: 3.668 SIR: 10.040 ISR: 7.056 SAR: 3.210
+other ==> SDR: 7.130 SIR: 11.440 ISR: 14.257 SAR: 7.860
 ```

 ### Performance of multi-threaded inference
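
To make the v3 (`hdemucs_mmi`) comparison in the `.github/SDR_scores.md` changes above concrete, the SDR column can be parsed and differenced per stem. This is a minimal sketch using only the numbers quoted in that hunk (PyTorch with the v3-mmi default segment length versus `demucs_v3.cpp`). The helper `parse_sdr` is hypothetical and not part of this codebase.

```python
import re

# Score reports copied from the v3 section of the SDR_scores.md hunk above.
PYTORCH_V3 = """\
vocals ==> SDR: 8.328 SIR: 18.943 ISR: 16.097 SAR: 8.563
drums ==> SDR: 9.284 SIR: 18.123 ISR: 16.230 SAR: 10.125
bass ==> SDR: 3.612 SIR: 10.313 ISR: 6.958 SAR: 3.077
other ==> SDR: 7.122 SIR: 11.391 ISR: 14.363 SAR: 7.910
"""

CPP_V3 = """\
vocals ==> SDR: 8.332 SIR: 18.889 ISR: 16.083 SAR: 8.557
drums ==> SDR: 9.285 SIR: 18.242 ISR: 16.194 SAR: 10.140
bass ==> SDR: 3.668 SIR: 10.040 ISR: 7.056 SAR: 3.210
other ==> SDR: 7.130 SIR: 11.440 ISR: 14.257 SAR: 7.860
"""


def parse_sdr(report: str) -> dict:
    """Map each stem name to its SDR value (hypothetical helper)."""
    scores = {}
    for line in report.splitlines():
        match = re.match(r"(\w+)\s+==>\s+SDR:\s+(-?\d+\.\d+)", line)
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores


pytorch = parse_sdr(PYTORCH_V3)
cpp = parse_sdr(CPP_V3)

# Positive delta means the C++ port scored higher than PyTorch on that stem.
deltas = {stem: round(cpp[stem] - pytorch[stem], 3) for stem in pytorch}
largest_gap = max(abs(delta) for delta in deltas.values())
print(deltas, largest_gap)
```

On these numbers the largest per-stem SDR gap is on bass, at under 0.06 dB.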

CMakeLists.txt

Lines changed: 10 additions & 0 deletions
@@ -104,6 +104,16 @@ target_include_directories(demucs_ft_mt.cpp.main PRIVATE vendor/libnyquist/inclu
 target_include_directories(demucs_ft_mt.cpp.main PRIVATE cli-apps)
 target_link_libraries(demucs_ft_mt.cpp.main demucs.cpp.lib libnyquist)

+add_executable(demucs_v3.cpp.main "cli-apps/demucs_v3.cpp")
+target_include_directories(demucs_v3.cpp.main PRIVATE vendor/libnyquist/include)
+target_include_directories(demucs_v3.cpp.main PRIVATE cli-apps)
+target_link_libraries(demucs_v3.cpp.main demucs.cpp.lib libnyquist)
+
+add_executable(demucs_v3_mt.cpp.main "cli-apps/demucs_v3_mt.cpp")
+target_include_directories(demucs_v3_mt.cpp.main PRIVATE vendor/libnyquist/include)
+target_include_directories(demucs_v3_mt.cpp.main PRIVATE cli-apps)
+target_link_libraries(demucs_v3_mt.cpp.main demucs.cpp.lib libnyquist)
+
 file(GLOB SOURCES_TO_LINT "src/*.cpp" "src/*.hpp" "cli-apps/*.cpp" "cli-apps/*.hpp")

 # add target to run standard lints and formatters

README.md

Lines changed: 19 additions & 18 deletions
@@ -1,8 +1,8 @@
 # demucs.cpp

-C++17 library that implements the inference of the [Demucs v4 hybrid transformer model](https://github.com/facebookresearch/demucs), a PyTorch neural network for music demixing.
+C++17 library that implements inference for the [Demucs v4 hybrid transformer](https://github.com/facebookresearch/demucs) and [Demucs v3 hybrid](https://github.com/facebookresearch/demucs/tree/v3) models, which are high-performance PyTorch neural networks for music source separation.

-It uses only the standard library and the header-only library [Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page) as dependencies, making it suitable to compile and run on many platforms. It was designed for low-memory environments by sacrificing the speed of the Torch implementation.
+It uses only the standard library (C++17) and the header-only library [Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page) as dependencies, making it suitable to compile and run on many platforms. It was designed for low-memory environments by sacrificing the speed of the Torch implementation.

 Demucs.cpp powers my websites (<https://freemusicdemixer.com>, <https://pro.freemusicdemixer.com>) and now my new Android app [Music Demixer](https://play.google.com/store/apps/details?id=com.freemusicdemixer.pro) to bring Demucs to your pocket!

@@ -12,9 +12,11 @@ See my other project [umx.cpp](https://github.com/sevagh/umx.cpp) for a similar

 ### Library design

-It uses [libnyquist](https://github.com/ddiakopoulos/libnyquist) to load audio files, the [ggml](https://github.com/ggerganov/ggml) file format to serialize the PyTorch weights of `htdemucs`, `htdemucs_6s`, and `htdemucs_ft` (4-source, 6-source, fine-tuned) to a binary file format, and [Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page) (+ OpenMP) to implement the inference. There are also programs for multi-threaded Demucs inference using C++11's `std::thread`.
+The inference library (in `src/`) uses the [ggml](https://github.com/ggerganov/ggml) file format to serialize the PyTorch weights of `hdemucs_mmi`, `htdemucs`, `htdemucs_6s`, and `htdemucs_ft` (v3, v4 4-source, v4 6-source, v4 fine-tuned) to a binary file format, and [Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page) to implement the inference (with OpenMP as a requirement).

-**All Hybrid-Transformer weights** (4-source, 6-source, fine-tuned) are supported. See the [Convert weights](#convert-weights) section below. Demixing quality is nearly identical to PyTorch as shown in the [SDR scores doc](./.github/SDR_scores.md).
+The cli programs (in `cli-apps/`) additionally use [libnyquist](https://github.com/ddiakopoulos/libnyquist) to read and write audio files, and the multithreaded cli programs use C++11's `std::thread`.
+
+**All Hybrid-Transformer weights** (4-source, 6-source, fine-tuned) are supported. See the [Convert weights](#convert-weights) section below. Inference for the **Demucs v3 Hybrid model weights** `hdemucs_mmi` is also supported. Demixing quality is practically identical to PyTorch as shown in the [SDR scores doc](./.github/SDR_scores.md).

 ### Directory structure

@@ -23,8 +25,10 @@ It uses [libnyquist](https://github.com/ddiakopoulos/libnyquist) to load audio f
 1. `demucs_ft.cpp.main`: run all four fine-tuned models for `htdemucs_ft` inference, same as the BagOfModels idea of PyTorch Demucs
 1. `demucs_mt.cpp.main`: run a single model, multi-threaded
 1. `demucs_ft_mt.cpp.main`: run all four fine-tuned models, multi-threaded
+1. `demucs_v3.cpp.main`: run a single model for v3 `hdemucs_mmi`
+1. `demucs_v3_mt.cpp.main`: run a single model for v3 `hdemucs_mmi`, multi-threaded

-See the [PERFORMANCE doc](./.github/PERFORMANCE.md) for details on multi-threading, external BLAS libraries, etc..
+See the [PERFORMANCE doc](./.github/PERFORMANCE.md) for time measurements, benchmarks, details on multi-threading, external BLAS libraries, etc.

 ## Instructions

@@ -45,10 +49,6 @@ $ sudo apt-get install gcc g++ cmake clang-tools libopenblas0-openmp libopenblas
 Compile with CMake:
 ```
 $ mkdir -p build && cd build && cmake .. && make -j16
-libdemucs.cpp.lib.a <--- library
-demucs.cpp.main <--- single-model (4s, 6s, ft)
-demucs_ft.cpp.main <--- bag of ft models
-demucs.cpp.test <--- unit tests
 ```

 ### Convert weights
@@ -62,7 +62,7 @@ $ mamba activate demucscpp
 $ python -m pip install -r ./scripts/requirements.txt
 ```

-Dump Demucs weights to ggml file, with flag `--six-source` for the 6-source variant, and all of `--ft-drums, --ft-vocals, --ft-bass, --ft-other` for the fine-tuned models:
+Dump Demucs weights to ggml file, with flag `--six-source` for the 6-source variant, all of `--ft-drums, --ft-vocals, --ft-bass, --ft-other` for the fine-tuned models, and `--v3` for the v3 model:
 ```
 $ python ./scripts/convert-pth-to-ggml.py ./ggml-demucs
 ...
@@ -76,14 +76,15 @@ Done. Output file: ggml-demucs/ggml-model-htdemucs-4s-f16.bin

 All supported models would look like this:
 ```
-$ ls ../ggml-demucs/
-total 133M
-81M Jan 10 22:40 ggml-model-htdemucs-4s-f16.bin
-53M Jan 10 22:41 ggml-model-htdemucs-6s-f16.bin
-81M Jan 10 22:41 ggml-model-htdemucs_ft_drums-4s-f16.bin
-81M Jan 10 22:43 ggml-model-htdemucs_ft_bass-4s-f16.bin
-81M Jan 10 22:43 ggml-model-htdemucs_ft_other-4s-f16.bin
-81M Jan 10 22:43 ggml-model-htdemucs_ft_vocals-4s-f16.bin
+$ ls ./ggml-demucs/
+total 613M
+160M May 5 14:38 ggml-model-hdemucs_mmi-v3-f16.bin
+53M May 5 16:50 ggml-model-htdemucs-6s-f16.bin
+81M May 5 16:50 ggml-model-htdemucs_ft_vocals-4s-f16.bin
+81M May 5 16:50 ggml-model-htdemucs_ft_bass-4s-f16.bin
+81M May 5 16:50 ggml-model-htdemucs_ft_drums-4s-f16.bin
+81M May 5 16:50 ggml-model-htdemucs_ft_other-4s-f16.bin
+81M May 5 16:51 ggml-model-htdemucs-4s-f16.bin
 ```

 ### Run demucs.cpp
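
The updated README listing above names seven converted model files. A quick way to see which are still missing from a local output directory might look like the sketch below. The file names are copied from that listing. The helper `missing_models` and the temporary demo directory are hypothetical and not part of this repository's scripts.

```python
import tempfile
from pathlib import Path

# File names copied from the `ls ./ggml-demucs/` listing in the README hunk above.
EXPECTED_MODELS = [
    "ggml-model-hdemucs_mmi-v3-f16.bin",
    "ggml-model-htdemucs-6s-f16.bin",
    "ggml-model-htdemucs_ft_vocals-4s-f16.bin",
    "ggml-model-htdemucs_ft_bass-4s-f16.bin",
    "ggml-model-htdemucs_ft_drums-4s-f16.bin",
    "ggml-model-htdemucs_ft_other-4s-f16.bin",
    "ggml-model-htdemucs-4s-f16.bin",
]


def missing_models(model_dir: str) -> list:
    """Return the expected model files not present in model_dir (hypothetical helper)."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_MODELS if not (directory / name).is_file()]


# Demo against a throwaway directory holding only the v3 model file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / EXPECTED_MODELS[0]).touch()
    missing = missing_models(tmp)

print(missing)
```

In practice the argument would be the directory passed to `convert-pth-to-ggml.py`, for example `./ggml-demucs`.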
