Optimize BSS use of FFT with cupy, speed up of up to 3x for full tracks

Hello,
I have been working on some potential performance optimizations for the BSS evaluation (which is rather slow/compute intensive for full tracks).

Baseline measurement with the original museval code (the total execution also involves computing the IRM, adapted from https://github.com/sigsep/sigsep-mus-oracle/blob/master/IRM.py):
```
museval bss original execution time, 1 track of musdb
pybin: /home/sevagh/venvs/museval-orig/bin/python3
evaluating track AM Contra - Heart Peripheral

real    3m22.702s
user    3m21.577s
sys     0m39.376s
```
The original code takes about 3 min 20 s of wall-clock time.

The second optimization uses cupy and the GPU, which in my opinion is a big cost/burden for end users: installing the CUDA toolkit etc. is no joke. Here is the code: https://github.com/sigsep/sigsep-mus-eval/compare/master...sevagh:feat/cupy-accel
However, the performance is rather good at about 1 min 20 s of wall-clock time, roughly 2.5x faster than the original code on this track:
```
museval bss optimization 2 (cupy on gpu) execution time, 1 track of musdb
pybin: /home/sevagh/venvs/museval-optimization-2/bin/python3
evaluating track AM Contra - Heart Peripheral

real    1m19.801s
user    1m27.077s
sys     0m29.615s
```
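To illustrate the general idea (this is a minimal sketch, not the code in the branch linked above), an FFT-based correlation can be written against either numpy or cupy, since cupy mirrors the `numpy.fft` interface. The function name `cross_correlation` and the fallback logic are assumptions made for this example; it falls back to numpy when cupy is not installed:

```python
import numpy as np

try:
    import cupy as cp  # optional GPU backend; needs a working CUDA setup
    xp = cp
except ImportError:
    cp = None
    xp = np  # CPU fallback so the sketch still runs without a GPU


def cross_correlation(a, b, n_fft):
    """Circular cross-correlation along the last axis, computed via FFT.

    Hypothetical helper for illustration only; not part of museval.
    """
    # Move inputs to the active backend (a no-op copy-free path for numpy).
    A = xp.fft.fft(xp.asarray(a), n=n_fft, axis=-1)
    B = xp.fft.fft(xp.asarray(b), n=n_fft, axis=-1)
    corr = xp.fft.ifft(A * xp.conj(B), axis=-1).real
    # Bring the result back to host memory if it was computed on the GPU.
    return cp.asnumpy(corr) if cp is not None else corr


signal = np.array([1.0, 2.0, 3.0, 0.0])
impulse = np.array([1.0, 0.0, 0.0, 0.0])
result = cross_correlation(signal, impulse, n_fft=4)
```

Keeping the backend behind a single `xp` name would also let cupy stay an optional dependency, which might soften the installation burden mentioned above.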
One final note is that the CUDA/cupy version produces slightly different outputs due to numerical precision. The differences don't look significant to me (a few 1e-5 in the excerpt below). Here is an excerpt of a diff between the evaluated json files, showing small differences in the BSS scores:
```
@@ -10459,8 +10459,8 @@
-            "SAR": 30.60528,
-            "ISR": 30.67039
+            "SAR": 30.60525,
+            "ISR": 30.67036
@@ -10469,8 +10469,8 @@
-            "SAR": 30.45440,
-            "ISR": 30.52629
+            "SAR": 30.45438,
+            "ISR": 30.52627
@@ -10480,7 +10480,7 @@
-            "ISR": 20.99668
+            "ISR": 20.99667
```
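A quick way to quantify such differences across whole result files, rather than reading the diff by eye, is to walk both JSON structures and report the largest absolute deviation. This is a sketch under the assumption that both files have the same structure; `max_abs_diff` is a hypothetical helper, and the two inline documents reuse values from the first hunk above:

```python
import json


def max_abs_diff(a, b):
    """Largest absolute difference between numeric leaves of two parsed JSON values.

    Assumes `a` and `b` have identical structure (same keys, same list lengths).
    """
    if isinstance(a, dict):
        return max((max_abs_diff(a[k], b[k]) for k in a), default=0.0)
    if isinstance(a, list):
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    if isinstance(a, (int, float)) and not isinstance(a, bool):
        return abs(a - b)
    return 0.0  # strings, None, booleans: nothing to compare numerically


# Values taken from the first hunk of the diff excerpt.
cpu_scores = json.loads('{"SAR": 30.60528, "ISR": 30.67039}')
gpu_scores = json.loads('{"SAR": 30.60525, "ISR": 30.67036}')
worst = max_abs_diff(cpu_scores, gpu_scores)
print(f"max abs difference: {worst:.1e}")
```

In practice the two arguments would come from `json.load` on the original and the cupy result files.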
I'm also trying to find a way to use CPU parallelism with scipy.fft and to combine several of the FFTs into a single call, but so far this isn't helping as much as the CUDA change. My code attempts can be seen here: https://github.com/sigsep/sigsep-mus-eval/compare/master...sevagh:multiple-1d-fft
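For context, the batching idea looks roughly like this (a sketch, not the code in the branch): stack equal-length signals into one array and issue a single `scipy.fft.fft` call along the last axis, letting the `workers` argument spread the rows over CPU threads. The helper name `batched_fft` is an assumption for this example:

```python
import numpy as np
import scipy.fft


def batched_fft(signals, n_fft, workers=-1):
    """FFT of several equal-length 1-D signals in one call.

    Hypothetical helper for illustration. workers=-1 asks scipy.fft to use
    all available CPU cores.
    """
    stacked = np.stack(signals)  # shape: (n_signals, n_samples)
    return scipy.fft.fft(stacked, n=n_fft, axis=-1, workers=workers)


rng = np.random.default_rng(0)
signals = [rng.standard_normal(1024) for _ in range(4)]
spectra = batched_fft(signals, n_fft=2048)
```

Each row of `spectra` should match a separate per-signal FFT with the same `n_fft`, so the batching only changes how the work is scheduled, not the result.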

I'm aware of the separate repo for bss at https://github.com/sigsep/bsseval/ but I wasn't sure which project to discuss it in - I'm using museval because I'm trying to recreate the SiSec 2018 testbench.
