Optimize BSS use of FFT with cupy, speed up of up to 3x for full tracks #83

@sevagh

Hello,
I have been working on some potential performance optimizations for the BSS evaluation (which is rather slow/compute intensive for full tracks).

Baseline measurement with the original museval code (the total execution also involves computing the IRM, adapted from https://github.com/sigsep/sigsep-mus-oracle/blob/master/IRM.py):

museval bss original execution time, 1 track of musdb
pybin: /home/sevagh/venvs/museval-orig/bin/python3
evaluating track AM Contra - Heart Peripheral

real    3m22.702s
user    3m21.577s
sys     0m39.376s

The original code takes ~3:20 minutes.

The second optimization uses cupy to run the FFTs on the GPU, which in my opinion is a significant burden for end users: installing the CUDA toolkit is no small task. Here is the code: master...sevagh:feat/cupy-accel
However, the performance is rather good at ~1:20 minutes, roughly 2.5x faster than the original code in wall-clock time (202.7 s vs. 79.8 s):

museval bss optimization 2 (cupy on gpu) execution time, 1 track of musdb
pybin: /home/sevagh/venvs/museval-optimization-2/bin/python3
evaluating track AM Contra - Heart Peripheral

real    1m19.801s
user    1m27.077s
sys     0m29.615s
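
For readers unfamiliar with the approach, here is a minimal sketch of moving an FFT-based cross-correlation to the GPU with cupy. It is not the code from the branch above: the function name `cross_correlate` and its parameters are hypothetical, and the numpy fallback is only there so the example also runs on machines without CUDA.

```python
import numpy as np

# Pick the array module: cupy if a CUDA device is usable, numpy otherwise.
try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device is present
    xp = cp
except Exception:
    xp = np  # CPU fallback so the sketch still runs without CUDA


def cross_correlate(a, b, max_lag):
    """Hypothetical helper: cross-correlation of two 1-D signals via FFT.

    Returns sum_n a[n] * b[n + k] for lags k = 0 .. max_lag - 1.
    """
    # Zero-pad to a power of two so the circular correlation equals the linear one.
    n_fft = int(2 ** np.ceil(np.log2(len(a) + len(b))))
    a_dev = xp.asarray(a)  # host-to-device copy when xp is cupy
    b_dev = xp.asarray(b)
    fa = xp.fft.fft(a_dev, n=n_fft)
    fb = xp.fft.fft(b_dev, n=n_fft)
    corr = xp.fft.ifft(xp.conj(fa) * fb).real[:max_lag]
    if xp is np:
        return corr
    return xp.asnumpy(corr)  # device-to-host copy


result = cross_correlate(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0]), max_lag=3)
```

Because cupy mirrors much of the numpy API, the same function body serves both paths; the host/device copies are the main extra cost, so the GPU version tends to pay off only for long signals such as full tracks.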

One final note: the CUDA/cupy version produces slightly different outputs because of numerical precision differences. They don't look significant to me. Here's an excerpt of a diff between the evaluated JSON files, showing differences in the fifth decimal place of the BSS scores:

@@ -10459,8 +10459,8 @@
-            "SAR": 30.60528,
-            "ISR": 30.67039
+            "SAR": 30.60525,
+            "ISR": 30.67036
@@ -10469,8 +10469,8 @@
-            "SAR": 30.45440,
-            "ISR": 30.52629
+            "SAR": 30.45438,
+            "ISR": 30.52627
@@ -10480,7 +10480,7 @@
-            "ISR": 20.99668
+            "ISR": 20.99667
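
To put a number on the differences in the excerpt above, a small check like the following could be used. The values are copied from the diff; the tolerance of 1e-4 is an assumption chosen for illustration, not a threshold defined by museval.

```python
import numpy as np

# Score pairs copied from the diff excerpt (original run, cupy run).
original = np.array([30.60528, 30.67039, 30.45440, 30.52629, 20.99668])
cupy_run = np.array([30.60525, 30.67036, 30.45438, 30.52627, 20.99667])

# Largest absolute deviation between the two runs.
max_abs_diff = float(np.max(np.abs(original - cupy_run)))

# Assumed tolerance: treat the runs as equivalent if every score agrees to 1e-4.
within_tolerance = bool(np.allclose(original, cupy_run, rtol=0.0, atol=1e-4))

print(f"max abs diff: {max_abs_diff:.1e}, within 1e-4: {within_tolerance}")
```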

I'm also trying to find a way to use CPU parallelism with scipy.fft and to combine several of the FFTs into a single call, but this isn't helping nearly as much as the CUDA change. My attempts can be seen here: master...sevagh:multiple-1d-fft
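
As an illustration of the CPU idea, here is a sketch of batching several 1-D FFTs into one `scipy.fft.fft` call and letting scipy spread the work across cores with its `workers` argument. The helper name `batched_fft` is hypothetical and this is not the code from the branch above; it also assumes all signals have the same length.

```python
import numpy as np
import scipy.fft


def batched_fft(signals, n_fft, workers=-1):
    """Hypothetical helper: FFT of several equal-length 1-D signals in one call.

    workers=-1 asks scipy.fft to use all available CPU cores.
    """
    stacked = np.stack(signals)  # shape: (n_signals, n_samples)
    # One call transforms every row along the last axis.
    return scipy.fft.fft(stacked, n=n_fft, axis=-1, workers=workers)


rng = np.random.default_rng(0)
signals = [rng.standard_normal(100) for _ in range(4)]
spectra = batched_fft(signals, n_fft=128)
```

The parallelism here is across rows of the batch, so the gain depends on how many transforms can be stacked per call and on how large each one is.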

I'm aware of the separate repo for BSS at https://github.com/sigsep/bsseval/ but I wasn't sure which project to discuss this in. I'm using museval because I'm trying to recreate the SiSEC 2018 testbench.
