Skip to content

stb_vorbis: Use a buffer to cache SDL_RWops reads #671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 24, 2025

Conversation

bjorn
Copy link

@bjorn bjorn commented Feb 21, 2025

Requesting the data one byte at a time can be prohibitively expensive, especially when the SDL_RWops isn't backed by constant memory but routed through PhysicsFS for example.

This change introduces a 2kb buffer, which feels small enough to not add too much overhead and is the same size as used by libvorbisfile. In my measurements, using a larger buffer no longer made a significant difference when starting to play an OGG file.

Testing with an uncompressed OGG file read from zip file, timing the call to Mix_LoadMUS_RW:

  • No buffer: ~ 47 ms
  • 10 bytes: ~ 5.2 ms
  • 100 bytes: ~ 1.1 ms
  • 1 kb: ~ 0.62 ms
  • 2 kb: ~ 0.6 ms
  • 8 kb: ~ 0.58 ms

When using the libvorbisfile backend it takes about 0.2 ms.

See #670.

@bjorn
Copy link
Author

bjorn commented Feb 21, 2025

My application isn't updated to SDL3 so I've done the SDL2 patch first, but I can look at adjusting it to SDL3 later.

@sezero
Copy link
Contributor

sezero commented Feb 21, 2025

For C90 compilers:

--- stb_vorbis.h~
+++ stb_vorbis.h
@@ -1441,6 +1441,8 @@ static int getn(vorb *z, uint8 *data, int n)
 {
    #ifdef STB_VORBIS_SDL
    while (n > 0) {
+      int chunk;
+
       if (z->rwops_buffer_pos >= z->rwops_buffer_fill) {
          z->rwops_buffer_fill = SDL_RWread(z->rwops, z->rwops_buffer, 1, RWOPS_BUFFER_SIZE);
          z->rwops_buffer_pos = 0;
@@ -1450,7 +1452,7 @@ static int getn(vorb *z, uint8 *data, int n)
          }
       }
 
-      int chunk = z->rwops_buffer_fill - z->rwops_buffer_pos;
+      chunk = z->rwops_buffer_fill - z->rwops_buffer_pos;
       if (chunk > n) chunk = n;
 
       memcpy(data, z->rwops_buffer + z->rwops_buffer_pos, chunk);
@@ -1510,6 +1512,8 @@ static int set_file_offset(stb_vorbis *f, unsigned int loc)
    f->eof = 0;
 
    #ifdef STB_VORBIS_SDL
+ {
+   unsigned int rwops_pos;
    uint32 buffer_start = f->rwops_virtual_pos - f->rwops_buffer_pos;
    uint32 buffer_end = buffer_start + f->rwops_buffer_fill;
    f->rwops_virtual_pos = loc;
@@ -1521,7 +1525,7 @@ static int set_file_offset(stb_vorbis *f, unsigned int loc)
       return 1;
    }
 
-   unsigned int rwops_pos = loc + f->rwops_start;
+   rwops_pos = loc + f->rwops_start;
    if (rwops_pos < loc || loc >= 0x80000000) {
       rwops_pos = 0x7fffffff;
       f->eof = 1;
@@ -1533,6 +1537,7 @@ static int set_file_offset(stb_vorbis *f, unsigned int loc)
    f->eof = 1;
    SDL_RWseek(f->rwops, f->rwops_start, RW_SEEK_END);
    return 0;
+ }
 
    #else
    if (USE_MEMORY(f)) {

@slouken
Copy link
Collaborator

slouken commented Feb 21, 2025

This looks good! Can you apply @sezero's changes, and I'll go ahead and merge this?

@sezero
Copy link
Contributor

sezero commented Feb 21, 2025

P.S.: Can we apply this to @icculus' SDL_sound, too?

@icculus
Copy link
Collaborator

icculus commented Feb 21, 2025

(This is worth fixing in SDL_mixer, but to be clear: PhysicsFS lets you buffer data, too: PHYSFS_setBuffer())

Requesting the data one byte at a time can be prohibitively expensive,
especially when the SDL_RWops isn't backed by constant memory but routed
through PhysicsFS for example.

This change introduces a 2kb buffer, which feels small enough to not add
too much overhead and is the same size as used by libvorbisfile. In my
measurements, using a larger buffer no longer made a significant
difference when starting to play an OGG file.

Testing with an uncompressed OGG file read from zip file, timing the
call to Mix_LoadMUS_RW:

* No buffer: ~ 47 ms
* 10 bytes: ~ 5.2 ms
* 100 bytes: ~ 1.1 ms
* 1 kb: ~ 0.62 ms
* 2 kb: ~ 0.6 ms
* 8 kb: ~ 0.58 ms

When using the libvorbisfile backend it takes about 0.2 ms.
@sezero
Copy link
Contributor

sezero commented Feb 23, 2025

I took the libert of amending the C90 patch to this branch: OK to merge?

And, here is the SDL3 version of the patch: 671-3.patch
OK to apply after this? (Or can create a PR..)

@bjorn
Copy link
Author

bjorn commented Feb 24, 2025

This is worth fixing in SDL_mixer, but to be clear: PhysicsFS lets you buffer data, too: PHYSFS_setBuffer())

Great, that would have saved quite some time! A quick comparison for reading a particular OGG file with a 2048 byte buffer yielded:

  • No buffer: 47.59 ms
  • Using SDL_RWops wrapping buffer: 0.86 ms
  • Using PHYSFS_setBuffer: 1.41 ms
  • Using buffer in stb_vorbis: 0.65 ms

So PHYSFS_setBuffer is indeed an acceptable solution even if it has a bit more overhead. This patch is still preferred, even if it's now mostly just because it doesn't require the user to set up the buffer explicitly.

I took the libert of amending the C90 patch to this branch: OK to merge?

@sezero Thanks for the C90 fixes and for porting this patch to SDL3! I would only suggest to amend the commit message to note that you've ported it over, but otherwise feel free to apply it wherever it helps.

(btw, if SDL_mixer is supposed to compile as C90, maybe the CMake project could be set up to enforce this?)

@icculus
Copy link
Collaborator

icculus commented Feb 24, 2025

This can all be applied.

(At @bjorn's request, the SDL3 version should have you as the author, and just note it was based on his work in the commit message, though.)

Thanks for all the effort and benchmarking, @bjorn!

@sezero sezero merged commit da1704a into libsdl-org:SDL2 Feb 24, 2025
8 checks passed
sezero added a commit that referenced this pull request Feb 24, 2025
Requesting the data one byte at a time can be prohibitively expensive,
especially when the SDL_RWops isn't backed by constant memory but routed
through PhysicsFS for example.

This change introduces a 2kb buffer, which feels small enough to not add
too much overhead and is the same size as used by libvorbisfile. In my
measurements, using a larger buffer no longer made a significant
difference when starting to play an OGG file.

Testing with an uncompressed OGG file read from zip file, timing the
call to Mix_LoadMUS_RW:

* No buffer: ~ 47 ms
* 10 bytes: ~ 5.2 ms
* 100 bytes: ~ 1.1 ms
* 1 kb: ~ 0.62 ms
* 2 kb: ~ 0.6 ms
* 8 kb: ~ 0.58 ms

When using the libvorbisfile backend it takes about 0.2 ms.

Fixes #670
Forward port of #671
Original patch by: Thorbjørn Lindeijer <bjorn@lindeijer.nl>
sezero added a commit to icculus/SDL_sound that referenced this pull request Feb 24, 2025
Requesting the data one byte at a time can be prohibitively expensive,
especially when the SDL_RWops isn't backed by constant memory but routed
through PhysicsFS for example.

This change introduces a 2kb buffer, which feels small enough to not add
too much overhead and is the same size as used by libvorbisfile. In my
measurements, using a larger buffer no longer made a significant
difference when starting to play an OGG file.

Testing with an uncompressed OGG file read from zip file, timing the
call to Mix_LoadMUS_RW:

* No buffer: ~ 47 ms
* 10 bytes: ~ 5.2 ms
* 100 bytes: ~ 1.1 ms
* 1 kb: ~ 0.62 ms
* 2 kb: ~ 0.6 ms
* 8 kb: ~ 0.58 ms

When using the libvorbisfile backend it takes about 0.2 ms.

Fixes libsdl-org/SDL_mixer#670
Forward port of libsdl-org/SDL_mixer#671
Original patch by: Thorbjørn Lindeijer <bjorn@lindeijer.nl>
sezero pushed a commit to icculus/SDL_sound that referenced this pull request Feb 24, 2025
Requesting the data one byte at a time can be prohibitively expensive,
especially when the SDL_RWops isn't backed by constant memory but routed
through PhysicsFS for example.

This change introduces a 2kb buffer, which feels small enough to not add
too much overhead and is the same size as used by libvorbisfile. In my
measurements, using a larger buffer no longer made a significant
difference when starting to play an OGG file.

Testing with an uncompressed OGG file read from zip file, timing the
call to Mix_LoadMUS_RW:

* No buffer: ~ 47 ms
* 10 bytes: ~ 5.2 ms
* 100 bytes: ~ 1.1 ms
* 1 kb: ~ 0.62 ms
* 2 kb: ~ 0.6 ms
* 8 kb: ~ 0.58 ms

When using the libvorbisfile backend it takes about 0.2 ms.

From libsdl-org/SDL_mixer#671
See libsdl-org/SDL_mixer#670
@sezero
Copy link
Contributor

sezero commented Feb 24, 2025

OK, patches are in. Applied to SDL_sound, too.

Thanks.

@bjorn
Copy link
Author

bjorn commented Feb 24, 2025

(At @bjorn's request, the SDL3 version should have you as the author, and just note it was based on his work in the commit message, though.)

Hmm, actually when I said "note that you've ported it over" I meant the other way around. I'm a bit sad that this way I don't show up as contributor on GitHub.

I'm glad the patch is in, though!

bjorn added a commit to mana/mana that referenced this pull request Mar 11, 2025
This removes the buffered SDL_RWops functionality that was just added in
51b0c32. We just call PHYSFS_setBuffer
instead.

Using the PhysicsFS buffer is a little bit slower, likely because it
operates at a lower level, but it is fast enough for our purposes.
Uncompressed music files loaded from ZIP files can start playing in
about 1-2 ms.

I've also landed a fix for the problem in SDL_mixer, which works even
better in any case: libsdl-org/SDL_mixer#671
@bjorn bjorn deleted the stb-vorbis-buffer branch June 2, 2025 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants