-
Notifications
You must be signed in to change notification settings - Fork 171
vello_hybrid: remove suspending work from scheduler #1174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
#[cfg(debug_assertions)] | ||
clear: [Vec::new(), Vec::new()], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the clear debug assertion because it was not providing value. I couldn't get tests to fail or provide any strong signal with it.
Removing it cleans up a bunch of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for explaining this so well in our call. LGTM - really makes sense! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see that we can rely solely on round as the synchronization primitive! LGTM!
As a side note, it might be worth adding some tests to the scheduler, not just for verification, but also to clarify the expected states of texture slots.
This PR addresses multiple tasks tracked in #1165
Context
Recently we landed #1155 but it had lots of caveats. It also utilised suspending wide tile commands in order to pack rounds. A large pain point is that the system could deadlock itself.
At renderer office hours we discussed pre-planning out wide tiles into rounds and whether suspending was necessary. It was agreed that suspending was used as it was lower cost – but I felt like I hadn't explored a non-suspending solution adequately.
Originally I was working towards solving the deadlock issue – which would require either removing the suspending concept, or making suspending smarter such that it wouldn't get into deadlocks as much. Adding another system on top of the scheduler to "guess" resources to prevent deadlocks was really thorny and complex, so I also explored simply removing the complexity of suspending wide tile commands, and it worked!
This PR removes the entire concept of suspending.
Removing suspending lets us:
How
The most pertinent changes in this PR are the round calculations in
PushBuf
. I messed up the rounds where we copy the destination slot contents back to the temporary texture. By fixing these rounds, the whole system "just works" without suspending because work is scheduled into rounds correctly.What is the insight and why could suspending be removed?
On
main
note that suspending only happens when pushing a buffer. It happens when the current top of stack (tos
) has an invalid temporary slot which needs to be copied to the alternate texture such that a future blend operation will work.There was a bug such that rounds were calculated incorrectly in push buffer. Hence, by suspending we would force enough rounds to pass such that temporary slots are made valid.
The key insight is that
rounds
tracked on the tiles track the round in which that tile is doing work. You can do work in parallel on different tiles in the stack as long as they are mutually exclusive. It's only when blending or popping buffers that wemax(nos.round, tos.round)
to synchronize the rounds and ensure that all work is done before blending.This insight is what lets us fix the round calculations for compositing and allows us to remove suspending as a hack to get around the round calculation bug. This explains why we get better round utilization with this PR.
But, I also vigorously empirically proved that this change is extremely worthwhile.
Empirical Tests
Round stats
With the following code in the
Scheduler::flush
method we can capture some insightful statistics:Command:
cargo nextest run complex_composed_layers_hybrid --locked --all-features --no-fail-fast --no-capture
On
main
:This PR:
This PR results in fewer rounds for drawing the same complex blend scene.
Deadlock tests
This was tested with the following diff:
The command:
cargo nextest run --locked --all-features --no-fail-fast --release
run onsparse_strips
as working directory.With the extremely limited slots, this PR passes all tests. On main, the tests fail with a deadlock.
wgpu_webgl
example (scalar)(this is very rough and measured while dragging and zooming)
On main each rAF for the blend example is ~6.5ms.
This PR makes the rAF ~5ms.
Test plan
Tested manually with
cargo run_wasm -p wgpu_webgl --release
,cargo run -p vello_hybrid_winit --release
, and by leaning on the existing test corpus of tests that include hybrid compositing tests.Risks
I don't think there are major risks introduced by this PR. This PR really polishes rough edges introduced by the initial blend layers PR and eliminates the deadlock hazard.