-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[hal metal] ray tracing acceleration structures #8071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
[hal metal] ray tracing acceleration structures #8071
Conversation
This removes the manual implementation introduced in 7b00140.
This is no longer necessary, since Winit can now properly issue redraw events on resize.
To keep the diff smaller and easier to review, this uses a temporary fork of `objc2-metal` and `objc2-quartz-core` whose methods use the naming scheme of the `metal` crate. One particular difficult part with this is that the `metal` crate has several methods where the order of the arguments are swapped relative to the corresponding Objective-C methods. This includes most perilously (since these have both an offset and an index argument, both of which are integers): - `set_bytes` - `set_vertex_bytes` - `set_fragment_bytes` - `set_buffer` - `set_vertex_buffer` - `set_fragment_buffer` - `set_threadgroup_memory_length` But also: - `set_vertex_texture` - `set_fragment_texture` - `set_sampler_state` - `set_vertex_sampler_state` - `set_fragment_sampler_state` Another noteworthy thing to mention is that `objc2-metal` does not (yet) provide a fallback for MTLCopyAllDevices, so we have to do that ourselves: madsmtm/objc2@3543940
…ation_structure_build_sizes().
e1218ce
to
edeaaba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the migrated things look good, but I'm not sure the residency sets would work in complex scenarios. Please correct me if I've gotten anything wrong.
wgpu-hal/src/metal/command.rs
Outdated
self.state.residency_sets.insert(dst.residency_set.clone()); | ||
dst.residency_set.requestResidency(); | ||
dst.residency_set.removeAllAllocations(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clone of the residency set (and the other clone in build_acceleration_structures
) feel like they are trying to duplicate the residency set, but it seems from reading the docs that they are instead duplicating the Retained
, which will still reference the same residency set. If not, won't this edit the residency set that we've previously added to command encoders?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to clone the Arc
, not be a deep copy. I can use .retained()
to make it more clear. And yes, it does edit what we will add to command encoders state (self.state
) which is then later committed in when the queue is submitted.
The line self.state.residency_sets.insert(dst.residency_set.clone());
registers the residency_set
for a delayed commitment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to use Arc::clone() to make it explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it is not a Rust Arc
but a objc2 Retained
, so the Clone impl for that would be: https://docs.rs/objc2/latest/objc2/rc/struct.Retained.html#method.retain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that this may still allow state leaks. I think a user could create an encoder which built a empty TLAS without ever submitting it and thereby clear the residency set in another command encoder, which could leave BLASes un-resident.
} | ||
} | ||
super::BufferLikeResource::AccelerationStructure(ptr) => { | ||
encoder.setVertexAccelerationStructure_atBufferIndex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't dependencies be inserted somewhere in set bind group? (not necessarily here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This here is set_bind_group()
.
By inserted
I assume you mean residency_set.makeResident()
, and yes we should probably move that part here.
pub struct AccelerationStructure; | ||
pub struct AccelerationStructure { | ||
raw: Retained<ProtocolObject<dyn MTLAccelerationStructure>>, | ||
residency_set: Retained<ProtocolObject<dyn MTLResidencySet>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about putting the residency sets into the acceleration structure, I thought the main benefit of residency sets was that one could make them after encoding for before execution and so use the data for validating submissions (validate_acceleration_structure_actions
, AsAction::UseTlas
), then send the dependencies down to either the command buffer (e.g device.set_command_buffer_blas_dependencies(cmd_buf, blases)
), or as an extra argument in queue.submit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Residency sets behave like command buffers: You can pre record a bunch of actions without an immediate effect and then commit them later. And that is how it is used here: The residency set is recorded once when its corresponding acceleration structure is build (because that is when the dependencies can actually change) and then that recording is saved to be committed later, when the build action is submitted in a queue.
When we turn our dependency list into a residency set is kinda irrelevant, but we have to allocate residency sets from the device thus it is easiest to have them allocated together with the acceleration structure.
@@ -929,6 +932,7 @@ struct CommandState { | |||
index: Option<IndexState>, | |||
raw_wg_size: MTLSize, | |||
stage_infos: MultiStageData<PipelineStageInfo>, | |||
residency_sets: HashSet<Retained<ProtocolObject<dyn MTLResidencySet>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vecvec we don't need extra interfaces like set_command_buffer_blas_dependencies
because we can just store that in the encoder state here, from which command buffers are then emitted alongside residency sets (see CommandBuffer::residency_sets
).
Open problems remaining:
|
edeaaba
to
1244389
Compare
1244389
to
aa6be88
Compare
@Lichtso I think that we think of |
Yes, supposedly In other words it should be fine to remove live resources from a residency set, as long as those are not committed until all pending command buffers that are still referencing these resources have finished. @Vecvec afaik the device space you have drawn isn't affected until a command buffer referencing a certain |
I don't think it was very clear that what is put in the residency set is not the same as what is made resident, I was meaning calling |
Connections
Fixes: #7402
Depends on: #5641
Supersedes: #7660
Description
Implements the missing ray tracing acceleration structures in the HAL metal backend.
Testing
The examples
ray_scene
,ray_shadows
,ray_cube_compute
,ray_cube_fragment
andray_traced_triangle
all work.That is if invoked via
cargo run --bin wgpu-examples ray_traced_triangle
, but not viacargo xtask test ray_traced_triangle
, still current CI runner is too old to catch that as it does not support hardware ray tracing.Squash or Rebase?
Squash
Checklist
cargo fmt
.taplo format
.cargo clippy --tests
.cargo xtask test
to run tests.CHANGELOG.md
entry.