Skip to content

[hal metal] ray tracing acceleration structures #8071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 30 commits into
base: trunk
Choose a base branch
from

Conversation

Lichtso
Copy link
Contributor

@Lichtso Lichtso commented Aug 9, 2025

Connections
Fixes: #7402
Depends on: #5641
Supersedes: #7660

Description
Implements the missing ray tracing acceleration structures in the HAL metal backend.

Testing
The examples ray_scene, ray_shadows, ray_cube_compute, ray_cube_fragment and ray_traced_triangle all work.
That is if invoked via cargo run --bin wgpu-examples ray_traced_triangle, but not via cargo xtask test ray_traced_triangle, still current CI runner is too old to catch that as it does not support hardware ray tracing.

Squash or Rebase?
Squash

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests.
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

madsmtm and others added 9 commits August 9, 2025 14:56
This removes the manual implementation introduced in 7b00140.
This is no longer necessary, since Winit can now properly issue redraw
events on resize.
To keep the diff smaller and easier to review, this uses a temporary
fork of `objc2-metal` and `objc2-quartz-core` whose methods use the
naming scheme of the `metal` crate.

One particular difficult part with this is that the `metal` crate has
several methods where the order of the arguments are swapped relative
to the corresponding Objective-C methods.

This includes most perilously (since these have both an offset and an
index argument, both of which are integers):
- `set_bytes`
- `set_vertex_bytes`
- `set_fragment_bytes`
- `set_buffer`
- `set_vertex_buffer`
- `set_fragment_buffer`
- `set_threadgroup_memory_length`

But also:
- `set_vertex_texture`
- `set_fragment_texture`
- `set_sampler_state`
- `set_vertex_sampler_state`
- `set_fragment_sampler_state`

Another noteworthy thing to mention is that `objc2-metal` does not (yet)
provide a fallback for MTLCopyAllDevices, so we have to do that
ourselves:
madsmtm/objc2@3543940
@Lichtso Lichtso requested a review from a team as a code owner August 9, 2025 13:19
@Lichtso Lichtso force-pushed the objc2_metal/ray_tracing_acceleration_structures branch from e1218ce to edeaaba Compare August 9, 2025 13:21
Copy link
Collaborator

@Vecvec Vecvec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the migrated things look good, but I'm not sure the residency sets would work in complex scenarios. Please correct me if I've gotten anything wrong.

Comment on lines 463 to 467
self.state.residency_sets.insert(dst.residency_set.clone());
dst.residency_set.requestResidency();
dst.residency_set.removeAllAllocations();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clone of the residency set (and the other clone in build_acceleration_structures) feel like they are trying to duplicate the residency set, but it seems from reading the docs that they are instead duplicating the Retained, which will still reference the same residency set. If not, won't this edit the residency set that we've previously added to command encoders?

Copy link
Contributor Author

@Lichtso Lichtso Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to clone the Arc, not be a deep copy. I can use .retained() to make it more clear. And yes, it does edit what we will add to command encoders state (self.state) which is then later committed in when the queue is submitted.

The line self.state.residency_sets.insert(dst.residency_set.clone()); registers the residency_set for a delayed commitment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use Arc::clone() to make it explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is not a Rust Arc but a objc2 Retained, so the Clone impl for that would be: https://docs.rs/objc2/latest/objc2/rc/struct.Retained.html#method.retain

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that this may still allow state leaks. I think a user could create an encoder which built a empty TLAS without ever submitting it and thereby clear the residency set in another command encoder, which could leave BLASes un-resident.

}
}
super::BufferLikeResource::AccelerationStructure(ptr) => {
encoder.setVertexAccelerationStructure_atBufferIndex(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't dependencies be inserted somewhere in set bind group? (not necessarily here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This here is set_bind_group().

By inserted I assume you mean residency_set.makeResident(), and yes we should probably move that part here.

pub struct AccelerationStructure;
pub struct AccelerationStructure {
raw: Retained<ProtocolObject<dyn MTLAccelerationStructure>>,
residency_set: Retained<ProtocolObject<dyn MTLResidencySet>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about putting the residency sets into the acceleration structure, I thought the main benefit of residency sets was that one could make them after encoding for before execution and so use the data for validating submissions (validate_acceleration_structure_actions, AsAction::UseTlas), then send the dependencies down to either the command buffer (e.g device.set_command_buffer_blas_dependencies(cmd_buf, blases)), or as an extra argument in queue.submit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Residency sets behave like command buffers: You can pre record a bunch of actions without an immediate effect and then commit them later. And that is how it is used here: The residency set is recorded once when its corresponding acceleration structure is build (because that is when the dependencies can actually change) and then that recording is saved to be committed later, when the build action is submitted in a queue.

When we turn our dependency list into a residency set is kinda irrelevant, but we have to allocate residency sets from the device thus it is easiest to have them allocated together with the acceleration structure.

@@ -929,6 +932,7 @@ struct CommandState {
index: Option<IndexState>,
raw_wg_size: MTLSize,
stage_infos: MultiStageData<PipelineStageInfo>,
residency_sets: HashSet<Retained<ProtocolObject<dyn MTLResidencySet>>>,
Copy link
Contributor Author

@Lichtso Lichtso Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vecvec we don't need extra interfaces like set_command_buffer_blas_dependencies because we can just store that in the encoder state here, from which command buffers are then emitted alongside residency sets (see CommandBuffer::residency_sets).

@Lichtso
Copy link
Contributor Author

Lichtso commented Aug 10, 2025

Open problems remaining:

  • How to solve the issue with the tests not passing because the metal debug layer breaks acceleration structures if no window is present.
  • When to call residency_set.endResidency()? Possibly when the next set_bind_group() call happens.
  • What to do about a bindless usage of TLAS and its residency_set.

@Lichtso Lichtso force-pushed the objc2_metal/ray_tracing_acceleration_structures branch from edeaaba to 1244389 Compare August 10, 2025 15:56
@Lichtso Lichtso force-pushed the objc2_metal/ray_tracing_acceleration_structures branch from 1244389 to aa6be88 Compare August 10, 2025 16:04
@Vecvec
Copy link
Collaborator

Vecvec commented Aug 10, 2025

@Lichtso I think that we think of MTLResidencySets differently. I think you think of them like a encoder, where you stage commands and then commit them to run those commands. I think of them like an array of resources which get made resident when commit gets called, and requestResidency tells the driver that you are likely to try to make these resource resident, and to maybe try make them resident if it can so you don't have to block on residency commit.

@Vecvec
Copy link
Collaborator

Vecvec commented Aug 10, 2025

residencysets

I've drawn an image of what I think could happen based on my understanding of residency sets. I've used green to be user space and red to be device space.

Edit: it's not very clear, but the resources in the residency set are only made resident at queue submit (when it gets committed), not when they are added to the residency set

@MarijnS95
Copy link
Contributor

MarijnS95 commented Aug 10, 2025

Yes, supposedly MTLResidencySet doesn't make a "shadow copy" whenever it is attached to anything. Instead commit() simply applies all the add/remove changes, that any command buffer to which it is attached can subsequently read to know which resources need to be made or kept resident. (Or the rest of the system can read to evict live resources when starved for memory, based on running command buffers?).

In other words it should be fine to remove live resources from a residency set, as long as those are not committed until all pending command buffers that are still referencing these resources have finished.


@Vecvec afaik the device space you have drawn isn't affected until a command buffer referencing a certain MTLResidencySet is submitted and starts running (and it'll reference the most recently committed state), or explicit residency is requested ahead of time using requestResidency() (which doesn't preclude attaching it to a queue or command buffer).

@Vecvec
Copy link
Collaborator

Vecvec commented Aug 10, 2025

@Vecvec afaik the device space you have drawn isn't affected until a command buffer referencing a certain MTLResidencySet is submitted and starts running (and it'll reference the most recently committed state), or explicit residency is requested ahead of time using requestResidency() (which doesn't preclude attaching it to a queue or command buffer).

I don't think it was very clear that what is put in the residency set is not the same as what is made resident, I was meaning calling addAllocation which does happen in the encoder calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Ray Tracing on Metal
5 participants