Adds contact point location reporting to ContactSensor #2842

Merged
merged 17 commits into from
Aug 14, 2025

Conversation

jtigue-bdai
Collaborator

@jtigue-bdai jtigue-bdai commented Jul 2, 2025

Description

This PR:

  • Adds ContactSensorCfg.track_contact_points to toggle tracking of contact point locations between sensor bodies and filtered bodies.
  • Adds ContactSensorCfg.max_contact_data_per_prim to configure the maximum amount of contacts per sensor body.
  • Adds ContactSensorData.contact_pos_w data field for tracking contact point locations.
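A rough usage sketch of the new fields (the prim paths and scene key below are placeholders, not part of the PR; the cfg fields and the `contact_pos_w` data field come from this PR):

```python
from isaaclab.sensors import ContactSensorCfg

# Hypothetical configuration; prim paths are placeholders for your own assets.
cfg = ContactSensorCfg(
    prim_path="{ENV_REGEX_NS}/Robot/.*_foot",
    filter_prim_paths_expr=["{ENV_REGEX_NS}/Object"],
    track_contact_points=True,     # new in this PR: enable contact point tracking
    max_contact_data_per_prim=32,  # new in this PR: max contacts per sensor body
)
# After stepping the sim, the sensor data exposes the new field, e.g.:
#   scene["contact_sensor"].data.contact_pos_w
# with shape (num_envs, num_bodies, num_filters, 3), NaN where there is no contact.
```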

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@jtigue-bdai jtigue-bdai marked this pull request as ready for review July 2, 2025 18:41
@jtigue-bdai jtigue-bdai self-assigned this Jul 2, 2025
@ooctipus
Collaborator

ooctipus commented Jul 9, 2025

This is so nice, I can see it being very useful!!

@virgilio96upm
Contributor

Hi! Thanks so much for implementing this feature. I really appreciate the effort!

I had previously attempted a first version in #2026 and ran into similar performance issues. I was curious: have you tested this implementation in simulation with more than 1000 environments? I'd be interested to know how it impacts simulation time, and whether the reported forces per point remain accurate at that scale.

@ooctipus
Collaborator

ooctipus commented Jul 17, 2025

@virgilio96upm
There is nothing we can speed up on the TensorAPI side; it is likely already very performant. On the Lab side, if we use vectorized processing we can speed up the location reporting a lot. In my recent profiling, the vectorized version seems to scale quite well.

| Batch | Device | Vectorized (ms) | Loop (ms) |
| ----: | :----: | --------------: | --------: |
|   128 |  cpu   |           0.144 |     4.824 |
|   128 |  cuda  |           0.070 |    13.539 |
|   512 |  cpu   |           0.667 |    19.592 |
|   512 |  cuda  |           0.088 |    53.661 |
|  2048 |  cpu   |           1.365 |    76.604 |
|  2048 |  cuda  |           0.067 |   206.825 |
|  4096 |  cpu   |           1.938 |   150.368 |
|  4096 |  cuda  |           0.067 |   413.493 |
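A standalone sketch of this kind of loop-vs-vectorized comparison (synthetic data and placeholder shapes, not the actual sensor code):

```python
import time
import torch

def mean_per_row_loop(points, counts, starts):
    """Mean contact point per row, one Python iteration per row."""
    out = torch.zeros(counts.numel(), 3)
    for i in range(counts.numel()):
        if counts[i] > 0:
            out[i] = points[starts[i]:starts[i] + counts[i]].mean(dim=0)
    return out

def mean_per_row_vec(points, counts, starts):
    """Same result via one gather + index_add_, no Python loop."""
    n = counts.numel()
    row_ids = torch.repeat_interleave(torch.arange(n), counts)
    deltas = torch.arange(int(counts.sum())) - (counts.cumsum(0) - counts).repeat_interleave(counts)
    out = torch.zeros(n, 3).index_add_(0, row_ids, points[starts[row_ids] + deltas])
    return out / counts.clamp_min(1).unsqueeze(1)

# Synthetic data: 512 rows with up to 8 contacts each, packed contiguously.
counts = torch.randint(0, 9, (512,))
starts = counts.cumsum(0) - counts
points = torch.randn(int(counts.sum()), 3)

for fn in (mean_per_row_loop, mean_per_row_vec):
    t0 = time.perf_counter()
    fn(points, counts, starts)
    print(fn.__name__, f"{(time.perf_counter() - t0) * 1e3:.3f} ms")

assert torch.allclose(mean_per_row_loop(points, counts, starts),
                      mean_per_row_vec(points, counts, starts), atol=1e-5)
```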

jtigue-bdai and others added 2 commits July 18, 2025 15:28
Signed-off-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
@jtigue-bdai
Collaborator Author

Thank you @ooctipus for helping vectorize the operations. This should be good to review.

@ooctipus
Collaborator

Just reviewed, LGTM! Thank you James for bringing this great feature.

@zhexulong

zhexulong commented Aug 8, 2025

Update: after multiple attempts, this issue was resolved by increasing maxContactDataCount to 2000.
I'll leave this post up so that anyone who runs into the same issue later can find help.
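For anyone hitting the same warning, the relevant knob exposed by this PR is the cfg field below (a minimal sketch; the prim path is a placeholder):

```python
from isaaclab.sensors import ContactSensorCfg

# Raising the per-prim contact budget avoids the truncated-buffer warning below.
# Prim path is a placeholder; 2000 was the value that resolved it in this environment.
sensor_cfg = ContactSensorCfg(
    prim_path="{ENV_REGEX_NS}/Robot/.*",
    track_contact_points=True,
    max_contact_data_per_prim=2000,
)
```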


I've tried implementing the changes from PR #2842 with Isaac Sim 5.0.0 and Isaac Lab 2.2.0, and encountered issues in my contact-rich environment. Specifically, the runtime error involves contact data handling and CUDA assertions: contact data points exceed the predefined buffer limit, leading to PyTorch index out-of-bounds errors on the GPU, and ultimately a CUDA device-side assertion failure.

Error Logs

  1. PhysX Contact Data Warning:

    [Warning] [omni.physx.tensors.plugin] Incomplete contact data is reported in GpuRigidContactView::getContactData because there are more contact data points than specified maxContactDataCount = 100.
    
  2. PyTorch CUDA Index Errors:

    /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [1,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [1,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [1,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [1,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    
  3. Crash Stack Trace:

    RuntimeError: CUDA error: device-side assert triggered
    ...
    File "/workspace/isaaclab/source/isaaclab/isaaclab/envs/manager_based_rl_env.py", line 82, in __init__
     super().__init__(cfg=cfg)
    File "/workspace/isaaclab/source/isaaclab/isaaclab/envs/manager_based_env.py", line 170, in __init__
     self.scene.update(dt=self.physics_dt)
    File "/workspace/isaaclab/source/isaaclab/isaaclab/scene/interactive_scene.py", line 496, in update
     sensor.update(dt, force_recompute=not self.cfg.lazy_sensor_update)
    File "/workspace/isaaclab/source/isaaclab/isaaclab/sensors/sensor_base.py", line 191, in update
     self._update_outdated_buffers()
    File "/workspace/isaaclab/source/isaaclab/isaaclab/sensors/sensor_base.py", line 353, in _update_outdated_buffers
     self._update_buffers_impl(outdated_env_ids)
    File "/workspace/isaaclab/source/isaaclab/isaaclab/sensors/contact_sensor/contact_sensor.py", line 392, in _update_buffers_impl
    pts = buffer_contact_points[buffer_start_indices.view(-1).unsqueeze(1) + rel]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
    
    
  4. Debug logs

buffer_contact_points shape: torch.Size([1000, 3])
buffer_contact_points: tensor([[0.1370, 0.5897, 0.9309],
        [0.0927, 0.6344, 0.9370],
        [0.1169, 0.5697, 0.9310],
        ...,
        [0.1303, 0.6089, 0.9668],
        [0.0980, 0.5708, 0.8940],
        [0.0690, 0.6084, 0.9158]], device='cuda:0')
buffer_start_indices shape: torch.Size([1, 14])
buffer_start_indices: tensor([[  0,   8,  61, 120, 202, 335, 395, 458, 560, 583, 741, 889, 897, 928]],
       device='cuda:0', dtype=torch.int32)
rel shape: torch.Size([14, 158])
rel: tensor([[  0,   1,   2,  ..., 155, 156, 157],
        [  0,   1,   2,  ..., 155, 156, 157],
        [  0,   1,   2,  ..., 155, 156, 157],
        ...,
        [  0,   1,   2,  ..., 155, 156, 157],
        [  0,   1,   2,  ..., 155, 156, 157],
        [  0,   1,   2,  ..., 155, 156, 157]], device='cuda:0')

@ooctipus
Collaborator

@zhexulong

Thanks for trying this out! Do you think there is some pre-check we could do, so that if such an error is possible we raise it ahead of time instead of having the user run into an illegal memory access error?

@zhexulong

> @zhexulong
>
> Thanks for trying this out! Do you think there is some pre-check we could do, so that if such an error is possible we raise it ahead of time instead of having the user run into an illegal memory access error?

@ooctipus

Instead of doing some prechecks, I might be more inclined to think this is a problematic implementation.

The code:

rel = torch.arange(max_count, device=counts.device).unsqueeze(0).expand(counts.size(0), max_count)
pts = buffer_contact_points[buffer_start_indices.view(-1).unsqueeze(1) + rel]

In this code, if the result of buffer_start_indices + rel exceeds the total length of buffer_contact_points, it will cause an index out-of-bounds error.

We know buffer_start_indices[-1] + counts[-1] = len(buffer_contact_points). Therefore, if counts[-1] (the count of points corresponding to buffer_start_indices[-1]) is less than max_count, the maximum value of buffer_start_indices[-1] + rel will reach buffer_start_indices[-1] + max_count > len(buffer_contact_points), thus causing an error.
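The failure mode can be reproduced standalone with small tensors (a sketch of the indexing pattern only, not the sensor code itself):

```python
import torch

# 5 total contact points across two rows: counts = [3, 2], starts = [0, 3]
buffer_contact_points = torch.arange(15, dtype=torch.float32).view(5, 3)
starts = torch.tensor([0, 3])
counts = torch.tensor([3, 2])
max_count = int(counts.max())  # 3

rel = torch.arange(max_count).unsqueeze(0).expand(counts.size(0), max_count)
idx = starts.unsqueeze(1) + rel
# The last row indexes up to starts[-1] + max_count - 1 = 5,
# but the buffer only has rows 0..4:
assert int(idx.max()) >= buffer_contact_points.size(0)

try:
    _ = buffer_contact_points[idx]  # IndexError on CPU; device-side assert on CUDA
except IndexError as e:
    print("index out of bounds:", e)
```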

Perhaps the original implementation here attempted to accelerate via parallel processing, but it actually caused errors. However, if we adopt the correct implementation below, I'm concerned that it may not be as efficient as the current version.

(Refer to NVIDIA's official RigidContactView example):

import numpy
import omni.physics.tensors as tensors

sim_view = tensors.create_simulation_view("warp")
rigid_contact_view = sim_view.create_rigid_contact_view("/World/Cube_*") # This assumes that the prims referenced by "/World/Cube_*" were already created in the stage
forces, points, normals, distances, counts, start_indices = rigid_contact_view.get_contact_data(dt) # Get the detailed contact data between sensors and filter prims per patch

forces_np = forces.numpy().reshape(rigid_contact_view.max_contact_data_count) # Reshape the obtained array in a 1D numpy array on the host
points_np = points.numpy().reshape(rigid_contact_view.max_contact_data_count * 3) # Reshape the obtained array in a 1D numpy array on the host
normals_np = normals.numpy().reshape(rigid_contact_view.max_contact_data_count * 3) # Reshape the obtained array in a 1D numpy array on the host
distances_np = distances.numpy().reshape(rigid_contact_view.max_contact_data_count) # Reshape the obtained array in a 1D numpy array on the host
counts_np = counts.numpy().reshape(rigid_contact_view.sensor_count, rigid_contact_view.filter_count) # Reshape the obtained array in a 2D numpy array on the host
start_indices_np = start_indices.numpy().reshape(rigid_contact_view.sensor_count, rigid_contact_view.filter_count) # Reshape the obtained array in a 2D numpy array on the host


start_indices_ij = start_indices_np[i, j]
count_ij = counts_np[i, j]
if count_ij > 0:
    forces_ij = forces_np[start_indices_ij:start_indices_ij + count_ij] * normals_np[start_indices_ij:start_indices_ij + count_ij]
    force_aggregate = numpy.sum(forces_ij, axis=0)

@ooctipus
Collaborator

ooctipus commented Aug 10, 2025

@zhexulong good catch — the bug was doing advanced indexing before masking via max_count, which can produce OOB when a row’s count < max_count (especially for the last block). I switched to building a flat index of only valid contacts, then aggregating on device. This keeps it safe and still O(total_valid_contacts) with no host hops.

        if self.cfg.track_contact_points:
            _, buffer_contact_points, _, _, buffer_count, buffer_start_indices = (
                self.contact_physx_view.get_contact_data(dt=self._sim_physics_dt)
            )
            # unpack the contact points: see RigidContactView.get_contact_data() documentation for details:
            # https://docs.omniverse.nvidia.com/kit/docs/omni_physics/107.3/extensions/runtime/source/omni.physics.tensors/docs/api/python.html#omni.physics.tensors.impl.api.RigidContactView.get_net_contact_forces
            # buffer_count: (N_envs * N_bodies, N_filters), buffer_contact_points: (N_envs * N_bodies, 3)
            counts, starts = buffer_count.view(-1), buffer_start_indices.view(-1)
            n_rows, total = counts.numel(), int(counts.sum())
            # default to NaN rows
            agg = torch.full((n_rows, 3), float("nan"), device=self._device, dtype=buffer_contact_points.dtype)
            if total > 0:
                row_ids = torch.repeat_interleave(torch.arange(n_rows, device=self._device), counts)
                total = row_ids.numel()

                block_starts = counts.cumsum(0) - counts
                deltas = torch.arange(total, device=counts.device) - block_starts.repeat_interleave(counts)
                flat_idx = starts[row_ids] + deltas

                pts = buffer_contact_points.index_select(0, flat_idx)
                agg = agg.zero_().index_add_(0, row_ids, pts) / counts.clamp_min(1).unsqueeze(1)
                agg[counts == 0] = float("nan")

            self._contact_position_aggregate_buffer[:] = agg.view(self._num_envs * self.num_bodies, -1, 3)
            self._data.contact_pos_w[env_ids] = self._contact_position_aggregate_buffer.view(
                self._num_envs, self._num_bodies, self.contact_physx_view.filter_count, 3
            )[env_ids]

This code block is passing the tests, could you please try if it works for you as well? If it does, we can replace the original implementation with this new one.
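A quick standalone check of the core of this approach on the edge case from the traceback (last block shorter than the largest count), using placeholder tensors — in the real sensor code `starts` comes from PhysX, while here the buffer is packed exactly:

```python
import torch

# A short last block; the buffer is sized exactly to the total contact count,
# so the old max_count-based indexing would read past the end.
counts = torch.tensor([3, 1])
starts = counts.cumsum(0) - counts  # [0, 3]
points = torch.tensor([[1., 0., 0.], [3., 0., 0.], [5., 0., 0.], [7., 0., 0.]])

n = counts.numel()
row_ids = torch.repeat_interleave(torch.arange(n), counts)
deltas = torch.arange(int(counts.sum())) - starts.repeat_interleave(counts)
flat_idx = starts[row_ids] + deltas  # [0, 1, 2, 3] -- never exceeds the buffer

agg = torch.zeros(n, 3).index_add_(0, row_ids, points[flat_idx])
agg = agg / counts.clamp_min(1).unsqueeze(1)
print(agg[:, 0])  # tensor([3., 7.]) -- per-row means, no OOB
```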

@zhexulong

zhexulong commented Aug 10, 2025

@ooctipus
I apologize for my previous incorrect understanding of RigidContactView. The equation buffer_start_indices[-1] + counts[-1] = len(buffer_contact_points) does not hold. The shape of buffer_start_indices is (sensor_count, filter_count), so buffer_start_indices[-1] refers to the index corresponding to the last filter, which is irrelevant to the size of buffer_contact_points. buffer_contact_points is only related to the max_contact_data_count we set. The equation should be corrected to: buffer_start_indices[-1] + counts[-1] = total number of contacts.

But even if max_contact_data_count is greater than the total number of contacts, the previous implementation might still cause errors in some edge cases ( buffer_start_indices[-1] + max_count may be greater than the total number of contacts). Therefore, your new implementation may have resolved that issue. However, since I introduce some randomness every time the environment resets to ensure the collection of diverse data, I might not be able to reproduce such edge cases.

For me, the actual issue behind the previous error is that the sum of counts[-1] (the fifth return value of get_contact_data) and buffer_start_indices[-1] (the sixth return value of get_contact_data) is greater than max_contact_data_count, leading to an out-of-bounds error for buffer_contact_points. This can also be understood as the total number of contacts for one of our sensors exceeding max_contact_data_count. This is precisely the reason for the errors in my contact-rich environment, and it is also related to the following warning:

[Warning] [omni.physx.tensors.plugin] Incomplete contact data is reported in GpuRigidContactView::getContactData because there are more contact data points than specified maxContactDataCount = 10.

However, I'm not sure whether Isaac Lab has a responsibility to give users of this feature additional guidance beyond this warning, telling them that if maxContactDataCount is too small, the contact data obtained will be incomplete and contact_pos_w may be inaccurate. If necessary, perhaps a pre-check is also warranted?
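A pre-check along those lines could be as simple as comparing the reported counts against the configured buffer size (a sketch only; the function name, warning text, and return convention are assumptions, not existing Isaac Lab API):

```python
import torch

def check_contact_buffer(counts: torch.Tensor, starts: torch.Tensor,
                         max_contact_data_count: int) -> bool:
    """Return True if the reported contact data fits the buffer.

    counts/starts are the 5th/6th arrays from get_contact_data(),
    each of shape (sensor_count, filter_count).
    """
    required = int((starts + counts).max()) if counts.numel() else 0
    if required > max_contact_data_count:
        print(
            f"[Warning] contact data truncated: need {required} slots but "
            f"maxContactDataCount = {max_contact_data_count}; "
            "contact_pos_w may be inaccurate."
        )
        return False
    return True

# Fits: last block ends at 3 + 2 = 5 <= 10.
assert check_contact_buffer(torch.tensor([[3, 2]]), torch.tensor([[0, 3]]), 10)
# Truncated: last block ends at 95 + 8 = 103 > 100.
assert not check_contact_buffer(torch.tensor([[8]]), torch.tensor([[95]]), 100)
```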

You can refer to the definition of the data structure returned by get_contact_data:

The first array is the contact forces with shape (max_contact_data_count, 1) where max_contact_data_count is the maximum number of contact data that can be stored in the view.
The second array is the contact points with shape (max_contact_data_count, 3) where the 3 elements of the last dimension are the x, y, z components of the contact points.
The third array is the contact normals with shape (max_contact_data_count, 3) where the 3 elements of the last dimension are the x, y, z components of the contact normals vector.
The fourth array is the separation distances with shape (max_contact_data_count, 1).
The fifth array is the contact count buffer with shape (sensor_count, filter_count) where sensor_count is the number of sensors in the view and filter_count is the number of filter prims in the view.
The sixth array is the start indices buffer with shape (sensor_count, filter_count).

Collaborator

@ooctipus ooctipus left a comment


LGTM! Thank you @jtigue-bdai

ooctipus and others added 2 commits August 14, 2025 07:40
Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Signed-off-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
Signed-off-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
Signed-off-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
@ooctipus ooctipus merged commit 626e08e into isaac-sim:main Aug 14, 2025
7 of 8 checks passed
StephenWelch pushed a commit to StephenWelch/IsaacLab that referenced this pull request Aug 17, 2025
Signed-off-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
Signed-off-by: Kelly Guo <kellyg@nvidia.com>
Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Co-authored-by: Ashwin Khadke <133695616+akhadke-bdai@users.noreply.github.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Co-authored-by: ooctipus <zhengyuz@nvidia.com>