Classify all variables of a SimState as per-node, per-system, and global features #227

Closed
wants to merge 5 commits into from

Conversation

curtischong
Collaborator

@curtischong curtischong commented Jul 26, 2025

Summary

This PR makes handling SimStates much simpler. Rather than guessing if an attribute is per-node/system/global, we just know. My solution is to use a dictionary to store ALL of a State's attributes:

node_features: dict[str, torch.Tensor]
system_features: dict[str, torch.Tensor]
global_features: dict[str, torch.Tensor]
  • Even states like cell/pbc/system_index etc. are stored inside these dictionaries. By not handling exceptions we make iterating through these properties much simpler.

For ease of access to these "standard" properties, I've added custom getters/setters for these properties:

@property
def positions(self) -> torch.Tensor:
    return self.node_features["positions"]

@positions.setter
def positions(self, positions: torch.Tensor) -> None:
    self.node_features["positions"] = positions
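A minimal sketch of how the dictionary-backed layout could look end to end. `DictBackedState` and `scope_of` are hypothetical names used only for illustration, not part of torch-sim, and nested Python lists stand in for `torch.Tensor` so the example runs without dependencies.

```python
from dataclasses import dataclass, field
from typing import Any

# Stand-in for torch.Tensor so the sketch has no third-party dependencies.
Tensor = Any


@dataclass
class DictBackedState:
    """Hypothetical state that keeps every attribute in one of three dicts."""

    node_features: dict[str, Tensor] = field(default_factory=dict)
    system_features: dict[str, Tensor] = field(default_factory=dict)
    global_features: dict[str, Tensor] = field(default_factory=dict)

    @property
    def positions(self) -> Tensor:
        return self.node_features["positions"]

    @positions.setter
    def positions(self, positions: Tensor) -> None:
        self.node_features["positions"] = positions

    def scope_of(self, name: str) -> str:
        """Look up an attribute's scope directly instead of inferring it."""
        scopes = {
            "per_node": self.node_features,
            "per_system": self.system_features,
            "global": self.global_features,
        }
        for scope, features in scopes.items():
            if name in features:
                return scope
        raise KeyError(name)


state = DictBackedState(
    node_features={"positions": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]},
    system_features={"cell": [[[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]]},
    global_features={"pbc": True},
)
state.positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]  # goes through the setter
```

Because every attribute lives in exactly one dictionary, iterating over all per-node or per-system values is a plain loop over that dictionary.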

Checklist

Before a pull request can be merged, the following items must be checked:

  • Doc strings have been added in the Google docstring format.
  • Run ruff on your code.
  • Tests have been added for any new functionality or bug fixes.

We highly recommend installing the pre-commit hooks that run in CI locally to speed up the development process. Simply run pip install pre-commit && pre-commit install to install the hooks, which will check your code before each commit.

@cla-bot cla-bot bot added the cla-signed Contributor license agreement signed label Jul 26, 2025
@curtischong curtischong marked this pull request as draft July 26, 2025 19:54

coderabbitai bot commented Jul 26, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@curtischong curtischong changed the title from "Classify range of simstate feats" to "Classify all variables of a SimState as per-node, per-system, and global features" Jul 26, 2025
@orionarcher
Collaborator

orionarcher commented Jul 26, 2025

Hi @curtischong, this is something I thought about a fair amount, and I am happy to revisit it. I am not convinced I made the right decision to make everything implicit. I have a few thoughts here:

  1. I did consider making the atom, batch, and global features explicit in the SimState. The advantage is obvious: it's more explicit and we no longer have to call infer_property_scope. The disadvantage is more subtle: it adds unneeded bloat and complicates the definition of every single State that inherits from SimState. In practice infer_property_scope is pretty cheap and fails infrequently.
  2. If we did want to make the distinction explicit, I'd advocate for just using a tuple of strings instead of a dict. These are immutable and contain the same information. Then we could just rename infer_property_scope -> return_property_scope and have it return the tuples directly instead of inferring them. Something like this:
from dataclasses import dataclass, field

import torch


@dataclass
class SimState:
    positions: torch.Tensor
    masses: torch.Tensor
    cell: torch.Tensor
    pbc: bool  # TODO: do all calculators support mixed pbc?
    atomic_numbers: torch.Tensor
    system_idx: torch.Tensor | None = field(default=None, kw_only=True)
    # tuple[str, ...] is a variable-length tuple; tuple[str] means exactly one element
    _atom_features: tuple[str, ...] = ("positions", "masses", "atomic_numbers")
    _system_features: tuple[str, ...] = ("cell", "system_idx")
    _global_features: tuple[str, ...] = ("pbc",)  # trailing comma makes this a tuple

    @property
    def atom_features(self) -> tuple[str, ...]:
        return self._atom_features

    @property
    def system_features(self) -> tuple[str, ...]:
        return self._system_features

    @property
    def global_features(self) -> tuple[str, ...]:
        return self._global_features

    def return_property_scope(self) -> dict[str, tuple[str, ...]]:
        # Return the declared scopes directly instead of inferring them from shapes.
        return {"global": self.global_features, "per_atom": self.atom_features, "per_system": self.system_features}
  3. I think adding setters and getters for every attribute is way too bloated. That would have to be done for every single State. What does it add?

  4. If we think this is the right option, let's use atom instead of node.
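To make the trade-off in points 1 and 2 concrete, here is a sketch of what an inheriting state might need to write if scopes were declared as tuples. It is an illustration under assumptions, not the torch-sim implementation: the subclass name `MDState` and its `momenta` and `energy` fields are hypothetical, the tuples are stored as `ClassVar`s so they stay out of `__init__`, and `Any` stands in for `torch.Tensor` to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Any, ClassVar

# Stand-in for torch.Tensor so the sketch has no third-party dependencies.
Tensor = Any


@dataclass
class SimState:
    positions: Tensor
    masses: Tensor
    cell: Tensor
    pbc: bool
    atomic_numbers: Tensor

    # Declared once per class; ClassVar keeps them out of the generated __init__.
    _atom_features: ClassVar[tuple[str, ...]] = ("positions", "masses", "atomic_numbers")
    _system_features: ClassVar[tuple[str, ...]] = ("cell",)
    _global_features: ClassVar[tuple[str, ...]] = ("pbc",)

    def return_property_scope(self) -> dict[str, tuple[str, ...]]:
        return {
            "global": self._global_features,
            "per_atom": self._atom_features,
            "per_system": self._system_features,
        }


@dataclass
class MDState(SimState):
    """Hypothetical subclass: each new field must also be added to a scope tuple."""

    momenta: Tensor
    energy: Tensor

    _atom_features: ClassVar[tuple[str, ...]] = SimState._atom_features + ("momenta",)
    _system_features: ClassVar[tuple[str, ...]] = SimState._system_features + ("energy",)


md_state = MDState(
    positions=[[0.0, 0.0, 0.0]],
    masses=[1.0],
    cell=[[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]],
    pbc=True,
    atomic_numbers=[1],
    momenta=[[0.0, 0.0, 0.0]],
    energy=[0.0],
)
scope = md_state.return_property_scope()
```

The two extra lines in `MDState` are the per-subclass cost mentioned in point 1. Forgetting one of them would leave the new field without a declared scope.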

@curtischong
Collaborator Author

curtischong commented Jul 26, 2025

Thank you for your response, Orion. The main reason I'm doing this is that I'm trying to get type safety in torchsim. Type safety can catch many bugs, which is why it's important.

If we do not explicitly define the attributes, we cannot guarantee type safety (e.g. when we call getattr here https://github.com/Radical-AI/torch-sim/blob/main/torch_sim/state.py#L100, the types are not enforced).
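A small sketch of the concern about getattr, independent of the linked torch-sim code. `TinyState`, `read_dynamic`, and `read_static` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TinyState:
    pbc: bool


def read_dynamic(state: TinyState, name: str):
    # The attribute name is only known at runtime, so a static checker
    # cannot tell what type comes back and typically treats it as Any.
    return getattr(state, name)


def read_static(state: TinyState) -> bool:
    # Direct attribute access lets a checker verify the declared type.
    return state.pbc


# Dataclasses do not validate annotations at runtime, so this wrong value
# is accepted silently and flows out of both accessors unchanged.
bad = TinyState(pbc="yes")  # type: ignore[arg-type]
value = read_dynamic(bad, "pbc")
```

A static checker can flag the construction of `bad` and any misuse of `read_static`, but it has nothing to check for the result of `read_dynamic`.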

Like you said, by removing infer_property_scope, we no longer hit the edge cases that can crash torch-sim. I believe that the orb models are still failing because of something related to infer_property_scope.

I agree that manually typing out the getters/setters is ugly as it covers many lines. I'll research to see if I can make it simpler. But having getters/setters does have more benefits:

  • We can guarantee type safety when variables are accessed / set
  • We can run code when users set attributes (e.g. warn the user if they initialize velocities with NaN)
  • The bloat is inside the library, so the user doesn't see it.
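One way the getter/setter boilerplate might be reduced is a reusable descriptor, sketched below. This is an assumption about a possible design, not code from the PR: `Feature` and `State` are hypothetical names, and nested lists of floats stand in for tensors (with torch the NaN check would be something like `torch.isnan(value).any()`).

```python
import math
import warnings


class Feature:
    """Hypothetical descriptor that stores a value in one of the owner's feature dicts."""

    def __init__(self, scope: str, warn_on_nan: bool = False) -> None:
        self.scope = scope  # name of the dict attribute on the owning instance
        self.warn_on_nan = warn_on_nan

    def __set_name__(self, owner, name: str) -> None:
        self.name = name  # attribute name the descriptor was assigned to

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.scope)[self.name]

    def __set__(self, obj, value) -> None:
        # Hook that runs on every assignment, e.g. to warn about NaN values.
        if self.warn_on_nan and any(math.isnan(x) for row in value for x in row):
            warnings.warn(f"{self.name} contains NaN", stacklevel=2)
        getattr(obj, self.scope)[self.name] = value


class State:
    # One line per attribute replaces a hand-written getter/setter pair.
    positions = Feature("atom_features")
    velocities = Feature("atom_features", warn_on_nan=True)
    cell = Feature("system_features")

    def __init__(self, positions, velocities, cell) -> None:
        self.atom_features = {}
        self.system_features = {}
        self.positions = positions
        self.velocities = velocities
        self.cell = cell


state = State(
    positions=[[0.0, 0.0, 0.0]],
    velocities=[[0.1, 0.0, 0.0]],
    cell=[[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]],
)
```

Each attribute still costs one line per State, but the validation logic is written once.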

I agree with the atoms > node definition if this does go in

Collaborator

@orionarcher orionarcher left a comment

Generally, I am a big fan of typing and would support implementing ty for static type analysis. Runtime type checking, however, will add both code complexity and (a tiny bit of) computational cost.

Even if we decided that was worth it (maybe it is), I am not sure consolidating the variables into three attributes is the best approach. It makes it a bit less readable and removes the assurance that all necessary attributes are defined. It would also make the autocomplete engines less reliable at inferring what attributes are valid. We could instead:

  1. leave all the attributes as is, no need for getters
  2. run a method in __post_init__ that adds setters for every attribute that check shape and type.
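A rough sketch of one reading of this suggestion, under stated assumptions. Instead of installing a setter per attribute, it validates once in `__post_init__` and then routes later assignments through `__setattr__`. `CheckedState` and `_check` are hypothetical names, and Python lists stand in for tensors, so "shape" here only means the number of atoms.

```python
from dataclasses import dataclass
from typing import Any, ClassVar


@dataclass
class CheckedState:
    """Hypothetical state: plain attributes, validated on init and on assignment."""

    positions: Any  # stand-in for a tensor of shape (n_atoms, 3)
    masses: Any  # stand-in for a tensor of shape (n_atoms,)
    pbc: bool

    _atom_features: ClassVar[tuple[str, ...]] = ("positions", "masses")

    def __post_init__(self) -> None:
        for name in ("positions", "masses", "pbc"):
            self._check(name, getattr(self, name))
        # From here on, __setattr__ validates every assignment.
        object.__setattr__(self, "_validate_on_set", True)

    def __setattr__(self, name: str, value: Any) -> None:
        if getattr(self, "_validate_on_set", False):
            self._check(name, value)
        super().__setattr__(name, value)

    def _check(self, name: str, value: Any) -> None:
        if name == "pbc" and not isinstance(value, bool):
            raise TypeError("pbc must be a bool")
        if name in self._atom_features and len(value) != len(self.positions):
            raise ValueError(f"{name} must have one entry per atom")


checked = CheckedState(
    positions=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    masses=[1.0, 12.0],
    pbc=True,
)
checked.masses = [2.0, 12.0]  # same length, so the assignment is accepted
```

The attributes stay ordinary dataclass fields, so autocomplete and static analysis keep working, while the checks add a small cost to every assignment.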

@curtischong
Collaborator Author

curtischong commented Jul 27, 2025

I agree. The three variable thing isn't very nice. I think your suggestion works.

    _atom_features: tuple[str, ...] = ("positions", "masses", "atomic_numbers")
    _system_features: tuple[str, ...] = ("cell", "system_idx")
    _global_features: tuple[str, ...] = ("pbc",)

@curtischong
Collaborator Author

closing in favor of #228

@curtischong curtischong deleted the classify-range-of-simstate-feats branch July 27, 2025 03:37