
InitVar dataclass initialization (and subclass checks) #230


Closed
wants to merge 12 commits into from

Conversation

curtischong (Collaborator) commented Aug 2, 2025

Summary

In order to properly type SimState, I want to remove the | None type annotation on system_idx. This is solved via the InitVar field init_system_idx.

In addition to typing, this PR introduces an __init_subclass__ check for subclasses of SimState. This paradigm allows us to verify properties of derived classes. In particular, this PR uses the subclass check to verify that all InitVar fields start with the init_ prefix (which is required to properly mangle params during concatenation / splitting of SimStates).

In the next PR, we will use the same subclass check to enforce that derived SimState classes cannot have a | None attribute (optional attributes break functions like torch.concatenate, since we cannot concatenate attributes that are tensors with attributes that are None - see the description of #229 for more info).

This PR is breaking since we modify the constructor for SimState.
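
For illustration, a minimal sketch of the intended pattern (illustrative only: most fields are omitted, and the real SimState signature and defaults may differ):

from dataclasses import InitVar, dataclass, field

import torch


@dataclass
class SimState:
    """Simplified sketch of the InitVar-based initialization, not the real class."""

    positions: torch.Tensor
    masses: torch.Tensor
    # The public attribute is always a tensor; callers pass init_system_idx
    # (possibly None) and __post_init__ fills in the default.
    system_idx: torch.Tensor = field(init=False)
    init_system_idx: InitVar[torch.Tensor | None] = None

    def __post_init__(self, init_system_idx: torch.Tensor | None) -> None:
        if init_system_idx is None:
            # Default: every atom belongs to a single system (index 0).
            init_system_idx = torch.zeros(len(self.positions), dtype=torch.long)
        self.system_idx = init_system_idx

With this shape, callers can still omit the system index, while downstream code can rely on system_idx always being a tensor.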

Checklist

Before a pull request can be merged, the following items must be checked:

  • Doc strings have been added in the Google docstring format.
  • Run ruff on your code.
  • Tests have been added for any new functionality or bug fixes.

We highly recommend installing the pre-commit hooks that run in CI locally to speed up the development process. Simply run pip install pre-commit && pre-commit install to install the hooks, which will check your code before each commit.

Summary by CodeRabbit

  • Refactor
    • Improved handling of initialization variables and system indices for simulation state objects.
    • Enforced stricter validation rules on subclassing and tensor attribute definitions.
    • Unified and streamlined state object construction to ensure correct initialization and prevent errors with optional tensors.
    • Updated optimizer state classes to require velocity attributes to always be present.

cla-bot added the cla-signed (Contributor license agreement signed) label Aug 2, 2025

coderabbitai bot commented Aug 2, 2025

Important

Review skipped

Draft detected.


Walkthrough

The changes refactor how system indices and initialization variables are handled in the simulation state classes. The SimState dataclass now uses an explicit InitVar for initialization, enforces stricter subclass constraints, and introduces a helper function for constructing state objects. Related updates propagate to optimizer state classes and utility functions.
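
For readers unfamiliar with the mechanism, a rough sketch of how an __init_subclass__ hook can enforce these subclass constraints (illustrative only; the actual checks live in torch_sim/state.py around lines 379-432 and may resolve annotations differently, e.g. if from __future__ import annotations is in use):

import typing
from dataclasses import InitVar

import torch


class SimState:
    """Only the subclass hook is sketched here; fields and other validation are omitted."""

    def __init_subclass__(cls, **kwargs: typing.Any) -> None:
        super().__init_subclass__(**kwargs)
        for attr_name, annotation in getattr(cls, "__annotations__", {}).items():
            # Every InitVar must carry the "init_" prefix so concatenation/splitting
            # can map it back to the attribute it initializes.
            if isinstance(annotation, InitVar) and not attr_name.startswith("init_"):
                raise TypeError(
                    f"InitVar '{attr_name}' in '{cls.__name__}' must start with 'init_'"
                )
            # Optional tensors are disallowed because they break torch.concatenate
            # when states are merged; use an InitVar plus a __post_init__ default.
            if annotation == (torch.Tensor | None):
                raise TypeError(
                    f"Attribute '{attr_name}' in '{cls.__name__}' may not be "
                    "'torch.Tensor | None'"
                )

Because the hook runs when each subclass is created, violations fail at import time rather than later, when states are concatenated.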

Changes

  • SimState Initialization & Construction (torch_sim/state.py): Refactored SimState to use init_system_idx as an InitVar, moved system index validation to __post_init__, added __init_subclass__ for stricter subclass checks, and introduced construct_state helper for consistent object instantiation. Updated all relevant methods to use the new construction pattern.
  • Optimizer State Classes & FIRE Initialization (torch_sim/optimizers.py): Made velocities and cell_velocities non-optional in FIRE-related state classes. Changed attribute name from system_idx to init_system_idx in FIRE optimizer state initialization to align with new SimState construction.
  • Atoms-to-State Conversion (torch_sim/io.py): Changed argument in atoms_to_state from system_idx=system_idx to init_system_idx=system_idx for compatibility with new SimState initialization.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Atoms
    participant io.py
    participant SimState

    User->>Atoms: Provides Atoms object
    io.py->>Atoms: Reads system_idx from Atoms
    io.py->>SimState: Constructs with init_system_idx=system_idx
    SimState->>SimState: __post_init__ sets system_idx
    SimState-->>io.py: Returns initialized SimState
sequenceDiagram
    participant Optimizer
    participant FireState
    participant construct_state

    Optimizer->>FireState: Requests initialization
    FireState->>construct_state: Uses new attribute dict with init_system_idx
    construct_state->>FireState: Instantiates with InitVar
    FireState-->>Optimizer: Returns new FireState instance

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

Oh, what a hop in the stateful spring,
With InitVars now doing their thing!
System indices checked, subclasses behave,
Constructors unified, oh how they pave!
FIRE runs faster, no tensors are missed—
A bunny’s delight, in changes like this! 🐇✨


curtischong marked this pull request as draft August 2, 2025 19:21
coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (4)
torch_sim/io.py (2)

306-313: Missing parameter rename in structures_to_state

The system_idx parameter should be renamed to init_system_idx to match the SimState refactoring, similar to the change made in atoms_to_state.

Apply this diff:

 return ts.SimState(
     positions=positions,
     masses=masses,
     cell=cell,
     pbc=True,  # Structures are always periodic
     atomic_numbers=atomic_numbers,
-    system_idx=system_idx,
+    init_system_idx=system_idx,
 )

384-391: Missing parameter rename in phonopy_to_state

The system_idx parameter should be renamed to init_system_idx to match the SimState refactoring.

Apply this diff:

 return ts.SimState(
     positions=positions,
     masses=masses,
     cell=cell,
     pbc=True,
     atomic_numbers=atomic_numbers,
-    system_idx=system_idx,
+    init_system_idx=system_idx,
 )
torch_sim/optimizers.py (2)

865-872: Type inconsistency with velocities initialization in UnitCellFireState

The velocities and cell_velocities fields are typed as non-optional but initialized as None.

Initialize with zero tensors:

 pbc=state.pbc,
-velocities=None,
+velocities=torch.zeros_like(state.positions),
 forces=forces,
 energy=energy,
 stress=stress,
 # Cell attributes
 cell_positions=torch.zeros(n_systems, 3, 3, device=device, dtype=dtype),
-cell_velocities=None,
+cell_velocities=torch.zeros(n_systems, 3, 3, device=device, dtype=dtype),
 cell_forces=cell_forces,

1163-1171: Type inconsistency with velocities initialization in FrechetCellFIREState

The velocities and cell_velocities fields are typed as non-optional but initialized as None.

Initialize with zero tensors:

 pbc=state.pbc,
-velocities=None,
+velocities=torch.zeros_like(state.positions),
 forces=forces,
 energy=energy,
 stress=stress,
 # Cell attributes
 cell_positions=cell_positions,
-cell_velocities=None,
+cell_velocities=torch.zeros(n_systems, 3, 3, device=device, dtype=dtype),
 cell_forces=cell_forces,
🧹 Nitpick comments (3)
torch_sim/optimizers.py (1)

1247-1257: Reconsider velocity initialization strategy

With velocities now being non-optional, the if state.velocities is None: check becomes problematic. Consider either:

  1. Keep velocities optional in the type system, or
  2. Use a different mechanism to track whether velocities have been initialized (e.g., a separate boolean flag)

The current approach of initializing velocities as zero tensors and then checking for None won't work as intended.
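
If the second option were adopted, a minimal sketch might look like this (the velocities_initialized flag and the FireStateSketch name are hypothetical, not part of the current code):

from dataclasses import dataclass, field

import torch


@dataclass
class FireStateSketch:
    """Hypothetical illustration of option 2: an explicit flag instead of an optional tensor."""

    positions: torch.Tensor
    velocities: torch.Tensor = field(init=False)  # always a tensor, never None
    velocities_initialized: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        self.velocities = torch.zeros_like(self.positions)


def maybe_init_velocities(state: FireStateSketch) -> None:
    # Replaces the former `if state.velocities is None:` branch.
    if not state.velocities_initialized:
        state.velocities = torch.zeros_like(state.positions)
        state.velocities_initialized = True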

torch_sim/state.py (2)

379-432: Consider improving error messages for better developer experience

The validation logic is excellent, but the error messages could be more actionable. Consider adding examples to help developers fix issues quickly.

For example:

 raise TypeError(
     f"Attribute '{attr_name}' in class '{cls.__name__}' is not "
     "allowed to be of type 'torch.Tensor | None'. "
     "Optional tensor attributes are disallowed in SimState "
     "subclasses to prevent concatenation errors.\n"
     "If this attribute will take on a default value in the "
     "post_init method, please use an InitVar for that attribute "
-    "but with a prepended 'init_' to the name. (e.g. init_system_idx)"
+    "but with a prepended 'init_' to the name.\n"
+    f"Example: Change '{attr_name}: torch.Tensor | None' to:\n"
+    f"  {attr_name}: torch.Tensor = field(init=False)\n"
+    f"  init_{attr_name}: InitVar[torch.Tensor | None]"
 )

786-787: Move TypeVar declaration to module level

TypeVar declarations are typically placed at the module level with other imports for better organization and reusability.

Move this to the top of the file after imports:

+SimStateT = TypeVar("SimStateT", bound=SimState)
+

 @dataclass
 class SimState:

And remove lines 786-787.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16bf8f8 and 5cf6b76.

📒 Files selected for processing (3)
  • torch_sim/io.py (1 hunks)
  • torch_sim/optimizers.py (3 hunks)
  • torch_sim/state.py (10 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
torch_sim/io.py (2)
tests/test_io.py (2)
  • test_single_atoms_to_state (58-70)
  • test_multiple_atoms_to_state (73-88)
tests/test_state.py (1)
  • test_initialize_state_from_atoms (308-314)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
🔇 Additional comments (3)
torch_sim/io.py (1)

238-245: LGTM!

The parameter rename from system_idx to init_system_idx correctly aligns with the SimState refactoring to use InitVar for initialization.

torch_sim/state.py (2)

86-88: LGTM! Clean InitVar implementation

The refactoring of system_idx to use InitVar pattern is well-designed and provides explicit initialization control.


113-131: LGTM! Proper initialization order

The reordering of cell shape adjustment to occur after system_idx initialization is correct, as n_systems calculation depends on system_idx.

Comment on lines 789 to 808
def construct_state(
    old_state: SimStateT,
    new_state_attrs: dict[str, typing.Any],
) -> SimStateT:
    """Construct a new state from an old state and new state parameters."""
    # 1) process the attrs so they are the init params
    processed_params = {}
    for param in inspect.signature(old_state.__class__).parameters:
        if param.startswith("init_"):
            # this is an InitVar field
            # we need to rename the corresponding field in system_attrs to have
            # an "init_" prefix
            non_init_attr_name = param.removeprefix("init_")
            processed_params[param] = new_state_attrs[non_init_attr_name]
        else:
            processed_params[param] = new_state_attrs[param]

    # 2) construct the new state
    return type(old_state)(**processed_params)



🛠️ Refactor suggestion

Add comprehensive docstring to construct_state

The construct_state function is a key utility but lacks documentation. Please add a docstring following Google style as mentioned in the PR objectives.

 def construct_state(
     old_state: SimStateT,
     new_state_attrs: dict[str, typing.Any],
 ) -> SimStateT:
-    """Construct a new state from an old state and new state parameters."""
+    """Construct a new state from an old state and new state parameters.
+    
+    This function handles the mapping of InitVar fields by automatically
+    prefixing the corresponding attribute names with 'init_' when calling
+    the constructor.
+    
+    Args:
+        old_state: The state instance whose type will be used for construction
+        new_state_attrs: Dictionary of attributes for the new state. Keys 
+            corresponding to InitVar fields should not have the 'init_' prefix;
+            it will be added automatically.
+            
+    Returns:
+        A new instance of the same type as old_state with the provided attributes
+        
+    Example:
+        >>> attrs = {'positions': tensor1, 'system_idx': tensor2}
+        >>> new_state = construct_state(old_state, attrs)
+        # This will call type(old_state)(positions=tensor1, init_system_idx=tensor2)
+    """

@@ -107,24 +110,25 @@ def __post_init__(self) -> None:
f"masses {shapes[1]}, atomic_numbers {shapes[2]}"
)

if self.cell.ndim != 3 and self.system_idx is None:
curtischong (Collaborator, Author) commented:

I moved these checks down since they depend on self.system_idx (and to bundle them with the self.cell.shape[0] check).

@@ -272,7 +276,7 @@ def clone(self) -> Self:
else:
attrs[attr_name] = copy.deepcopy(attr_value)

return self.__class__(**attrs)
return construct_state(self, attrs)
curtischong (Collaborator, Author) commented:

Since we now have InitVar params in the constructor, creating an instance is not as simple as __class__(**attrs). We need to properly handle the init_-prefixed constructor params.
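
To make the failure mode concrete, a minimal runnable sketch (TinyState is a stand-in for SimState, not the real class):

from dataclasses import InitVar, dataclass, field

import torch


@dataclass
class TinyState:
    """Stand-in for SimState, just to show why __class__(**attrs) breaks."""

    positions: torch.Tensor
    system_idx: torch.Tensor = field(init=False)
    init_system_idx: InitVar[torch.Tensor | None] = None

    def __post_init__(self, init_system_idx: torch.Tensor | None) -> None:
        if init_system_idx is None:
            init_system_idx = torch.zeros(len(self.positions), dtype=torch.long)
        self.system_idx = init_system_idx


state = TinyState(positions=torch.zeros(3, 3))
attrs = {"positions": state.positions, "system_idx": state.system_idx}

# The generated __init__ accepts init_system_idx, not system_idx, so the naive
# round trip raises TypeError:
#     state.__class__(**attrs)
# construct_state (introduced in this PR) renames such keys to their init_-prefixed
# form before calling the constructor:
#     new_state = construct_state(state, attrs)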

curtischong added the breaking (Breaking changes) label Aug 2, 2025
curtischong changed the title from "Init subclass checks" to "InitVar dataclass initialization (and subclass checks)" Aug 2, 2025
curtischong (Collaborator, Author) commented:

Closing since I don't like this approach. It doesn't feel right to call initialization params init_<attribute> in the constructor.

curtischong closed this Aug 2, 2025
curtischong deleted the init-subclass-checks branch August 2, 2025 20:08
Labels: breaking (Breaking changes), cla-signed (Contributor license agreement signed)