
InitVar dataclass initialization (and subclass checks) #230


Closed
wants to merge 12 commits into from

Conversation

curtischong (Collaborator) commented Aug 2, 2025

Summary

In order to properly type SimState, I want to remove the | None type annotation on system_idx. This is solved via the InitVar field init_system_idx.

In addition to typing, this PR introduces an __init_subclass__ check for subclasses of SimState. This paradigm allows us to verify properties of derived classes. In particular, this PR uses the subclass check to verify that all InitVar fields start with the init_ prefix (which is required to properly mangle params during concatenation / splitting of SimStates).

In the next PR, we will use the same subclass check to enforce that derived SimState classes cannot have a | None attribute (optional attributes break functions like torch.concatenate, since we cannot concatenate attributes that are tensors with attributes that are None - see the description of #229 for more info).

This PR is breaking since we modify the constructor for SimState.
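
For illustration, a minimal sketch of the intended pattern (illustrative only: most fields are omitted, and the real SimState signature and defaults may differ):

from dataclasses import InitVar, dataclass, field

import torch


@dataclass
class SimState:
    """Simplified sketch of the InitVar-based initialization, not the real class."""

    positions: torch.Tensor
    masses: torch.Tensor
    # The public attribute is always a tensor; callers pass init_system_idx
    # (possibly None) and __post_init__ fills in the default.
    system_idx: torch.Tensor = field(init=False)
    init_system_idx: InitVar[torch.Tensor | None] = None

    def __post_init__(self, init_system_idx: torch.Tensor | None) -> None:
        if init_system_idx is None:
            # Default: every atom belongs to a single system (index 0).
            init_system_idx = torch.zeros(len(self.positions), dtype=torch.long)
        self.system_idx = init_system_idx

With this shape, callers can still omit the system index, while downstream code can rely on system_idx always being a tensor.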

Checklist

Before a pull request can be merged, the following items must be checked:

  • Doc strings have been added in the Google docstring format.
  • Run ruff on your code.
  • Tests have been added for any new functionality or bug fixes.

We highly recommend installing the pre-commit hooks that run in CI locally to speed up the development process. Simply run pip install pre-commit && pre-commit install to install the hooks, which will check your code before each commit.

Summary by CodeRabbit

  • Refactor
    • Improved handling of initialization variables and system indices for simulation state objects.
    • Enforced stricter validation rules on subclassing and tensor attribute definitions.
    • Unified and streamlined state object construction to ensure correct initialization and prevent errors with optional tensors.
    • Updated optimizer state classes to require velocity attributes to always be present.

cla-bot added the cla-signed (Contributor license agreement signed) label Aug 2, 2025

coderabbitai bot commented Aug 2, 2025

Important

Review skipped

Draft detected.


Walkthrough

The changes refactor how system indices and initialization variables are handled in the simulation state classes. The SimState dataclass now uses an explicit InitVar for initialization, enforces stricter subclass constraints, and introduces a helper function for constructing state objects. Related updates propagate to optimizer state classes and utility functions.
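
For readers unfamiliar with the mechanism, a rough sketch of how an __init_subclass__ hook can enforce these subclass constraints (illustrative only; the actual checks live in torch_sim/state.py around lines 379-432 and may resolve annotations differently, e.g. if from __future__ import annotations is in use):

import typing
from dataclasses import InitVar

import torch


class SimState:
    """Only the subclass hook is sketched here; fields and other validation are omitted."""

    def __init_subclass__(cls, **kwargs: typing.Any) -> None:
        super().__init_subclass__(**kwargs)
        for attr_name, annotation in getattr(cls, "__annotations__", {}).items():
            # Every InitVar must carry the "init_" prefix so concatenation/splitting
            # can map it back to the attribute it initializes.
            if isinstance(annotation, InitVar) and not attr_name.startswith("init_"):
                raise TypeError(
                    f"InitVar '{attr_name}' in '{cls.__name__}' must start with 'init_'"
                )
            # Optional tensors are disallowed because they break torch.concatenate
            # when states are merged; use an InitVar plus a __post_init__ default.
            if annotation == (torch.Tensor | None):
                raise TypeError(
                    f"Attribute '{attr_name}' in '{cls.__name__}' may not be "
                    "'torch.Tensor | None'"
                )

Because the hook runs when each subclass is created, violations fail at import time rather than later, when states are concatenated.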

Changes

  • SimState Initialization & Construction (torch_sim/state.py): Refactored SimState to use init_system_idx as an InitVar, moved system index validation to __post_init__, added __init_subclass__ for stricter subclass checks, and introduced construct_state helper for consistent object instantiation. Updated all relevant methods to use the new construction pattern.
  • Optimizer State Classes & FIRE Initialization (torch_sim/optimizers.py): Made velocities and cell_velocities non-optional in FIRE-related state classes. Changed attribute name from system_idx to init_system_idx in FIRE optimizer state initialization to align with new SimState construction.
  • Atoms-to-State Conversion (torch_sim/io.py): Changed argument in atoms_to_state from system_idx=system_idx to init_system_idx=system_idx for compatibility with new SimState initialization.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Atoms
    participant io.py
    participant SimState

    User->>Atoms: Provides Atoms object
    io.py->>Atoms: Reads system_idx from Atoms
    io.py->>SimState: Constructs with init_system_idx=system_idx
    SimState->>SimState: __post_init__ sets system_idx
    SimState-->>io.py: Returns initialized SimState
sequenceDiagram
    participant Optimizer
    participant FireState
    participant construct_state

    Optimizer->>FireState: Requests initialization
    FireState->>construct_state: Uses new attribute dict with init_system_idx
    construct_state->>FireState: Instantiates with InitVar
    FireState-->>Optimizer: Returns new FireState instance

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

Oh, what a hop in the stateful spring,
With InitVars now doing their thing!
System indices checked, subclasses behave,
Constructors unified, oh how they pave!
FIRE runs faster, no tensors are missed—
A bunny’s delight, in changes like this! 🐇✨


curtischong marked this pull request as draft August 2, 2025 19:21
coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (4)
torch_sim/io.py (2)

306-313: Missing parameter rename in structures_to_state

The system_idx parameter should be renamed to init_system_idx to match the SimState refactoring, similar to the change made in atoms_to_state.

Apply this diff:

 return ts.SimState(
     positions=positions,
     masses=masses,
     cell=cell,
     pbc=True,  # Structures are always periodic
     atomic_numbers=atomic_numbers,
-    system_idx=system_idx,
+    init_system_idx=system_idx,
 )

384-391: Missing parameter rename in phonopy_to_state

The system_idx parameter should be renamed to init_system_idx to match the SimState refactoring.

Apply this diff:

 return ts.SimState(
     positions=positions,
     masses=masses,
     cell=cell,
     pbc=True,
     atomic_numbers=atomic_numbers,
-    system_idx=system_idx,
+    init_system_idx=system_idx,
 )
torch_sim/optimizers.py (2)

865-872: Type inconsistency with velocities initialization in UnitCellFireState

The velocities and cell_velocities fields are typed as non-optional but initialized as None.

Initialize with zero tensors:

 pbc=state.pbc,
-velocities=None,
+velocities=torch.zeros_like(state.positions),
 forces=forces,
 energy=energy,
 stress=stress,
 # Cell attributes
 cell_positions=torch.zeros(n_systems, 3, 3, device=device, dtype=dtype),
-cell_velocities=None,
+cell_velocities=torch.zeros(n_systems, 3, 3, device=device, dtype=dtype),
 cell_forces=cell_forces,

1163-1171: Type inconsistency with velocities initialization in FrechetCellFIREState

The velocities and cell_velocities fields are typed as non-optional but initialized as None.

Initialize with zero tensors:

 pbc=state.pbc,
-velocities=None,
+velocities=torch.zeros_like(state.positions),
 forces=forces,
 energy=energy,
 stress=stress,
 # Cell attributes
 cell_positions=cell_positions,
-cell_velocities=None,
+cell_velocities=torch.zeros(n_systems, 3, 3, device=device, dtype=dtype),
 cell_forces=cell_forces,
🧹 Nitpick comments (3)
torch_sim/optimizers.py (1)

1247-1257: Reconsider velocity initialization strategy

With velocities now being non-optional, the if state.velocities is None: check becomes problematic. Consider either:

  1. Keep velocities optional in the type system, or
  2. Use a different mechanism to track whether velocities have been initialized (e.g., a separate boolean flag)

The current approach of initializing velocities as zero tensors and then checking for None won't work as intended.
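
If the second option were adopted, a minimal sketch might look like this (the velocities_initialized flag and the FireStateSketch name are hypothetical, not part of the current code):

from dataclasses import dataclass, field

import torch


@dataclass
class FireStateSketch:
    """Hypothetical illustration of option 2: an explicit flag instead of an optional tensor."""

    positions: torch.Tensor
    velocities: torch.Tensor = field(init=False)  # always a tensor, never None
    velocities_initialized: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        self.velocities = torch.zeros_like(self.positions)


def maybe_init_velocities(state: FireStateSketch) -> None:
    # Replaces the former `if state.velocities is None:` branch.
    if not state.velocities_initialized:
        state.velocities = torch.zeros_like(state.positions)
        state.velocities_initialized = True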

torch_sim/state.py (2)

379-432: Consider improving error messages for better developer experience

The validation logic is excellent, but the error messages could be more actionable. Consider adding examples to help developers fix issues quickly.

For example:

 raise TypeError(
     f"Attribute '{attr_name}' in class '{cls.__name__}' is not "
     "allowed to be of type 'torch.Tensor | None'. "
     "Optional tensor attributes are disallowed in SimState "
     "subclasses to prevent concatenation errors.\n"
     "If this attribute will take on a default value in the "
     "post_init method, please use an InitVar for that attribute "
-    "but with a prepended 'init_' to the name. (e.g. init_system_idx)"
+    "but with a prepended 'init_' to the name.\n"
+    f"Example: Change '{attr_name}: torch.Tensor | None' to:\n"
+    f"  {attr_name}: torch.Tensor = field(init=False)\n"
+    f"  init_{attr_name}: InitVar[torch.Tensor | None]"
 )

786-787: Move TypeVar declaration to module level

TypeVar declarations are typically placed at the module level with other imports for better organization and reusability.

Move this to the top of the file after imports:

+SimStateT = TypeVar("SimStateT", bound=SimState)
+

 @dataclass
 class SimState:

And remove lines 786-787.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16bf8f8 and 5cf6b76.

📒 Files selected for processing (3)
  • torch_sim/io.py (1 hunks)
  • torch_sim/optimizers.py (3 hunks)
  • torch_sim/state.py (10 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
torch_sim/io.py (2)
tests/test_io.py (2)
  • test_single_atoms_to_state (58-70)
  • test_multiple_atoms_to_state (73-88)
tests/test_state.py (1)
  • test_initialize_state_from_atoms (308-314)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
🔇 Additional comments (3)
torch_sim/io.py (1)

238-245: LGTM!

The parameter rename from system_idx to init_system_idx correctly aligns with the SimState refactoring to use InitVar for initialization.

torch_sim/state.py (2)

86-88: LGTM! Clean InitVar implementation

The refactoring of system_idx to use InitVar pattern is well-designed and provides explicit initialization control.


113-131: LGTM! Proper initialization order

The reordering of cell shape adjustment to occur after system_idx initialization is correct, as n_systems calculation depends on system_idx.

Comment on lines 789 to 808
def construct_state(
    old_state: SimStateT,
    new_state_attrs: dict[str, typing.Any],
) -> SimStateT:
    """Construct a new state from an old state and new state parameters."""
    # 1) process the attrs so they are the init params
    processed_params = {}
    for param in inspect.signature(old_state.__class__).parameters:
        if param.startswith("init_"):
            # this is an InitVar field
            # we need to rename the corresponding field in system_attrs to have
            # an "init_" prefix
            non_init_attr_name = param.removeprefix("init_")
            processed_params[param] = new_state_attrs[non_init_attr_name]
        else:
            processed_params[param] = new_state_attrs[param]

    # 2) construct the new state
    return type(old_state)(**processed_params)



🛠️ Refactor suggestion

Add comprehensive docstring to construct_state

The construct_state function is a key utility but lacks documentation. Please add a docstring following Google style as mentioned in the PR objectives.

 def construct_state(
     old_state: SimStateT,
     new_state_attrs: dict[str, typing.Any],
 ) -> SimStateT:
-    """Construct a new state from an old state and new state parameters."""
+    """Construct a new state from an old state and new state parameters.
+    
+    This function handles the mapping of InitVar fields by automatically
+    prefixing the corresponding attribute names with 'init_' when calling
+    the constructor.
+    
+    Args:
+        old_state: The state instance whose type will be used for construction
+        new_state_attrs: Dictionary of attributes for the new state. Keys 
+            corresponding to InitVar fields should not have the 'init_' prefix;
+            it will be added automatically.
+            
+    Returns:
+        A new instance of the same type as old_state with the provided attributes
+        
+    Example:
+        >>> attrs = {'positions': tensor1, 'system_idx': tensor2}
+        >>> new_state = construct_state(old_state, attrs)
+        # This will call type(old_state)(positions=tensor1, init_system_idx=tensor2)
+    """

@@ -107,24 +110,25 @@ def __post_init__(self) -> None:
f"masses {shapes[1]}, atomic_numbers {shapes[2]}"
)

if self.cell.ndim != 3 and self.system_idx is None:
curtischong (Collaborator, Author) commented:

I moved these checks down since they depend on self.system_idx (and to bundle them with the self.cell.shape[0] check).

@@ -272,7 +276,7 @@ def clone(self) -> Self:
else:
attrs[attr_name] = copy.deepcopy(attr_value)

return self.__class__(**attrs)
return construct_state(self, attrs)
curtischong (Collaborator, Author) commented:

Since we now have InitVar params in the constructor, creating an instance is not as simple as __class__(**attrs). We need to properly handle the init_-prefixed constructor params.
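
To make the failure mode concrete, a minimal runnable sketch (TinyState is a stand-in for SimState, not the real class):

from dataclasses import InitVar, dataclass, field

import torch


@dataclass
class TinyState:
    """Stand-in for SimState, just to show why __class__(**attrs) breaks."""

    positions: torch.Tensor
    system_idx: torch.Tensor = field(init=False)
    init_system_idx: InitVar[torch.Tensor | None] = None

    def __post_init__(self, init_system_idx: torch.Tensor | None) -> None:
        if init_system_idx is None:
            init_system_idx = torch.zeros(len(self.positions), dtype=torch.long)
        self.system_idx = init_system_idx


state = TinyState(positions=torch.zeros(3, 3))
attrs = {"positions": state.positions, "system_idx": state.system_idx}

# The generated __init__ accepts init_system_idx, not system_idx, so the naive
# round trip raises TypeError:
#     state.__class__(**attrs)
# construct_state (introduced in this PR) renames such keys to their init_-prefixed
# form before calling the constructor:
#     new_state = construct_state(state, attrs)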

curtischong added the breaking (Breaking changes) label Aug 2, 2025
curtischong changed the title from "Init subclass checks" to "InitVar dataclass initialization (and subclass checks)" Aug 2, 2025
curtischong (Collaborator, Author) commented:

Closing since I don't like this approach. It doesn't feel right to call initialization params init_<attribute> in the constructor.

curtischong closed this Aug 2, 2025
curtischong deleted the init-subclass-checks branch August 2, 2025 20:08
Labels: breaking (Breaking changes), cla-signed (Contributor license agreement signed)