@Tar-ive Tar-ive commented Aug 5, 2025

#28

The core issue was a bug in the original HRM source code that made it incompatible with the installed version of PyTorch. The error message `AttributeError: module 'torch.nn' has no attribute 'Buffer'` told us that the code referenced an attribute that does not exist in that version.

The Cause: Incorrect PyTorch Usage
In PyTorch, a "buffer" is a tensor that is part of a model's state (like weights) but is not a parameter that gets updated during training (e.g., a running mean in a normalization layer).
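To make the distinction concrete, here is a minimal, illustrative module (not code from the HRM repo) that tracks a running mean as a buffer. The buffer is saved in `state_dict()` and moves with the model across devices, but the optimizer never sees it:

```python
import torch
import torch.nn as nn

class RunningMean(nn.Module):
    """Tracks a running mean of its inputs as a buffer, not a parameter."""
    def __init__(self, dim):
        super().__init__()
        # Registered buffers appear in state_dict() and follow .to(device),
        # but are excluded from parameters(), so optimizers never update them.
        self.register_buffer('mean', torch.zeros(dim))

    def forward(self, x, momentum=0.1):
        # Assigning a tensor to a registered buffer name updates the buffer.
        self.mean = (1 - momentum) * self.mean + momentum * x.mean(dim=0)
        return x - self.mean

m = RunningMean(4)
print(list(m.state_dict().keys()))  # ['mean'] -> saved with the model
print(list(m.parameters()))         # []       -> nothing for the optimizer
```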

The original developer wrote code like this:

```python
self.weights = nn.Buffer(...)
```

This fails here because `nn.Buffer` is not an attribute of `torch.nn` in every release; on versions that lack it, the line raises the `AttributeError` above. Whichever version the original developer used, relying on `nn.Buffer` is not a portable way to create a buffer.

The Change: Using the Correct Method
The official and correct way to create and register a buffer in a PyTorch model is by using the self.register_buffer() method.

We fixed the code by changing lines like the one above to the following pattern:

```python
# 1. Create the tensor you want to be a buffer
weights_tensor = trunc_normal_init_(...)

# 2. Register it as a buffer using the correct method
self.register_buffer('weights', weights_tensor)
```

We applied the same fix in three files, because the mistake was repeated throughout the repository:

models/sparse_embedding.py

models/layers.py

models/hrm/hrm_act_v1.py
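Applied in context, the pattern looks like this. The module and its names are an illustrative sketch, not the repo's exact code (the repo initializes with its own `trunc_normal_init_` helper; a plain truncated-normal init stands in for it here):

```python
import torch
import torch.nn as nn

class FixedEmbedding(nn.Module):
    """Sketch of the repeated fix; names are illustrative."""
    def __init__(self, num_embeddings, dim):
        super().__init__()
        # Before (breaks on PyTorch versions without nn.Buffer):
        #   self.weights = nn.Buffer(torch.empty(num_embeddings, dim))
        # After (portable across modern PyTorch releases):
        weights_tensor = nn.init.trunc_normal_(
            torch.empty(num_embeddings, dim), std=0.02)
        self.register_buffer('weights', weights_tensor)

    def forward(self, idx):
        return self.weights[idx]

emb = FixedEmbedding(10, 8)
out = emb(torch.tensor([0, 3]))
print(out.shape)  # torch.Size([2, 8])
```

Because the weights are a buffer rather than a parameter, they ship with checkpoints and move with `.to(device)` while staying frozen during training, which is exactly the behavior the original `nn.Buffer` call was after.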

By making these changes, we made the code compliant with the modern PyTorch API, which allowed the training to proceed without errors.

automenta pushed a commit to deepstupid/hrem that referenced this pull request Aug 25, 2025
This commit refactors and hardens the unified optimization/evaluation/comparison process. It also incorporates a critical bug fix from PR sapientinc#30.

- Applied a fix from PR sapientinc#30 to address an `AttributeError` related to `nn.Buffer` by replacing it with `register_buffer`.
- Created an `optimization` directory and moved/renamed the relevant scripts.
- Created `optimization/utils.py` to deduplicate code.
- The hyperparameter search space is now loaded from a YAML file.
- Added support for parallel execution of Optuna trials.
- Improved the detail and location of the `comparison_report.md`.
- Improved the console output.
- Added robust error handling to the main scripts.
- Updated `README.md` to reflect the changes.

The final verification step was blocked by a `ModuleNotFoundError` which can be fixed by adding the project root to the Python path.
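A minimal sketch of that suggested fix, placed at the top of each entry-point script (the double `dirname` assumes the script sits one directory below the project root; adjust to match the actual layout):

```python
import os
import sys

# Resolve the project root relative to this script and prepend it to the
# import path so packages like `models` resolve regardless of the
# directory the script is launched from.
PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)
```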
@automenta
I've applied it here: deepstupid@45a3b2c
