@Jemoka Jemoka commented Aug 13, 2025

Pull Request

Description

Implements the Proximal Policy Optimization (PPO) solver. As a consequence, this enables an implementation of Perez et al., 2022.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test additions or improvements
  • 🏗️ Infrastructure/build changes

Changes Made

  • Implements the PPO algorithm.
  • To support PPO, implements a ValueFunctionProblem, which extends Problem and lets the user train a value function for actor-critic algorithms.
  • Refactors the entire codebase to return sequence-wide logprobs (i.e., so we have sequence-wide supervision); where multiplied logprobs are needed, they are multiplied right before use.
  • Implements examples/ast_ppo.py, an example of how to run PPO with our package.
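
For reviewers less familiar with PPO, the clipped surrogate loss and the "combine logprobs right before use" pattern from the list above might look roughly like the sketch below. This is a standalone illustration in plain Python, not the code in this PR. The names `sequence_logprob` and `ppo_clip_loss` are hypothetical, and the `clip_eps` default of 0.2 is assumed. The sketch also assumes that "multiplied" logprobs means summing per-token logprobs, which is equivalent to multiplying token probabilities.

```python
import math
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Collapse per-token logprobs into a single sequence logprob.

    Summing logs is equivalent to multiplying the token probabilities.
    """
    return sum(token_logprobs)


def ppo_clip_loss(
    new_token_logprobs: Sequence[Sequence[float]],
    old_token_logprobs: Sequence[Sequence[float]],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default, not taken from the PR
) -> float:
    """Mean negative clipped surrogate objective over a batch of sequences."""
    n = len(advantages)
    if n == 0:
        raise ValueError("batch must not be empty")
    if not (len(new_token_logprobs) == len(old_token_logprobs) == n):
        raise ValueError("batch sizes must match")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_token_logprobs, old_token_logprobs, advantages):
        # Per-token logprobs are collapsed only here, right before use.
        ratio = math.exp(sequence_logprob(new_lp) - sequence_logprob(old_lp))
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # PPO keeps the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / n
```

Keeping the per-token values until the last moment means the same rollout data can serve both token-level supervision and sequence-level ratios like the one computed here.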

Testing

Tested manually: the loss decreases on the example after a few gradient steps.

Documentation

  • Updated docstrings for new/modified functions
  • Updated README.md if applicable
  • Updated documentation in docs/ if applicable
  • Added examples for new features

Pre-submission Checklist

  • Code follows the project's style guidelines
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Self-review of the code completed
  • Comments added for hard-to-understand areas
  • No new warnings introduced
  • Changes are backwards compatible (or breaking changes are documented)

@Jemoka Jemoka requested a review from duncaneddy August 13, 2025 20:41
@Jemoka Jemoka merged commit 351cd04 into main Aug 19, 2025
2 checks passed
@Jemoka Jemoka deleted the feat/ppo branch August 19, 2025 23:37