
Commit 1fead76

patch various small PPO implementation bugs
1 parent f9ab96f commit 1fead76

1 file changed: +1 -0 lines changed

src/astra_rl/algorithms/ppo.py

Lines changed: 1 addition & 0 deletions
@@ -89,6 +89,7 @@ def step(
     torch.tensor(batch.reward)
     .to(logprobs_attacker.device)
     .unsqueeze(-1)
+    .unsqueeze(-1)
     .repeat(1, *values.shape[1:])
 )
 A = Q - values
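
The effect of the added `.unsqueeze(-1)` can be sketched with stand-in tensors. This is a minimal illustration, not the repository's code: it assumes `batch.reward` holds one scalar per sample and that `values` is 3-D with shape `(batch, seq, 1)`; neither shape is stated in the commit, and `reward`, `q_before`, `q_after`, and `advantage` are hypothetical names. Under those assumptions, a single unsqueeze leaves the reward tensor one rank short, so `repeat` prepends a leading dimension and tiles the batch axis rather than the sequence axis.

```python
import torch

batch_size, seq_len = 2, 3

# Hypothetical stand-ins: the real shapes of `batch.reward` and `values` are assumed.
reward = [1.0, -0.5]                          # one scalar reward per sample
values = torch.zeros(batch_size, seq_len, 1)  # assumed (batch, seq, 1) value estimates

# Before the patch: one unsqueeze gives (batch, 1). repeat() receives three sizes,
# so it treats the tensor as (1, batch, 1) and tiles the batch axis.
q_before = torch.tensor(reward).unsqueeze(-1).repeat(1, *values.shape[1:])

# After the patch: two unsqueezes give (batch, 1, 1), matching the rank of `values`,
# so each reward is repeated along the sequence axis.
q_after = (
    torch.tensor(reward)
    .unsqueeze(-1)
    .unsqueeze(-1)
    .repeat(1, *values.shape[1:])
)

print(tuple(q_before.shape))  # (1, 6, 1)
print(tuple(q_after.shape))   # (2, 3, 1)

# With matching shapes, the subtraction is elementwise per sample and position.
advantage = q_after - values
```

With these stand-in shapes, the pre-patch tensor has shape `(1, 6, 1)`, which cannot be subtracted from a `(2, 3, 1)` `values` tensor, while the post-patch tensor lines up with it exactly.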
