
[FixBug] compute early stopping with real batch size #3418

Merged
merged 6 commits into from
Aug 19, 2025

Conversation

zeroRains
Contributor

@zeroRains zeroRains commented Aug 15, 2025

pcard-71500

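When repetition early stopping was first implemented, the real batch size mechanism did not exist yet, so the batch_size passed in on every call was fixed at max-num-seqs. Now that real batch size has been introduced, the batch size of the windows can differ from the batch size of probs, which makes the Triton kernel computation fail with CUDA error 700. The Triton bug has already been fixed in #3375.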

This PR mainly fixes the implementation built from small operators, and adds a unit test, test_consistency_with_real_batch_size, that covers the real batch size case.
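
To illustrate the mismatch described above, here is a minimal NumPy sketch, not the code from this PR. It assumes the early-stopping state is a per-slot score window allocated for max-num-seqs rows, while probs only has real-batch-size rows. The function name `early_stop_step`, its parameters, and the "every windowed score exceeds a threshold" trigger rule are hypothetical and chosen only for illustration.

```python
import numpy as np


def early_stop_step(probs, next_tokens, window_scores, threshold, eos_token_id):
    """One decoding step of a repetition early-stopping check (illustrative only).

    probs:         [real_bsz, vocab_size] sampling probabilities for this step
    next_tokens:   [real_bsz] token ids sampled from probs
    window_scores: [max_num_seqs, window_size] persistent per-slot score history
    """
    real_bsz = probs.shape[0]
    # Only the first real_bsz slots are live in this step. Pairing all
    # max_num_seqs window rows with a smaller probs tensor is the kind of
    # mismatch that can read or write out of bounds on a GPU.
    active = window_scores[:real_bsz]

    # Probability the model assigned to each token that was actually sampled.
    scores = probs[np.arange(real_bsz), next_tokens]

    # Slide each live window left by one and append the newest score.
    window_scores[:real_bsz] = np.concatenate([active[:, 1:], scores[:, None]], axis=1)

    # Assumed rule: a sequence triggers when every score in its window
    # exceeds the threshold.
    triggered = (window_scores[:real_bsz] > threshold).all(axis=1)

    # Force EOS for triggered sequences by making it the only possible token.
    probs[triggered] = 0.0
    probs[triggered, eos_token_id] = 1.0
    return triggered


# State is sized for max_num_seqs = 4, but only 2 sequences are running.
max_num_seqs, window_size, vocab_size = 4, 3, 5
window_scores = np.zeros((max_num_seqs, window_size))
for _ in range(window_size):
    probs = np.full((2, vocab_size), 0.2)
    probs[0] = [0.0025, 0.99, 0.0025, 0.0025, 0.0025]  # sequence 0 is over-confident
    hit = early_stop_step(probs, np.array([1, 2]), window_scores, 0.9, eos_token_id=0)
```

In this sketch the rows of `window_scores` beyond the real batch size are never touched, which is the property a test such as test_consistency_with_real_batch_size would plausibly check.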

paddle-bot bot commented Aug 15, 2025

Thanks for your contribution!

yuanlehome
yuanlehome previously approved these changes Aug 15, 2025
@yuanlehome yuanlehome merged commit 8b12c80 into PaddlePaddle:develop Aug 19, 2025
13 of 15 checks passed
@zeroRains zeroRains deleted the es branch August 19, 2025 05:10

2 participants