【Inference Optimize】DeepSeek-v3 model inference performance optimization #3455

Merged: 3 commits into PaddlePaddle:develop on Aug 19, 2025

Conversation

chang-wenbin (Collaborator) commented:

  1. Give the encoder access to Flash Attention v3
     (flash_attention_v3_varlen).
  2. Eliminate redundant parts of the Attention layer in the network structure.

End-to-end performance improves by roughly 10%. (See the varlen calling-convention sketch below.)
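For context, a "varlen" (variable-length) flash-attention kernel consumes sequences packed into a single token dimension, delimited by cumulative sequence-length offsets, instead of a padded batch. Below is a minimal sketch using Paddle's flash_attn_unpadded (the FA2 fallback this PR keeps); the shapes, dtype, and sequence lengths are illustrative assumptions, not taken from the PR:

import paddle
from paddle.nn.functional.flash_attention import flash_attn_unpadded

# Two sequences of lengths 3 and 5 packed into a single 8-token tensor.
num_heads, head_dim, total_tokens = 8, 128, 8
q = paddle.randn([total_tokens, num_heads, head_dim]).astype("float16")
k = paddle.randn([total_tokens, num_heads, head_dim]).astype("float16")
v = paddle.randn([total_tokens, num_heads, head_dim]).astype("float16")
# Prefix sums of the per-sequence lengths: [0, 3, 3 + 5].
cu_seqlens = paddle.to_tensor([0, 3, 8], dtype="int32")

out, _ = flash_attn_unpadded(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_k=cu_seqlens,
    max_seqlen_q=5,
    max_seqlen_k=5,
    scale=head_dim**-0.5,
    causal=True,
)
print(out.shape)  # [8, 8, 128]: packed tokens, heads, head_dim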

paddle-bot (bot) commented Aug 18, 2025:

Thanks for your contribution!

chang-wenbin marked this pull request as ready for review on August 18, 2025, 09:33
K11OntheBoat (Collaborator) left a comment:

LGTM

Comment on lines +157 to +171
if self.flash_attn_func is None:
    prop = paddle.device.cuda.get_device_properties()
    # Compute capability as an integer, e.g. SM 9.0 (Hopper) -> 90.
    cc = prop.major * 10 + prop.minor
    is_current_sm_supported = cc >= 90
    # Paddle must also have been built with an SM90+ architecture target.
    is_paddle_supported = any(num >= 90 for num in paddle.version.cuda_archs())
    if is_current_sm_supported and is_paddle_supported:
        self.flash_attn_func = flash_attention_v3_varlen
        print("The current platform supports Flash Attention V3.")
        self.flash_attn_kwargs = {"softmax_scale": self.attn_softmax_scale}
    else:
        self.flash_attn_func = flash_attn_unpadded
        self.flash_attn_kwargs = {"scale": self.attn_softmax_scale, "training": False}
        print(
            "The current platform does not support Flash Attention V3, so Flash Attention V2 will be used instead."
        )
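As a standalone illustration of the capability gate above: CUDA compute capability is encoded as major * 10 + minor, so the cc >= 90 test selects Hopper-class (SM90+) GPUs. A minimal sketch, assuming a CUDA build of Paddle and a visible GPU:

import paddle

# Query the active GPU and reproduce the SM check used in the PR.
prop = paddle.device.cuda.get_device_properties()
cc = prop.major * 10 + prop.minor  # e.g. H100 is SM 9.0 -> 9 * 10 + 0 = 90
fa3_capable = cc >= 90
# Paddle itself must also have been compiled with an SM90+ target.
fa3_built = any(num >= 90 for num in paddle.version.cuda_archs())
print(f"cc={cc}, device ok={fa3_capable}, build ok={fa3_built}")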
Collaborator:

Could this block of code be moved out of the individual derived classes and into the base class? That way it only needs to be written once.

chang-wenbin (Collaborator, Author) replied:
Got it 🫡
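A minimal sketch of the refactor the reviewer suggests: hoist the FA3/FA2 backend selection into a shared base class so every derived attention backend inherits it. The class and method names here are hypothetical, and the import path for flash_attention_v3_varlen is an assumption (in the PR it comes from the repository's own ops):

import paddle
from paddle.nn.functional.flash_attention import flash_attn_unpadded

try:
    # Hypothetical import path; the real symbol is defined by this PR.
    from paddle.nn.functional.flash_attention import flash_attention_v3_varlen
except ImportError:
    flash_attention_v3_varlen = None

class AttentionBackendBase:  # hypothetical base-class name
    def init_flash_attn_backend(self, attn_softmax_scale):
        """Pick FA3 on SM90+ builds, otherwise fall back to FA2 (written once here)."""
        prop = paddle.device.cuda.get_device_properties()
        cc = prop.major * 10 + prop.minor
        fa3_ok = (
            cc >= 90
            and any(num >= 90 for num in paddle.version.cuda_archs())
            and flash_attention_v3_varlen is not None
        )
        if fa3_ok:
            self.flash_attn_func = flash_attention_v3_varlen
            self.flash_attn_kwargs = {"softmax_scale": attn_softmax_scale}
        else:
            self.flash_attn_func = flash_attn_unpadded
            self.flash_attn_kwargs = {"scale": attn_softmax_scale, "training": False}

class MLAAttentionBackend(AttentionBackendBase):  # hypothetical derived class
    def __init__(self, attn_softmax_scale):
        self.init_flash_attn_backend(attn_softmax_scale)

Each derived backend then calls the inherited method instead of repeating the selection block.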

Jiang-Jia-Jun merged commit beec24f into PaddlePaddle:develop on Aug 19, 2025.
12 of 15 checks passed