【Inference Optimize】DeepSeek-v3 model inference performance optimization #3455

Merged: 3 commits into PaddlePaddle:develop on Aug 19, 2025

Conversation

chang-wenbin (Collaborator) commented:

  1. Give the encoder access to Flash Attention v3
     (flash_attention_v3_varlen).
  2. Eliminate redundant parts of the Attention layer in the network structure.

End-to-end performance improves by roughly 10%. (See the varlen calling-convention sketch below.)
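For context, a "varlen" (variable-length) flash-attention kernel consumes sequences packed into a single token dimension, delimited by cumulative sequence-length offsets, instead of a padded batch. Below is a minimal sketch using Paddle's flash_attn_unpadded (the FA2 fallback this PR keeps); the shapes, dtype, and sequence lengths are illustrative assumptions, not taken from the PR:

import paddle
from paddle.nn.functional.flash_attention import flash_attn_unpadded

# Two sequences of lengths 3 and 5 packed into a single 8-token tensor.
num_heads, head_dim, total_tokens = 8, 128, 8
q = paddle.randn([total_tokens, num_heads, head_dim]).astype("float16")
k = paddle.randn([total_tokens, num_heads, head_dim]).astype("float16")
v = paddle.randn([total_tokens, num_heads, head_dim]).astype("float16")
# Prefix sums of the per-sequence lengths: [0, 3, 3 + 5].
cu_seqlens = paddle.to_tensor([0, 3, 8], dtype="int32")

out, _ = flash_attn_unpadded(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_k=cu_seqlens,
    max_seqlen_q=5,
    max_seqlen_k=5,
    scale=head_dim**-0.5,
    causal=True,
)
print(out.shape)  # [8, 8, 128]: packed tokens, heads, head_dim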

paddle-bot (bot) commented Aug 18, 2025:

Thanks for your contribution!

chang-wenbin marked this pull request as ready for review on August 18, 2025, 09:33
K11OntheBoat (Collaborator) left a comment:

LGTM

Comment on lines +157 to +171
if self.flash_attn_func is None:
    prop = paddle.device.cuda.get_device_properties()
    # Compute capability as an integer, e.g. SM 9.0 (Hopper) -> 90.
    cc = prop.major * 10 + prop.minor
    is_current_sm_supported = cc >= 90
    # Paddle must also have been built with an SM90+ architecture target.
    is_paddle_supported = any(num >= 90 for num in paddle.version.cuda_archs())
    if is_current_sm_supported and is_paddle_supported:
        self.flash_attn_func = flash_attention_v3_varlen
        print("The current platform supports Flash Attention V3.")
        self.flash_attn_kwargs = {"softmax_scale": self.attn_softmax_scale}
    else:
        self.flash_attn_func = flash_attn_unpadded
        self.flash_attn_kwargs = {"scale": self.attn_softmax_scale, "training": False}
        print(
            "The current platform does not support Flash Attention V3, so Flash Attention V2 will be used instead."
        )
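As a standalone illustration of the capability gate above: CUDA compute capability is encoded as major * 10 + minor, so the cc >= 90 test selects Hopper-class (SM90+) GPUs. A minimal sketch, assuming a CUDA build of Paddle and a visible GPU:

import paddle

# Query the active GPU and reproduce the SM check used in the PR.
prop = paddle.device.cuda.get_device_properties()
cc = prop.major * 10 + prop.minor  # e.g. H100 is SM 9.0 -> 9 * 10 + 0 = 90
fa3_capable = cc >= 90
# Paddle itself must also have been compiled with an SM90+ target.
fa3_built = any(num >= 90 for num in paddle.version.cuda_archs())
print(f"cc={cc}, device ok={fa3_capable}, build ok={fa3_built}")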
Collaborator:

Could this block of code be moved out of the individual derived classes and into the base class? That way it only needs to be written once.

chang-wenbin (Collaborator, Author) replied:
Got it 🫡
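A minimal sketch of the refactor the reviewer suggests: hoist the FA3/FA2 backend selection into a shared base class so every derived attention backend inherits it. The class and method names here are hypothetical, and the import path for flash_attention_v3_varlen is an assumption (in the PR it comes from the repository's own ops):

import paddle
from paddle.nn.functional.flash_attention import flash_attn_unpadded

try:
    # Hypothetical import path; the real symbol is defined by this PR.
    from paddle.nn.functional.flash_attention import flash_attention_v3_varlen
except ImportError:
    flash_attention_v3_varlen = None

class AttentionBackendBase:  # hypothetical base-class name
    def init_flash_attn_backend(self, attn_softmax_scale):
        """Pick FA3 on SM90+ builds, otherwise fall back to FA2 (written once here)."""
        prop = paddle.device.cuda.get_device_properties()
        cc = prop.major * 10 + prop.minor
        fa3_ok = (
            cc >= 90
            and any(num >= 90 for num in paddle.version.cuda_archs())
            and flash_attention_v3_varlen is not None
        )
        if fa3_ok:
            self.flash_attn_func = flash_attention_v3_varlen
            self.flash_attn_kwargs = {"softmax_scale": attn_softmax_scale}
        else:
            self.flash_attn_func = flash_attn_unpadded
            self.flash_attn_kwargs = {"scale": attn_softmax_scale, "training": False}

class MLAAttentionBackend(AttentionBackendBase):  # hypothetical derived class
    def __init__(self, attn_softmax_scale):
        self.init_flash_attn_backend(attn_softmax_scale)

Each derived backend then calls the inherited method instead of repeating the selection block.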

Jiang-Jia-Jun merged commit beec24f into PaddlePaddle:develop on Aug 19, 2025.
12 of 15 checks passed