[Feat] support mixed ep #2969
Conversation
Thanks for your contribution!
@@ -959,6 +959,11 @@ class at the server level, which is too granular for ModelRunner.
        We plan to replace it with 'ModelForwardBatch'.
        intermediate_tensors:
        """
+       is_decode_batch = paddle.to_tensor(
+           not ((self.share_inputs["seq_lens_this_time"] > 1).sum() > 0)
+       )
+       paddle.distributed.broadcast(is_decode_batch, src=0)
This isn't broadcast logic, and `src` isn't necessarily 0. The semantics here should be: if any rank is False, every rank must be False; the result can be True only when all ranks are True.
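The fix the reviewer describes (True only when every rank agrees) is a logical AND across ranks, which maps to a MIN all-reduce over a 0/1 flag rather than a broadcast from rank 0. A minimal sketch of that semantics; the helper name `combine_decode_flags` is hypothetical, and the commented paddle calls are an assumption about how this would look, not the PR's actual code:

```python
def combine_decode_flags(flags):
    """Simulate a MIN all-reduce over per-rank 0/1 decode flags.

    Returns True only if every rank reports a pure-decode batch,
    which is the AND semantics the reviewer asks for.
    """
    return min(int(f) for f in flags) == 1


# In paddle this would look roughly like (sketch, assumed API usage):
#   flag = paddle.to_tensor(int(is_decode_batch), dtype="int32")
#   paddle.distributed.all_reduce(flag, op=paddle.distributed.ReduceOp.MIN)
#   is_decode_batch = bool(flag.item())

# All ranks in decode -> True everywhere
assert combine_decode_flags([1, 1, 1, 1]) is True
# Any rank still prefilling -> False everywhere
assert combine_decode_flags([1, 0, 1, 1]) is False
```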
            num_max_dispatch_tokens_per_rank,
            None,
            num_experts,
        )
This code was removed earlier, but I found that paddle 3.0.1 still errors here, and the FastDeploy usage docs still recommend paddle 3.0.1 to users, so from a user's perspective let's keep it for now.
This also needs to be adapted for the develop branch; let's make both compatible in the next PR.
OK, will do.
Is the speedup more pronounced at larger batch sizes?
In theory, yes. Single-node EP currently defaults to DeepEP's prefill mode; with mixed EP we can also use DeepEP's low-latency mode for decode, so the speedup is likely more noticeable as batch size grows.
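The routing this reply describes, and that the diff below implements, is per-phase engine selection: prefill batches use the normal DeepEP buffer, decode batches use the low-latency one. A minimal sketch of that dispatch logic; `MoEPhase` mirrors the enum seen in the diff, while `select_deepep_engine` is a hypothetical helper, not FastDeploy's actual API:

```python
from enum import Enum


class MoEPhase(Enum):
    """MoE execution phase, as referenced in the PR diff."""
    PREFILL = "prefill"
    DECODE = "decode"


def select_deepep_engine(phase, prefill_engine, decode_engine):
    """Route a batch to the engine matching its phase: the low-latency
    engine only serves pure-decode batches; everything else (prefill or
    mixed) goes through the prefill engine."""
    if phase is MoEPhase.DECODE:
        return decode_engine
    return prefill_engine


# Decode batches get the low-latency buffer; prefill batches do not.
assert select_deepep_engine(MoEPhase.DECODE, "prefill_buf", "ll_buf") == "ll_buf"
assert select_deepep_engine(MoEPhase.PREFILL, "prefill_buf", "ll_buf") == "prefill_buf"
```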
            num_max_dispatch_tokens_per_rank,
            None,
            num_experts,
        )
This also needs to be adapted for the develop branch; let's make both compatible in the next PR.
-       elif moe_phase == MoEPhase.PREFILL:
-           self.deepep_engine = deep_ep.Buffer(
+       # prefill engine
+       self.prefill_deepep_engine = deep_ep.Buffer(
            self.group,
            int(5e8),
            0,
            low_latency_mode=False,
In mixed mode, shouldn't this `low_latency_mode` be set to `True`?
Support mixed EP.
TPOT, OTPS, and decode speed all improve to some extent across different batch sizes.