FastDeploy 2.0: Large Language Model Deployment #2633

Jiang-Jia-Jun · 2025-06-29T23:53:10Z

News

🔥 Released FastDeploy v2.0: Supports inference and deployment for ERNIE 4.5. Furthermore, we open-source an industrial-grade PD disaggregation with context caching, dynamic role switching for effective resource utilization to further enhance inference performance for MoE models.

About

FastDeploy is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers production-ready, out-of-the-box deployment solutions with core acceleration technologies:

🚀 Load-Balanced PD Disaggregation: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput.
🔄 Unified KV Cache Transmission: Lightweight high-performance transport library with intelligent NVLink/RDMA selection.
🤝 OpenAI API Server and vLLM Compatible: One-command deployment with https://github.com/vllm-project/vllm/ interface compatibility.
🧮 Comprehensive Quantization Format Support: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.
⏩ Advanced Acceleration Techniques: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill.
🖥️ Multi-Hardware Support: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Ascend NPU, Iluvatar GPU, Enflame GCU, MetaX GPU etc.

Get Started

Learn how to use FastDeploy through our documentation:

Supported Models

Model	Data Type	PD Disaggregation	Chunked Prefill	Prefix Caching	MTP	CUDA Graph	Maximum Context Length
ERNIE-4.5-300B-A47B	BF16/WINT4/WINT8/W4A8C8/WINT2/FP8	✅（WINT4/W4A8C8/Expert Parallelism)	✅	✅	✅(WINT4)	WIP	128K
ERNIE-4.5-300B-A47B-Base	BF16/WINT4/WINT8	✅（WINT4/Expert Parallelism)	✅	✅	✅(WINT4)	❌	128K
ERNIE-4.5-VL-424B-A47B	BF16/WINT4/WINT8	WIP	✅	WIP	❌	WIP	128K
ERNIE-4.5-VL-28B-A3B	BF16/WINT4/WINT8	❌	✅	WIP	❌	WIP	128K
ERNIE-4.5-21B-A3B	BF16/WINT4/WINT8/FP8	❌	✅	✅	WIP	✅	128K
ERNIE-4.5-21B-A3B-Base	BF16/WINT4/WINT8/FP8	❌	✅	✅	WIP	✅	128K
ERNIE-4.5-0.3B	BF16/WINT8/FP8	❌	✅	✅	❌	✅	128K

paddle-bot · 2025-06-29T23:53:14Z

Thanks for your contribution!

CLAassistant · 2025-06-29T23:53:16Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

ZeyuChen

LGTM

jiangjiajun added 3 commits June 29, 2025 19:11

Add requirement

d151496

Sync v2.0 version of code to github repo

92c2cfa

Update mkdocs navigation

aba655c

ZeyuChen approved these changes Jun 29, 2025

View reviewed changes

Jiang-Jia-Jun merged commit 53ddc68 into PaddlePaddle:develop Jun 29, 2025
0 of 2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

FastDeploy 2.0: Large Language Model Deployment #2633

FastDeploy 2.0: Large Language Model Deployment #2633

Jiang-Jia-Jun commented Jun 29, 2025

Uh oh!

paddle-bot bot commented Jun 29, 2025

Uh oh!

CLAassistant commented Jun 29, 2025

Uh oh!

ZeyuChen left a comment

Uh oh!

Uh oh!

Uh oh!

FastDeploy 2.0: Large Language Model Deployment #2633

FastDeploy 2.0: Large Language Model Deployment #2633

Conversation

Jiang-Jia-Jun commented Jun 29, 2025

News

About

Get Started

Supported Models

Uh oh!

paddle-bot bot commented Jun 29, 2025

Uh oh!

CLAassistant commented Jun 29, 2025

Uh oh!

ZeyuChen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!