[doc] best practice for eb45 text models #3002


Merged
merged 2 commits into PaddlePaddle:develop from best_practice on Jul 31, 2025

Conversation

zoooo0820
Collaborator

Best practice for eb45 text models


paddle-bot bot commented Jul 24, 2025

Thanks for your contribution!


- Model download. **Note that deploying with FastDeploy requires models with the Paddle suffix**:
  - Specifying the model name directly at launch (e.g. `baidu/ERNIE-4.5-0.3B-Paddle`) downloads it automatically. The default download path is `~/` (the user's home directory); it can be changed by setting the `FD_MODEL_CACHE` environment variable.
  - If network conditions or other factors interfere, the model can also be downloaded from [huggingface](https://huggingface.co/) or [modelscope](https://www.modelscope.cn/home) and the local path passed at launch.
Collaborator

If this is the same as in the installation docs, just link to the corresponding Chinese and English documents.

Collaborator Author

Done, updated.

- Specifying the model name directly at launch (e.g. `baidu/ERNIE-4.5-0.3B-Paddle`) downloads it automatically. The default download path is `~/` (the user's home directory); it can be changed by setting the `FD_MODEL_CACHE` environment variable.
- If network conditions or other factors interfere, the model can also be downloaded from [huggingface](https://huggingface.co/) or [modelscope](https://www.modelscope.cn/home) and the local path passed at launch.
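
As a quick illustration of the auto-download behavior described above, a minimal sketch; the server entrypoint and port are assumptions for illustration and are not taken from this PR:

```
# Sketch only: the module path and port are assumptions about FastDeploy's
# OpenAI-compatible server, not confirmed by this PR.
# Redirect the default model cache away from ~/ :
export FD_MODEL_CACHE=/data/fd_models
# Passing a model name triggers an automatic download on first launch:
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-0.3B-Paddle \
    --port 8180
```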

## 2. Start the Service
Collaborator Author

Done, updated.

Where:
- `--quantization`: the quantization strategy used by the model. Different strategies give different performance and accuracy.
- `--max-model-len`: the maximum number of tokens the deployed service supports. Larger values allow a longer context but consume more GPU memory, which may reduce concurrency.
- `--kv-cache-ratio`: splits the KVCache blocks between the prefill and decode phases at the ratio kv_cache_ratio. An ill-chosen value leaves one phase short of KVCache blocks and hurts performance. It can be left unset if the service-managed global block feature is enabled.
Collaborator

Let's keep kv-cache-ratio consistent with the latest usage; this parameter no longer needs to be set.

Collaborator Author

`--kv-cache-ratio` has been removed; users are now advised to enable the global-management FLAG instead.
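
For readers following along, a hedged sketch of what enabling the global-management FLAG might look like; the variable name `ENABLE_V1_KVCACHE_SCHEDULER` is an assumption and should be checked against the FastDeploy environment-variable docs:

```
# Assumed name of the service-managed global block FLAG; verify against the
# FastDeploy docs before relying on it.
export ENABLE_V1_KVCACHE_SCHEDULER=1
# With the FLAG enabled, --kv-cache-ratio is simply omitted at launch.
```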

```
--max-num-seqs 128
```
Where:
- `--quantization`: the quantization strategy used by the model. Different strategies give different performance and accuracy.
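
Stitching the visible fragments together, a hedged sketch of a complete launch command; only `--quantization wint4` and `--max-num-seqs 128` appear in this PR's hunks, while the entrypoint, port, and `--max-model-len` value are assumptions:

```
# Sketch: entrypoint, port, and the max-model-len value are assumptions;
# the remaining flags are the ones discussed in this PR.
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-0.3B-Paddle \
    --port 8180 \
    --quantization wint4 \
    --max-model-len 32768 \
    --max-num-seqs 128
```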
Collaborator

It would be good to list which quantization types are supported, including that FP8 can also run on the Hopper architecture.

Collaborator Author

Done

Add the following environment variable before launching:
```
export FD_SAMPLING_CLASS=rejection
```
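
Since environment variables must be visible to the server process, the export presumably goes in the same shell right before the launch command; a minimal sketch (the launch line is the same assumption as above):

```
# Export first so the server process and its workers inherit the setting.
export FD_SAMPLING_CLASS=rejection
# Then start the service as usual (entrypoint path is an assumption):
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-0.3B-Paddle
```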
Collaborator

The Lite model also supports PD disaggregation; this can be added in a follow-up.

Collaborator Author

The PD-disaggregation section has been added for A3B.

```
--quantization wint4 \
--innode-prefill-ports 8182 \
--splitwise-role "decode"
```
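
The hunk above shows only the decode-side instance; `--innode-prefill-ports 8182` implies a matching prefill instance serving on port 8182. A hedged sketch of that counterpart; the entrypoint and model name are assumptions (the thread mentions A3B):

```
# Assumed prefill-side counterpart of the decode command above; the flags
# mirror the decode instance, the model name and entrypoint are assumptions.
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-21B-A3B-Paddle \
    --port 8182 \
    --quantization wint4 \
    --splitwise-role "prefill"
```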
Collaborator

Please also add PD+EP disaggregated deployment in a follow-up.

Collaborator Author

OK, will add it once EP is verified.

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 1ef38b1 into PaddlePaddle:develop Jul 31, 2025
13 of 19 checks passed
@zoooo0820 zoooo0820 deleted the best_practice branch July 31, 2025 09:23