[doc] best practice for eb45 text models #3002
Conversation
Thanks for your contribution!
- Model download. **Note that deploying with FastDeploy requires a model with the Paddle suffix**:
  - Specify the model name directly at launch (e.g. `baidu/ERNIE-4.5-0.3B-Paddle`) and it will be downloaded automatically. The default download path is `~/` (the user's home directory); it can be changed by setting the environment variable `FD_MODEL_CACHE`.
  - If the network or other factors make automatic download impractical, the model can also be downloaded from [huggingface](https://huggingface.co/) or [modelscope](https://www.modelscope.cn/home) and the local model path specified at launch.
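For illustration only, a minimal shell sketch of the two options described above; the cache directory and local path are placeholders, and `huggingface-cli` is just one possible way to fetch the model manually:
```
# Option 1: let the service download the model automatically, but redirect
# the default cache directory via the env var documented above (path is a placeholder).
export FD_MODEL_CACHE=/data/fd_models

# Option 2: download the model manually and point the service at the local
# directory when launching (path is a placeholder).
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle \
  --local-dir /data/fd_models/ERNIE-4.5-0.3B-Paddle
```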
If this is the same as the installation docs, just link to the corresponding Chinese and English documents.
Done, updated.
## 2. Starting the service
Done, updated.
Where:
- `--quantization`: the quantization strategy used by the model. Different quantization strategies yield different performance and accuracy.
- `--max-model-len`: the maximum number of tokens supported by the deployed service. The larger it is, the longer the context the model can handle, but the more GPU memory it occupies, which may reduce concurrency.
- `--kv-cache-ratio`: the ratio by which KVCache blocks are split between the Prefill stage and the Decode stage. An unreasonable value leaves one stage short of KVCache blocks and hurts performance. If the service-managed global block feature is enabled, this parameter can be left unset.
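As a hedged sketch only, a launch command combining the flags explained above. The entrypoint module (`fastdeploy.entrypoints.openai.api_server`), the port number, and the concrete values are assumptions for illustration, not taken from this PR:
```
# Hedged example launch; the entrypoint module and port are assumptions,
# the flags are the ones documented above.
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-0.3B-Paddle \
  --port 8180 \
  --quantization wint4 \
  --max-model-len 32768 \
  --max-num-seqs 128
```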
Keep kv-cache-ratio consistent with the latest usage; this parameter no longer needs to be set.
`--kv-cache-ratio` has been removed, and the doc now advises users to enable the global-management FLAG.
--max-num-seqs 128
```
Where:
- `--quantization`: the quantization strategy used by the model. Different quantization strategies yield different performance and accuracy.
It would be good to list which quantization types are supported, including that FP8 can also run on the Hopper architecture.
Done
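Following the reviewer's suggestion, a hedged example of selecting an FP8 strategy on a Hopper GPU. The exact value accepted by `--quantization` for FP8 (written here as `block_wise_fp8`) and the entrypoint module are assumptions and should be checked against the final doc:
```
# Assumed FP8 example for Hopper GPUs; the quantization value name and the
# entrypoint module are assumptions.
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-0.3B-Paddle \
  --quantization block_wise_fp8 \
  --max-model-len 32768 \
  --max-num-seqs 128
```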
Add the following environment variable before launching:
```
export FD_SAMPLING_CLASS=rejection
```
The Lite model also supports PD (prefill/decode) disaggregation; this can be added later.
The PD-disaggregation part for A3B has been added.
--quantization wint4 \
--innode-prefill-ports 8182 \
--splitwise-role "decode"
```
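The quoted fragment shows the decode-side launch; below is a hedged sketch of what the matching prefill-side instance might look like, assuming the prefill instance exposes its worker queue on port 8182 (the port the decode side references via `--innode-prefill-ports`). The entrypoint module and the `--engine-worker-queue-port` flag name are assumptions:
```
# Hedged prefill-side counterpart to the decode launch quoted above;
# flag names other than --splitwise-role and --quantization are assumptions.
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-0.3B-Paddle \
  --quantization wint4 \
  --engine-worker-queue-port 8182 \
  --splitwise-role "prefill"
```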
Also add a PD+EP disaggregated deployment section later.
OK, will add it after EP is verified.