@@ -108,13 +108,13 @@ python -m fastdeploy.entrypoints.openai.api_server \
```
## 🧠 Using Ngram Decoding
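- This algorithm uses an n-gram window to match against the prompt and the already generated tokens in order to produce draft tokens. It suits scenarios where the input and output overlap heavily, such as code editing and document queries. See the paper link.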
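+ This algorithm uses an n-gram window to match against the prompt and the already generated tokens in order to produce draft tokens. It suits scenarios where the input and output overlap heavily, such as code completion and document queries.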
> Uses 4×H100; quantization method: WINT4
> Configuration file: benchmarks/yaml/eb45t-32k-wint4-mtp-h100-tp4.yaml
```
python -m fastdeploy.entrypoints.openai.api_server \
--model ${path_to_main_model} \
--tensor-parallel-size 4 \
--config ${path_to_FastDeploy}benchmarks/yaml/eb45t-32k-wint4-mtp-h100-tp4.yaml \
- --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'
+ --speculative-config '{"method": "ngram", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'
```
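
To illustrate the matching step described above, here is a minimal sketch of n-gram draft proposal in plain Python. It is not FastDeploy's implementation: the function name `propose_ngram_draft` and the parameters `max_ngram` and `num_draft_tokens` are hypothetical, and the matching policy (longest suffix first, most recent earlier occurrence wins) is an assumption chosen for clarity.

```python
from typing import List, Sequence


def propose_ngram_draft(
    token_ids: Sequence[int],
    max_ngram: int = 3,          # hypothetical knob: largest suffix length to try
    num_draft_tokens: int = 1,   # hypothetical knob: how many tokens to draft
) -> List[int]:
    """Propose draft tokens by matching the current suffix against earlier context.

    token_ids is the prompt followed by the tokens generated so far.
    Returns up to num_draft_tokens tokens, or [] when no n-gram matches.
    """
    n_tokens = len(token_ids)
    # Try the longest suffix first: a longer match is a stronger signal.
    for n in range(min(max_ngram, n_tokens - 1), 0, -1):
        suffix = list(token_ids[n_tokens - n:])
        # Scan right to left so the most recent earlier occurrence wins.
        # The upper bound excludes the suffix itself and guarantees that
        # at least one token follows the matched window.
        for start in range(n_tokens - n - 1, -1, -1):
            if list(token_ids[start:start + n]) == suffix:
                follow = start + n
                return list(token_ids[follow:follow + num_draft_tokens])
    return []


if __name__ == "__main__":
    # The context ends with 1, 2, 3, which also appears at the start,
    # so the tokens that followed it there are proposed as the draft.
    context = [1, 2, 3, 4, 5, 1, 2, 3]
    print(propose_ngram_draft(context, max_ngram=3, num_draft_tokens=2))
```

In a speculative decoding loop, the main model would then verify the proposed tokens and keep only the accepted prefix. The higher the overlap between input and output, the more often a match is found, which is why the approach fits editing and lookup workloads.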