Commit 7e7ff98

update quantization doc
1 parent 92428a5 commit 7e7ff98

File tree

4 files changed: +6 additions, −6 deletions


docs/quantization/online_quantization.md

Lines changed: 2 additions & 2 deletions

@@ -22,7 +22,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
   --max-num-seqs 32
 ```
 
-- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model can be automatically downloaded from AIStudio. FastDeploy depends on Paddle format models. For more information, please refer to [Supported Model List](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/paddle_internal/FastDeploy/blob/feature%2Finference-refactor-20250528/docs/supported_models.md).
+- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model can be automatically downloaded from AIStudio. FastDeploy depends on Paddle format models. For more information, please refer to [Supported Model List](../supported_models.md).
 - By setting `--quantization` to `wint8` or `wint4`, online INT8/INT4 quantization can be selected.
 - Deploying ERNIE-4.5-300B-A47B-Paddle WINT8 requires at least 80GB * 8 cards, while WINT4 requires 80GB * 4 cards.
 - For more deployment tutorials, please refer to [get_started](../get_started/ernie-4.5.md).
@@ -48,7 +48,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
   --max-num-seqs 32
 ```
 
-- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model can be automatically downloaded from AIStudio. FastDeploy depends on Paddle format models. For more information, please refer to [Supported Model List](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/paddle_internal/FastDeploy/blob/feature%2Finference-refactor-20250528/docs/supported_models.md).
+- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model can be automatically downloaded from AIStudio. FastDeploy depends on Paddle format models. For more information, please refer to [Supported Model List](../supported_models.md).
 - By setting `--quantization` to `block_wise_fp8`, online Block-wise FP8 quantization can be selected.
 - Deploying ERNIE-4.5-300B-A47B-Paddle Block-wise FP8 requires at least 80GB * 8 cards.
 - For more deployment tutorials, please refer to [get_started](../get_started/ernie-4.5.md).

docs/quantization/wint2.md

Lines changed: 1 addition & 1 deletion

@@ -46,7 +46,7 @@ Example of quantization configuration in the model's config.json file:
 ```
 
 - For more deployment tutorials, please refer to [get_started](../get_started/ernie-4.5.md);
-- For more model descriptions, please refer to [Supported Model List](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/paddle_internal/FastDeploy/blob/feature%2Finference-refactor-20250528/docs/supported_models.md).
+- For more model descriptions, please refer to [Supported Model List](../supported_models.md).
 
 ## WINT2 Performance
 

docs/zh/quantization/online_quantization.md

Lines changed: 2 additions & 2 deletions

@@ -22,7 +22,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
   --max-num-seqs 32
 ```
 
-- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model is downloaded automatically from AIStudio. FastDeploy depends on Paddle-format models; for more details, see [Supported Model List](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/paddle_internal/FastDeploy/blob/feature%2Finference-refactor-20250528/docs/supported_models.md)
+- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model is downloaded automatically from AIStudio. FastDeploy depends on Paddle-format models; for more details, see [Supported Model List](../supported_models.md)
 - By setting `--quantization` to `wint8` or `wint4`, online INT8/INT4 quantization is selected.
 - Deploying ERNIE-4.5-300B-A47B-Paddle WINT8 requires at least 80GB * 8 cards; WINT4 requires 80GB * 4 cards.
 - For more deployment tutorials, see [get_started](../get_started/ernie-4.5.md).
@@ -48,7 +48,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
   --max-num-seqs 32
 ```
 
-- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model is downloaded automatically from AIStudio. FastDeploy depends on Paddle-format models; for more details, see [Supported Model List](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/paddle_internal/FastDeploy/blob/feature%2Finference-refactor-20250528/docs/supported_models.md)
+- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model is downloaded automatically from AIStudio. FastDeploy depends on Paddle-format models; for more details, see [Supported Model List](../supported_models.md)
 - By setting `--quantization` to `block_wise_fp8`, online Block-wise FP8 quantization is selected.
 - Deploying ERNIE-4.5-300B-A47B-Paddle Block-wise FP8 requires at least 80GB * 8 cards.
 - For more deployment tutorials, see [get_started](../get_started/ernie-4.5.md).

docs/zh/quantization/wint2.md

Lines changed: 1 addition & 1 deletion

@@ -46,7 +46,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 ```
 
 - For more deployment tutorials, see [get_started](../get_started/ernie-4.5.md)
-- For more model descriptions, see [Supported Model List](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/paddle_internal/FastDeploy/blob/feature%2Finference-refactor-20250528/docs/supported_models.md)
+- For more model descriptions, see [Supported Model List](../supported_models.md)
 
 
 ## WINT2 Performance
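The substantive change across all four files is replacing an internal absolute URL with the relative path `../supported_models.md`. A minimal, hedged sketch of how such broken doc links could be caught automatically — `broken_relative_links` is a hypothetical helper for illustration, not part of FastDeploy:

```python
import re
from pathlib import Path

# Matches Markdown links like [text](target), capturing the target
# up to a ')' or '#' fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")

def broken_relative_links(md_text: str, md_path: Path) -> list:
    """Return relative link targets in md_text that do not exist on disk,
    resolved against the Markdown file's own directory."""
    broken = []
    for target in LINK_RE.findall(md_text):
        if "://" in target:  # absolute URLs are out of scope here
            continue
        if not (md_path.parent / target).exists():
            broken.append(target)
    return broken
```

Run over a docs tree, this flags any `../supported_models.md`-style reference whose target is missing, which is the kind of check that would have caught the stale links this commit removes.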
