
Commit fffeec5

AniZpZ, huangtingwei9988, and laixinn authored and committed

[2/3] fix dsv3 awq issue (sgl-project#4625)

Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
1 parent 49731f1 commit fffeec5

File tree

8 files changed: +1139 −42 lines changed

benchmark/deepseek_v3/README.md (3 additions, 2 deletions)

````diff
@@ -178,10 +178,11 @@ python3 -m sglang.bench_one_batch_server --model None --base-url http://10.0.0.1

 ### Example: Serving with 8 A100/A800 with AWQ Quantization

-AWQ does not support BF16, so add the `--dtype half` flag if AWQ is used for quantization. One example is as follows:
+Add `--quantization moe_wna16` flag to enable moe wna16 kernel for better performance.
+One example is as follows:

 ```bash
-python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --dtype half
+python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --quantization moe_wna16
 ```
````
python/sglang/srt/configs/model_config.py (1 addition, 0 deletions)

```diff
@@ -258,6 +258,7 @@ def _verify_quantization(self) -> None:
             "experts_int8",
             "w8a8_int8",
             "w8a8_fp8",
+            "moe_wna16",
         ]
         compatible_quantization_methods = {
             "w8a8_int8": ["compressed-tensors", "compressed_tensors"],
```
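The one-line change above simply adds `"moe_wna16"` to the allow-list that `_verify_quantization` checks server arguments against. A minimal standalone sketch of that validation pattern is below; the function name and error message are simplified stand-ins, not SGLang's actual implementation:

```python
# Hypothetical sketch of an allow-list quantization check, modeled on the
# list shown in the model_config.py diff. Not SGLang's real code.
SUPPORTED_QUANTIZATION = [
    "experts_int8",
    "w8a8_int8",
    "w8a8_fp8",
    "moe_wna16",  # accepted after this commit
]

def verify_quantization(method: str) -> str:
    """Return the method unchanged if supported, else raise ValueError."""
    if method not in SUPPORTED_QUANTIZATION:
        raise ValueError(
            f"Unknown quantization method: {method!r}. "
            f"Supported methods: {SUPPORTED_QUANTIZATION}"
        )
    return method
```

With this entry present, a launch command passing `--quantization moe_wna16` passes validation instead of being rejected at startup.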
