[feat]support inflight quant #3277

bukejiyu · 2025-08-08T11:34:25Z

Adds an inflight (dynamic) quantization loader. To use it, enable load_choices="inflight_quant" or pass --inflight_quant "inflight_quant".
For MoE models, load time improves by roughly 30%.
Supported models: [qwen3/qwen3moe]
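The PR does not describe the loader internals. As a rough illustration only, assuming the inflight path quantizes each weight tensor right after it is read (instead of first materializing the whole full-precision model), the idea can be sketched in plain Python. `quantize_wint8` and `inflight_load` are hypothetical names, not FastDeploy APIs, and the sketch covers symmetric per-channel int8 only (no wint4).

```python
from typing import Dict, Iterator, List, Tuple

Matrix = List[List[float]]
QuantTensor = Tuple[List[List[int]], List[float]]


def quantize_wint8(weight: Matrix) -> QuantTensor:
    """Hypothetical helper: symmetric per-output-channel int8 quantization.

    `weight` is laid out as [out_features][in_features]. Each row gets its
    own scale = max|w| / 127, so quantized values fall in [-127, 127].
    """
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        absmax = max(abs(v) for v in row)
        # An all-zero row would give scale 0; use 1.0 to avoid dividing by zero.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def inflight_load(tensors: Iterator[Tuple[str, Matrix]]) -> Dict[str, QuantTensor]:
    """Hypothetical loader: quantize each tensor as soon as it is read.

    Only the quantized result is stored; this loop holds a single
    full-precision tensor at a time, which is the assumed source of the
    lower peak memory compared with loading everything first.
    """
    state: Dict[str, QuantTensor] = {}
    for name, weight in tensors:
        state[name] = quantize_wint8(weight)
        # `weight` is rebound on the next iteration, so the full-precision
        # copy can be freed once nothing else references it.
    return state
```

Any iterator of `(name, tensor)` pairs works as input, for example a generator that reads a checkpoint shard by shard.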

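| Model | Model type | Quantization | TP size | Loader | Peak memory | Load time | Accuracy |
|---|---|---|---|---|---|---|---|
| qwen3 32B | qwen3 | wint8 | 4 | default | 4×22.7G | 19.78s | Outputs aligned token by token |
| | | | | inflight_quant | 4×7.4G | 18.96s | |
| Qwen3-235B-A22B | qwen3moe | wint4 | 4 | default | 116×4G | 368.767s | Outputs aligned token by token |
| | | | | inflight_quant | 4×9.2G | 261.65s | |
| Qwen3-30B-A3B | qwen3moe | wint4 | 4 | default | | 75.764s | Outputs aligned token by token |
| | | | | inflight_quant | | 53.599s | |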
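To sanity-check the roughly 30% figure, the load-time reduction can be computed directly from the table values (numbers copied verbatim; `load_time_reduction` is just an illustrative helper):

```python
# Load times in seconds, copied from the table: (default, inflight_quant)
LOAD_TIMES = {
    "qwen3 32B (wint8)": (19.78, 18.96),
    "Qwen3-235B-A22B (wint4)": (368.767, 261.65),
    "Qwen3-30B-A3B (wint4)": (75.764, 53.599),
}


def load_time_reduction(default_s: float, inflight_s: float) -> float:
    """Fraction of load time saved relative to the default loader."""
    return (default_s - inflight_s) / default_s


for model, (default_s, inflight_s) in LOAD_TIMES.items():
    print(f"{model}: load time reduced by {load_time_reduction(default_s, inflight_s):.1%}")

# qwen3 32B (wint8): load time reduced by 4.1%
# Qwen3-235B-A22B (wint4): load time reduced by 29.0%
# Qwen3-30B-A3B (wint4): load time reduced by 29.3%
```

Both MoE models land at about 29%, while the dense qwen3 32B model gains only about 4% in load time; its main benefit is the lower peak memory.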

paddle-bot · 2025-08-08T11:34:31Z

Thanks for your contribution!

bukejiyu requested review from YuanRisheng, jiangjiajun, yuanlehome and qingqing01 and removed request for YuanRisheng, jiangjiajun and yuanlehome August 11, 2025 03:10

support qwen3 · 270e373

bukejiyu force-pushed the inflight_quant_loader branch from 2508956 to 270e373 August 11, 2025 06:19

bukejiyu added 6 commits August 11, 2025 06:37

update · 19e0d4b
update · 318c662
update · 6ea05c8
support qwen3_moe · 098091b
update · 69a64e1
fix wint4 · bdcf6f3

bukejiyu closed this Aug 20, 2025
