
bug on awq v2 #432

@ChenBinfighting1

Description


1. Current situation: AWQ quantization of DeepSeek-R1 fails with an error when `clip_version: v2` is set.
2. The awq_w_only.yml parameters are as follows:
```yaml
base:
    seed: &seed 42
model:
    type: DeepseekV3
    path: /mnt/DeepSeek-R1
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /home/llmc/data/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /home/llmc/data/wikitext2
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1", and inference_per_block can be set to "False".
    # For 70B model eval, bs can be set to "20", and inference_per_block can be set to "True".
    bs: 20
    inference_per_block: True
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_group
        group_size: 128
        calib_algo: learnable
    special:
        trans: True
        # The options for "trans_version" include "v1" and "v2".
        # But their results don't differ significantly.
        trans_version: v2
        weight_clip: True
        clip_version: v2
        # For 2-bit quantization, setting "clip_sym: False" will yield better results.
        clip_sym: True
        save_scale: True
        scale_path: /home/llmc/scale_data
        save_clip: True
        clip_path: /home/llmc/clip_data
save:
    save_trans: False
    save_fake: False
    save_path: /home/llmc/deepseek_quat
```
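
For context on the weight settings above (`bit: 4`, `symmetric: True`, `granularity: per_group`, `group_size: 128`): per-group quantization computes one scale per group of each weight row, so the scale tensor is much smaller than the weight itself and must be broadcast against the grouped weight. A minimal sketch of such a scheme, purely illustrative and not the llmc implementation:

```python
import torch

def fake_quant_per_group(w: torch.Tensor, group_size: int = 128, bit: int = 4):
    # Purely illustrative per-group symmetric fake quantization (not llmc code).
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // group_size, group_size)
    qmax = 2 ** (bit - 1) - 1          # 7 for symmetric 4-bit
    qmin = -(2 ** (bit - 1))           # -8
    # One scale per group; keepdim=True keeps a trailing singleton dim so that
    # `scales` broadcasts against the grouped weight.
    scales = wg.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(wg / scales), qmin, qmax)
    return (q * scales).reshape(out_features, in_features)
```

If the trailing singleton dimension is dropped, i.e. the scales arrive with shape `(out_features, n_groups)` instead of `(out_features, n_groups, 1)`, the division no longer broadcasts, which is exactly the failure mode in the traceback below.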

3. Error traceback:
```
[rank0]: File "/home/llmc/llmc/compression/quantization/base_blockwise_quantization.py", line 453, in run
[rank0]: self.block_transform(block, input_feat, self.input['kwargs'])
[rank0]: File "/home/wangke/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/llmc/llmc/compression/quantization/awq.py", line 294, in block_transform
[rank0]: self.auto_clipper.run(
[rank0]: File "/home/wangke/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 68, in run
[rank0]: max_val, min_val = self.auto_clip_layer(
[rank0]: File "/home/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 161, in auto_clip_layer
[rank0]: q_w = self.fake_quantize_weight(
[rank0]: File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 271, in fake_quantize_weight
[rank0]: q_w = self.wquantizer.fake_quant_weight_static(w, args)
[rank0]: File "/home/llmc/llmc/compression/quantization/quant.py", line 814, in fake_quant_weight_static
[rank0]: q_weight = self.quant_dequant(
[rank0]: File "/home/llmc/llmc/compression/quantization/quant.py", line 715, in quant_dequant
[rank0]: tensor = self.quant(tensor, scales, zeros, qmax, qmin)
[rank0]: File "/home/llmc/llmc/compression/quantization/quant.py", line 701, in quant
[rank0]: tensor = torch.clamp(self.round_func(tensor / scales) + zeros, qmin, qmax)
[rank0]: RuntimeError: The size of tensor a (3584) must match the size of tensor b (56) at non-singleton dimension 2
```
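
The RuntimeError is a plain broadcasting mismatch: at the point of failure the weight view still has its full last dimension (3584) while the scales only carry the per-group dimension (56) without a trailing singleton axis. A tiny repro, where the leading dimension is arbitrary and only the 3584-vs-56 shapes mirror the error message:

```python
import torch

# Leading dimension is made up; only the 3584-vs-56 shapes mirror the error.
grouped_w = torch.randn(2, 56, 3584)    # weight view with the group axis still expanded
scales = torch.randn(2, 56).abs() + 1   # per-group scales missing a trailing singleton dim

try:
    torch.clamp(torch.round(grouped_w / scales), -8, 7)
except RuntimeError as e:
    print(e)  # size of tensor a (3584) must match the size of tensor b (56) at non-singleton dimension 2
```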

4. Proposed fix
Debugging shows this is a shape mismatch between `tensor` and `scales`. Adding a `scales.reshape(-1, 1)` in the `fake_quant_weight_static` method of quant.py resolves it. May I submit a PR?
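
For illustration, a sketch of what the proposed change amounts to. This is not the actual `fake_quant_weight_static` code from quant.py; argument handling and the quantizer class are omitted, and it assumes the weight is flattened to `(n_groups_total, group_size)` before quant/dequant:

```python
import torch

def quant_dequant_per_group(w, scales, zeros, qmin, qmax, group_size=128):
    # Sketch of the proposed fix, not the llmc implementation.
    w_grouped = w.reshape(-1, group_size)   # (n_groups_total, group_size)
    scales = scales.reshape(-1, 1)          # proposed fix: one scale column per group
    q = torch.clamp(torch.round(w_grouped / scales) + zeros, qmin, qmax)
    return ((q - zeros) * scales).reshape(w.shape)
```

Whether the reshape belongs in `fake_quant_weight_static` itself or at the point where the clip-stage scales are built in auto_clip.py is something the maintainers would need to weigh when reviewing such a PR.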
