Description
1. Current status: AWQ quantization of DeepSeek-R1 fails with an error when `clip_version: v2` is used.
2. The `awq_w_only.yml` parameters are as follows:
```yaml
base:
    seed: &seed 42
model:
    type: DeepseekV3
    path: /mnt/DeepSeek-R1
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /home/llmc/data/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /home/llmc/data/wikitext2
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1", and inference_per_block can be set to "False".
    # For 70B model eval, bs can be set to "20", and inference_per_block can be set to "True".
    bs: 20
    inference_per_block: True
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_group
        group_size: 128
        calib_algo: learnable
    special:
        trans: True
        # The options for "trans_version" include "v1" and "v2".
        # But their results don't differ significantly.
        trans_version: v2
        weight_clip: True
        clip_version: v2
        # For 2-bit quantization, setting "clip_sym: False" will yield better results.
        clip_sym: True
        save_scale: True
        scale_path: /home/llmc/scale_data
        save_clip: True
        clip_path: /home/llmc/clip_data
save:
    save_trans: False
    save_fake: False
    save_path: /home/llmc/deepseek_quat
```
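For context, `granularity: per_group` with `group_size: 128` means each run of 128 weights along the input dimension shares one scale, so the scale tensor holds one entry per group rather than per element. A minimal sketch of that layout (a generic per-group scheme with illustrative names and shapes, not LLMC's exact code):

```python
import torch

out_features, in_features, group_size = 8, 256, 128
w = torch.randn(out_features, in_features)

# Split each row into groups of group_size and compute one symmetric
# int4 scale per group: resulting shape is (out_features, n_groups).
w_grouped = w.reshape(out_features, -1, group_size)
scales = w_grouped.abs().amax(dim=-1) / 7  # qmax = 7 for symmetric 4-bit
print(scales.shape)  # torch.Size([8, 2])
```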
3. Error traceback:
```
[rank0]: File "/home/llmc/llmc/compression/quantization/base_blockwise_quantization.py", line 453, in run
[rank0]: self.block_transform(block, input_feat, self.input['kwargs'])
[rank0]: File "/home/wangke/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/llmc/llmc/compression/quantization/awq.py", line 294, in block_transform
[rank0]: self.auto_clipper.run(
[rank0]: File "/home/wangke/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 68, in run
[rank0]: max_val, min_val = self.auto_clip_layer(
[rank0]: File "/home/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 161, in auto_clip_layer
[rank0]: q_w = self.fake_quantize_weight(
[rank0]: File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 271, in fake_quantize_weight
[rank0]: q_w = self.wquantizer.fake_quant_weight_static(w, args)
[rank0]: File "/home/llmc/llmc/compression/quantization/quant.py", line 814, in fake_quant_weight_static
[rank0]: q_weight = self.quant_dequant(
[rank0]: File "/home/llmc/llmc/compression/quantization/quant.py", line 715, in quant_dequant
[rank0]: tensor = self.quant(tensor, scales, zeros, qmax, qmin)
[rank0]: File "/home/llmc/llmc/compression/quantization/quant.py", line 701, in quant
[rank0]: tensor = torch.clamp(self.round_func(tensor / scales) + zeros, qmin, qmax)
[rank0]: RuntimeError: The size of tensor a (3584) must match the size of tensor b (56) at non-singleton dimension 2
```
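The final clamp line fails because PyTorch broadcasting requires trailing dimensions to match (or be 1), and here `tensor` has 3584 elements at dim 2 while `scales` has only 56 (one per group). A hypothetical repro with shapes copied from the error message, independent of LLMC internals:

```python
import torch

tensor = torch.randn(2, 4, 3584)      # dim 2 = 3584, as in the error
scales = torch.rand(2, 4, 56) + 1e-6  # dim 2 = 56: one scale per group

try:
    # Mirrors quant.py line 701: round, shift by zeros, then clamp.
    torch.clamp(torch.round(tensor / scales) + 0, -8, 7)
except RuntimeError as e:
    print(e)  # The size of tensor a (3584) must match the size of tensor b (56) ...
```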
4. Proposed fix:
Debugging shows this is a shape mismatch between `tensor` and `scales`. Adding `scales.reshape(-1, 1)` in the `fake_quant_weight_static` method of `quant.py` resolves it, as sketched below. Would a PR for this be welcome?
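A minimal sketch of why the reshape restores broadcasting. The shapes here are illustrative (56 groups taken from the error message), not LLMC's actual internals: reshaping the flat per-group scales to `(-1, 1)` lets each scale broadcast across all elements of its group.

```python
import torch

group_size = 64                       # illustrative group width
w = torch.randn(56, group_size)       # one row per group
scales = torch.rand(56) + 1e-6        # flat per-group scales, shape (56,)

# reshape(-1, 1) -> shape (56, 1), which broadcasts against (56, 64)
q_w = torch.clamp(torch.round(w / scales.reshape(-1, 1)), -8, 7)
print(q_w.shape)  # torch.Size([56, 64])
```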