
Commit afef2ff

Merge pull request #404 from ModelTC/temp
update README & docs
2 parents c70c7f6 + a4d5889 commit afef2ff

6 files changed: +499, -580 lines changed

README.md

Lines changed: 114 additions & 165 deletions
@@ -1,8 +1,7 @@
-# LLMC: Towards Accurate and Efficient LLM Compression
+<div align="center" style="font-family: charter;">
+<h1> LLMC: Towards Accurate and Efficient LLM Compression </h1>
 
-<img src="./imgs/llmc.png" alt="llmc" style="zoom:35%;" />
-
-<div align="center">
+<img src="./imgs/llmc.png" alt="llmc" width="75%" />
 
 [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![arXiv](https://img.shields.io/badge/LLMC-2405.06001-b31b1b)](https://arxiv.org/abs/2405.06001)
@@ -11,7 +10,7 @@
 [![Discord Banner](https://img.shields.io/discord/1139835312592392214?logo=discord&logoColor=white)](https://discord.com/invite/NfJzbkK3jY)
 [![QQ](https://img.shields.io/badge/QQ-EB1923?logo=tencent-qq&logoColor=white)](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
 [![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://llmc-en.readthedocs.io/en/latest/)
-[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/)
+[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/)&#160;
 
 **\[ English | [中文](README_zh.md) | [日本語](README_ja.md) \]**
 
@@ -27,36 +26,33 @@ docker pull llmcompression/llmc:pure-latest
 docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest
 ```
 
-**Community**:
-
-- [Discord Server](https://discord.com/invite/NfJzbkK3jY)
-- [Tencent QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
+**Community**: [Discord Server](https://discord.com/invite/NfJzbkK3jY), [Tencent QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592).
 
-**Docs**:
+**Docs**: [English](https://llmc-en.readthedocs.io/en/latest/), [Chinese](https://llmc-zhcn.readthedocs.io/en/latest/).
 
-- [English](https://llmc-en.readthedocs.io/en/latest/)
-- [Chinese](https://llmc-zhcn.readthedocs.io/en/latest/)
-
-## Latest News
+## :fire: Latest News
 
 - **May 12, 2025:** 🔥 We now fully support quantization for the **`Wan2.1`** series of video generation models and provide export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
 
-- **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.
+- **Feb 07, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.
 
 - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
 
 - **Nov 12, 2024:** 🔥 We have added support for 💥`static per-tensor activation quantization` across various models and algorithms, covering ✅integer quantization and ✅floating-point quantization to further optimize performance and efficiency. Additionally, we now support exporting ✨`real quantized models` and using the [VLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang) backends for inference acceleration. For more details, refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html) and [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html).
 
 - **Sep 26, 2024:** 🔥 We now support exporting 💥`FP8 quantized(E4M3, E5M2)` models from 🚀`LLMC` to advanced inference backends such as [VLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang). For detailed usage, please refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html) and [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html).
 
+<details close>
+<summary>Previous News</summary>
+
 - **Sep 24, 2024:** 🔥 We have officially released ✅INT4 and ✅INT8 models of ✨`Llama-3.1-405B`, quantized using 🚀`LLMC` in `save_lightllm` mode. You can download the model parameters [here](https://huggingface.co/Dongz/llama31-405b-quant).
 
 - **Sep 23, 2024:** 🔥 We now support exporting ✨`real quantized(INT4, INT8)` models from 🚀`LLMC` to advanced inference backends such as [VLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), and [MLC-LLM](https://github.com/mlc-ai/mlc-llm) for quantized inference deployment, enabling ✨`reduced memory usage` and ✨`faster inference speeds`.
 For detailed usage, please refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html), [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html), [AutoAWQ documentation](https://llmc-en.readthedocs.io/en/latest/backend/autoawq.html), and [MLC-LLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/mlcllm.html).
 
-- **Sep 9, 2024:** 🔥 We provide some configs of our best practice towards superior performance (see Best Practice [here](https://llmc-en.readthedocs.io/en/latest/)).
+- **Sep 09, 2024:** 🔥 We provide some configs of our best practice towards superior performance (see Best Practice [here](https://llmc-en.readthedocs.io/en/latest/)).
 
-* **Sep 3, 2024:** 🔥 We support [opencompass](https://github.com/open-compass/opencompass) 🤗 to eval 🚀`LLMC` model. Follow this [doc](https://llmc-en.readthedocs.io/en/latest/advanced/model_test_v2.html) and have a try!
+* **Sep 03, 2024:** 🔥 We support [opencompass](https://github.com/open-compass/opencompass) 🤗 to eval 🚀`LLMC` model. Follow this [doc](https://llmc-en.readthedocs.io/en/latest/advanced/model_test_v2.html) and have a try!
 
 * **Aug 22, 2024:** 🔥We support lots of small language models, including current SOTA [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)(see [Supported Model List](#supported-model-list)).
 
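Note: the news items in the hunk above refer to static per-tensor activation quantization and true INT8/FP8 weight export. As a reading aid only, the snippet below is a minimal PyTorch sketch of what symmetric per-tensor INT8 quantization with a static, calibration-derived scale means numerically; all names in it are illustrative, and it is not taken from the LLMC codebase.

```python
import torch

def calibrate_static_scale(calib_batches):
    # One scale for the whole tensor, fixed ahead of time from calibration data
    # (this is what makes the scheme "static" and "per-tensor").
    max_abs = max(x.abs().max().item() for x in calib_batches)
    return max_abs / 127.0

def quantize_per_tensor_int8(x, scale):
    # Round onto the INT8 grid, then dequantize to inspect the error.
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, q.float() * scale

calib = [torch.randn(4, 16) for _ in range(8)]   # stand-in for real activation statistics
scale = calibrate_static_scale(calib)
x = torch.randn(4, 16)
q, x_hat = quantize_per_tensor_int8(x, scale)
print(f"scale={scale:.4f}, max quantization error={(x - x_hat).abs().max().item():.4f}")
```
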
@@ -70,9 +66,6 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 (\* denotes equal contribution, 📧 denotes corresponding author.)
 
-<details close>
-<summary>Previous News</summary>
-
 - **Jul 16, 2024:** 🔥We support Wanda/Naive(Magnitude) for llm sparsification and layer-wise mix bits quantization now!
 
 - **Jul 14, 2024:** 🔥We support rotation based quantization QuaRot now!
@@ -95,11 +88,11 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 on the calibration data, algorithm pipeline, and quantization configuration selection. Based on the takeaways, a best practice for the LLM PTQ pipeline is designed, to achieve the best accuracy and efficiency performance balance
 under various scenarios.
 
-- **Mar 7, 2024:** 🚀 We release the quantization part of a powerful and efficient LLM compression tool. Notably, our benchmark paper is coming soon😊.
+- **Mar 07, 2024:** 🚀 We release the quantization part of a powerful and efficient LLM compression tool. Notably, our benchmark paper is coming soon😊.
 
 </details>
 
-## Highlight Feature
+## 🚀 Highlight Feature
 
 - 💥**Comprehensive Algorithm Support**: Provides a broad range of ✨`SOTA compression algorithms`, including ✅quantization, ✅mixed-precision quantization, and ✅sparsity, while maintaining accuracy consistent with the original repositories. ✨`Quantization best practices` (see 🚀`Best Practices` [here](https://llmc-en.readthedocs.io/en/latest/)) are also available to ensure optimal performance and efficiency.
 
@@ -111,175 +104,131 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 - 💥**Performance Efficiency**: Enables quantization of large LLMs, such as ✨`Llama3.1-405B` and ✨`DeepSeek-R1-671B`, with PPL evaluation on a `single A100/H100/H800 GPU`.
 
-## Usage
+## ⚙️ Usage
 
 Please refer to the 🚀`Quick Start` section in the [documentation](https://llmc-en.readthedocs.io/en/latest/).
 
-## Supported Model List
-
-[BLOOM](https://huggingface.co/bigscience/bloom)
-
-[LLaMA](https://github.com/facebookresearch/llama)
-
-[LLaMA V2](https://huggingface.co/meta-llama)
-
-[StarCoder](https://github.com/bigcode-project/starcoder)
-
-[OPT](https://huggingface.co/docs/transformers/model_doc/opt)
-
-[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
-
-[InternLM2](https://huggingface.co/internlm)
-
-[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
-
-[LLaMA V3](https://huggingface.co/meta-llama)
-
-[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
-
-[Qwen V2](https://github.com/QwenLM/Qwen2)
-
-[LLaVA](https://github.com/haotian-liu/LLaVA)
-
-[InternLM2.5](https://huggingface.co/internlm)
-
-[StableLM](https://github.com/Stability-AI/StableLM)
-
-[Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
-
-[Phi2](https://huggingface.co/microsoft/phi-2)
-
-[Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
-
-[MiniCPM](https://github.com/OpenBMB/MiniCPM)
-
-[SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
-
-[DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
-
-[LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
+## :robot: Supported Model List
+
+-[BLOOM](https://huggingface.co/bigscience/bloom)
+-[LLaMA](https://github.com/facebookresearch/llama)
+-[LLaMA V2](https://huggingface.co/meta-llama)
+-[StarCoder](https://github.com/bigcode-project/starcoder)
+-[OPT](https://huggingface.co/docs/transformers/model_doc/opt)
+
+<details>
+<summary>More Supported Models&nbsp</summary>
+
+-[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
+-[InternLM2](https://huggingface.co/internlm)
+-[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
+-[LLaMA V3](https://huggingface.co/meta-llama)
+-[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
+-[Qwen V2](https://github.com/QwenLM/Qwen2)
+-[LLaVA](https://github.com/haotian-liu/LLaVA)
+-[InternLM2.5](https://huggingface.co/internlm)
+-[StableLM](https://github.com/Stability-AI/StableLM)
+-[Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
+-[Phi2](https://huggingface.co/microsoft/phi-2)
+-[Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
+-[MiniCPM](https://github.com/OpenBMB/MiniCPM)
+-[SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
+-[DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
+-[LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
+-[Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
+-[Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+-[InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
 
-[Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
-
-[Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
-
-[InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
+</details>
 
 You can add your own model type referring to files under `llmc/models/*.py`.
 
-## Supported Backend List
-
-[VLLM](https://github.com/vllm-project/vllm)
+## :bus: Supported Backend List
 
-[LightLLM](https://github.com/ModelTC/lightllm)
+-[VLLM](https://github.com/vllm-project/vllm)
+-[LightLLM](https://github.com/ModelTC/lightllm)
+-[Sglang](https://github.com/sgl-project/sglang)
+-[MLC-LLM](https://github.com/mlc-ai/mlc-llm)
+-[AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
 
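Note: several news items above describe serving LLMC-exported checkpoints through the backends listed in this hunk. As a rough orientation only, the snippet below sketches typical use of vLLM's offline API to load an AWQ-quantized checkpoint; the model path is a placeholder, and the authoritative export and serving steps are the LLMC VLLM/SGLang documentation pages linked earlier.

```python
# Hypothetical sketch: run an AWQ-quantized checkpoint with vLLM's offline API.
# "path/to/llmc-exported-awq-model" is a placeholder, not a real artifact.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/llmc-exported-awq-model", quantization="awq")
sampling = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Explain post-training quantization in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```
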
-[Sglang](https://github.com/sgl-project/sglang)
-
-[MLC-LLM](https://github.com/mlc-ai/mlc-llm)
-
-[AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
-
-## Supported Algorithm List
+## 💡 Supported Algorithm List
 
 ### Quantization
 
-✅ Naive
-
-[AWQ](https://arxiv.org/abs/2306.00978)
-
-[GPTQ](https://arxiv.org/abs/2210.17323)
-
-[SmoothQuant](https://arxiv.org/abs/2211.10438)
-
-[OS+](https://arxiv.org/abs/2304.09145)
-
-[OmniQuant](https://arxiv.org/abs/2308.13137)
-
-[NormTweaking](https://arxiv.org/abs/2309.02784)
-
-[AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
-
-[QUIK](https://arxiv.org/abs/2310.09259)
+- ✅ Naive
+-[AWQ](https://arxiv.org/abs/2306.00978)
+-[GPTQ](https://arxiv.org/abs/2210.17323)
+-[SmoothQuant](https://arxiv.org/abs/2211.10438)
+-[OS+](https://arxiv.org/abs/2304.09145)
+
+<details>
+<summary>More Supported Algorithms&nbsp</summary>
+
+-[OmniQuant](https://arxiv.org/abs/2308.13137)
+-[NormTweaking](https://arxiv.org/abs/2309.02784)
+-[AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
+-[QUIK](https://arxiv.org/abs/2310.09259)
+-[SpQR](https://arxiv.org/abs/2306.03078)
+-[DGQ](https://arxiv.org/abs/2310.04836)
+-[OWQ](https://arxiv.org/abs/2306.02272)
+-[LLM.int8()](https://arxiv.org/abs/2208.07339)
+-[HQQ](https://mobiusml.github.io/hqq_blog/)
+-[QuaRot](https://arxiv.org/abs/2404.00456)
+-[SpinQuant](https://arxiv.org/abs/2405.16406) **([See this branch](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
+-[TesseraQ](https://arxiv.org/abs/2410.19103)
 
-[SpQR](https://arxiv.org/abs/2306.03078)
-
-[DGQ](https://arxiv.org/abs/2310.04836)
-
-[OWQ](https://arxiv.org/abs/2306.02272)
-
-[LLM.int8()](https://arxiv.org/abs/2208.07339)
-
-[HQQ](https://mobiusml.github.io/hqq_blog/)
-
-[QuaRot](https://arxiv.org/abs/2404.00456)
-
-[SpinQuant](https://arxiv.org/abs/2405.16406) **([See this branch](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
-
-[TesseraQ](https://arxiv.org/abs/2410.19103)
+</details>
 
 ### Pruning
 
-✅ Naive(Magnitude)
+- ✅ Naive(Magnitude)
+-[Wanda](https://arxiv.org/abs/2306.11695)
+-[ShortGPT](https://arxiv.org/abs/2403.03853)
 
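Note: "Naive(Magnitude)" in the pruning list above refers to plain magnitude pruning. For readers unfamiliar with the term, this is a minimal sketch of unstructured magnitude pruning on a single weight matrix; it only illustrates the idea and is not taken from the LLMC codebase.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero out the smallest-magnitude fraction of entries (unstructured pruning).
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

w = torch.randn(256, 256)
w_pruned = magnitude_prune(w, sparsity=0.5)
print(f"achieved sparsity: {(w_pruned == 0).float().mean().item():.2%}")
```
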
-[Wanda](https://arxiv.org/abs/2306.11695)
+## 🤝 Acknowledgments
 
-[ShortGPT](https://arxiv.org/abs/2403.03853)
+We develop our code referring to the following repos:
 
-## Acknowledgments
+- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
+- [mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant)
+- [OpenGVLab/OmniQuant](https://github.com/OpenGVLab/OmniQuant)
+- [IST-DASLab/gptq](https://github.com/IST-DASLab/gptq)
+- [ModelTC/Outlier_Suppression_Plus](https://github.com/ModelTC/Outlier_Suppression_Plus)
+
+<details>
+<summary>More Related Implementations&nbsp</summary>
+
+- [IST-DASLab/QUIK](https://github.com/IST-DASLab/QUIK)
+- [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR)
+- [ilur98/DGQ](https://github.com/ilur98/DGQ)
+- [xvyaward/owq](https://github.com/xvyaward/owq)
+- [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
+- [mobiusml/hqq](https://github.com/mobiusml/hqq)
+- [spcl/QuaRot](https://github.com/spcl/QuaRot)
+- [locuslab/wanda](https://github.com/locuslab/wanda)
+- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
+- [facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
+- [Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)
 
-We develop our code referring to the following repos:
+</details>
 
-- https://github.com/mit-han-lab/llm-awq
-- https://github.com/mit-han-lab/smoothquant
-- https://github.com/OpenGVLab/OmniQuant
-- https://github.com/IST-DASLab/gptq
-- https://github.com/ModelTC/Outlier_Suppression_Plus
-- https://github.com/IST-DASLab/QUIK
-- https://github.com/Vahe1994/SpQR
-- https://github.com/ilur98/DGQ
-- https://github.com/xvyaward/owq
-- https://github.com/TimDettmers/bitsandbytes
-- https://github.com/mobiusml/hqq
-- [https://github.com/spcl/QuaRot](https://github.com/spcl/QuaRot)
-- [https://github.com/locuslab/wanda](https://github.com/locuslab/wanda)
-- [https://github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
-- [https://github.com/facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
-- [https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)
-
-## Star History
+## 🌟 Star History
 
 [![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/llmc&type=Timeline)](https://star-history.com/#ModelTC/llmc&Timeline)
 
-## Citation
+## ✏️ Citation
 
-If you find our LLM-QBench paper/llmc toolkit useful or relevant to your research, please kindly cite our paper:
+If you find our toolkit or research paper useful or relevant to your research, please kindly cite our work:
 
 ```
-@misc{llmc,
-author = {llmc contributors},
-title = {llmc: Towards Accurate and Efficient LLM Compression},
-year = {2024},
-publisher = {GitHub},
-journal = {GitHub repository},
-howpublished = {\url{https://github.com/ModelTC/llmc}},
-}
-
-@misc{gong2024llmqbench,
-title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models},
-author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
-year={2024},
-eprint={2405.06001},
-archivePrefix={arXiv},
-primaryClass={cs.LG}
-}
-
-@misc{gong2024llmcbenchmarkinglargelanguage,
-title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
-author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
-year={2024},
-eprint={2405.06001},
-archivePrefix={arXiv},
-primaryClass={cs.LG},
-url={https://arxiv.org/abs/2405.06001},
+@inproceedings{DBLP:conf/emnlp/GongYGHLZT024,
+author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chengtao Lv and Yunchen Zhang and Dacheng Tao and Xianglong Liu},
+title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
+year={2024},
+cdate={1704067200000},
+pages={132-152},
+url={https://aclanthology.org/2024.emnlp-industry.12},
+booktitle={EMNLP (Industry Track)},
+crossref={conf/emnlp/2024i}
 }
 ```
