+

[](https://opensource.org/licenses/Apache-2.0)
[](https://arxiv.org/abs/2405.06001)
@@ -11,7 +10,7 @@
[](https://discord.com/invite/NfJzbkK3jY)
[](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
[](https://llmc-en.readthedocs.io/en/latest/)
-[](https://llmc-zhcn.readthedocs.io/en/latest/)
+[](https://llmc-zhcn.readthedocs.io/en/latest/)
**\[ English | [中文](README_zh.md) | [日本語](README_ja.md) \]**
@@ -27,21 +26,15 @@ docker pull llmcompression/llmc:pure-latest
docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest
```
-**Community**:
-
-- [Discord Server](https://discord.com/invite/NfJzbkK3jY)
-- [Tencent QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
+**Community**: [Discord Server](https://discord.com/invite/NfJzbkK3jY), [Tencent QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592).
-**Docs**:
+**Docs**: [English](https://llmc-en.readthedocs.io/en/latest/), [Chinese](https://llmc-zhcn.readthedocs.io/en/latest/).
-- [English](https://llmc-en.readthedocs.io/en/latest/)
-- [Chinese](https://llmc-zhcn.readthedocs.io/en/latest/)
-
-## Latest News
+## :fire: Latest News
- **May 12, 2025:** 🔥 We now fully support quantization for the **`Wan2.1`** series of video generation models and provide export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
-- **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.
+- **Feb 07, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.
- **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
@@ -49,14 +42,17 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
- **Sep 26, 2024:** 🔥 We now support exporting 💥`FP8 quantized(E4M3, E5M2)` models from 🚀`LLMC` to advanced inference backends such as [VLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang). For detailed usage, please refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html) and [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html).
+
+Previous News
+
- **Sep 24, 2024:** 🔥 We have officially released ✅INT4 and ✅INT8 models of ✨`Llama-3.1-405B`, quantized using 🚀`LLMC` in `save_lightllm` mode. You can download the model parameters [here](https://huggingface.co/Dongz/llama31-405b-quant).
- **Sep 23, 2024:** 🔥 We now support exporting ✨`real quantized(INT4, INT8)` models from 🚀`LLMC` to advanced inference backends such as [VLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), and [MLC-LLM](https://github.com/mlc-ai/mlc-llm) for quantized inference deployment, enabling ✨`reduced memory usage` and ✨`faster inference speeds`.
For detailed usage, please refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html), [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html), [AutoAWQ documentation](https://llmc-en.readthedocs.io/en/latest/backend/autoawq.html), and [MLC-LLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/mlcllm.html).
-- **Sep 9, 2024:** 🔥 We provide some configs of our best practice towards superior performance (see Best Practice [here](https://llmc-en.readthedocs.io/en/latest/)).
+- **Sep 09, 2024:** 🔥 We provide configs reflecting our best practices for superior performance (see Best Practices [here](https://llmc-en.readthedocs.io/en/latest/)).
-* **Sep 3, 2024:** 🔥 We support [opencompass](https://github.com/open-compass/opencompass) 🤗 to eval 🚀`LLMC` model. Follow this [doc](https://llmc-en.readthedocs.io/en/latest/advanced/model_test_v2.html) and have a try!
+* **Sep 03, 2024:** 🔥 We support [opencompass](https://github.com/open-compass/opencompass) 🤗 to evaluate 🚀`LLMC` models. Follow this [doc](https://llmc-en.readthedocs.io/en/latest/advanced/model_test_v2.html) and give it a try!
* **Aug 22, 2024:** 🔥We support lots of small language models, including current SOTA [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)(see [Supported Model List](#supported-model-list)).
@@ -70,9 +66,6 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
(\* denotes equal contribution, 📧 denotes corresponding author.)
-
-Previous News
-
- **Jul 16, 2024:** 🔥We support Wanda/Naive(Magnitude) for llm sparsification and layer-wise mix bits quantization now!
- **Jul 14, 2024:** 🔥We support rotation based quantization QuaRot now!
@@ -95,11 +88,11 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
on the calibration data, algorithm pipeline, and quantization configuration selection. Based on the takeaways, a best practice for the LLM PTQ pipeline is designed, to achieve the best accuracy and efficiency performance balance
under various scenarios.
-- **Mar 7, 2024:** 🚀 We release the quantization part of a powerful and efficient LLM compression tool. Notably, our benchmark paper is coming soon😊.
+- **Mar 07, 2024:** 🚀 We release the quantization part of a powerful and efficient LLM compression tool. Notably, our benchmark paper is coming soon😊.
-## Highlight Feature
+## 🚀 Highlight Features
- 💥**Comprehensive Algorithm Support**: Provides a broad range of ✨`SOTA compression algorithms`, including ✅quantization, ✅mixed-precision quantization, and ✅sparsity, while maintaining accuracy consistent with the original repositories. ✨`Quantization best practices` (see 🚀`Best Practices` [here](https://llmc-en.readthedocs.io/en/latest/)) are also available to ensure optimal performance and efficiency.
@@ -111,175 +104,131 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
- 💥**Performance Efficiency**: Enables quantization of large LLMs, such as ✨`Llama3.1-405B` and ✨`DeepSeek-R1-671B`, with PPL evaluation on a `single A100/H100/H800 GPU`.
-## Usage
+## ⚙️ Usage
Please refer to the 🚀`Quick Start` section in the [documentation](https://llmc-en.readthedocs.io/en/latest/).
-## Supported Model List
-
-✅ [BLOOM](https://huggingface.co/bigscience/bloom)
-
-✅ [LLaMA](https://github.com/facebookresearch/llama)
-
-✅ [LLaMA V2](https://huggingface.co/meta-llama)
-
-✅ [StarCoder](https://github.com/bigcode-project/starcoder)
-
-✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt)
-
-✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
-
-✅ [InternLM2](https://huggingface.co/internlm)
-
-✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
-
-✅ [LLaMA V3](https://huggingface.co/meta-llama)
-
-✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
-
-✅ [Qwen V2](https://github.com/QwenLM/Qwen2)
-
-✅ [LLaVA](https://github.com/haotian-liu/LLaVA)
-
-✅ [InternLM2.5](https://huggingface.co/internlm)
-
-✅ [StableLM](https://github.com/Stability-AI/StableLM)
-
-✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
-
-✅ [Phi2](https://huggingface.co/microsoft/phi-2)
-
-✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
-
-✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM)
-
-✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
-
-✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
-
-✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
+## :robot: Supported Model List
+
+- ✅ [BLOOM](https://huggingface.co/bigscience/bloom)
+- ✅ [LLaMA](https://github.com/facebookresearch/llama)
+- ✅ [LLaMA V2](https://huggingface.co/meta-llama)
+- ✅ [StarCoder](https://github.com/bigcode-project/starcoder)
+- ✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt)
+
+
+More Supported Models 
+
+- ✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
+- ✅ [InternLM2](https://huggingface.co/internlm)
+- ✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
+- ✅ [LLaMA V3](https://huggingface.co/meta-llama)
+- ✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
+- ✅ [Qwen V2](https://github.com/QwenLM/Qwen2)
+- ✅ [LLaVA](https://github.com/haotian-liu/LLaVA)
+- ✅ [InternLM2.5](https://huggingface.co/internlm)
+- ✅ [StableLM](https://github.com/Stability-AI/StableLM)
+- ✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
+- ✅ [Phi2](https://huggingface.co/microsoft/phi-2)
+- ✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
+- ✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM)
+- ✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
+- ✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
+- ✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
+- ✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
+- ✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+- ✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
-✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
-
-✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
-
-✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
+
You can add your own model type referring to files under `llmc/models/*.py`.
-## Supported Backend List
-
-✅ [VLLM](https://github.com/vllm-project/vllm)
+## :bus: Supported Backend List
-✅ [LightLLM](https://github.com/ModelTC/lightllm)
+- ✅ [VLLM](https://github.com/vllm-project/vllm)
+- ✅ [LightLLM](https://github.com/ModelTC/lightllm)
+- ✅ [Sglang](https://github.com/sgl-project/sglang)
+- ✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
+- ✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
-✅ [Sglang](https://github.com/sgl-project/sglang)
-
-✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
-
-✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
-
-## Supported Algorithm List
+## 💡 Supported Algorithm List
### Quantization
-✅ Naive
-
-✅ [AWQ](https://arxiv.org/abs/2306.00978)
-
-✅ [GPTQ](https://arxiv.org/abs/2210.17323)
-
-✅ [SmoothQuant](https://arxiv.org/abs/2211.10438)
-
-✅ [OS+](https://arxiv.org/abs/2304.09145)
-
-✅ [OmniQuant](https://arxiv.org/abs/2308.13137)
-
-✅ [NormTweaking](https://arxiv.org/abs/2309.02784)
-
-✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
-
-✅ [QUIK](https://arxiv.org/abs/2310.09259)
+- ✅ Naive
+- ✅ [AWQ](https://arxiv.org/abs/2306.00978)
+- ✅ [GPTQ](https://arxiv.org/abs/2210.17323)
+- ✅ [SmoothQuant](https://arxiv.org/abs/2211.10438)
+- ✅ [OS+](https://arxiv.org/abs/2304.09145)
+
+
+More Supported Algorithms 
+
+- ✅ [OmniQuant](https://arxiv.org/abs/2308.13137)
+- ✅ [NormTweaking](https://arxiv.org/abs/2309.02784)
+- ✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
+- ✅ [QUIK](https://arxiv.org/abs/2310.09259)
+- ✅ [SpQR](https://arxiv.org/abs/2306.03078)
+- ✅ [DGQ](https://arxiv.org/abs/2310.04836)
+- ✅ [OWQ](https://arxiv.org/abs/2306.02272)
+- ✅ [LLM.int8()](https://arxiv.org/abs/2208.07339)
+- ✅ [HQQ](https://mobiusml.github.io/hqq_blog/)
+- ✅ [QuaRot](https://arxiv.org/abs/2404.00456)
+- ✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([See this branch](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
+- ✅ [TesseraQ](https://arxiv.org/abs/2410.19103)
-✅ [SpQR](https://arxiv.org/abs/2306.03078)
-
-✅ [DGQ](https://arxiv.org/abs/2310.04836)
-
-✅ [OWQ](https://arxiv.org/abs/2306.02272)
-
-✅ [LLM.int8()](https://arxiv.org/abs/2208.07339)
-
-✅ [HQQ](https://mobiusml.github.io/hqq_blog/)
-
-✅ [QuaRot](https://arxiv.org/abs/2404.00456)
-
-✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([See this branch](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
-
-✅ [TesseraQ](https://arxiv.org/abs/2410.19103)
+
### Pruning
-✅ Naive(Magnitude)
+- ✅ Naive(Magnitude)
+- ✅ [Wanda](https://arxiv.org/abs/2306.11695)
+- ✅ [ShortGPT](https://arxiv.org/abs/2403.03853)
-✅ [Wanda](https://arxiv.org/abs/2306.11695)
+## 🤝 Acknowledgments
-✅ [ShortGPT](https://arxiv.org/abs/2403.03853)
+Our code is developed with reference to the following repositories:
-## Acknowledgments
+- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
+- [mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant)
+- [OpenGVLab/OmniQuant](https://github.com/OpenGVLab/OmniQuant)
+- [IST-DASLab/gptq](https://github.com/IST-DASLab/gptq)
+- [ModelTC/Outlier_Suppression_Plus](https://github.com/ModelTC/Outlier_Suppression_Plus)
+
+
+More Related Implementations 
+
+- [IST-DASLab/QUIK](https://github.com/IST-DASLab/QUIK)
+- [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR)
+- [ilur98/DGQ](https://github.com/ilur98/DGQ)
+- [xvyaward/owq](https://github.com/xvyaward/owq)
+- [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
+- [mobiusml/hqq](https://github.com/mobiusml/hqq)
+- [spcl/QuaRot](https://github.com/spcl/QuaRot)
+- [locuslab/wanda](https://github.com/locuslab/wanda)
+- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
+- [facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
+- [Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)
-We develop our code referring to the following repos:
+
-- https://github.com/mit-han-lab/llm-awq
-- https://github.com/mit-han-lab/smoothquant
-- https://github.com/OpenGVLab/OmniQuant
-- https://github.com/IST-DASLab/gptq
-- https://github.com/ModelTC/Outlier_Suppression_Plus
-- https://github.com/IST-DASLab/QUIK
-- https://github.com/Vahe1994/SpQR
-- https://github.com/ilur98/DGQ
-- https://github.com/xvyaward/owq
-- https://github.com/TimDettmers/bitsandbytes
-- https://github.com/mobiusml/hqq
-- [https://github.com/spcl/QuaRot](https://github.com/spcl/QuaRot)
-- [https://github.com/locuslab/wanda](https://github.com/locuslab/wanda)
-- [https://github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
-- [https://github.com/facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
-- [https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)
-
-## Star History
+## 🌟 Star History
[](https://star-history.com/#ModelTC/llmc&Timeline)
-## Citation
+## ✏️ Citation
-If you find our LLM-QBench paper/llmc toolkit useful or relevant to your research, please kindly cite our paper:
+If you find our toolkit or paper useful for your research, please cite our work:
```
-@misc{llmc,
- author = {llmc contributors},
- title = {llmc: Towards Accurate and Efficient LLM Compression},
- year = {2024},
- publisher = {GitHub},
- journal = {GitHub repository},
- howpublished = {\url{https://github.com/ModelTC/llmc}},
-}
-
-@misc{gong2024llmqbench,
- title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models},
- author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
- year={2024},
- eprint={2405.06001},
- archivePrefix={arXiv},
- primaryClass={cs.LG}
-}
-
-@misc{gong2024llmcbenchmarkinglargelanguage,
- title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
- author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
- year={2024},
- eprint={2405.06001},
- archivePrefix={arXiv},
- primaryClass={cs.LG},
- url={https://arxiv.org/abs/2405.06001},
+@inproceedings{DBLP:conf/emnlp/GongYGHLZT024,
+  author    = {Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chengtao Lv and Yunchen Zhang and Dacheng Tao and Xianglong Liu},
+  title     = {LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
+  booktitle = {EMNLP (Industry Track)},
+  year      = {2024},
+  pages     = {132--152},
+  url       = {https://aclanthology.org/2024.emnlp-industry.12}
}
```
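
For reference, a minimal sketch of starting a container from the image pulled above; the `--gpus` flag (which needs the NVIDIA Container Toolkit) and the mount layout are assumptions rather than documented behaviour of the `pure-latest` tag:

```shell
# Hypothetical invocation: mount the current checkout into the container and
# open an interactive shell; adjust paths and GPU flags to your setup.
docker run --gpus all -it --rm \
    -v "$(pwd)":/workspace \
    llmcompression/llmc:pure-latest \
    bash
```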
diff --git a/README_ja.md b/README_ja.md
index 6dead79f1..064079c58 100644
--- a/README_ja.md
+++ b/README_ja.md
@@ -1,8 +1,7 @@
-# LLMC: 正確で効率的なLLM圧縮に向けて
+
+
LLMC: 正確で効率的な LLM 圧縮に向けて
-

-
-
+

[](https://opensource.org/licenses/Apache-2.0)
[](https://arxiv.org/abs/2405.06001)
@@ -11,7 +10,7 @@
[](https://discord.com/invite/NfJzbkK3jY)
[](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
[](https://llmc-en.readthedocs.io/en/latest/)
-[](https://llmc-zhcn.readthedocs.io/en/latest/)
+[](https://llmc-zhcn.readthedocs.io/en/latest/)
**\[ [English](README.md) | [中文](README_zh.md) | 日本語 \]**
@@ -20,24 +19,18 @@
**LLMC** は、大規模言語モデル(LLM)の圧縮を目的とした、最新の圧縮アルゴリズムを活用して、パフォーマンスを損なうことなく効率を向上させ、モデルサイズを削減するためのツールです。以下のコマンドを使用して、llmcを実行できるDockerイメージをダウンロードできます。中国大陸のユーザーは、阿里云Dockerを使用することを推奨します。
```shell
-# docker hub: https://hub.docker.com/r/llmcompression/llmc
+# Docker Hub: https://hub.docker.com/r/llmcompression/llmc
docker pull llmcompression/llmc:pure-latest
-# 阿里云Docker: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag]
+# Aliyun Docker: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag]
docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest
```
-**コミュニティ**:
-
-- [Discordサーバー](https://discord.com/invite/NfJzbkK3jY)
-- [Tencent QQグループ](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
+**コミュニティ**: [Discord サーバー](https://discord.com/invite/NfJzbkK3jY)、[Tencent QQ グループ](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)。
-**ドキュメント**:
+**ドキュメント**: [English](https://llmc-en.readthedocs.io/en/latest/)、[中文](https://llmc-zhcn.readthedocs.io/en/latest/)。
-- [英語](https://llmc-en.readthedocs.io/en/latest/)
-- [中国語](https://llmc-zhcn.readthedocs.io/en/latest/)
-
-## 最新情報
+## :fire: 最新ニュース
- **2025年5月12日:** 🔥 **`Wan2.1`** シリーズのビデオ生成モデルの量子化を完全にサポートし、実際に量子化された **INT8/FP8** 重みのエクスポートにも対応しました。これらは [lightx2v](https://github.com/ModelTC/lightx2v) 推論フレームワークと互換性があります。詳細は [lightx2v ドキュメント](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html) をご参照ください。
@@ -49,6 +42,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
- **2024年9月26日:** 🔥 `LLMC`からの✨ `FP8量子化(E4M3、E5M2)`モデルを、VLLMやSGLangのような高度な推理バックエンドにエクスポートできるようになりました。🚀 詳細な使用方法については、[VLLMのドキュメント](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html)と[SGLangのドキュメント](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html)を参照してください。
+
+以前のニュース
+
- **2024年9月24日:** 🔥 私たちは正式に ✨`Llama-3.1-405B` の ✅INT4 と ✅INT8 モデルをリリースしました。これらは 🚀`LLMC` の `save_lightllm` モードを使用して量子化されています。モデルパラメータは[こちら](https://huggingface.co/Dongz/llama31-405b-quant)からダウンロードできます。
- **2024年9月23日:** 🔥 私たちは、🚀`LLMC` から ✨`実際の量子化された(INT4, INT8)` モデルを、 [VLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [MLC-LLM](https://github.com/mlc-ai/mlc-llm) などの高度な推論バックエンドにエクスポートするサポートを追加しました。これにより、✨`メモリ使用量の削減` と ✨`推論速度の向上` が可能になります。
@@ -70,9 +66,6 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
(\*は同等の貢献を示し、📧は対応する著者を示します。)
-
-過去のニュース
-
- **2024年7月16日:** 🔥私たちはLLMの疎化のためのWanda/Naive(マグニチュード)および層ごとの混合ビット量子化のサポートを追加しました!
- **2024年7月14日:** 🔥私たちは回転ベースの量子化 QuaRot のサポートを追加しました!
@@ -97,7 +90,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
-## 主要機能
+## 🚀 特徴
- 💥**包括的なアルゴリズムサポート**: 広範な ✨`SOTA圧縮アルゴリズム` をサポートし、✅量子化、✅混合精度量子化、✅疎性を含み、元のリポジトリと同じ精度を維持します。✨`量子化ベストプラクティス`(ベストプラクティスは[こちら](https://llmc-en.readthedocs.io/en/latest/)をご覧ください)も提供されており、最適なパフォーマンスと効率を確保します。
@@ -109,175 +102,129 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
- 💥**パフォーマンス効率**: ✨`Llama3.1-405B` や ✨`DeepSeek-R1-671B` などの大規模LLMの量子化をサポートし、`単一の A100/H100/H800 GPU` でPPL評価を可能にします。
-## 使用方法
+## ⚙️ 使い方
使用ガイドは 🚀`Quick Start`セクション[こちら](https://llmc-en.readthedocs.io/en/latest/)をご覧ください。
-## サポートされているモデルリスト
-
-✅ [BLOOM](https://huggingface.co/bigscience/bloom)
-
-✅ [LLaMA](https://github.com/facebookresearch/llama)
-
-✅ [LLaMA V2](https://huggingface.co/meta-llama)
-
-✅ [StarCoder](https://github.com/bigcode-project/starcoder)
-
-✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt)
-
-✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
-
-✅ [InternLM2](https://huggingface.co/internlm)
-
-✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
-
-✅ [LLaMA V3](https://huggingface.co/meta-llama)
-
-✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
-
-✅ [Qwen V2](https://github.com/QwenLM/Qwen2)
-
-✅ [LLaVA](https://github.com/haotian-liu/LLaVA)
-
-✅ [InternLM2.5](https://huggingface.co/internlm)
-
-✅ [StableLM](https://github.com/Stability-AI/StableLM)
-
-✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
-
-✅ [Phi2](https://huggingface.co/microsoft/phi-2)
-
-✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
-
-✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM)
-
-✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
-
-✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
-
-✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
+## :robot: 対応モデル
+
+- ✅ [BLOOM](https://huggingface.co/bigscience/bloom)
+- ✅ [LLaMA](https://github.com/facebookresearch/llama)
+- ✅ [LLaMA V2](https://huggingface.co/meta-llama)
+- ✅ [StarCoder](https://github.com/bigcode-project/starcoder)
+- ✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt)
+
+
+その他のモデル
+
+- ✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
+- ✅ [InternLM2](https://huggingface.co/internlm)
+- ✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
+- ✅ [LLaMA V3](https://huggingface.co/meta-llama)
+- ✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
+- ✅ [Qwen V2](https://github.com/QwenLM/Qwen2)
+- ✅ [LLaVA](https://github.com/haotian-liu/LLaVA)
+- ✅ [InternLM2.5](https://huggingface.co/internlm)
+- ✅ [StableLM](https://github.com/Stability-AI/StableLM)
+- ✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
+- ✅ [Phi2](https://huggingface.co/microsoft/phi-2)
+- ✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
+- ✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM)
+- ✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
+- ✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
+- ✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
+- ✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
+- ✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+- ✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
-✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
-
-✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
-
-✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
-
-独自のモデルタイプを追加するには、`llmc/models/*.py` ファイルを参照してください。
-
-## サポートされているバックエンドリスト
-
-✅ [VLLM](https://github.com/vllm-project/vllm)
-
-✅ [LightLLM](https://github.com/ModelTC/lightllm)
+
-✅ [Sglang](https://github.com/sgl-project/sglang)
+独自モデルを追加する場合は `llmc/models/*.py` を参照してください。
-✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
+## :bus: 対応バックエンド
-✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
+- ✅ [VLLM](https://github.com/vllm-project/vllm)
+- ✅ [LightLLM](https://github.com/ModelTC/lightllm)
+- ✅ [Sglang](https://github.com/sgl-project/sglang)
+- ✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
+- ✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
-## サポートされているアルゴリズムリスト
+## 💡 対応アルゴリズム
### 量子化
-✅ Naive
-
-✅ [AWQ](https://arxiv.org/abs/2306.00978)
-
-✅ [GPTQ](https://arxiv.org/abs/2210.17323)
-
-✅ [SmoothQuant](https://arxiv.org/abs/2211.10438)
-
-✅ [OS+](https://arxiv.org/abs/2304.09145)
-
-✅ [OmniQuant](https://arxiv.org/abs/2308.13137)
-
-✅ [NormTweaking](https://arxiv.org/abs/2309.02784)
-
-✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
+- ✅ Naive
+- ✅ [AWQ](https://arxiv.org/abs/2306.00978)
+- ✅ [GPTQ](https://arxiv.org/abs/2210.17323)
+- ✅ [SmoothQuant](https://arxiv.org/abs/2211.10438)
+- ✅ [OS+](https://arxiv.org/abs/2304.09145)
+
+
+その他のアルゴリズム
+
+- ✅ [OmniQuant](https://arxiv.org/abs/2308.13137)
+- ✅ [NormTweaking](https://arxiv.org/abs/2309.02784)
+- ✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
+- ✅ [QUIK](https://arxiv.org/abs/2310.09259)
+- ✅ [SpQR](https://arxiv.org/abs/2306.03078)
+- ✅ [DGQ](https://arxiv.org/abs/2310.04836)
+- ✅ [OWQ](https://arxiv.org/abs/2306.02272)
+- ✅ [LLM.int8()](https://arxiv.org/abs/2208.07339)
+- ✅ [HQQ](https://mobiusml.github.io/hqq_blog/)
+- ✅ [QuaRot](https://arxiv.org/abs/2404.00456)
+- ✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([このブランチを参照してください](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
+- ✅ [TesseraQ](https://arxiv.org/abs/2410.19103)
-✅ [QUIK](https://arxiv.org/abs/2310.09259)
-
-✅ [SpQR](https://arxiv.org/abs/2306.03078)
-
-✅ [DGQ](https://arxiv.org/abs/2310.04836)
-
-✅ [OWQ](https://arxiv.org/abs/2306.02272)
-
-✅ [LLM.int8()](https://arxiv.org/abs/2208.07339)
-
-✅ [HQQ](https://mobiusml.github.io/hqq_blog/)
-
-✅ [QuaRot](https://arxiv.org/abs/2404.00456)
-
-✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([このブランチを参照してください](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
+
-✅ [TesseraQ](https://arxiv.org/abs/2410.19103)
+### プルーニング
-### プルーニング(剪定)
+- ✅ Naive(Magnitude)
+- ✅ [Wanda](https://arxiv.org/abs/2306.11695)
+- ✅ [ShortGPT](https://arxiv.org/abs/2403.03853)
-✅ Naive(マグニチュード)
+## 🤝 謝辞
-✅ [Wanda](https://arxiv.org/abs/2306.11695)
+本プロジェクトは以下のリポジトリを参考にしています:
-✅ [ShortGPT](https://arxiv.org/abs/2403.03853)
+- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
+- [mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant)
+- [OpenGVLab/OmniQuant](https://github.com/OpenGVLab/OmniQuant)
+- [IST-DASLab/gptq](https://github.com/IST-DASLab/gptq)
+- [ModelTC/Outlier_Suppression_Plus](https://github.com/ModelTC/Outlier_Suppression_Plus)
-## 謝辞
+
+その他の実装
-以下のリポジトリを参考にしてコードを開発しました:
+- [IST-DASLab/QUIK](https://github.com/IST-DASLab/QUIK)
+- [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR)
+- [ilur98/DGQ](https://github.com/ilur98/DGQ)
+- [xvyaward/owq](https://github.com/xvyaward/owq)
+- [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
+- [mobiusml/hqq](https://github.com/mobiusml/hqq)
+- [spcl/QuaRot](https://github.com/spcl/QuaRot)
+- [locuslab/wanda](https://github.com/locuslab/wanda)
+- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
+- [facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
+- [Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)
-- https://github.com/mit-han-lab/llm-awq
-- https://github.com/mit-han-lab/smoothquant
-- https://github.com/OpenGVLab/OmniQuant
-- https://github.com/IST-DASLab/gptq
-- https://github.com/ModelTC/Outlier_Suppression_Plus
-- https://github.com/IST-DASLab/QUIK
-- https://github.com/Vahe1994/SpQR
-- https://github.com/ilur98/DGQ
-- https://github.com/xvyaward/owq
-- https://github.com/TimDettmers/bitsandbytes
-- https://github.com/mobiusml/hqq
-- [https://github.com/spcl/QuaRot](https://github.com/spcl/QuaRot)
-- [https://github.com/locuslab/wanda](https://github.com/locuslab/wanda)
-- [https://github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
-- [https://github.com/facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
-- [https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)
+
-## スター履歴
+## 🌟 Star 履歴
-[](https://star-history.com/#ModelTC/llmc&Timeline)
+[](https://star-history.com/#ModelTC/llmc&Timeline)
-## 引用
+## ✏️ 引用
-LLM-QBench論文/llmcツールキットが研究に役立つまたは関連している場合は、論文を引用してください:
+本ツールキットまたは論文が参考になった場合は、以下を引用してください:
```
-@misc{llmc,
- author = {llmc contributors},
- title = {llmc: Towards Accurate and Efficient LLM Compression},
- year = {2024},
- publisher = {GitHub},
- journal = {GitHub repository},
- howpublished = {\url{https://github.com/ModelTC/llmc}},
-}
-
-@misc{gong2024llmqbench,
- title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models},
- author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
- year={2024},
- eprint={2405.06001},
- archivePrefix={arXiv},
- primaryClass={cs.LG}
-}
-
-@misc{gong2024llmcbenchmarkinglargelanguage,
- title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
- author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
- year={2024},
- eprint={2405.06001},
- archivePrefix={arXiv},
- primaryClass={cs.LG},
- url={https://arxiv.org/abs/2405.06001},
+@inproceedings{DBLP:conf/emnlp/GongYGHLZT024,
+ author = {Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chengtao Lv and Yunchen Zhang and Dacheng Tao and Xianglong Liu},
+ title = {LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
+ booktitle = {EMNLP (Industry Track)},
+ year = {2024},
+ pages = {132--152},
+ url = {https://aclanthology.org/2024.emnlp-industry.12}
}
```
diff --git a/README_zh.md b/README_zh.md
index ae2b3e5f6..9699fe275 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -1,8 +1,7 @@
-# LLMC: 准确高效的LLM压缩工具
+
+
LLMC:迈向准确且高效的大语言模型压缩
-

-
-
+

[](https://opensource.org/licenses/Apache-2.0)
[](https://arxiv.org/abs/2405.06001)
@@ -11,7 +10,7 @@
[](https://discord.com/invite/NfJzbkK3jY)
[](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
[](https://llmc-en.readthedocs.io/en/latest/)
-[](https://llmc-zhcn.readthedocs.io/en/latest/)
+[](https://llmc-zhcn.readthedocs.io/en/latest/)
**\[ [English](README.md) | 中文 | [日本語](README_ja.md) \]**
@@ -20,24 +19,18 @@
**LLMC** 是一个开箱即用的工具,专为压缩LLM设计,利用最先进的压缩算法提高效率并减少模型体积,同时不影响预测精度。你可以通过以下命令下载可以运行llmc的docker镜像,中国大陆用户推荐使用阿里云docker。
```shell
-# docker hub: https://hub.docker.com/r/llmcompression/llmc
+# Docker Hub: https://hub.docker.com/r/llmcompression/llmc
docker pull llmcompression/llmc:pure-latest
-# 阿里云docker: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag]
+# 阿里云镜像: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag]
docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest
```
-**社区**:
-
-- [Discord 服务器](https://discord.com/invite/NfJzbkK3jY)
-- [腾讯QQ群](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
+**社区**: [Discord 服务器](https://discord.com/invite/NfJzbkK3jY)、[腾讯 QQ 群](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)。
-**文档**:
+**文档**: [English](https://llmc-en.readthedocs.io/en/latest/)、[中文](https://llmc-zhcn.readthedocs.io/en/latest/)。
-- [英文](https://llmc-en.readthedocs.io/en/latest/)
-- [中文](https://llmc-zhcn.readthedocs.io/en/latest/)
-
-## 最新消息
+## :fire: 最新动态
- **2025年5月12日:** 🔥 我们现已全面支持 **`Wan2.1`** 系列视频生成模型的量化,并支持导出真实量化的 **INT8/FP8** 权重,兼容 [lightx2v](https://github.com/ModelTC/lightx2v) 推理框架。详情请参考 [lightx2v 使用文档](https://llmc-zhcn.readthedocs.io/en/latest/backend/lightx2v.html)。
@@ -49,6 +42,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
- **2024年9月26日:** 🔥 我们现在支持从🚀 `LLMC`导出💥 `FP8 量化(E4M3,E5M2)`模型到一些先进的推理后端,例如[VLLM](https://github.com/vllm-project/vllm)和[SGLang](https://github.com/sgl-project/sglang)。关于详细使用方法,请参阅[VLLM文档](https://llmc-zhcn.readthedocs.io/en/latest/backend/vllm.html)和[SGLang文档](https://llmc-zhcn.readthedocs.io/en/latest/backend/sglang.html)。
+
+更早动态
+
- **2024年9月24日:** 🔥 我们正式发布了 ✨`Llama-3.1-405B` 的 ✅INT4 和 ✅INT8 模型,这些模型通过 🚀`LLMC` 使用 `save_lightllm` 模式进行量化。你可以在[此处](https://huggingface.co/Dongz/llama31-405b-quant)下载模型参数。
- **2024年9月23日:** 🔥 我们现在支持从 🚀`LLMC` 导出 ✨`真正量化的(INT4, INT8)` 模型到先进推理后端,例如 [VLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), 和 [MLC-LLM](https://github.com/mlc-ai/mlc-llm) 用于量化推理部署,从而实现 ✨`减少内存使用` 和 ✨`加快推理速度`。
@@ -70,9 +66,6 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
(\* 表示同等贡献,📧 表示通讯作者。)
-
-历史消息
-
- **2024年7月16日:** 🔥我们现在支持 Wanda/Naive(幅度)进行 LLM 稀疏化和逐层混合比特量化!
- **2024年7月14日:** 🔥我们现在支持基于旋转的量化 QuaRot!
@@ -97,7 +90,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
-## 亮点功能
+## 🚀 亮点功能
- 💥**综合算法支持**: 提供广泛的 ✨`SOTA压缩算法` 支持,包括 ✅量化、✅混合精度量化 和 ✅稀疏化,同时保持与原始仓库一致的精度。我们还提供 ✨`量化最佳实践`(参见✨`最佳实践` 章节[此处](https://llmc-zhcn.readthedocs.io/en/latest/)),确保最佳性能和效率。
@@ -109,177 +102,129 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
- 💥**性能效率**: 支持大规模LLM的量化,例如 ✨`Llama3.1-405B` 和 ✨`DeepSeek-R1-671B`,并可在 `单个 A100/H100/H800 GPU` 上评估 PPL。
-## 使用指南
+## ⚙️ 快速上手
请参阅 🚀`快速入门`章节[此处](https://llmc-zhcn.readthedocs.io/en/latest/)。
-## 支持的模型列表
-
-✅ [BLOOM](https://huggingface.co/bigscience/bloom)
-
-✅ [LLaMA](https://github.com/facebookresearch/llama)
-
-✅ [LLaMA V2](https://huggingface.co/meta-llama)
-
-✅ [StarCoder](https://github.com/bigcode-project/starcoder)
-
-✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt)
-
-✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
-
-✅ [InternLM2](https://huggingface.co/internlm)
-
-✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
-
-✅ [LLaMA V3](https://huggingface.co/meta-llama)
-
-✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
-
-✅ [Qwen V2](https://github.com/QwenLM/Qwen2)
-
-✅ [LLaVA](https://github.com/haotian-liu/LLaVA)
-
-✅ [InternLM2.5](https://huggingface.co/internlm)
-
-✅ [StableLM](https://github.com/Stability-AI/StableLM)
-
-✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
-
-✅ [Phi2](https://huggingface.co/microsoft/phi-2)
-
-✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
-
-✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM)
-
-✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
-
-✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
-
-✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
-
-✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
+## :robot: 支持的模型
+
+- ✅ [BLOOM](https://huggingface.co/bigscience/bloom)
+- ✅ [LLaMA](https://github.com/facebookresearch/llama)
+- ✅ [LLaMA V2](https://huggingface.co/meta-llama)
+- ✅ [StarCoder](https://github.com/bigcode-project/starcoder)
+- ✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt)
+
+
+更多模型
+
+- ✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
+- ✅ [InternLM2](https://huggingface.co/internlm)
+- ✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
+- ✅ [LLaMA V3](https://huggingface.co/meta-llama)
+- ✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
+- ✅ [Qwen V2](https://github.com/QwenLM/Qwen2)
+- ✅ [LLaVA](https://github.com/haotian-liu/LLaVA)
+- ✅ [InternLM2.5](https://huggingface.co/internlm)
+- ✅ [StableLM](https://github.com/Stability-AI/StableLM)
+- ✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
+- ✅ [Phi2](https://huggingface.co/microsoft/phi-2)
+- ✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
+- ✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM)
+- ✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
+- ✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
+- ✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
+- ✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
+- ✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+- ✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
-✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
-
-✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
-
-你可以参考 `llmc/models/*.py` 文件添加自己的模型类型。
-
-## 支持的后端列表
-
-✅ [VLLM](https://github.com/vllm-project/vllm)
-
-✅ [LightLLM](https://github.com/ModelTC/lightllm)
+
-✅ [Sglang](https://github.com/sgl-project/sglang)
+您可参考 `llmc/models/*.py` 添加自定义模型。
-✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
+## :bus: 支持的后端
-✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
+- ✅ [VLLM](https://github.com/vllm-project/vllm)
+- ✅ [LightLLM](https://github.com/ModelTC/lightllm)
+- ✅ [Sglang](https://github.com/sgl-project/sglang)
+- ✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
+- ✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
-## 支持的算法列表
+## 💡 支持的算法
### 量化
-✅ Naive
-
-✅ [AWQ](https://arxiv.org/abs/2306.00978)
-
-✅ [GPTQ](https://arxiv.org/abs/2210.17323)
-
-✅ [SmoothQuant](https://arxiv.org/abs/2211.10438)
-
-✅ [OS+](https://arxiv.org/abs/2304.09145)
-
-✅ [OmniQuant](https://arxiv.org/abs/2308.13137)
-
-✅ [NormTweaking](https://arxiv.org/abs/2309.02784)
-
-✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
-
-✅ [QUIK](https://arxiv.org/abs/2310.09259)
-
-✅ [SpQR](https://arxiv.org/abs/2306.03078)
+- ✅ Naive
+- ✅ [AWQ](https://arxiv.org/abs/2306.00978)
+- ✅ [GPTQ](https://arxiv.org/abs/2210.17323)
+- ✅ [SmoothQuant](https://arxiv.org/abs/2211.10438)
+- ✅ [OS+](https://arxiv.org/abs/2304.09145)
+
+
+更多算法
+
+- ✅ [OmniQuant](https://arxiv.org/abs/2308.13137)
+- ✅ [NormTweaking](https://arxiv.org/abs/2309.02784)
+- ✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
+- ✅ [QUIK](https://arxiv.org/abs/2310.09259)
+- ✅ [SpQR](https://arxiv.org/abs/2306.03078)
+- ✅ [DGQ](https://arxiv.org/abs/2310.04836)
+- ✅ [OWQ](https://arxiv.org/abs/2306.02272)
+- ✅ [LLM.int8()](https://arxiv.org/abs/2208.07339)
+- ✅ [HQQ](https://mobiusml.github.io/hqq_blog/)
+- ✅ [QuaRot](https://arxiv.org/abs/2404.00456)
+- ✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([见此分支](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
+- ✅ [TesseraQ](https://arxiv.org/abs/2410.19103)
-✅ [DGQ](https://arxiv.org/abs/2310.04836)
-
-✅ [OWQ](https://arxiv.org/abs/2306.02272)
-
-✅ [LLM.int8()](https://arxiv.org/abs/2208.07339)
-
-✅ [HQQ](https://mobiusml.github.io/hqq_blog/)
-
-✅ [QuaRot](https://arxiv.org/abs/2404.00456)
-
-✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([见此分支](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
-
-✅ [TesseraQ](https://arxiv.org/abs/2410.19103)
+
### 剪枝
-✅ Naive(Magnitude)
+- ✅ Naive(Magnitude)
+- ✅ [Wanda](https://arxiv.org/abs/2306.11695)
+- ✅ [ShortGPT](https://arxiv.org/abs/2403.03853)
-✅ [Wanda](https://arxiv.org/abs/2306.11695)
+## 🤝 致谢
-✅ [ShortGPT](https://arxiv.org/abs/2403.03853)
+本项目参考了以下仓库:
-## 鸣谢
+- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
+- [mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant)
+- [OpenGVLab/OmniQuant](https://github.com/OpenGVLab/OmniQuant)
+- [IST-DASLab/gptq](https://github.com/IST-DASLab/gptq)
+- [ModelTC/Outlier_Suppression_Plus](https://github.com/ModelTC/Outlier_Suppression_Plus)
-我们的代码参考了以下仓库:
+
+更多相关实现
-- https://github.com/mit-han-lab/llm-awq
-- https://github.com/mit-han-lab/smoothquant
-- https://github.com/OpenGVLab/OmniQuant
-- https://github.com/IST-DASLab/gptq
-- https://github.com/ModelTC/Outlier_Suppression_Plus
-- https://github.com/IST-DASLab/QUIK
-- https://github.com/Vahe1994/SpQR
-- https://github.com/ilur98/DGQ
-- https://github.com/xvyaward/owq
-- https://github.com/TimDettmers/bitsandbytes
-- https://github.com/mobiusml/hqq
-- [https://github.com/spcl/QuaRot](https://github.com/spcl/QuaRot)
-- [https://github.com/locuslab/wanda](https://github.com/locuslab/wanda)
-- [https://github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
-- [https://github.com/facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
-- [https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)
+- [IST-DASLab/QUIK](https://github.com/IST-DASLab/QUIK)
+- [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR)
+- [ilur98/DGQ](https://github.com/ilur98/DGQ)
+- [xvyaward/owq](https://github.com/xvyaward/owq)
+- [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
+- [mobiusml/hqq](https://github.com/mobiusml/hqq)
+- [spcl/QuaRot](https://github.com/spcl/QuaRot)
+- [locuslab/wanda](https://github.com/locuslab/wanda)
+- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
+- [facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
+- [Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)
-## Star 历史
+
-[](https://star-history.com/#ModelTC/llmc&Timeline)
+## 🌟 Star 历史
-## 引用
+[](https://star-history.com/#ModelTC/llmc&Timeline)
-## 引用
+## ✏️ 引用
-如果您认为我们的 LLM-QBench 论文/llmc 工具对您的研究有用或相关,请务必引用我们的论文:
+如果您觉得本工具包或相关论文对您的研究有帮助,请引用:
```
-@misc{llmc,
- author = {llmc contributors},
- title = {llmc: Towards Accurate and Efficient LLM Compression},
- year = {2024},
- publisher = {GitHub},
- journal = {GitHub repository},
- howpublished = {\url{https://github.com/ModelTC/llmc}},
-}
-
-@misc{gong2024llmqbench,
- title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models},
- author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
- year={2024},
- eprint={2405.06001},
- archivePrefix={arXiv},
- primaryClass={cs.LG}
-}
-
-@misc{gong2024llmcbenchmarkinglargelanguage,
- title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
- author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
- year={2024},
- eprint={2405.06001},
- archivePrefix={arXiv},
- primaryClass={cs.LG},
- url={https://arxiv.org/abs/2405.06001},
+@inproceedings{DBLP:conf/emnlp/GongYGHLZT024,
+ author = {Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chengtao Lv and Yunchen Zhang and Dacheng Tao and Xianglong Liu},
+ title = {LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
+ booktitle = {EMNLP (Industry Track)},
+ year = {2024},
+ pages = {132--152},
+ url = {https://aclanthology.org/2024.emnlp-industry.12}
}
```
diff --git a/docs/en/source/conf.py b/docs/en/source/conf.py
index 879d62c58..7e78c0fb8 100644
--- a/docs/en/source/conf.py
+++ b/docs/en/source/conf.py
@@ -1,17 +1,26 @@
# Configuration file for the Sphinx documentation builder.
#
-# For the full list of built-in configuration values, see the documentation:
-# https://www.sphinx-doc.org/en/master/usage/configuration.html
+# This file adopts the theme and basic settings used by the Lightx2v docs
+# but keeps the llmc-specific information from the original configuration.
+# -----------------------------------------------------------------------------
-# -- Project information -----------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+import os
+import sys
+from typing import List
+
+# -- Path setup --------------------------------------------------------------
+# Add the project root (three levels above this source directory) so autodoc
+# can find the llmc modules.
+ROOT_DIR = os.path.abspath(os.path.join(__file__, "../../../.."))
+sys.path.append(ROOT_DIR)
+# -- Project information -----------------------------------------------------
project = "llmc"
copyright = "2024, llmc contributors"
author = "ModelTC"
release = "1.0.0"
-github_url = f"https://github.com/ModelTC/llmc"
+# GitHub repository ----------------------------------------------------------
+github_url = "https://github.com/ModelTC/llmc"
html_context = {
"display_github": True,
@@ -20,50 +29,86 @@
"github_version": "main",
"conf_py_path": "/docs/en/source/", # Path in the checkout to the docs root
}
-html_theme_options = {
- "github_url": github_url,
- "doc_items": {
- "paper": "https://arxiv.org/abs/2405.06001",
- "institution": "https://github.com/ModelTC",
- },
- "logo": "images/logo/llmc.svg",
- "logo_dark": "images/logo/llmc.svg",
- "logo_icon": "images/logo/llmc.svg",
-}
-
# -- General configuration ---------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
- "myst_parser",
- "sphinx.ext.autodoc",
- "sphinx.ext.viewcode",
"sphinx.ext.napoleon",
- "sphinxcontrib.contentui",
+ "sphinx.ext.viewcode",
+ "sphinx.ext.intersphinx",
+ "sphinx.ext.autodoc",
+ "sphinx.ext.autosummary",
+ "myst_parser",
+ "sphinx_copybutton",
"sphinx.ext.doctest",
"sphinx.ext.mathjax",
"sphinx.ext.ifconfig",
- "sphinx-prompt",
- "sphinxcontrib.jquery",
- "sphinx.ext.autosectionlabel",
"sphinx.ext.githubpages",
- "sphinx.ext.intersphinx",
+ "sphinx.ext.autosectionlabel",
"sphinxcontrib.katex",
- "sphinx_copybutton",
+ "sphinxcontrib.contentui",
]
-templates_path = ["_templates"]
-exclude_patterns = []
+templates_path: List[str] = ["_templates"]
+exclude_patterns: List[str] = []
language = "en"
+# Exclude the prompt "$" when copying code blocks --------------------------
+copybutton_prompt_text = r"\$ "
+copybutton_prompt_is_regexp = True
+
# -- Options for HTML output -------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+html_title = project
+html_theme = "sphinx_book_theme"
+html_logo = "images/logo/llmc.svg"
+html_static_path = ["_static"]
+
+# Theme options compatible with sphinx_book_theme / pydata-sphinx-theme
+html_theme_options = {
+ "path_to_docs": "docs/en/source",
+ "repository_url": github_url,
+ "use_repository_button": True,
+ "logo": {
+ "text": "LLMC",
+ "image_light": "images/logo/llmc.svg",
+ "image_dark": "images/logo/llmc.svg",
+ },
+ "doc_items": {
+ "paper": "https://arxiv.org/abs/2405.06001",
+ "institution": "https://github.com/ModelTC",
+ },
+}
+
+# -- Intersphinx mapping (optional) -----------------------------------------
+intersphinx_mapping = {
+ "python": ("https://docs.python.org/3", {}),
+ "sphinx": ("https://www.sphinx-doc.org/en/master", {}),
+}
+# -- Mock heavy external dependencies ---------------------------------------
+autodoc_mock_imports = [
+ "torch",
+ "transformers",
+ "sentencepiece",
+ "tensorizer",
+]
-html_theme = "trojanzoo_sphinx_theme"
+# Remove base-class note in generated docs ----------------------------------
+from sphinx.ext import autodoc # noqa: E402, isort: skip
-html_static_path = ["_static"]
+class MockedClassDocumenter(autodoc.ClassDocumenter):
+ """Remove note about base class when a class is derived from object."""
+
+ def add_line(self, line: str, source: str, *lineno: int) -> None:
+ if line == " Bases: :py:class:`object`":
+ return
+ super().add_line(line, source, *lineno)
+
+autodoc.ClassDocumenter = MockedClassDocumenter
+
+# -- Customisation hooks -----------------------------------------------------
-source_suffix = [".rst", ".md"]
+def setup(app):
+ """Optional Sphinx setup hooks."""
+ pass
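
A quick way to exercise the reworked English configuration locally is a plain `sphinx-build`; this is a sketch that assumes the docs dependencies (including `sphinx-book-theme`) are installed, and the output directory is arbitrary:

```shell
# Build the English docs against the new sphinx_book_theme configuration.
sphinx-build -b html docs/en/source docs/en/build/html
```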
diff --git a/docs/zh_cn/source/conf.py b/docs/zh_cn/source/conf.py
index 9b1ae0785..f6ef270a4 100644
--- a/docs/zh_cn/source/conf.py
+++ b/docs/zh_cn/source/conf.py
@@ -1,69 +1,110 @@
-# Configuration file for the Sphinx documentation builder.
-#
-# For the full list of built-in configuration values, see the documentation:
-# https://www.sphinx-doc.org/en/master/usage/configuration.html
+# Configuration file for the Sphinx documentation builder (中文文档).
+# -----------------------------------------------------------------------------
+# 参考 Lightx2v 样式,把原先 trojanzoo_sphinx_theme 改为 sphinx_book_theme,
+# 并修正 logo 配置格式。
-# -- Project information -----------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+import os
+import sys
+from typing import List
+# -- Path setup --------------------------------------------------------------
+ROOT_DIR = os.path.abspath(os.path.join(__file__, "../../../.."))
+sys.path.append(ROOT_DIR)
+
+# -- 项目信息 ---------------------------------------------------------------
project = "llmc"
copyright = "2024, llmc contributors"
author = "ModelTC"
release = "1.0.0"
-github_url = f"https://github.com/ModelTC/llmc"
+# GitHub 信息 ---------------------------------------------------------------
+github_url = "https://github.com/ModelTC/llmc"
html_context = {
"display_github": True,
"github_user": author,
"github_repo": "llmc",
"github_version": "main",
- "conf_py_path": "/docs/zh_cn/source/", # Path in the checkout to the docs root
-}
-html_theme_options = {
- "github_url": github_url,
- "doc_items": {
- "paper": "https://arxiv.org/abs/2405.06001",
- "institution": "https://github.com/ModelTC",
- },
- "logo": "images/logo/llmc.svg",
- "logo_dark": "images/logo/llmc.svg",
- "logo_icon": "images/logo/llmc.svg",
+ "conf_py_path": "/docs/zh_cn/source/", # 文档根路径
}
-
-# -- General configuration ---------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
-
+# -- 通用配置 ----------------------------------------------------------------
extensions = [
- "myst_parser",
- "sphinx.ext.autodoc",
- "sphinx.ext.viewcode",
"sphinx.ext.napoleon",
- "sphinxcontrib.contentui",
+ "sphinx.ext.viewcode",
+ "sphinx.ext.intersphinx",
+ "sphinx.ext.autodoc",
+ "sphinx.ext.autosummary",
+ "myst_parser",
+ "sphinx_copybutton",
"sphinx.ext.doctest",
"sphinx.ext.mathjax",
"sphinx.ext.ifconfig",
- "sphinx-prompt",
- "sphinxcontrib.jquery",
- "sphinx.ext.autosectionlabel",
"sphinx.ext.githubpages",
- "sphinx.ext.intersphinx",
+ "sphinx.ext.autosectionlabel",
"sphinxcontrib.katex",
- "sphinx_copybutton",
+ "sphinxcontrib.contentui",
]
-templates_path = ["_templates"]
-exclude_patterns = []
+templates_path: List[str] = ["_templates"]
+exclude_patterns: List[str] = []
+
+language = "zh_CN"
+
+# 复制代码块时去除shell提示符 ---------------------------------------------
+copybutton_prompt_text = r"\$ "
+copybutton_prompt_is_regexp = True
+
+# -- HTML 输出选项 -----------------------------------------------------------
+html_title = project
+html_theme = "sphinx_book_theme"
+html_logo = "images/logo/llmc.svg"
+html_static_path = ["_static"]
+
+html_theme_options = {
+ "path_to_docs": "docs/zh_cn/source",
+ "repository_url": github_url,
+ "use_repository_button": True,
+ "logo": {
+ "text": "LLMC",
+ "image_light": "images/logo/llmc.svg",
+ "image_dark": "images/logo/llmc.svg",
+ },
+ "doc_items": {
+ "paper": "https://arxiv.org/abs/2405.06001",
+ "institution": "https://github.com/ModelTC",
+ },
+}
+
+# -- Intersphinx -------------------------------------------------------------
+intersphinx_mapping = {
+ "python": ("https://docs.python.org/3", {}),
+ "sphinx": ("https://www.sphinx-doc.org/en/master", {}),
+}
-language = "cn"
+# -- Mock 外部依赖 -----------------------------------------------------------
+autodoc_mock_imports = [
+ "torch",
+ "transformers",
+ "sentencepiece",
+ "tensorizer",
+]
-# -- Options for HTML output -------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+# -- 自定义处理 -------------------------------------------------------------
+from sphinx.ext import autodoc # noqa: E402, isort: skip
+class MockedClassDocumenter(autodoc.ClassDocumenter):
+ """移除“Bases: object”行。"""
-html_theme = "trojanzoo_sphinx_theme"
+ def add_line(self, line: str, source: str, *lineno: int) -> None:
+ if line == " Bases: :py:class:`object`":
+ return
+ super().add_line(line, source, *lineno)
-html_static_path = ["_static"]
+autodoc.ClassDocumenter = MockedClassDocumenter
+
+# -- 额外钩子 ---------------------------------------------------------------
-source_suffix = [".rst", ".md"]
+def setup(app):
+ """可选的 Sphinx setup。"""
+ pass
diff --git a/requirements/docs.txt b/requirements/docs.txt
index a15a4fc07..1c8eec42f 100644
--- a/requirements/docs.txt
+++ b/requirements/docs.txt
@@ -1,15 +1,7 @@
-docutils
-modelindex
-myst-parser
-sphinx
-sphinx-copybutton
-sphinx-design
-sphinx-notfound-page
-sphinx-tabs
-sphinxcontrib-jquery
-tabulate
-sphinxcontrib.contentui
--e git+https://github.com/ain-soph/trojanzoo_sphinx_theme.git#egg=trojanzoo_sphinx_theme
-sphinx-prompt
-sphinxcontrib-katex
-sphinx-copybutton
+sphinx == 6.2.1
+sphinx-book-theme == 1.0.1
+sphinx-copybutton == 0.5.2
+myst-parser == 2.0.0
+sphinx-argparse
+sphinxcontrib.redoc
+sphinxcontrib.openapi
+sphinxcontrib-katex
+sphinxcontrib.contentui
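
With the pinned requirements above, a hedged sketch of installing them and building the Chinese tree against the matching configuration change; the output path is an assumption, not part of this patch:

```shell
pip install -r requirements/docs.txt
sphinx-build -b html docs/zh_cn/source docs/zh_cn/build/html
```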