diff --git a/README.md b/README.md index 3fe90a545..cc7d1377b 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,7 @@ -# LLMC: Towards Accurate and Efficient LLM Compression +
+<div align="center">
+
+<h1>LLMC: Towards Accurate and Efficient LLM Compression</h1>
+
+llmc [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![arXiv](https://img.shields.io/badge/LLMC-2405.06001-b31b1b)](https://arxiv.org/abs/2405.06001) @@ -11,7 +10,7 @@ [![Discord Banner](https://img.shields.io/discord/1139835312592392214?logo=discord&logoColor=white)](https://discord.com/invite/NfJzbkK3jY) [![QQ](https://img.shields.io/badge/QQ-EB1923?logo=tencent-qq&logoColor=white)](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592) [![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://llmc-en.readthedocs.io/en/latest/) -[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/) +[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/)  **\[ English | [中文](README_zh.md) | [日本語](README_ja.md) \]** @@ -27,21 +26,15 @@ docker pull llmcompression/llmc:pure-latest docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest ``` -**Community**: - -- [Discord Server](https://discord.com/invite/NfJzbkK3jY) -- [Tencent QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592) +**Community**: [Discord Server](https://discord.com/invite/NfJzbkK3jY), [Tencent QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592). -**Docs**: +**Docs**: [English](https://llmc-en.readthedocs.io/en/latest/), [Chinese](https://llmc-zhcn.readthedocs.io/en/latest/). -- [English](https://llmc-en.readthedocs.io/en/latest/) -- [Chinese](https://llmc-zhcn.readthedocs.io/en/latest/) - -## Latest News +## :fire: Latest News - **May 12, 2025:** 🔥 We now fully support quantization for the **`Wan2.1`** series of video generation models and provide export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html). -- **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights. +- **Feb 07, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights. - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot. 
@@ -49,14 +42,17 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates - **Sep 26, 2024:** 🔥 We now support exporting 💥`FP8 quantized(E4M3, E5M2)` models from 🚀`LLMC` to advanced inference backends such as [VLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang). For detailed usage, please refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html) and [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html). +
+Previous News + - **Sep 24, 2024:** 🔥 We have officially released ✅INT4 and ✅INT8 models of ✨`Llama-3.1-405B`, quantized using 🚀`LLMC` in `save_lightllm` mode. You can download the model parameters [here](https://huggingface.co/Dongz/llama31-405b-quant). - **Sep 23, 2024:** 🔥 We now support exporting ✨`real quantized(INT4, INT8)` models from 🚀`LLMC` to advanced inference backends such as [VLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), and [MLC-LLM](https://github.com/mlc-ai/mlc-llm) for quantized inference deployment, enabling ✨`reduced memory usage` and ✨`faster inference speeds`. For detailed usage, please refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html), [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html), [AutoAWQ documentation](https://llmc-en.readthedocs.io/en/latest/backend/autoawq.html), and [MLC-LLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/mlcllm.html). -- **Sep 9, 2024:** 🔥 We provide some configs of our best practice towards superior performance (see Best Practice [here](https://llmc-en.readthedocs.io/en/latest/)). +- **Sep 09, 2024:** 🔥 We provide some configs of our best practice towards superior performance (see Best Practice [here](https://llmc-en.readthedocs.io/en/latest/)). -* **Sep 3, 2024:** 🔥 We support [opencompass](https://github.com/open-compass/opencompass) 🤗 to eval 🚀`LLMC` model. Follow this [doc](https://llmc-en.readthedocs.io/en/latest/advanced/model_test_v2.html) and have a try! +* **Sep 03, 2024:** 🔥 We support [opencompass](https://github.com/open-compass/opencompass) 🤗 to eval 🚀`LLMC` model. Follow this [doc](https://llmc-en.readthedocs.io/en/latest/advanced/model_test_v2.html) and have a try! * **Aug 22, 2024:** 🔥We support lots of small language models, including current SOTA [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)(see [Supported Model List](#supported-model-list)). @@ -70,9 +66,6 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates (\* denotes equal contribution, 📧 denotes corresponding author.) -
-Previous News - - **Jul 16, 2024:** 🔥We support Wanda/Naive(Magnitude) for llm sparsification and layer-wise mix bits quantization now! - **Jul 14, 2024:** 🔥We support rotation based quantization QuaRot now! @@ -95,11 +88,11 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates on the calibration data, algorithm pipeline, and quantization configuration selection. Based on the takeaways, a best practice for the LLM PTQ pipeline is designed, to achieve the best accuracy and efficiency performance balance under various scenarios. -- **Mar 7, 2024:** 🚀 We release the quantization part of a powerful and efficient LLM compression tool. Notably, our benchmark paper is coming soon😊. +- **Mar 07, 2024:** 🚀 We release the quantization part of a powerful and efficient LLM compression tool. Notably, our benchmark paper is coming soon😊.
-## Highlight Feature +## 🚀 Highlight Feature - 💥**Comprehensive Algorithm Support**: Provides a broad range of ✨`SOTA compression algorithms`, including ✅quantization, ✅mixed-precision quantization, and ✅sparsity, while maintaining accuracy consistent with the original repositories. ✨`Quantization best practices` (see 🚀`Best Practices` [here](https://llmc-en.readthedocs.io/en/latest/)) are also available to ensure optimal performance and efficiency. @@ -111,175 +104,131 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates - 💥**Performance Efficiency**: Enables quantization of large LLMs, such as ✨`Llama3.1-405B` and ✨`DeepSeek-R1-671B`, with PPL evaluation on a `single A100/H100/H800 GPU`. -## Usage +## ⚙️ Usage Please refer to the 🚀`Quick Start` section in the [documentation](https://llmc-en.readthedocs.io/en/latest/). -## Supported Model List - -✅ [BLOOM](https://huggingface.co/bigscience/bloom) - -✅ [LLaMA](https://github.com/facebookresearch/llama) - -✅ [LLaMA V2](https://huggingface.co/meta-llama) - -✅ [StarCoder](https://github.com/bigcode-project/starcoder) - -✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt) - -✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon) - -✅ [InternLM2](https://huggingface.co/internlm) - -✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral) - -✅ [LLaMA V3](https://huggingface.co/meta-llama) - -✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral) - -✅ [Qwen V2](https://github.com/QwenLM/Qwen2) - -✅ [LLaVA](https://github.com/haotian-liu/LLaVA) - -✅ [InternLM2.5](https://huggingface.co/internlm) - -✅ [StableLM](https://github.com/Stability-AI/StableLM) - -✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2) - -✅ [Phi2](https://huggingface.co/microsoft/phi-2) - -✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5) - -✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM) - -✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966) - -✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5) - -✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision) +## :robot: Supported Model List + +- ✅ [BLOOM](https://huggingface.co/bigscience/bloom) +- ✅ [LLaMA](https://github.com/facebookresearch/llama) +- ✅ [LLaMA V2](https://huggingface.co/meta-llama) +- ✅ [StarCoder](https://github.com/bigcode-project/starcoder) +- ✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt) + +
+More Supported Models  + +- ✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon) +- ✅ [InternLM2](https://huggingface.co/internlm) +- ✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral) +- ✅ [LLaMA V3](https://huggingface.co/meta-llama) +- ✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral) +- ✅ [Qwen V2](https://github.com/QwenLM/Qwen2) +- ✅ [LLaVA](https://github.com/haotian-liu/LLaVA) +- ✅ [InternLM2.5](https://huggingface.co/internlm) +- ✅ [StableLM](https://github.com/Stability-AI/StableLM) +- ✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2) +- ✅ [Phi2](https://huggingface.co/microsoft/phi-2) +- ✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5) +- ✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM) +- ✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966) +- ✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5) +- ✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision) +- ✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) +- ✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) +- ✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B) -✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) - -✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) - -✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B) +
You can add your own model type referring to files under `llmc/models/*.py`. -## Supported Backend List - -✅ [VLLM](https://github.com/vllm-project/vllm) +## :bus: Supported Backend List -✅ [LightLLM](https://github.com/ModelTC/lightllm) +- ✅ [VLLM](https://github.com/vllm-project/vllm) +- ✅ [LightLLM](https://github.com/ModelTC/lightllm) +- ✅ [Sglang](https://github.com/sgl-project/sglang) +- ✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm) +- ✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) -✅ [Sglang](https://github.com/sgl-project/sglang) - -✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm) - -✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - -## Supported Algorithm List +## 💡 Supported Algorithm List ### Quantization -✅ Naive - -✅ [AWQ](https://arxiv.org/abs/2306.00978) - -✅ [GPTQ](https://arxiv.org/abs/2210.17323) - -✅ [SmoothQuant](https://arxiv.org/abs/2211.10438) - -✅ [OS+](https://arxiv.org/abs/2304.09145) - -✅ [OmniQuant](https://arxiv.org/abs/2308.13137) - -✅ [NormTweaking](https://arxiv.org/abs/2309.02784) - -✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf) - -✅ [QUIK](https://arxiv.org/abs/2310.09259) +- ✅ Naive +- ✅ [AWQ](https://arxiv.org/abs/2306.00978) +- ✅ [GPTQ](https://arxiv.org/abs/2210.17323) +- ✅ [SmoothQuant](https://arxiv.org/abs/2211.10438) +- ✅ [OS+](https://arxiv.org/abs/2304.09145) + +
+More Supported Algorithms  + +- ✅ [OmniQuant](https://arxiv.org/abs/2308.13137) +- ✅ [NormTweaking](https://arxiv.org/abs/2309.02784) +- ✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf) +- ✅ [QUIK](https://arxiv.org/abs/2310.09259) +- ✅ [SpQR](https://arxiv.org/abs/2306.03078) +- ✅ [DGQ](https://arxiv.org/abs/2310.04836) +- ✅ [OWQ](https://arxiv.org/abs/2306.02272) +- ✅ [LLM.int8()](https://arxiv.org/abs/2208.07339) +- ✅ [HQQ](https://mobiusml.github.io/hqq_blog/) +- ✅ [QuaRot](https://arxiv.org/abs/2404.00456) +- ✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([See this branch](https://github.com/ModelTC/llmc/tree/dev_spinquant))** +- ✅ [TesseraQ](https://arxiv.org/abs/2410.19103) -✅ [SpQR](https://arxiv.org/abs/2306.03078) - -✅ [DGQ](https://arxiv.org/abs/2310.04836) - -✅ [OWQ](https://arxiv.org/abs/2306.02272) - -✅ [LLM.int8()](https://arxiv.org/abs/2208.07339) - -✅ [HQQ](https://mobiusml.github.io/hqq_blog/) - -✅ [QuaRot](https://arxiv.org/abs/2404.00456) - -✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([See this branch](https://github.com/ModelTC/llmc/tree/dev_spinquant))** - -✅ [TesseraQ](https://arxiv.org/abs/2410.19103) +
### Pruning -✅ Naive(Magnitude) +- ✅ Naive(Magnitude) +- ✅ [Wanda](https://arxiv.org/abs/2306.11695) +- ✅ [ShortGPT](https://arxiv.org/abs/2403.03853) -✅ [Wanda](https://arxiv.org/abs/2306.11695) +## 🤝 Acknowledgments -✅ [ShortGPT](https://arxiv.org/abs/2403.03853) +We develop our code referring to the following repos: -## Acknowledgments +- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq) +- [mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant) +- [OpenGVLab/OmniQuant](https://github.com/OpenGVLab/OmniQuant) +- [IST-DASLab/gptq](https://github.com/IST-DASLab/gptq) +- [ModelTC/Outlier_Suppression_Plus](https://github.com/ModelTC/Outlier_Suppression_Plus) + +
+More Related Implementations  + +- [IST-DASLab/QUIK](https://github.com/IST-DASLab/QUIK) +- [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) +- [ilur98/DGQ](https://github.com/ilur98/DGQ) +- [xvyaward/owq](https://github.com/xvyaward/owq) +- [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes) +- [mobiusml/hqq](https://github.com/mobiusml/hqq) +- [spcl/QuaRot](https://github.com/spcl/QuaRot) +- [locuslab/wanda](https://github.com/locuslab/wanda) +- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) +- [facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant) +- [Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ) -We develop our code referring to the following repos: +
-- https://github.com/mit-han-lab/llm-awq -- https://github.com/mit-han-lab/smoothquant -- https://github.com/OpenGVLab/OmniQuant -- https://github.com/IST-DASLab/gptq -- https://github.com/ModelTC/Outlier_Suppression_Plus -- https://github.com/IST-DASLab/QUIK -- https://github.com/Vahe1994/SpQR -- https://github.com/ilur98/DGQ -- https://github.com/xvyaward/owq -- https://github.com/TimDettmers/bitsandbytes -- https://github.com/mobiusml/hqq -- [https://github.com/spcl/QuaRot](https://github.com/spcl/QuaRot) -- [https://github.com/locuslab/wanda](https://github.com/locuslab/wanda) -- [https://github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) -- [https://github.com/facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant) -- [https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ) - -## Star History +## 🌟 Star History [![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/llmc&type=Timeline)](https://star-history.com/#ModelTC/llmc&Timeline) -## Citation +## ✏️ Citation -If you find our LLM-QBench paper/llmc toolkit useful or relevant to your research, please kindly cite our paper: +If you find our toolkit or research paper useful or relevant to your research, please kindly cite our work: ``` -@misc{llmc, - author = {llmc contributors}, - title = {llmc: Towards Accurate and Efficient LLM Compression}, - year = {2024}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/ModelTC/llmc}}, -} - -@misc{gong2024llmqbench, - title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models}, - author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao}, - year={2024}, - eprint={2405.06001}, - archivePrefix={arXiv}, - primaryClass={cs.LG} -} - -@misc{gong2024llmcbenchmarkinglargelanguage, - title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit}, - author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao}, - year={2024}, - eprint={2405.06001}, - archivePrefix={arXiv}, - primaryClass={cs.LG}, - url={https://arxiv.org/abs/2405.06001}, +@inproceedings{DBLP:conf/emnlp/GongYGHLZT024, + author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chengtao Lv and Yunchen Zhang and Dacheng Tao and Xianglong Liu}, + title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit}, + year={2024}, + cdate={1704067200000}, + pages={132-152}, + url={https://aclanthology.org/2024.emnlp-industry.12}, + booktitle={EMNLP (Industry Track)}, + crossref={conf/emnlp/2024i} } ``` diff --git a/README_ja.md b/README_ja.md index 6dead79f1..064079c58 100644 --- a/README_ja.md +++ b/README_ja.md @@ -1,8 +1,7 @@ -# LLMC: 正確で効率的なLLM圧縮に向けて +
+<div align="center">
+
+<h1>LLMC: 正確で効率的な LLM 圧縮に向けて</h1>
+
+llmc [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![arXiv](https://img.shields.io/badge/LLMC-2405.06001-b31b1b)](https://arxiv.org/abs/2405.06001) @@ -11,7 +10,7 @@ [![Discord Banner](https://img.shields.io/discord/1139835312592392214?logo=discord&logoColor=white)](https://discord.com/invite/NfJzbkK3jY) [![QQ](https://img.shields.io/badge/QQ-EB1923?logo=tencent-qq&logoColor=white)](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592) [![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://llmc-en.readthedocs.io/en/latest/) -[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/) +[![Doc](https://img.shields.io/badge/ドキュメント-日本語-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/)  **\[ [English](README.md) | [中文](README_zh.md) | 日本語 \]** @@ -20,24 +19,18 @@ **LLMC** は、大規模言語モデル(LLM)の圧縮を目的とした、最新の圧縮アルゴリズムを活用して、パフォーマンスを損なうことなく効率を向上させ、モデルサイズを削減するためのツールです。以下のコマンドを使用して、llmcを実行できるDockerイメージをダウンロードできます。中国大陸のユーザーは、阿里云Dockerを使用することを推奨します。 ```shell -# docker hub: https://hub.docker.com/r/llmcompression/llmc +# Docker Hub: https://hub.docker.com/r/llmcompression/llmc docker pull llmcompression/llmc:pure-latest -# 阿里云Docker: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag] +# Aliyun Docker: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag] docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest ``` -**コミュニティ**: - -- [Discordサーバー](https://discord.com/invite/NfJzbkK3jY) -- [Tencent QQグループ](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592) +**コミュニティ**: [Discord サーバー](https://discord.com/invite/NfJzbkK3jY)、[Tencent QQ グループ](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)。 -**ドキュメント**: +**ドキュメント**: [English](https://llmc-en.readthedocs.io/en/latest/)、[中文](https://llmc-zhcn.readthedocs.io/en/latest/)。 -- [英語](https://llmc-en.readthedocs.io/en/latest/) -- [中国語](https://llmc-zhcn.readthedocs.io/en/latest/) - -## 最新情報 +## :fire: 最新ニュース - **2025年5月12日:** 🔥 **`Wan2.1`** シリーズのビデオ生成モデルの量子化を完全にサポートし、実際に量子化された **INT8/FP8** 重みのエクスポートにも対応しました。これらは [lightx2v](https://github.com/ModelTC/lightx2v) 推論フレームワークと互換性があります。詳細は [lightx2v ドキュメント](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html) をご参照ください。 @@ -49,6 +42,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates - **2024年9月26日:** 🔥 `LLMC`からの✨ `FP8量子化(E4M3、E5M2)`モデルを、VLLMやSGLangのような高度な推理バックエンドにエクスポートできるようになりました。🚀 詳細な使用方法については、[VLLMのドキュメント](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html)と[SGLangのドキュメント](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html)を参照してください。 +
+以前のニュース + - **2024年9月24日:** 🔥 私たちは正式に ✨`Llama-3.1-405B` の ✅INT4 と ✅INT8 モデルをリリースしました。これらは 🚀`LLMC` の `save_lightllm` モードを使用して量子化されています。モデルパラメータは[こちら](https://huggingface.co/Dongz/llama31-405b-quant)からダウンロードできます。 - **2024年9月23日:** 🔥 私たちは、🚀`LLMC` から ✨`実際の量子化された(INT4, INT8)` モデルを、 [VLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [MLC-LLM](https://github.com/mlc-ai/mlc-llm) などの高度な推論バックエンドにエクスポートするサポートを追加しました。これにより、✨`メモリ使用量の削減` と ✨`推論速度の向上` が可能になります。 @@ -70,9 +66,6 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates (\*は同等の貢献を示し、📧は対応する著者を示します。) -
-過去のニュース - - **2024年7月16日:** 🔥私たちはLLMの疎化のためのWanda/Naive(マグニチュード)および層ごとの混合ビット量子化のサポートを追加しました! - **2024年7月14日:** 🔥私たちは回転ベースの量子化 QuaRot のサポートを追加しました! @@ -97,7 +90,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
-## 主要機能 +## 🚀 特徴 - 💥**包括的なアルゴリズムサポート**: 広範な ✨`SOTA圧縮アルゴリズム` をサポートし、✅量子化、✅混合精度量子化、✅疎性を含み、元のリポジトリと同じ精度を維持します。✨`量子化ベストプラクティス`(ベストプラクティスは[こちら](https://llmc-en.readthedocs.io/en/latest/)をご覧ください)も提供されており、最適なパフォーマンスと効率を確保します。 @@ -109,175 +102,129 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates - 💥**パフォーマンス効率**: ✨`Llama3.1-405B` や ✨`DeepSeek-R1-671B` などの大規模LLMの量子化をサポートし、`単一の A100/H100/H800 GPU` でPPL評価を可能にします。 -## 使用方法 +## ⚙️ 使い方 使用ガイドは 🚀`Quick Start`セクション[こちら](https://llmc-en.readthedocs.io/en/latest/)をご覧ください。 -## サポートされているモデルリスト - -✅ [BLOOM](https://huggingface.co/bigscience/bloom) - -✅ [LLaMA](https://github.com/facebookresearch/llama) - -✅ [LLaMA V2](https://huggingface.co/meta-llama) - -✅ [StarCoder](https://github.com/bigcode-project/starcoder) - -✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt) - -✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon) - -✅ [InternLM2](https://huggingface.co/internlm) - -✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral) - -✅ [LLaMA V3](https://huggingface.co/meta-llama) - -✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral) - -✅ [Qwen V2](https://github.com/QwenLM/Qwen2) - -✅ [LLaVA](https://github.com/haotian-liu/LLaVA) - -✅ [InternLM2.5](https://huggingface.co/internlm) - -✅ [StableLM](https://github.com/Stability-AI/StableLM) - -✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2) - -✅ [Phi2](https://huggingface.co/microsoft/phi-2) - -✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5) - -✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM) - -✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966) - -✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5) - -✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision) +## :robot: 対応モデル + +- ✅ [BLOOM](https://huggingface.co/bigscience/bloom) +- ✅ [LLaMA](https://github.com/facebookresearch/llama) +- ✅ [LLaMA V2](https://huggingface.co/meta-llama) +- ✅ [StarCoder](https://github.com/bigcode-project/starcoder) +- ✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt) + +
+その他のモデル + +- ✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon) +- ✅ [InternLM2](https://huggingface.co/internlm) +- ✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral) +- ✅ [LLaMA V3](https://huggingface.co/meta-llama) +- ✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral) +- ✅ [Qwen V2](https://github.com/QwenLM/Qwen2) +- ✅ [LLaVA](https://github.com/haotian-liu/LLaVA) +- ✅ [InternLM2.5](https://huggingface.co/internlm) +- ✅ [StableLM](https://github.com/Stability-AI/StableLM) +- ✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2) +- ✅ [Phi2](https://huggingface.co/microsoft/phi-2) +- ✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5) +- ✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM) +- ✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966) +- ✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5) +- ✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision) +- ✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) +- ✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) +- ✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B) -✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) - -✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) - -✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B) - -独自のモデルタイプを追加するには、`llmc/models/*.py` ファイルを参照してください。 - -## サポートされているバックエンドリスト - -✅ [VLLM](https://github.com/vllm-project/vllm) - -✅ [LightLLM](https://github.com/ModelTC/lightllm) +
-✅ [Sglang](https://github.com/sgl-project/sglang) +独自モデルを追加する場合は `llmc/models/*.py` を参照してください。 -✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm) +## :bus: 対応バックエンド -✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) +- ✅ [VLLM](https://github.com/vllm-project/vllm) +- ✅ [LightLLM](https://github.com/ModelTC/lightllm) +- ✅ [Sglang](https://github.com/sgl-project/sglang) +- ✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm) +- ✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) -## サポートされているアルゴリズムリスト +## 💡 対応アルゴリズム ### 量子化 -✅ Naive - -✅ [AWQ](https://arxiv.org/abs/2306.00978) - -✅ [GPTQ](https://arxiv.org/abs/2210.17323) - -✅ [SmoothQuant](https://arxiv.org/abs/2211.10438) - -✅ [OS+](https://arxiv.org/abs/2304.09145) - -✅ [OmniQuant](https://arxiv.org/abs/2308.13137) - -✅ [NormTweaking](https://arxiv.org/abs/2309.02784) - -✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf) +- ✅ Naive +- ✅ [AWQ](https://arxiv.org/abs/2306.00978) +- ✅ [GPTQ](https://arxiv.org/abs/2210.17323) +- ✅ [SmoothQuant](https://arxiv.org/abs/2211.10438) +- ✅ [OS+](https://arxiv.org/abs/2304.09145) + +
+その他のアルゴリズム + +- ✅ [OmniQuant](https://arxiv.org/abs/2308.13137) +- ✅ [NormTweaking](https://arxiv.org/abs/2309.02784) +- ✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf) +- ✅ [QUIK](https://arxiv.org/abs/2310.09259) +- ✅ [SpQR](https://arxiv.org/abs/2306.03078) +- ✅ [DGQ](https://arxiv.org/abs/2310.04836) +- ✅ [OWQ](https://arxiv.org/abs/2306.02272) +- ✅ [LLM.int8()](https://arxiv.org/abs/2208.07339) +- ✅ [HQQ](https://mobiusml.github.io/hqq_blog/) +- ✅ [QuaRot](https://arxiv.org/abs/2404.00456) +- ✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([このブランチを参照してください](https://github.com/ModelTC/llmc/tree/dev_spinquant))** +- ✅ [TesseraQ](https://arxiv.org/abs/2410.19103) -✅ [QUIK](https://arxiv.org/abs/2310.09259) - -✅ [SpQR](https://arxiv.org/abs/2306.03078) - -✅ [DGQ](https://arxiv.org/abs/2310.04836) - -✅ [OWQ](https://arxiv.org/abs/2306.02272) - -✅ [LLM.int8()](https://arxiv.org/abs/2208.07339) - -✅ [HQQ](https://mobiusml.github.io/hqq_blog/) - -✅ [QuaRot](https://arxiv.org/abs/2404.00456) - -✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([このブランチを参照してください](https://github.com/ModelTC/llmc/tree/dev_spinquant))** +
-✅ [TesseraQ](https://arxiv.org/abs/2410.19103) +### プルーニング -### プルーニング(剪定) +- ✅ Naive(Magnitude) +- ✅ [Wanda](https://arxiv.org/abs/2306.11695) +- ✅ [ShortGPT](https://arxiv.org/abs/2403.03853) -✅ Naive(マグニチュード) +## 🤝 謝辞 -✅ [Wanda](https://arxiv.org/abs/2306.11695) +本プロジェクトは以下のリポジトリを参考にしています: -✅ [ShortGPT](https://arxiv.org/abs/2403.03853) +- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq) +- [mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant) +- [OpenGVLab/OmniQuant](https://github.com/OpenGVLab/OmniQuant) +- [IST-DASLab/gptq](https://github.com/IST-DASLab/gptq) +- [ModelTC/Outlier_Suppression_Plus](https://github.com/ModelTC/Outlier_Suppression_Plus) -## 謝辞 +
+その他の実装 -以下のリポジトリを参考にしてコードを開発しました: +- [IST-DASLab/QUIK](https://github.com/IST-DASLab/QUIK) +- [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) +- [ilur98/DGQ](https://github.com/ilur98/DGQ) +- [xvyaward/owq](https://github.com/xvyaward/owq) +- [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes) +- [mobiusml/hqq](https://github.com/mobiusml/hqq) +- [spcl/QuaRot](https://github.com/spcl/QuaRot) +- [locuslab/wanda](https://github.com/locuslab/wanda) +- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) +- [facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant) +- [Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ) -- https://github.com/mit-han-lab/llm-awq -- https://github.com/mit-han-lab/smoothquant -- https://github.com/OpenGVLab/OmniQuant -- https://github.com/IST-DASLab/gptq -- https://github.com/ModelTC/Outlier_Suppression_Plus -- https://github.com/IST-DASLab/QUIK -- https://github.com/Vahe1994/SpQR -- https://github.com/ilur98/DGQ -- https://github.com/xvyaward/owq -- https://github.com/TimDettmers/bitsandbytes -- https://github.com/mobiusml/hqq -- [https://github.com/spcl/QuaRot](https://github.com/spcl/QuaRot) -- [https://github.com/locuslab/wanda](https://github.com/locuslab/wanda) -- [https://github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) -- [https://github.com/facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant) -- [https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ) +
-## スター履歴 +## 🌟 Star 履歴 -[![スター履歴チャート](https://api.star-history.com/svg?repos=ModelTC/llmc&type=Timeline)](https://star-history.com/#ModelTC/llmc&Timeline) +[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/llmc&type=Timeline)](https://star-history.com/#ModelTC/llmc&Timeline) -## 引用 +## ✏️ 引用 -LLM-QBench論文/llmcツールキットが研究に役立つまたは関連している場合は、論文を引用してください: +本ツールキットまたは論文が参考になった場合は、以下を引用してください: ``` -@misc{llmc, - author = {llmc contributors}, - title = {llmc: Towards Accurate and Efficient LLM Compression}, - year = {2024}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/ModelTC/llmc}}, -} - -@misc{gong2024llmqbench, - title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models}, - author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao}, - year={2024}, - eprint={2405.06001}, - archivePrefix={arXiv}, - primaryClass={cs.LG} -} - -@misc{gong2024llmcbenchmarkinglargelanguage, - title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit}, - author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao}, - year={2024}, - eprint={2405.06001}, - archivePrefix={arXiv}, - primaryClass={cs.LG}, - url={https://arxiv.org/abs/2405.06001}, +@inproceedings{DBLP:conf/emnlp/GongYGHLZT024, + author = {Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chengtao Lv and Yunchen Zhang and Dacheng Tao and Xianglong Liu}, + title = {LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit}, + booktitle = {EMNLP (Industry Track)}, + year = {2024}, + pages = {132--152}, + url = {https://aclanthology.org/2024.emnlp-industry.12} } ``` diff --git a/README_zh.md b/README_zh.md index ae2b3e5f6..9699fe275 100644 --- a/README_zh.md +++ b/README_zh.md @@ -1,8 +1,7 @@ -# LLMC: 准确高效的LLM压缩工具 +
+<div align="center">
+
+<h1>LLMC：迈向准确且高效的大语言模型压缩</h1>
+
+llmc [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![arXiv](https://img.shields.io/badge/LLMC-2405.06001-b31b1b)](https://arxiv.org/abs/2405.06001) @@ -11,7 +10,7 @@ [![Discord Banner](https://img.shields.io/discord/1139835312592392214?logo=discord&logoColor=white)](https://discord.com/invite/NfJzbkK3jY) [![QQ](https://img.shields.io/badge/QQ-EB1923?logo=tencent-qq&logoColor=white)](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592) [![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://llmc-en.readthedocs.io/en/latest/) -[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/) +[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/)  **\[ [English](README.md) | 中文 | [日本語](README_ja.md) \]** @@ -20,24 +19,18 @@ **LLMC** 是一个开箱即用的工具,专为压缩LLM设计,利用最先进的压缩算法提高效率并减少模型体积,同时不影响预测精度。你可以通过以下命令下载可以运行llmc的docker镜像,中国大陆用户推荐使用阿里云docker。 ```shell -# docker hub: https://hub.docker.com/r/llmcompression/llmc +# Docker Hub: https://hub.docker.com/r/llmcompression/llmc docker pull llmcompression/llmc:pure-latest -# 阿里云docker: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag] +# 阿里云镜像: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag] docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest ``` -**社区**: - -- [Discord 服务器](https://discord.com/invite/NfJzbkK3jY) -- [腾讯QQ群](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592) +**社区**: [Discord 服务器](https://discord.com/invite/NfJzbkK3jY)、[腾讯 QQ 群](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)。 -**文档**: +**文档**: [English](https://llmc-en.readthedocs.io/en/latest/)、[中文](https://llmc-zhcn.readthedocs.io/en/latest/)。 -- [英文](https://llmc-en.readthedocs.io/en/latest/) -- [中文](https://llmc-zhcn.readthedocs.io/en/latest/) - -## 最新消息 +## :fire: 最新动态 - **2025年5月12日:** 🔥 我们现已全面支持 **`Wan2.1`** 系列视频生成模型的量化,并支持导出真实量化的 **INT8/FP8** 权重,兼容 [lightx2v](https://github.com/ModelTC/lightx2v) 推理框架。详情请参考 [lightx2v 使用文档](https://llmc-zhcn.readthedocs.io/en/latest/backend/lightx2v.html)。 @@ -49,6 +42,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates - **2024年9月26日:** 🔥 我们现在支持从🚀 `LLMC`导出💥 `FP8 量化(E4M3,E5M2)`模型到一些先进的推理后端,例如[VLLM](https://github.com/vllm-project/vllm)和[SGLang](https://github.com/sgl-project/sglang)。关于详细使用方法,请参阅[VLLM文档](https://llmc-zhcn.readthedocs.io/en/latest/backend/vllm.html)和[SGLang文档](https://llmc-zhcn.readthedocs.io/en/latest/backend/sglang.html)。 +
+更早动态 + - **2024年9月24日:** 🔥 我们正式发布了 ✨`Llama-3.1-405B` 的 ✅INT4 和 ✅INT8 模型,这些模型通过 🚀`LLMC` 使用 `save_lightllm` 模式进行量化。你可以在[此处](https://huggingface.co/Dongz/llama31-405b-quant)下载模型参数。 - **2024年9月23日:** 🔥 我们现在支持从 🚀`LLMC` 导出 ✨`真正量化的(INT4, INT8)` 模型到先进推理后端,例如 [VLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), 和 [MLC-LLM](https://github.com/mlc-ai/mlc-llm) 用于量化推理部署,从而实现 ✨`减少内存使用` 和 ✨`加快推理速度`。 @@ -70,9 +66,6 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates (\* 表示同等贡献,📧 表示通讯作者。) -
-历史消息 - - **2024年7月16日:** 🔥我们现在支持 Wanda/Naive(幅度)进行 LLM 稀疏化和逐层混合比特量化! - **2024年7月14日:** 🔥我们现在支持基于旋转的量化 QuaRot! @@ -97,7 +90,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
-## 亮点功能 +## 🚀 亮点功能 - 💥**综合算法支持**: 提供广泛的 ✨`SOTA压缩算法` 支持,包括 ✅量化、✅混合精度量化 和 ✅稀疏化,同时保持与原始仓库一致的精度。我们还提供 ✨`量化最佳实践`(参见✨`最佳实践` 章节[此处](https://llmc-zhcn.readthedocs.io/en/latest/)),确保最佳性能和效率。 @@ -109,177 +102,129 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates - 💥**性能效率**: 支持大规模LLM的量化,例如 ✨`Llama3.1-405B` 和 ✨`DeepSeek-R1-671B`,并可在 `单个 A100/H100/H800 GPU` 上评估 PPL。 -## 使用指南 +## ⚙️ 快速上手 请参阅 🚀`快速入门`章节[此处](https://llmc-zhcn.readthedocs.io/en/latest/)。 -## 支持的模型列表 - -✅ [BLOOM](https://huggingface.co/bigscience/bloom) - -✅ [LLaMA](https://github.com/facebookresearch/llama) - -✅ [LLaMA V2](https://huggingface.co/meta-llama) - -✅ [StarCoder](https://github.com/bigcode-project/starcoder) - -✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt) - -✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon) - -✅ [InternLM2](https://huggingface.co/internlm) - -✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral) - -✅ [LLaMA V3](https://huggingface.co/meta-llama) - -✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral) - -✅ [Qwen V2](https://github.com/QwenLM/Qwen2) - -✅ [LLaVA](https://github.com/haotian-liu/LLaVA) - -✅ [InternLM2.5](https://huggingface.co/internlm) - -✅ [StableLM](https://github.com/Stability-AI/StableLM) - -✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2) - -✅ [Phi2](https://huggingface.co/microsoft/phi-2) - -✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5) - -✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM) - -✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966) - -✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5) - -✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision) - -✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) +## :robot: 支持的模型 + +- ✅ [BLOOM](https://huggingface.co/bigscience/bloom) +- ✅ [LLaMA](https://github.com/facebookresearch/llama) +- ✅ [LLaMA V2](https://huggingface.co/meta-llama) +- ✅ [StarCoder](https://github.com/bigcode-project/starcoder) +- ✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt) + +
+更多模型 + +- ✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon) +- ✅ [InternLM2](https://huggingface.co/internlm) +- ✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral) +- ✅ [LLaMA V3](https://huggingface.co/meta-llama) +- ✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral) +- ✅ [Qwen V2](https://github.com/QwenLM/Qwen2) +- ✅ [LLaVA](https://github.com/haotian-liu/LLaVA) +- ✅ [InternLM2.5](https://huggingface.co/internlm) +- ✅ [StableLM](https://github.com/Stability-AI/StableLM) +- ✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2) +- ✅ [Phi2](https://huggingface.co/microsoft/phi-2) +- ✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5) +- ✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM) +- ✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966) +- ✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5) +- ✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision) +- ✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) +- ✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) +- ✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B) -✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) - -✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B) - -你可以参考 `llmc/models/*.py` 文件添加自己的模型类型。 - -## 支持的后端列表 - -✅ [VLLM](https://github.com/vllm-project/vllm) - -✅ [LightLLM](https://github.com/ModelTC/lightllm) +
-✅ [Sglang](https://github.com/sgl-project/sglang) +您可参考 `llmc/models/*.py` 添加自定义模型。 -✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm) +## :bus: 支持的后端 -✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) +- ✅ [VLLM](https://github.com/vllm-project/vllm) +- ✅ [LightLLM](https://github.com/ModelTC/lightllm) +- ✅ [Sglang](https://github.com/sgl-project/sglang) +- ✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm) +- ✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) -## 支持的算法列表 +## 💡 支持的算法 ### 量化 -✅ Naive - -✅ [AWQ](https://arxiv.org/abs/2306.00978) - -✅ [GPTQ](https://arxiv.org/abs/2210.17323) - -✅ [SmoothQuant](https://arxiv.org/abs/2211.10438) - -✅ [OS+](https://arxiv.org/abs/2304.09145) - -✅ [OmniQuant](https://arxiv.org/abs/2308.13137) - -✅ [NormTweaking](https://arxiv.org/abs/2309.02784) - -✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf) - -✅ [QUIK](https://arxiv.org/abs/2310.09259) - -✅ [SpQR](https://arxiv.org/abs/2306.03078) +- ✅ Naive +- ✅ [AWQ](https://arxiv.org/abs/2306.00978) +- ✅ [GPTQ](https://arxiv.org/abs/2210.17323) +- ✅ [SmoothQuant](https://arxiv.org/abs/2211.10438) +- ✅ [OS+](https://arxiv.org/abs/2304.09145) + +
+更多算法 + +- ✅ [OmniQuant](https://arxiv.org/abs/2308.13137) +- ✅ [NormTweaking](https://arxiv.org/abs/2309.02784) +- ✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf) +- ✅ [QUIK](https://arxiv.org/abs/2310.09259) +- ✅ [SpQR](https://arxiv.org/abs/2306.03078) +- ✅ [DGQ](https://arxiv.org/abs/2310.04836) +- ✅ [OWQ](https://arxiv.org/abs/2306.02272) +- ✅ [LLM.int8()](https://arxiv.org/abs/2208.07339) +- ✅ [HQQ](https://mobiusml.github.io/hqq_blog/) +- ✅ [QuaRot](https://arxiv.org/abs/2404.00456) +- ✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([见此分支](https://github.com/ModelTC/llmc/tree/dev_spinquant))** +- ✅ [TesseraQ](https://arxiv.org/abs/2410.19103) -✅ [DGQ](https://arxiv.org/abs/2310.04836) - -✅ [OWQ](https://arxiv.org/abs/2306.02272) - -✅ [LLM.int8()](https://arxiv.org/abs/2208.07339) - -✅ [HQQ](https://mobiusml.github.io/hqq_blog/) - -✅ [QuaRot](https://arxiv.org/abs/2404.00456) - -✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([见此分支](https://github.com/ModelTC/llmc/tree/dev_spinquant))** - -✅ [TesseraQ](https://arxiv.org/abs/2410.19103) +
### 剪枝 -✅ Naive(Magnitude) +- ✅ Naive(Magnitude) +- ✅ [Wanda](https://arxiv.org/abs/2306.11695) +- ✅ [ShortGPT](https://arxiv.org/abs/2403.03853) -✅ [Wanda](https://arxiv.org/abs/2306.11695) +## 🤝 致谢 -✅ [ShortGPT](https://arxiv.org/abs/2403.03853) +本项目参考了以下仓库: -## 鸣谢 +- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq) +- [mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant) +- [OpenGVLab/OmniQuant](https://github.com/OpenGVLab/OmniQuant) +- [IST-DASLab/gptq](https://github.com/IST-DASLab/gptq) +- [ModelTC/Outlier_Suppression_Plus](https://github.com/ModelTC/Outlier_Suppression_Plus) -我们的代码参考了以下仓库: +
+更多相关实现 -- https://github.com/mit-han-lab/llm-awq -- https://github.com/mit-han-lab/smoothquant -- https://github.com/OpenGVLab/OmniQuant -- https://github.com/IST-DASLab/gptq -- https://github.com/ModelTC/Outlier_Suppression_Plus -- https://github.com/IST-DASLab/QUIK -- https://github.com/Vahe1994/SpQR -- https://github.com/ilur98/DGQ -- https://github.com/xvyaward/owq -- https://github.com/TimDettmers/bitsandbytes -- https://github.com/mobiusml/hqq -- [https://github.com/spcl/QuaRot](https://github.com/spcl/QuaRot) -- [https://github.com/locuslab/wanda](https://github.com/locuslab/wanda) -- [https://github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) -- [https://github.com/facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant) -- [https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ) +- [IST-DASLab/QUIK](https://github.com/IST-DASLab/QUIK) +- [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) +- [ilur98/DGQ](https://github.com/ilur98/DGQ) +- [xvyaward/owq](https://github.com/xvyaward/owq) +- [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes) +- [mobiusml/hqq](https://github.com/mobiusml/hqq) +- [spcl/QuaRot](https://github.com/spcl/QuaRot) +- [locuslab/wanda](https://github.com/locuslab/wanda) +- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) +- [facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant) +- [Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ) -## Star 历史 +
-[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/llmc&type=Timeline)](https://star-history.com/#ModelTC/llmc&Timeline) +## 🌟 Star 历史 -## 引用 +[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/llmc&type=Timeline)](https://star-history.com/#ModelTC/llmc&Timeline) -## 引用 +## ✏️ 引用 -如果您认为我们的 LLM-QBench 论文/llmc 工具对您的研究有用或相关,请务必引用我们的论文: +如果您觉得本工具包或相关论文对您的研究有帮助,请引用: ``` -@misc{llmc, - author = {llmc contributors}, - title = {llmc: Towards Accurate and Efficient LLM Compression}, - year = {2024}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/ModelTC/llmc}}, -} - -@misc{gong2024llmqbench, - title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models}, - author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao}, - year={2024}, - eprint={2405.06001}, - archivePrefix={arXiv}, - primaryClass={cs.LG} -} - -@misc{gong2024llmcbenchmarkinglargelanguage, - title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit}, - author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao}, - year={2024}, - eprint={2405.06001}, - archivePrefix={arXiv}, - primaryClass={cs.LG}, - url={https://arxiv.org/abs/2405.06001}, +@inproceedings{DBLP:conf/emnlp/GongYGHLZT024, + author = {Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chengtao Lv and Yunchen Zhang and Dacheng Tao and Xianglong Liu}, + title = {LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit}, + booktitle = {EMNLP (Industry Track)}, + year = {2024}, + pages = {132--152}, + url = {https://aclanthology.org/2024.emnlp-industry.12} } ``` diff --git a/docs/en/source/conf.py b/docs/en/source/conf.py index 879d62c58..7e78c0fb8 100644 --- a/docs/en/source/conf.py +++ b/docs/en/source/conf.py @@ -1,17 +1,26 @@ # Configuration file for the Sphinx documentation builder. # -# For the full list of built-in configuration values, see the documentation: -# https://www.sphinx-doc.org/en/master/usage/configuration.html +# This file adopts the theme and basic settings used by the Lightx2v docs +# but keeps the llmc-specific information from the original configuration. +# ----------------------------------------------------------------------------- -# -- Project information ----------------------------------------------------- -# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information +import os +import sys +from typing import List + +# -- Path setup -------------------------------------------------------------- +# Add project root (two levels up) so autodoc can find the modules. 
+ROOT_DIR = os.path.abspath(os.path.join(__file__, "../../..")) +sys.path.append(ROOT_DIR) +# -- Project information ----------------------------------------------------- project = "llmc" copyright = "2024, llmc contributors" author = "ModelTC" release = "1.0.0" -github_url = f"https://github.com/ModelTC/llmc" +# GitHub repository ---------------------------------------------------------- +github_url = "https://github.com/ModelTC/llmc" html_context = { "display_github": True, @@ -20,50 +29,86 @@ "github_version": "main", "conf_py_path": "/docs/en/source/", # Path in the checkout to the docs root } -html_theme_options = { - "github_url": github_url, - "doc_items": { - "paper": "https://arxiv.org/abs/2405.06001", - "institution": "https://github.com/ModelTC", - }, - "logo": "images/logo/llmc.svg", - "logo_dark": "images/logo/llmc.svg", - "logo_icon": "images/logo/llmc.svg", -} - # -- General configuration --------------------------------------------------- -# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration extensions = [ - "myst_parser", - "sphinx.ext.autodoc", - "sphinx.ext.viewcode", "sphinx.ext.napoleon", - "sphinxcontrib.contentui", + "sphinx.ext.viewcode", + "sphinx.ext.intersphinx", + "sphinx.ext.autodoc", + "sphinx.ext.autosummary", + "myst_parser", + "sphinx_copybutton", "sphinx.ext.doctest", "sphinx.ext.mathjax", "sphinx.ext.ifconfig", - "sphinx-prompt", - "sphinxcontrib.jquery", - "sphinx.ext.autosectionlabel", "sphinx.ext.githubpages", - "sphinx.ext.intersphinx", + "sphinx.ext.autosectionlabel", "sphinxcontrib.katex", - "sphinx_copybutton", + "sphinxcontrib.contentui", ] -templates_path = ["_templates"] -exclude_patterns = [] +templates_path: List[str] = ["_templates"] +exclude_patterns: List[str] = [] language = "en" +# Exclude the prompt "$" when copying code blocks -------------------------- +copybutton_prompt_text = r"\$ " +copybutton_prompt_is_regexp = True + # -- Options for HTML output ------------------------------------------------- -# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output +html_title = project +html_theme = "sphinx_book_theme" +html_logo = "images/logo/llmc.svg" +html_static_path = ["_static"] + +# Theme options compatible with sphinx_book_theme / pydata-sphinx-theme +html_theme_options = { + "path_to_docs": "docs/en/source", + "repository_url": github_url, + "use_repository_button": True, + "logo": { + "text": "LLMC", + "image_light": "images/logo/llmc.svg", + "image_dark": "images/logo/llmc.svg", + }, + "doc_items": { + "paper": "https://arxiv.org/abs/2405.06001", + "institution": "https://github.com/ModelTC", + }, +} + +# -- Intersphinx mapping (optional) ----------------------------------------- +intersphinx_mapping = { + "python": ("https://docs.python.org/3", {}), + "sphinx": ("https://www.sphinx-doc.org/en/master", {}), +} +# -- Mock heavy external dependencies --------------------------------------- +autodoc_mock_imports = [ + "torch", + "transformers", + "sentencepiece", + "tensorizer", +] -html_theme = "trojanzoo_sphinx_theme" +# Remove base-class note in generated docs ---------------------------------- +from sphinx.ext import autodoc # noqa: E402, isort: skip -html_static_path = ["_static"] +class MockedClassDocumenter(autodoc.ClassDocumenter): + """Remove note about base class when a class is derived from object.""" + + def add_line(self, line: str, source: str, *lineno: int) -> None: + if line == " Bases: :py:class:`object`": + return + super().add_line(line, source, 
*lineno) + +autodoc.ClassDocumenter = MockedClassDocumenter + +# -- Customisation hooks ----------------------------------------------------- -source_suffix = [".rst", ".md"] +def setup(app): + """Optional Sphinx setup hooks.""" + pass diff --git a/docs/zh_cn/source/conf.py b/docs/zh_cn/source/conf.py index 9b1ae0785..f6ef270a4 100644 --- a/docs/zh_cn/source/conf.py +++ b/docs/zh_cn/source/conf.py @@ -1,69 +1,110 @@ -# Configuration file for the Sphinx documentation builder. -# -# For the full list of built-in configuration values, see the documentation: -# https://www.sphinx-doc.org/en/master/usage/configuration.html +# Configuration file for the Sphinx documentation builder (中文文档). +# ----------------------------------------------------------------------------- +# 参考 Lightx2v 样式,把原先 trojanzoo_sphinx_theme 改为 sphinx_book_theme, +# 并修正 logo 配置格式。 -# -- Project information ----------------------------------------------------- -# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information +import os +import sys +from typing import List +# -- Path setup -------------------------------------------------------------- +ROOT_DIR = os.path.abspath(os.path.join(__file__, "../../..")) +sys.path.append(ROOT_DIR) + +# -- 项目信息 --------------------------------------------------------------- project = "llmc" copyright = "2024, llmc contributors" author = "ModelTC" release = "1.0.0" -github_url = f"https://github.com/ModelTC/llmc" +# GitHub 信息 --------------------------------------------------------------- +github_url = "https://github.com/ModelTC/llmc" html_context = { "display_github": True, "github_user": author, "github_repo": "llmc", "github_version": "main", - "conf_py_path": "/docs/zh_cn/source/", # Path in the checkout to the docs root -} -html_theme_options = { - "github_url": github_url, - "doc_items": { - "paper": "https://arxiv.org/abs/2405.06001", - "institution": "https://github.com/ModelTC", - }, - "logo": "images/logo/llmc.svg", - "logo_dark": "images/logo/llmc.svg", - "logo_icon": "images/logo/llmc.svg", + "conf_py_path": "/docs/zh_cn/source/", # 文档根路径 } - -# -- General configuration --------------------------------------------------- -# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration - +# -- 通用配置 ---------------------------------------------------------------- extensions = [ - "myst_parser", - "sphinx.ext.autodoc", - "sphinx.ext.viewcode", "sphinx.ext.napoleon", - "sphinxcontrib.contentui", + "sphinx.ext.viewcode", + "sphinx.ext.intersphinx", + "sphinx.ext.autodoc", + "sphinx.ext.autosummary", + "myst_parser", + "sphinx_copybutton", "sphinx.ext.doctest", "sphinx.ext.mathjax", "sphinx.ext.ifconfig", - "sphinx-prompt", - "sphinxcontrib.jquery", - "sphinx.ext.autosectionlabel", "sphinx.ext.githubpages", - "sphinx.ext.intersphinx", + "sphinx.ext.autosectionlabel", "sphinxcontrib.katex", - "sphinx_copybutton", + "sphinxcontrib.contentui", ] -templates_path = ["_templates"] -exclude_patterns = [] +templates_path: List[str] = ["_templates"] +exclude_patterns: List[str] = [] + +language = "zh_CN" + +# 复制代码块时去除shell提示符 --------------------------------------------- +copybutton_prompt_text = r"\$ " +copybutton_prompt_is_regexp = True + +# -- HTML 输出选项 ----------------------------------------------------------- +html_title = project +html_theme = "sphinx_book_theme" +html_logo = "images/logo/llmc.svg" +html_static_path = ["_static"] + +html_theme_options = { + "path_to_docs": "docs/zh_cn/source", + "repository_url": github_url, + 
"use_repository_button": True, + "logo": { + "text": "LLMC", + "image_light": "images/logo/llmc.svg", + "image_dark": "images/logo/llmc.svg", + }, + "doc_items": { + "paper": "https://arxiv.org/abs/2405.06001", + "institution": "https://github.com/ModelTC", + }, +} + +# -- Intersphinx ------------------------------------------------------------- +intersphinx_mapping = { + "python": ("https://docs.python.org/3", {}), + "sphinx": ("https://www.sphinx-doc.org/en/master", {}), +} -language = "cn" +# -- Mock 外部依赖 ----------------------------------------------------------- +autodoc_mock_imports = [ + "torch", + "transformers", + "sentencepiece", + "tensorizer", +] -# -- Options for HTML output ------------------------------------------------- -# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output +# -- 自定义处理 ------------------------------------------------------------- +from sphinx.ext import autodoc # noqa: E402, isort: skip +class MockedClassDocumenter(autodoc.ClassDocumenter): + """移除“Bases: object”行。""" -html_theme = "trojanzoo_sphinx_theme" + def add_line(self, line: str, source: str, *lineno: int) -> None: + if line == " Bases: :py:class:`object`": + return + super().add_line(line, source, *lineno) -html_static_path = ["_static"] +autodoc.ClassDocumenter = MockedClassDocumenter + +# -- 额外钩子 --------------------------------------------------------------- -source_suffix = [".rst", ".md"] +def setup(app): + """可选的 Sphinx setup。""" + pass diff --git a/requirements/docs.txt b/requirements/docs.txt index a15a4fc07..1c8eec42f 100644 --- a/requirements/docs.txt +++ b/requirements/docs.txt @@ -1,15 +1,7 @@ -docutils -modelindex -myst-parser -sphinx -sphinx-copybutton -sphinx-design -sphinx-notfound-page -sphinx-tabs -sphinxcontrib-jquery -tabulate -sphinxcontrib.contentui --e git+https://github.com/ain-soph/trojanzoo_sphinx_theme.git#egg=trojanzoo_sphinx_theme -sphinx-prompt -sphinxcontrib-katex -sphinx-copybutton +sphinx == 6.2.1 +sphinx-book-theme == 1.0.1 +sphinx-copybutton == 0.5.2 +myst-parser == 2.0.0 +sphinx-argparse +sphinxcontrib.redoc +sphinxcontrib.openapi