update README & docs #404

Merged (1 commit) on Jul 10, 2025

README.md: 279 changes (114 additions, 165 deletions)
@@ -1,8 +1,7 @@
# LLMC: Towards Accurate and Efficient LLM Compression
<div align="center" style="font-family: charter;">
<h1> LLMC: Towards Accurate and Efficient LLM Compression </h1>

<img src="./imgs/llmc.png" alt="llmc" style="zoom:35%;" />

<div align="center">
<img src="./imgs/llmc.png" alt="llmc" width="75%" />

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![arXiv](https://img.shields.io/badge/LLMC-2405.06001-b31b1b)](https://arxiv.org/abs/2405.06001)
@@ -11,7 +10,7 @@
[![Discord Banner](https://img.shields.io/discord/1139835312592392214?logo=discord&logoColor=white)](https://discord.com/invite/NfJzbkK3jY)
[![QQ](https://img.shields.io/badge/QQ-EB1923?logo=tencent-qq&logoColor=white)](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://llmc-en.readthedocs.io/en/latest/)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://llmc-zhcn.readthedocs.io/en/latest/)&#160;

**\[ English | [中文](README_zh.md) | [日本語](README_ja.md) \]**

@@ -27,36 +26,33 @@ docker pull llmcompression/llmc:pure-latest
docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest
```

**Community**:

- [Discord Server](https://discord.com/invite/NfJzbkK3jY)
- [Tencent QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592)
**Community**: [Discord Server](https://discord.com/invite/NfJzbkK3jY), [Tencent QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=I9IGPWWj8uuRXWH3_ELWjouf6gkIMgUl&authKey=GA3WbFAsm90ePJf%2FCbc7ZyXXq4ShQktlBaLxgqS5yuSPAsr3%2BDKMRdosUiLYoilO&noverify=0&group_code=526192592).

**Docs**:
**Docs**: [English](https://llmc-en.readthedocs.io/en/latest/), [Chinese](https://llmc-zhcn.readthedocs.io/en/latest/).

- [English](https://llmc-en.readthedocs.io/en/latest/)
- [Chinese](https://llmc-zhcn.readthedocs.io/en/latest/)

## Latest News
## :fire: Latest News

- **May 12, 2025:** 🔥 We now fully support quantization for the **`Wan2.1`** series of video generation models and provide export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).

- **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.
- **Feb 07, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.

- **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.

- **Nov 12, 2024:** 🔥 We have added support for 💥`static per-tensor activation quantization` across various models and algorithms, covering ✅integer quantization and ✅floating-point quantization to further optimize performance and efficiency. Additionally, we now support exporting ✨`real quantized models` and using the [VLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang) backends for inference acceleration. For more details, refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html) and [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html).

- **Sep 26, 2024:** 🔥 We now support exporting 💥`FP8 quantized(E4M3, E5M2)` models from 🚀`LLMC` to advanced inference backends such as [VLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang). For detailed usage, please refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html) and [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html).

<details close>


The close attribute on the <details> tag is non-standard HTML. To make a <details> element collapsed by default, you should simply omit the close attribute. Using <details> is the standard-compliant way and ensures compatibility across different Markdown renderers.

Suggested change
<details close>
<details>

<summary>Previous News</summary>

- **Sep 24, 2024:** 🔥 We have officially released ✅INT4 and ✅INT8 models of ✨`Llama-3.1-405B`, quantized using 🚀`LLMC` in `save_lightllm` mode. You can download the model parameters [here](https://huggingface.co/Dongz/llama31-405b-quant).

- **Sep 23, 2024:** 🔥 We now support exporting ✨`real quantized(INT4, INT8)` models from 🚀`LLMC` to advanced inference backends such as [VLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), and [MLC-LLM](https://github.com/mlc-ai/mlc-llm) for quantized inference deployment, enabling ✨`reduced memory usage` and ✨`faster inference speeds`.
For detailed usage, please refer to the [VLLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/vllm.html), [SGLang documentation](https://llmc-en.readthedocs.io/en/latest/backend/sglang.html), [AutoAWQ documentation](https://llmc-en.readthedocs.io/en/latest/backend/autoawq.html), and [MLC-LLM documentation](https://llmc-en.readthedocs.io/en/latest/backend/mlcllm.html).

- **Sep 9, 2024:** 🔥 We provide some configs of our best practice towards superior performance (see Best Practice [here](https://llmc-en.readthedocs.io/en/latest/)).
- **Sep 09, 2024:** 🔥 We provide some configs of our best practice towards superior performance (see Best Practice [here](https://llmc-en.readthedocs.io/en/latest/)).

* **Sep 3, 2024:** 🔥 We support [opencompass](https://github.com/open-compass/opencompass) 🤗 to eval 🚀`LLMC` model. Follow this [doc](https://llmc-en.readthedocs.io/en/latest/advanced/model_test_v2.html) and have a try!
* **Sep 03, 2024:** 🔥 We support [opencompass](https://github.com/open-compass/opencompass) 🤗 to eval 🚀`LLMC` model. Follow this [doc](https://llmc-en.readthedocs.io/en/latest/advanced/model_test_v2.html) and have a try!

* **Aug 22, 2024:** 🔥We support lots of small language models, including current SOTA [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)(see [Supported Model List](#supported-model-list)).

@@ -70,9 +66,6 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

(\* denotes equal contribution, 📧 denotes corresponding author.)

<details close>
<summary>Previous News</summary>

- **Jul 16, 2024:** 🔥We support Wanda/Naive(Magnitude) for llm sparsification and layer-wise mix bits quantization now!

- **Jul 14, 2024:** 🔥We support rotation based quantization QuaRot now!
@@ -95,11 +88,11 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
on the calibration data, algorithm pipeline, and quantization configuration selection. Based on the takeaways, a best practice for the LLM PTQ pipeline is designed, to achieve the best accuracy and efficiency performance balance
under various scenarios.

- **Mar 7, 2024:** 🚀 We release the quantization part of a powerful and efficient LLM compression tool. Notably, our benchmark paper is coming soon😊.
- **Mar 07, 2024:** 🚀 We release the quantization part of a powerful and efficient LLM compression tool. Notably, our benchmark paper is coming soon😊.

</details>

## Highlight Features
## 🚀 Highlight Features

- 💥**Comprehensive Algorithm Support**: Provides a broad range of ✨`SOTA compression algorithms`, including ✅quantization, ✅mixed-precision quantization, and ✅sparsity, while maintaining accuracy consistent with the original repositories. ✨`Quantization best practices` (see 🚀`Best Practices` [here](https://llmc-en.readthedocs.io/en/latest/)) are also available to ensure optimal performance and efficiency.

@@ -111,175 +104,131 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

- 💥**Performance Efficiency**: Enables quantization of large LLMs, such as ✨`Llama3.1-405B` and ✨`DeepSeek-R1-671B`, with PPL evaluation on a `single A100/H100/H800 GPU`.

## Usage
## ⚙️ Usage

Please refer to the 🚀`Quick Start` section in the [documentation](https://llmc-en.readthedocs.io/en/latest/).
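As a rough illustration of the config-driven workflow described there, the sketch below builds a minimal weight-quantization config and writes it to YAML; the key names (`model`, `quant`, `save`, ...) and the way the file is consumed are assumptions here, so treat the Quick Start documentation as the authoritative reference for the schema and launch command.

```python
# Hypothetical sketch only: key names below are assumptions, not the official schema.
import yaml  # requires PyYAML

config = {
    "model": {
        "type": "Llama",                   # model family, as named under llmc/models/
        "path": "/path/to/Llama-2-7b-hf",  # local HuggingFace-format checkpoint
        "torch_dtype": "auto",
    },
    "quant": {
        "method": "Awq",                   # any supported algorithm (AWQ, GPTQ, SmoothQuant, ...)
        "weight": {
            "bit": 4,
            "symmetric": False,
            "granularity": "per_group",
            "group_size": 128,
        },
    },
    "save": {"save_path": "./save"},       # where the compressed model is written
}

with open("awq_w4a16.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# The resulting YAML is then passed to llmc's launcher as described in the Quick Start docs.
```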

## Supported Model List

✅ [BLOOM](https://huggingface.co/bigscience/bloom)

✅ [LLaMA](https://github.com/facebookresearch/llama)

✅ [LLaMA V2](https://huggingface.co/meta-llama)

✅ [StarCoder](https://github.com/bigcode-project/starcoder)

✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt)

✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)

✅ [InternLM2](https://huggingface.co/internlm)

✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)

✅ [LLaMA V3](https://huggingface.co/meta-llama)

✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)

✅ [Qwen V2](https://github.com/QwenLM/Qwen2)

✅ [LLaVA](https://github.com/haotian-liu/LLaVA)

✅ [InternLM2.5](https://huggingface.co/internlm)

✅ [StableLM](https://github.com/Stability-AI/StableLM)

✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)

✅ [Phi2](https://huggingface.co/microsoft/phi-2)

✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5)

✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM)

✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)

✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)

✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
## :robot: Supported Model List

- ✅ [BLOOM](https://huggingface.co/bigscience/bloom)
- ✅ [LLaMA](https://github.com/facebookresearch/llama)
- ✅ [LLaMA V2](https://huggingface.co/meta-llama)
- ✅ [StarCoder](https://github.com/bigcode-project/starcoder)
- ✅ [OPT](https://huggingface.co/docs/transformers/model_doc/opt)

<details>
<summary>More Supported Models&nbsp;</summary>

- ✅ [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)
- ✅ [InternLM2](https://huggingface.co/internlm)
- ✅ [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)
- ✅ [LLaMA V3](https://huggingface.co/meta-llama)
- ✅ [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)
- ✅ [Qwen V2](https://github.com/QwenLM/Qwen2)
- ✅ [LLaVA](https://github.com/haotian-liu/LLaVA)
- ✅ [InternLM2.5](https://huggingface.co/internlm)
- ✅ [StableLM](https://github.com/Stability-AI/StableLM)
- ✅ [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)
- ✅ [Phi2](https://huggingface.co/microsoft/phi-2)
- ✅ [Phi 1.5](https://huggingface.co/microsoft/phi-1_5)
- ✅ [MiniCPM](https://github.com/OpenBMB/MiniCPM)
- ✅ [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
- ✅ [DeepSeekv2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)
- ✅ [LLaMA V3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)
- ✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)
- ✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
- ✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)

✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)

✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)

✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)
</details>

You can add your own model type by referring to the files under `llmc/models/*.py`.
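As a minimal illustration, a new model type is usually a small subclass that tells the pipeline where its transformer blocks and embedding layers live. The sketch below is hypothetical: the base class, registry helper, and method names are assumptions, so mirror an existing file in `llmc/models/` for the real API.

```python
# Hypothetical sketch: import paths and method names are assumptions;
# copy an existing model definition in llmc/models/ for the actual interface.
from llmc.models.base_model import BaseModel             # assumed base class location
from llmc.utils.registry_factory import MODEL_REGISTRY   # assumed registry helper


@MODEL_REGISTRY
class MyCustomLLM(BaseModel):
    def find_blocks(self):
        # Point the compression pipeline at the stack of transformer blocks.
        self.blocks = self.model.model.layers

    def find_embed_layers(self):
        # Expose the embedding layers so they can be handled separately,
        # e.g. kept out of weight quantization.
        self.embed_tokens = self.model.model.embed_tokens
```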

## Supported Backend List

✅ [VLLM](https://github.com/vllm-project/vllm)
## :bus: Supported Backend List

✅ [LightLLM](https://github.com/ModelTC/lightllm)
- ✅ [VLLM](https://github.com/vllm-project/vllm)
- ✅ [LightLLM](https://github.com/ModelTC/lightllm)
- ✅ [Sglang](https://github.com/sgl-project/sglang)
- ✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
- ✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)

✅ [Sglang](https://github.com/sgl-project/sglang)

✅ [MLC-LLM](https://github.com/mlc-ai/mlc-llm)

✅ [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)

## Supported Algorithm List
## 💡 Supported Algorithm List

### Quantization

✅ Naive

✅ [AWQ](https://arxiv.org/abs/2306.00978)

✅ [GPTQ](https://arxiv.org/abs/2210.17323)

✅ [SmoothQuant](https://arxiv.org/abs/2211.10438)

✅ [OS+](https://arxiv.org/abs/2304.09145)

✅ [OmniQuant](https://arxiv.org/abs/2308.13137)

✅ [NormTweaking](https://arxiv.org/abs/2309.02784)

✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf)

✅ [QUIK](https://arxiv.org/abs/2310.09259)
- ✅ Naive
- ✅ [AWQ](https://arxiv.org/abs/2306.00978)
- ✅ [GPTQ](https://arxiv.org/abs/2210.17323)
- ✅ [SmoothQuant](https://arxiv.org/abs/2211.10438)
- ✅ [OS+](https://arxiv.org/abs/2304.09145)

<details>
<summary>More Supported Algorithms&nbsp;</summary>

- ✅ [OmniQuant](https://arxiv.org/abs/2308.13137)
- ✅ [NormTweaking](https://arxiv.org/abs/2309.02784)
- ✅ [AdaDim](https://arxiv.org/pdf/2309.15531.pdf)
- ✅ [QUIK](https://arxiv.org/abs/2310.09259)
- ✅ [SpQR](https://arxiv.org/abs/2306.03078)
- ✅ [DGQ](https://arxiv.org/abs/2310.04836)
- ✅ [OWQ](https://arxiv.org/abs/2306.02272)
- ✅ [LLM.int8()](https://arxiv.org/abs/2208.07339)
- ✅ [HQQ](https://mobiusml.github.io/hqq_blog/)
- ✅ [QuaRot](https://arxiv.org/abs/2404.00456)
- ✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([See this branch](https://github.com/ModelTC/llmc/tree/dev_spinquant))**
- ✅ [TesseraQ](https://arxiv.org/abs/2410.19103)

✅ [SpQR](https://arxiv.org/abs/2306.03078)

✅ [DGQ](https://arxiv.org/abs/2310.04836)

✅ [OWQ](https://arxiv.org/abs/2306.02272)

✅ [LLM.int8()](https://arxiv.org/abs/2208.07339)

✅ [HQQ](https://mobiusml.github.io/hqq_blog/)

✅ [QuaRot](https://arxiv.org/abs/2404.00456)

✅ [SpinQuant](https://arxiv.org/abs/2405.16406) **([See this branch](https://github.com/ModelTC/llmc/tree/dev_spinquant))**

✅ [TesseraQ](https://arxiv.org/abs/2410.19103)
</details>

### Pruning

✅ Naive(Magnitude)
- ✅ Naive(Magnitude)
- ✅ [Wanda](https://arxiv.org/abs/2306.11695)
- ✅ [ShortGPT](https://arxiv.org/abs/2403.03853)

✅ [Wanda](https://arxiv.org/abs/2306.11695)
## 🤝 Acknowledgments

✅ [ShortGPT](https://arxiv.org/abs/2403.03853)
We developed our code with reference to the following repos:

## Acknowledgments
- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
- [mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant)
- [OpenGVLab/OmniQuant](https://github.com/OpenGVLab/OmniQuant)
- [IST-DASLab/gptq](https://github.com/IST-DASLab/gptq)
- [ModelTC/Outlier_Suppression_Plus](https://github.com/ModelTC/Outlier_Suppression_Plus)

<details>
<summary>More Related Implementations&nbsp;</summary>

- [IST-DASLab/QUIK](https://github.com/IST-DASLab/QUIK)
- [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR)
- [ilur98/DGQ](https://github.com/ilur98/DGQ)
- [xvyaward/owq](https://github.com/xvyaward/owq)
- [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
- [mobiusml/hqq](https://github.com/mobiusml/hqq)
- [spcl/QuaRot](https://github.com/spcl/QuaRot)
- [locuslab/wanda](https://github.com/locuslab/wanda)
- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
- [Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)

We developed our code with reference to the following repos:
</details>

- https://github.com/mit-han-lab/llm-awq
- https://github.com/mit-han-lab/smoothquant
- https://github.com/OpenGVLab/OmniQuant
- https://github.com/IST-DASLab/gptq
- https://github.com/ModelTC/Outlier_Suppression_Plus
- https://github.com/IST-DASLab/QUIK
- https://github.com/Vahe1994/SpQR
- https://github.com/ilur98/DGQ
- https://github.com/xvyaward/owq
- https://github.com/TimDettmers/bitsandbytes
- https://github.com/mobiusml/hqq
- [https://github.com/spcl/QuaRot](https://github.com/spcl/QuaRot)
- [https://github.com/locuslab/wanda](https://github.com/locuslab/wanda)
- [https://github.com/EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [https://github.com/facebookresearch/SpinQuant](https://github.com/facebookresearch/SpinQuant)
- [https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ](https://github.com/Intelligent-Computing-Lab-Yale/TesseraQ)

## Star History
## 🌟 Star History

[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/llmc&type=Timeline)](https://star-history.com/#ModelTC/llmc&Timeline)

## Citation
## ✏️ Citation

If you find our LLM-QBench paper/llmc toolkit useful or relevant to your research, please kindly cite our paper:
If you find our toolkit or research paper useful or relevant to your research, please kindly cite our work:

```
@misc{llmc,
  author = {llmc contributors},
  title = {llmc: Towards Accurate and Efficient LLM Compression},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ModelTC/llmc}},
}

@misc{gong2024llmqbench,
  title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models},
  author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
  year={2024},
  eprint={2405.06001},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@misc{gong2024llmcbenchmarkinglargelanguage,
  title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
  author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
  year={2024},
  eprint={2405.06001},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2405.06001},
@inproceedings{DBLP:conf/emnlp/GongYGHLZT024,
  author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chengtao Lv and Yunchen Zhang and Dacheng Tao and Xianglong Liu},
  title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
  year={2024},
  cdate={1704067200000},
  pages={132-152},
  url={https://aclanthology.org/2024.emnlp-industry.12},
  booktitle={EMNLP (Industry Track)},
  crossref={conf/emnlp/2024i}
}
```