
LangBridge: Interpreting Image as a Combination of Language Embeddings

Official implementation of the paper "LangBridge: Interpreting Image as a Combination of Language Embeddings" accepted at ICCV 2025.

Paper | Project Page | Models

🔥 News

  • [2025-06] LangBridge paper accepted at ICCV 2025!
  • [2025-06] Code and models released!

📖 Abstract

We propose LangBridge, a novel adapter that explicitly maps visual tokens to linear combinations of LLM vocabulary embeddings. This design enables pretraining-free adapter transfer across different LLMs while maintaining competitive performance.

[Figure: overview of the LangBridge method]
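To make the mechanism concrete: for each visual token, the adapter produces mixture weights over (a subset of) the LLM's input-embedding table, and the token handed to the LLM is the corresponding weighted sum of vocabulary embeddings. Below is a minimal PyTorch sketch of that idea; it is not the repository's module, and the class name, softmax parameterization, and shapes are illustrative assumptions.

import torch
import torch.nn as nn

class LangBridgeSketch(nn.Module):
    # Illustrative only: map each visual token to a softmax-weighted
    # combination of frozen LLM vocabulary embeddings.
    def __init__(self, vision_dim: int, vocab_embeddings: torch.Tensor):
        super().__init__()
        vocab_size, llm_dim = vocab_embeddings.shape
        self.register_buffer("vocab_emb", vocab_embeddings)  # frozen (V, D) table
        self.to_logits = nn.Linear(vision_dim, vocab_size)   # visual token -> vocab logits

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (batch, num_tokens, vision_dim)
        weights = self.to_logits(visual_tokens).softmax(dim=-1)  # (B, N, V)
        # Each output token is a convex combination of vocabulary embeddings.
        return weights @ self.vocab_emb                          # (B, N, D)

Because the output lives in the LLM's own embedding space, swapping in another LLM's vocabulary table is, in principle, what allows the adapter to transfer without re-pretraining.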

🛠️ Installation

We use CUDA 11.8:

git clone https://github.com/CurryX-001/LangBridge.git
cd LangBridge

# Create environment
conda create -n langbridge python=3.10 -y
conda activate langbridge
pip install --upgrade pip

# Install package
pip install -e .

# Install additional packages for training
pip install -e ".[train]"
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --no-build-isolation --no-cache-dir
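Optionally, verify that the environment matches the wheel above (built for PyTorch 2.1 / CUDA 11.8 / Python 3.10); this quick check is not part of the repository:

import torch
import flash_attn

print(torch.__version__, torch.version.cuda)   # expect a 2.1.x build with CUDA 11.8
print("CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)   # expect 2.6.3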

📊 Data Preparation

1. Download the Pretraining, Visual Instruction Tuning, and Evaluation Data

Download the annotation file for the final instruction-tuning data mixture, llava_v1_5_mix665k.json, and the images from the constituent datasets (COCO, GQA, OCR-VQA, TextVQA, and Visual Genome).

After downloading all of them, organize the data as follows in ./playground/data (a sanity-check sketch follows the tree):

├── coco
│   └── train2017
├── LLaVA-Pretrain
│   └── images
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2

You can download the training data using the provided script:

bash scripts/download_data.sh

🎨 Visualization Code

Progressive Training Visualization

bash scripts/vis.sh

🚀 Training

1. Extract Model Embeddings

Extract input embeddings from pretrained models and process them with vocabulary mappings:

# Extract embeddings for Llama3-8B with 19200 vocab
python scripts/get_input_embeddings.py \
    --model_name "meta-llama/Meta-Llama-3-8B-Instruct" \
    --vocab_path "vocab/19200_llama3_sub_llava_share_intersect_llama_qwen.json" \
    --output_dir "./embeddings"

# Extract embeddings for Qwen2-7B with 19200 vocab  
python scripts/get_input_embeddings.py \
    --model_name "Qwen/Qwen2-7B-Instruct" \
    --vocab_path "vocab/19200_Qwen_sub_llava_share_intersect_llama_qwen.json" \
    --output_dir "./embeddings"
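The script itself defines the exact behavior; conceptually, this step loads the LLM, slices its input-embedding matrix down to the token ids kept in the vocabulary file, and saves the result. A rough sketch under those assumptions (the vocabulary-file format and output naming here are guesses, not the script's actual interface):

import json
import torch
from transformers import AutoModelForCausalLM

def extract_embeddings(model_name, vocab_path, output_path):
    # Input-embedding table of the LLM: (full_vocab_size, hidden_dim).
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
    table = model.get_input_embeddings().weight.detach()

    # Assumed format: the JSON file lists the token ids to keep.
    with open(vocab_path) as f:
        keep_ids = json.load(f)

    subset = table[torch.tensor(keep_ids)]  # (subset_size, hidden_dim)
    torch.save(subset, output_path)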

To generate different vocabulary sizes, use:

# Create vocabulary subsets with different sizes
python scripts/create_vocab_subset.py --vocab_size 19200 --model_name llama3
python scripts/create_vocab_subset.py --vocab_size 25600 --model_name llama3
python scripts/create_vocab_subset.py --vocab_size 32000 --model_name llama3

python scripts/create_vocab_subset.py --vocab_size 19200 --model_name Qwen
python scripts/create_vocab_subset.py --vocab_size 25600 --model_name Qwen
python scripts/create_vocab_subset.py --vocab_size 32000 --model_name Qwen
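Judging by the file names above (…intersect_llama_qwen…), the subsets appear to be drawn from tokens shared between the Llama and Qwen vocabularies; create_vocab_subset.py defines the real procedure. Purely as an illustration, intersecting two tokenizer vocabularies looks like:

from transformers import AutoTokenizer

llama_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# Shared surface forms between the two vocabularies (illustrative only).
shared = set(llama_tok.get_vocab()) & set(qwen_tok.get_vocab())
print(len(shared), "shared token strings")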

2. Model Training

Train the LangBridge model using the provided training scripts:

# For Llama3-based models
bash scripts/examples/llama3/train_langbridge.sh

# Example training configurations available:
# - scripts/examples/llama3/pretrain.sh
# - scripts/examples/llama3/finetune.sh
# - scripts/examples/llama3/multimodel_training.sh

For detailed training configurations and advanced options, refer to the example scripts in scripts/examples/llama3/.

📊 Evaluation

Evaluate trained models across multiple benchmarks:

bash scripts/evaluate_all.sh

For LLaVA-NeXT-specific training and evaluation protocols, refer to ./LLaVA-NeXT/Instruction.md.

🏆 Models

Pre-trained models are available for download:

LLM        Connector                        Model Type   Download
Qwen2-7B   Qwen2-7B-Pretrain-MLP            LLaVA-Next   ModelScope
Qwen2-7B   Qwen2-0.5B-Pretrain-LangBridge   LLaVA-Next   ModelScope
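Checkpoints can be fetched with the ModelScope Python SDK; the model id below is a placeholder and should be replaced with the id from the linked ModelScope page:

from modelscope import snapshot_download

# Placeholder id; substitute the actual id from the table above.
local_dir = snapshot_download("<namespace>/Qwen2-7B-Pretrain-LangBridge")
print("downloaded to:", local_dir)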

📄 Citation

@article{liao2025langbridge,
  title={LangBridge: Interpreting Image as a Combination of Language Embeddings},
  author={Liao, Jiaqi and Niu, Yuwei and Meng, Fanqing and Li, Hao and Tian, Changyao and Du, Yinuo and Xiong, Yuwen and Li, Dianqi and Zhu, Xizhou and Yuan, Li and others},
  journal={arXiv preprint arXiv:2503.19404},
  year={2025}
}

📧 Contact

For questions, please open an issue or contact: godubnation7@gmail.com

Acknowledgement

LangBridge is built on LLaVA, LLaVA-NeXT, and lmms-eval. We thank the authors for their excellent work and open-source contributions.
