Official implementation of the paper "LangBridge: Interpreting Image as a Combination of Language Embeddings" accepted at ICCV 2025.
- [2025-06] LangBridge paper accepted at ICCV 2025!
- [2025-06] Code and models released!
We propose LangBridge, a novel adapter that explicitly maps visual tokens to linear combinations of LLM vocabulary embeddings. This design enables pretraining-free adapter transfer across different LLMs while maintaining competitive performance.
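To make the core idea concrete, below is a minimal sketch (not the repository's actual adapter implementation) of mapping visual tokens to linear combinations of LLM vocabulary embeddings; the class name, projection layer, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LangBridgeSketch(nn.Module):
    """Illustrative only: express each visual token as a weighted mixture of
    (frozen) LLM input embeddings restricted to a chosen vocabulary subset."""

    def __init__(self, vision_dim: int, vocab_embeddings: torch.Tensor):
        super().__init__()
        # vocab_embeddings: (V_sub, d_llm) rows of the LLM input embedding matrix
        self.register_buffer("vocab_embeddings", vocab_embeddings)
        self.to_vocab_logits = nn.Linear(vision_dim, vocab_embeddings.size(0))

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (B, N, vision_dim) from the vision encoder
        weights = self.to_vocab_logits(visual_tokens).softmax(dim=-1)  # (B, N, V_sub)
        # Each output token is a convex combination of vocabulary embeddings,
        # so it lives directly in the LLM's input embedding space.
        return weights @ self.vocab_embeddings                         # (B, N, d_llm)
```

In this framing, transferring the adapter to another LLM intuitively amounts to replacing `vocab_embeddings` with that LLM's rows for the same vocabulary subset, which is why the transfer can be pretraining-free.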
We use CUDA 11.8 (the FlashAttention wheel below is built for cu118).
git clone https://github.com/CurryX-001/LangBridge.git
cd LangBridge
# Create environment
conda create -n langbridge python=3.10 -y
conda activate langbridge
pip install --upgrade pip
# Install package
pip install -e .
# Install additional packages for training
pip install -e ".[train]"
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --no-build-isolation --no-cache-dir
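A quick post-install sanity check (a convenience snippet, not part of the repository):

```python
# Confirm that PyTorch sees the GPU and that the FlashAttention wheel imported cleanly.
import torch
import flash_attn

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash_attn:", flash_attn.__version__)
```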
Download the annotation file for the final mixture of instruction-tuning data, llava_v1_5_mix665k.json, and download the images from the constituent datasets:
- LLaVA-Pretrain: images
- COCO: train2017
- GQA: images
- OCR-VQA: download script (we save all files as .jpg)
- TextVQA: train_val_images
- VisualGenome: part1, part2
After downloading all of them, organize the data as follows in ./playground/data:
├── coco
│   └── train2017
├── LLaVA-Pretrain
│   └── images
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
Alternatively, you can download the training data with the provided script:
bash scripts/download_data.sh
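Whichever route you take, it is worth confirming that the layout above is in place before training; the short check below is a convenience sketch, not a script shipped with the repository:

```python
from pathlib import Path

# Expected image folders under ./playground/data (mirrors the tree above).
EXPECTED = [
    "coco/train2017",
    "LLaVA-Pretrain/images",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
]

root = Path("./playground/data")
for rel in EXPECTED:
    status = "ok" if (root / rel).is_dir() else "MISSING"
    print(f"{status:>7}  {root / rel}")
```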
The repository also provides a visualization script:
bash scripts/vis.sh
Extract input embeddings from pretrained models and process them with vocabulary mappings:
# Extract embeddings for Llama3-8B with 19200 vocab
python scripts/get_input_embeddings.py \
--model_name "meta-llama/Meta-Llama-3-8B-Instruct" \
--vocab_path "vocab/19200_llama3_sub_llava_share_intersect_llama_qwen.json" \
--output_dir "./embeddings"
# Extract embeddings for Qwen2-7B with 19200 vocab
python scripts/get_input_embeddings.py \
--model_name "Qwen/Qwen2-7B-Instruct" \
--vocab_path "vocab/19200_Qwen_sub_llava_share_intersect_llama_qwen.json" \
--output_dir "./embeddings"
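Conceptually, this step slices the LLM's input embedding matrix down to the rows listed in the vocabulary file. The sketch below illustrates the idea with Hugging Face Transformers; the JSON layout (a flat list of token ids) and the output filename are assumptions rather than the script's actual interface:

```python
import json
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM

# Assumed layout of the vocab file: a flat JSON list of token ids to keep.
with open("vocab/19200_llama3_sub_llava_share_intersect_llama_qwen.json") as f:
    token_ids = json.load(f)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16
)
embeddings = model.get_input_embeddings().weight                # (V, d_llm)
subset = embeddings[torch.tensor(token_ids)].detach().clone()   # (V_sub, d_llm)

Path("./embeddings").mkdir(exist_ok=True)
torch.save(subset, "./embeddings/llama3_19200_input_embeddings.pt")  # assumed name
```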
To generate vocabulary subsets of different sizes, use:
# Create vocabulary subsets with different sizes
python scripts/create_vocab_subset.py --vocab_size 19200 --model_name llama3
python scripts/create_vocab_subset.py --vocab_size 25600 --model_name llama3
python scripts/create_vocab_subset.py --vocab_size 32000 --model_name llama3
python scripts/create_vocab_subset.py --vocab_size 19200 --model_name Qwen
python scripts/create_vocab_subset.py --vocab_size 25600 --model_name Qwen
python scripts/create_vocab_subset.py --vocab_size 32000 --model_name Qwen
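The file names above suggest that each subset is restricted to tokens shared between the Llama-3 and Qwen2 tokenizers (the actual selection criterion is defined in scripts/create_vocab_subset.py). Purely as an illustration of one way such a subset could be assembled, the sketch below intersects the two vocabularies by token string and truncates to the target size; it is not the repository's selection logic:

```python
import json
from transformers import AutoTokenizer

VOCAB_SIZE = 19200  # target subset size, matching the commands above

llama_vocab = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct").get_vocab()
qwen_vocab = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct").get_vocab()

# Illustrative criterion only: tokens whose surface string exists in both vocabularies.
shared_tokens = sorted(set(llama_vocab) & set(qwen_vocab))[:VOCAB_SIZE]

# Save the corresponding Llama-3 token ids (assumed JSON layout: a flat list of ids).
with open("vocab/example_llama3_subset.json", "w") as f:
    json.dump([llama_vocab[t] for t in shared_tokens], f)
```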
Train the LangBridge model using the provided training scripts:
# For Llama3-based models
bash scripts/examples/llama3/train_langbridge.sh
# Example training configurations available:
# - scripts/examples/llama3/pretrain.sh
# - scripts/examples/llama3/finetune.sh
# - scripts/examples/llama3/multimodel_training.sh
For detailed training configurations and advanced options, refer to the example scripts in scripts/examples/llama3/.
Evaluate trained models across multiple benchmarks:
bash scripts/evaluate_all.sh
For LLaVA-NeXT-specific training and evaluation protocols, refer to ./LLaVA-NeXT/Instruction.md.
Pre-trained models are available for download:
| LLM | Connector | Model Type | Download |
|---|---|---|---|
| Qwen2-7B | Qwen2-7B-Pretrain-MLP | LLaVA-Next | ModelScope |
| Qwen2-7B | Qwen2-0.5B-Pretrain-LangBridge | LLaVA-Next | ModelScope |
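If you prefer to script the download, the modelscope Python package can fetch a checkpoint repository as shown below; the model ID is a placeholder, so substitute the one behind the ModelScope link in the table:

```python
from modelscope import snapshot_download

# Placeholder model ID; use the repository linked in the Download column above.
local_dir = snapshot_download(
    "your-namespace/LangBridge-Qwen2-7B-LLaVA-Next",
    cache_dir="./checkpoints",
)
print("Checkpoint downloaded to:", local_dir)
```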
@article{liao2025langbridge,
title={LangBridge: Interpreting Image as a Combination of Language Embeddings},
author={Liao, Jiaqi and Niu, Yuwei and Meng, Fanqing and Li, Hao and Tian, Changyao and Du, Yinuo and Xiong, Yuwen and Li, Dianqi and Zhu, Xizhou and Yuan, Li and others},
journal={arXiv preprint arXiv:2503.19404},
year={2025}
}
For questions, please open an issue or contact: godubnation7@gmail.com
LangBridge is built on LLaVA, LLaVA-NeXT, and lmms-eval. We thank the authors for their excellent work and open-source contributions.