gen2seg: Generative Models Enable Generalizable Instance Segmentation

Om Khangaonkar, Hamed Pirsiavash
UC Davis

Pretrained Models

Stable Diffusion 2 (SD): https://huggingface.co/reachomk/gen2seg-sd

ImageNet-1K-pretrained Masked Autoencoder-Huge (MAE-H): https://huggingface.co/reachomk/gen2seg-mae-h

If you want any of our other models, send me an email. If there is sufficient demand, I will also release them publicly.

Getting Started

Please set up the environment by running

conda env create -f environment.yml

and then

conda activate gen2seg

Inference

Currently, we have released inference code for our SD and MAE models. To run them, edit the image_path variable (the path to your input image) in each file, then run python inference_sd.py or python inference_mae.py.

You will need to have transformers and diffusers installed, along with standard machine learning packages such as PyTorch and NumPy. More details on our specific environment will be released with the training code.
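
If you are not starting from our conda environment, something like the following should cover the inference dependencies (standard PyPI package names; see environment.yml for the exact versions we use):

pip install torch diffusers transformers numpy pillow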

We have also released code for prompting. If you did not start from our conda environment, please run pip install opencv-contrib-python before running this script.

Here is how you run it:

python prompting.py \
    --feature_image /path/to/your/feature_image.png \
    --prompt_x [prompt pixel x] \
    --prompt_y [prompt pixel y]

The feature image is the one generated by our model, NOT the original image.

The script also accepts the following optional arguments:

--output_mask /path/to/save/output_mask.png
--sigma [value between 0 and 1]
--threshold [value between 0 and 255]

Threshold and sigma control the mask threshold and the amount of averaging for the query vector, respectively; they default to 3 and 0.01. See our paper for more details.
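
To build intuition for what prompting does, here is a rough sketch (an illustration under our reading of the above, not the released prompting.py; the Gaussian weighting, the distance metric, and the function name prompt_mask are assumptions):

import cv2
import numpy as np

def prompt_mask(feature_image_path, x, y, sigma=0.01, threshold=3):
    # Load the feature image produced by the model (NOT the original RGB input).
    feats = cv2.imread(feature_image_path).astype(np.float32)  # H x W x 3
    h, w, _ = feats.shape

    # Gaussian weighting around the prompt pixel; we assume sigma is relative
    # to image size, which is why values between 0 and 1 make sense.
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = ((xs - x) ** 2 + (ys - y) ** 2) / (sigma * max(h, w)) ** 2
    weights = np.exp(-0.5 * d2)
    weights /= weights.sum()

    # Query vector: weighted average of the features near the prompt.
    query = (feats * weights[..., None]).sum(axis=(0, 1))

    # Pixels whose features lie close to the query get assigned to the instance.
    dist = np.linalg.norm(feats - query, axis=-1)
    return (dist < threshold).astype(np.uint8) * 255

mask = prompt_mask("feature_image.png", x=120, y=80)
cv2.imwrite("output_mask.png", mask)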

We have also provided our inference script for SAM to enable qualitative comparison. Please download the checkpoint and set its path in the script; you should also edit the image_path variable (the path to your input image).
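
For reference, automatic mask generation with the official segment-anything package typically looks like the following (a minimal sketch, not our comparison script; the checkpoint path and model variant are placeholders):

import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a SAM checkpoint (downloaded separately; path and variant are placeholders).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects an RGB uint8 image.
image = cv2.cvtColor(cv2.imread("your_image.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts with "segmentation", "area", ...
print(f"SAM produced {len(masks)} masks")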

Training our models

You will probably need a 48 GB GPU to train our SD model; the MAE model will train on a 24 GB GPU.

Data

We use two datasets: Hypersim and Virtual KITTI 2.

You can download Virtual Kitti 2 directly from this link: https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/

Please download the rgb and instanceSegmentation tar archives. To work off the shelf with our current dataloader, extract them into the same directory so that, for a given scene, the RGB frames and segmentation maps sit under frames/rgb and frames/instanceSegmentation, respectively. See the VirtualKITTI2._find_pairs function in training/dataloaders/load.py for more details.
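
As a sanity check on the layout, pairing frames should work roughly like this (a sketch under the directory assumptions above; the file-name patterns are assumptions, so check your extracted files and VirtualKITTI2._find_pairs for the authoritative logic):

from pathlib import Path

# Hypothetical root: the directory you extracted both tar archives into.
root = Path("/path/to/vkitti2")

# For each RGB frame, the instance map should live at the mirrored path.
for rgb_path in sorted(root.glob("*/*/frames/rgb/*/rgb_*.jpg")):
    seg_path = Path(
        str(rgb_path)
        .replace("/frames/rgb/", "/frames/instanceSegmentation/")
        .replace("rgb_", "instancegt_")
        .replace(".jpg", ".png")
    )
    if not seg_path.exists():
        print(f"missing segmentation for {rgb_path}")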

For Hypersim, I recommend downloading using this script: https://github.com/apple/ml-hypersim/tree/main/contrib/99991

Assuming you have a root folder root, download the RGB frames (scene_cam_00_final_preview/*.color.jpg) into root/rgb. You will also need the segmentation annotations (scene_cam_03_geometry_hdf5/*.semantic_instance.hdf5). You will need to convert these annotations to RGB by rendering the background black and giving each mask a unique color (one that is not black or white). Please delete all frames that do not have any annotations; keeping them will degrade performance. I also found that deleting scenes with fewer than 10 annotated objects helped. Place the colored annotations into root/instance-rgb, as sketched below.
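
A rough sketch of that conversion (assumptions: h5py for reading, a deterministic random color per instance ID, and the "dataset" key shown below; double-check the key and ID conventions against the files you download):

import h5py
import numpy as np
from PIL import Image

def instance_to_rgb(h5_path, out_path):
    # Hypersim hdf5 files store their contents under the "dataset" key.
    with h5py.File(h5_path, "r") as f:
        ids = np.array(f["dataset"])  # H x W integer instance IDs

    rgb = np.zeros((*ids.shape, 3), dtype=np.uint8)  # background stays black
    rng = np.random.default_rng(0)  # deterministic colors across runs
    for i in np.unique(ids):
        if i <= 0:  # assumption: non-positive IDs mean "no instance"
            continue
        # Draw colors in [1, 254] so no mask is pure black or white.
        # Collisions between instances are possible but unlikely.
        rgb[ids == i] = rng.integers(1, 255, size=3)

    if (rgb > 0).any():  # skip frames that have no annotations at all
        Image.fromarray(rgb).save(out_path)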

You will need to specify the path to each dataset at line 360 in training/train.py, or line 274 in training/train_mae_full.py.

Training

Before beginning, please set the num_processes variable in training/scripts/multi_gpu.yaml to the number of GPUs you want to parallelize over.
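
For example, to parallelize over four GPUs, the relevant line in multi_gpu.yaml would read:

num_processes: 4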

To train our models, please run the following scripts. Descriptions of the arguments are available in the respective training scripts.

Stable Diffusion: ./training/scripts/train_stable_diffusion_e2e_ft_instance.sh

MAE: ./training/scripts/train_mae_full_e2e_ft_instance.sh

Please let me know if you want more details or have any questions.

Citation

Please cite our paper if you found it helpful or liked it.

@article{khangaonkar2025gen2seg,
  title={gen2seg: Generative Models Enable Generalizable Instance Segmentation},
  author={Om Khangaonkar and Hamed Pirsiavash},
  journal={arXiv preprint arXiv:2505.15263},
  year={2025}
}
