SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining - A Minimal Inference Implementation
This repository provides the minimal inference implementation of our work SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining.
$^\star$ Yue Li1, $^\star$ Qi Ma2,3, Runyi Yang3, Huapeng Li2, Mengjiao Ma3,4, $^\dagger$ Bin Ren3,5,6, Nikola Popovic3, Nicu Sebe6, Ender Konukoglu2, Theo Gevers1, Luc Van Gool2,3, Martin R. Oswald1, and Danda Pani Paudel3
1 University of Amsterdam
2 ETH Zürich
3 INSAIT
4 Nanjing University of Aeronautics and Astronautics
5 University of Pisa
6 University of Trento
Please set up the provided conda environment with Python 3.10, PyTorch 2.5.1, and CUDA 12.4.
conda env create -f env.yaml
conda activate scene_splat
mkdir -p checkpoints/model_wo_normal
cd checkpoints/model_wo_normal
huggingface-cli download GaussianWorld/SceneSplat_lang-pretrain-concat-scan-ppv2-matt-mcmc-wo-normal-contrastive --local-dir .
cp ../../config/model_wo_normal/config_inference.py .
cd ../..
mkdir -p checkpoints/model_normal
cd checkpoints/model_normal
huggingface-cli download GaussianWorld/lang-pretrain-ppv2-and-scannet-fixed-all-w-normal-late-contrastive --local-dir .
cp ../../config/model_normal/config_inference.py .
cd ../..
For more details, including how to prepare the NPY data, please refer to the SceneSplat repository.
Run SceneSplat inference on NPY data:
python run_gs_pipeline.py \
--npy_folder example_npy \
--scene_name scene0000_00 \
--model_folder checkpoints/model_normal/ \
--device cuda \
--save_features
Run SceneSplat inference on PLY data:
python run_gs_pipeline.py \
--ply /path/to/scene.ply \
--scene_name scene0000_00 \
--model_folder checkpoints/model_wo_normal/ \
--device cuda \
--save_features
- `--npy_folder`: Root directory containing NPY scene data (with `train/`, `val/`, and `test/` subdirectories)
- `--ply`: Path to a PLY file containing Gaussian Splatting data
- `--model_folder`: Path to the folder containing the model checkpoint (`.pth`) and `config_inference.py`
- `--normal`: Include normal vectors in the input features (adds 3 channels; default: False)
- `--device`: Device to use (`cuda` or `cpu`; default: `cuda`)
- `--save_features`: Save the extracted language features to `pred_langfeat.npy`
- `--save_output`: Save the input attributes (coord, color, opacity, quat, scale, normal)
- `--output_dir`: Output directory for saved files (default: `./output`)
- `--list_scenes`: List all available scenes in `npy_folder` and exit (NPY format only)
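For reference, `--list_scenes` simply needs to walk the split subdirectories of the NPY root. A minimal sketch of that behavior (the function name and return type are illustrative, not the actual CLI code):

```python
# Hypothetical sketch of --list_scenes: enumerate scene directories
# under the train/, val/, and test/ splits of --npy_folder.
from pathlib import Path

def list_scenes(npy_folder: str) -> dict[str, list[str]]:
    """Return scene directory names grouped by split."""
    scenes = {}
    for split in ("train", "val", "test"):
        split_dir = Path(npy_folder) / split
        if split_dir.is_dir():
            # Each scene is a subdirectory such as scene0000_00/
            scenes[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    return scenes
```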
NPY Format (Preprocessed): Each scene should be a directory containing these `.npy` files:
scene0000_00/
├── coord.npy # [N, 3] 3D coordinates
├── color.npy # [N, 3] RGB colors (0-255 or 0-1)
├── opacity.npy # [N, 1] or [N] opacity values
├── quat.npy # [N, 4] quaternions (wxyz)
├── scale.npy # [N, 3] scaling factors
├── normal.npy # [N, 3] surface normals (optional)
└── segment.npy # [N] semantic labels (optional)
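A minimal loader for this layout might look as follows. This is a sketch for shape validation only, not the repository's actual data pipeline; `normal.npy` and `segment.npy` are treated as optional, matching the listing above.

```python
# Load one NPY scene directory and validate the documented shapes.
from pathlib import Path
import numpy as np

def load_npy_scene(scene_dir: str) -> dict[str, np.ndarray]:
    scene = Path(scene_dir)
    data = {}
    for name in ("coord", "color", "opacity", "quat", "scale"):
        data[name] = np.load(scene / f"{name}.npy")
    for name in ("normal", "segment"):          # optional attributes
        path = scene / f"{name}.npy"
        if path.exists():
            data[name] = np.load(path)
    n = data["coord"].shape[0]
    assert data["coord"].shape == (n, 3)
    assert data["quat"].shape == (n, 4)         # wxyz quaternions
    assert data["scale"].shape == (n, 3)
    assert data["opacity"].reshape(n, -1).shape[1] == 1  # [N] or [N, 1]
    return data
```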
PLY Format (Raw Gaussian Splatting): Standard 3D Gaussian Splatting PLY files with these attributes:
scene.ply
├── x, y, z # 3D coordinates
├── f_dc_0/1/2 # Spherical harmonic DC coefficients (RGB)
├── opacity # Raw opacity values
├── rot_0/1/2/3 # Quaternion components (wxyz)
├── scale_0/1/2 # Log-space scaling factors
├── nx, ny, nz # Normal vectors (optional)
└── f_rest_* # Higher-order SH coefficients (ignored)
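The raw PLY attributes are stored pre-activation, so converting them to the NPY convention requires the standard 3DGS activations: sigmoid for opacity, exp for the log-space scales, and the zeroth-order SH basis for the DC color term. The sketch below mirrors these common 3DGS conventions; it is not copied from the repository.

```python
# Standard 3DGS activations mapping raw PLY attributes to usable values.
import numpy as np

SH_C0 = 0.28209479177387814  # zeroth-order spherical harmonic constant

def dc_to_rgb(f_dc: np.ndarray) -> np.ndarray:
    """[N, 3] SH DC coefficients (f_dc_0/1/2) -> RGB in [0, 1]."""
    return np.clip(0.5 + SH_C0 * f_dc, 0.0, 1.0)

def raw_to_opacity(raw: np.ndarray) -> np.ndarray:
    """Raw opacity logits -> (0, 1) via sigmoid."""
    return 1.0 / (1.0 + np.exp(-raw))

def log_to_scale(log_scale: np.ndarray) -> np.ndarray:
    """Log-space scales (scale_0/1/2) -> positive scaling factors."""
    return np.exp(log_scale)
```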
When `--save_features` is used, the script saves `pred_langfeat.npy`: [N, D] L2-normalized language features (float16).
Features are automatically mapped back to original point order using inverse sampling if available.
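One way to use the saved features is open-vocabulary querying: score each Gaussian against a text embedding by cosine similarity. The sketch below assumes the features are aligned to a CLIP-style text encoder; the text embedding is a stand-in, since producing a real one requires the matching text model.

```python
# Score Gaussians against a query embedding via cosine similarity.
import numpy as np

def score_gaussians(feat_path: str, text_emb: np.ndarray) -> np.ndarray:
    feats = np.load(feat_path).astype(np.float32)   # [N, D], already L2-normalized
    text = text_emb / np.linalg.norm(text_emb)      # normalize the query
    return feats @ text                             # [N] cosine similarities
```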
List available NPY scenes:
python run_gs_pipeline.py --npy_folder example_data --list_scenes
Process NPY data with custom output:
python run_gs_pipeline.py \
--npy_folder /path/to/data \
--scene_name scene0000_00 \
--model_folder checkpoints/model_normal/ \
--save_features \
--output_dir ./results
Process PLY data with normals:
python run_gs_pipeline.py \
--ply /path/to/gaussians.ply \
--model_folder checkpoints/model_normal/ \
--normal \
--save_features \
--output_dir ./results
The number of model input channels depends on the `--normal` flag:
- Without `--normal`: 11 channels (3 color + 1 opacity + 4 quat + 3 scale)
- With `--normal`: 14 channels (3 color + 1 opacity + 4 quat + 3 scale + 3 normal)
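The per-Gaussian input tensor is a concatenation of these attributes along the channel axis. A sketch of that assembly (the function name is illustrative, not the repository's API):

```python
# Concatenate per-Gaussian attributes into the model input tensor.
import numpy as np

def build_features(color, opacity, quat, scale, normal=None) -> np.ndarray:
    parts = [color, opacity.reshape(-1, 1), quat, scale]  # 3 + 1 + 4 + 3 = 11
    if normal is not None:
        parts.append(normal)                              # + 3 -> 14 channels
    return np.concatenate(parts, axis=1)
```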
Make sure your model checkpoint matches the expected input dimensions.
Please refer to the Viewer to visualize the language features.
We sincerely thank all the author teams of the original datasets for their contributions. Our work builds on the following repositories:
- Pointcept repository, on which we develop our codebase,
- gsplat repository, which we adapted to optimize the 3DGS scenes,
- Occam's LGS repository, which we adapted for 3DGS pseudo label collection.
We are grateful to the authors for their open-source contributions!