LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds (CVPR 2025)
Zihui Zhang, Weisheng Dai, Hongtao Wen, Bo Yang
We propose an unsupervised learning approach for 3D semantic segmentation.
The learned global patterns are semantic-aware, and our performance exceeds that of existing baselines.
### CUDA 11.8
conda env create -f env.yml
source activate LogoSP
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pytorch-scatter -c pyg
git clone https://github.com/NVIDIA/MinkowskiEngine.git
Modify MinkowskiEngine to adapt to PyTorch 2.x by adding the following includes (a helper sketch follows the list):
- MinkowskiEngine/src/3rdparty/concurrent_unordered_map.cuh: Add '#include <thrust/execution_policy.h>'
- MinkowskiEngine/src/convolution_kernel.cuh: Add '#include <thrust/execution_policy.h>'
- MinkowskiEngine/src/coordinate_map_gpu.cu: Add '#include <thrust/unique.h>' and '#include <thrust/remove.h>'
- MinkowskiEngine/src/spmm.cu: Add '#include <thrust/execution_policy.h>', '#include <thrust/reduce.h>', and '#include <thrust/sort.h>'
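If you prefer not to edit the headers by hand, the same includes can be prepended with a small script. A convenience sketch (run it from the directory where MinkowskiEngine was cloned; it only adds includes that are missing):

```python
# patch_minkowski.py -- prepend the thrust includes listed above (convenience sketch)
from pathlib import Path

PATCHES = {
    "src/3rdparty/concurrent_unordered_map.cuh": ["#include <thrust/execution_policy.h>"],
    "src/convolution_kernel.cuh": ["#include <thrust/execution_policy.h>"],
    "src/coordinate_map_gpu.cu": ["#include <thrust/unique.h>", "#include <thrust/remove.h>"],
    "src/spmm.cu": ["#include <thrust/execution_policy.h>",
                    "#include <thrust/reduce.h>",
                    "#include <thrust/sort.h>"],
}

root = Path("MinkowskiEngine")
for rel, includes in PATCHES.items():
    src = root / rel
    text = src.read_text()
    missing = [inc for inc in includes if inc not in text]
    if missing:
        src.write_text("\n".join(missing) + "\n" + text)
        print(f"patched {src} (+{len(missing)} include(s))")
```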
cd MinkowskiEngine
python setup.py install --blas=openblas
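Once the build finishes, a quick way to confirm the extension loads and a sparse convolution runs (a minimal check, not part of the repository; the fake voxel grid is only for testing the installation):

```python
import torch
import MinkowskiEngine as ME

# build a tiny fake voxel grid: unique integer coordinates prefixed with a batch index
grid = torch.stack(torch.meshgrid(torch.arange(4), torch.arange(4), torch.arange(4), indexing="ij"), dim=-1)
coords = grid.reshape(-1, 3).int()
coords = torch.cat([torch.zeros(len(coords), 1, dtype=torch.int32), coords], dim=1)
feats = torch.rand(len(coords), 3)

x = ME.SparseTensor(features=feats, coordinates=coords)
conv = ME.MinkowskiConvolution(in_channels=3, out_channels=16, kernel_size=3, dimension=3)
print(conv(x).F.shape)  # expected: torch.Size([64, 16])
```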
Data preparation consists of preprocessing the segmentation data, constructing superpoints, and extracting and projecting DINOv2 features.
We mainly follow GrowSP to preprocess the ScanNet dataset and build superpoints.
For ScanNet data, please download from here.
Uncompress the folder and move it to ./data/ScanNet/raw/.
For superpoints, we can either follow GrowSP and use VCCS + Region Growing, or use the Felzenszwalb superpoints officially provided by ScanNet (optional). Choosing either one for data preprocessing is enough.
python data_prepare/data_prepare_ScanNet.py --data_path './data/ScanNet/raw' --processed_data_path './data/ScanNet/processed' --Felzenszwalb False
python data_prepare/initialSP_prepare_ScanNet.py --input_path './data/ScanNet/processed/' --sp_path './data/ScanNet/initial_superpoints/'
Please download the ScanNet toolkit, go into ScanNet/Segmentor, and build it by running make (or create makefiles for your system using cmake). This will create a segmentator binary.
Then, go back outside ./ScanNet and run the segmentator:
./run_segmentator.sh your_scannet_trainval_path ## e.g., ./data/ScanNet/raw/scans
./run_segmentator.sh your_scannet_test_path ## e.g., ./data/ScanNet/raw/scans_test
# Run the preprocessing once the superpoint files are available.
python data_prepare/data_prepare_ScanNet.py --data_path './data/ScanNet/raw' --processed_data_path './data/ScanNet/processed' --processed_sp_path './data/ScanNet/Felzenszwalb'
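Each scan should now have a *.segs.json file holding one segment id per mesh vertex. A quick sanity check (a sketch; the filename suffix depends on the segmentator parameters, and the scene id below is just an example):

```python
import json
import numpy as np

# example scan; the ".0.010000" suffix reflects the segmentator threshold actually used
segs_file = "./data/ScanNet/raw/scans/scene0000_00/scene0000_00_vh_clean_2.0.010000.segs.json"
with open(segs_file) as f:
    seg_ids = np.asarray(json.load(f)["segIndices"])  # one segment id per mesh vertex

print(f"{len(seg_ids)} vertices grouped into {len(np.unique(seg_ids))} superpoints")
```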
After superpoint construction and data preprocessing via either option above, we can extract and project DINOv2 features.
We reuse the data provided by OpenScene: download the archives below, uncompress them, and put them into ./data/ScanNet:
wget https://cvg-data.inf.ethz.ch/openscene/data/scannet_processed/scannet_3d.zip
wget https://cvg-data.inf.ethz.ch/openscene/data/scannet_processed/scannet_2d.zip
Finally, extract DINOv2 features and project them onto the 3D point clouds by:
python project_ScanNet.py
This will create 3D point clouds with features in ./data/ScanNet/DINOv2_feats_s14up4_voxel_0.05.
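For orientation, the projection step pairs each 3D point with a 2D DINOv2 patch feature via standard pinhole geometry and a depth-consistency (occlusion) test. The sketch below is not the repository's project_ScanNet.py; K, world2cam, depth, and feat_map are assumed inputs for one frame:

```python
import numpy as np

def project_points(points, world2cam, K, depth, feat_map, patch=14, depth_tol=0.05):
    """Assign each 3D point the DINOv2 feature of the image patch it projects into.

    points:    (N, 3) world coordinates
    world2cam: (4, 4) extrinsic matrix
    K:         (3, 3) intrinsics of the RGB frame
    depth:     (H, W) depth image in meters, aligned with the RGB frame
    feat_map:  (H // patch, W // patch, C) DINOv2 patch features
    """
    H, W = depth.shape
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (world2cam @ pts_h.T).T[:, :3]              # points in camera coordinates
    z = cam[:, 2]
    uv = (K @ cam.T).T
    u, v = uv[:, 0] / np.maximum(z, 1e-6), uv[:, 1] / np.maximum(z, 1e-6)

    # keep points inside the image, in front of the camera, and consistent with the depth map
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.where(valid)[0]
    ui, vi = u[idx].astype(int), v[idx].astype(int)
    keep = np.abs(depth[vi, ui] - z[idx]) < depth_tol
    idx, ui, vi = idx[keep], ui[keep], vi[keep]

    feats = np.zeros((len(points), feat_map.shape[-1]), dtype=np.float32)
    feats[idx] = feat_map[vi // patch, ui // patch]   # copy the patch feature each point lands in
    return feats, idx
```

In practice, per-point features from multiple frames are typically aggregated (e.g., averaged) before being stored on the voxel grid.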
The data structure should be:
ScanNet
├── processed
├── scannet_3d
│   ├── train
│   ├── val
│   ├── scannetv2_train.txt
│   ├── scannetv2_val.txt
│   └── scannetv2_test.txt
├── initial_superpoints_0.25
├── Felzenszwalb (optional)
└── DINOv2_feats_s14up4_voxel_0.05
The S3DIS dataset can be found here. Download the file named "Stanford3dDataset_v1.2_Aligned_Version.zip", uncompress it, and move it to data/S3DIS/raw. There is an error in line 180389 of Area_5/hallway_6/Annotations/ceiling_1.txt that needs to be fixed manually, and the file copy_Room_1.txt in Area_6/copyRoom_1 must be renamed to copyRoom_1.txt. Then run the command below to begin preprocessing:
python data_prepare/data_prepare_S3DIS.py --data_path './data/S3DIS/raw' --processed_data_path './data/S3DIS/processed'
The 2D images and camera parameters are stored in the 2D-3D-S dataset; please download it and extract DINOv2 features for S3DIS by:
python project_S3DIS.py
The data structure should be:
S3DIS
├── input_0.010
├── initial_superpoints
├── DINOv2_feats_s14up4_voxel_0.05
└── 2D-3D-S
    ├── Area1
    ├── Area2
    ├── ...
    ├── Area5a
    ├── Area5b
    └── Area6
The training and validation sets of nuScenes (including RGB images for distillation) can be downloaded following OpenScene:
# all 3d data
wget https://cvg-data.inf.ethz.ch/openscene/data/nuscenes_processed/nuscenes_3d.zip
wget https://cvg-data.inf.ethz.ch/openscene/data/nuscenes_processed/nuscenes_3d_train.zip
# all image data
wget https://cvg-data.inf.ethz.ch/openscene/data/nuscenes_processed/nuscenes_2d.zip
Construct superpoints by:
python data_prepare/initialSP_prepare_nuScenes.py --input_path './data/nuScenes/nuScenes_3d/train/' --sp_path './data/nuScenes/initial_superpoints/'
Extract and project DINOv2 features by:
python project_nuScenes.py --output_dir './data/nuScenes/DINOv2_feats_s14up4_voxel_0.15'
The data structure should be:
nuScenes
├── nuScenes_3d
│   ├── train
│   └── val
├── nuScenes_2d
│   ├── train
│   └── val
├── initial_superpoints
│   └── train
└── DINOv2_feats_s14up4_voxel_0.15
The distillation model is first trained by:
CUDA_VISIBLE_DEVICES=0 python train_Distill_ScanNet.py --save_path 'ckpt/ScanNet/distill/' --feats_path './data/ScanNet/DINOv2_feats_s14up4_voxel_0.05/'
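For orientation, this stage fits a sparse 3D backbone so that its per-point features match the projected DINOv2 features. A generic sketch of such a 2D-to-3D distillation objective (an OpenScene-style cosine loss; the exact loss in train_Distill_ScanNet.py may differ):

```python
import torch
import torch.nn.functional as F

def distill_loss(pred_feats, dino_feats, valid_mask):
    """Cosine distillation loss between 3D backbone features and projected DINOv2 targets.

    pred_feats: (N, C) per-point features predicted by the 3D backbone
    dino_feats: (N, C) projected DINOv2 features (zeros where no pixel was found)
    valid_mask: (N,) bool, True for points that received a 2D feature
    """
    pred = F.normalize(pred_feats[valid_mask], dim=1)
    target = F.normalize(dino_feats[valid_mask], dim=1)
    return (1.0 - (pred * target).sum(dim=1)).mean()
```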
After distillation, pick one of the saved checkpoints and train the segmentation model:
# e.g., use the epoch 300 checkpoint
CUDA_VISIBLE_DEVICES=0 python train_Seg_ScanNet.py --save_path 'ckpt/ScanNet/seg/' --distill_ckpt './ckpt/ScanNet/distill/checkpoint_300.tar' --sp_path './data/ScanNet/initial_superpoints/'
Distillation & Segmentation:
CUDA_VISIBLE_DEVICES=0 python train_Distill_S3DIS.py --save_path 'ckpt/S3DIS/distill/' --feats_path './data/S3DIS/DINOv2_feats_s14up4_voxel_0.05/'
# e.g., use the epoch 700 checkpoint
CUDA_VISIBLE_DEVICES=0 python train_Seg_S3DIS.py --save_path 'ckpt/S3DIS/seg/' --distill_ckpt './ckpt/S3DIS/distill/checkpoint_700.tar' --sp_path './data/S3DIS/initial_superpoints/'
Distillation & Segmentation:
CUDA_VISIBLE_DEVICES=0 python train_Distill_nuScenes.py --save_path 'ckpt/nuScenes/distill/' --feats_path './data/nuScenes/DINOv2_feats_s14up4_voxel_0.15/'
# e.g., use the epoch 300 checkpoint
CUDA_VISIBLE_DEVICES=0 python train_Seg_nuScenes.py --save_path 'ckpt/nuScenes/seg/' --distill_ckpt './ckpt/nuScenes/distill/checkpoint_300.tar' --sp_path './data/nuScenes/initial_superpoints/train/'
To prepare predictions for online testing, please download the testing data from here, uncompress it, and arrange the data structure as:
v1.0-test_meta
├── v1.0-test
├── samples
├── maps
└── LICENSE
Preprocess the testing data, then run inference to produce the files for online submission:
# pip install nuscenes-devkit
python nuScenes_test_extraction.py --input_dir './v1.0-test_meta' --output_dir './data/nuScenes/nuscenes_3d/test'
# mode_ckpt and classifier_ckpt must be specified, e.g., './ckpt_seg/nuScenes/model_50_checkpoint.pth'
CUDA_VISIBLE_DEVICES=0 python nuScenes_test_preds.py --test_input_path './nuScenes_test_data' --val_input_path './data/nuScenes/nuScenes_3d/val' --out_path './nuScenes_online_testing' --mode_ckpt './ckpt_seg/nuScenes/model_50_checkpoint.pth' --classifier_ckpt './ckpt_seg/nuScenes/cls_50_checkpoint.pth'
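For reference, the nuScenes lidarseg test server expects one uint8 label file per lidar sample_data token plus a small submission.json. A sketch of that layout (the repository's script defines the actual output; the token and label array below are placeholders, and the class indices must follow the challenge's mapping):

```python
import json
import os
import numpy as np

out_root = "./nuScenes_online_testing"
os.makedirs(os.path.join(out_root, "lidarseg", "test"), exist_ok=True)
os.makedirs(os.path.join(out_root, "test"), exist_ok=True)

# predictions: {lidar sample_data token -> (num_points,) uint8 class indices}; placeholder values here
predictions = {"0123456789abcdef0123456789abcdef": np.ones(34688, dtype=np.uint8)}
for token, labels in predictions.items():
    labels.astype(np.uint8).tofile(os.path.join(out_root, "lidarseg", "test", f"{token}_lidarseg.bin"))

meta = {"meta": {"use_camera": True, "use_lidar": True, "use_radar": False,
                 "use_map": False, "use_external": False}}
with open(os.path.join(out_root, "test", "submission.json"), "w") as f:
    json.dump(meta, f)
```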
The well-trained checkpoints for all three datasets are available on Google Drive.
We also provide scripts for visualization: each point of a 3D scene is assigned an RGB color from a colormap according to the predicted label, and the colored point clouds are stored as .ply files; please refer to the ./vis_predictions folder. These .ply files are then converted to .obj files (see ./to_obj) for rendering with KeyShot.
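A minimal sketch of that coloring step (hypothetical array names; the scripts in ./vis_predictions handle the full pipeline):

```python
import numpy as np
import matplotlib.pyplot as plt

def save_colored_ply(path, points, labels, num_classes=20):
    """Write an ASCII .ply whose vertex colors encode the predicted class of each point."""
    cmap = plt.get_cmap("tab20", num_classes)
    rgb = (cmap(labels % num_classes)[:, :3] * 255).astype(np.uint8)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for (x, y, z), (r, g, b) in zip(points, rgb):
            f.write(f"{x} {y} {z} {r} {g} {b}\n")

# example with random points and random predictions
save_colored_ply("scene_pred.ply", np.random.rand(1000, 3), np.random.randint(0, 20, 1000))
```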