
🧬 A modular pipeline for Whole Slide Image (WSI) processing, integrating HistoGPT with DVC for efficient data versioning and reproducible workflows.


xinghao302001/wsi_data_pipeline_dev


🧬 WSI Data Pipeline Dev

A modular and reproducible pipeline for processing Whole Slide Images (WSIs), integrating HistoGPT with DVC for efficient data versioning and experiment tracking.


📌 Project Overview

This repository provides a structured pipeline for WSI data preparation and embedding extraction using HistoGPT. It emphasizes:

  • 📦 Reproducibility via DVC
  • ⚙️ Custom configuration overrides for HistoGPT
  • 🧪 Notebook-based experimentation
  • 🗃️ Clear separation of data, code, and configs
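Configuration files in this repo live under config/ as YAML. Once YAML has been parsed into nested dicts (with a YAML library, not shown here), a common way to layer overrides on top of defaults is a recursive merge. The following is a minimal sketch under that assumption. merge_config and the keys used below are hypothetical and not taken from this repo's config/ files or from HistoGPT.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config in which values from `overrides` win over `base`.

    Nested dicts are merged key by key; any other value (scalar, list)
    in `overrides` replaces the base value outright. Inputs are not mutated.
    """
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: recurse so untouched nested keys survive.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical keys for illustration only; not this repo's actual config schema.
defaults = {"patching": {"patch_size": 256, "level": 0}, "model": {"name": "histogpt"}}
overrides = {"patching": {"patch_size": 512}}

print(merge_config(defaults, overrides))
# {'patching': {'patch_size': 512, 'level': 0}, 'model': {'name': 'histogpt'}}
```

Lists are replaced rather than concatenated, which keeps override behaviour predictable.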

🗂️ Project Structure

wsi_data_pipeline_dev/
├── assets/                   # Visualization outputs
├── config/                   # YAML-based configuration files
├── data/                     # Raw/processed data (DVC-tracked)
├── histogpt_install_setup/   # Customized HistoGPT overrides
├── notebooks/                # Jupyter notebooks for experiments
├── src/                      # Core scripts (e.g., preprocessing)
├── .dvc/                     # DVC metadata
├── .gitignore
├── dvc.yaml                  # DVC pipeline definition
├── dvc.lock                  # DVC version lock file
├── requirements.txt          # Python dependencies
└── README.md
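dvc.lock records content hashes (MD5) of each stage's dependencies and outputs, and DVC compares them against the workspace to decide whether a stage is up to date. The standard-library sketch below illustrates that idea of hash-based change detection. It is a simplified illustration, not DVC's implementation: file_md5, snapshot and is_stale are hypothetical names, and real DVC also handles directories, parameters and its own cache.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's bytes, read in chunks so large slide files are not loaded whole."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths: Iterable[Path]) -> Dict[str, str]:
    """Map each path to its content hash, a stand-in for a lock-file entry."""
    return {str(p): file_md5(p) for p in paths}


def is_stale(deps: Iterable[Path], recorded: Dict[str, str]) -> bool:
    """True if current hashes differ from the recorded ones (changed, added or removed deps)."""
    return snapshot(deps) != recorded


with tempfile.TemporaryDirectory() as tmp:
    slide = Path(tmp) / "sample_slide.bin"  # placeholder bytes, not a real WSI
    slide.write_bytes(b"original bytes")
    lock = snapshot([slide])
    print(is_stale([slide], lock))  # False: nothing changed
    slide.write_bytes(b"modified bytes")
    print(is_stale([slide], lock))  # True: content hash changed
```

Hashing file contents rather than checking timestamps means that re-copying an identical slide does not trigger recomputation.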

🚀 Getting Started

1. Clone the repository

git clone https://github.com/xinghao302001/wsi_data_pipeline_dev.git
cd wsi_data_pipeline_dev

2. Install dependencies

pip install -r requirements.txt

3. Set up HistoGPT

git clone https://github.com/marrlab/HistoGPT.git
cp -r histogpt_install_setup/* HistoGPT/  # Overwrite configs

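4. (Optional) Fetch DVC-tracked data

The repository already ships its DVC metadata (.dvc/, dvc.yaml, dvc.lock), so dvc init is not needed after cloning; running it without --force fails because .dvc/ already exists. If remote storage is configured, pull the tracked data:

dvc pull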
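The override step in the setup (cp -r histogpt_install_setup/* HistoGPT/) can also be scripted in Python, which is convenient on systems without a POSIX shell. This is a minimal sketch; apply_overrides is a hypothetical helper, not part of this repo, and it assumes both directories already exist.

```python
import shutil
from pathlib import Path
from typing import List


def apply_overrides(override_dir: Path, target_dir: Path) -> List[Path]:
    """Copy override_dir's top-level entries into target_dir, replacing existing files.

    Approximates `cp -r override_dir/* target_dir/`: subdirectories are merged
    recursively and same-named files are overwritten. Dotfiles are skipped
    because the shell's `*` glob does not match them by default.
    """
    copied = []
    for entry in sorted(override_dir.iterdir()):
        if entry.name.startswith("."):
            continue
        dest = target_dir / entry.name
        if entry.is_dir():
            # dirs_exist_ok=True (Python 3.8+) merges into an existing directory.
            shutil.copytree(entry, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, dest)  # copy2 also preserves file metadata
        copied.append(dest)
    return copied
```

Usage, from the repository root after cloning HistoGPT: apply_overrides(Path("histogpt_install_setup"), Path("HistoGPT")). Files in HistoGPT that have no counterpart in the override directory are left untouched, matching cp -r.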

📒 Notebooks

See notebooks/ for examples of:

  • Extracting embeddings using HistoGPT
  • Visualising patch results
  • Testing pipeline steps with sample slides
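Gigapixel WSIs are usually processed patch by patch, so both embedding extraction and patch-level visualisation start from a grid of tile coordinates. The sketch below shows that grid computation only. patch_grid is a hypothetical helper, not code from this repo or from HistoGPT, and reading the actual pixels would require a slide-reading library, which is not shown.

```python
from typing import Iterator, Optional, Tuple


def patch_grid(
    width: int, height: int, patch_size: int, stride: Optional[int] = None
) -> Iterator[Tuple[int, int]]:
    """Yield top-left (x, y) pixel coordinates of patches lying fully inside the slide.

    stride defaults to patch_size (non-overlapping tiles). Partial tiles at the
    right and bottom edges are dropped; padding them is an equally valid choice.
    """
    stride = stride or patch_size
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    for y in range(0, height - patch_size + 1, stride):
        for x in range(0, width - patch_size + 1, stride):
            yield x, y


coords = list(patch_grid(width=1000, height=600, patch_size=256))
print(len(coords))  # 6 (3 columns x 2 rows)
print(coords[:4])   # [(0, 0), (256, 0), (512, 0), (0, 256)]
```

A stride smaller than patch_size produces overlapping tiles, which trades extra compute for smoother patch-level maps.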

🛠️ Tools & Technologies

  • Python
  • HistoGPT
  • DVC
  • Jupyter Notebooks

🤝 Contributing

Issues and PRs are welcome. If you have improvements or questions, feel free to contribute!
