🧬 WSI Data Pipeline Dev

A modular and reproducible pipeline for processing Whole Slide Images (WSIs) that combines HistoGPT for embedding extraction with DVC for data versioning and experiment tracking.

📌 Project Overview

This repository provides a structured pipeline for WSI data preparation and embedding extraction using HistoGPT. It emphasizes:

📦 Reproducibility via DVC
⚙️ Custom configuration overrides for HistoGPT
🧪 Notebook-based experimentation
🗃️ Clear separation of data, code, and configs

🗂️ Project Structure

wsi_data_pipeline_dev/
├── assets/                   # Visualization outputs
├── config/                   # YAML-based configuration files
├── data/                     # Raw/processed data (DVC-tracked)
├── histogpt_install_setup/   # Customized HistoGPT overrides
├── notebooks/                # Jupyter notebooks for experiments
├── src/                      # Core scripts (e.g., preprocessing)
├── .dvc/                     # DVC metadata
├── .dvcignore                # Paths for DVC to ignore
├── .gitignore
├── dvc.yaml                  # DVC pipeline definition
├── dvc.lock                  # DVC version lock file
├── requirements.txt          # Python dependencies
└── README.md
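To confirm that a fresh clone matches the layout above, a small check can be run from the repository root. This is a minimal sketch and is not part of the repository: the helper `missing_entries` and the `EXPECTED` list are hypothetical and simply mirror the tree shown here.

```python
from __future__ import annotations

from pathlib import Path

# Top-level entries taken from the project tree; a trailing "/" marks a directory.
EXPECTED = [
    "assets/",
    "config/",
    "data/",
    "histogpt_install_setup/",
    "notebooks/",
    "src/",
    ".dvc/",
    ".gitignore",
    "dvc.yaml",
    "dvc.lock",
    "requirements.txt",
    "README.md",
]


def missing_entries(repo_root: Path, expected: list[str] = EXPECTED) -> list[str]:
    """Return expected entries that are absent (or of the wrong kind) under repo_root."""
    missing = []
    for entry in expected:
        path = repo_root / entry.rstrip("/")
        # Directories must be directories, everything else must be a regular file.
        ok = path.is_dir() if entry.endswith("/") else path.is_file()
        if not ok:
            missing.append(entry)
    return missing
```

For example, `print(missing_entries(Path.cwd()))` prints an empty list when every expected entry is present.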

🚀 Getting Started

1. Clone the repository

git clone https://github.com/xinghao302001/wsi_data_pipeline_dev.git
cd wsi_data_pipeline_dev

2. Install dependencies

pip install -r requirements.txt
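After the install finishes, a quick sanity check can confirm that every package listed in requirements.txt is visible to the current interpreter. This is a hedged sketch using only the standard library: the helpers `parse_requirement_names` and `missing_packages` are hypothetical, and the parser only understands simple `name[extras]<specifier>` lines, skipping pip options (such as `-r other.txt`) and URL-based requirements.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version


def parse_requirement_names(text: str) -> list[str]:
    """Extract bare distribution names from requirements.txt content."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        # Skip blank lines, pip options ("-r ...", "--index-url ...") and URL requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        # The name ends at the first extras bracket, version operator, marker or space.
        name = re.split(r"[\[<>=!~;@\s]", line, maxsplit=1)[0]
        if name:
            names.append(name)
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names for which no installed distribution can be found."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

A typical call is `missing_packages(parse_requirement_names(Path("requirements.txt").read_text()))` (with `from pathlib import Path`); an empty result means every listed package was found. Names are looked up as written, so a package listed under a non-canonical spelling may be reported as missing.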

3. Set up HistoGPT

git clone https://github.com/marrlab/HistoGPT.git
cp -r histogpt_install_setup/* HistoGPT/  # Overwrite configs
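The `cp -r` step above overlays the customized files onto the HistoGPT checkout. The same overlay can be sketched in Python, for example on systems without a POSIX shell. This is an illustrative assumption rather than a script shipped with the repository, and `apply_overrides` is a hypothetical name.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def apply_overrides(overrides_dir: Path, histogpt_dir: Path) -> list[Path]:
    """Copy every file under overrides_dir into histogpt_dir, replacing existing files.

    Roughly equivalent to `cp -r histogpt_install_setup/* HistoGPT/`, except that
    hidden files (names starting with ".") are copied as well.
    """
    if not histogpt_dir.is_dir():
        raise FileNotFoundError(f"{histogpt_dir} not found; clone HistoGPT first")
    # dirs_exist_ok=True (Python 3.8+) merges into the existing tree instead of failing.
    shutil.copytree(overrides_dir, histogpt_dir, dirs_exist_ok=True)
    # Report the destination paths of all files that came from the overrides folder.
    return sorted(
        histogpt_dir / p.relative_to(overrides_dir)
        for p in overrides_dir.rglob("*")
        if p.is_file()
    )
```

Called as `apply_overrides(Path("histogpt_install_setup"), Path("HistoGPT"))` from the repository root, it returns the copied file paths. Unlike the shell glob `*`, `shutil.copytree` does not skip dotfiles, which matters if the override folder ever contains hidden configuration files.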

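4. (Optional) Pull DVC-tracked data

The repository already contains its DVC metadata (.dvc/, dvc.yaml, dvc.lock), so there is no need to run dvc init after cloning; DVC refuses to re-initialise an existing .dvc/ directory unless forced with -f. Run dvc init only when starting a new project from scratch.

dvc pull  # Fetch DVC-tracked data, if remote storage is configured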
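The DVC setup above can be scripted so that `dvc init` is only attempted in a checkout that has no `.dvc/` directory yet, since DVC will not re-initialise an existing one without `-f`. This is a minimal sketch under that assumption; `plan_dvc_setup` and `run_dvc_setup` are hypothetical helpers that only call the standard `dvc init` and `dvc pull` commands.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def plan_dvc_setup(repo_root: Path, pull: bool = True) -> list[list[str]]:
    """Return the DVC commands to run for this checkout, without running them."""
    commands: list[list[str]] = []
    # Only a checkout without DVC metadata needs initialising.
    if not (repo_root / ".dvc").is_dir():
        commands.append(["dvc", "init"])
    if pull:
        # Download DVC-tracked data from the configured remote storage.
        commands.append(["dvc", "pull"])
    return commands


def run_dvc_setup(repo_root: Path, pull: bool = True) -> None:
    """Execute the planned commands, stopping at the first failure."""
    if shutil.which("dvc") is None:
        raise RuntimeError("dvc is not installed or not on PATH")
    for cmd in plan_dvc_setup(repo_root, pull):
        subprocess.run(cmd, cwd=repo_root, check=True)
```

Separating planning from execution keeps the decision logic testable without DVC installed; pass `pull=False` when no remote storage is configured.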

📒 Notebooks

Explore notebooks/ for examples of:

Extracting embeddings using HistoGPT
Visualising patch results
Testing pipeline steps with sample slides
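Patch-level results such as those visualised in the notebooks presuppose that a slide has been cut into tiles. The sketch below shows one common way to enumerate tile positions on a regular grid; it is an illustrative assumption, not the repository's or HistoGPT's actual tiling logic, and `patch_grid` and its parameters are hypothetical. Coordinates are in pixels at whichever slide level the width and height refer to.

```python
from __future__ import annotations


def patch_grid(
    width: int, height: int, patch_size: int, stride: int | None = None
) -> list[tuple[int, int]]:
    """Top-left (x, y) coordinates of full patches on a width x height image.

    Patches that would extend past the right or bottom edge are skipped.
    stride defaults to patch_size, which gives non-overlapping tiles.
    """
    if stride is None:
        stride = patch_size
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    # Row-major order: walk across each row (x) before moving down (y).
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, stride)
        for x in range(0, width - patch_size + 1, stride)
    ]


print(patch_grid(1000, 600, 256))
# → [(0, 0), (256, 0), (512, 0), (0, 256), (256, 256), (512, 256)]
```

Here the 232-pixel strip on the right and the 88-pixel strip at the bottom are dropped; padding partial edge tiles is the usual alternative when border tissue matters. A stride smaller than `patch_size` produces overlapping tiles.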

🛠️ Tools & Technologies

Python
DVC
HistoGPT
Jupyter Notebooks
YAML

🤝 Contributing

Issues and pull requests are welcome. If you have improvements or questions, feel free to open one!