
Commit e23bac7

negin513 and weiji14 authored
Overall cleanup (#34)
* clean up
* pytorch era5 repr method
* revert some fixes and clean ups
* better name
* example of ERA5 data loader.
* adding early stopping option
* update readme
* update readme
* bug fix
* adding num_workers
* improved DALI
* black
* improved init_process_group
* improved distributed and readability
* corrections
* update train_unet.py
* some fixes to era5_dataloader.py
* some updates
* some modularity
* clean ups
* README updates
* benchmark docstring
* clean up
* improvements in file and folder names
* fixing names
* cleaner logging
* bug fixes
* refactoring and clean ups
* update readme
* measurements
* updates to readme
* updates to dali data loader
* update readme
* data path
* Update README.md (Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>)
* Update README.md (Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>)
* Update README.md (Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>)
* Update README.md (Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>)

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
1 parent 3c9eddb commit e23bac7

File tree

10 files changed: +730 −428 lines


README.md

Lines changed: 45 additions & 24 deletions
@@ -1,52 +1,73 @@
-# xarray-on-gpus
+# Accelerating AI/ML Workflows in Earth Sciences with GPU-Native Xarray and Zarr
 
-Repository for the Xarray on GPUs team during the
+Read about this project in the [Xarray blog](https://xarray.dev/blog/gpu-pipeline).
+
+🏔️⚡ A collaborative benchmarking and optimization effort from [NSF-NCAR](https://www.ncar.ucar.edu/), [Development Seed](https://developmentseed.org/), and [NVIDIA](https://www.nvidia.com/) to accelerate data-intensive geoscience AI/ML workflows using GPU-native technologies like Zarr v3, CuPy, KvikIO, and NVIDIA DALI.
+
+## 📌 Overview
+
+This repository contains code, benchmarks, and examples from the Xarray on GPUs hackathon project during the
 [NREL/NCAR/NOAA Open Hackathon](https://www.openhackathons.org/s/siteevent/a0CUP00000rwYYZ2A2/se000355)
-in Golden, Colorado from 18-27 February 2025.
+in Golden, Colorado from 18-27 February 2025. The goal of this project is to provide a proof-of-concept example of optimizing the performance of geospatial machine learning workflows on GPUs by using [Zarr-python v3](https://zarr.dev/) and [NVIDIA DALI](https://developer.nvidia.com/dali).
+
+📖 [Read the full blog post](https://xarray.dev/blog/gpu-pipeline)
 
-# Getting started
+In this project, we demonstrate how to:
 
-## Installation
+- Optimize chunking strategies for Zarr datasets
+- Read ERA5 Zarr v3 data directly into GPU memory using CuPy and KvikIO
+- Apply GPU-based decompression using NVIDIA's nvCOMP
+- Build end-to-end GPU-native DALI pipelines
+- Improve training throughput for U-Net-based ML models
 
-### Basic
 
-To help out with development, start by cloning this [repo-url](/../../)
+## 📂 Repository Structure
 
-    git clone <repo-url>
+In this repository, you will find the following:
 
-Then I recommend [using mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html)
-to install the dependencies. A virtual environment will also be created with Python and
-[JupyterLab](https://github.com/jupyterlab/jupyterlab) installed.
+- `benchmarks/`: Scripts to evaluate read and write performance for Zarr v3 datasets on both CPU and GPU.
+- `zarr_dali_example/`: Contains a minimal example of using DALI to read Zarr data and train a model.
+- `zarr_ML_optimization/`: Contains an example benchmark for training a U-Net model using DALI with the Zarr data format.
+- `rechunk/`: Contains a notebook that demonstrates how to optimize chunking strategies for Zarr datasets.
 
-    cd ncar-hackathon-xarray-on-gpus
-    mamba env create --file environment.yml
+See [zarr_ML_optimization/README.md](zarr_ML_optimization/README.md) for more details on running the U-Net training example.
 
-Activate the virtual environment first.
 
-    mamba activate gpuhackathon
+# Creating the Environment
 
-Finally, double-check that the libraries have been installed.
+## Basic
 
-    mamba list
+Start by cloning the repo & setting up the `conda` environment:
+```bash
+git clone https://github.com/pangeo-data/ncar-hackathon-xarray-on-gpus.git
+cd ncar-hackathon-xarray-on-gpus
+conda env create --file environment.yml
+conda activate gpuhackathon
+```
 
-### Advanced
+### Advanced using `conda-lock`
 
 This is for those who want full reproducibility of the virtual environment.
 Create a virtual environment with just Python and conda-lock installed first.
 
-    mamba create --name gpuhackathon python=3.11 conda-lock=2.5.7
-    mamba activate gpuhackathon
+```
+conda create --name gpuhackathon python=3.11 conda-lock=2.5.7
+conda activate gpuhackathon
+```
 
 Generate a unified [`conda-lock.yml`](https://github.com/conda/conda-lock) file
 based on the dependency specification in `environment.yml`. Use only when
 creating a new `conda-lock.yml` file or refreshing an existing one.
-
-    conda-lock lock --mamba --file environment.yml --platform linux-64 --with-cuda=12.8
+```
+conda-lock lock --mamba --file environment.yml --platform linux-64 --with-cuda=12.8
+```
 
 Installing/Updating a virtual environment from a lockfile. Use this to sync your
 dependencies to the exact versions in the `conda-lock.yml` file.
 
-    conda-lock install --mamba --name gpuhackathon conda-lock.yml
-
+```
+conda-lock install --mamba --name gpuhackathon conda-lock.yml
+```
 See also https://conda.github.io/conda-lock/output/#unified-lockfile for more
 usage details.
+
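A quick way to reason about the chunking strategies this README mentions is to compute the uncompressed size of one chunk: chunks that are too small pay per-request overhead, while very large ones limit parallelism. Below is a minimal back-of-the-envelope sketch (a hypothetical helper, not code from this repository; the `(8, 1, 721, 1440)` shape assumes the 0.25° ERA5 grid):

```python
from math import prod


def chunk_size_mib(chunk_shape: tuple[int, ...], itemsize: int) -> float:
    """Size of one uncompressed chunk in mebibytes.

    chunk_shape: e.g. (time, channel, height, width)
    itemsize: bytes per element (4 for float32)
    """
    return prod(chunk_shape) * itemsize / 2**20


# One (8, 1, 721, 1440) float32 chunk of an ERA5-like array:
size = chunk_size_mib((8, 1, 721, 1440), 4)
print(f"{size:.1f} MiB per chunk")  # → 31.7 MiB per chunk
```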

benchmark/era5_zarr_benchmark.py renamed to benchmarks/era5_zarr_benchmark.py

Lines changed: 16 additions & 0 deletions
@@ -1,3 +1,19 @@
+"""
+This script is a GPU/CPU I/O benchmark for reading a Zarr dataset.
+It compares read performance between CPU-based and GPU-native approaches using Zarr v3.
+
+The script uses:
+- `zarr.config.enable_gpu()` for GPU-backed reads (via CuPy),
+- `GDSStore` from `kvikio_zarr_v3` for GPU Direct Storage (GDS) support,
+- `nvtx` annotations for profiling iterations with NVIDIA Nsight tools.
+
+The dataset is assumed to be a 4D array stored under the key 'combined', typically in (time, channel, height, width) format.
+
+The benchmark:
+- Reads pairs of time steps in a loop,
+- Measures elapsed time,
+- Computes effective I/O bandwidth in GB/s.
+"""
 import asyncio
 from contextlib import nullcontext
 import math
File renamed without changes.
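The effective-bandwidth figure described in the docstring above boils down to bytes moved over wall-clock time. A minimal sketch of that computation (hypothetical helper names, not the script's actual code):

```python
import time


def effective_bandwidth_gbps(nbytes: int, seconds: float) -> float:
    """Effective I/O bandwidth in GB/s (decimal gigabytes per second)."""
    return nbytes / seconds / 1e9


# Timing pattern used around each read iteration:
start = time.perf_counter()
payload = bytes(2 * 721 * 1440 * 4)  # stand-in for two ERA5-like float32 time steps
elapsed = time.perf_counter() - start
print(f"{effective_bandwidth_gbps(len(payload), elapsed):.2f} GB/s")
```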

benchmark/zstd_benchmark.py renamed to benchmarks/zstd_benchmark.py

Lines changed: 13 additions & 1 deletion
@@ -1,3 +1,16 @@
+"""
+Zarr v3 I/O Benchmark: CPU vs GPU Read Performance
+
+This script benchmarks the I/O performance of writing and reading a synthetic
+Zarr v3 dataset using CPU and GPU. It demonstrates how to:
+
+- Create a 4D array in Zarr v3 using a specified compression codec (CPU or GPU).
+- Read the dataset using either CPU-based or GPU-accelerated access.
+- Annotate profiling regions using NVTX for use with NVIDIA Nsight tools.
+- Compute and report effective I/O bandwidth in GB/s.
+"""
+
 import asyncio
 from contextlib import nullcontext
 import math
@@ -14,7 +27,6 @@
 from zarr.codecs import NvcompZstdCodec, ZstdCodec
 from zarr.storage import LocalStore
 
-
 def get_store(path: Path) -> LocalStore:
     async def _get_store(path: Path) -> LocalStore:
         return await LocalStore.open(path)
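The codec comparison this script performs can be illustrated on CPU with `zlib` from the standard library as a readily available stand-in for Zstd (the script itself uses Zarr's `ZstdCodec` and `NvcompZstdCodec`; the buffer below is a hypothetical synthetic example):

```python
import array
import zlib

# Synthetic, repetitive float32 buffer: smoothly varying data compresses well,
# which is the scenario a compression benchmark on gridded data exercises.
data = array.array("f", (float(i % 256) for i in range(65536))).tobytes()
compressed = zlib.compress(data, level=3)
ratio = len(data) / len(compressed)
print(f"raw={len(data)} B  compressed={len(compressed)} B  ratio={ratio:.1f}x")
```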

zarr_ML_optimization/README.md

Lines changed: 6 additions & 2 deletions
@@ -1,11 +1,15 @@
-# Zarr ML end-to-end example 
+# Zarr ML end-to-end example
+
+This folder contains an end-to-end example of training a UNet model using the DALI library with the Zarr data format.
+The code for this example is parallelized using PyTorch DDP (Distributed Data Parallel) and can be run on multiple GPUs or nodes.
+
 
 ## How to run!
 To run on 1 GPU, use the following command:
 
 ```bash
 module load conda
-conda activate /glade/work/weiji/conda-envs/gpuhackathon
+conda activate gpuhackathon
 ```
 
 To run on 1 GPU, use the following command:
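One detail behind the DDP parallelization this README describes is that each rank must read a disjoint subset of the samples. A minimal sketch of round-robin sharding (a hypothetical helper for illustration, not this repository's loader; in practice DALI's sharding options or PyTorch's `DistributedSampler` handle this):

```python
def shard_indices(num_samples: int, world_size: int, rank: int) -> list[int]:
    """Round-robin assignment of sample indices to one DDP rank (no overlap)."""
    return list(range(rank, num_samples, world_size))


# 10 time steps across 4 GPUs: every sample is assigned to exactly one rank.
shards = [shard_indices(10, 4, r) for r in range(4)]
print(shards)  # → [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```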
