# Accelerating AI/ML Workflows in Earth Sciences with GPU-Native Xarray and Zarr

Read about this project in the [Xarray blog](https://xarray.dev/blog/gpu-pipeline).

🏔️⚡ A collaborative benchmarking and optimization effort from [NSF-NCAR](https://www.ncar.ucar.edu/), [Development Seed](https://developmentseed.org/), and [NVIDIA](https://www.nvidia.com/) to accelerate data-intensive geoscience AI/ML workflows using GPU-native technologies like Zarr v3, CuPy, KvikIO, and NVIDIA DALI.

## 📌 Overview

This repository contains code, benchmarks, and examples from the Xarray on GPUs hackathon project during the
[NREL/NCAR/NOAA Open Hackathon](https://www.openhackathons.org/s/siteevent/a0CUP00000rwYYZ2A2/se000355)
in Golden, Colorado, from 18-27 February 2025. The goal of this project is to provide a proof-of-concept example of optimizing the performance of geospatial machine learning workflows on GPUs using [Zarr-python v3](https://zarr.dev/) and [NVIDIA DALI](https://developer.nvidia.com/dali).

📖 [Read the full blog post](https://xarray.dev/blog/gpu-pipeline)

In this project, we demonstrate how to:

- Optimize chunking strategies for Zarr datasets
- Read ERA5 Zarr v3 data directly into GPU memory using CuPy and KvikIO (a minimal read sketch follows this list)
- Apply GPU-based decompression using NVIDIA's nvCOMP
- Build end-to-end GPU-native DALI pipelines
- Improve training throughput for U-Net-based ML models
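
As a flavor of the GPU-native read path, the sketch below loads a Zarr v3 variable directly into GPU memory as a CuPy array using zarr-python's GPU support. It is a minimal illustration with placeholder names (`./era5.zarr`, `t2m`), not code shipped in this repository; the actual benchmark scripts live in `benchmarks/`.

```python
# Minimal sketch: read a Zarr v3 variable into GPU memory as a CuPy array.
# The store path and variable name are placeholders.
import cupy as cp
import zarr

zarr.config.enable_gpu()  # make Zarr return GPU (CuPy-backed) buffers

group = zarr.open_group("./era5.zarr", mode="r")
t2m = group["t2m"][:]     # the data lands in GPU memory
assert isinstance(t2m, cp.ndarray)
print(t2m.shape, t2m.dtype, float(t2m.mean()))
```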

## 📂 Repository Structure

In this repository, you will find the following:

- `benchmarks/`: Scripts to evaluate read and write performance for Zarr v3 datasets on both CPU and GPU.
- `zarr_dali_example/`: A minimal example of using DALI to read Zarr data and train a model.
- `zarr_ML_optimization/`: An example benchmark for training a U-Net model on Zarr data with DALI.
- `rechunk/`: A notebook demonstrating how to optimize chunking strategies for Zarr datasets (a rechunking sketch follows this listing).
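
The `rechunk/` notebook explores how chunk layout affects read throughput for training. The snippet below is a rough, hedged illustration of that kind of rewrite with placeholder paths, dimension names, and chunk sizes, assuming a recent xarray that accepts the `zarr_format` keyword; it is not the notebook's actual code.

```python
# Hypothetical sketch: rewrite a Zarr store with chunks sized for ML reads.
# Paths, dimension names, and chunk sizes are placeholders.
import xarray as xr

ds = xr.open_zarr("./era5_source.zarr").drop_encoding()  # drop source chunk encoding
ds = ds.chunk({"time": 1, "latitude": 640, "longitude": 1280})
ds.to_zarr("./era5_rechunked.zarr", mode="w", zarr_format=3)
```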

See [zarr_ML_optimization/README.md](zarr_ML_optimization/README.md) for more details on running the U-Net training example.
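
The DALI-based loading in `zarr_dali_example/` and `zarr_ML_optimization/` roughly follows the pattern sketched below: Zarr chunks are handed to a DALI pipeline through an external source so batches stay on the GPU. This is an illustrative sketch with placeholder paths, shapes, and batch sizes, not the repository's actual pipeline.

```python
# Hedged sketch of a GPU-native DALI pipeline fed from a Zarr v3 store.
# The store path and variable name are placeholders.
import zarr
from nvidia.dali import fn, pipeline_def

zarr.config.enable_gpu()                      # Zarr returns CuPy-backed chunks
t2m = zarr.open_group("./era5.zarr", mode="r")["t2m"]

def read_sample(sample_info):
    # One timestep per sample, already resident in GPU memory.
    return t2m[sample_info.idx_in_epoch % t2m.shape[0]]

@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def zarr_pipeline():
    # external_source accepts CuPy arrays, so the data never leaves the GPU.
    return fn.external_source(source=read_sample, batch=False, device="gpu")

pipe = zarr_pipeline()
pipe.build()
(batch,) = pipe.run()                         # a GPU batch of 4 samples
```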

## Creating the Environment

### Basic

Start by cloning the repo & setting up the `conda` environment:

```bash
git clone https://github.com/pangeo-data/ncar-hackathon-xarray-on-gpus.git
cd ncar-hackathon-xarray-on-gpus
conda env create --file environment.yml
conda activate gpuhackathon
```
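
After activating the environment, an optional quick check that the GPU stack imports and sees a device (the package names here are assumptions based on the project description; adjust them to whatever `environment.yml` actually pins):

```python
# Optional sanity check; package names are assumed, not taken from environment.yml.
import cupy as cp
import kvikio
import zarr

print("zarr:", zarr.__version__)
print("kvikio:", kvikio.__version__)
print("GPUs visible:", cp.cuda.runtime.getDeviceCount())
```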

### Advanced using `conda-lock`

This is for those who want full reproducibility of the virtual environment.
Create a virtual environment with just Python and conda-lock installed first.

```bash
conda create --name gpuhackathon python=3.11 conda-lock=2.5.7
conda activate gpuhackathon
```

Generate a unified [`conda-lock.yml`](https://github.com/conda/conda-lock) file
based on the dependency specification in `environment.yml`. Use this only when
creating a new `conda-lock.yml` file or refreshing an existing one.

```bash
conda-lock lock --mamba --file environment.yml --platform linux-64 --with-cuda=12.8
```

Install or update a virtual environment from the lockfile. Use this to sync your
dependencies to the exact versions in the `conda-lock.yml` file.

```bash
conda-lock install --mamba --name gpuhackathon conda-lock.yml
```

See also https://conda.github.io/conda-lock/output/#unified-lockfile for more
usage details.