
Commit e23bac7

negin513 and weiji14 authored
Overall cleanup (#34)
* clean up
* pytorch era5 repr method
* revert some fixes and clean ups
* better name
* example of ERA5 data loader.
* adding early stopping option
* update readme
* update readme
* bug fix
* adding num_workers
* improved DALI
* black
* improved init_process_group
* improved distributed and readability
* corrections
* update train_unet.py
* some fixes to era5_dataloader.py
* some updates
* some modularity
* clean ups
* README updates
* benchmark docstring
* clean up
* improvements in file and folder names
* fixing names
* cleaner logging
* bug fixes
* refactoring and clean ups
* update readme
* measurements
* updates to readme
* updates to dali data loader
* update readme
* data path
* Update README.md (Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>)
* Update README.md (Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>)
* Update README.md (Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>)
* Update README.md (Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>)

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
1 parent 3c9eddb commit e23bac7

File tree

10 files changed: +730 −428 lines


README.md

Lines changed: 45 additions & 24 deletions
@@ -1,52 +1,73 @@
-# xarray-on-gpus
+# Accelerating AI/ML Workflows in Earth Sciences with GPU-Native Xarray and Zarr
 
-Repository for the Xarray on GPUs team during the
+Read about this project in the [Xarray blog](https://xarray.dev/blog/gpu-pipeline).
+
+🏔️⚡ A collaborative benchmarking and optimization effort from [NSF-NCAR](https://www.ncar.ucar.edu/), [Development Seed](https://developmentseed.org/), and [NVIDIA](https://www.nvidia.com/) to accelerate data-intensive geoscience AI/ML workflows using GPU-native technologies like Zarr v3, CuPy, KvikIO, and NVIDIA DALI.
+
+## 📌 Overview
+
+This repository contains code, benchmarks, and examples from the Xarray on GPUs hackathon project during the
 [NREL/NCAR/NOAA Open Hackathon](https://www.openhackathons.org/s/siteevent/a0CUP00000rwYYZ2A2/se000355)
-in Golden, Colorado from 18-27 February 2025.
+in Golden, Colorado from 18-27 February 2025. The goal of this project is to provide a proof-of-concept example of optimizing the performance of geospatial machine learning workflows on GPUs by using [Zarr-python v3](https://zarr.dev/) and [NVIDIA DALI](https://developer.nvidia.com/dali).
+
+📖 [Read the full blog post](https://xarray.dev/blog/gpu-pipeline)
 
-# Getting started
+In this project, we demonstrate how to:
 
-## Installation
+- Optimize chunking strategies for Zarr datasets
+- Read ERA5 Zarr v3 data directly into GPU memory using CuPy and KvikIO
+- Apply GPU-based decompression using NVIDIA's nvCOMP
+- Build end-to-end GPU-native DALI pipelines
+- Improve training throughput for U-Net-based ML models
 
-### Basic
 
-To help out with development, start by cloning this [repo-url](/../../)
+## 📂 Repository Structure
 
-    git clone <repo-url>
+In this repository, you will find the following:
 
-Then I recommend [using mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html)
-to install the dependencies. A virtual environment will also be created with Python and
-[JupyterLab](https://github.com/jupyterlab/jupyterlab) installed.
+- `benchmarks/`: Scripts to evaluate read and write performance for Zarr v3 datasets on both CPU and GPU.
+- `zarr_dali_example/`: Contains a minimal example of using DALI to read Zarr data and train a model.
+- `zarr_ML_optimization/`: Contains an example benchmark for training a U-Net model using DALI with the Zarr data format.
+- `rechunk/`: Contains a notebook that demonstrates how to optimize chunking strategies for Zarr datasets.
 
-    cd ncar-hackathon-xarray-on-gpus
-    mamba env create --file environment.yml
+See [zarr_ML_optimization/README.md](zarr_ML_optimization/README.md) for more details on running the U-Net training example.
 
-Activate the virtual environment first.
 
-    mamba activate gpuhackathon
+# Creating the Environment
 
-Finally, double-check that the libraries have been installed.
+## Basic
 
-    mamba list
+Start by cloning the repo & setting up the `conda` environment:
+```bash
+git clone https://github.com/pangeo-data/ncar-hackathon-xarray-on-gpus.git
+cd ncar-hackathon-xarray-on-gpus
+conda env create --file environment.yml
+conda activate gpuhackathon
+```
 
-### Advanced
+### Advanced using `conda-lock`
 
 This is for those who want full reproducibility of the virtual environment.
 Create a virtual environment with just Python and conda-lock installed first.
 
-    mamba create --name gpuhackathon python=3.11 conda-lock=2.5.7
-    mamba activate gpuhackathon
+```
+conda create --name gpuhackathon python=3.11 conda-lock=2.5.7
+conda activate gpuhackathon
+```
 
 Generate a unified [`conda-lock.yml`](https://github.com/conda/conda-lock) file
 based on the dependency specification in `environment.yml`. Use only when
 creating a new `conda-lock.yml` file or refreshing an existing one.
-
-    conda-lock lock --mamba --file environment.yml --platform linux-64 --with-cuda=12.8
+```
+conda-lock lock --mamba --file environment.yml --platform linux-64 --with-cuda=12.8
+```
 
 Installing/Updating a virtual environment from a lockfile. Use this to sync your
 dependencies to the exact versions in the `conda-lock.yml` file.
 
-    conda-lock install --mamba --name gpuhackathon conda-lock.yml
-
+```
+conda-lock install --mamba --name gpuhackathon conda-lock.yml
+```
 See also https://conda.github.io/conda-lock/output/#unified-lockfile for more
 usage details.
+
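A quick way to reason about the chunking strategies this README mentions is to compute the uncompressed size of one chunk: chunks that are too small pay per-request overhead, while very large ones limit parallelism. Below is a minimal back-of-the-envelope sketch (a hypothetical helper, not code from this repository; the `(8, 1, 721, 1440)` shape assumes the 0.25° ERA5 grid):

```python
from math import prod


def chunk_size_mib(chunk_shape: tuple[int, ...], itemsize: int) -> float:
    """Size of one uncompressed chunk in mebibytes.

    chunk_shape: e.g. (time, channel, height, width)
    itemsize: bytes per element (4 for float32)
    """
    return prod(chunk_shape) * itemsize / 2**20


# One (8, 1, 721, 1440) float32 chunk of an ERA5-like array:
size = chunk_size_mib((8, 1, 721, 1440), 4)
print(f"{size:.1f} MiB per chunk")  # → 31.7 MiB per chunk
```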

benchmark/era5_zarr_benchmark.py renamed to benchmarks/era5_zarr_benchmark.py

Lines changed: 16 additions & 0 deletions
@@ -1,3 +1,19 @@
+"""
+This script is a GPU/CPU I/O benchmark for reading a Zarr dataset.
+It compares read performance between CPU-based and GPU-native approaches using Zarr v3.
+
+The script uses:
+- `zarr.config.enable_gpu()` for GPU-backed reads (via CuPy),
+- `GDSStore` from `kvikio_zarr_v3` for GPU Direct Storage (GDS) support,
+- `nvtx` annotations for profiling iterations with NVIDIA Nsight tools.
+
+The dataset is assumed to be a 4D array stored under the key 'combined', typically in (time, channel, height, width) format.
+
+The benchmark:
+- Reads pairs of time steps in a loop,
+- Measures elapsed time,
+- Computes effective I/O bandwidth in GB/s.
+"""
 import asyncio
 from contextlib import nullcontext
 import math
File renamed without changes.
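The effective-bandwidth figure described in the docstring above boils down to bytes moved over wall-clock time. A minimal sketch of that computation (hypothetical helper names, not the script's actual code):

```python
import time


def effective_bandwidth_gbps(nbytes: int, seconds: float) -> float:
    """Effective I/O bandwidth in GB/s (decimal gigabytes per second)."""
    return nbytes / seconds / 1e9


# Timing pattern used around each read iteration:
start = time.perf_counter()
payload = bytes(2 * 721 * 1440 * 4)  # stand-in for two ERA5-like float32 time steps
elapsed = time.perf_counter() - start
print(f"{effective_bandwidth_gbps(len(payload), elapsed):.2f} GB/s")
```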

benchmark/zstd_benchmark.py renamed to benchmarks/zstd_benchmark.py

Lines changed: 13 additions & 1 deletion
@@ -1,3 +1,16 @@
+"""
+Zarr v3 I/O Benchmark: CPU vs GPU Read Performance
+
+This script benchmarks the I/O performance of writing and reading a synthetic
+Zarr v3 dataset using CPU and GPU. It demonstrates how to:
+
+- Create a 4D array in Zarr v3 using a specified compression codec (CPU or GPU).
+- Read the dataset using either CPU-based or GPU-accelerated access.
+- Annotate profiling regions using NVTX for use with NVIDIA Nsight tools.
+- Compute and report effective I/O bandwidth in GB/s.
+"""
+
 import asyncio
 from contextlib import nullcontext
 import math
@@ -14,7 +27,6 @@
 from zarr.codecs import NvcompZstdCodec, ZstdCodec
 from zarr.storage import LocalStore
 
-
 def get_store(path: Path) -> LocalStore:
     async def _get_store(path: Path) -> LocalStore:
         return await LocalStore.open(path)
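The codec comparison this script performs can be illustrated on CPU with `zlib` from the standard library as a readily available stand-in for Zstd (the script itself uses Zarr's `ZstdCodec` and `NvcompZstdCodec`; the buffer below is a hypothetical synthetic example):

```python
import array
import zlib

# Synthetic, repetitive float32 buffer: smoothly varying data compresses well,
# which is the scenario a compression benchmark on gridded data exercises.
data = array.array("f", (float(i % 256) for i in range(65536))).tobytes()
compressed = zlib.compress(data, level=3)
ratio = len(data) / len(compressed)
print(f"raw={len(data)} B  compressed={len(compressed)} B  ratio={ratio:.1f}x")
```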

zarr_ML_optimization/README.md

Lines changed: 6 additions & 2 deletions
@@ -1,11 +1,15 @@
-# Zarr ML end-to-end example 
+# Zarr ML end-to-end example
+
+This folder contains an end-to-end example of training a UNet model using the DALI library with the Zarr data format.
+The code for this example is parallelized using PyTorch DDP (Distributed Data Parallel) and can be run on multiple GPUs or nodes.
+
 
 ## How to run!
 To run on 1 GPU, use the following command:
 
 ```bash
 module load conda
-conda activate /glade/work/weiji/conda-envs/gpuhackathon
+conda activate gpuhackathon
 ```
 
 To run on 1 GPU, use the following command:
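One detail behind the DDP parallelization this README describes is that each rank must read a disjoint subset of the samples. A minimal sketch of round-robin sharding (a hypothetical helper for illustration, not this repository's loader; in practice DALI's sharding options or PyTorch's `DistributedSampler` handle this):

```python
def shard_indices(num_samples: int, world_size: int, rank: int) -> list[int]:
    """Round-robin assignment of sample indices to one DDP rank (no overlap)."""
    return list(range(rank, num_samples, world_size))


# 10 time steps across 4 GPUs: every sample is assigned to exactly one rank.
shards = [shard_indices(10, 4, r) for r in range(4)]
print(shards)  # → [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```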
