This repository was archived by the owner on Mar 17, 2025. It is now read-only.

Commit a5752f1

Merge pull request #1 from WDGPH/initial-version
Initial version
2 parents 9a556cd + 17ee75f commit a5752f1

7 files changed: +692 -0 lines changed

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
name: Build Processing Image

on:
  push:
    branches:
      - main
    tags:
      - '*'
    paths:
      - 'processing/**'
  pull_request:
    branches:
      - main
    paths:
      - 'processing/**'

env:
  REGISTRY: ghcr.io

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for processing container
        id: meta-processing
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ github.repository }}/processing
          flavor: latest=false
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix=main-
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}

      - name: Build and push processing container image
        uses: docker/build-push-action@v5
        with:
          context: ./processing
          file: ./processing/dockerfile
          push: true
          tags: ${{ steps.meta-processing.outputs.tags }}
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
name: Build Retrieval Image

on:
  push:
    branches:
      - main
    tags:
      - '*'
    paths:
      - 'retrieval/**'
  pull_request:
    branches:
      - main
    paths:
      - 'retrieval/**'

env:
  REGISTRY: ghcr.io

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for retrieval container
        id: meta-retrieval
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ github.repository }}/retrieval
          flavor: latest=false
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix=main-
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}

      - name: Build and push retrieval container image
        uses: docker/build-push-action@v5
        with:
          context: ./retrieval
          file: ./retrieval/dockerfile
          push: true
          tags: ${{ steps.meta-retrieval.outputs.tags }}

README.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# WSI Data Pipeline

## Introduction
These containers are part of a data pipeline that automatically retrieves data from the Ontario Wastewater Surveillance Initiative (WSI) Data and Visualization Hub. Containerizing the pipeline components provides environment isolation and reproducibility. A description and basic usage of each container follow.

Container images are built by GitHub Actions and pushed to GitHub's container registry. Up-to-date built images are listed [here](https://github.com/orgs/WDGPH/packages?repo_name=workflow-WSI).

## Retrieval Container
This container downloads ArcGIS Online items from a specified URL.

To use it, set the `ARCGIS_USER` and `ARCGIS_PASSWORD` environment variables for the container (credentials for the WSI Data and Visualization Hub). It is strongly recommended to store these credentials in a secure key vault and to rotate them frequently. Additionally, the following arguments are required:

**1. `url`**
ArcGIS Online item URL. The URL changes when features are added to or removed from the dataset, so it needs occasional updating.
**Example**: `https://services6.arcgis.com/ghjer345tert/arcgis/rest/services/PROD_PHU_Base_Aggregated/FeatureServer/0/query`

**2. `output`**
The file to which the output will be written in CSV format.
**Example**: `wsi.csv`
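
The retrieval inputs described above (two environment variables plus the `url` and `output` arguments) could be collected roughly as follows. This is only an illustrative Python sketch, not the retrieval container's actual code: the `--url` and `--output` flag spellings, the `RetrievalConfig` class, and the `load_config` helper are all assumptions, and the download itself is left out.

```python
import argparse
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    """Hypothetical container for everything the retrieval step needs."""
    user: str
    password: str
    url: str
    output: str


def load_config(argv=None, env=None):
    """Read credentials from the environment and arguments from the command line."""
    env = os.environ if env is None else env

    parser = argparse.ArgumentParser(description="Hypothetical retrieval entry point")
    # Flag spellings are assumptions; the README only names the arguments.
    parser.add_argument("--url", required=True, help="ArcGIS Online item URL")
    parser.add_argument("--output", required=True, help="CSV file to write")
    args = parser.parse_args(argv)

    # Fail early with a clear message if either credential is unset or empty.
    missing = [name for name in ("ARCGIS_USER", "ARCGIS_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return RetrievalConfig(
        user=env["ARCGIS_USER"],
        password=env["ARCGIS_PASSWORD"],
        url=args.url,
        output=args.output,
    )
```

Calling `load_config(["--url", "<item url>", "--output", "wsi.csv"])` with both environment variables set returns a populated `RetrievalConfig`. Checking credentials before any network call keeps a misconfigured container from failing partway through a run.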

## Processing Container
This container takes the CSV output from the retrieval container and performs standardization and trend analysis on the data. It produces disease-target-specific outputs at both the sewershed and region levels. Sewershed weighting is required to perform region-level analyses. The container uses the following arguments:

**1. `input`**
CSV file containing at minimum the columns: sampleDate, siteName, mN1, mN2, mFluA, mFluB, and mBiomarker. This is intended to be the file output by the retrieval container.
**Example**: `wsi.csv`

**2. `weights`**
CSV file with columns: Site and Weight. The Site column corresponds to siteName values in the `input`. Each weight is the factor used when combining site-specific trends into a single regional trend. Weights are decimal numbers and should sum to 1. The weights may be set to be equal, or may correspond to population weighting, sampling frequency, or any other user-determined criteria.
**Example**: `weights.csv`

**3. `patch`**
Optional CSV file with columns: Date, Site, and one or more of mN1, mN2, mFluA, mFluB, mBiomarker. Values in the patch file are added to, or override, existing values in the primary input file. This is useful for adding historical data not present in WSI or for fixing erroneous data.
**Example**: `patch.csv`

**4. `output_region_covid`**
Optional output location for CSV file containing regional summary for SARS-CoV-2. No output will be generated if left blank.
**Example**: `output_region_covid.csv`

**5. `output_region_flu_a`**
Optional output location for CSV file containing regional summary for Influenza A. No output will be generated if left blank.
**Example**: `output_region_flu_a.csv`

**6. `output_region_flu_b`**
Optional output location for CSV file containing regional summary for Influenza B. No output will be generated if left blank.
**Example**: `output_region_flu_b.csv`

**7. `output_covid`**
Optional output location for CSV file containing site-specific SARS-CoV-2 data. No output will be generated if left blank.
**Example**: `output_covid.csv`

**8. `output_flu_a`**
Optional output location for CSV file containing site-specific Influenza A data. No output will be generated if left blank.
**Example**: `output_flu_a.csv`

**9. `output_flu_b`**
Optional output location for CSV file containing site-specific Influenza B data. No output will be generated if left blank.
**Example**: `output_flu_b.csv`
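
To make the `patch` and `weights` arguments more concrete, here is a minimal Python sketch of how patch values might be merged into the input and how site values might be combined with weights. It is not taken from `process.R`: the helper names, the choice to ignore blank patch cells, and the use of a plain weighted sum for the regional value are all assumptions made for illustration.

```python
import csv
import io
import math

# Measurement columns named in the README.
TARGETS = ("mN1", "mN2", "mFluA", "mFluB", "mBiomarker")


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_patch(rows, patch_rows):
    """Add or override values keyed by (date, site).

    Assumption: blank patch cells leave the existing value untouched.
    """
    merged = {(r["sampleDate"], r["siteName"]): dict(r) for r in rows}
    for p in patch_rows:
        key = (p["Date"], p["Site"])
        record = merged.setdefault(key, {"sampleDate": p["Date"], "siteName": p["Site"]})
        for col in TARGETS:
            if p.get(col) not in (None, ""):
                record[col] = p[col]
    return sorted(merged.values(), key=lambda r: (r["sampleDate"], r["siteName"]))


def load_weights(weight_rows, tol=1e-6):
    """Map Site to Weight and check that the weights sum to 1."""
    weights = {w["Site"]: float(w["Weight"]) for w in weight_rows}
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"Weights sum to {total}, expected 1")
    return weights


def regional_value(site_values, weights):
    """Weighted sum of per-site values for one date (assumes every site has a weight)."""
    return sum(weights[site] * value for site, value in site_values.items())
```

For example, with weights of 0.25 for site A and 0.75 for site B, site values of 2.0 and 3.0 combine to a regional value of 2.75. Validating the weight total up front surfaces a mis-specified `weights.csv` before any regional output is written.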

## Pipeline Orchestration
This data pipeline can be orchestrated by a variety of tools that support containerized components, but has been developed and tested with [Kubeflow Pipelines](https://www.kubeflow.org/), which is based on [Argo Workflows](https://argoproj.github.io/argo-workflows/).

## Contributing
Dependency updates, documentation improvements, logging improvements, and additions of tests will enhance the usability and reliability of this project and are welcome contributions.

processing/dockerfile

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
FROM rocker/r-ver:4.3.3

WORKDIR /home/docker/

# Library initialization using renv
RUN Rscript --vanilla -e " \
  options(repos = c(CRAN = 'https://cloud.r-project.org')); \
  install.packages('renv') \
  "

# Direct dependencies
RUN Rscript --vanilla -e " \
  renv::install( \
    packages = c( \
      'dplyr@1.1.4', \
      'magrittr@2.0.3', \
      'optparse@1.7.5', \
      'readr@2.1.5', \
      'renv@1.0.7', \
      'stringr@1.5.1', \
      'tidyr@1.3.1' \
    ), \
    prompt = F, \
    lock = T \
  ) \
  "
# Data processing code
COPY process.R /home/docker/

# Run container
ENTRYPOINT ["Rscript", "--vanilla", "process.R"]
