Commit feda924

docs(README.md): update YAML code comments for clarity
Add YAML config and parameter examples for enabling column/row-count checks and trial configuration.
1 parent a82286c commit feda924

1 file changed: +14 -6 lines changed

README.md

Lines changed: 14 additions & 6 deletions
@@ -83,15 +83,21 @@ The `universal_step.py` script automatically runs tests whenever the relevant te
 - **Failure Handling**: Any mismatch or missing column triggers a Pandera `SchemaError`, halting the pipeline.
 
 **Example**
-To enable the column and row-count checks, set:
+To enable the column and row-count checks, set in the respective YAML config:
 
 ```yaml
-transformations:
+# transformations/base.yaml
   check_required_columns: true
   check_row_count: true
-tests:
+```
+```yaml
+# tests_params/base.yaml
+
+# test_params/v0...v13.yaml set version specific values where needed.
+
   check_required_columns:
     required_columns: ["year", "facility_id", "apr_drg_code"]
+
   check_row_count:
     row_count: 1081672
 ```
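As a rough illustration of how the two toggles and their parameters in this hunk could interact, here is a dependency-free sketch. It is not the project's implementation: the README says the real checks go through Pandera and raise `SchemaError`, whereas this sketch raises a plain `ValueError`. The function name `run_checks` and its argument layout are hypothetical.

```python
from typing import Any, Mapping, Sequence


def run_checks(
    columns: Sequence[str],
    n_rows: int,
    toggles: Mapping[str, bool],
    params: Mapping[str, Mapping[str, Any]],
) -> None:
    """Run only the enabled checks; raise ValueError on the first failure."""
    # Toggle assumed to come from the transformations config.
    if toggles.get("check_required_columns", False):
        required = params["check_required_columns"]["required_columns"]
        missing = [name for name in required if name not in columns]
        if missing:
            raise ValueError(f"missing required columns: {missing}")

    # Expected row count assumed to come from the test-params config.
    if toggles.get("check_row_count", False):
        expected = params["check_row_count"]["row_count"]
        if n_rows != expected:
            raise ValueError(f"expected {expected} rows, got {n_rows}")


# Values mirror the YAML example in the diff above.
toggles = {"check_required_columns": True, "check_row_count": True}
params = {
    "check_required_columns": {
        "required_columns": ["year", "facility_id", "apr_drg_code"]
    },
    "check_row_count": {"row_count": 1081672},
}
run_checks(["year", "facility_id", "apr_drg_code"], 1081672, toggles, params)
```

Keeping the on/off switches separate from the expected values matches the split in this commit: the toggles stay stable in one file, while the version-specific expectations can vary per data version.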
@@ -115,10 +121,12 @@ Each experiment uses a dynamically generated `run_id` (from `${run_id_outputs}`)
 **Example**
 
 ```yaml
+# configs/transformations/rf_optuna_trial.yaml
+
 model_tags:
-  run_id_tag: ${ml_experiments.mlflow_tags.run_id_tag}
-  data_version_tag: ${ml_experiments.mlflow_tags.data_version_tag}
-  model_tag: RandomForestRegressor
+  run_id_tag: ${ml_experiments.mlflow_tags.run_id_tag}
+  data_version_tag: ${ml_experiments.mlflow_tags.data_version_tag}
+  model_tag: RandomForestRegressor # gets updated automatically for each trial
 ```
 
 This ensures each MLflow run is fully traceable to both a particular pipeline execution (via run_id_tag) and the underlying dataset (via data_version_tag). When your experiments scale up, searching and grouping by these tags provides a clear lineage of how each result was produced.
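To make the "searching and grouping by these tags" point concrete, the sketch below builds an MLflow search filter from tag values. It assumes the runs carry tags literally named `run_id_tag` and `data_version_tag`, as the YAML keys suggest. The helper `build_tag_filter` and the example tag values are hypothetical.

```python
from typing import Mapping


def build_tag_filter(tags: Mapping[str, str]) -> str:
    """Build an MLflow filter string that matches runs carrying all given tags."""
    clauses = []
    for key, value in tags.items():
        # Keep the sketch simple: refuse values that would need quote escaping.
        if "'" in value:
            raise ValueError(f"tag value for {key!r} contains a single quote")
        clauses.append(f"tags.{key} = '{value}'")
    return " AND ".join(clauses)


# Hypothetical tag values, for illustration only.
filter_string = build_tag_filter(
    {"run_id_tag": "2024-01-01_12-00-00", "data_version_tag": "v3"}
)
print(filter_string)
```

The resulting string can be passed to `mlflow.search_runs(filter_string=filter_string)`, which by default returns the matching runs as a pandas DataFrame that can then be grouped by tag columns.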
