This project demonstrates a robust Apache Airflow pipeline architecture that combines dataset-aware scheduling with failsafe monitoring to ensure reliable data processing workflows.
The system implements a data pipeline where:
- 3 Producer DAGs generate data and update their respective datasets
- 1 Coordinator DAG triggers the main processing pipeline when all datasets are ready
- 1 Health Check DAG monitors the main pipeline and provides failsafe triggering via Airflow API
This architecture ensures that your main processing pipeline runs reliably, even if the dataset-aware triggering mechanism fails.
- **Producer DAGs** (`producer_for_dataset_a/b/c_dag.py`)
  - Run on an hourly schedule (`0 * * * *`)
  - Generate data for their respective sources
  - Update their corresponding datasets upon successful completion
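
A minimal sketch of one producer, assuming Airflow 2.4+ dataset support. The `DATASET_A` constant (imported from `include/datasets.py`, covered in the next item), its URI, and the task body are illustrative, not the project's actual code:

```python
from datetime import datetime

from airflow.decorators import dag, task

from include.datasets import DATASET_A  # hypothetical constant name


@dag(schedule="0 * * * *", start_date=datetime(2024, 1, 1), catchup=False)
def producer_for_dataset_a_dag():
    @task(outlets=[DATASET_A])
    def generate_data():
        # Produce data for source A; declaring DATASET_A as an outlet records a
        # dataset update when this task finishes successfully.
        ...

    generate_data()


producer_for_dataset_a_dag()
```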
- **Dataset Definitions** (`include/datasets.py`)
  - Centralized dataset URIs for consistency
  - Used by both producers and consumers
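
The file might look roughly like this; the constant names and URIs below are assumptions, not the project's actual values:

```python
from airflow.datasets import Dataset

# One Dataset per source. The URI is an opaque identifier: producers declare it
# as an outlet, and the coordinator consumes it in its schedule.
DATASET_A = Dataset("s3://example-bucket/source_a/output.parquet")
DATASET_B = Dataset("s3://example-bucket/source_b/output.parquet")
DATASET_C = Dataset("s3://example-bucket/source_c/output.parquet")
```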
- **Coordinator DAG** (`pipeline_trigger_on_datasets_ready_dag.py`)
  - Triggered when ALL three datasets have been updated
  - Launches the main processing pipeline
  - Uses dataset-aware scheduling
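
A sketch of the dataset-aware trigger, assuming the hypothetical constants from `include/datasets.py` above and a target DAG id of `main_processing_dag`:

```python
from datetime import datetime

from airflow.decorators import dag
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

from include.datasets import DATASET_A, DATASET_B, DATASET_C  # hypothetical names


@dag(
    # Runs only after ALL three datasets have received an update since the last run.
    schedule=[DATASET_A, DATASET_B, DATASET_C],
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def pipeline_trigger_on_datasets_ready_dag():
    TriggerDagRunOperator(
        task_id="trigger_main_processing",
        trigger_dag_id="main_processing_dag",
        wait_for_completion=False,
    )


pipeline_trigger_on_datasets_ready_dag()
```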
- **Main Processing DAG** (`main_processing_dag.py`)
  - Contains the core business logic
  - Triggered by the coordinator or by the failsafe mechanism
  - Has no schedule of its own (triggered externally only)
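
Because it is only ever started by the coordinator or the failsafe, its shell might simply declare `schedule=None`; the task name here is illustrative:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def main_processing_dag():
    @task
    def process():
        # Core business logic goes here.
        ...

    process()


main_processing_dag()
```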
- **Health Check DAG** (`health_check_and_trigger_main_pipeline_dag.py`)
  - Monitors the main pipeline's success status via the Airflow API
  - Supports configurable time thresholds (minutes/hours/days)
  - Triggers the main pipeline if no recent successful run is detected
  - Provides the failsafe mechanism
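
A hedged sketch of how such a failsafe could work, assuming the stable REST API (`/api/v1/dags/{dag_id}/dagRuns`) is reachable and basic auth is enabled; the endpoint URL, credentials, check interval, and threshold value are all placeholders, not the project's configuration:

```python
from datetime import datetime, timedelta, timezone

import requests

from airflow.decorators import dag, task
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

AIRFLOW_API_URL = "http://localhost:8080/api/v1"  # placeholder webserver URL
MAIN_DAG_ID = "main_processing_dag"
SUCCESS_THRESHOLD = timedelta(hours=2)  # "configurable time threshold" stand-in


@dag(schedule="*/30 * * * *", start_date=datetime(2024, 1, 1), catchup=False)
def health_check_and_trigger_main_pipeline_dag():

    @task.short_circuit
    def no_recent_success() -> bool:
        """Continue downstream only if no successful run exists inside the threshold."""
        cutoff = datetime.now(timezone.utc) - SUCCESS_THRESHOLD
        resp = requests.get(
            f"{AIRFLOW_API_URL}/dags/{MAIN_DAG_ID}/dagRuns",
            params={
                "state": "success",
                "execution_date_gte": cutoff.isoformat(),
                "limit": 1,
            },
            auth=("admin", "admin"),  # replace with your real auth mechanism
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["total_entries"] == 0

    trigger = TriggerDagRunOperator(
        task_id="failsafe_trigger_main_pipeline",
        trigger_dag_id=MAIN_DAG_ID,
    )

    no_recent_success() >> trigger


health_check_and_trigger_main_pipeline_dag()
```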