Welcome to your modern data stack template! This project demonstrates how to build a scalable data warehouse using:
- Airbyte for data ingestion
- dbt for transformation
- BigQuery for storage and compute
Prerequisites:
- VSCode installed
- Turntable.so extension installed in VSCode
- Python 3.8 or higher installed
- Google Cloud account with BigQuery enabled
- Airbyte instance set up with sources configured
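Some of these prerequisites can be checked from a script. The sketch below is a hypothetical helper, not part of the template. It checks the Python version against the 3.8 minimum and reports which command-line tools are missing from PATH. `python_version_ok` and `missing_tools` are illustrative names. The `code` command is VSCode's optional shell launcher, and `dbt` only appears after the dependencies are installed.

```python
import shutil
import sys
from typing import Callable, Iterable, List, Optional, Sequence

# Minimum version from the prerequisites list.
MIN_PYTHON = (3, 8)


def python_version_ok(version_info: Sequence = tuple(sys.version_info)) -> bool:
    """True if the (major, minor, ...) version meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    # "code" is only on PATH if VSCode's shell command was installed.
    print("Missing tools:", missing_tools(["code", "dbt"]))
```

The `which` parameter can be swapped out, so the helper can be tested without depending on what is installed on the machine.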
Project Structure:
models/
├── staging/                  # Raw data standardization
│   ├── stg_stripe/           # Payment processing
│   │   ├── base/             # Raw JSON parsing
│   │   │   └── base_stripe__customers.sql
│   │   ├── stg_stripe__customers.sql
│   │   └── _stripe_sources.yml
│   │
│   ├── stg_hubspot/          # Marketing automation
│   │   ├── base/
│   │   └── _hubspot_sources.yml
│   │
│   └── stg_shopify/          # E-commerce platform
│       ├── base/
│       └── _shopify_sources.yml
│
├── intermediate/             # Business logic layer
│   ├── finance/
│   ├── marketing/
│   └── sales/
│
└── marts/                    # Business-specific models
    ├── core/                 # Core business entities
    ├── finance/              # Finance-specific models
    ├── marketing/            # Marketing-specific models
    └── sales/                # Sales-specific models
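The tree suggests a naming convention of `<layer>_<source>__<entity>`, for example `base_stripe__customers` and `stg_stripe__customers`. The convention is inferred from the example files and is not formally stated. The hypothetical sketch below parses such a name and computes the staging folder the tree implies for it. A check like this could be run in CI to catch misnamed or misplaced models. `ModelName`, `parse_model_name` and `expected_directory` are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    layer: str   # "base" or "stg"
    source: str  # e.g. "stripe"
    entity: str  # e.g. "customers"


# Assumed convention: <layer>_<source>__<entity>, with snake_case parts
# separated by a double underscore between source and entity.
_PATTERN = re.compile(
    r"^(base|stg)_([a-z0-9]+(?:_[a-z0-9]+)*)__([a-z0-9]+(?:_[a-z0-9]+)*)$"
)


def parse_model_name(filename: str) -> Optional[ModelName]:
    """Split a model file name into its parts, or return None if it breaks the convention."""
    stem = filename[:-4] if filename.endswith(".sql") else filename
    match = _PATTERN.match(stem)
    if match is None:
        return None
    return ModelName(*match.groups())


def expected_directory(name: ModelName) -> str:
    """Folder the tree implies: base models live in a base/ subfolder."""
    folder = f"models/staging/stg_{name.source}"
    return f"{folder}/base" if name.layer == "base" else folder
```

Multi-word sources such as `google_ads` still parse, because the double underscore marks where the source ends and the entity begins.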
- Clone the Repository:
  git clone https://github.com/yourusername/dbt-bigquery-quickstart-project.git
  cd dbt-bigquery-quickstart-project
- Set Up Environment Variables:
  cp .env.example .env  # then edit .env with your configuration
- Configure dbt Profile:
  mkdir -p ~/.dbt  # the directory may not exist yet
  cp profiles.yml.example ~/.dbt/profiles.yml  # then edit profiles.yml with your BigQuery details
- Install Dependencies:
  pip install dbt-core dbt-bigquery
  dbt deps
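The source configuration reads values such as `DBT_PROJECT_ID` and `AIRBYTE_SCHEMA` through dbt's `env_var()`. dbt reads these from the process environment and does not load `.env` by itself, so the file has to be exported into the shell first (for example with `set -a; source .env; set +a` in bash). The sketch below shows the resolution order: a set variable wins, otherwise the default is used, otherwise the lookup fails. It is a simplified Python analogue. The `.env` reader is hypothetical and ignores quoting and variable expansion, and it is not dbt's implementation.

```python
import os
from typing import Dict, Mapping, Optional

_MISSING = object()  # sentinel so that None can still be a legitimate default


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comment lines.
    Simplified: no quoting rules, no inline comments, no ${VAR} expansion."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def env_var(name: str, default=_MISSING, environ: Optional[Mapping[str, str]] = None) -> str:
    """Mimic the lookup order of dbt's env_var(): variable, then default, then error."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    if default is not _MISSING:
        return default
    raise KeyError(f"environment variable {name!r} is required but not set")
```

With the example values in this README, `env_var("AIRBYTE_SCHEMA", "raw", environ={})` falls back to `raw`. `env_var("DBT_PROJECT_ID", environ={})` fails because that variable has no default.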
This template is designed to work with Airbyte's BigQuery destination. Key points:
- Raw Data Structure:
  - Airbyte creates tables with the prefix _airbyte_raw_
  - Data is stored as JSON in the _airbyte_data column
  - Each record has an _airbyte_emitted_at timestamp
- Base Models:
    -- Example: models/staging/stg_stripe/base/base_stripe__customers.sql
    select
        JSON_EXTRACT_SCALAR(_airbyte_data, '$.id') as customer_id,
        JSON_EXTRACT_SCALAR(_airbyte_data, '$.email') as email,
        _airbyte_emitted_at as ingested_at
    from {{ source('stripe', '_airbyte_raw_customers') }}
- Source Configuration:
    # Example: models/staging/stg_stripe/_stripe_sources.yml
    version: 2
    sources:
      - name: stripe
        database: "{{ env_var('DBT_PROJECT_ID') }}"
        schema: "{{ env_var('AIRBYTE_SCHEMA', 'raw') }}"
        loader: airbyte
        loaded_at_field: _airbyte_emitted_at
        tables:
          - name: _airbyte_raw_customers
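To see what the base model does to each raw Airbyte row, the sketch below reproduces it in plain Python. It is only an approximation of BigQuery's `JSON_EXTRACT_SCALAR`. It supports simple `$.a.b` paths, not array indexes or quoted keys. Scalars come back as strings, while objects, arrays, JSON null and missing keys come back as None (NULL in BigQuery). The function names and the sample record are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def json_extract_scalar(json_text: str, path: str) -> Optional[str]:
    """Approximate JSON_EXTRACT_SCALAR for dotted '$.key.subkey' paths."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    value: Any = json.loads(json_text)
    for key in filter(None, path[1:].split(".")):
        if not isinstance(value, dict) or key not in value:
            return None  # missing key -> NULL
        value = value[key]
    if value is None or isinstance(value, (dict, list)):
        return None  # JSON null and non-scalar values -> NULL
    if isinstance(value, bool):  # check before numbers: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return value
    return json.dumps(value)  # numbers rendered as text


def base_stripe_customers(raw_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Python analogue of base_stripe__customers.sql: one output row per raw row."""
    return [
        {
            "customer_id": json_extract_scalar(row["_airbyte_data"], "$.id"),
            "email": json_extract_scalar(row["_airbyte_data"], "$.email"),
            "ingested_at": row["_airbyte_emitted_at"],
        }
        for row in raw_rows
    ]
```

Every extracted value is a string, which is why the workflow below includes a step for basic data type conversions.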
- Set Up Airbyte Source:
  - Configure the source in the Airbyte UI
  - Set the destination to BigQuery
  - Note the destination schema
- Update Environment Variables:
    DBT_PROJECT_ID=your-project-id
    AIRBYTE_SCHEMA=raw
    DBT_STAGING_SCHEMA=staging
- Create Base Models:
  - Parse JSON data from Airbyte
  - Use JSON_EXTRACT_SCALAR for BigQuery
  - Add basic data type conversions
- Create Staging Models:
  - Add business logic and cleaning
  - Apply standard naming conventions
  - Add data quality tests
- Build Marts:
  - Combine data from multiple sources
  - Create business-specific views
  - Optimize for analysis
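The staging and mart steps above are written as SQL models in this project. The Python sketch below only illustrates the shape of that logic under assumed rules. Staging renames keys to snake_case, normalizes emails and tags each row with its source. The mart then combines customers from several sources by email. The cleaning rules, the email-based matching and all function names are hypothetical choices made for illustration.

```python
import re
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def to_snake_case(name: str) -> str:
    """Standard column naming: 'customerId' -> 'customer_id', 'Customer ID' -> 'customer_id'."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.strip("_").lower()


def stage_customer(record: Row, source: str) -> Row:
    """Staging step: rename keys, trim and lower-case the email, tag the source system."""
    row: Row = {to_snake_case(key): value for key, value in record.items()}
    email = row.get("email")
    row["email"] = email.strip().lower() if email else None
    row["source"] = source
    return row


def combine_customers(*sources: Iterable[Row]) -> List[Dict[str, object]]:
    """Mart step: one entry per email with the sorted list of systems it appears in.
    Rows without an email are dropped (a simplifying assumption)."""
    seen: Dict[str, set] = {}
    for rows in sources:
        for row in rows:
            email = row.get("email")
            if email:
                seen.setdefault(email, set()).add(row["source"])
    return [{"email": email, "sources": sorted(srcs)} for email, srcs in sorted(seen.items())]
```

Normalizing emails in staging is what lets the mart match `A@Example.com` from one system to `a@example.com` from another.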
- Source Freshness:
    sources:
      - name: stripe
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
- Data Tests:
    models:
      - name: stg_stripe__customers
        columns:
          - name: customer_id
            tests:
              - unique
              - not_null
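The two configurations above can be read as simple rules. The Python sketch below illustrates them and does not reproduce how dbt runs its checks. A source is classified by the age of its newest `_airbyte_emitted_at` value against the 12-hour warn and 24-hour error thresholds. The `unique` and `not_null` tests pass when they find no offending rows, and nulls are ignored by `unique`. Whether dbt treats an age exactly equal to a threshold as stale is not settled here. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Hashable, Iterable, List, Optional

# Thresholds copied from the freshness configuration.
WARN_AFTER = timedelta(hours=12)   # warn_after: {count: 12, period: hour}
ERROR_AFTER = timedelta(hours=24)  # error_after: {count: 24, period: hour}


def freshness_status(max_loaded_at: datetime, now: datetime) -> str:
    """Classify a source by the age of its most recent load timestamp."""
    age = now - max_loaded_at
    if age > ERROR_AFTER:
        return "error"
    if age > WARN_AFTER:
        return "warn"
    return "pass"


def failing_unique(values: Iterable[Optional[Hashable]]) -> List[Hashable]:
    """Values that appear more than once; nulls are skipped."""
    counts = Counter(v for v in values if v is not None)
    return [v for v, n in counts.items() if n > 1]


def count_nulls(values: Iterable[Optional[Hashable]]) -> int:
    """Number of nulls; a not_null check passes when this is 0."""
    return sum(1 for v in values if v is None)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(freshness_status(now - timedelta(hours=13), now))  # warn
```

Both tests follow the same pattern as dbt's: they collect failures, and an empty result means the check passed.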
- Airbyte Sync Status:
  - Check the Airbyte UI for sync status
  - Monitor _airbyte_emitted_at for freshness
- dbt Run Status:
  - Use dbt source freshness
  - Check model test results
  - Use
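Test results can also be checked outside the terminal, because dbt writes JSON artifacts to the `target/` directory after a run. The sketch below assumes `target/run_results.json` has a top-level `results` list whose entries carry `unique_id` and `status` fields. Check these field names against the artifact schema of your dbt version. The sketch counts results by status and lists the nodes with problem statuses. Which statuses count as problems is an assumption you may want to adjust, and the helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List

# Statuses treated as problems in this sketch (an assumption; adjust as needed).
PROBLEM_STATUSES = {"fail", "error", "warn", "runtime error"}


def load_artifact(path: str = "target/run_results.json") -> Dict[str, Any]:
    """Read a dbt JSON artifact from the target/ directory."""
    return json.loads(Path(path).read_text())


def status_counts(artifact: Dict[str, Any]) -> Dict[str, int]:
    """Count results by their status field."""
    return dict(Counter(r.get("status", "unknown") for r in artifact.get("results", [])))


def problem_nodes(artifact: Dict[str, Any]) -> List[str]:
    """unique_ids of results whose status is in PROBLEM_STATUSES."""
    return [
        r.get("unique_id", "<unknown>")
        for r in artifact.get("results", [])
        if r.get("status") in PROBLEM_STATUSES
    ]


if __name__ == "__main__":
    artifact_path = Path("target/run_results.json")
    if artifact_path.exists():
        artifact = load_artifact(str(artifact_path))
        print(status_counts(artifact))
        print(problem_nodes(artifact))
```

Because these helpers take a plain dictionary, they can be tested on a hand-written sample without running dbt.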
Troubleshooting:
- Check error messages in Turntable.so
- Review Airbyte logs for sync issues
- Visit the dbt Discourse forum
- Create an issue in this repository
Contributing:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License:
MIT License. See the LICENSE file.
For detailed instructions on setting up your BigQuery connection, including OAuth authentication and testing, see our BigQuery Setup Guide.
This repository is maintained by Matt Strautmann, who works closely with founders and CEOs to use their data to improve their bottom line. Let me help you trust your data, know your customers, and improve your bottom line.
Starring this repository helps me understand which tools, templates, and projects bring the most value to the community. Your support motivates me to keep producing high-quality content and maintain these resources for everyone!
If this repository has helped you:
- Give it a ⭐ to show your appreciation!
- Share it with others who might find it useful.
I'd love to hear how you're using this repository or discuss how I can help with your next project. Let's connect:
- LinkedIn: Matt Strautmann
- GitHub: Matt Strautmann