Skip to content

Welcome to the dbt-BigQuery Quickstart Project! πŸŽ‰ This repository is designed as a hands-on guide to help you build a modern data stack leveraging powerful tools like Airbyte for ingestion, dbt for transformation, and BigQuery for storage and analytics.

Notifications You must be signed in to change notification settings

matt-strautmann/dbt-bigquery-ecommerce-quickstart

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

6 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸš€ Getting Started with dbt and Airbyte

Welcome to your modern data stack template! This project demonstrates how to build a scalable data warehouse using:

  • Airbyte for data ingestion
  • dbt for transformation
  • BigQuery for storage and computing

Prerequisites

  1. VSCode installed (Download here)
  2. Turntable.so extension installed in VSCode (Install here)
  3. Python installed (3.8 or higher)
  4. Google Cloud account with BigQuery enabled
  5. Airbyte instance set up with sources configured

πŸ—οΈ Project Structure

πŸ“‚ models/
β”œβ”€β”€ πŸ“ staging/              # πŸ› οΈ Raw data standardization
β”‚   β”œβ”€β”€ πŸ“ stg_stripe/       # πŸ’³ Payment processing
β”‚   β”‚   β”œβ”€β”€ πŸ“ base/         # πŸ“œ Raw JSON parsing
β”‚   β”‚   β”‚   └── πŸ“„ base_stripe__customers.sql
β”‚   β”‚   β”œβ”€β”€ πŸ“„ stg_stripe__customers.sql
β”‚   β”‚   └── πŸ“„ _stripe_sources.yml
β”‚   β”‚
β”‚   β”œβ”€β”€ πŸ“ stg_hubspot/      # πŸ“ˆ Marketing automation
β”‚   β”‚   β”œβ”€β”€ πŸ“ base/
β”‚   β”‚   └── πŸ“„ _hubspot_sources.yml
β”‚   β”‚
β”‚   └── πŸ“ stg_shopify/      # πŸ›’ E-commerce platform
β”‚       β”œβ”€β”€ πŸ“ base/
β”‚       └── πŸ“„ _shopify_sources.yml
β”‚
β”œβ”€β”€ πŸ“ intermediate/         # πŸ” Business logic layer
β”‚   β”œβ”€β”€ πŸ“ finance/
β”‚   β”œβ”€β”€ πŸ“ marketing/
β”‚   └── πŸ“ sales/
β”‚
└── πŸ“ marts/                # πŸ“Š Business-specific models
    β”œβ”€β”€ πŸ“ core/             # πŸ”‘ Core business entities
    β”œβ”€β”€ πŸ“ finance/          # πŸ’° Finance-specific models
    β”œβ”€β”€ πŸ“ marketing/        # πŸ“£ Marketing-specific models
    └── πŸ“ sales/            # πŸ›’ Sales-specific models

πŸ”„ Setup Instructions

  1. Clone the Repository:

    git clone https://github.com/yourusername/dbt-bigquery-quickstart-project.git
    cd dbt-bigquery-quickstart-project
  2. Set Up Environment Variables:

    cp .env.example .env
    # Edit .env with your configurations
  3. Configure dbt Profile:

    cp profiles.yml.example ~/.dbt/profiles.yml
    # Edit profiles.yml with your BigQuery details
  4. Install Dependencies:

    pip install dbt-core dbt-bigquery
    dbt deps

πŸ”Œ Airbyte Integration

This template is designed to work with Airbyte's BigQuery destination. Key points:

  1. Raw Data Structure:

    • Airbyte creates tables with prefix _airbyte_raw_
    • Data is stored in JSON format in _airbyte_data column
    • Each record has _airbyte_emitted_at timestamp
  2. Base Models:

    -- Example: models/staging/stg_stripe/base/base_stripe__customers.sql
    select 
        JSON_EXTRACT_SCALAR(_airbyte_data, '$.id') as customer_id,
        JSON_EXTRACT_SCALAR(_airbyte_data, '$.email') as email,
        _airbyte_emitted_at as ingested_at
    from {{ source('stripe', '_airbyte_raw_customers') }}
  3. Source Configuration:

    # Example: models/staging/stg_stripe/_stripe_sources.yml
    version: 2
    sources:
      - name: stripe
        database: "{{ env_var('DBT_PROJECT_ID') }}"
        schema: "{{ env_var('AIRBYTE_SCHEMA', 'raw') }}"
        loader: airbyte
        loaded_at_field: _airbyte_emitted_at
        tables:
          - name: _airbyte_raw_customers

🏭 Development Workflow

  1. Set Up Airbyte Source:

    • Configure source in Airbyte UI
    • Set destination to BigQuery
    • Note the destination schema
  2. Update Environment Variables:

    DBT_PROJECT_ID=your-project-id
    AIRBYTE_SCHEMA=raw
    DBT_STAGING_SCHEMA=staging
  3. Create Base Models:

    • Parse JSON data from Airbyte
    • Use JSON_EXTRACT_SCALAR for BigQuery
    • Add basic data type conversions
  4. Create Staging Models:

    • Add business logic and cleaning
    • Implement standard naming
    • Add data quality tests
  5. Build Marts:

    • Combine data from multiple sources
    • Create business-specific views
    • Optimize for analysis

πŸ“Š Data Quality

  1. Source Freshness:

    sources:
      - name: stripe
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
  2. Data Tests:

    models:
      - name: stg_stripe__customers
        columns:
          - name: customer_id
            tests:
              - unique
              - not_null

πŸ” Monitoring

  1. Airbyte Sync Status:

    • Check Airbyte UI for sync status
    • Monitor _airbyte_emitted_at for freshness
  2. dbt Run Status:

    • Use dbt source freshness
    • Check model test results

πŸ“š Learning Resources

  1. Airbyte Resources:

  2. dbt Resources:

  3. Community:

πŸ†˜ Need Help?

  1. Check error messages in Turntable.so
  2. Review Airbyte logs for sync issues
  3. Visit dbt Discourse
  4. Create an issue in this repository

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

πŸ“ License

MIT License - see LICENSE file

BigQuery Setup Guide

For detailed instructions on setting up your BigQuery connection, including OAuth authentication and testing, see our BigQuery Setup Guide.


⭐ Credits & Connect

πŸš€ About This Repository

This repository is maintained by Matt Strautmann, an experienced is working closely with Founder/CEOs to use your Data to improve your bottom line. Period. Let me help you trust your data. know your customer. improve your bottom line.

Why Star This Repository?

Starring this repository helps me understand which tools, templates, and projects bring the most value to the community. Your support motivates me to keep producing high-quality content and maintain these resources for everyone!

🌟 Support This Project

If this repository has helped you:

  1. Give it a ⭐ to show your appreciation!
  2. Share it with others who might find it useful.

🀝 Connect with Me

I’d love to hear how you’re using this repository or discuss how I can help with your next project. Let’s connect:


About

Welcome to the dbt-BigQuery Quickstart Project! πŸŽ‰ This repository is designed as a hands-on guide to help you build a modern data stack leveraging powerful tools like Airbyte for ingestion, dbt for transformation, and BigQuery for storage and analytics.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published