A modern web application for calculating combined AWS and Databricks pricing for your data engineering workloads.
- Real-time Pricing: Fetches live AWS pricing from the Vantage API and Databricks pricing from the public Databricks pricing API
- Smart Instance Selection: Searchable, categorized dropdown of 760+ AWS instance types
- Multiple Instance Support: Calculate costs for multiple instance types and configurations
- Flexible Configuration: Support for different compute types, plans, and regions
- Visual Analytics: Interactive charts and breakdowns of costs
- Export Capabilities: Export results to CSV or JSON formats
- Modern UI: Beautiful, responsive interface built with Streamlit
- Python 3.12+
- Vantage API token (for AWS pricing)
- Internet connection (for API calls)
1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd databricks_and_cloud_pricing
   ```
2. Install dependencies:
   ```bash
   uv sync
   ```
3. Set up environment variables:
   ```bash
   cp env.example .env
   ```
   Edit `.env` and add your Vantage API token:
   ```
   VANTAGE_API_TOKEN=your_vantage_api_token_here
   ```
- Visit Vantage
- Sign up for a free account
- Navigate to your API settings
- Generate a new API token
- Add the token to your `.env` file
```bash
streamlit run src/streamlit_app.py
```
or
```bash
uv run pricing-calculator
```
The application will open in your default browser at http://localhost:8501.
- Cloud Provider: Currently supports AWS (GCP and Azure coming soon)
- Region: Select your AWS region
- Databricks Plan: Choose Standard, Premium, or Enterprise
- Compute Type: Select Jobs Compute, All-Purpose Compute, SQL Compute, or ML Runtime
- Instance Type: Use the dropdown to search and select from 760+ available AWS instance types
- Type to search: Start typing to filter instance types
- Categories: View instance types by category (General Purpose, Compute Optimized, Memory Optimized, etc.)
- Instance Details: See detailed specifications when an instance type is selected
- Number of Instances: How many instances you'll run
- Hours per Run: How long each run lasts (max 168 hours = 1 week)
Click "Calculate Pricing" to get your results. The application will:
- Fetch AWS pricing from Vantage API
- Fetch Databricks pricing from their public API
- Calculate combined costs
- Display detailed breakdowns
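The combined cost reduces to a simple formula: instance-hours times the EC2 rate, plus instance-hours times the DBU consumption rate times the DBU price. A minimal sketch, assuming hypothetical function and parameter names (this is not the application's actual API, and the rates in the example are illustrative only):

```python
def combined_cost(ec2_hourly_usd, dbu_per_hour, dbu_price_usd,
                  num_instances, hours_per_run):
    """Estimate the combined AWS + Databricks cost for one run.

    ec2_hourly_usd -- on-demand EC2 price per instance-hour
    dbu_per_hour   -- DBUs consumed per instance-hour for this instance type
    dbu_price_usd  -- price per DBU for the chosen plan and compute type
    """
    if not 0 < hours_per_run <= 168:  # the UI caps a run at one week
        raise ValueError("hours_per_run must be between 0 and 168")
    instance_hours = num_instances * hours_per_run
    aws_cost = instance_hours * ec2_hourly_usd
    databricks_cost = instance_hours * dbu_per_hour * dbu_price_usd
    return {"aws": aws_cost,
            "databricks": databricks_cost,
            "total": aws_cost + databricks_cost}

# Example: 4 instances for 3 hours (all rates below are made up)
costs = combined_cost(0.192, 0.69, 0.20, num_instances=4, hours_per_run=3)
```

The AWS and Databricks components are kept separate so the UI can chart them side by side before summing.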
- Summary Metrics: Total costs, instance counts, and hours
- Cost Breakdown: Visual pie chart showing AWS vs Databricks costs
- Detailed Table: Complete breakdown of all costs
- Export Options: Download results as CSV or JSON
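The CSV and JSON exports can be sketched with the standard library alone; the row fields below are illustrative, and the real export helpers live in `src/utils.py` with a schema that may differ:

```python
import csv
import io
import json

def to_csv(rows):
    """Serialize a list of result dicts to CSV text (header from first row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Serialize the same rows to pretty-printed JSON."""
    return json.dumps(rows, indent=2)

rows = [{"instance_type": "m5.xlarge", "aws_cost": 2.30, "dbx_cost": 1.66}]
csv_text = to_csv(rows)
json_text = to_json(rows)
```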
Run the test suite:
```bash
uv run pytest tests/
```
Run with coverage:
```bash
uv run pytest tests/ --cov=src --cov-report=html
```
```
databricks_and_cloud_pricing/
├── src/
│   ├── __init__.py
│   ├── main.py             # Main Streamlit application
│   ├── config.py           # Configuration and constants
│   ├── api_client.py       # API clients for Vantage and Databricks
│   ├── calculator.py       # Core pricing calculation logic
│   ├── utils.py            # Utility functions for export and formatting
│   ├── streamlit_app.py    # Streamlit entry point
│   └── aws.json            # AWS instance type data from Databricks API
├── tests/
│   └── test_calculator.py  # Unit tests
├── pyproject.toml          # Project configuration
├── env.example             # Environment variables template
└── README.md               # This file
```
The application includes a comprehensive database of AWS instance types sourced from Databricks' pricing API. The data includes:
- 760+ Instance Types: Complete list of available AWS instance types
- Categorized Organization: Instance types organized by purpose:
- General Purpose (201 types)
- Compute Optimized (110 types)
- Memory Optimized (226 types)
- Storage Optimized (32 types)
- GPU Instances (20 types)
- Other (171 types)
- Detailed Specifications: Each instance type includes specifications like vCPU, memory, storage, and pricing information
- Search Functionality: Type to search and filter instance types quickly
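The type-to-search filtering can be sketched as a case-insensitive substring match over the loaded instance data. The entries below are a guess at the shape of `aws.json` for illustration only; the real schema may differ:

```python
# Hypothetical subset of the instance data loaded from aws.json.
INSTANCE_TYPES = [
    {"type": "m5.xlarge",  "category": "General Purpose",   "vcpu": 4, "memory_gb": 16},
    {"type": "c5.2xlarge", "category": "Compute Optimized", "vcpu": 8, "memory_gb": 16},
    {"type": "r5.large",   "category": "Memory Optimized",  "vcpu": 2, "memory_gb": 16},
]

def search_instances(query, category=None):
    """Case-insensitive substring match, optionally limited to one category."""
    query = query.lower()
    return [
        i for i in INSTANCE_TYPES
        if query in i["type"].lower()
        and (category is None or i["category"] == category)
    ]

matches = search_instances("m5")
```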
| Variable | Description | Default |
|---|---|---|
| `VANTAGE_API_TOKEN` | Your Vantage API token | Required |
| `DEFAULT_REGION` | Default AWS region | `us-east-1` |
| `DEFAULT_COMPUTE_TYPE` | Default Databricks compute type | `Jobs Compute` |
| `DEFAULT_PLAN` | Default Databricks plan | `Enterprise` |
- us-east-1 (N. Virginia)
- us-east-2 (Ohio)
- us-west-1 (N. California)
- us-west-2 (Oregon)
- eu-west-1 (Ireland)
- eu-central-1 (Frankfurt)
- ap-southeast-1 (Singapore)
- ap-northeast-1 (Tokyo)
- Standard: Core platform features
- Premium: Advanced features and priority support
- Enterprise: Full enterprise features and dedicated support
- Jobs Compute: For batch processing jobs
- All-Purpose Compute: For interactive notebooks and jobs
- SQL Compute: For SQL analytics workloads
- ML Runtime: For machine learning workloads
- Support for GCP and Azure
- Excel export functionality
- Cost optimization recommendations
- Historical pricing tracking
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues:
- Check that your Vantage API token is valid
- Ensure you have an internet connection for API calls
- Verify your instance type is supported
- Check the logs for detailed error messages
For additional help, please open an issue on GitHub.
- Vantage for providing AWS pricing data
- Databricks for their pricing APIs
- Streamlit for the web framework
- Plotly for interactive visualizations