An end-to-end machine learning pipeline for ABC Renewables to forecast revenue across multiple renewable energy sites. The pipeline includes synthetic data generation, model training, automated predictions, and visualization.
- Synthetic data generation for renewable energy sites (solar, wind, battery)
- Automated data processing and model training
- Multiple forecasting models (ARIMA, SARIMA, Prophet, Random Forest, LSTM)
- Interactive Streamlit dashboard with confidence intervals
- Full Azure Data Factory orchestration
- Application Insights monitoring
- Cost-optimized architecture (<$50/month)
graph LR
A[Data Generation] -->|Azure Data Lake| B[Data Processing]
B --> C[ML Training]
C --> D[Model Registry]
D --> E[Model Serving]
E --> F[Streamlit Dashboard]
G[Azure Data Factory] -->|Orchestration| A
G -->|Orchestration| B
G -->|Orchestration| C
H[App Insights] -->|Monitoring| A & B & C & D & E & F
/enterprise-revenue-prediction-pipeline
├── /data_generation/ # Synthetic data generation scripts
├── /data_processing/ # Data cleaning and preprocessing
├── /ml_training/ # Model training pipelines
├── /model_serving/ # Model deployment and serving
├── /streamlit_dashboard/ # Interactive visualization
├── /adf_pipelines/ # Azure Data Factory pipelines
├── /infra/ # Infrastructure as Code
│ ├── /bicep/ # Bicep IaC templates
│ ├── /scripts/ # Deployment scripts
│ └── /docs/ # Additional documentation
├── requirements.txt # Python dependencies
├── README.md # This file
└── .gitignore # Git ignore patterns
- Clone the repository:
git clone https://github.com/nerdy1texan/RevenuePred.git enterprise-revenue-prediction-pipeline
cd enterprise-revenue-prediction-pipeline
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # Linux/Mac
.\venv\Scripts\activate # Windows
# Alternative: Conda from Git Bash (adjust the path to your own Anaconda install)
source /c/Users/mauli/anaconda3/Scripts/activate
conda activate revenuepred
- Install dependencies:
pip install -r requirements.txt
- Set up Azure Infrastructure:
a. Install Prerequisites:
# Install Azure CLI (Debian/Ubuntu)
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Login to Azure
az login
# Install Bicep tools
az bicep install
b. Deploy Infrastructure:
cd infra/scripts
./deploy-infrastructure.sh
c. Set up RBAC:
./setup-rbac.sh
d. Verify Deployment:
./verify-deployment.sh
- Configure Environment:
The setup-rbac.sh script creates a .env file with all required credentials:
AZURE_STORAGE_CONNECTION_STRING=your_connection_string
AZURE_TENANT_ID=your_tenant_id
AZURE_CLIENT_ID=your_client_id
AZURE_CLIENT_SECRET=your_client_secret
APPINSIGHTS_CONNECTION_STRING=your_connection_string
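As a minimal sketch of how a script might check these settings before running, the snippet below parses KEY=VALUE lines and reports missing variables. The variable names come from the listing above; the helper names (`parse_env_file`, `missing_vars`, `load_settings`) are hypothetical and not part of this repository.

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_VARS = (
    "AZURE_STORAGE_CONNECTION_STRING",
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "APPINSIGHTS_CONNECTION_STRING",
)


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only: connection strings contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required names that are absent or empty, in declared order."""
    return [name for name in required if not values.get(name)]


def load_settings(path=".env"):
    """Merge the .env file (if present) with the process environment."""
    values = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            values = parse_env_file(handle.read())
    # Real environment variables take precedence over the file.
    values.update({k: v for k, v in os.environ.items() if k in REQUIRED_VARS})
    missing = missing_vars(values)
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return values
```

Calling `load_settings()` at startup fails fast with a list of the missing names, which is usually easier to debug than an authentication error later in the pipeline.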
- Data Generation
  - Generate synthetic data for renewable sites
  - Store in Azure Data Lake
- Data Processing
  - Clean and preprocess data
  - Feature engineering
  - Data validation
- Model Training
  - Train multiple forecasting models
  - Model evaluation and selection
  - Model registration
- Model Serving
  - Deploy best performing model
  - Real-time predictions
  - Batch predictions
- Visualization
  - Interactive Streamlit dashboard
  - Revenue forecasts with confidence intervals
  - Performance metrics
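The "model evaluation and selection" step can be illustrated with a small holdout comparison. This is only a sketch: the three baseline forecasters below are simple stand-ins, not the ARIMA, SARIMA, Prophet, Random Forest, or LSTM models the pipeline trains, and the choice of mean absolute error on a 7-day holdout is an assumption.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season=7):
    """Repeat the value observed one season earlier (needs >= season points)."""
    return [history[-season + (i % season)] for i in range(horizon)]


def moving_average_forecast(history, horizon, window=7):
    """Repeat the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def select_best_model(series, candidates, holdout=7):
    """Score each candidate on the last `holdout` points; lowest MAE wins."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mae(test, fn(train, len(test))) for name, fn in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Made-up daily revenue with a weekly pattern, for illustration only.
    revenue = [100, 120, 130, 125, 110, 60, 50] * 4
    candidates = {
        "naive": naive_forecast,
        "seasonal_naive": seasonal_naive_forecast,
        "moving_average": moving_average_forecast,
    }
    best, scores = select_best_model(revenue, candidates)
    print(best, scores)
```

The same pattern extends to real models by wrapping each one in a function that takes a training history and a horizon and returns a list of forecasts.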
{
'Date': 'datetime64[ns]',
'SiteID': 'string',
'SiteName': 'string',
'SiteType': 'string', # solar, wind, battery
'EnergyProduced_kWh': 'float64',
'SpotMarketPrice': 'float64',
'Revenue': 'float64',
'WeatherCondition': 'string',
'DowntimeHours': 'float64',
'Temperature_C': 'float64',
'WindSpeed_mps': 'float64'
}
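To make the schema concrete, here is a hedged sketch of generating synthetic rows with these columns using only the standard library. The value ranges, the weather labels, the daily capacity, and the relation Revenue = EnergyProduced_kWh × SpotMarketPrice (price per kWh) are all assumptions for illustration; the repository's actual generator may differ.

```python
import random
from datetime import date, timedelta

SCHEMA_COLUMNS = (
    "Date", "SiteID", "SiteName", "SiteType", "EnergyProduced_kWh",
    "SpotMarketPrice", "Revenue", "WeatherCondition", "DowntimeHours",
    "Temperature_C", "WindSpeed_mps",
)
SITE_TYPES = ("solar", "wind", "battery")
WEATHER = ("sunny", "cloudy", "windy", "rainy")  # hypothetical labels


def generate_rows(site_id, site_name, site_type, start, days,
                  capacity_kwh=10_000.0, seed=0):
    """Return one dict per day with the schema's columns (assumed ranges)."""
    if site_type not in SITE_TYPES:
        raise ValueError(f"unknown site type: {site_type}")
    rng = random.Random(seed)  # seeded so runs are reproducible
    rows = []
    for offset in range(days):
        # Assume roughly one day in four has some downtime.
        downtime = round(rng.choice([0.0, 0.0, 0.0, rng.uniform(0.5, 6.0)]), 2)
        energy = round(capacity_kwh * rng.uniform(0.3, 1.0) * (1 - downtime / 24), 2)
        price = round(rng.uniform(0.03, 0.12), 4)  # assumed price per kWh
        rows.append({
            "Date": start + timedelta(days=offset),
            "SiteID": site_id,
            "SiteName": site_name,
            "SiteType": site_type,
            "EnergyProduced_kWh": energy,
            "SpotMarketPrice": price,
            "Revenue": round(energy * price, 2),  # assumed relation
            "WeatherCondition": rng.choice(WEATHER),
            "DowntimeHours": downtime,
            "Temperature_C": round(rng.uniform(-5.0, 35.0), 1),
            "WindSpeed_mps": round(rng.uniform(0.0, 20.0), 1),
        })
    return rows


if __name__ == "__main__":
    for row in generate_rows("S001", "Demo Solar", "solar", date(2024, 1, 1), 3):
        print(row)
```

A list of dicts like this can be passed to `pandas.DataFrame` and cast to the dtypes listed above.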
- Infrastructure as Code (Bicep templates):
  - main.bicep: Orchestrates all resource deployments
  - storage.bicep: Data Lake Storage Gen2 with lifecycle management
  - aml.bicep: Azure ML workspace with compute cluster
  - adf.bicep: Data Factory with managed vnet
  - swa.bicep: Static Web Apps for dashboard
  - monitoring.bicep: Application Insights with alerts
- Deployment Scripts:
  - deploy-infrastructure.sh: Main deployment script
  - setup-rbac.sh: RBAC and service principal setup
  - verify-deployment.sh: Deployment verification
- Cost Optimization:
  - Storage: Uses lifecycle management to move old data to cool tier
  - AML: Auto-shutdown for compute instances
  - ADF: Uses managed vnet for cost-effective integration
  - App Insights: Daily cap of 1GB
  - Budget alerts at 80% threshold
  - Estimated monthly cost: $18-30
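The 80% budget alert is simple arithmetic, sketched below. The $50 budget is an assumption taken from the stated monthly cost target, and the linear month-end projection is an illustrative addition; the function name `budget_status` is hypothetical and does not correspond to an Azure API.

```python
def budget_status(spend_to_date, day_of_month, days_in_month,
                  budget=50.0, threshold=0.8):
    """Compare spend with the alert level and project month-end spend.

    budget: assumed monthly budget in USD (the stated <$50 target).
    threshold: fraction of the budget at which an alert fires (80%).
    """
    alert_at = budget * threshold
    # Linear projection: assume the daily spend rate so far continues.
    projected = spend_to_date / day_of_month * days_in_month
    return {
        "alert_at": alert_at,
        "projected": projected,
        "alert": spend_to_date >= alert_at,
        "projected_over_budget": projected > budget,
    }


if __name__ == "__main__":
    print(budget_status(spend_to_date=12.0, day_of_month=10, days_in_month=30))
```

With a $50 budget the alert level is $40, so the estimated $18-30 monthly cost stays below it.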
- Security Features:
  - RBAC with least privilege access
  - Service Principal for automation
  - Managed identities for services
  - Secure storage access
  - Application Insights for monitoring
Cost controls:
- Uses Azure free tiers where possible
- Automated resource scaling
- Efficient data storage patterns
- Budget alerts and controls
- Target monthly cost: <$50
Monitoring:
- Application Insights integration
- Custom metrics and KPIs
- Automated alerts
- Performance monitoring
- Cost tracking
Run tests:
pytest tests/
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request