Production-Grade Machine Learning Service - A reference implementation demonstrating the complete transformation from research notebooks to scalable, maintainable production APIs with multiple model deployment strategies.
Part of AI Enhanced Engineer - Exploring production patterns for ML systems at scale.
In "Hidden Technical Debt in Machine Learning Systems"1, Sculley et al. from Google revealed that ML code comprises less than 5% of real ML systemsβthe remaining 95% involves configuration, monitoring, serving infrastructure, and data verification. This repository bridges that gap by demonstrating the complete transformation from experimental notebooks to production-grade services.
The journey from a Jupyter notebook to a production API involves fundamental architectural transformations. Your experimental code becomes a distributed system requiring configuration management, dependency injection, error handling, monitoring, and deployment strategies [2]. This repository shows you exactly how to make that transformation while maintaining model performance and adding production resilience.
Everyone celebrates achieving high accuracy in notebooks, but production is where ML systems prove their value. Through deployment experience serving millions of predictions daily, we've learned that the hardest challenges aren't about model accuracy; they're about operational excellence [3].
Consider the iris classification problem demonstrated here. Our research revealed that simple heuristics match sophisticated ensemble methods in accuracy. But in production, models must handle malformed inputs, API rate limits, deployment rollbacks, A/B testing, monitoring alerts, and configuration changes, all while maintaining sub-10ms latency [4].
This repository shows you how. We've transformed a classical ML problem into a production system demonstrating patterns applicable to fraud detection, customer segmentation, quality control, or any feature-based classification task.
Our architecture supports seamless runtime switching between four distinct models through environment configuration, enabling A/B testing and gradual rollouts without code changes:
Model | Accuracy | Latency | Memory | Interpretability | Production Use Case |
---|---|---|---|---|---|
Heuristic | 96.0% | <1ms | 10MB | High (rules) | Real-time, regulated industries |
Decision Tree | 96.7% | <1ms | 15MB | High (tree viz) | Explainable AI requirements |
Random Forest | 96.0% | ~2ms | 50MB | Medium | Balanced performance |
XGBoost | 96.0% | ~3ms | 75MB | Low | High-throughput systems |
The Factory pattern combined with configuration-driven design enables this flexibility:
# Switch models via environment variable
MPS_MODEL_TYPE=heuristic make api-run # Development
MPS_MODEL_TYPE=decision_tree make api-run # Staging
MPS_MODEL_TYPE=random_forest make api-run # Production
Following SOLID principles [5], our architecture ensures maintainability and testability:
from abc import ABC, abstractmethod

# Abstract interface for all predictors
class BasePredictor(ABC):
    @abstractmethod
    def predict(self, measurements: IrisMeasurements) -> str:
        pass

# Factory pattern for model instantiation
def get_predictor(config: ServiceConfig) -> BasePredictor:
    match config.model_type:
        case ModelType.HEURISTIC:
            return HeuristicPredictor()
        case ModelType.RANDOM_FOREST:
            return MLModelPredictor(model_path=config.get_model_path())
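On the configuration side, the `MPS_`-prefixed variables can be loaded into a small settings object. A minimal sketch, assuming pydantic-settings is used (the actual `ServiceConfig` and `ModelType` in configs.py may differ):

```python
# Illustrative sketch only: a settings object that reads MPS_-prefixed
# environment variables; field names and defaults are assumptions.
from enum import Enum

from pydantic_settings import BaseSettings, SettingsConfigDict


class ModelType(str, Enum):
    HEURISTIC = "heuristic"
    DECISION_TREE = "decision_tree"
    RANDOM_FOREST = "random_forest"
    XGBOOST = "xgboost"


class ServiceConfig(BaseSettings):
    # env_prefix maps MPS_MODEL_TYPE -> model_type; protected_namespaces=()
    # silences pydantic's warning about fields starting with "model_".
    model_config = SettingsConfigDict(env_prefix="MPS_", protected_namespaces=())

    model_type: ModelType = ModelType.HEURISTIC


config = ServiceConfig()           # reads MPS_MODEL_TYPE at startup
predictor = get_predictor(config)  # factory resolves the matching implementation
```

Because model selection lives entirely in configuration, promoting a different model is an environment change and a restart, not a code change.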
97% test coverage with parametrized testing across all models ensures consistent behavior:
@pytest.mark.parametrize("model_type", [
    ModelType.HEURISTIC, ModelType.DECISION_TREE,
    ModelType.RANDOM_FOREST, ModelType.XGBOOST,
])
async def test_prediction_endpoint(model_type):
    # Ensures consistent API behavior across implementations
    ...
Production observability includes structured logging, health checks, latency tracking, and comprehensive error handling [6].
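A hedged sketch of what that instrumentation can look like with FastAPI middleware (handler and logger names here are illustrative, not the repository's actual code):

```python
# Illustrative only: latency tracking, structured log fields, and a health
# check endpoint; the real application wiring may differ.
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("ml_production_service")
app = FastAPI()


@app.middleware("http")
async def track_latency(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Structured fields keep latency and status queryable in log aggregators.
    logger.info(
        "request_completed",
        extra={"path": request.url.path, "status": response.status_code,
               "latency_ms": round(elapsed_ms, 2)},
    )
    return response


@app.get("/health")
async def health() -> dict[str, str]:
    # Liveness probe target for the /health endpoint shown in the quick start.
    return {"status": "ok"}
```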
- Python 3.10-3.12
- `uv` package manager or standard `pip`
- Docker (optional, for containerized deployment)
- Make (for automation commands)
# Clone and setup
git clone https://github.com/leogarciavargas/ml-production-service
cd ml-production-service
make environment-create
# Validate setup
make validate-branch
# Start API with chosen model
MPS_MODEL_TYPE=decision_tree make api-run
# Access endpoints
# API: http://localhost:8000
# Docs: http://localhost:8000/docs
# Health: http://localhost:8000/health
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"sepal_length": 5.1, "sepal_width": 3.5,
"petal_length": 1.4, "petal_width": 0.2}'
# Response: {"prediction": "setosa"}
# Quick start (build + run)
MPS_MODEL_TYPE=random_forest make service-quick-start
# Or step by step
make service-build
make service-start
make service-validate
make service-stop
Deploy multiple model variants simultaneously:
# docker-compose.yml
services:
  model-a:
    image: ml-production-service:latest
    environment:
      - MPS_MODEL_TYPE=random_forest
    ports:
      - "8000:8000"
  model-b:
    image: ml-production-service:latest
    environment:
      - MPS_MODEL_TYPE=decision_tree
    ports:
      - "8001:8000"
# Google Cloud Run
gcloud run deploy ml-production-service \
--image gcr.io/your-project/ml-production-service \
--set-env-vars MPS_MODEL_TYPE=random_forest \
--memory 512Mi --cpu 1
# AWS ECS
aws ecs create-service \
--service-name ml-production-service \
--task-definition ml-production-service:latest \
--desired-count 3
# Kubernetes
kubectl apply -f k8s/deployment.yaml
kubectl set env deployment/ml-production-service MPS_MODEL_TYPE=decision_tree
Variable | Description | Default | Options |
---|---|---|---|
`MPS_MODEL_TYPE` | Model selection | `heuristic` | `heuristic`, `decision_tree`, `random_forest`, `xgboost` |
`MPS_LOG_LEVEL` | Logging verbosity | `INFO` | `DEBUG`, `INFO`, `WARNING`, `ERROR` |
`MPS_API_HOST` | API bind address | `0.0.0.0` | Any valid host |
`MPS_API_PORT` | API port | `8000` | Any valid port |
# Environment Management
make environment-create # First-time setup
make environment-sync # Update dependencies
# Code Quality
make format # Auto-format with Ruff
make lint # Linting checks
make type-check # MyPy validation
make validate-branch # All quality checks
# Testing
make unit-test # Unit tests only
make functional-test # Functional tests
make all-test # Complete test suite
make all-test-validate-branch # Tests + quality
# API Development
make api-run # Start dev server
make api-validate # Test running API
# Research & Training
make eval-heuristic # Evaluate baseline
make train-decision-tree # Train decision tree
make train-random-forest # Train random forest
make train-xgboost # Train XGBoost
Our research follows a production-driven model selection approach:
- Baseline Establishment: Rule-based heuristic as performance floor
- Progressive Complexity: Systematic evaluation from simple to complex
- Multiple Validation: LOOCV, k-fold CV, OOB estimation
- Production Metrics: Beyond accuracy: latency, interpretability, maintenance
Research insights that shaped our architecture:
- Accuracy Ceiling: All models plateau at 96-97% (data limitation)
- Validation Impact: LOOCV vs. split validation shows a 5.6% difference (illustrated in the sketch after this list)
- Feature Engineering: Derived features (petal_area) improve tree models
- Complexity ROI: Diminishing returns beyond decision trees
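To illustrate the validation-impact point, the following sketch (assuming scikit-learn; exact numbers will differ from the reported results) contrasts LOOCV with a single hold-out split on the same 150-sample dataset:

```python
# Illustrative comparison of leave-one-out CV vs. a single hold-out split;
# on small datasets the two estimates can diverge noticeably.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3, random_state=42)

# Leave-one-out: 150 fits, one per sample -- an almost unbiased estimate.
loocv_acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

# Single 70/30 split: cheaper, but a noisier estimate on only 150 samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
split_acc = model.fit(X_tr, y_tr).score(X_te, y_te)

print(f"LOOCV accuracy: {loocv_acc:.3f}, single-split accuracy: {split_acc:.3f}")
```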
Research Phase → Evaluation → Promotion → Deployment
research/models/ → research/results/ → registry/prd/ → production
Complete documentation available in research/EXPERIMENTS_JOURNEY.md and individual experiment reports in research/experiments/.
ml-production-service/
├── ml_production_service/    # Production service
│   ├── predictors/           # Model implementations
│   ├── server/               # FastAPI application
│   ├── configs.py            # Configuration
│   └── factory.py            # Dependency injection
│
├── research/                 # Experimentation
│   ├── experiments/          # Training code
│   ├── models/               # Trained artifacts
│   └── results/              # Performance metrics
│
├── registry/prd/             # Production models
├── tests/                    # Test suite (97% coverage)
├── Dockerfile                # Container definition
└── Makefile                  # Automation commands
- Create a predictor class implementing `BasePredictor` (see the sketch below)
- Register it in the `ModelType` enum
- Add it to the factory function
- Write comprehensive tests
See CLAUDE.md for a detailed guide.
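As a reference point, a new rule-based predictor might look like the following sketch (class name, thresholds, and import paths are hypothetical; follow the actual interface in ml_production_service/predictors/):

```python
# Hypothetical example of step 1; the import paths below are assumptions.
# from ml_production_service.predictors import BasePredictor
# from ml_production_service.server.schemas import IrisMeasurements


class PetalRatioPredictor(BasePredictor):
    """Classifies iris species from petal dimensions with simple thresholds."""

    def predict(self, measurements: IrisMeasurements) -> str:
        if measurements.petal_length < 2.0:
            return "setosa"
        ratio = measurements.petal_length / max(measurements.petal_width, 1e-6)
        return "versicolor" if ratio > 3.0 else "virginica"

# Steps 2-3 (sketch): add a PETAL_RATIO member to the ModelType enum and a
# matching `case ModelType.PETAL_RATIO: return PetalRatioPredictor()` branch
# in get_predictor() so the model becomes selectable via MPS_MODEL_TYPE.
```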
We welcome contributions that strengthen production ML patterns:
- New model implementations following the `BasePredictor` interface
- Production patterns (monitoring, deployment strategies)
- Performance optimizations
- Testing strategies for ML systems
Fork → Branch → Test (maintain 97% coverage) → PR with clear description
- Hidden Technical Debt in Machine Learning Systems - Sculley et al., NeurIPS 2015
- Rules of Machine Learning - Martin Zinkevich, Google
- MLOps: Continuous delivery and automation pipelines - Google Cloud
[1] Sculley, D., et al. (2015). "Hidden Technical Debt in Machine Learning Systems." NeurIPS 2015.
[2] Polyzotis, N., et al. (2017). "Data Management Challenges in Production Machine Learning." SIGMOD 2017.
[3] Breck, E., et al. (2017). "The ML Test Score: A Rubric for ML Production Readiness." IEEE Big Data 2017.
[4] Shankar, S., et al. (2024). "Operationalizing Machine Learning: An Interview Study."
[5] Martin, R. C. (2017). Clean Architecture: A Craftsman's Guide to Software Structure and Design.
[6] Huyen, C. (2022). Designing Machine Learning Systems. O'Reilly.
Apache License 2.0 - See LICENSE file for details.
Ready to deploy production ML? Start with `make environment-create` and have your first model API running in under 2 minutes.
From research notebooks to production APIs. For ML engineers shipping real systems.