Production-Grade Machine Learning Service - A reference implementation demonstrating the complete transformation from research notebooks to scalable, maintainable production APIs with multiple model deployment strategies.
Part of AI Enhanced Engineer - Exploring production patterns for ML systems at scale.
In "Hidden Technical Debt in Machine Learning Systems"1, Sculley et al. from Google revealed that ML code comprises less than 5% of real ML systemsβthe remaining 95% involves configuration, monitoring, serving infrastructure, and data verification. This repository bridges that gap by demonstrating the complete transformation from experimental notebooks to production-grade services.
The journey from a Jupyter notebook to a production API involves fundamental architectural transformations. Your experimental code becomes a distributed system requiring configuration management, dependency injection, error handling, monitoring, and deployment strategies [2]. This repository shows you exactly how to make that transformation while maintaining model performance and adding production resilience.
Everyone celebrates achieving high accuracy in notebooks, but production is where ML systems prove their value. Through deployment experience serving millions of predictions daily, we've learned that the hardest challenges aren't about model accuracy; they're about operational excellence [3].
Consider the iris classification problem demonstrated here. Our research revealed that simple heuristics match sophisticated ensemble methods in accuracy. But in production, models must handle malformed inputs, API rate limits, deployment rollbacks, A/B testing, monitoring alerts, and configuration changes, all while maintaining sub-10ms latency [4].
This repository shows you how. We've transformed a classical ML problem into a production system demonstrating patterns applicable to fraud detection, customer segmentation, quality control, or any feature-based classification task.
Our architecture supports seamless runtime switching between four distinct models through environment configuration, enabling A/B testing and gradual rollouts without code changes:
Model | Accuracy | Latency | Memory | Interpretability | Production Use Case |
---|---|---|---|---|---|
Heuristic | 96.0% | <1ms | 10MB | High (rules) | Real-time, regulated industries |
Decision Tree | 96.7% | <1ms | 15MB | High (tree viz) | Explainable AI requirements |
Random Forest | 96.0% | ~2ms | 50MB | Medium | Balanced performance |
XGBoost | 96.0% | ~3ms | 75MB | Low | High-throughput systems |
The Factory pattern combined with configuration-driven design enables this flexibility:
# Switch models via environment variable
MPS_MODEL_TYPE=heuristic make api-run # Development
MPS_MODEL_TYPE=decision_tree make api-run # Staging
MPS_MODEL_TYPE=random_forest make api-run # Production
Following SOLID principles [5], our architecture ensures maintainability and testability:
from abc import ABC, abstractmethod

# Abstract interface for all predictors
class BasePredictor(ABC):
    @abstractmethod
    def predict(self, measurements: IrisMeasurements) -> str:
        pass

# Factory pattern for model instantiation
def get_predictor(config: ServiceConfig) -> BasePredictor:
    match config.model_type:
        case ModelType.HEURISTIC:
            return HeuristicPredictor()
        case ModelType.RANDOM_FOREST:
            return MLModelPredictor(model_path=config.get_model_path())
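On the configuration side, the `MPS_`-prefixed variables can be loaded into a small settings object. A minimal sketch, assuming pydantic-settings is used (the actual `ServiceConfig` and `ModelType` in configs.py may differ):

```python
# Illustrative sketch only: a settings object that reads MPS_-prefixed
# environment variables; field names and defaults are assumptions.
from enum import Enum

from pydantic_settings import BaseSettings, SettingsConfigDict


class ModelType(str, Enum):
    HEURISTIC = "heuristic"
    DECISION_TREE = "decision_tree"
    RANDOM_FOREST = "random_forest"
    XGBOOST = "xgboost"


class ServiceConfig(BaseSettings):
    # env_prefix maps MPS_MODEL_TYPE -> model_type; protected_namespaces=()
    # silences pydantic's warning about fields starting with "model_".
    model_config = SettingsConfigDict(env_prefix="MPS_", protected_namespaces=())

    model_type: ModelType = ModelType.HEURISTIC


config = ServiceConfig()           # reads MPS_MODEL_TYPE at startup
predictor = get_predictor(config)  # factory resolves the matching implementation
```

Because model selection lives entirely in configuration, promoting a different model is an environment change and a restart, not a code change.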
97% test coverage with parametrized testing across all models ensures consistent behavior:
@pytest.mark.parametrize("model_type", [
    ModelType.HEURISTIC, ModelType.DECISION_TREE,
    ModelType.RANDOM_FOREST, ModelType.XGBOOST,
])
async def test_prediction_endpoint(model_type):
    # Ensures consistent API behavior across implementations
    ...
Production observability includes structured logging, health checks, latency tracking, and comprehensive error handling [6].
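A hedged sketch of what that instrumentation can look like with FastAPI middleware (handler and logger names here are illustrative, not the repository's actual code):

```python
# Illustrative only: latency tracking, structured log fields, and a health
# check endpoint; the real application wiring may differ.
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("ml_production_service")
app = FastAPI()


@app.middleware("http")
async def track_latency(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Structured fields keep latency and status queryable in log aggregators.
    logger.info(
        "request_completed",
        extra={"path": request.url.path, "status": response.status_code,
               "latency_ms": round(elapsed_ms, 2)},
    )
    return response


@app.get("/health")
async def health() -> dict[str, str]:
    # Liveness probe target for the /health endpoint shown in the quick start.
    return {"status": "ok"}
```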
- Python 3.10-3.12
- `uv` package manager or standard `pip`
- Docker (optional, for containerized deployment)
- Make (for automation commands)
# Clone and setup
git clone https://github.com/leogarciavargas/ml-production-service
cd ml-production-service
make environment-create
# Validate setup
make validate-branch
# Start API with chosen model
MPS_MODEL_TYPE=decision_tree make api-run
# Access endpoints
# API: http://localhost:8000
# Docs: http://localhost:8000/docs
# Health: http://localhost:8000/health
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"sepal_length": 5.1, "sepal_width": 3.5,
"petal_length": 1.4, "petal_width": 0.2}'
# Response: {"prediction": "setosa"}
# Quick start (build + run)
MPS_MODEL_TYPE=random_forest make service-quick-start
# Or step by step
make service-build
make service-start
make service-validate
make service-stop
Deploy multiple model variants simultaneously:
# docker-compose.yml
services:
  model-a:
    image: ml-production-service:latest
    environment:
      - MPS_MODEL_TYPE=random_forest
    ports:
      - "8000:8000"
  model-b:
    image: ml-production-service:latest
    environment:
      - MPS_MODEL_TYPE=decision_tree
    ports:
      - "8001:8000"
# Google Cloud Run
gcloud run deploy ml-production-service \
--image gcr.io/your-project/ml-production-service \
--set-env-vars MPS_MODEL_TYPE=random_forest \
--memory 512Mi --cpu 1
# AWS ECS
aws ecs create-service \
--service-name ml-production-service \
--task-definition ml-production-service:latest \
--desired-count 3
# Kubernetes
kubectl apply -f k8s/deployment.yaml
kubectl set env deployment/ml-production-service MPS_MODEL_TYPE=decision_tree
Variable | Description | Default | Options |
---|---|---|---|
`MPS_MODEL_TYPE` | Model selection | `heuristic` | `heuristic`, `decision_tree`, `random_forest`, `xgboost` |
`MPS_LOG_LEVEL` | Logging verbosity | `INFO` | `DEBUG`, `INFO`, `WARNING`, `ERROR` |
`MPS_API_HOST` | API bind address | `0.0.0.0` | Any valid host |
`MPS_API_PORT` | API port | `8000` | Any valid port |
# Environment Management
make environment-create # First-time setup
make environment-sync # Update dependencies
# Code Quality
make format # Auto-format with Ruff
make lint # Linting checks
make type-check # MyPy validation
make validate-branch # All quality checks
# Testing
make unit-test # Unit tests only
make functional-test # Functional tests
make all-test # Complete test suite
make all-test-validate-branch # Tests + quality
# API Development
make api-run # Start dev server
make api-validate # Test running API
# Research & Training
make eval-heuristic # Evaluate baseline
make train-decision-tree # Train decision tree
make train-random-forest # Train random forest
make train-xgboost # Train XGBoost
Our research follows a production-driven model selection approach:
- Baseline Establishment: Rule-based heuristic as performance floor
- Progressive Complexity: Systematic evaluation from simple to complex
- Multiple Validation: LOOCV, k-fold CV, OOB estimation
- Production Metrics: Beyond accuracy: latency, interpretability, maintenance
Research insights that shaped our architecture:
- Accuracy Ceiling: All models plateau at 96-97% (data limitation)
- Validation Impact: LOOCV vs. split validation shows a 5.6% difference (illustrated in the sketch after this list)
- Feature Engineering: Derived features (petal_area) improve tree models
- Complexity ROI: Diminishing returns beyond decision trees
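To illustrate the validation-impact point, the following sketch (assuming scikit-learn; exact numbers will differ from the reported results) contrasts LOOCV with a single hold-out split on the same 150-sample dataset:

```python
# Illustrative comparison of leave-one-out CV vs. a single hold-out split;
# on small datasets the two estimates can diverge noticeably.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3, random_state=42)

# Leave-one-out: 150 fits, one per sample -- an almost unbiased estimate.
loocv_acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

# Single 70/30 split: cheaper, but a noisier estimate on only 150 samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
split_acc = model.fit(X_tr, y_tr).score(X_te, y_te)

print(f"LOOCV accuracy: {loocv_acc:.3f}, single-split accuracy: {split_acc:.3f}")
```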
Research Phase → Evaluation → Promotion → Deployment
research/models/ → research/results/ → registry/prd/ → production
Complete documentation available in research/EXPERIMENTS_JOURNEY.md and individual experiment reports in research/experiments/.
ml-production-service/
├── ml_production_service/    # Production service
│   ├── predictors/           # Model implementations
│   ├── server/               # FastAPI application
│   ├── configs.py            # Configuration
│   └── factory.py            # Dependency injection
│
├── research/                 # Experimentation
│   ├── experiments/          # Training code
│   ├── models/               # Trained artifacts
│   └── results/              # Performance metrics
│
├── registry/prd/             # Production models
├── tests/                    # Test suite (97% coverage)
├── Dockerfile                # Container definition
└── Makefile                  # Automation commands
- Create a predictor class implementing `BasePredictor` (see the sketch below)
- Register it in the `ModelType` enum
- Add it to the factory function
- Write comprehensive tests
See CLAUDE.md for a detailed guide.
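As a reference point, a new rule-based predictor might look like the following sketch (class name, thresholds, and import paths are hypothetical; follow the actual interface in ml_production_service/predictors/):

```python
# Hypothetical example of step 1; the import paths below are assumptions.
# from ml_production_service.predictors import BasePredictor
# from ml_production_service.server.schemas import IrisMeasurements


class PetalRatioPredictor(BasePredictor):
    """Classifies iris species from petal dimensions with simple thresholds."""

    def predict(self, measurements: IrisMeasurements) -> str:
        if measurements.petal_length < 2.0:
            return "setosa"
        ratio = measurements.petal_length / max(measurements.petal_width, 1e-6)
        return "versicolor" if ratio > 3.0 else "virginica"

# Steps 2-3 (sketch): add a PETAL_RATIO member to the ModelType enum and a
# matching `case ModelType.PETAL_RATIO: return PetalRatioPredictor()` branch
# in get_predictor() so the model becomes selectable via MPS_MODEL_TYPE.
```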
We welcome contributions that strengthen production ML patterns:
- New model implementations following the `BasePredictor` interface
- Production patterns (monitoring, deployment strategies)
- Performance optimizations
- Testing strategies for ML systems
Fork → Branch → Test (maintain 97% coverage) → PR with clear description
- Hidden Technical Debt in Machine Learning Systems - Sculley et al., NeurIPS 2015
- Rules of Machine Learning - Martin Zinkevich, Google
- MLOps: Continuous delivery and automation pipelines - Google Cloud
[1] Sculley, D., et al. (2015). "Hidden Technical Debt in Machine Learning Systems." NeurIPS 2015.
[2] Polyzotis, N., et al. (2017). "Data Management Challenges in Production Machine Learning." SIGMOD 2017.
[3] Breck, E., et al. (2017). "The ML Test Score: A Rubric for ML Production Readiness." IEEE Big Data 2017.
[4] Shankar, S., et al. (2024). "Operationalizing Machine Learning: An Interview Study."
[5] Martin, R. C. (2017). Clean Architecture: A Craftsman's Guide to Software Structure and Design.
[6] Huyen, C. (2022). Designing Machine Learning Systems. O'Reilly.
Apache License 2.0 - See LICENSE file for details.
Ready to deploy production ML? Start with `make environment-create` and have your first model API running in under 2 minutes.
From research notebooks to production APIs. For ML engineers shipping real systems.