
ML Production Service

Production-Grade Machine Learning Service - A reference implementation demonstrating the complete transformation from research notebooks to scalable, maintainable production APIs with multiple model deployment strategies.

📚 Part of AI Enhanced Engineer - Exploring production patterns for ML systems at scale.

πŸ—οΈ The Research-to-Production Pipeline

In "Hidden Technical Debt in Machine Learning Systems" [1], Sculley et al. from Google revealed that ML code comprises less than 5% of real ML systems; the remaining 95% involves configuration, monitoring, serving infrastructure, and data verification. This repository bridges that gap by demonstrating the complete transformation from experimental notebooks to production-grade services.

The journey from a Jupyter notebook to a production API involves fundamental architectural transformations. Your experimental code becomes a distributed system requiring configuration management, dependency injection, error handling, monitoring, and deployment strategies [2]. This repository shows you exactly how to make that transformation while maintaining model performance and adding production resilience.

💡 The Reality of Production ML Systems

Everyone celebrates achieving high accuracy in notebooks, but production is where ML systems prove their value. Through deployment experience serving millions of predictions daily, we've learned that the hardest challenges aren't about model accuracy; they're about operational excellence [3].

Consider the iris classification problem demonstrated here. Our research revealed that simple heuristics match sophisticated ensemble methods in accuracy. But in production, models must handle malformed inputs, API rate limits, deployment rollbacks, A/B testing, monitoring alerts, and configuration changes, all while maintaining sub-10ms latency [4].

This repository shows you how. We've transformed a classical ML problem into a production system demonstrating patterns applicable to fraud detection, customer segmentation, quality control, or any feature-based classification task.

🔧 Key Architecture Components

Multi-Model Architecture with Hot Swapping

Our architecture supports seamless runtime switching between four distinct models through environment configuration, enabling A/B testing and gradual rollouts without code changes:

| Model | Accuracy | Latency | Memory | Interpretability | Production Use Case |
|-------|----------|---------|--------|------------------|---------------------|
| Heuristic | 96.0% | <1ms | 10MB | High (rules) | Real-time, regulated industries |
| Decision Tree | 96.7% | <1ms | 15MB | High (tree viz) | Explainable AI requirements |
| Random Forest | 96.0% | ~2ms | 50MB | Medium | Balanced performance |
| XGBoost | 96.0% | ~3ms | 75MB | Low | High-throughput systems |

The Factory pattern combined with configuration-driven design enables this flexibility:

# Switch models via environment variable
MPS_MODEL_TYPE=heuristic make api-run        # Development
MPS_MODEL_TYPE=decision_tree make api-run    # Staging
MPS_MODEL_TYPE=random_forest make api-run    # Production

Clean Architecture with Dependency Injection

Following SOLID principles [5], our architecture ensures maintainability and testability:

from abc import ABC, abstractmethod

# Abstract interface for all predictors
class BasePredictor(ABC):
    @abstractmethod
    def predict(self, measurements: IrisMeasurements) -> str:
        pass

# Factory pattern for model instantiation
def get_predictor(config: ServiceConfig) -> BasePredictor:
    match config.model_type:
        case ModelType.HEURISTIC:
            return HeuristicPredictor()
        case ModelType.RANDOM_FOREST:
            return MLModelPredictor(model_path=config.get_model_path())
        # ... remaining model types follow the same pattern
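
Wiring it together is then a one-liner at startup. A minimal usage sketch (the IrisMeasurements field names are assumed from the API schema shown later in this README):

config = ServiceConfig()           # assumed to read MPS_* environment variables
predictor = get_predictor(config)

sample = IrisMeasurements(
    sepal_length=5.1, sepal_width=3.5, petal_length=1.4, petal_width=0.2
)
print(predictor.predict(sample))   # e.g. "setosa"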

Comprehensive Testing & Observability

97% test coverage with parametrized testing across all models ensures consistent behavior:

import pytest

@pytest.mark.parametrize("model_type", [
    ModelType.HEURISTIC, ModelType.DECISION_TREE,
    ModelType.RANDOM_FOREST, ModelType.XGBOOST,
])
async def test_prediction_endpoint(model_type):
    # Ensures consistent API behavior across implementations
    ...
Production observability includes structured logging, health checks, latency tracking, and comprehensive error handling [6].
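
As an illustration of those observability pieces, here is a hedged sketch of a latency-tracking middleware and health endpoint in FastAPI (names are illustrative, not the repository's actual implementation):

import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("ml_production_service")
app = FastAPI()

@app.middleware("http")
async def track_latency(request: Request, call_next):
    # Measure wall-clock latency for every request
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Structured log entry: path, status code, and latency in milliseconds
    logger.info(
        "request handled",
        extra={"path": request.url.path, "status": response.status_code,
               "latency_ms": round(elapsed_ms, 2)},
    )
    return response

@app.get("/health")
async def health() -> dict[str, str]:
    # Liveness probe used by orchestrators and load balancers
    return {"status": "ok"}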

⚡ Quick Start

Prerequisites

  • Python 3.10-3.12
  • uv package manager or standard pip
  • Docker (optional, for containerized deployment)
  • Make (for automation commands)

Installation

# Clone and setup
git clone https://github.com/leogarciavargas/ml-production-service
cd ml-production-service
make environment-create

# Validate setup
make validate-branch

Run Locally

# Start API with chosen model
MPS_MODEL_TYPE=decision_tree make api-run

# Access endpoints
# API: http://localhost:8000
# Docs: http://localhost:8000/docs
# Health: http://localhost:8000/health

Test the API

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"sepal_length": 5.1, "sepal_width": 3.5, 
       "petal_length": 1.4, "petal_width": 0.2}'

# Response: {"prediction": "setosa"}
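
The same call from Python, for client code or smoke tests (requires the requests package; endpoint and schema as above):

import requests

payload = {
    "sepal_length": 5.1, "sepal_width": 3.5,
    "petal_length": 1.4, "petal_width": 0.2,
}
resp = requests.post("http://localhost:8000/predict", json=payload, timeout=5)
resp.raise_for_status()
print(resp.json())  # {"prediction": "setosa"}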

🏭 Production Deployment

Docker Deployment

# Quick start (build + run)
MPS_MODEL_TYPE=random_forest make service-quick-start

# Or step by step
make service-build
make service-start
make service-validate
make service-stop

A/B Testing Configuration

Deploy multiple model variants simultaneously:

# docker-compose.yml
services:
  model-a:
    image: ml-production-service:latest
    environment:
      - MPS_MODEL_TYPE=random_forest
    ports:
      - "8000:8000"
  
  model-b:
    image: ml-production-service:latest
    environment:
      - MPS_MODEL_TYPE=decision_tree
    ports:
      - "8001:8000"
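
With both variants running, traffic can be split at whatever layer you prefer: load balancer, API gateway, or client. As a minimal illustration, a client-side splitter in Python (URLs match the compose file above; the 10% treatment share is an arbitrary example):

import random

import requests

VARIANTS = {
    "model-a": "http://localhost:8000/predict",  # random_forest
    "model-b": "http://localhost:8001/predict",  # decision_tree
}

def predict_ab(payload: dict, treatment_share: float = 0.1) -> dict:
    # Route a configurable share of traffic to the candidate model
    name = "model-b" if random.random() < treatment_share else "model-a"
    resp = requests.post(VARIANTS[name], json=payload, timeout=5)
    resp.raise_for_status()
    # Tag the response with the serving variant for offline analysis
    return {"variant": name, **resp.json()}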

Cloud Platform Deployment

# Google Cloud Run
gcloud run deploy ml-production-service \
  --image gcr.io/your-project/ml-production-service \
  --set-env-vars MPS_MODEL_TYPE=random_forest \
  --memory 512Mi --cpu 1

# AWS ECS
aws ecs create-service \
  --service-name ml-production-service \
  --task-definition ml-production-service:latest \
  --desired-count 3

# Kubernetes
kubectl apply -f k8s/deployment.yaml
kubectl set env deployment/ml-production-service MPS_MODEL_TYPE=decision_tree

Environment Configuration

Variable Description Default Options
MPS_MODEL_TYPE Model selection heuristic heuristic, decision_tree, random_forest, xgboost
MPS_LOG_LEVEL Logging verbosity INFO DEBUG, INFO, WARNING, ERROR
MPS_API_HOST API bind address 0.0.0.0 Any valid host
MPS_API_PORT API port 8000 Any valid port
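
One plausible way these variables map onto ServiceConfig, sketched with pydantic-settings (the repository's actual implementation may differ):

from enum import Enum

from pydantic_settings import BaseSettings, SettingsConfigDict

class ModelType(str, Enum):
    HEURISTIC = "heuristic"
    DECISION_TREE = "decision_tree"
    RANDOM_FOREST = "random_forest"
    XGBOOST = "xgboost"

class ServiceConfig(BaseSettings):
    # The MPS_ prefix means MPS_MODEL_TYPE populates model_type, and so on
    model_config = SettingsConfigDict(env_prefix="MPS_", protected_namespaces=())

    model_type: ModelType = ModelType.HEURISTIC
    log_level: str = "INFO"
    api_host: str = "0.0.0.0"
    api_port: int = 8000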

πŸ› οΈ Development Workflow

Essential Commands

# Environment Management
make environment-create     # First-time setup
make environment-sync       # Update dependencies

# Code Quality
make format                # Auto-format with Ruff
make lint                  # Linting checks
make type-check           # MyPy validation
make validate-branch      # All quality checks

# Testing
make unit-test            # Unit tests only
make functional-test      # Functional tests
make all-test            # Complete test suite
make all-test-validate-branch  # Tests + quality

# API Development
make api-run             # Start dev server
make api-validate        # Test running API

# Research & Training
make eval-heuristic          # Evaluate baseline
make train-decision-tree     # Train decision tree
make train-random-forest     # Train random forest
make train-xgboost          # Train XGBoost

📊 Research Methodology & Results

Systematic Experimentation

Our research follows a production-driven model selection approach:

  1. Baseline Establishment: Rule-based heuristic as performance floor
  2. Progressive Complexity: Systematic evaluation from simple to complex
  3. Multiple Validation: LOOCV, k-fold CV, OOB estimation (see the sketch after this list)
  4. Production Metrics: Beyond accuracy to latency, interpretability, and maintenance
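
The LOOCV vs. k-fold comparison can be reproduced in a few lines; this sketch uses scikit-learn directly rather than the repository's experiment code:

from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)

# Leave-one-out: 150 folds of size 1 vs. a conventional 5-fold split
loocv_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
kfold_acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"LOOCV: {loocv_acc:.3f}, 5-fold: {kfold_acc:.3f}")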

Key Findings

Research insights that shaped our architecture:

  • Accuracy Ceiling: All models plateau at 96-97% (data limitation)
  • Validation Impact: LOOCV vs split validation shows 5.6% difference
  • Feature Engineering: Derived features (petal_area, computed below) improve tree models
  • Complexity ROI: Diminishing returns beyond decision trees
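
The derived feature itself is a one-line transform; an illustrative version (column names renamed to match the API schema):

from sklearn.datasets import load_iris

# Rename sklearn's default column names to match the API schema
df = load_iris(as_frame=True).frame.rename(columns={
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
})
df["petal_area"] = df["petal_length"] * df["petal_width"]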

Model Lifecycle

Research Phase → Evaluation → Promotion → Deployment
research/models/ → research/results/ → registry/prd/ → production
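
The promotion step amounts to copying a vetted artifact into the registry. A hypothetical sketch (artifact naming and format are assumptions; the repository may promote models differently):

import shutil
from pathlib import Path

def promote(model_name: str) -> Path:
    # e.g. promote("decision_tree") after its evaluation report is approved
    src = Path("research/models") / f"{model_name}.joblib"  # assumed artifact format
    dst = Path("registry/prd") / src.name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst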

Complete documentation available in research/EXPERIMENTS_JOURNEY.md and individual experiment reports in research/experiments/.

Project Structure

ml-production-service/
├── ml_production_service/    # Production service
│   ├── predictors/           # Model implementations
│   ├── server/               # FastAPI application
│   ├── configs.py            # Configuration
│   └── factory.py            # Dependency injection
│
├── research/                 # Experimentation
│   ├── experiments/          # Training code
│   ├── models/               # Trained artifacts
│   └── results/              # Performance metrics
│
├── registry/prd/             # Production models
├── tests/                    # Test suite (97% coverage)
├── Dockerfile                # Container definition
└── Makefile                  # Automation commands

Adding New Models

  1. Create predictor class implementing BasePredictor
  2. Register in ModelType enum
  3. Add to factory function
  4. Write comprehensive tests

See CLAUDE.md for the detailed guide.
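
Steps 1-3 in miniature, using a hypothetical k-NN predictor (names follow the interfaces shown earlier in this README; the artifact format is assumed):

import joblib

class KNNPredictor(BasePredictor):  # 1. Implement the interface
    def __init__(self, model_path: str) -> None:
        self._model = joblib.load(model_path)  # assumed joblib artifact

    def predict(self, measurements: IrisMeasurements) -> str:
        features = [[
            measurements.sepal_length, measurements.sepal_width,
            measurements.petal_length, measurements.petal_width,
        ]]
        return str(self._model.predict(features)[0])

# 2. Register in the ModelType enum:  KNN = "knn"
# 3. Add a `case ModelType.KNN:` branch to get_predictor()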

🤝 Contributing

We welcome contributions that strengthen production ML patterns:

  • New model implementations following BasePredictor interface
  • Production patterns (monitoring, deployment strategies)
  • Performance optimizations
  • Testing strategies for ML systems

Fork → Branch → Test (maintain 97% coverage) → PR with clear description

References

[1] Sculley, D., et al. (2015). "Hidden Technical Debt in Machine Learning Systems". NeurIPS 2015.

[2] Polyzotis, N., et al. (2017). "Data Management Challenges in Production Machine Learning". SIGMOD 2017.

[3] Breck, E., et al. (2017). "The ML Test Score: A Rubric for ML Production Readiness". IEEE Big Data 2017.

[4] Shankar, S., et al. (2024). "Operationalizing Machine Learning: An Interview Study".

[5] Martin, R.C. (2017). "Clean Architecture: A Craftsman's Guide to Software Structure and Design".

[6] Huyen, C. (2022). "Designing Machine Learning Systems". O'Reilly.

License

Apache License 2.0 - See LICENSE file for details.


🚀 Ready to deploy production ML? Start with make environment-create and have your first model API running in under 2 minutes.

From research notebooks to production APIs. For ML engineers shipping real systems.