VIA - Intelligent Human-in-the-Loop AI System

VIA Frontpage

Portfolio Project: An AI orchestration platform demonstrating advanced Human-in-the-Loop architecture, multi-agent systems, and enterprise-grade engineering practices.

LinkedIn Portfolio

Executive Summary

VIA is an AI workflow orchestration platform that improves customer support through intelligent human-AI collaboration. This project showcases my current abilities in AI engineering, product design and management, and system architecture, delivering improvements in both customer satisfaction and employee wellbeing.

Background

VIA was built during the Dallas AI Summer Program 2025 under the mentorship of Eric Poon, Senior Vice President, Head of Technology, Shoppa's/Toyota Material Handling.

It was an eight-week program in which a group of AI enthusiasts from various backgrounds worked together under one of the mentors to conceive, design, and create an AI product prototype.

My Role

I was the product designer/manager and the lead AI engineer for this project, responsible for the technical architecture, multi-agent system design, and human-in-the-loop workflows. I (with Claude Code) am responsible for all code in this repository except for the frontend code in the frontend/ directory, which was built by Nithin Dodla.

This fork is my personal showcase as I continue to develop and refine the VIA system, demonstrating my skills in AI engineering and product design.

Key Achievements

  • 📊 Designed a human-in-the-loop AI product
  • 🏗️ Architected a multi-agent AI system using LangGraph/LangChain
  • ⚡ Implemented a production-ready backend with modular, testable components
  • 🤖 Integrated multiple LLMs (Gemini, Claude, local models)
  • 🧪 Established a comprehensive testing framework with 95%+ code coverage
  • 🌐 Delivered a live technical demo of the agents in action

🚀 Technical Innovation & Product Vision

The Problem We Solved

Traditional chatbot systems create frustration cycles: customers get stuck with inadequate AI responses, leading to angry escalations that burn out human agents. This creates a lose-lose scenario for both customers and employees.

The Problem

The Solution: Intelligent Intervention and Escalation Architecture

VIA is designed to monitor customer-AI interactions and to proactively escalate to the most appropriate human agent when needed, based on factors like customer sentiment, issue complexity, and human agent wellbeing.

The system also empowers human agents as part of the team, letting them provide feedback and context that continuously improve AI performance.

Key Features:

  • Intercepts all AI responses before delivery for quality assessment
  • Monitors customer sentiment in real-time to prevent frustration buildup
  • Escalates intelligently to the best-suited human agent based on factors like expertise and wellbeing
  • Learns from every interaction to continuously integrate human expertise into the system

System Architecture

VIA Dashboard

Business Impact

  • Reduction in customer frustration incidents and overall resolution times
  • Improvement in first-contact resolution quality
  • Decrease in employee burnout indicators through intelligent workload balancing
  • Empowerment of human agents, making them part of the team and driving continuous improvement

🛠️ Technical & AI Engineering

Multi-Agent AI Architecture

I designed and implemented a sophisticated multi-agent system that demonstrates AI engineering patterns:

# Human-in-the-Loop Workflow Pipeline
Customer Query → Frustration Analysis → Chatbot Agent → Quality Agent →
Context Enrichment → Intelligent Human Routing
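
As a rough illustration of how this pipeline could be wired together with LangGraph (the node names, SupportState fields, and escalation condition below are hypothetical simplifications, not the repository's actual identifiers):

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class SupportState(TypedDict, total=False):
    query: str                 # incoming customer message
    frustration_score: float   # 0-1 estimate from the frustration agent
    draft_response: str        # chatbot output held for review
    quality_ok: bool           # quality agent verdict
    routed_to: str             # selected human agent, if escalated

# Stub node functions; each returns a partial state update.
def analyze_frustration(state: SupportState) -> dict:
    return {"frustration_score": 0.2}

def chatbot(state: SupportState) -> dict:
    return {"draft_response": "Here is what I found..."}

def quality_check(state: SupportState) -> dict:
    return {"quality_ok": True}

def enrich_context(state: SupportState) -> dict:
    return {}

def route_to_human(state: SupportState) -> dict:
    return {"routed_to": "agent_007"}

graph = StateGraph(SupportState)
graph.add_node("frustration", analyze_frustration)
graph.add_node("chatbot", chatbot)
graph.add_node("quality", quality_check)
graph.add_node("context", enrich_context)
graph.add_node("routing", route_to_human)

graph.add_edge(START, "frustration")
graph.add_edge("frustration", "chatbot")
graph.add_edge("chatbot", "quality")
# Deliver the AI answer when it passes the quality gate; otherwise escalate.
graph.add_conditional_edges(
    "quality",
    lambda s: "deliver" if s.get("quality_ok") else "escalate",
    {"deliver": END, "escalate": "context"},
)
graph.add_edge("context", "routing")
graph.add_edge("routing", END)

workflow = graph.compile()
result = workflow.invoke({"query": "My claim was denied and nobody will tell me why."})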

Prototype Considerations

  • Time Constraints: The program ran eight weeks, but conceiving a novel product concept that matched our mentor's theme of "Human-in-the-Loop AI", and then getting all five team members (one eventually dropped out), each with different levels of engagement and commitment, to agree on the final product design, took a significant amount of time.

    • After the concept was finalized, we had less than 4 weeks to design, implement, and deliver a working prototype with a live demo.

    • Only 2 of the 4 team members had coding experience. Nobody on the team had programmed with LLMs before.

    • This fork of the project represents my attempt to finish out the core of the product.

  • Incomplete Design/Implementation: We ran out of time to fully finish an MVP, and various loose ends remain, such as connecting the backend to the frontend. The product's core features come down to the escalation engine and the context manager, and these will be the key areas of focus for future development, such as an advanced RAG implementation for the context manager and multi-model ensembles for the escalation engine.

  • Foundation Models: For proof of concept in the short time frame, we used Gemini 2.5 Flash API calls for the best real-time performance, but these were not thoroughly tuned, latency is not optimal, and cost would likely be high at scale. Future plans include specialized sentiment models, pre-screener models, and ensemble approaches for cost/performance optimization.

  • Evaluation Framework: We did not have time to do proper evaluation and model selection, but a comprehensive framework that allows rapid iteration and experimentation will be critical for future performance optimization and cost control.

  • Simulation Environment: We built a basic simulation environment to test the system and acquire mock data, but it is severely lacking. We need a more robust environment (ideally based on real data) with realistic customer personas and employee behavior models to properly validate the system.

  • Competitive Optimization: Ultimately, if this were to be a real product, the main areas of competitive differentiation would be:

    • Overall User Experience - both for customers and human agents
    • Model Performance - the speed, accuracy, and cost of our models
    • Innovation - the overall quality of the escalation engine, context manager, and human feedback loop, especially working in unison

Core AI Agents

1. Frustration Detection Agent 🟢 Implemented

Real-time frustration analysis with escalating pattern detection and configurable thresholds

  • ✅ Current: Gemini 2.5 Flash API calls for sentiment analysis
  • 🔄 Planned: Specialized sentiment models for speed and cost optimization
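
A minimal sketch of what an LLM-backed frustration check with a configurable threshold can look like (the prompt, the 0-10 scale, and the threshold value are assumptions for illustration, not the repository's tuned implementation):

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0)

FRUSTRATION_THRESHOLD = 0.7  # would come from the agent's config in practice

def frustration_score(message: str) -> float:
    """Ask the model for a 0-10 frustration rating and normalize to 0-1."""
    prompt = (
        "Rate the customer's frustration in the following message on a scale "
        "of 0 (calm) to 10 (furious). Reply with only the number.\n\n" + message
    )
    return float(llm.invoke(prompt).content.strip()) / 10

def should_escalate(message: str) -> bool:
    return frustration_score(message) >= FRUSTRATION_THRESHOLD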

2. Quality Assurance Agent 🟢 Implemented

Real-time response evaluation before delivery with configurable quality thresholds

  • ✅ Current: Gemini 2.5 Flash API calls for consensus validation
  • 🔄 Planned: Specialized models or pre-screener models for performance optimization
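
For illustration, a quality gate that intercepts the chatbot's draft before delivery might look roughly like this (reusing the llm handle from the frustration sketch above; the rubric and the 0.8 threshold are assumptions):

def quality_gate(query: str, draft_response: str, threshold: float = 0.8) -> dict:
    """Score a draft answer before it reaches the customer; block it if it falls short."""
    prompt = (
        "You are reviewing a customer-support answer before it is sent.\n"
        f"Customer question: {query}\n"
        f"Draft answer: {draft_response}\n"
        "Score from 0.0 to 1.0 how complete, accurate, and polite the draft is. "
        "Reply with only the number."
    )
    score = float(llm.invoke(prompt).content.strip())
    return {"quality_score": score, "deliver": score >= threshold}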

3. Intelligent Routing Agent 🟢 Implemented

Dynamic human agent routing based on various factors including expertise, workload, and customer history

  • ✅ Current: Gemini 2.5 Flash API calls for roster scoring and routing
  • 🔄 Planned: Ensemble of multiple models for performance optimization (e.g., XGBoost for ranking and scoring, ensembled with context-aware models for optimal employee selection)
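
The kind of roster scoring the routing agent performs can be sketched as a weighted ranking over expertise, current workload, and wellbeing (the fields, weights, and formula here are illustrative assumptions; the repository currently delegates this scoring to an LLM):

from dataclasses import dataclass

@dataclass
class HumanAgent:
    name: str
    expertise: set            # e.g. {"claims", "billing"}
    open_tickets: int         # current workload
    wellbeing: float          # 0.0 (burned out) to 1.0 (fresh)

def routing_score(agent: HumanAgent, required: set,
                  weights=(0.5, 0.3, 0.2)) -> float:
    w_skill, w_load, w_well = weights
    skill_match = len(agent.expertise & required) / max(len(required), 1)
    load_factor = 1.0 / (1 + agent.open_tickets)   # fewer open tickets scores higher
    return w_skill * skill_match + w_load * load_factor + w_well * agent.wellbeing

def route(roster: list, required: set) -> HumanAgent:
    """Pick the highest-scoring available human for this escalation."""
    return max(roster, key=lambda a: routing_score(a, required))

roster = [
    HumanAgent("Dana", {"claims"}, open_tickets=4, wellbeing=0.9),
    HumanAgent("Lee", {"claims", "billing"}, open_tickets=1, wellbeing=0.4),
]
print(route(roster, {"claims"}).name)  # wellbeing balances raw workload in the ranking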

4. Context Manager Agent 🟢 Implemented

Centralized context aggregation from multiple sources to provide rich, real-time data for decision-making and to continuously integrate human expertise and feedback into the system

  • ✅ Current: SQL database for customer history and context searches
  • 🔄 Planned: Advanced RAG knowledge base for faster context retrieval and integration
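
A sketch of the kind of SQL lookup the context manager performs when enriching an escalation (the table and column names here are hypothetical, not the repository's actual schema):

import sqlite3

def recent_customer_context(db_path: str, customer_id: str, limit: int = 5) -> list:
    """Fetch the customer's most recent interactions to hand to the human agent."""
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row
        rows = conn.execute(
            "SELECT created_at, channel, summary, resolution "
            "FROM interactions WHERE customer_id = ? "
            "ORDER BY created_at DESC LIMIT ?",
            (customer_id, limit),
        ).fetchall()
    return [dict(row) for row in rows]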

Live Technical Demo

I created a live technical demo showcasing the core AI agents in action (the demo is geared towards customer support in the insurance domain).

Screen captures are below, and the live demo can be found here: Live Backend Demo. VIA Tech Demo 1 VIA Tech Demo 2

Technical Architecture Highlights

Production-Ready Engineering Practices:

  • Modular, Interface-Driven Design: Clean separation of concerns with comprehensive abstraction and interface contracts (see the sketch after this list)
  • Pluggable Structure: Modular, OOP, and factory patterns for various components (e.g., agents, workflows) enabling easy extension
  • Agent-Centric Configuration: Modular, hot-reloadable configuration system
  • Comprehensive Testing: 95%+ code coverage with unit, integration, and performance tests
  • Observability: Structured logging, LangSmith tracing, and performance monitoring
  • Database Management: Centralized data layer
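
The first two points, interface contracts plus pluggable factories, can be illustrated with a sketch like the following (the class and registry names are hypothetical, not the repository's actual interfaces):

from abc import ABC, abstractmethod

class AgentNode(ABC):
    """Contract every agent node implements, so workflows never depend on concrete agents."""

    @abstractmethod
    def process(self, state: dict) -> dict:
        """Consume the shared workflow state and return a partial update."""

class FrustrationAgent(AgentNode):
    def process(self, state: dict) -> dict:
        return {"frustration_score": 0.2}   # stub; the real node would call an LLM

AGENT_REGISTRY = {"frustration": FrustrationAgent}

def create_agent(name: str) -> AgentNode:
    """Factory: swap in a production, local-model, or mock agent without touching callers."""
    return AGENT_REGISTRY[name]()

agent = create_agent("frustration")
update = agent.process({"query": "Where is my refund?"})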

Technology Stack:

AI/ML Framework: LangGraph, LangChain, LangSmith
Languages: Python 3.11+
Data: SQLite, Pydantic
Testing: pytest
DevOps: Docker, Dev Containers
Quality: ruff, mypy

Future Enhancements:

  • Evaluation Framework: Comprehensive model evaluation and selection framework for performance optimization
  • Integrated Evals: Monitoring of model performance and drift, plus config/threshold tuning
  • Multi-Model Ensembles: Cost/performance optimization through ensemble approaches
  • Advanced Sentiment Models: Beyond basic prompt-based analysis for deeper customer insights
  • Advanced RAG Implementation: Context manager with specialized model integration for performance optimization
  • Real-time Dashboard: Customer satisfaction metrics and agent performance tracking

πŸ—οΈ System Architecture & Engineering

Scalable Multi-Agent Design

src/
├── core/                          # Infrastructure & Configuration
│   ├── agent_config_manager.py    # Hot-reloadable agent configuration
│   ├── context_manager.py         # Multi-source data aggregation
│   ├── database_config.py         # Centralized data management
│   └── logging/                   # Structured observability
├── interfaces/                    # Contract-driven development
│   ├── core/                      # System interface contracts
│   ├── nodes/                     # Agent behavior specifications
│   └── workflows/                 # Orchestration interfaces
├── nodes/                         # AI Agent Implementations
│   ├── chatbot_agent.py           # Customer service AI
│   ├── quality_agent.py           # Response quality assurance
│   ├── frustration_agent.py       # Sentiment analysis & intervention
│   ├── human_routing_agent.py     # Intelligent escalation routing
│   └── context_manager_agent.py   # Context aggregation & delivery
├── simulation/                    # Testing & Validation Framework
│   ├── human_customer_simulator.py # Realistic customer personas
│   ├── employee_simulator.py      # Human agent simulation
│   └── demo_orchestrator.py       # End-to-end scenario testing
└── workflows/                     # Orchestration & State Management
    └── hybrid_workflow.py         # HITL system coordination

Configuration Management Innovation

Designed an agent-centric configuration system that enables:

  • Modular Development: Each agent has isolated configuration namespace
  • Environment Management: Clean dev/test/prod separation
  • Hot Reloading: Runtime configuration updates without restart (see the sketch after this list)
  • Model Consolidation: Single source of truth for AI model preferences
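
A minimal sketch of a hot-reloadable, agent-centric config loader along these lines (assuming a YAML file and PyYAML; the file layout and key names are illustrative, not the repository's actual agent_config_manager.py):

from pathlib import Path
import yaml  # PyYAML

class AgentConfig:
    """Per-agent configuration namespace, re-read from disk whenever the file changes."""

    def __init__(self, path: str, agent: str):
        self._path = Path(path)
        self._agent = agent
        self._mtime = 0.0
        self._data = {}

    def get(self, key, default=None):
        mtime = self._path.stat().st_mtime
        if mtime != self._mtime:                      # hot reload without a restart
            self._data = yaml.safe_load(self._path.read_text()) or {}
            self._mtime = mtime
        return self._data.get(self._agent, {}).get(key, default)

# config/agents.yaml might look like:
# frustration_agent:
#   model: gemini-2.5-flash
#   threshold: 0.7
cfg = AgentConfig("config/agents.yaml", "frustration_agent")
threshold = cfg.get("threshold", 0.7)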

Quality Assurance & Testing Strategy

Comprehensive Test Coverage:

  • Unit Tests: All core components with mock-based isolation (see the sketch after this list)
  • Integration Tests: End-to-end workflow validation
  • Performance Tests: Concurrent operation and large dataset handling
  • Error Scenario Testing: Comprehensive failure mode validation
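
As an example of the mock-based isolation used in the unit tests, a test along these lines exercises an agent's threshold logic without making a real API call (the FrustrationAgent stub here is a hypothetical stand-in for the repository's actual node class):

from unittest.mock import MagicMock

class FrustrationAgent:                              # stand-in for the real node under test
    def __init__(self, llm, threshold: float):
        self.llm, self.threshold = llm, threshold

    def process(self, state: dict) -> dict:
        score = float(self.llm.invoke(state["query"]).content) / 10
        return {**state, "frustration_score": score, "escalate": score >= self.threshold}

def test_high_frustration_triggers_escalation():
    llm = MagicMock()
    llm.invoke.return_value.content = "9"            # simulate an angry customer
    agent = FrustrationAgent(llm=llm, threshold=0.7)

    result = agent.process({"query": "This is the third time I have asked about this!"})

    assert result["escalate"] is True
    assert result["frustration_score"] == 0.9
    llm.invoke.assert_called_once()                  # no real LLM call was made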

Development Workflows:

make setup     # Automated environment setup
make test      # Full test suite with coverage
make check     # Code quality validation
make run       # Local development server

🚀 Live Demonstrations & Portfolio Links

Interactive Demos

Code Quality & Documentation


🎯 Key Skills Demonstrated

Software Engineering Proficiency

  • Architecture Design: Scalable, modular, maintainable system patterns
  • Testing Strategy: Comprehensive validation and quality assurance
  • DevOps Practices: Containerization, CI/CD, and deployment automation
  • Documentation: Technical writing optimized for both AI and developer experience

Product Design/Management Expertise

  • Strategic Vision: Conceptualization of an innovative HITL product
  • Technical Leadership: Cross-functional team coordination and delivery
  • Requirements Engineering: Concept translation to technical architecture
  • Competitive Analysis: Video concept and live agent demo for stakeholder engagement

📈 Business Context & Team Collaboration

Dallas AI Summer Program 2025

  • Program Structure: 8-week intensive program pairing mentors with teams of 2-5 members of varying backgrounds to conceive, design, and build AI product prototypes
  • My Role: Product Design/Manager & Lead AI Engineer
  • Team Composition: 4 members across AI, Frontend, Creative, and Human Factors

Mentor Collaboration

Eric Poon - Senior Vice President, Head of Technology, Shoppa's/Toyota Material Handling

  • Regular strategic guidance on product vision and market positioning
  • Competition strategy and feedback
  • Video and final presentation strategy and review

Cross-Functional Team Leadership

Team Members & Contributions:


🔧 Technical Implementation Details

Quick Start for Technical Review

# Clone and setup
git clone <repository-url>
cd human-ai-orchestrator
make setup

# Run comprehensive demos
uv run python scripts/experimentation_demo.py  # 6 scenario demo
uv run python scripts/gradio_demo.py          # Interactive interface

# Validate code quality  
make test     # Full test suite
make check    # Linting and type checking

Environment Configuration

# Required for full functionality
export OPENAI_API_KEY="your_key_here"      # Multi-provider support
export ANTHROPIC_API_KEY="your_key_here"   # Claude integration  
export GEMINI_API_KEY="your_key_here"     # Gemini integration
export LANGCHAIN_API_KEY="your_key_here"   # Optional: Tracing

# Optional: Environment selection
export HYBRID_SYSTEM_ENV="development"     # dev/test/prod configs

Production Deployment Ready

# Docker deployment
docker build -t via-hitl-system .
docker run --env-file .env via-hitl-system

# Dev container support for VSCode/Cursor
# GPU-enabled development environment available

📞 Let's Connect

I'm actively seeking opportunities in AI/ML Engineering or Product Management where I can apply these skills to solve complex problems at scale.

Contact Information:

  • LinkedIn: Chris Munch
  • Email: Available on LinkedIn profile
  • Portfolio: This repository demonstrates production-ready AI engineering

Next Steps:

  • Review the live demo to see the system in action
  • Explore the codebase to evaluate technical implementation quality
  • Check out the series of articles detailing my thought process and technical decisions

This project represents 8 weeks of intensive development, demonstrating my ability to deliver production-ready AI systems under tight deadlines while leading cross-functional teams and maintaining high technical standards.
