This project implements a speech emotion classification system built on neural networks, with genetic algorithms used for hyperparameter optimization. The system classifies emotions such as calm, happy, sad, angry, fearful, surprised, and disgusted from speech audio using the RAVDESS dataset.
## Project Structure

```
speech_emotion_classification/
│
├── src/                      # Source code package
│   ├── data/                 # Data loading and processing
│   │   ├── __init__.py
│   │   └── data_loader.py    # Dataset loading and splitting
│   │
│   ├── features/             # Feature extraction
│   │   ├── __init__.py
│   │   └── feature_extractor.py  # Audio feature extraction
│   │
│   ├── models/               # Model definitions
│   │   ├── __init__.py
│   │   ├── emotion_model.py  # Model architectures
│   │   ├── trainer.py        # Model training and evaluation
│   │   └── model_manager.py  # Model management and tracking
│   │
│   ├── utils/                # Utility functions
│   │   ├── __init__.py
│   │   └── monkey_patch.py   # TensorFlow fixes
│   │
│   ├── visualization/        # Visualization tools
│   │   ├── __init__.py
│   │   └── visualizer.py     # Visualization utilities
│   │
│   ├── ui/                   # User interface components
│   │   ├── __init__.py
│   │   ├── app.py            # Streamlit application
│   │   └── dashboard.py      # Analysis dashboard
│   │
│   └── scripts/              # Utility scripts
│       └── train.py          # Wrapper to launch training with defaults
│
├── tests/                    # Test suite
│   ├── conftest.py           # Test fixtures
│   └── test_*.py             # Test modules
│
├── models/                   # Saved models
├── results/                  # Analysis results
├── logs/                     # Training logs
├── demo_files/               # Demo audio files
├── samples/                  # Sample audio files
├── uploads/                  # User uploads
│
├── setup_package.py          # Package installation setup
├── requirements.txt          # Project dependencies
└── README.md                 # Project documentation
```
## Features

- Data loading and preprocessing using the RAVDESS dataset from Hugging Face
- Feature extraction using librosa (MFCCs and spectrograms)
- Neural network models (MLP and CNN) implemented with TensorFlow/Keras
- Model training with early stopping and comprehensive evaluation metrics
- Hyperparameter optimization using genetic algorithms via DEAP
- Modular and well-documented codebase
## Installation

```bash
pip install -r requirements.txt
```
If you're on Windows and encounter TensorFlow DLL issues, ensure you install a compatible TensorFlow version for your Python and CUDA setup. This project includes a TensorFlow monkey patch to avoid known signbit/argmax issues.
## Usage

Basic CLI for training/evaluation:

```bash
# Train CNN using mel spectrograms (recommended)
python -m src.main --train --model-type cnn --feature-type mel_spectrogram

# Train MLP using MFCCs
python -m src.main --train --model-type mlp --feature-type mfcc

# Evaluate an existing model by ID (see models/model_registry.json)
python -m src.main --evaluate --model-id <MODEL_ID>
```
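The flags above could be wired up with `argparse` roughly like this. This is a sketch of the CLI surface, not the project's actual `src/main.py`:

```python
import argparse

def build_parser():
    """Sketch of the training/evaluation CLI flags shown above."""
    p = argparse.ArgumentParser(description="Speech emotion classification CLI")
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true", help="train a new model")
    mode.add_argument("--evaluate", action="store_true", help="evaluate a saved model")
    p.add_argument("--model-type", choices=["cnn", "mlp"], default="cnn")
    p.add_argument("--feature-type", choices=["mel_spectrogram", "mfcc"],
                   default="mel_spectrogram")
    p.add_argument("--model-id", help="registry ID of a saved model (for --evaluate)")
    return p

args = build_parser().parse_args(
    ["--train", "--model-type", "cnn", "--feature-type", "mel_spectrogram"])
```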
Run the app via the unified driver (recommended):

```bash
pip install -r requirements.txt
python app.py            # ensures a model exists, then launches the UI
python app.py --train    # train once and exit
python app.py --api      # launch the FastAPI server
```
Run the Streamlit app directly:

```bash
streamlit run src/ui/streamlit_app.py --server.port 8501 --server.headless true
```
Live recording:

The app supports in-browser recording via the microphone. Ensure you install the UI extras:

```bash
pip install -r requirements.txt  # includes audio-recorder-streamlit
```

On first use, your browser will ask for microphone permission. Use the "Record Audio" tab to capture and analyze speech.
## Expected Performance
CNNs with spectrograms typically achieve 70-90% accuracy on the RAVDESS dataset, while MLPs may perform slightly worse due to simpler feature inputs.
## Technologies Used
- TensorFlow/Keras: For building and training neural networks
- scikit-learn: For preprocessing and evaluation metrics
- librosa: For extracting audio features
- DEAP: For genetic algorithms to optimize hyperparameters
- datasets (Hugging Face): For loading the RAVDESS dataset
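The project delegates the evolutionary machinery to DEAP, but the core idea, evaluating a population of hyperparameter sets, keeping the fittest, and mutating survivors, can be sketched in pure Python. The search space and fitness function below are stand-ins for the project's real ones (actual fitness would come from training a model and measuring validation accuracy):

```python
import random

SEARCH_SPACE = {"learning_rate": [1e-4, 1e-3, 1e-2], "batch_size": [16, 32, 64]}

def fitness(ind):
    # Stand-in for "train a model with these hyperparameters and return accuracy".
    return -abs(ind["learning_rate"] - 1e-3) - abs(ind["batch_size"] - 32) / 100

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(ind):
    # Re-draw one hyperparameter at random.
    child = dict(ind)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def evolve(pop_size=8, generations=10):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]  # selection: keep the fittest half
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=fitness)

best = evolve()
```

DEAP replaces this hand-rolled loop with its `creator`/`toolbox` abstractions plus crossover operators, but the selection-mutation cycle is the same.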
## Model Management and Reuse
The system is designed to train models once and then reuse them for predictions, making the application more efficient. This is implemented through the following components:
### ModelManager
The `model_manager.py` module provides a comprehensive system for managing trained models:
- **Model Registration**: Models are automatically registered after training with metadata and performance metrics
- **Model Selection**: The UI allows users to select from available pre-trained models
- **Model Reuse**: Once trained, models are saved and can be reused for future predictions without retraining
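Conceptually, the registry can work like the minimal sketch below. Function names and the metadata fields are illustrative assumptions; see `models/model_registry.json` for the real schema:

```python
import json
import time
from pathlib import Path

def register_model(registry_path, model_id, metrics, model_path=""):
    """Append a trained model's metadata to a JSON registry file."""
    registry_path = Path(registry_path)
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    registry[model_id] = {
        "path": model_path,
        "metrics": metrics,  # e.g. {"accuracy": 0.84}
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    registry_path.parent.mkdir(parents=True, exist_ok=True)
    registry_path.write_text(json.dumps(registry, indent=2))

def best_model(registry_path, metric="accuracy"):
    """Return the registered model ID with the highest value for `metric`."""
    registry = json.loads(Path(registry_path).read_text())
    return max(registry, key=lambda mid: registry[mid]["metrics"].get(metric, 0.0))
```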
### Training Process
```bash
# Train a new CNN model
python -m src.main --train --model-type cnn --feature-type mel_spectrogram

# Train a new MLP model
python -m src.main --train --model-type mlp --feature-type mfcc

# Train with hyperparameter optimization
# (future) python -m src.main --train --model-type cnn --optimize
```
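Training with early stopping (mentioned under Features) typically comes down to a Keras callback. This is a minimal sketch with random stand-in data, not the project's `trainer.py`:

```python
import numpy as np
import tensorflow as tf

def build_mlp(input_dim, n_classes=7):
    # Minimal stand-in MLP; the real architectures live in emotion_model.py.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_mlp(40)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Stop when validation loss plateaus and roll back to the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)

# Random stand-ins for pooled MFCC vectors and integer emotion labels.
x = np.random.rand(64, 40).astype("float32")
y = np.random.randint(0, 7, size=64)
history = model.fit(x, y, validation_split=0.25, epochs=3,
                    callbacks=[early_stop], verbose=0)
```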
The application includes a dedicated "Model Management" section in the UI that allows users to:
- View all available trained models
- Select a model to use for predictions
- Train new models when needed
- View model performance metrics
This approach offers several benefits:

- Faster Startup: The application loads pre-trained models instead of retraining
- Consistent Performance: Using the same model ensures consistent predictions
- Efficiency: Redundant training is avoided, saving computational resources
- Multiple Models: Different model architectures (CNN vs. MLP) can be maintained and compared
Saved models are organized as follows:

```
models/
├── cnn_emotion_model.keras   # Primary CNN model (Keras format)
├── cnn_emotion_model.h5      # Backup CNN model (HDF5 format)
├── mlp_emotion_model.keras   # Primary MLP model (optional)
├── mlp_emotion_model.h5      # Backup MLP model (optional)
└── model_registry.json       # Registry with model metadata
```
The speech emotion classification system includes a comprehensive model management UI with the following features:

- **Model Selection**: Browse all available trained models with their performance metrics, and select any of them to use for predictions
- **Model Comparison**: Compare trained models side-by-side, visualize their performance with interactive charts, and review detailed metrics across different architectures
- **Model Details**: View each model's creation date, size, and detailed performance metrics, visualize them with radar charts, and access evaluation reports for deeper analysis
- **Training**: Train new models directly from the UI, customize training parameters (epochs, batch size), and enable hyperparameter optimization
To run the Streamlit application with model management features:

```bash
streamlit run src/ui/streamlit_app.py --server.port 8501 --server.headless true
```

The app attempts to load the latest model from `models/` or from recent training logs. If no model exists, it can initialize a default architecture, but you should train a model first for meaningful predictions.
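The "latest model" lookup can be as simple as picking the newest saved file by modification time. A sketch under that assumption; the app's actual logic may also consult the model registry:

```python
from pathlib import Path

def latest_model_path(models_dir="models"):
    """Return the most recently modified .keras/.h5 model file, or None."""
    candidates = [p for ext in ("*.keras", "*.h5")
                  for p in Path(models_dir).glob(ext)]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```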
Docker:

```bash
docker build -t speech-emotion-app .
docker run -p 8501:8501 speech-emotion-app
```