This project implements a speech emotion classification system built on neural networks, with genetic algorithms used for hyperparameter optimization. The system classifies emotions such as calm, happy, sad, angry, fearful, surprised, and disgusted from speech audio using the RAVDESS dataset.
## Project Structure

```
speech_emotion_classification/
│
├── src/                      # Source code package
│   ├── data/                 # Data loading and processing
│   │   ├── __init__.py
│   │   └── data_loader.py    # Dataset loading and splitting
│   │
│   ├── features/             # Feature extraction
│   │   ├── __init__.py
│   │   └── feature_extractor.py  # Audio feature extraction
│   │
│   ├── models/               # Model definitions
│   │   ├── __init__.py
│   │   ├── emotion_model.py  # Model architectures
│   │   ├── trainer.py        # Model training and evaluation
│   │   └── model_manager.py  # Model management and tracking
│   │
│   ├── utils/                # Utility functions
│   │   ├── __init__.py
│   │   └── monkey_patch.py   # TensorFlow fixes
│   │
│   ├── visualization/        # Visualization tools
│   │   ├── __init__.py
│   │   └── visualizer.py     # Visualization utilities
│   │
│   ├── ui/                   # User interface components
│   │   ├── __init__.py
│   │   ├── app.py            # Streamlit application
│   │   └── dashboard.py      # Analysis dashboard
│   │
│   └── scripts/              # Utility scripts
│       └── train.py          # Wrapper to launch training with defaults
│
├── tests/                    # Test suite
│   ├── conftest.py           # Test fixtures
│   └── test_*.py             # Test modules
│
├── models/                   # Saved models
├── results/                  # Analysis results
├── logs/                     # Training logs
├── demo_files/               # Demo audio files
├── samples/                  # Sample audio files
├── uploads/                  # User uploads
│
├── setup_package.py          # Package installation setup
├── requirements.txt          # Project dependencies
└── README.md                 # Project documentation
```
## Features

- Data loading and preprocessing using the RAVDESS dataset from Hugging Face
- Feature extraction using librosa (MFCCs and spectrograms)
- Neural network models (MLP and CNN) implemented with TensorFlow/Keras
- Model training with early stopping and comprehensive evaluation metrics
- Hyperparameter optimization using genetic algorithms via DEAP
- Modular and well-documented codebase
## Installation

```bash
pip install -r requirements.txt
```
If you're on Windows and encounter TensorFlow DLL issues, ensure you install a compatible TensorFlow version for your Python and CUDA setup. This project includes a TensorFlow monkey patch to avoid known signbit/argmax issues.
## Usage

Basic CLI for training/evaluation:

```bash
# Train CNN using mel spectrograms (recommended)
python -m src.main --train --model-type cnn --feature-type mel_spectrogram

# Train MLP using MFCCs
python -m src.main --train --model-type mlp --feature-type mfcc

# Evaluate an existing model by ID (see models/model_registry.json)
python -m src.main --evaluate --model-id <MODEL_ID>
```
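The flags above could be wired up with `argparse` roughly like this. This is a sketch of the CLI surface, not the project's actual `src/main.py`:

```python
import argparse

def build_parser():
    """Sketch of the training/evaluation CLI flags shown above."""
    p = argparse.ArgumentParser(description="Speech emotion classification CLI")
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true", help="train a new model")
    mode.add_argument("--evaluate", action="store_true", help="evaluate a saved model")
    p.add_argument("--model-type", choices=["cnn", "mlp"], default="cnn")
    p.add_argument("--feature-type", choices=["mel_spectrogram", "mfcc"],
                   default="mel_spectrogram")
    p.add_argument("--model-id", help="registry ID of a saved model (for --evaluate)")
    return p

args = build_parser().parse_args(
    ["--train", "--model-type", "cnn", "--feature-type", "mel_spectrogram"])
```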
Run the app via the unified driver (recommended):

```bash
pip install -r requirements.txt
python app.py            # ensures a model exists, then launches the UI
python app.py --train    # train once and exit
python app.py --api      # launch the FastAPI server
```
Run the Streamlit app directly:

```bash
streamlit run src/ui/streamlit_app.py --server.port 8501 --server.headless true
```
Live recording:

The app supports in-browser recording via the microphone. Ensure you install the UI extras:

```bash
pip install -r requirements.txt  # includes audio-recorder-streamlit
```

On first use, your browser will ask for microphone permission. Use the "Record Audio" tab to capture and analyze speech.
## Expected Performance
CNNs with spectrograms typically achieve 70-90% accuracy on the RAVDESS dataset, while MLPs may perform slightly worse due to simpler feature inputs.
## Technologies Used
- TensorFlow/Keras: For building and training neural networks
- scikit-learn: For preprocessing and evaluation metrics
- librosa: For extracting audio features
- DEAP: For genetic algorithms to optimize hyperparameters
- datasets (Hugging Face): For loading the RAVDESS dataset
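The project delegates the evolutionary machinery to DEAP, but the core idea, evaluating a population of hyperparameter sets, keeping the fittest, and mutating survivors, can be sketched in pure Python. The search space and fitness function below are stand-ins for the project's real ones (actual fitness would come from training a model and measuring validation accuracy):

```python
import random

SEARCH_SPACE = {"learning_rate": [1e-4, 1e-3, 1e-2], "batch_size": [16, 32, 64]}

def fitness(ind):
    # Stand-in for "train a model with these hyperparameters and return accuracy".
    return -abs(ind["learning_rate"] - 1e-3) - abs(ind["batch_size"] - 32) / 100

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(ind):
    # Re-draw one hyperparameter at random.
    child = dict(ind)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def evolve(pop_size=8, generations=10):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]  # selection: keep the fittest half
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=fitness)

best = evolve()
```

DEAP replaces this hand-rolled loop with its `creator`/`toolbox` abstractions plus crossover operators, but the selection-mutation cycle is the same.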
## Model Management and Reuse
The system is designed to train models once and then reuse them for predictions, making the application more efficient. This is implemented through the following components:
### ModelManager
The `model_manager.py` module provides a comprehensive system for managing trained models:
- **Model Registration**: Models are automatically registered after training with metadata and performance metrics
- **Model Selection**: The UI allows users to select from available pre-trained models
- **Model Reuse**: Once trained, models are saved and can be reused for future predictions without retraining
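Conceptually, the registry can work like the minimal sketch below. Function names and the metadata fields are illustrative assumptions; see `models/model_registry.json` for the real schema:

```python
import json
import time
from pathlib import Path

def register_model(registry_path, model_id, metrics, model_path=""):
    """Append a trained model's metadata to a JSON registry file."""
    registry_path = Path(registry_path)
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    registry[model_id] = {
        "path": model_path,
        "metrics": metrics,  # e.g. {"accuracy": 0.84}
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    registry_path.parent.mkdir(parents=True, exist_ok=True)
    registry_path.write_text(json.dumps(registry, indent=2))

def best_model(registry_path, metric="accuracy"):
    """Return the registered model ID with the highest value for `metric`."""
    registry = json.loads(Path(registry_path).read_text())
    return max(registry, key=lambda mid: registry[mid]["metrics"].get(metric, 0.0))
```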
### Training Process
```bash
# Train a new CNN model
python -m src.main --train --model-type cnn --feature-type mel_spectrogram

# Train a new MLP model
python -m src.main --train --model-type mlp --feature-type mfcc

# Train with hyperparameter optimization
# (future) python -m src.main --train --model-type cnn --optimize
```
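Training with early stopping (mentioned under Features) typically comes down to a Keras callback. This is a minimal sketch with random stand-in data, not the project's `trainer.py`:

```python
import numpy as np
import tensorflow as tf

def build_mlp(input_dim, n_classes=7):
    # Minimal stand-in MLP; the real architectures live in emotion_model.py.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_mlp(40)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Stop when validation loss plateaus and roll back to the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)

# Random stand-ins for pooled MFCC vectors and integer emotion labels.
x = np.random.rand(64, 40).astype("float32")
y = np.random.randint(0, 7, size=64)
history = model.fit(x, y, validation_split=0.25, epochs=3,
                    callbacks=[early_stop], verbose=0)
```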
The application includes a dedicated "Model Management" section in the UI that allows users to:
- View all available trained models
- Select a model to use for predictions
- Train new models when needed
- View model performance metrics
This approach offers several benefits:

- Faster Startup: The application loads pre-trained models instead of retraining
- Consistent Performance: Using the same model ensures consistent predictions
- Efficiency: Redundant training is avoided, saving computational resources
- Multiple Models: Different model architectures (CNN vs. MLP) can be maintained and compared
Saved models are organized as follows:

```
models/
├── cnn_emotion_model.keras   # Primary CNN model (Keras format)
├── cnn_emotion_model.h5      # Backup CNN model (HDF5 format)
├── mlp_emotion_model.keras   # Primary MLP model (optional)
├── mlp_emotion_model.h5      # Backup MLP model (optional)
└── model_registry.json       # Registry with model metadata
```
The speech emotion classification system includes a comprehensive model management UI with the following features:

- **Model Selection**: Browse all available trained models with their performance metrics, and select any of them to use for predictions
- **Model Comparison**: Compare trained models side-by-side, visualize their performance with interactive charts, and review detailed metrics across different architectures
- **Model Details**: View each model's creation date, size, and detailed performance metrics, visualize them with radar charts, and access evaluation reports for deeper analysis
- **Training**: Train new models directly from the UI, customize training parameters (epochs, batch size), and enable hyperparameter optimization
To run the Streamlit application with model management features:

```bash
streamlit run src/ui/streamlit_app.py --server.port 8501 --server.headless true
```

The app attempts to load the latest model from `models/` or from recent training logs. If no model exists, it can initialize a default architecture, but you should train a model first for meaningful predictions.
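The "latest model" lookup can be as simple as picking the newest saved file by modification time. A sketch under that assumption; the app's actual logic may also consult the model registry:

```python
from pathlib import Path

def latest_model_path(models_dir="models"):
    """Return the most recently modified .keras/.h5 model file, or None."""
    candidates = [p for ext in ("*.keras", "*.h5")
                  for p in Path(models_dir).glob(ext)]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```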
Docker:

```bash
docker build -t speech-emotion-app .
docker run -p 8501:8501 speech-emotion-app
```