This portfolio project demonstrates my expertise in:

- **PyTorch & TensorFlow**: Utilized both frameworks for model development and training
- **Model Optimization**: Iteratively improved model performance from 29.7% to 50.5% accuracy
This challenging project involves classifying speech into 8 distinct emotions. While commercial systems often focus on 3-4 emotions, this system achieves impressive results across all 8 emotion classes, outperforming random chance (12.5%) by 4× with **50.5% accuracy**.
## Project Highlights

- Designed and implemented 4 different neural network architectures
- Achieved 50.5% accuracy on 8-class emotion classification (4× better than random)
- Created a real-time inference system for live emotion analysis
- Developed a custom dataset preprocessing pipeline for the RAVDESS dataset
- Authored detailed documentation and visualization tools

*Confusion matrix showing the model's performance across 8 emotion classes. Note the strong performance on Neutral (72%) and Calm (63%) emotions, with most confusion occurring between acoustically similar emotions like Happy/Surprised.*
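Per-class figures like the Neutral (72%) and Calm (63%) scores come from normalizing each row of the confusion matrix by its row sum. A minimal sketch of that calculation — the 3-class matrix below is illustrative, not the project's actual numbers:

```python
def per_class_accuracy(confusion):
    """Per-class accuracy from a confusion matrix (rows = true labels)."""
    accs = []
    for i, row in enumerate(confusion):
        total = sum(row)  # number of true samples for class i
        accs.append(row[i] / total if total else 0.0)
    return accs

# Illustrative 3-class matrix (rows: true Neutral, Calm, Happy).
cm = [
    [36, 10,  4],   # Neutral: 36 of 50 correct -> 0.72
    [12, 25,  3],   # Calm:    25 of 40 correct -> 0.625
    [ 5,  5, 30],   # Happy:   30 of 40 correct -> 0.75
]
print(per_class_accuracy(cm))  # [0.72, 0.625, 0.75]
```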
weighted avg       0.52      0.51      0.50       320
```
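The `weighted avg` row is each class's metric averaged with weights equal to its support (the number of true samples per class). A quick sketch of the computation using hypothetical per-class F1 scores:

```python
def weighted_avg(scores, supports):
    """Support-weighted mean, as in a classification report's 'weighted avg' row."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total

# Hypothetical per-class F1 scores and supports for a 4-class slice.
f1 = [0.70, 0.60, 0.40, 0.30]
support = [40, 40, 40, 40]
print(round(weighted_avg(f1, support), 2))  # 0.5
```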
## Model Evolution & Research Notebooks
This project features a comprehensive series of Jupyter notebooks documenting the iterative model development process:
### [Base Model (29.7% Accuracy)](docs/notebooks/04_Base_Model.ipynb)

The initial CNN-based approach established a strong baseline with:
- Convolutional layers for feature extraction from mel spectrograms
- Recurrent neural networks (GRU) for temporal sequence modeling
- Basic data augmentation techniques for improved generalization
- Identified key challenges for speech emotion recognition
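Waveform-level augmentation of the kind listed above can be as simple as random noise injection and circular time shifting. A minimal pure-Python sketch on a list of raw samples — the project's actual pipeline and parameters may differ:

```python
import random

def add_noise(samples, noise_level=0.005, seed=None):
    """Inject Gaussian noise scaled relative to the signal's peak amplitude."""
    rng = random.Random(seed)
    peak = max(abs(s) for s in samples) or 1.0
    return [s + rng.gauss(0.0, noise_level * peak) for s in samples]

def time_shift(samples, max_shift=4, seed=None):
    """Circularly shift the waveform by a random number of samples."""
    rng = random.Random(seed)
    k = rng.randint(-max_shift, max_shift) % len(samples)
    return samples[k:] + samples[:k]

wave = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
augmented = time_shift(add_noise(wave, seed=0), seed=1)
print(len(augmented) == len(wave))  # True
```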
### [Enhanced Model (31.5% Accuracy)](docs/notebooks/05_Enhanced_Model.ipynb)

Building on the base model, the enhanced version incorporated:
- Self-attention mechanisms to focus on emotionally salient parts of speech
- Deeper convolutional blocks with residual connections
- Improved regularization techniques including dropout and batch normalization
- Advanced learning rate scheduling with cosine annealing
### [Ultimate Model (33.3% Accuracy)](docs/notebooks/06_Ultimate_Model.ipynb)

This complex architecture pushed the boundaries with:
- Multi-modal feature extraction combining MFCCs, mel spectrograms, and spectral features
- Full transformer architecture with multi-head self-attention
- Squeeze-and-excitation blocks for channel-wise feature recalibration
- Complex learning schedule with warmup and cosine annealing
- 5-hour training time yielding only modest gains
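The warmup-plus-cosine schedule mentioned above ramps the learning rate linearly for an initial number of steps, then decays it along a half-cosine toward a floor. A standalone sketch of the formula — peak and minimum values here are illustrative, not the project's settings:

```python
import math

def lr_at(step, total_steps, warmup_steps, peak_lr=1e-3, min_lr=1e-5):
    """Linear warmup to peak_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        # Linear ramp: reaches peak_lr at the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(9, total_steps=100, warmup_steps=10))  # 0.001 (peak at end of warmup)
```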
### [Simplified Model (50.5% Accuracy)](docs/notebooks/07_Simplified_Model.ipynb)

The best-performing model proved that focused architectural design beats complexity:
- Streamlined model with 4 transformer layers and 8 attention heads
- Focused feature extraction with optimal dimensionality (256 features)
- Robust error handling and training stability
- 1-hour training time with **17.2% absolute improvement** over the Ultimate Model
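The simplified architecture's overall shape can be sketched with PyTorch's built-in encoder. This is an illustrative reconstruction from the bullet points above (4 layers, 8 heads, 256-dim features, 8 classes), not the repository's exact code:

```python
import torch
import torch.nn as nn

class SimplifiedEmotionModel(nn.Module):
    """Illustrative: 4 transformer layers, 8 heads, 256-dim features, 8 classes."""
    def __init__(self, feat_dim=256, n_heads=8, n_layers=4, n_classes=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x):                       # x: (batch, time, feat_dim)
        h = self.encoder(x)                     # contextualized frame features
        return self.classifier(h.mean(dim=1))  # mean-pool over time -> logits

model = SimplifiedEmotionModel()
logits = model(torch.randn(2, 50, 256))  # 2 clips, 50 frames each
print(logits.shape)  # torch.Size([2, 8])
```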
Each notebook contains comprehensive documentation, visualizations, and performance analyses that demonstrate the research process and technical insights.
## Emotions Recognized
The system can recognize the following 8 emotions from speech:

- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Surprised

This project uses the [RAVDESS dataset](https://zenodo.org/record/1188976) (Ryerson Audio-Visual Database of Emotional Speech and Song). Follow these steps precisely:
- Actor (01 to 24. Odd-numbered actors are male, even-numbered actors are female)
The processing script handles this naming convention automatically to extract emotions and organize files.
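RAVDESS encodes metadata in seven dash-separated, zero-padded filename fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). A minimal sketch of the kind of parsing the processing script performs — the field layout follows the RAVDESS documentation, but the script's actual interface may differ:

```python
# RAVDESS emotion codes (field 3 of the filename).
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(name):
    """Extract emotion, actor, and gender from a RAVDESS filename."""
    stem = name.rsplit(".", 1)[0]
    fields = stem.split("-")   # 7 zero-padded fields
    actor = int(fields[6])     # 01-24
    return {
        "emotion": EMOTIONS[fields[2]],
        "actor": actor,
        "gender": "male" if actor % 2 else "female",  # odd = male, even = female
    }

info = parse_ravdess_filename("03-01-06-01-02-01-12.wav")
print(info)  # {'emotion': 'fearful', 'actor': 12, 'gender': 'female'}
```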
> **IMPORTANT**: The raw dataset (~25.6GB) and processed audio files are deliberately excluded from this repository due to their size. You must follow the steps above to prepare the dataset locally.
## Model Files
Pre-trained models are not included in this repository due to their large size. After training your own models using the instructions below, they will be saved in the `models/` directory.
The development process involved creating and refining several model architectures, each documented in detail through the project notebooks:

1. **Base Model (29.7% accuracy)**
   - Convolutional layers for feature extraction
   - Resource-intensive but limited generalization
   - Detailed in [06_Ultimate_Model.ipynb](docs/notebooks/06_Ultimate_Model.ipynb)

4. **Simplified Model (50.5% accuracy)**
   - Focused architecture with 4 transformer layers
   - 8 attention heads with 256 feature dimensions
   - Robust error handling and training stability
These insights informed targeted improvements in the model architecture.
## Model Development Journey
This project showcases an iterative approach to deep learning model development:

1. **Initial Exploration**: Started with baseline CNN models and traditional audio features
2. **Architecture Exploration**: Tested various neural network architectures (CNN, RNN, Transformer)
5. **Error Analysis**: Identified common misclassifications and model weaknesses
6. **Model Simplification**: Found that a focused, simplified architecture performed best

Each iteration provided insights that informed the next development phase, ultimately leading to the best-performing model with **50.5%** accuracy on this challenging 8-class task.