Commit aa809f6: Update README.md
Parent commit: a9412ee
1 file changed: README.md (36 additions, 36 deletions)

[![TensorFlow](https://img.shields.io/badge/TensorFlow-2.4%2B-orange.svg)](https://tensorflow.org/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT)

## About this Project

This portfolio project demonstrates expertise in:

- **Deep Learning Architecture Design**: Created multiple neural network architectures for audio processing
- **Transfer Learning**: Leveraged Wav2Vec 2.0 pre-trained models for improved feature extraction (see the sketch after this list)
- **PyTorch & TensorFlow**: Utilized both frameworks for model development and training
- **Model Optimization**: Iteratively improved model performance from 29.7% to 50.5% accuracy
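
As a rough, hedged illustration of the transfer-learning step above, the sketch below pulls Wav2Vec 2.0 features through torchaudio's pretrained pipeline; the audio path is a placeholder and this is not the repository's actual extraction code.

```python
# Hedged sketch: Wav2Vec 2.0 feature extraction with torchaudio's pretrained
# pipeline (illustrative only, not this repo's exact code).
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("sample.wav")        # placeholder path
if sr != bundle.sample_rate:                        # Wav2Vec 2.0 expects 16 kHz
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    features, _ = model.extract_features(waveform)  # list of per-layer tensors

print(features[-1].shape)  # (batch, frames, 768) for the base model
```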

This challenging project involves classifying speech into 8 distinct emotions. While commercial systems often focus on 3-4 emotions, this system achieves **50.5% accuracy** across all 8 emotion classes, four times better than random chance (12.5%).

## Project Highlights

- Designed and implemented 4 different neural network architectures
- Achieved 50.5% accuracy on 8-class emotion classification (4× better than random)
- Created a real-time inference system for live emotion analysis
- Developed a custom dataset preprocessing pipeline for the RAVDESS dataset
- Authored detailed documentation and visualization tools

## Key Results

![Confusion Matrix](docs/images/confusion_matrix.png)
*Confusion matrix showing the model's performance across 8 emotion classes. Note the strong performance on Neutral (72%) and Calm (63%) emotions, with most confusion occurring between acoustically similar emotions like Happy/Surprised.*

```
Classification Report:
...
weighted avg       0.52      0.51      0.50       320
```

## Model Evolution & Research Notebooks

This project features a comprehensive series of Jupyter notebooks documenting the iterative model development process:

### [Base Model (29.7% Accuracy)](docs/notebooks/04_Base_Model.ipynb)
The initial CNN-based approach established a strong baseline with:
- Convolutional layers for feature extraction from mel spectrograms
- Recurrent neural networks (GRU) for temporal sequence modeling (see the sketch below)
- Basic data augmentation techniques for improved generalization
- Identification of key challenges for speech emotion recognition
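
For orientation, here is a minimal sketch of the CNN-plus-GRU pattern described above, assuming mel-spectrogram inputs of shape (batch, 1, mels, frames); layer sizes are illustrative assumptions, not the repository's exact model.

```python
import torch
import torch.nn as nn

class CnnGru(nn.Module):
    """Toy CNN front-end feeding a bidirectional GRU over spectrogram frames."""
    def __init__(self, n_mels=64, n_classes=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # halves mel and time axes
        )
        self.gru = nn.GRU(input_size=32 * (n_mels // 2), hidden_size=128,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, n_classes)

    def forward(self, x):
        x = self.conv(x)                          # (B, 32, mels/2, T/2)
        b, c, m, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * m)  # time-major features
        _, h = self.gru(x)                        # h: (2, B, 128)
        return self.head(torch.cat([h[0], h[1]], dim=-1))

logits = CnnGru()(torch.randn(2, 1, 64, 200))     # smoke test: (2, 8)
```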

### [Enhanced Model (31.5% Accuracy)](docs/notebooks/05_Enhanced_Model.ipynb)
Building on the base model, this version incorporated:
- Self-attention mechanisms to focus on emotionally salient parts of speech
- Deeper convolutional blocks with residual connections
- Improved regularization techniques including dropout and batch normalization
- Advanced learning rate scheduling with cosine annealing (see the snippet below)
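
The cosine-annealing piece is the most mechanical of these; a minimal sketch with a stand-in model and assumed hyperparameters (optimizer choice, learning rate, T_max) looks like this.

```python
# Minimal cosine-annealing setup; optimizer, LR, and T_max are assumptions.
import torch

model = torch.nn.Linear(256, 8)           # stand-in for the real network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

for epoch in range(50):
    # ... forward pass, loss.backward(), and optimizer.step() per batch ...
    scheduler.step()                      # decay the LR along a cosine curve
```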

### [Ultimate Model (33.3% Accuracy)](docs/notebooks/06_Ultimate_Model.ipynb)
This complex architecture pushed the boundaries with:
- Multi-modal feature extraction combining MFCCs, mel spectrograms, and spectral features
- Full transformer architecture with multi-head self-attention
- Squeeze-and-excitation blocks for channel-wise feature recalibration (sketched below)
- Complex learning schedule with warmup and cosine annealing
- 5-hour training time yielding only modest gains
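
Squeeze-and-excitation is the least self-explanatory item in this list; below is a generic SE block in the style of Hu et al., offered as a sketch only (the reduction ratio is an assumption, and the repository's implementation may differ).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Reweight feature-map channels using globally pooled statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze to (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # excitation
        return x * w                              # channel-wise recalibration

out = SEBlock(64)(torch.randn(2, 64, 32, 100))    # smoke test
```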

### [Simplified Model (50.5% Accuracy)](docs/notebooks/07_Simplified_Model.ipynb)
The best-performing model proved that focused architectural design beats complexity:
- Streamlined model with 4 transformer layers and 8 attention heads (see the sketch below)
- Focused feature extraction with optimal dimensionality (256 features)
- Robust error handling and training stability
- 1-hour training time with a **17.2-point absolute improvement** in accuracy over the Ultimate Model
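
The stated shape (4 layers, 8 heads, 256-dim features) maps directly onto PyTorch's built-in encoder; the following is a hedged sketch of that configuration, not the repository's actual class.

```python
import torch
import torch.nn as nn

class SimpleEmotionTransformer(nn.Module):
    """Sketch: 4-layer, 8-head transformer encoder over 256-dim audio features."""
    def __init__(self, d_model=256, n_heads=8, n_layers=4, n_classes=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                  # x: (batch, frames, 256)
        x = self.encoder(x)
        return self.head(x.mean(dim=1))    # mean-pool over time, then classify

logits = SimpleEmotionTransformer()(torch.randn(2, 120, 256))  # (2, 8)
```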

Each notebook contains comprehensive documentation, visualizations, and performance analyses that demonstrate the research process and technical insights.

## Emotions Recognized

The system can recognize the following 8 emotions from speech:

- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Surprised

## Quick Start

### Installation

```bash
cd speech-emotion-recognition
pip install -r requirements.txt
```

## Real-time Emotion Recognition

![GUI Application](docs/images/gui_screenshot.png)

```bash
chmod +x src/check_project.py
python src/check_project.py
```

## Dataset Processing

This project uses the [RAVDESS dataset](https://zenodo.org/record/1188976) (Ryerson Audio-Visual Database of Emotional Speech and Song). Follow these steps precisely:

RAVDESS audio files follow this naming convention:

- Modality (01 = full-AV, 02 = video-only, 03 = audio-only)
- Vocal channel (01 = speech, 02 = song)
- Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised)
- Emotional intensity (01 = normal, 02 = strong)
- Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door")
- Repetition (01 = 1st repetition, 02 = 2nd repetition)
- Actor (01 to 24. Odd-numbered actors are male, even-numbered actors are female)

The processing script handles this naming convention automatically to extract emotions and organize files.
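
A hedged sketch of that parsing logic, assuming the standard seven-field RAVDESS filenames (the function and file names here are illustrative, not the repository's actual script):

```python
# Illustrative parser for RAVDESS filenames such as "03-01-06-01-02-01-12.wav".
from pathlib import Path

EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

def parse_ravdess(path: str) -> dict:
    parts = Path(path).stem.split("-")            # seven two-digit fields
    return {
        "emotion": EMOTIONS[parts[2]],            # third field encodes emotion
        "intensity": "strong" if parts[3] == "02" else "normal",
        "actor": int(parts[6]),
        "gender": "male" if int(parts[6]) % 2 else "female",
    }

print(parse_ravdess("03-01-06-01-02-01-12.wav"))
# {'emotion': 'fearful', 'intensity': 'normal', 'actor': 12, 'gender': 'female'}
```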

> **IMPORTANT**: The raw dataset (~25.6GB) and processed audio files are deliberately excluded from this repository due to their size. You must follow the steps above to prepare the dataset locally.

## Model Files

Pre-trained models are not included in this repository due to their large size. After training your own models using the instructions below, they will be saved in the `models/` directory.

To use a specific model for inference:

```bash
python src/inference.py --model_path models/ravdess_simple/best_model.pt
```

## Technical Implementation

### Architecture Evolution

The development process involved creating and refining several model architectures, each documented in detail through the project notebooks:

1. **Base Model (29.7% accuracy)**
   - Convolutional layers for feature extraction

2. **Enhanced Model (31.5% accuracy)**
   - Added self-attention mechanisms and residual connections
   - Detailed in [05_Enhanced_Model.ipynb](docs/notebooks/05_Enhanced_Model.ipynb)

3. **Ultimate Model (33.3% accuracy)**
   - Multi-modal features with a full transformer architecture
   - Resource-intensive but limited generalization
   - Detailed in [06_Ultimate_Model.ipynb](docs/notebooks/06_Ultimate_Model.ipynb)

4. **Simplified Model (50.5% accuracy)**
   - Focused architecture with 4 transformer layers
   - 8 attention heads with 256 feature dimensions
   - Robust error handling and training stability

These insights informed targeted improvements in the model architecture.

## Model Development Journey

This project showcases an iterative approach to deep learning model development:

1. **Initial Exploration**: Started with baseline CNN models and traditional audio features
2. **Architecture Exploration**: Tested various neural network architectures (CNN, RNN, Transformer)
5. **Error Analysis**: Identified common misclassifications and model weaknesses
6. **Model Simplification**: Found that a focused, simplified architecture performed best

Each iteration provided insights that informed the next development phase, ultimately leading to the best-performing model with **50.5%** accuracy on this challenging 8-class task.

## Training Your Own Model

```bash
# Prepare RAVDESS dataset
...
python src/train_simplified.py \
    ...

bash train_optimal.sh
```

## Tools and Technologies

- **Languages**: Python 3.8+
- **Deep Learning Frameworks**: PyTorch 1.7+, TensorFlow 2.4+
- **Visualization**: TensorBoard, Matplotlib, Plotly
- **Development Tools**: Git, Docker, Jupyter Notebooks

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Troubleshooting

### PyAudio Installation Issues

If the project doesn't seem to find certain files or directories:

2. Check that all paths are correctly set relative to the project root
3. Use `src/check_project.py` to verify the project structure

## References

1. [RAVDESS Dataset](https://zenodo.org/record/1188976)
2. [Wav2Vec 2.0 Paper](https://arxiv.org/abs/2006.11477)
3. [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
4. [Speech Emotion Recognition: Literature Review](https://arxiv.org/abs/2107.09712)

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- The RAVDESS dataset creators for providing high-quality emotional speech data
- The PyTorch and torchaudio teams for their excellent frameworks
- The research community for advancing speech emotion recognition techniques

## Contact & Connect

- **LinkedIn**: [Vatsal Mehta](https://www.linkedin.com/in/vatsal-mehta-aa3a6219a/)
- **GitHub**: [@vatsalmehta2001](https://github.com/vatsalmehta2001)
