This project implements a spam email detection system using a Convolutional Neural Network (CNN). The model classifies each email as Spam or Ham (not spam) based on its text content, applying deep learning with the aim of improving accuracy over traditional machine learning approaches.
The dataset used for training and evaluation consists of emails labeled as spam or ham. The text is preprocessed and converted into numerical sequences suitable as CNN input:
- Text Cleaning: Removal of special characters, stopwords, and extra spaces.
- Tokenization: Conversion of text into sequences of integer token IDs.
- Padding: Ensuring uniform input size for CNN.
- Embedding Layer: Word embeddings for improved feature extraction.
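The preprocessing steps above can be sketched in plain Python. This is a dependency-free illustration, not the project's code: the stopword list, vocabulary size, and function names are hypothetical, and the padding mirrors the Keras default of padding and truncating at the start of each sequence.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller list (e.g. NLTK's).
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "you", "your"}


def clean_text(text):
    """Lowercase, remove special characters and stopwords, collapse extra spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace special characters with spaces
    # split() with no argument also collapses runs of whitespace.
    return " ".join(w for w in text.split() if w not in STOPWORDS)


def build_vocab(texts, num_words=10000):
    """Map the most frequent words to ids starting at 1 (0 is reserved for padding)."""
    counts = Counter(w for t in texts for w in t.split())
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(num_words))}


def texts_to_sequences(texts, vocab):
    """Replace each known word with its id; unknown words are dropped."""
    return [[vocab[w] for w in t.split() if w in vocab] for t in texts]


def pad_sequences(seqs, maxlen):
    """Left-pad with 0, or keep only the last `maxlen` tokens (assumes maxlen > 0)."""
    padded = []
    for s in seqs:
        s = s[-maxlen:]
        padded.append([0] * (maxlen - len(s)) + s)
    return padded


emails = ["WIN a FREE prize!!! Click now", "Meeting moved to 3pm, see you there"]
cleaned = [clean_text(e) for e in emails]
vocab = build_vocab(cleaned)
padded = pad_sequences(texts_to_sequences(cleaned, vocab), maxlen=8)
print(padded)  # [[0, 0, 0, 1, 2, 3, 4, 5], [0, 0, 0, 6, 7, 8, 9, 10]]
```

In a Keras project the tokenization and padding steps would more typically use the library's own utilities; the plain version above just makes each step visible.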
- Python
- TensorFlow / Keras
- Natural Language Processing (NLP)
- CNN for Text Classification
- Jupyter Notebook / Google Colab
The CNN model is structured as follows:
- Embedding Layer - Converts words into dense vectors.
- Convolutional Layers - Extract local patterns (n-gram-like features) from the embedded word sequences.
- Max-Pooling Layers - Reduce dimensionality while retaining the strongest feature activations.
- Fully Connected Layers - Dense layers that combine the pooled features to make the final classification.
- Softmax Activation - Outputs class probabilities for spam and ham.
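To make the layer stack above concrete, here is a NumPy sketch of one forward pass through an untrained model. It assumes a single 1D convolution block, ReLU activations, one hidden dense layer, and the illustrative sizes in the constants; none of these choices or values come from the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration; not the project's hyperparameters.
VOCAB, EMBED_DIM, SEQ_LEN = 1000, 16, 20
FILTERS, KERNEL, POOL, HIDDEN, CLASSES = 8, 3, 2, 16, 2

CONV_LEN = SEQ_LEN - KERNEL + 1   # output length of a 'valid' 1D convolution
POOLED_LEN = CONV_LEN // POOL     # output length after non-overlapping max pooling

# Randomly initialised weights (an untrained model).
E = rng.normal(size=(VOCAB, EMBED_DIM))
W_conv = rng.normal(size=(KERNEL, EMBED_DIM, FILTERS)) * 0.1
b_conv = np.zeros(FILTERS)
W_hidden = rng.normal(size=(POOLED_LEN * FILTERS, HIDDEN)) * 0.1
b_hidden = np.zeros(HIDDEN)
W_out = rng.normal(size=(HIDDEN, CLASSES)) * 0.1
b_out = np.zeros(CLASSES)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(token_ids):
    """Map int token ids of shape (batch, SEQ_LEN) to (batch, CLASSES) probabilities."""
    batch = token_ids.shape[0]
    # Embedding layer: look up a dense vector for every token id.
    x = E[token_ids]                                          # (batch, SEQ_LEN, EMBED_DIM)
    # Convolutional layer: each filter scores every window of KERNEL consecutive words.
    windows = np.stack([x[:, i:i + KERNEL, :] for i in range(CONV_LEN)], axis=1)
    h = np.einsum("bwke,kef->bwf", windows, W_conv) + b_conv  # (batch, CONV_LEN, FILTERS)
    h = np.maximum(h, 0.0)                                    # ReLU
    # Max pooling: keep the strongest activation in each block of POOL positions.
    h = h[:, :POOLED_LEN * POOL, :].reshape(batch, POOLED_LEN, POOL, FILTERS).max(axis=2)
    # Fully connected layers: flatten, hidden ReLU layer, then output logits.
    flat = h.reshape(batch, -1)                               # (batch, POOLED_LEN * FILTERS)
    hidden = np.maximum(flat @ W_hidden + b_hidden, 0.0)
    # Softmax: turn the two logits into spam/ham probabilities.
    return softmax(hidden @ W_out + b_out)


probs = forward(rng.integers(0, VOCAB, size=(4, SEQ_LEN)))
print(probs.shape)  # (4, 2)
```

In Keras, a comparable stack would typically be assembled from `Embedding`, `Conv1D`, `MaxPooling1D`, `Flatten`, and `Dense` layers, with the framework handling the weights and gradients.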
git clone https://github.com/ayushk028/spam-email-detection-cnn.git
cd spam-email-detection-cnn
pip install -r requirements.txt
python train.py
python test.py
- Accuracy: Achieved an accuracy of ~90% on test data.
- Loss Function: Categorical Cross-Entropy.
- Optimization Algorithm: Adam Optimizer.
- Evaluation Metrics: Precision, Recall, F1-Score.
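The loss and metrics listed above can be computed by hand as follows. This is an illustrative sketch, not the project's evaluation code; the label encoding (one-hot `[1, 0]` = ham, `[0, 1]` = spam, and `1` = spam for the binary metrics) is an assumption.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean CCE over a batch: y_true rows are one-hot, y_pred rows are softmax outputs."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Only the true class contributes, since the other one-hot entries are 0.
        total -= sum(ti * math.log(max(pi, eps)) for ti, pi in zip(t, p))
    return total / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1, treating `positive` (assumed: 1 = spam) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One ham example and one spam example, each predicted with fairly high confidence.
loss = categorical_cross_entropy([[1, 0], [0, 1]], [[0.9, 0.1], [0.2, 0.8]])
# Hypothetical labels: 1 = spam, 0 = ham.
p, r, f = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Here two of the three predicted spam emails are truly spam and two of the three true spam emails are caught, so precision, recall, and F1 all come out to 2/3. In Keras the loss and optimizer would usually be selected with `model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])`, and scikit-learn's `classification_report` reports per-class precision, recall, and F1.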
The model effectively distinguishes spam from non-spam emails, outperforming traditional ML models such as Naive Bayes and SVM in accuracy and generalization.
- Add Bidirectional LSTM layers to better capture word order and longer-range context.
- Use pre-trained word embeddings (e.g., Word2Vec, GloVe) for better text representation.
- Deploy the model as a web service or API.
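For the pre-trained embedding idea above, one common approach is to build an embedding matrix from a GloVe text file, where each line holds a word followed by its vector components separated by spaces. The sketch below assumes a word-to-id dictionary with ids 1..N and id 0 reserved for padding; the function name `build_embedding_matrix` is hypothetical.

```python
import numpy as np


def build_embedding_matrix(glove_lines, vocab, dim):
    """Return a (len(vocab) + 1, dim) float32 matrix filled from GloVe-format lines.

    Row 0 stays zero for the padding id. Words with no pre-trained vector
    also stay zero (a simple choice; random initialisation is another option).
    """
    matrix = np.zeros((len(vocab) + 1, dim), dtype=np.float32)
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        idx = vocab.get(word)
        # Skip words outside the vocabulary and lines with the wrong dimension.
        if idx is not None and len(values) == dim:
            matrix[idx] = [float(v) for v in values]
    return matrix


# In practice the lines would come from a file, e.g. open("glove.6B.100d.txt", encoding="utf-8").
lines = ["free 0.1 0.2", "prize 0.3 0.4", "meeting 0.5 0.6"]
emb = build_embedding_matrix(lines, {"free": 1, "prize": 2, "click": 3}, dim=2)
```

The resulting matrix could then seed a Keras `Embedding` layer, for example via `embeddings_initializer=keras.initializers.Constant(matrix)` together with `trainable=False` to keep the pre-trained vectors frozen.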
Feel free to fork this repository and contribute! Pull requests are welcome.
This project is open-source and available under the MIT License.
For queries, reach out at ayushk028.github.io.