A machine learning-based web application that classifies emails as Spam or Not Spam using Natural Language Processing (NLP) and Logistic Regression. This project uses Scikit-learn, TensorFlow, Flask, and NLTK, and is built with a simple yet functional user interface.
Email spam is a major issue in digital communication, often leading to security risks and lost productivity. This project provides an effective solution using machine learning and NLP to classify emails as spam or not spam.
Using a Kaggle dataset, the model is trained with Logistic Regression after applying NLTK text preprocessing such as stopword removal and lemmatization. Text is vectorized using CountVectorizer, and the model achieves 95% accuracy on the test data.
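The training step described above can be sketched with scikit-learn's `make_pipeline`, chaining `CountVectorizer` and `LogisticRegression`. This is a minimal illustration under stated assumptions, not the project's actual code: the six training messages and two test messages are hypothetical toy data rather than rows from the Kaggle dataset, and the NLTK preprocessing is left out so the block runs on its own.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages standing in for the Kaggle dataset.
train_texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent offer click to win money",
    "meeting moved to tuesday afternoon",
    "please review the attached report",
    "lunch tomorrow with the team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

test_texts = ["free money prize", "team meeting report"]
test_labels = ["spam", "ham"]

# CountVectorizer turns each message into token counts;
# LogisticRegression learns a linear decision boundary over those counts.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

predictions = model.predict(test_texts)
print(predictions.tolist())
print("accuracy:", accuracy_score(test_labels, predictions))
```

In the real project the vectorizer would be fit on the preprocessed Kaggle messages, and a held-out split (for example via `sklearn.model_selection.train_test_split`) would produce the reported test accuracy.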
A simple Flask web app lets users enter email text, view the prediction, and see model performance in real time. The project demonstrates a complete ML workflow, from training to deployment.
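One way to structure such an app is an app factory that wraps any text classifier. The sketch below assumes Flask is installed; `create_app`, the `email` form field, and the inline template are hypothetical names chosen for illustration, and `classify` stands in for the trained model.

```python
from flask import Flask, render_template_string, request

# Hypothetical inline template: a form plus the prediction and the original input.
PAGE = """
<form method="post">
  <textarea name="email">{{ text }}</textarea>
  <button type="submit">Classify</button>
</form>
{% if result %}<p>Prediction: {{ result }}</p><p>Input: {{ text }}</p>{% endif %}
"""


def create_app(classify):
    """Build a Flask app around any callable mapping email text -> label."""
    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def index():
        # Only POST requests carry submitted text; GET shows an empty form.
        text = request.form.get("email", "") if request.method == "POST" else ""
        result = classify(text) if text.strip() else None
        return render_template_string(PAGE, text=text, result=result)

    return app
```

With a fitted pipeline `model` (hypothetical here), `create_app(lambda text: model.predict([text])[0]).run(debug=True)` would serve the page on localhost. Passing the classifier in keeps the web layer independent of how the model was trained.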
- Logistic Regression-based spam classification
- NLP preprocessing using NLTK
- Text vectorization using CountVectorizer
- 95% model accuracy on test data
- Interactive web interface using Flask
- Displays both the prediction result and the original input
- Terminal-based prediction loop for new entries
- Clean and modular code structure
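The terminal-based prediction loop can be sketched as a function that keeps reading messages until the user types `exit`. `prediction_loop` and its `read`/`write` parameters are hypothetical; injecting them (defaulting to `input` and `print`) keeps the loop testable without a real terminal.

```python
def prediction_loop(classify, read=input, write=print):
    """Read emails one at a time and print a label for each.

    `classify` is any callable mapping text -> label (hypothetically,
    lambda t: model.predict([t])[0] for a fitted pipeline).
    Stops when the user types 'exit' or input reaches end-of-file.
    """
    while True:
        try:
            text = read("Enter email text (or 'exit'): ")
        except EOFError:
            break
        if text.strip().lower() == "exit":
            break
        if text.strip():  # skip blank entries
            write(f"Prediction: {classify(text)}")
```

Calling `prediction_loop(lambda t: model.predict([t])[0])` would run it interactively on stdin.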
- Dataset contents:
  - Text-based email messages
  - Labels: spam or ham (not spam)
- Preprocessing:
  - Lowercasing
  - Stopword removal
  - Lemmatization
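The three preprocessing steps can be sketched as a single function. This version is dependency-free by assumption: `preprocess` and `STOPWORDS` are hypothetical names, the stopword set is a small illustrative subset rather than NLTK's list, and lemmatization is a pluggable callable that defaults to a no-op so the block runs without downloading NLTK data.

```python
import re

# Hypothetical small stopword subset; the project uses NLTK's English list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "or",
             "you", "your", "for", "in", "on"}


def preprocess(text, stopwords=STOPWORDS, lemmatize=lambda word: word):
    """Lowercase, tokenize, drop stopwords, then lemmatize each token.

    `lemmatize` defaults to a no-op; pass a real lemmatizer
    (e.g. NLTK's WordNetLemmatizer().lemmatize) for actual lemmatization.
    """
    # Lowercasing plus a simple alphabetic-only tokenization.
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(lemmatize(t) for t in tokens if t not in stopwords)
```

To match the project's NLTK setup, one could pass `stopwords=set(nltk.corpus.stopwords.words("english"))` and `lemmatize=nltk.stem.WordNetLemmatizer().lemmatize` after downloading the `stopwords` and `wordnet` corpora with `nltk.download`.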
- Model Used: Logistic Regression
- Vectorizer: CountVectorizer
- Accuracy: 95%
- Evaluation Metrics:

Tech Stack:

Category | Tools/Technologies |
---|---|
Libraries | Scikit-learn, TensorFlow, NLTK, Flask |
Techniques | NLP, Logistic Regression, CountVectorizer |
Tools | Jupyter Notebook, VS Code |
Language | Python |
Deployment | Flask App (Localhost) |
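The CountVectorizer step listed under Techniques maps each message to a vector of token counts over a fixed vocabulary. The sketch below reproduces that idea in plain Python for illustration only: `fit_vocabulary` and `transform` are hypothetical helpers, text is assumed to be already preprocessed and split on whitespace, and scikit-learn's `CountVectorizer` additionally applies its own tokenizer and lowercasing and returns a sparse matrix.

```python
from collections import Counter


def fit_vocabulary(docs):
    """Assign each distinct token a column index, in alphabetical order."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def transform(docs, vocabulary):
    """Return one dense count vector per document; unseen tokens are ignored."""
    rows = []
    for doc in docs:
        counts = Counter(tok for tok in doc.split() if tok in vocabulary)
        row = [0] * len(vocabulary)
        for tok, n in counts.items():
            row[vocabulary[tok]] = n
        rows.append(row)
    return rows


vocab = fit_vocabulary(["free prize free", "team meeting"])
vectors = transform(["free free team", "new prize"], vocab)
```

Here the vocabulary is `free`, `meeting`, `prize`, `team`, so `"free free team"` becomes counts of 2, 0, 0, 1, and the unseen word `new` is dropped. Logistic Regression then learns one weight per column.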
Muqadas Ejaz
BS Computer Science (AI Specialization)
Machine Learning & Computer Vision Enthusiast
Connect with me on LinkedIn
GitHub: github.com/muqadasejaz