Spam Message Detector

A Python project that classifies SMS text messages as spam or not spam using several machine learning models. The project demonstrates preprocessing, feature extraction, and model evaluation on a real dataset of messages.

🚀 Project Overview

This project explores different approaches to detect spam messages, including:

- Count vectorization and TF-IDF vectorization of the text data.
- Extraction of additional features:
  - Message length
  - Number of digits
  - Number of non-word characters
- Training multiple classifiers:
  - Multinomial Naive Bayes
  - Support Vector Machine (SVM)
  - Logistic Regression
- Evaluating models using ROC AUC scores.
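
The three engineered features listed above can be sketched in plain Python. This is a minimal illustration, not the code from spam_detector.py: the function name `extra_features`, the dictionary keys, and the choice to count whitespace as non-word characters (via the regex `\W`) are assumptions.

```python
import re


def extra_features(message: str) -> dict:
    """Compute three hand-crafted features for a single message (illustrative)."""
    return {
        # Total number of characters in the message.
        "length": len(message),
        # Number of digit characters.
        "digit_count": len(re.findall(r"\d", message)),
        # Number of characters that are not letters, digits, or underscore.
        # Assumption: spaces and punctuation both count here.
        "non_word_count": len(re.findall(r"\W", message)),
    }


features = extra_features("Win $1000 now! Call 0800 123")
print(features)
```

Spam messages often contain phone numbers, prices, and heavy punctuation, which is why counts like these can carry signal beyond the word-level vectors.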

By combining classic text vectorization with engineered features, the models reach ROC AUC scores of about 0.99 on this dataset.
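
One way to combine vectorized text with an engineered feature and compare the three classifiers is sketched below. It assumes scikit-learn and SciPy are installed. The toy messages, the helper names `add_feature` and `positive_scores`, and all hyperparameters are hypothetical, so the scores it prints say nothing about the real dataset.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Toy data standing in for spam.csv (hypothetical; 1 = spam, 0 = not spam).
train_texts = [
    "WIN a free prize now, call 0800 123 456",
    "URGENT! Claim your 1000 cash reward today",
    "Free entry in a weekly competition, text WIN to 80085",
    "Are we still meeting for lunch?",
    "I will call you later tonight",
    "Thanks, see you tomorrow",
]
train_labels = np.array([1, 1, 1, 0, 0, 0])
test_texts = [
    "Call now to claim your free cash prize 0800",
    "Congratulations, you win a reward, text 80085",
    "See you at lunch tomorrow",
    "Thanks, I will call later",
]
test_labels = np.array([1, 1, 0, 0])


def add_feature(X, values):
    """Append one dense column of values to a sparse feature matrix."""
    return hstack([X, csr_matrix(values).T], format="csr")


def positive_scores(model, X):
    """Return a continuous score for the positive class, as ROC AUC needs."""
    if hasattr(model, "decision_function"):
        return model.decision_function(X)
    return model.predict_proba(X)[:, 1]


# Fit the vectorizer on training text only, then reuse it for the test text.
vect = TfidfVectorizer()
X_train = add_feature(vect.fit_transform(train_texts), [len(t) for t in train_texts])
X_test = add_feature(vect.transform(test_texts), [len(t) for t in test_texts])

models = {
    "MultinomialNB": MultinomialNB(),
    "SVC": SVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}
aucs = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    aucs[name] = roc_auc_score(test_labels, positive_scores(model, X_test))
    print(f"{name}: ROC AUC = {aucs[name]:.2f}")
```

ROC AUC is computed from continuous scores rather than hard 0/1 predictions, which is why the sketch uses `decision_function` or `predict_proba` instead of `predict`.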

📂 Project Structure

.
├── LICENSE
├── README.md
├── spam.csv
└── spam_detector.py

- LICENSE: MIT License.
- README.md: Project description and usage.
- spam.csv: Dataset of labeled SMS messages.
- spam_detector.py: Main script containing all preprocessing and model training functions.

📊 Dataset

The dataset (spam.csv) contains SMS text messages, each labeled as one of:

- spam: Messages intended for advertising, fraud, or phishing.
- ham: Regular, legitimate messages.

Each row contains:

- text: The message content.
- target: Label (1 = spam, 0 = ham).
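
A minimal sketch of loading data in this layout with pandas and computing the share of spam is shown below. The rows are made up and read from an in-memory string so the example runs without spam.csv. The mapping from string labels to integers is an assumption about how the labels are encoded, not something confirmed by the script.

```python
import io

import pandas as pd

# Hypothetical rows in the two-column layout described above.
# With the real data, pd.read_csv("spam.csv") would replace this buffer.
sample_csv = io.StringIO(
    "text,target\n"
    "Free entry! Text WIN to 80085 now,spam\n"
    "Are we still meeting at noon?,ham\n"
    "See you tonight,ham\n"
    "URGENT: claim your prize today,spam\n"
)
df = pd.read_csv(sample_csv)

# Assumption: string labels are converted to integers (1 = spam, 0 = ham).
df["target"] = (df["target"] == "spam").astype(int)

# The mean of a 0/1 column is the fraction of ones, so this is the spam share.
spam_share = df["target"].mean() * 100
print(f"{spam_share:.1f}% of messages are spam")
```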

⚙️ Requirements

To run this project, install the following Python libraries:

pip install pandas numpy scikit-learn

🧩 How to Run

- Clone the repository or download the files.
- Make sure spam.csv and spam_detector.py are in the same folder.
- Run the script:

python spam_detector.py

The script defines functions answer_one() through answer_eleven(). Call any of them to see the result of a specific analysis step or model.

📈 Example Outputs

- Percentage of spam messages: ~13%
- Longest token in vocabulary: "com1win150ppmx3age16subscription"
- Naive Bayes AUC Score: ~0.99
- SVM AUC Score: ~0.99
- Logistic Regression AUC Score: ~0.99
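
A value like the longest vocabulary token can be obtained from a fitted `CountVectorizer`. The sketch below uses three made-up messages and default tokenizer settings, both of which are assumptions. The original script may fit on different data or use different options.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical messages; the real vocabulary comes from spam.csv.
messages = [
    "Call now to claim your prize",
    "subscription renewal at www.example.com",
    "ok see you soon",
]

# Fitting builds the vocabulary: a dict mapping each token to a column index.
vect = CountVectorizer().fit(messages)

# The longest token is the vocabulary key with the most characters.
longest_token = max(vect.vocabulary_, key=len)
print(longest_token)
```

Unusually long tokens such as the one listed above tend to come from spam messages that run words and numbers together without spaces.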

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙌 Acknowledgements

This project was inspired by classic text classification techniques and demonstrates how combining vectorized features with engineered metadata can improve spam detection performance.

Feel free to use and adapt it in your own projects!

Mukeshthenraj/spam-message-detector
