This project implements several classification algorithms on the Breast Cancer dataset available in the `sklearn` library. The goal is to compare their performance and determine the best classifier for this dataset.
- **Dataset:** The Breast Cancer dataset from the `sklearn` library.
- **Preprocessing Steps:**
  - **Missing Values:** Handled with a mean-imputation strategy; no missing values were actually present in this dataset.
  - **Feature Scaling:** Standardized all features with `StandardScaler` so that scale-sensitive algorithms such as SVM and k-NN perform optimally.
  - **Splitting:** Divided the dataset into training and testing sets using an 80-20 split.
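The preprocessing steps above can be sketched as follows (a minimal example; the `random_state=42` seed and the `stratify=y` option are assumptions added for reproducibility, not stated above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Breast Cancer dataset (569 samples, 30 numeric features)
X, y = load_breast_cancer(return_X_y=True)

# 80-20 train/test split; the seed is fixed here only for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training set only, then apply it to both splits,
# so no information from the test set leaks into preprocessing
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

Fitting the scaler on the training split alone is the standard way to avoid data leakage.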
The following classification algorithms were implemented:
- **Logistic Regression**
  - A linear model that predicts the probability of a binary outcome using the logistic function.
  - Achieved the highest accuracy (0.97), likely due to the near-linear separability of the data.
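A minimal sketch of the logistic-regression step, showing the probability output described above (the split seed and `max_iter=1000` are assumptions; exact accuracy depends on the split):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# max_iter raised from the default so the solver converges on this data
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The logistic function maps the linear score to class probabilities:
# two columns, P(class 0) and P(class 1), summing to 1 for each sample
proba = clf.predict_proba(X_test[:1])
print(proba)
print(clf.score(X_test, y_test))
```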
- **Decision Tree Classifier**
  - A non-linear algorithm that recursively splits the data into subsets based on feature thresholds.
  - Accuracy: 0.95. It may have slightly overfit the data.
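The overfitting tendency noted above is easy to observe: an unconstrained tree fits the training set perfectly while scoring lower on held-out data (a sketch; the seed is an assumption):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# With no depth limit the tree memorizes the training set (accuracy 1.0),
# a classic symptom of overfitting; max_depth or min_samples_leaf curb it
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(tree.score(X_train, y_train))  # 1.0 on the training set
print(tree.score(X_test, y_test))    # typically lower on held-out data
```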
- **Random Forest Classifier**
  - An ensemble method that trains multiple decision trees on bootstrapped samples and averages their predictions.
  - Accuracy: 0.96. Robust to overfitting and effective on this dataset.
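A sketch of the random-forest step (the seed and the default `n_estimators=100` are assumptions). The fitted `estimators_` attribute exposes the individual trees whose predictions are averaged:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# 100 trees, each fit on a bootstrap sample of the training data;
# the forest averages their per-tree class probabilities
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(len(forest.estimators_))   # the individual decision trees
print(forest.score(X_test, y_test))
```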
- **Support Vector Machine (SVM)**
  - Finds the hyperplane that separates the classes with maximum margin.
  - Accuracy: 0.96. Performed well thanks to feature scaling.
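The benefit of scaling for SVM can be checked directly by comparing a raw fit against a scaled pipeline (a sketch with default `SVC` hyperparameters; the seed is an assumption):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Without scaling, the RBF kernel's distances are dominated by the
# features with the largest magnitudes, which usually hurts accuracy
unscaled = SVC().fit(X_train, y_train)

# A pipeline guarantees the scaler is fit on the training data only
scaled = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)

print(unscaled.score(X_test, y_test))
print(scaled.score(X_test, y_test))
```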
- **k-Nearest Neighbors (k-NN)**
  - Classifies each sample by majority vote among its nearest neighbors in the feature space.
  - Accuracy: 0.95. Sensitive to the choice of `k` and to data scaling.
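The sensitivity to `k` can be sketched by sweeping a few values on scaled features (the seed and the particular `k` values are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Small k overfits local noise; large k oversmooths the decision boundary
scores = {}
for k in (1, 5, 15, 51):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = knn.fit(X_train, y_train).score(X_test, y_test)
print(scores)
```

In practice `k` is usually chosen by cross-validation rather than from a single test split.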
The performance of the algorithms is summarized below:
| Algorithm | Accuracy |
|---|---|
| Logistic Regression | 0.97 |
| Random Forest | 0.96 |
| Support Vector Machine | 0.96 |
| Decision Tree | 0.95 |
| k-NN | 0.95 |
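A comparison like the one above could be reproduced with a single loop (a sketch: the seed, default hyperparameters, and `max_iter=1000` are assumptions, and exact accuracies will vary with the split):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "k-NN": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    # Scale inside a pipeline so every model sees standardized features
    pipe = make_pipeline(StandardScaler(), model)
    results[name] = pipe.fit(X_train, y_train).score(X_test, y_test)

for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")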