This project aims to detect fraudulent car insurance claims using machine learning classification techniques. The dataset consists of various categorical and numerical variables representing customer, vehicle, and claim characteristics.
- Identify potential fraudulent claims in a binary classification setup
- Handle class imbalance using SMOTE and alternative resampling techniques
- Compare model performance across classifiers and resampling strategies
- Language: R
- Handling Imbalance: SMOTE, ROSE, Undersampling, Oversampling
- Models: Logistic Regression, SVM, Classification Tree, XGBoost, Random Forest
- Hyperparameter Tuning Methods: Grid Search with Cross-Validation, Bayesian Optimization
- Evaluation Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC
- Model Interpretation: Partial Dependence Plots
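
As a rough illustration of why precision, recall, and F1 are reported alongside accuracy for an imbalanced fraud dataset, here is a minimal base-R sketch. The helper name `classification_metrics` and the toy label vectors are made up for this example; they are not taken from the project's code or data.

```r
# Hypothetical helper (not from the project code): metrics for the
# positive ("fraud") class, computed from 0/1 label vectors.
classification_metrics <- function(actual, predicted, positive = 1) {
  tp <- sum(predicted == positive & actual == positive)
  fp <- sum(predicted == positive & actual != positive)
  fn <- sum(predicted != positive & actual == positive)
  tn <- sum(predicted != positive & actual != positive)

  accuracy  <- (tp + tn) / length(actual)
  # Precision is undefined when no positive predictions are made.
  precision <- if (tp + fp > 0) { tp / (tp + fp) } else { NA_real_ }
  # Recall is undefined when there are no actual positives.
  recall    <- if (tp + fn > 0) { tp / (tp + fn) } else { NA_real_ }
  f1 <- if (is.na(precision) || is.na(recall) || precision + recall == 0) {
    NA_real_
  } else {
    2 * precision * recall / (precision + recall)
  }

  c(accuracy = accuracy, precision = precision, recall = recall, f1 = f1)
}

# Toy data (assumed for illustration): 100 claims, 5 of them fraudulent.
actual <- c(rep(0, 95), rep(1, 5))

# A classifier that never flags fraud still looks accurate.
always_legit <- rep(0, 100)
m_naive <- classification_metrics(actual, always_legit)

# A classifier that catches 3 of 5 frauds and raises 2 false alarms.
predicted <- c(rep(0, 93), rep(1, 2), 1, 1, 1, 0, 0)
m_model <- classification_metrics(actual, predicted)

print(round(m_naive, 2))
print(round(m_model, 2))
```

In this toy setup the never-flag classifier reaches 95% accuracy while its recall is zero, which is the kind of gap that motivates resampling methods such as SMOTE and class-aware metrics.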
- /code/: All R code for preprocessing, modeling, and evaluation
- /report/: PDF report and presentation slides
- /data/: The dataset
Distributed under the MIT License. See LICENSE for more information.