A machine learning project to classify blockchain transactions into sales, purchases, transfers, scams, and phishing attempts.
This project explores the application of various machine learning models to predict transaction types within blockchain data. The dataset was sourced from Kaggle and contained ~78,600 records. The primary goal is to improve fraud detection in blockchain transactions using effective classification techniques.
Implemented multiple ML models for classification:
• Random Forest
• K-Nearest Neighbors (KNN)
• Gaussian Naive Bayes
• Decision Tree
• AdaBoost
Applied effective preprocessing techniques like:
• Oversampling, Undersampling, Stratified Learning, and SMOTE
• Information Gain for feature selection
• Min-Max Normalization for feature scaling
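As an illustration of the Information Gain criterion listed above, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function names and the toy feature values are assumptions made for the example, and it handles categorical features only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data, not taken from the dataset.
labels = ["sale", "sale", "scam", "scam"]
informative = ["low", "low", "high", "high"]  # separates the two classes perfectly
uninformative = ["a", "b", "a", "b"]          # carries no information about the class

print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

Features would then be ranked by this score, keeping the highest-scoring ones. How many features the project kept is not stated here.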
Evaluated model performance using metrics like accuracy, precision, recall, and F1-score.
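The sketch below shows how these four metrics can be computed per class, which matters on imbalanced data because accuracy alone can hide poor results on rare classes. It uses only the standard library. The function name and the toy labels are assumptions for illustration, not the project's evaluation code.

```python
def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and a dict of precision, recall, and F1 for each class."""
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    report = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Hypothetical labels for illustration only.
y_true = ["sale", "sale", "sale", "scam", "scam", "phishing"]
y_pred = ["sale", "sale", "scam", "scam", "sale", "sale"]

accuracy, report = per_class_metrics(y_true, y_pred)
print(accuracy)
for name, scores in report.items():
    print(name, scores)
```

In this toy case the single 'phishing' example is missed entirely, so its recall is zero even though half of all predictions are correct.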
- Source: Kaggle Blockchain Dataset
- Size: ~78,600 records
- Attributes: Various transaction details, including timestamps, regions, and behavioral patterns.
Data Imbalance Challenge: The dataset had a skewed distribution with legitimate transactions heavily outnumbering fraudulent ones. Techniques such as oversampling, undersampling, and SMOTE were implemented to mitigate this issue.
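To make the idea concrete, the following is a minimal sketch of plain random oversampling with the standard library. It is not SMOTE, which generates synthetic points by interpolating between minority-class neighbors rather than duplicating rows. The function name, the seed, and the toy rows are assumptions for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw with replacement to fill the gap up to the majority-class size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: four legitimate rows and one fraudulent row.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["purchase", "purchase", "purchase", "purchase", "scam"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.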
- Dropped Irrelevant Features: Removed attributes like `timestamp`, `sending_address`, and `receiving_address`.
- Handling Missing Values: Used median imputation for numerical values.
- Categorical Encoding: Applied Label Encoding to variables such as `location_region`, `purchase_pattern`, `age_group`, and `anomaly`.
- Feature Scaling: Used Min-Max Normalization to ensure consistent feature scaling.
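The preprocessing steps above can be sketched in a few lines of standard-library Python. This is a hedged illustration, not the project's pipeline: the helper names, the `amounts` column, and the region values are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def label_encode(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns for illustration only.
amounts = [10.0, None, 30.0, 50.0]
regions = ["Europe", "Asia", "Europe", "Africa"]

scaled = min_max_scale(impute_median(amounts))
codes, mapping = label_encode(regions)
print(scaled)
print(codes, mapping)
```

Label encoding assigns arbitrary integer order to categories, which distance-based models such as KNN will treat as meaningful, so the choice of encoding can affect those models more than tree-based ones.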
Model | Key Hyperparameters | Best Accuracy |
---|---|---|
AdaBoost | `n_estimators`, `learning_rate` | 72.9% |
Gaussian Naive Bayes | `var_smoothing` | 69.2% |
KNN | `n_neighbors`, `weights` | 97.8% |
Random Forest | `n_estimators`, `min_samples_split` | 97.9% |
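To show what the two KNN hyperparameters in the table control, here is a small standard-library sketch of a nearest-neighbor vote. The parameter names mirror those of scikit-learn's `KNeighborsClassifier`, but which library the project used is an assumption. The function and the toy data are hypothetical and do not reproduce the reported accuracy.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, n_neighbors=3, weights="uniform"):
    """Predict a label for `query` by voting among its nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for distance, label in distances[:n_neighbors]:
        if weights == "distance":
            # Closer neighbors count more; the small constant avoids division by zero.
            votes[label] += 1.0 / (distance + 1e-12)
        else:
            # "uniform": every neighbor gets one vote.
            votes[label] += 1.0
    return max(votes, key=votes.get)


# Hypothetical one-dimensional toy data.
train_X = [[0.0], [0.2], [1.0]]
train_y = ["sale", "sale", "scam"]

print(knn_predict(train_X, train_y, [0.9], n_neighbors=3, weights="uniform"))
print(knn_predict(train_X, train_y, [0.9], n_neighbors=3, weights="distance"))
```

With uniform weights the two distant 'sale' points outvote the nearby 'scam' point, while distance weighting lets the closest neighbor dominate. Tuning `n_neighbors` and `weights` amounts to choosing between behaviors like these.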
Best-performing Model: Random Forest Classifier
- Random Forest achieved the highest accuracy of 97.9% and demonstrated consistent performance across various configurations.
- KNN closely followed with an accuracy of 97.8%, excelling particularly in 'purchase' and 'sale' transactions.
- The AdaBoost and Gaussian Naive Bayes models struggled with certain transaction types, such as 'phishing' and 'scam'.
- Exploring additional ensemble models like XGBoost and Gradient Boosting.
- Investigating advanced preprocessing techniques for better feature extraction.
- Improving model performance for minority class predictions through enhanced sampling techniques.
- Omnia Osama Ahmed
- Sara Imad Hamdan
- Nour Bashar Soukieh
This project was developed for the course Artificial Intelligence (CSC406) at Abu Dhabi University.
Built with ❤️ for secure digital transactions in the Metaverse!