This project provides a comprehensive, end-to-end customer analytics solution using the Online Retail II dataset. It transforms over one million raw transactional records into actionable business insights. The core of this project involves:
- RFM Segmentation: Identifying distinct customer segments such as "Champions" and "At-Risk" by analyzing their Recency, Frequency, and Monetary value to enable targeted marketing strategies.
- Predictive Churn Modeling: Building and deploying an XGBoost model to proactively identify customers with a high probability of churning, and analyzing the key factors that drive this behavior.
- Customer Lifetime Value (CLV) Forecasting: Implementing probabilistic models (BG/NBD and Gamma-Gamma) to predict the future purchasing behavior and monetary value of customers, guiding long-term strategic investments.
- Statistical A/B Testing: Designing and executing a simulated A/B test to statistically measure the impact of a discount campaign on customer spending, using the Mann-Whitney U test for significance.
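Of the four components above, RFM is the simplest to illustrate. The following is a minimal standard-library sketch, not the notebook's actual code: the function name `rfm_table`, the sample rows, and the snapshot date are all hypothetical. It assumes Recency is the number of days since the last purchase, Frequency is the number of distinct invoices, and Monetary is total spend.

```python
from collections import defaultdict
from datetime import datetime

# Invented sample rows: (customer_id, invoice, invoice_date, line_total)
rows = [
    (13085, "489434", datetime(2010, 1, 5), 83.40),
    (13085, "489434", datetime(2010, 1, 5), 16.60),
    (13085, "490001", datetime(2010, 3, 1), 50.00),
    (16321, "489500", datetime(2009, 12, 1), 20.00),
]


def rfm_table(rows, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Assumed definitions: recency = days from last purchase to `snapshot`,
    frequency = number of distinct invoices, monetary = total spend.
    """
    last_seen = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, when, total in rows:
        last_seen[customer] = max(when, last_seen.get(customer, when))
        invoices[customer].add(invoice)
        spend[customer] += total
    return {
        c: ((snapshot - last_seen[c]).days, len(invoices[c]), round(spend[c], 2))
        for c in last_seen
    }


rfm = rfm_table(rows, snapshot=datetime(2010, 3, 11))
```

Segment labels such as "Champions" would then be assigned by binning these three values; the binning rule is not specified here, so it is left out of the sketch.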
The main business objectives of this project are:
- Identify the most valuable customer segments and their common characteristics.
- Pinpoint customers at high risk of churn and understand the factors influencing this risk.
- Estimate the future value (CLV) of customers to guide marketing budget allocation.
- Determine the effectiveness of a marketing campaign on increasing customer spending.
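The churn objective depends on a concrete churn label. The workflow defines churn with a 90-day inactivity window; one possible reading of that definition is sketched below. The names `label_churn` and `CHURN_WINDOW`, the sample dates, and the choice to measure inactivity against a fixed snapshot date are assumptions, not taken from the notebook.

```python
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(days=90)  # inactivity window stated in the workflow


def label_churn(last_purchase, snapshot, window=CHURN_WINDOW):
    """Return {customer_id: True if inactive for longer than `window`}.

    Assumption: inactivity is measured from each customer's most recent
    purchase to a fixed snapshot date.
    """
    return {c: (snapshot - when) > window for c, when in last_purchase.items()}


# Invented example: most recent purchase date per customer
last_purchase = {
    13085: datetime(2010, 3, 1),
    16321: datetime(2009, 12, 1),
}
labels = label_churn(last_purchase, snapshot=datetime(2010, 3, 11))
```

With this reading, a customer whose last purchase was exactly 90 days before the snapshot is still counted as active.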
- `data/`: Contains raw, intermediate, and final datasets.
- `images/`: Stores visual assets used in the README.
- `notebooks/`: Houses the Jupyter Notebooks for each analysis stage.
  - `01_Data_Preprocessing_EDA.ipynb`
  - `02_RFM_and_Segmentation.ipynb`
  - `03_Churn_Prediction.ipynb`
  - `04_CLV_Analysis.ipynb`
  - `05_AB_Testing.ipynb`
- `pyproject.toml`, `uv.lock`: Files for modern dependency and environment management.
- `README.md`: This project overview.
The dataset contains transactional data from a UK-based online retail company. Key fields used in the analysis include:
Feature | Description | Type |
---|---|---|
Invoice | A 6-digit invoice number, unique to each transaction. A leading 'C' indicates a cancellation. | object |
StockCode | A unique code assigned to each distinct product. | object |
Description | The name of the product. | object |
Quantity | The number of units of a product sold in a transaction. | int64 |
InvoiceDate | The date and time when the transaction occurred. | object |
Price | The unit price of the product in British Pounds (£). | float64 |
Customer ID | A 5-digit unique identifier for each customer. | float64 |
Country | The country where the customer resides. | object |
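The types in the table suggest a few conversions before analysis: `InvoiceDate` arrives as a string, `Customer ID` as a float, and cancellations are encoded in the `Invoice` value. The sketch below shows one way to handle these using only the standard library. The sample rows, the timestamp format, the function name `clean_rows`, and the decision to drop cancellations and rows without a customer are all assumptions; the notebook may handle them differently (for example with pandas).

```python
import csv
import io
from datetime import datetime

# Invented sample with the same columns as the table above.
SAMPLE_CSV = """Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
489434,85048,GLASS LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom
C489449,22087,PAPER BUNTING,-12,2009-12-01 10:33:00,2.95,16321.0,Australia
489435,22350,CAT BOWL,3,2009-12-01 07:46:00,2.55,,United Kingdom
"""


def clean_rows(text):
    """Yield typed rows, skipping cancellations and rows without a customer."""
    for row in csv.DictReader(io.StringIO(text)):
        if row["Invoice"].startswith("C"):  # 'C' prefix marks a cancellation
            continue
        if not row["Customer ID"]:  # missing customer: cannot be attributed
            continue
        yield {
            "invoice": row["Invoice"],
            # Stored as float64 (e.g. 13085.0), so convert via float first.
            "customer_id": int(float(row["Customer ID"])),
            "quantity": int(row["Quantity"]),
            "price": float(row["Price"]),
            # Assumed timestamp format; adjust to match the raw file.
            "invoice_date": datetime.strptime(row["InvoiceDate"], "%Y-%m-%d %H:%M:%S"),
            "country": row["Country"],
        }


cleaned = list(clean_rows(SAMPLE_CSV))
```

Only the first sample row survives: the second is a cancellation and the third has no customer ID.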
This project follows a sequential workflow, with each step detailed in its respective Jupyter Notebook.
- Data Cleaning and EDA (`notebooks/01_Data_Preprocessing_EDA.ipynb`)
  - Cleans and preprocesses over 1 million records, handling missing values, duplicates, and cancellations.
  - Performs feature engineering to create analysis-ready features.
  - Applies `IsolationForest` for sophisticated outlier detection.
  - Conducts comprehensive exploratory data analysis (EDA) to uncover trends in sales over time, by country, and by product.
- RFM Analysis & Segmentation (`notebooks/02_RFM_and_Segmentation.ipynb`)
  - Calculates Recency, Frequency, and Monetary (RFM) metrics for each customer.
  - Segments customers into actionable groups like "Best Customers," "At-Risk," and "Promising Customers."
  - Visualizes segment distributions and characteristics using 2D and 3D plots.
- Churn Prediction (`notebooks/03_Churn_Prediction.ipynb`)
  - Defines churn based on a 90-day inactivity window.
  - Builds and compares `Logistic Regression`, `Random Forest`, and `XGBoost` models.
  - Evaluates models using ROC-AUC, F1-score, and confusion matrices to identify the best-performing algorithm.
  - Visualizes feature importance to understand key drivers of customer churn.
- CLV Analysis (`notebooks/04_CLV_Analysis.ipynb`)
  - Implements the probabilistic BG/NBD model to predict future transaction frequency.
  - Uses the Gamma-Gamma model to estimate the average monetary value of transactions.
  - Develops a hybrid XGBoost model using BG/NBD features for enhanced CLV prediction.
  - Compares CLV results with RFM segments for validation.
  - Visualizes CLV distribution and identifies key customer groups.
- A/B Testing (`notebooks/05_AB_Testing.ipynb`)
  - Simulates an A/B test scenario to measure the impact of a discount campaign.
  - Applies appropriate statistical tests (Shapiro-Wilk for normality, Mann-Whitney U for comparison) to determine statistical significance.
  - Provides clear visualizations and reporting of the test results.
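To show what the Mann-Whitney U comparison in the A/B testing step computes, here is a standard-library-only sketch using the normal approximation. It is not the notebook's code, which more likely calls a library routine such as `scipy.stats.mannwhitneyu`. The function name `mann_whitney_u` and the spending figures are invented for illustration.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for sample `a`, p-value). Tied values get average
    ranks and the variance is tie-corrected; no continuity correction is
    applied. Undefined when every pooled value is identical (zero variance).
    """
    a, b = list(a), list(b)
    n1, n2 = len(a), len(b)
    pooled = sorted(a + b)

    # Average 1-based rank for each distinct value.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    r1 = sum(rank_of[x] for x in a)
    u1 = r1 - n1 * (n1 + 1) / 2

    n = n1 + n2
    ties = sum(t**3 - t for t in Counter(pooled).values())
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Invented spend per customer for a control group and a discount group.
control = [20, 22, 19, 24, 25, 21, 23, 18]
treatment = [28, 30, 27, 26, 31, 29, 33, 32]
u_stat, p_value = mann_whitney_u(control, treatment)
```

The normal approximation is only reasonable for moderately sized samples; for very small groups an exact test is the safer choice.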
- Ensure Python 3.12+ is installed.
- Clone the repository and change into it: `git clone https://github.com/mertafacan/Complete-Customer-Analytics-for-E-Commerce.git`, then `cd Complete-Customer-Analytics-for-E-Commerce`
- (Recommended) Create and activate a virtual environment: `python -m venv .venv`, then `.\.venv\Scripts\Activate.ps1` (Windows PowerShell; on macOS or Linux, use `source .venv/bin/activate` instead).
- Install the required dependencies using a modern package manager. `uv` is recommended for its speed.
  - Using uv (Recommended): `pip install -U uv` (installs uv if you haven't already), then `uv pip install -e .`
  - Using pip: `pip install -U pip`, then `pip install -e .`
- (Optional) Launch Jupyter Notebook or JupyterLab with `jupyter notebook` and navigate to the `notebooks/` directory to explore the analyses.
Mert Afacan – https://www.linkedin.com/in/mert-afacan/ – mert0afacan@gmail.com