This is a direct marketing optimization project that uses machine learning to predict customer propensity and optimize targeting strategies. The goal is to maximize revenue while adhering to constraints such as contact limitations and single-offer assignments.
├── data/
│ ├── DataScientist_CaseStudy_Dataset.csv # The original RAW dataset
│ ├── X.csv # Processed features dataset
│ ├── y.csv # Processed target labels dataset
│
├── lib/
│ ├── train_functions.py # Functions for model training and evaluation
│ ├── optim_utils.py # Reusable functions used in the Optimization step
│ ├── optim_functions.py # Objective function used in the Optimization step
│
├── output/
│ ├── top_15_pct_by_revenue.csv # CSV containing the final top 15% marketing recommendations
│ ├── top_prop_*.csv # CSV containing the final predictions of the 3 propensity models trained
│
├── train_params/
│ ├── *.json # JSON files of the corresponding best hyperparameters generated
│
├── part-1-eda.ipynb # Exploratory Data Analysis (EDA), sanitation, and preprocessing
├── part-2-data-modelling.ipynb # Model training, benchmarking, and revenue optimization
├── executive-summary.pdf # 2-page executive summary
├── requirements.txt # List of dependencies
├── README.md # Project documentation
- Conducted data cleaning and preprocessing.
- Handled missing values, outliers, and redundant features, and performed feature engineering.
- Generated `X.csv` and `y.csv` as inputs for model training.
- Built propensity models for:
- Consumer Loan
- Credit Card
- Mutual Fund
- Used LightGBM (LGBM) as the primary model, as I believe it works well for imbalanced datasets.
- Used Optuna for hyperparameter tuning to optimize model performance.
- Benchmarked 3 different model configurations to select the best-performing ones.
- Applied the best-performing models to estimate customer likelihood of conversion.
- Used the expected value formula to estimate expected revenue.
- Selected the top 15% of clients, each with the product offer that maximizes expected revenue.
- Produced the final targeting list as `top_15_pct_by_revenue.csv`.
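The selection step can be illustrated with a minimal sketch. It assumes that expected revenue is the predicted conversion probability multiplied by the revenue earned on conversion, which may differ from the exact calculation in `part-2-data-modelling.ipynb`. The `Offer` class, the `select_targets` function, and all numbers below are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    client_id: int
    product: str
    propensity: float  # predicted probability of conversion (assumed input)
    revenue: float     # assumed revenue if the client converts

    @property
    def expected_revenue(self) -> float:
        # Expected value: probability of the outcome times its payoff.
        return self.propensity * self.revenue


def select_targets(offers, contact_pct=15):
    """Keep one best offer per client, then the top contact_pct% of clients."""
    best = {}
    for offer in offers:
        current = best.get(offer.client_id)
        if current is None or offer.expected_revenue > current.expected_revenue:
            best[offer.client_id] = offer  # single-offer constraint
    ranked = sorted(best.values(), key=lambda o: o.expected_revenue, reverse=True)
    n_contacts = len(ranked) * contact_pct // 100  # contact limit
    return ranked[:n_contacts]


# Made-up data: 20 clients, three candidate offers each.
offers = []
for cid in range(20):
    offers.append(Offer(cid, "consumer_loan", 0.10, 10.0 + cid))
    offers.append(Offer(cid, "credit_card", 0.20, 4.0))
    offers.append(Offer(cid, "mutual_fund", 0.05, 10.0))

targets = select_targets(offers, contact_pct=15)
for offer in targets:
    print(offer.client_id, offer.product, round(offer.expected_revenue, 2))
```

Choosing the best offer per client before ranking satisfies the single-offer constraint, and truncating the ranked list enforces the contact limit.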
Ensure you have Python installed. Clone this repository and install dependencies:
pip install -r requirements.txt
- Data Exploration & Preprocessing:
  - Run `part-1-eda.ipynb` to clean and preprocess data.
  - Outputs: `X.csv` and `y.csv` in the `data/` folder.
- Model Training & Evaluation:
  - Run `part-2-data-modelling.ipynb` to train models and generate predictions.
  - Outputs:
    - Final targeting strategy in `output/top_15_pct_by_revenue.csv`
- The best hyperparameters are stored in `train_params/*.json`.
- To retrain using these parameters, load them in `part-2-data-modelling.ipynb` using the `load_params()` function in `lib.optim_utils`:
from lib.optim_utils import load_params
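The signature of `load_params()` is not documented here, so the sketch below does not call it. It is a hypothetical stand-in, `read_params_dir`, that uses only the standard library to show roughly what reading every JSON file in a folder such as `train_params/` could look like. The file name and hyperparameter values in the demo are invented.

```python
import json
import tempfile
from pathlib import Path


def read_params_dir(params_dir):
    """Return {file stem: hyperparameter dict} for each JSON file in params_dir."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(params_dir).glob("*.json"))
    }


# Demo on a temporary folder; the file name and values are invented.
with tempfile.TemporaryDirectory() as tmp:
    example = {"num_leaves": 31, "learning_rate": 0.05}
    Path(tmp, "example_model.json").write_text(json.dumps(example), encoding="utf-8")
    demo_params = read_params_dir(tmp)

print(demo_params)
```

In the repository itself, the equivalent call would presumably point at `train_params/`, with each returned dictionary passed to the matching model.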
For any questions or clarifications, feel free to reach out:
- Email: reinbugnot@gmail.com
- LinkedIn: Rein Bugnot