
Commit 3a75776

Causal demo working
1 parent 90fa501 commit 3a75776

10 files changed (+1894, -0 lines)

ANALYSIS_SAMMPLE.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
## Causal Diagram

![DAG](causal_diagram.png "Causal Diagram")

## Data Analysis

![Data Analysis](correlation_plots_with_seasonality.png "Data Analysis")

## Monthly Trends

![Monthly Trends](monthly_trends_all_indicators.png "Monthly Trends")

## Sample Output
```
{
1: month            0.968259
   v_holiday        0.010343
   v_email          0.004659
   v_influencer     0.003849
   v_launch         0.003722
   v_campaign       0.002841
   v_customer       0.001837
   v_awareness      0.001256
   v_promotion      0.001195
   v_boost          0.001068
   v_ads            0.000972
   v_standard       0.000000
   v_special        0.000000
   v_retention      0.000000
   v_social         0.000000
   v_season         0.000000
   v_school         0.000000
   v_sale           0.000000
   v_newsletter     0.000000
   v_program        0.000000
   v_product        0.000000
   v_partnership    0.000000
   v_media          0.000000
   v_display        0.000000
   v_brand          0.000000
   v_summer         0.000000
   dtype: float64
}
```
## What does it mean?

This output shows feature importance from the text-aware model in `causalBertv2.py` (`model_with_text`), a CausalNLP `CausalInferenceModel` fitted with `method='t-learner'`.

The numbers represent how much each feature contributes to the model's estimate of the causal effect of high marketing spend on sales. The values are normalized to sum to 1 (100%), so each one is that feature's proportional contribution.

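To make the "sums to 1" reading concrete, here is a minimal sketch that re-enters the non-zero importances from the sample output by hand and converts them to percentages. The `importances` dict is typed in for illustration; it is not produced by calling the model.

```python
# Non-zero importances copied by hand from the sample output
# (all remaining v_ features are 0.000000 and do not change the total).
importances = {
    "month": 0.968259,
    "v_holiday": 0.010343,
    "v_email": 0.004659,
    "v_influencer": 0.003849,
    "v_launch": 0.003722,
    "v_campaign": 0.002841,
    "v_customer": 0.001837,
    "v_awareness": 0.001256,
    "v_promotion": 0.001195,
    "v_boost": 0.001068,
    "v_ads": 0.000972,
}

total = sum(importances.values())

# Express each importance as a share of the total, in percent.
percentages = {
    name: round(100 * value / total, 2) for name, value in importances.items()
}

for name, pct in percentages.items():
    print(f"{name:<14}{pct:6.2f}%")
print(f"total importance: {total:.6f}")
```

The total differs from 1 only by rounding in the printed values, which is why the percentages quoted below line up with the raw numbers.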
### Key insights from this output:

* **Month dominance (96.83%):** `month` is overwhelmingly the most important feature, accounting for about 96.8% of the total importance.
    * This is consistent with our understanding that seasonality (represented by `month`) is the primary factor behind sales.
    * This makes sense for our simulated data, where we explicitly built in seasonality as a strong confounding variable.
* **Campaign type importance:** The features prefixed with `v_` are derived from the text of the campaign descriptions.
    * The model breaks the campaign descriptions into individual words/terms and assesses the importance of each.
    * `v_holiday` has the second-highest importance (1.03%), suggesting holiday-themed campaigns have some effect beyond the month/season alone.
    * `v_email` (0.47%), `v_influencer` (0.38%), and `v_launch` (0.37%) show modest importance.
* **Zero-importance features:** Many campaign-related features show zero importance (`v_standard`, `v_special`, `v_retention`, etc.).
    * This suggests these campaign elements have no measurable effect on sales once month/seasonality is controlled for.
    * The model finds that these aspects of campaigns don't meaningfully contribute to the causal relationship.

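As a rough illustration of where the `v_` features come from, the sketch below turns a few made-up campaign descriptions into binary term columns prefixed with `v_`. The descriptions, the `term_features` helper, and the tokenization rule are all assumptions for illustration; the library's own vectorizer (stop-word handling, frequency cut-offs) may behave differently.

```python
import re


def term_features(descriptions):
    """Return (vocabulary, rows) with one binary `v_<term>` column per distinct word."""
    # Lowercase each description and keep only alphabetic tokens.
    tokenized = [set(re.findall(r"[a-z]+", text.lower())) for text in descriptions]
    vocabulary = sorted(set().union(*tokenized))
    rows = [
        {f"v_{term}": int(term in tokens) for term in vocabulary}
        for tokens in tokenized
    ]
    return vocabulary, rows


# Hypothetical campaign descriptions, not taken from the real dataset.
descriptions = [
    "Holiday email campaign",
    "Influencer product launch",
    "Standard display ads",
]

vocabulary, rows = term_features(descriptions)
print(vocabulary)
print(rows[0])
```

Each row then has a 1 for every term that appears in its description and a 0 otherwise, which is the kind of column the importance table ranks alongside `month`.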
### What this tells us for your presentation:

* **The power of causal analysis:** This shows how causal analysis can reveal that what appears to be a strong marketing effect (in correlation analysis) is actually largely explained by seasonality.
* **Insight for marketing strategy:** You could demonstrate that while high marketing spend does have *some* causal effect, the bulk of sales variation comes from seasonality. A more effective strategy would likely focus marketing dollars on certain campaign types during key seasonal periods.
* **Specific campaign elements matter:** You could highlight that holiday-themed messaging, email campaigns, and influencer partnerships show measurable effects while other campaign elements don't.
* **Practical takeaway:** For your Google Next demonstration, this presents a compelling story: *"Traditional analytics might tell you to just spend more on marketing, but causal AI reveals that **when** and **how** you spend matters far more than **how much**."*

This is exactly the kind of insight that showcases the value of combining knowledge graphs and generative AI for causal reasoning: identifying the true drivers of business outcomes beyond simple correlations.
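The gap between correlation and causal effect described above can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: the effect sizes (`TRUE_EFFECT`, `SEASON_EFFECT`), the spend probabilities, and the noise level are invented for illustration and are not the parameters of the project's simulated dataset.

```python
import random
from statistics import mean

random.seed(0)

TRUE_EFFECT = 5.0     # assumed causal lift from high marketing spend
SEASON_EFFECT = 50.0  # assumed lift from being in high season

rows = []
for _ in range(20_000):
    high_season = random.random() < 0.5
    # Spend is more likely in high season, which is what creates the confounding.
    high_spend = random.random() < (0.8 if high_season else 0.2)
    sales = (
        100
        + SEASON_EFFECT * high_season
        + TRUE_EFFECT * high_spend
        + random.gauss(0, 5)
    )
    rows.append((high_season, high_spend, sales))


def diff_in_means(sample):
    """Mean sales with high spend minus mean sales without it."""
    treated = [s for _, spend, s in sample if spend]
    control = [s for _, spend, s in sample if not spend]
    return mean(treated) - mean(control)


# Naive comparison: ignores season, so it absorbs most of the seasonal lift.
naive = diff_in_means(rows)

# Adjusted comparison: estimate within each season, then weight by stratum size.
strata = ([r for r in rows if r[0]], [r for r in rows if not r[0]])
adjusted = sum(diff_in_means(s) * len(s) / len(rows) for s in strata)

print(f"naive difference: {naive:.1f}")
print(f"season-adjusted:  {adjusted:.1f}")
```

The naive difference comes out several times larger than the assumed true effect, while the season-adjusted estimate lands close to it. Stratifying on season is used here only because it is the simplest adjustment to read; it stands in for the meta-learner used in the script.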

causalBertv2.py

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
import pandas as pd
from causalnlp import CausalInferenceModel
from lightgbm import LGBMRegressor  # currently unused by the models below

# Load the data with columns for seasonality, marketing_spend, website_traffic, and sales
df = pd.read_csv('marketing_sales_daily_data.csv')

# 1. Impact of seasonality on marketing spend
model_seasonality_marketing = CausalInferenceModel(
    df,
    method='t-learner',
    treatment_col='is_high_season',  # Binary treatment (0/1)
    outcome_col='marketing_spend',
    include_cols=['month', 'weekday']
)
model_seasonality_marketing.fit()

# 2. Impact of marketing spend on website traffic
model_marketing_traffic = CausalInferenceModel(
    df,
    method='t-learner',
    treatment_col='high_marketing_spend',  # Binary treatment (0/1)
    outcome_col='website_traffic',
    include_cols=['month', 'weekday', 'is_high_season']
)
model_marketing_traffic.fit()

# 3. Impact of website traffic on sales
model_traffic_sales = CausalInferenceModel(
    df,
    method='t-learner',
    treatment_col='high_website_traffic',  # Binary treatment (0/1)
    outcome_col='sales',
    include_cols=['month', 'weekday', 'is_high_season', 'high_marketing_spend']
)
model_traffic_sales.fit()

# Average Treatment Effect (ATE) over all rows
seasonality_marketing_effect = model_seasonality_marketing.estimate_ate()
print(f"Effect of high season on marketing spend: {seasonality_marketing_effect['ate']}")

# Conditional Average Treatment Effect (CATE): the same estimate restricted
# to the rows selected by the boolean mask (November and December)
holiday_effect = model_seasonality_marketing.estimate_ate(df['month'].isin([11, 12]))
print(f"Effect of high season during holidays: {holiday_effect['ate']}")

# 4. Impact of marketing spend on sales, controlling for campaign text
model_with_text = CausalInferenceModel(
    df,
    method='t-learner',
    treatment_col='high_marketing_spend',
    outcome_col='sales',
    text_col='campaign_description',  # Text data as a controlled-for variable
    include_cols=['month', 'seasonality']
)
model_with_text.fit()

# Interpret the model to see feature importance
feature_importance = model_with_text.interpret(plot=False)
print(feature_importance)  # Prints the importance of every feature, not just the top 10
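Every model in the script is fitted with `method='t-learner'`. As a rough sketch of that idea, and not CausalNLP's implementation, the code below fits one outcome model on treated rows and one on control rows, then averages the difference of their predictions. The base learner is swapped for a simple per-group mean lookup, and the helper names (`fit_group_means`, `t_learner_ate`) and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_group_means(rows):
    """Toy base learner: predict the mean outcome seen for each covariate value."""
    by_x = defaultdict(list)
    for x, y in rows:
        by_x[x].append(y)
    overall = mean(y for _, y in rows)  # fallback for unseen covariate values
    table = {x: mean(ys) for x, ys in by_x.items()}
    return lambda x: table.get(x, overall)


def t_learner_ate(data):
    """data: list of (covariate, treatment, outcome) tuples, treatment in {0, 1}."""
    # Fit separate outcome models for the treated and control groups.
    mu1 = fit_group_means([(x, y) for x, t, y in data if t == 1])
    mu0 = fit_group_means([(x, y) for x, t, y in data if t == 0])
    # Per-row effect estimate: predicted outcome if treated minus if not treated.
    cate = [mu1(x) - mu0(x) for x, _, _ in data]
    return mean(cate)


# Hypothetical toy rows: (month, high_marketing_spend, sales)
data = [
    (12, 1, 160.0), (12, 1, 164.0), (12, 0, 150.0),
    (6, 1, 108.0), (6, 0, 100.0), (6, 0, 104.0),
]

ate = t_learner_ate(data)
print(f"estimated ATE: {ate}")
```

Because the two models are fitted separately, the estimated effect can differ by covariate value (here, by month), which is what makes subset estimates such as the November and December one possible.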

causal_diagram.png

22.7 KB
