doitintl
diff --git a/ANALYSIS_SAMMPLE.md b/ANALYSIS_SAMMPLE.md
Lines changed: 70 additions & 0 deletions
diff --git a/causalBertv2.py b/causalBertv2.py
Lines changed: 58 additions & 0 deletions
diff --git a/causal_diagram.png b/causal_diagram.png
22.7 KB
diff --git a/correlation_plots_with_seasonality.png b/correlation_plots_with_seasonality.png
433 KB
@@ -0,0 +1,70 @@
+
+
+## Causal Diagram
+![DAG](causal_diagram.png "Causal Diagram")
+
+## Data Analysis
+![Data Analysis](correlation_plots_with_seasonality.png "Data Analysis")
+
+## Monthly Trends
+![Monthly Trends](monthly_trends_all_indicators.png "Monthly Trends")
+
+## Sample Output
+```
+{
+    1: month            0.968259
+    v_holiday        0.010343
+    v_email          0.004659
+    v_influencer     0.003849
+    v_launch         0.003722
+    v_campaign       0.002841
+    v_customer       0.001837
+    v_awareness      0.001256
+    v_promotion      0.001195
+    v_boost          0.001068
+    v_ads            0.000972
+    v_standard       0.000000
+    v_special        0.000000
+    v_retention      0.000000
+    v_social         0.000000
+    v_season         0.000000
+    v_school         0.000000
+    v_sale           0.000000
+    v_newsletter     0.000000
+    v_program        0.000000
+    v_product        0.000000
+    v_partnership    0.000000
+    v_media          0.000000
+    v_display        0.000000
+    v_brand          0.000000
+    v_summer         0.000000
+    dtype: float64
+}
+```
+## What does it mean?
+
+This output shows the feature importances reported by the text-aware model in `causalBertv2.py`: a CausalNLP `CausalInferenceModel` using the `t-learner` method, with `campaign_description` supplied as a text column.
+
+Each number is the relative importance of a feature in explaining the estimated causal effect of high marketing spend (`high_marketing_spend`) on `sales`. The values sum to 1 (100%), so each one can be read as that feature's proportional share.
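The percentages quoted in this analysis can be re-derived from the sample output. The sketch below copies the non-zero scores by hand into a plain dictionary (an assumption made for illustration; the script itself prints whatever `interpret` returns), checks that they sum to roughly 1, and converts them to percentages.

```python
# Importance scores copied from the sample output above.
# The 15 features reported as 0.000000 are omitted because they add nothing to the sum.
importances = {
    "month": 0.968259,
    "v_holiday": 0.010343,
    "v_email": 0.004659,
    "v_influencer": 0.003849,
    "v_launch": 0.003722,
    "v_campaign": 0.002841,
    "v_customer": 0.001837,
    "v_awareness": 0.001256,
    "v_promotion": 0.001195,
    "v_boost": 0.001068,
    "v_ads": 0.000972,
}

# The scores are proportions, so the total should be close to 1.
total = sum(importances.values())

# Express each score as a percentage rounded to two decimals.
percentages = {name: round(100 * score, 2) for name, score in importances.items()}

# Combined share of every text-derived (v_*) feature.
text_share = round(
    100 * sum(score for name, score in importances.items() if name.startswith("v_")), 2
)

print(f"sum of scores: {total:.6f}")
for name, pct in percentages.items():
    print(f"{name:<14}{pct:>6.2f}%")
print(f"all text-derived features combined: {text_share:.2f}%")
```

Read this way, all of the text-derived features together hold only about 3% of the total importance, which is the point the key insights build on.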
+
+### Key insights from this output:
+
+* **Month dominance (96.83%):** `month` is overwhelmingly the most important feature, accounting for about 96.8% of the total feature importance.
+    * This is consistent with our understanding that seasonality (represented by `month`) is the primary causal factor affecting sales.
+    * This makes sense for our simulated data, where we explicitly built in seasonality as a strong confounding variable.
+* **Campaign type importance:** The features prefixed with `v_` are derived from the text in the campaign descriptions.
+    * The model has broken down the campaign descriptions into individual words/terms and assessed their importance.
+    * `v_holiday` has the second-highest importance (1.03%), suggesting holiday-themed campaigns have some causal effect beyond the month/season alone.
+    * `v_email` (0.47%), `v_influencer` (0.38%), and `v_launch` (0.37%) show modest causal importance.
+* **Zero-importance features:** Fifteen of the 25 text-derived features show zero importance (`v_standard`, `v_special`, `v_retention`, etc.).
+    * This suggests these campaign elements don't have a measurable causal effect on sales after controlling for month/seasonality.
+    * The model has determined these aspects of campaigns don't meaningfully contribute to the causal relationship.
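To make the `v_` features more concrete, here is a minimal sketch of how free-text descriptions can be turned into one column per word. Only the `v_` prefix is taken from the sample output; the example descriptions, the `tokenize` helper, and the use of raw word counts are assumptions for illustration, and CausalNLP's own text preprocessing may differ.

```python
import re
from collections import Counter

# Hypothetical campaign descriptions; the real data lives in `campaign_description`.
descriptions = [
    "Holiday email campaign",
    "Influencer product launch",
    "Holiday sale promotion",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Vocabulary: every distinct word, sorted so the column order is stable.
vocabulary = sorted({word for text in descriptions for word in tokenize(text)})

# One row per description, one `v_<word>` count column per vocabulary word.
rows = []
for text in descriptions:
    counts = Counter(tokenize(text))
    rows.append({f"v_{word}": counts[word] for word in vocabulary})

print(vocabulary)
print(rows[0])
```

Each description becomes a row of word counts, so a feature such as `v_holiday` simply records whether (and how often) "holiday" appears in that campaign's text.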
+
+### What this means for your presentation:
+
+* **The power of causal analysis:** This shows how causal analysis can identify that what appears to be a strong marketing effect (in correlation analysis) is actually largely explained by seasonality.
+* **Insight for marketing strategy:** You could demonstrate that while high marketing spend does have *some* causal effect, the bulk of sales variation comes from seasonality. The most effective strategy would focus marketing dollars on certain campaign types during key seasonal periods.
+* **Specific campaign elements matter:** You could highlight that holiday-themed messaging, email campaigns, and influencer partnerships show measurable causal effects while other campaign elements don't.
+* **Practical takeaway:** For your Google Next demonstration, this presents a compelling story: *"Traditional analytics might tell you to just spend more on marketing, but causal AI reveals that **when** and **how** you spend matters far more than **how much**."*
+
+This is exactly the kind of insight that showcases the value of combining knowledge graphs and generative AI for causal reasoning - identifying the true drivers of business outcomes beyond simple correlations.
@@ -0,0 +1,58 @@
+import pandas as pd
+from causalnlp import CausalInferenceModel
+from lightgbm import LGBMRegressor
+
+# Load your data with columns for seasonality, marketing_spend, website_traffic, and sales
+df = pd.read_csv('marketing_sales_daily_data.csv')
+
+# 1. Impact of seasonality on marketing spend
+model_seasonality_marketing = CausalInferenceModel(
+    df,
+    method='t-learner',
+    treatment_col='is_high_season',  # Binary treatment (0/1) 
+    outcome_col='marketing_spend',
+    include_cols=['month', 'weekday']
+)
+model_seasonality_marketing.fit()
+
+# 2. Impact of marketing spend on website traffic
+model_marketing_traffic = CausalInferenceModel(
+    df,
+    method='t-learner',
+    treatment_col='high_marketing_spend',  # Binary treatment (0/1)
+    outcome_col='website_traffic',
+    include_cols=['month', 'weekday', 'is_high_season']
+)
+model_marketing_traffic.fit()
+
+# 3. Impact of website traffic on sales
+model_traffic_sales = CausalInferenceModel(
+    df,
+    method='t-learner',
+    treatment_col='high_website_traffic',  # Binary treatment (0/1)
+    outcome_col='sales',
+    include_cols=['month', 'weekday', 'is_high_season', 'high_marketing_spend']
+)
+model_traffic_sales.fit()
+
+# Average Treatment Effect (ATE)
+seasonality_marketing_effect = model_seasonality_marketing.estimate_ate()
+print(f"Effect of high season on marketing spend: {seasonality_marketing_effect['ate']}")
+
+# Conditional Average Treatment Effect (CATE)
+holiday_effect = model_seasonality_marketing.estimate_ate(df['month'].isin([11, 12]))
+print(f"Effect of high season during holidays: {holiday_effect['ate']}")
+
+# 4. Impact of marketing spend on sales, controlling for campaign text
+model_with_text = CausalInferenceModel(
+    df,
+    method='t-learner',
+    treatment_col='high_marketing_spend',
+    outcome_col='sales',
+    text_col='campaign_description',  # Text data as a controlled-for variable
+    include_cols=['month', 'seasonality']
+)
+model_with_text.fit()
+
+# Interpret the model to see feature importance
+feature_importance = model_with_text.interpret(plot=False)
+print(feature_importance)  # Prints the importance of every feature, not just the top 10
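For readers unfamiliar with `method='t-learner'`, the idea can be sketched without any libraries. This is not CausalNLP's implementation: the toy `rows`, the `fit_group_means` helper, and the use of per-month means in place of real regression models are all assumptions made to keep the example small. It fits one outcome model on control rows and one on treated rows, takes the difference of their predictions as the per-row effect, and averages that either over all rows (ATE) or over a boolean mask, mirroring the `estimate_ate(df['month'].isin([11, 12]))` call in the script.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (month, treated, outcome). Not the project's data.
rows = [
    (1, 0, 100.0), (1, 0, 104.0), (1, 1, 110.0), (1, 1, 114.0),
    (12, 0, 200.0), (12, 0, 204.0), (12, 1, 230.0), (12, 1, 234.0),
]


def fit_group_means(samples):
    """Fit a trivial outcome model: the mean outcome for each month."""
    by_month = defaultdict(list)
    for month, outcome in samples:
        by_month[month].append(outcome)
    return {month: mean(values) for month, values in by_month.items()}


# Step 1: fit separate outcome models on control rows and on treated rows.
mu0 = fit_group_means((m, y) for m, t, y in rows if t == 0)
mu1 = fit_group_means((m, y) for m, t, y in rows if t == 1)

# Step 2: the per-row effect is the difference between the two models' predictions.
cate = [mu1[m] - mu0[m] for m, _, _ in rows]

# Step 3: the ATE averages the effect over all rows; a mask restricts it to a subgroup.
ate = mean(cate)
holiday_mask = [m in (11, 12) for m, _, _ in rows]
holiday_ate = mean(effect for effect, keep in zip(cate, holiday_mask) if keep)

print(f"ATE: {ate}, holiday ATE: {holiday_ate}")
```

In this toy data the effect is larger in December than in January, so the masked estimate differs from the overall ATE, which is the kind of variation the conditional estimate in the script is meant to surface.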