Cohort model for go-to-market analytics. It builds monthly acquisition cohorts, computes cumulative LTV, CAC, and LTV:CAC, and exports a stakeholder-ready Excel workbook plus a preview chart.
# 1) Create and activate a virtual environment
python3 -m venv .venv && source .venv/bin/activate
# 2) Install dependencies
pip install -r requirements.txt
pip install pytest
pytest -q
# (or) pip install duckdb pandas numpy matplotlib xlsxwriter openpyxl
# 3) Generate Excel + PNG from the sample CSVs
python -m src.cohort_cli --orders data/orders_sample.csv --spend data/marketing_spend.csv --out reports/cac_ltv_cohorts.xlsx
- Download the Excel workbook
- Preview chart:
Turn raw transactions and marketing spend into a decision-ready view of unit economics. This is the kind of concise, reproducible analysis teams use in Strategy & Operations, GTM Ops, Growth, and Product Analytics to answer:
- Are new cohorts paying back faster or slower than prior cohorts?
- How does LTV compare to CAC by cohort month?
- Where should we shift budget or revisit pricing/retention?
Hypothetically advising a mid-market e-commerce retailer that invested across paid search, paid social, affiliate, email, and organic during early 2023. The leadership team wants a quick, reproducible view of unit economics by cohort month in order to plan next-quarter spend.
Business question
- Are newly acquired cohorts paying back in a healthy timeframe, and is LTV:CAC trending up or down across months?
What the sample represents
orders_sample.csv
: synthetic order history where the first purchase defines the customer’s cohort month.marketing_spend.csv
: synthetic monthly media spend by channel for the same period.
How to read the output
- LTV_cumulative shows value per cohort accumulating by month since first purchase.
- Summary lists the latest LTV alongside CAC for each cohort, plus the LTV:CAC ratio.
Decisions this enables
- Scale or pause acquisition by month based on LTV:CAC and time-to-breakeven.
- Investigate cohorts with weak payback to check promo mix, funnel quality, or retention mechanics.
- Set targets for next tests, for example “raise month-2 LTV by +10 percent.”
Assumptions and scope
- CAC is computed as total spend in the cohort month divided by new customers that month (channel attribution can be added later).
- LTV uses revenue, not margin (a margin column can be added to shift to contribution LTV and explicit payback months).
- Data is synthetic for demonstration and runs fast on a laptop.
Small, self-contained CSVs so anyone can run the project:
-
data/orders_sample.csv Columns: order_id, order_date, customer_id, channel, revenue Context: synthetic e-commerce orders across a few channels. The first order defines the customer’s cohort month.
-
data/marketing_spend.csv Columns: month (YYYY-MM), channel, spend Context: monthly media spend. For a robust demo, CAC is computed as total spend in cohort month / new customers that month. (Later you can refine to per-channel attribution.)
PS: These samples are intentionally small for fast, transparent runs
- Build a lightweight DuckDB data mart from the two CSVs (SQL is executed from Python).
- Derive each customer’s first_order_date → cohort month.
- Aggregate revenue by cohort_month × months_since_cohort.
- Compute LTV as cumulative revenue per new customer over time.
- Compute CAC as total spend in the cohort month divided by new customers that month.
- Produce a Summary with LTV_latest, CAC, and LTV:CAC per cohort.
- LTV_cumulative: rows are cohort months, columns are months since cohort. Values are cumulative LTV. Read across a row to see payback progress.
- Summary: per cohort month — latest cumulative LTV, CAC, and the LTV:CAC ratio.
- Bars = LTV (latest) by cohort.
- Line = CAC by cohort.
- Quick visual to spot cohorts that don’t reach payback or have declining quality.
- LTV:CAC > 1 is a baseline; many teams target higher multiples depending on margin and payback policy.
- Watch time-to-breakeven by cohort. Faster breakeven implies healthier acquisition or stronger retention/ARPU.
- Budget planning: compare cohorts to decide where to scale, pause, or rebalance spend.
- GTM performance: track how acquisition quality shifts with seasonality or channel mix.
- Product & pricing: changes that improve retention/ARPU lift the LTV curve; the model makes that visible.
- Executive reporting: drop the Summary sheet and PNG into a deck with a short narrative.
data/ # CSV inputs (orders, marketing spend)
assets/ # preview image for README
reports/ # Excel output (stakeholder workbook)
notebooks/ # optional exploratory notebook(s)
src/ # one-command CLI to regenerate artifacts
└── cohort_cli.py
requirements.txt
README.md
python -m src.cohort_cli \
--orders data/orders_sample.csv \
--spend data/marketing_spend.csv \
--out reports/cac_ltv_cohorts.xlsx
- orders path to orders CSV (order_id, order_date, customer_id, channel, revenue)
- spend path to spend CSV (month YYYY-MM, channel, spend)
- out Excel path to write (created or overwritten)
- Excel is open: close the workbook before rerunning (file lock).
- Header mismatch: align column names and date formats to the samples.
- Attribution depth: current CAC uses total spend per cohort month; extend with per-channel attribution if you want finer granularity.
- Margin vs revenue: add a cost column to compute contribution LTV and explicit payback periods.