GTM Cohort CAC/LTV (Python + DuckDB SQL)

Cohort model for go-to-market analytics. It builds monthly acquisition cohorts, computes cumulative LTV, CAC, and LTV:CAC, and exports a stakeholder-ready Excel workbook plus a preview chart.

Quickstart

# 1) Create and activate a virtual environment
python3 -m venv .venv && source .venv/bin/activate

# 2) Install dependencies
pip install -r requirements.txt
pip install pytest
pytest -q
# (or) pip install duckdb pandas numpy matplotlib xlsxwriter openpyxl

# 3) Generate Excel + PNG from the sample CSVs
python -m src.cohort_cli --orders data/orders_sample.csv --spend data/marketing_spend.csv --out reports/cac_ltv_cohorts.xlsx

Artifacts created:

Download the Excel workbook
Preview chart:

Summary from SQL (CSV)

What this repo is for

Turn raw transactions and marketing spend into a decision-ready view of unit economics. This is the kind of concise, reproducible analysis teams use in Strategy & Operations, GTM Ops, Growth, and Product Analytics to answer:

Are new cohorts paying back faster or slower than prior cohorts?
How does LTV compare to CAC by cohort month?
Where should we shift budget or revisit pricing/retention?

Situational context

Hypothetically advising a mid-market e-commerce retailer that invested across paid search, paid social, affiliate, email, and organic during early 2023. The leadership team wants a quick, reproducible view of unit economics by cohort month in order to plan next-quarter spend.

Business question

Are newly acquired cohorts paying back in a healthy timeframe, and is LTV:CAC trending up or down across months?

What the sample represents

orders_sample.csv: synthetic order history where the first purchase defines the customer’s cohort month.
marketing_spend.csv: synthetic monthly media spend by channel for the same period.

How to read the output

LTV_cumulative shows value per cohort accumulating by month since first purchase.
Summary lists the latest LTV alongside CAC for each cohort, plus the LTV:CAC ratio.

Decisions this enables

Scale or pause acquisition by month based on LTV:CAC and time-to-breakeven.
Investigate cohorts with weak payback to check promo mix, funnel quality, or retention mechanics.
Set targets for next tests, for example “raise month-2 LTV by +10 percent.”

Assumptions and scope

CAC is computed as total spend in the cohort month divided by new customers that month (channel attribution can be added later).
LTV uses revenue, not margin (a margin column can be added to shift to contribution LTV and explicit payback months).
Data is synthetic for demonstration and runs fast on a laptop.

Inputs and sample data (included)

Small, self-contained CSVs so anyone can run the project:

data/orders_sample.csv Columns: order_id, order_date, customer_id, channel, revenue Context: synthetic e-commerce orders across a few channels. The first order defines the customer’s cohort month.
data/marketing_spend.csv Columns: month (YYYY-MM), channel, spend Context: monthly media spend. For a robust demo, CAC is computed as total spend in cohort month / new customers that month. (Later you can refine to per-channel attribution.)

PS: These samples are intentionally small for fast, transparent runs

Methods (concise)

Build a lightweight DuckDB data mart from the two CSVs (SQL is executed from Python).
Derive each customer’s first_order_date → cohort month.
Aggregate revenue by cohort_month × months_since_cohort.
Compute LTV as cumulative revenue per new customer over time.
Compute CAC as total spend in the cohort month divided by new customers that month.
Produce a Summary with LTV_latest, CAC, and LTV:CAC per cohort.

How to read the outputs

Excel workbook (reports/cac_ltv_cohorts.xlsx)

LTV_cumulative: rows are cohort months, columns are months since cohort. Values are cumulative LTV. Read across a row to see payback progress.
Summary: per cohort month — latest cumulative LTV, CAC, and the LTV:CAC ratio.

PNG chart (ltv_vs_cac.png)

Bars = LTV (latest) by cohort.
Line = CAC by cohort.
Quick visual to spot cohorts that don’t reach payback or have declining quality.

Rules of thumb

LTV:CAC > 1 is a baseline; many teams target higher multiples depending on margin and payback policy.
Watch time-to-breakeven by cohort. Faster breakeven implies healthier acquisition or stronger retention/ARPU.

Practical applications (for GTM & Strategy roles)

Budget planning: compare cohorts to decide where to scale, pause, or rebalance spend.
GTM performance: track how acquisition quality shifts with seasonality or channel mix.
Product & pricing: changes that improve retention/ARPU lift the LTV curve; the model makes that visible.
Executive reporting: drop the Summary sheet and PNG into a deck with a short narrative.

Repository structure

data/            # CSV inputs (orders, marketing spend)
assets/          # preview image for README
reports/         # Excel output (stakeholder workbook)
notebooks/       # optional exploratory notebook(s)
src/             # one-command CLI to regenerate artifacts
  └── cohort_cli.py
requirements.txt
README.md

Parameters

python -m src.cohort_cli \
  --orders data/orders_sample.csv \
  --spend  data/marketing_spend.csv \
  --out    reports/cac_ltv_cohorts.xlsx

orders path to orders CSV (order_id, order_date, customer_id, channel, revenue)
spend path to spend CSV (month YYYY-MM, channel, spend)
out Excel path to write (created or overwritten)

Troubleshooting

Excel is open: close the workbook before rerunning (file lock).
Header mismatch: align column names and date formats to the samples.
Attribution depth: current CAC uses total spend per cohort month; extend with per-channel attribution if you want finer granularity.
Margin vs revenue: add a cost column to compute contribution LTV and explicit payback periods.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GTM Cohort CAC/LTV (Python + DuckDB SQL)

Quickstart

Artifacts created:

What this repo is for

Situational context

Inputs and sample data (included)

Methods (concise)

How to read the outputs

Excel workbook (reports/cac_ltv_cohorts.xlsx)

PNG chart (ltv_vs_cac.png)

Rules of thumb

Practical applications (for GTM & Strategy roles)

Repository structure

Parameters

Troubleshooting

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
assets		assets
data		data
notebooks		notebooks
reports		reports
sql		sql
src		src
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

License

SylviaZam/GTM-Cohort-CAC-LTV

Folders and files

Latest commit

History

Repository files navigation

GTM Cohort CAC/LTV (Python + DuckDB SQL)

Quickstart

Artifacts created:

What this repo is for

Situational context

Inputs and sample data (included)

Methods (concise)

How to read the outputs

Excel workbook (reports/cac_ltv_cohorts.xlsx)

PNG chart (ltv_vs_cac.png)

Rules of thumb

Practical applications (for GTM & Strategy roles)

Repository structure

Parameters

Troubleshooting

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages