For leaders
Quick start: deploying cost per accepted change in 90 days
A sequenced playbook for engineering and finance leaders. Pilot on one team, ship a defensible number to your CFO in 90 days, and have a real trend line by quarter two. The order matters more than the perfection of any single step.
Before you start
Three things make or break this pilot. None of them are about the formula.
- Pick one team, not the whole org. A 5–15 engineer team you can instrument quickly. Resist the "let's roll this out everywhere" urge — it will sink the pilot in accounting debates.
- Get a finance partner. One person who can value engineering time and reconcile model spend. Without them, the numerator will be hand-waved and the result will not survive a CFO conversation.
- Name a single owner. One person responsible for the measurement. Distributing the work distributes the failure.
Week 1 — Decide
Pick the team
Choose a team that (a) has been using AI tooling for at least one full quarter, (b) ships to production regularly, and (c) has a manager who will champion the measurement. The first pilot is as much a political case study as a numerical one — a team with friction will produce a measurement no one trusts.
Pick the window
| Team size | Recommended window |
|---|---|
| 3–10 engineers | Quarterly |
| 10–30 engineers | Monthly |
| 30+ engineers | Monthly, with quarterly executive review |
The constraint is statistical: cost per accepted change stabilizes around 25–100 accepted change units. Smaller teams need longer windows to hit that count. See the guide on how to use the metric for the variance reasoning.
Name the owner and the sponsor
Owner: an engineering manager, platform lead, or DevEx engineer who will gather the inputs each window. Sponsor: a VP or director whose budget conversations will be reshaped by the result. Without a sponsor, the number gets computed and ignored.
Decide what "accepted change" means in your repo
The default is a merged commit to your production branch, size-normalized via the 500-LOC rule. If you use a different convention (e.g., shipped feature flag, deployed increment), pick it now and document it. The rule must be the same every window; switching mid-pilot invalidates the trend.
Weeks 2–3 — Capture the baseline
Aim for one or two completed windows of historical data. You do not need perfect numbers; you need a defensible accounting that you can repeat without changes.
The five numerator inputs, with where to get them
| Input | Source | Stumbling block |
|---|---|---|
| Model cost | LLM provider billing dashboard, segmented by team if possible. git-ai can attribute spend to specific commits. | Aggregate billing is easy; per-team attribution often requires a one-time tagging effort. |
| Infrastructure cost | Cloud bill — observability, CI/CD, agent runners. Allocate by team's share of compute. | Avoid over-precision. A rough allocation is fine; a debated allocation kills momentum. |
| Engineering time | Use loaded hourly rates from finance × hours allocated to delivery work in the window. | Time tracking is rare. Use a fully-loaded blended rate × planned capacity hours as the starting estimate. |
| Review cost | Review hours (estimated or sampled) × reviewer hourly rate. | Often missed entirely. If you cannot measure, sample a representative week and extrapolate. |
| Rework cost | Hours spent reverting, fixing, or repairing accepted changes that did not stay in production × hourly rate. | Often invisible. Pull the revert commits and follow-up fixes from your VCS history and assign a rough hour estimate. |
The denominator: count accepted change units
List merged PRs to production in the window, filter to those that stayed (no revert or repair within the window), apply the 500-LOC normalization (a PR of 1–500 lines = 1 unit; larger = ⌈ N / 500 ⌉). See the FAQ recipe for the exact git and gh commands.
Document everything
Write down: the window dates, hourly rates used, attribution rules, exclusions (vendored, generated, lockfiles), and the size-normalization choice. This documentation is the measurement. Anyone challenging the number will challenge the accounting first.
Week 4 — Compute and present
Run the calculation
Open the calculator, enter the five cost inputs and the unit count, and copy the shareable URL. That is your first reading.
Hold the first sponsor meeting
Bring three things:
- The baseline cost per accepted change number, with the formula and the inputs.
- The reporting cadence proposal (monthly or quarterly).
- The next-window measurement plan, so the cadence starts the next day.
Do not try to interpret the number in this meeting. One window is not a trend. The point of the meeting is to get the cadence agreed and the next measurement scheduled.
Months 2–3 — Establish the cadence
Repeat the measurement
Compute again using the same accounting. The second number, paired with the first, is your first trend signal — still noisy, but real.
Pair with a leading indicator
Pick one leading indicator and report it alongside cost per accepted change every window:
- Change failure rate (DORA) — tells you whether the "stayed there" denominator is holding up.
- Acceptance rate from your AI tooling — tells you whether developers are still finding the tool useful.
- DevEx pulse — tells you whether the verification load is degrading the team.
Without a leading indicator, you will see the bottom-line number move with no way to explain why.
Set a measurement ritual
Block 60 minutes on the owner's calendar at the same time each window. Same agenda each time: pull inputs, run calc, write a one-line summary, update the time series, send to sponsor. The ritual matters more than the elegance.
End of quarter — First defensible reading
You now have 1–3 windows of data and a documented accounting. Run the first quarterly review with leadership. Present:
- The number, the trend direction, and the magnitude of the change.
- Which numerator component drove the move, if any.
- The leading indicator(s) and what they say.
- A hypothesis (not a verdict) and a concrete next-quarter focus.
If sponsors ask for an absolute target, decline. Targets invite gaming; trends invite investigation. The right ask is: "Let's hold the accounting steady for two more quarters and see what the trend says."
Common stumbling blocks
"Engineering time is too hard to estimate"
Use a blended fully-loaded rate × planned capacity. Refine over time. The rate's absolute value matters less than its consistency across windows.
"Our model cost is hard to attribute"
Start with aggregate model spend allocated by team head-count or by AI-touched PR count. Tools like git-ai get you to per-commit attribution over time. Do not block the first measurement on perfect attribution.
"We don't know how to count rework"
Pull revert commits and follow-up fixes from the window's git history. Assign each a rough hour estimate. The discipline of this exercise often reveals more about your delivery system than the final number does.
"The first number looks bad"
Expected. The point of a baseline is to make the invisible visible, not to look good. Most teams' first cost per accepted change reading is higher than they expected because the rework and review components were not previously priced. That is the discovery; that is the value.
"Our window's accepted-change count is too low"
Use a longer window. A 5-engineer team should report quarterly, not monthly. See the variance discussion on the how-to-use page.
"Should I exclude AI-generated rework?"
No. The point of cost per accepted change is to surface the full cost, including the cost of fixing AI-generated mistakes. Excluding rework on the grounds that "the AI did its job" defeats the metric entirely.
What 90 days of success looks like
- You have a documented accounting that your finance partner has signed off on.
- You have at least two windows of cost per accepted change measured the same way.
- You have one leading indicator reported alongside.
- You have a recurring sponsor review on the calendar.
- You have a defensible number to put in front of a CFO without a 20-minute briefing.
- You have an opinion on which numerator component to focus on next.
You do not need a great number at 90 days. You need an honest one, a repeatable measurement process, and a sponsor who is paying attention. The trend is what matters; the trend takes 2–4 quarters to earn.
What comes next
- Quarter 2: Expand to a second team. Hold the accounting constant.
- Quarter 3: Roll up to an org-level number. Begin reporting to the board.
- Quarter 4: Begin tying tooling investment decisions to cost-per-accepted-change movement.
By the end of year one, you should be making concrete decisions — tooling renewals, headcount, agent scope — informed by the trend. That is what successful use of this metric looks like.
Get the templates → Open the calculator →
Questions or stuck on a step? Open an issue at the repo. The most useful additions to this playbook come from teams sharing what worked and what blocked them.