AI FinOps — cost per accepted change & cost per accepted outcome

AI FinOps

Treat AI spend like any other unit cost: measure the dollars it takes to produce software you keep and to run agents whose work sticks — then govern the trend. That is AI FinOps. It is FinOps cost-to-serve discipline, moved upstream from the cost of running software to the cost of producing trusted software. Two metrics anchor it.

For development

Cost per accepted change

The fully-loaded cost of producing software that reached production and stayed there, divided by the changes that did. The build-side metric — defined below.

For agents

Cost per accepted outcome

The runtime sibling: the cost of agent work that was accepted and stayed accepted — including the downstream consequences of the outcomes that failed.

Cost per accepted change

The fully-loaded cost of producing software that reaches production and stays there, divided by the number of changes that do.

The clause that matters

An accepted change is one that reached production and stayed there for a defined survival window, by default 30 days post-merge. A change that is reverted, repaired, or feature-flagged off within that window is not counted in the denominator, but the cost incurred to produce it still counts in the numerator. (Teams may use 14 days for a faster cadence or 60–90 days for higher-confidence acceptance; see the FAQ.)
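The survival-window rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `is_accepted`, the `disrupted_at` and `as_of` parameters, the "still pending" result, and the choice to treat the window boundary as exclusive are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default from the definition above; 14 or 60-90 days are the stated alternatives.
SURVIVAL_WINDOW_DAYS = 30


def is_accepted(merged_at: datetime,
                disrupted_at: Optional[datetime],
                as_of: datetime,
                window_days: int = SURVIVAL_WINDOW_DAYS) -> Optional[bool]:
    """Classify one change against the survival window.

    disrupted_at: first time the change was reverted, repaired, or
                  feature-flagged off (None if that never happened).
    Returns False if it was disrupted inside the window, True if it
    survived the whole window, and None while the window is still open
    (the pending state is an assumption of this sketch).
    """
    deadline = merged_at + timedelta(days=window_days)
    # Assumption: a disruption exactly at the deadline counts as survived.
    if disrupted_at is not None and disrupted_at < deadline:
        return False
    if as_of < deadline:
        return None  # too early to call
    return True
```

A change classified as `False` drops out of the denominator, while whatever was spent producing and repairing it stays in the numerator.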

Why this metric

AI did not make software delivery faster. It made code generation faster. The distance between the two — between what organizations expect from AI and what they actually deliver — is the delivery gap.

Token cost, pull-request count, and "AI code share" are activity metrics — they count what was produced, not what was kept. DORA's change failure rate is the closest existing signal, and it captures something real: that some merges fail. But it counts incidents, not the dollar cost of the rework those incidents demand — and post-AI, rework cost has become the dominant hidden line. SPACE's five dimensions are the gold standard for diagnosing developer experience, but they were never designed to roll up to a CFO-facing number.

Cost per accepted change is the outcome metric: the dollar cost of producing software your team actually kept. It is the number to report to the board, and it sits one layer above the diagnostics rather than replacing them.

It is FinOps cost-to-serve moved one layer upstream — the cost to produce delivered software, not the cost to run it.

The formula, expanded

The numerator is a fully-loaded production cost over a measurement window (typically a sprint, a month, or a quarter):

Component	What it captures
Model cost	LLM, API, and inference spend attributable to the changes produced.
Infrastructure cost	Compute, storage, observability, and tooling overhead for the production loop.
Engineering time	Time spent specifying, prompting, integrating, and steering AI work — converted to currency.
Review cost	Time spent reviewing, validating, and gating AI-generated work — converted to currency.
Rework cost	Cost of fixing, reverting, or repairing changes that failed the "stayed there" test.

The denominator is the count of accepted change units: changes that reached production and stayed there during the same window, size-normalized so one unit represents a comparable amount of substantive work. A PR of 1–500 lines counts as 1 unit; a larger PR of N lines counts as ⌈ N / 500 ⌉ units. See the FAQ for the rule in detail.
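The size-normalization rule is a ceiling division, which can be sketched as follows. The function names and the decision to reject a zero-line change are assumptions for illustration; how "lines" are counted (added, deleted, or both) is left to the FAQ.

```python
UNIT_SIZE_LINES = 500  # one accepted change unit per started block of 500 lines


def change_units(lines_changed: int) -> int:
    """Return ceil(lines_changed / 500) using integer arithmetic."""
    if lines_changed < 1:
        # Assumption: the rule starts at 1 line, so an empty change is rejected.
        raise ValueError("a change must touch at least one line")
    return (lines_changed + UNIT_SIZE_LINES - 1) // UNIT_SIZE_LINES


def accepted_change_units(pr_sizes: list[int]) -> int:
    """Sum the units over the PRs that survived the window."""
    return sum(change_units(n) for n in pr_sizes)


# Hypothetical PR sizes: 120 and 500 lines are 1 unit each,
# 501 lines is 2 units, 1,800 lines is 4 units.
print(accepted_change_units([120, 500, 501, 1800]))  # 8
```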

A worked example

A 10-engineer team measures a four-week window. Loaded engineering time is $150/hour.

Total cost	$28,000
Model cost (LLM API spend)	$1,200
Infrastructure cost	$400
Engineering time (120 hours × $150)	$18,000
Review cost (40 hours × $150)	$6,000
Rework cost (16 hours × $150)	$2,400
Merged PRs that stayed in production	36
After 500-LOC normalization (one PR was 1,800 lines = 4 units; three were 600–1,000 lines = 2 units each)	42 accepted change units
Cost per accepted change	$666.67

That number, reported alongside the volume of activity, is what reveals whether AI is paying for itself. A team shipping 200 PRs but accepting only 42 units has a very different result than a team shipping 50 PRs and accepting 42 units — and the difference is invisible to traditional velocity metrics.
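The arithmetic of the worked example can be checked with a short script. This is a sketch under stated assumptions: the `WindowCosts` class and its field names are hypothetical, and review and rework are priced as hours at the loaded rate because that is how the example states them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCosts:
    """Numerator components for one measurement window (hypothetical structure)."""
    model: float               # LLM / API spend
    infrastructure: float      # compute, storage, tooling
    engineering_hours: float
    review_hours: float
    rework_hours: float
    loaded_hourly_rate: float

    def total(self) -> float:
        hours = self.engineering_hours + self.review_hours + self.rework_hours
        return self.model + self.infrastructure + hours * self.loaded_hourly_rate


def cost_per_accepted_change(costs: WindowCosts, accepted_units: int) -> float:
    if accepted_units <= 0:
        raise ValueError("no accepted change units in this window")
    return costs.total() / accepted_units


# Figures from the worked example above.
example = WindowCosts(model=1200, infrastructure=400,
                      engineering_hours=120, review_hours=40,
                      rework_hours=16, loaded_hourly_rate=150)

print(example.total())                                  # 28000
print(round(cost_per_accepted_change(example, 42), 2))  # 666.67
```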


What this metric is designed to catch

Reviewer load shifted onto your most expensive engineers. If senior engineers are absorbing the verification cost of AI generation, cost per accepted change sees it. PR count does not.
Silent rework. A change that is "merged" but quietly fixed three days later fails the "stayed there" test. It is excluded from the denominator, and the cost of the fix is counted in the numerator.
A false economy. Trimming model spend while rework quietly rises isn't really a saving — and it's an easy trap to fall into, because the model bill is the number you see first. Cost per accepted change keeps the whole trade-off in view, so the saving is real when you make it.
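The false-economy case can be made concrete with two windows side by side. All figures for the second window are hypothetical and chosen only to illustrate the effect: model spend is halved, rework rises from 16 to 40 hours, and four fewer units survive. The first window reuses the worked example's numbers.

```python
def cost_per_accepted_change(model, infra, eng_hours, review_hours,
                             rework_hours, rate, accepted_units):
    """Fully-loaded window cost divided by accepted change units."""
    total = model + infra + (eng_hours + review_hours + rework_hours) * rate
    return total / accepted_units


# Window 1: the worked example.
before = cost_per_accepted_change(model=1200, infra=400, eng_hours=120,
                                  review_hours=40, rework_hours=16,
                                  rate=150, accepted_units=42)

# Window 2 (hypothetical): a cheaper model saves $600,
# but rework grows by 24 hours and fewer changes survive.
after = cost_per_accepted_change(model=600, infra=400, eng_hours=120,
                                 review_hours=40, rework_hours=40,
                                 rate=150, accepted_units=38)

print(round(before, 2))  # 666.67
print(round(after, 2))   # 815.79
```

In this illustration the model bill falls by $600 while the added rework costs $3,600, so the unit cost rises even though the most visible line item improved.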

What it is not

Not a vendor benchmark. It measures your team's full production cost, not the unit cost of an LLM provider.
Not a productivity score for individuals. It is an organizational outcome metric.
Not a substitute for safety, security, or correctness gates. It pairs with them; it does not replace them.

Origin

Cost is the third vertex of the Verification Triangle. Intent clarity and verification quality describe whether the team is building the right thing well; cost describes what it took to get there.

Cost per accepted change was defined in The Delivery Gap (Brenn Hill, 2026) as the cost vertex of that framework. The verification-quality corner has a home of its own in Eval-Driven Development; for the agents you run, LoopRails (oversight) and BRACE (security) complete the picture. See how to cite, the measurement page, or subscribe at brennhill.substack.com.

This page is the canonical definition. The metric is free to use, adopt, and cite. Refinements and worked examples are welcome via GitHub.