AI FinOps
Treat AI spend like any other unit cost: measure the dollars it takes to produce software you keep and to run agents whose work sticks — then govern the trend. That is AI FinOps. It is FinOps cost-to-serve discipline, moved upstream from the cost of running software to the cost of producing trusted software. Two metrics anchor it.
Cost per accepted change
The fully-loaded cost of producing software that reached production and stayed there, divided by the changes that did. The build-side metric — defined below.
Cost per accepted action →
The runtime sibling: the cost of agent work that was accepted and stayed accepted — including the downstream consequences of the actions that failed.
Cost per accepted change
The fully-loaded cost of producing software that reaches production and stays there, divided by the number of changes that do.
Why this metric
AI did not make software delivery faster. It made code generation faster. The distance between the two — between what organizations expect from AI and what they actually deliver — is the delivery gap.
Token cost, pull-request count, and "AI code share" are activity metrics — they count what was produced, not what was kept. DORA's change failure rate is the closest existing signal, and it captures something real: that some merges fail. But it counts incidents, not the dollar cost of the rework those incidents demand — and post-AI, rework cost has become the dominant hidden line. SPACE's five dimensions are the gold standard for diagnosing developer experience, but they were never designed to roll up to a CFO-facing number.
Cost per accepted change is the outcome metric: the dollar cost of producing software your team actually kept. It is the number to report to the board, and it sits one layer above the diagnostics rather than replacing them.
It is FinOps cost-to-serve moved one layer upstream — the cost to produce delivered software, not the cost to run it.
The formula, expanded
The numerator is a fully-loaded production cost over a measurement window (typically a sprint, a month, or a quarter):
| Component | What it captures |
|---|---|
| Model cost | LLM, API, and inference spend attributable to the changes produced. |
| Infrastructure cost | Compute, storage, observability, and tooling overhead for the production loop. |
| Engineering time | Time spent specifying, prompting, integrating, and steering AI work — converted to currency. |
| Review cost | Time spent reviewing, validating, and gating AI-generated work — converted to currency. |
| Rework cost | Cost of fixing, reverting, or repairing changes that failed the "stayed there" test. |
The denominator is the count of accepted change units: changes that reached production and stayed there during the same window, size-normalized so one unit represents a comparable amount of substantive work. A PR of 1–500 lines counts as 1 unit; a larger PR of N lines counts as ⌈ N / 500 ⌉ units. See the FAQ for the rule in detail.
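As a minimal sketch of the formula, assuming the five numerator components are already expressed in one currency and each accepted PR is known by its changed-line count, the calculation might look like the following. The names `WindowCosts`, `accepted_units`, and `cost_per_accepted_change` are illustrative and not part of the definition, and the figures in the usage lines are hypothetical.

```python
from dataclasses import dataclass

UNIT_SIZE_LINES = 500  # normalization threshold from the definition above


@dataclass
class WindowCosts:
    """Numerator components for one measurement window, in one currency."""
    model: float
    infrastructure: float
    engineering_time: float
    review: float
    rework: float

    def total(self) -> float:
        return (self.model + self.infrastructure + self.engineering_time
                + self.review + self.rework)


def accepted_units(lines_changed: int) -> int:
    """Units for one accepted PR: 1 for 1-500 lines, ceil(N / 500) above that."""
    if lines_changed < 1:
        raise ValueError("an accepted PR must change at least one line")
    # Integer ceiling division avoids floating-point rounding.
    return (lines_changed + UNIT_SIZE_LINES - 1) // UNIT_SIZE_LINES


def cost_per_accepted_change(costs: WindowCosts, accepted_pr_sizes: list[int]) -> float:
    """Total window cost divided by size-normalized accepted change units."""
    units = sum(accepted_units(n) for n in accepted_pr_sizes)
    if units == 0:
        raise ValueError("no accepted change units in this window")
    return costs.total() / units


# Hypothetical window: $10,000 total cost, five accepted PRs (8 units).
example_costs = WindowCosts(model=500, infrastructure=200,
                            engineering_time=6_000, review=2_000, rework=1_300)
example_sizes = [120, 500, 501, 1_200, 90]  # units: 1 + 1 + 2 + 3 + 1
print(cost_per_accepted_change(example_costs, example_sizes))  # 1250.0
```

Only PRs that passed the "stayed there" test belong in `accepted_pr_sizes`; deciding which PRs qualify happens before this calculation.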
A worked example
A 10-engineer team measures a four-week window. Loaded engineering time is $150/hour.
| Cost component | Amount |
|---|---|
| Model cost (LLM API spend) | $1,200 |
| Infrastructure cost | $400 |
| Engineering time (120 hours × $150) | $18,000 |
| Review cost (40 hours × $150) | $6,000 |
| Rework cost (16 hours × $150) | $2,400 |
| Total cost | $28,000 |

| Accepted changes | Value |
|---|---|
| Merged PRs that stayed in production | 38 |
| After 500-LOC normalization (34 PRs of up to 500 lines = 1 unit each; one PR was 1,800 lines = 4 units; three were 600–1,000 lines = 2 units each) | 44 accepted change units |
| Cost per accepted change ($28,000 ÷ 44) | $636.36 |

That number, reported alongside the volume of activity, is what reveals whether AI is paying for itself. A team shipping 200 PRs but accepting only 44 units has a very different result from a team shipping 50 PRs and accepting 44 units, and the difference is invisible to traditional velocity metrics.
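The worked example can be recomputed from its stated inputs with a short script. This is a sketch under assumptions: only the 1,800-line PR has an exact size in the example, so the three mid-sized PRs are set to 800 lines each (anywhere in 600–1,000 gives 2 units) and the remaining 34 PRs to 200 lines each (anything up to 500 gives 1 unit). The variable and function names are illustrative.

```python
HOURLY_RATE = 150  # loaded engineering cost per hour, from the example

# Numerator: the five cost components for the four-week window.
costs = {
    "model": 1_200,
    "infrastructure": 400,
    "engineering_time": 120 * HOURLY_RATE,
    "review": 40 * HOURLY_RATE,
    "rework": 16 * HOURLY_RATE,
}
total_cost = sum(costs.values())

# Denominator inputs: 38 merged PRs that stayed in production.
# Sizes other than 1,800 are assumed values inside the stated ranges.
pr_sizes = [1_800] + [800] * 3 + [200] * 34


def units_for(lines_changed: int, unit_size: int = 500) -> int:
    """Ceiling of lines_changed / unit_size, using integer arithmetic."""
    return (lines_changed + unit_size - 1) // unit_size


total_units = sum(units_for(n) for n in pr_sizes)
cost_per_unit = total_cost / total_units

print(f"total cost: ${total_cost:,}")
print(f"accepted change units: {total_units}")
print(f"cost per accepted change: ${cost_per_unit:,.2f}")
```

Keeping the PR sizes as raw inputs, rather than typing in a unit count, makes the normalization step auditable when the figure is reported upward.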
What this metric is designed to catch
- Reviewer load shifted onto your most expensive engineers. If senior engineers are absorbing the verification cost of AI generation, cost per accepted change sees it. PR count does not.
- Silent rework. A change that is "merged" but quietly fixed three days later does not stay in production. It is excluded from the denominator and the fix is counted in the numerator.
- A false economy. Trimming model spend while rework quietly rises isn't really a saving — and it's an easy trap to fall into, because the model bill is the number you see first. Cost per accepted change keeps the whole trade-off in view, so the saving is real when you make it.
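The silent-rework rule in the list above can be sketched as a small tally. Everything named here is hypothetical: the `Change` record, the `tally` helper, and especially `STAY_WINDOW`, since this page does not fix how long a change must survive to count as having stayed. A 14-day window is assumed purely for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STAY_WINDOW = timedelta(days=14)  # assumption: not specified on this page


@dataclass
class Change:
    merged_on: date
    units: int                          # size-normalized change units
    reworked_on: Optional[date] = None  # date of a follow-up fix or revert
    rework_cost: float = 0.0            # cost of that fix, in currency


def stayed(change: Change) -> bool:
    """A change stays if it was never reworked inside the stay window."""
    if change.reworked_on is None:
        return True
    return change.reworked_on - change.merged_on > STAY_WINDOW


def tally(changes: list[Change], non_rework_cost: float) -> tuple[float, int]:
    """Return (numerator, denominator) for the window."""
    # Every fix is paid for, so all rework cost goes into the numerator.
    numerator = non_rework_cost + sum(c.rework_cost for c in changes)
    # Only changes that stayed contribute units to the denominator.
    denominator = sum(c.units for c in changes if stayed(c))
    return numerator, denominator


# Hypothetical window: three merged changes, one quietly fixed 3 days later.
changes = [
    Change(merged_on=date(2026, 3, 2), units=1),
    Change(merged_on=date(2026, 3, 3), units=1),
    Change(merged_on=date(2026, 3, 4), units=1,
           reworked_on=date(2026, 3, 7), rework_cost=450.0),
]
numerator, denominator = tally(changes, non_rework_cost=5_000.0)
print(numerator / denominator)  # 2725.0
```

A PR count would report three merges here. The tally reports two accepted units and a numerator that includes the $450 fix, which is the behavior the bullet describes.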
What it is not
- Not a vendor benchmark. It measures your team's full production cost, not the unit cost of an LLM provider.
- Not a productivity score for individuals. It is an organizational outcome metric.
- Not a substitute for safety, security, or correctness gates. It pairs with them; it does not replace them.
Origin
Cost is the third vertex of the Verification Triangle. Intent clarity and verification quality describe whether the team is building the right thing well; cost describes what it took to get there.
Cost per accepted change was defined in The Delivery Gap (Brenn Hill, 2026) as the cost vertex of that framework: the dollar corner of a triangle whose other corners are intent clarity and verification quality. The verification-quality corner has a home of its own in Eval-Driven Development; for the agents you run, LoopRails (oversight) and BRACE (security) complete the picture. See how to cite, the measurement page, or subscribe at brennhill.substack.com.
This page is the canonical definition. The metric is free to use, adopt, and cite. Refinements and worked examples are welcome via GitHub.