AI FinOps

Treat AI spend like any other unit cost: measure the dollars it takes to produce software you keep and to run agents whose work sticks — then govern the trend. That is AI FinOps. It is FinOps cost-to-serve discipline, moved upstream from the cost of running software to the cost of producing trusted software. Two metrics anchor it.

For development

Cost per accepted change

The fully-loaded cost of producing software that reached production and stayed there, divided by the changes that did. The build-side metric — defined below.

For agents

Cost per accepted action

The runtime sibling: the cost of agent work that was accepted and stayed accepted — including the downstream consequences of the actions that failed.

Cost per accepted change

The fully-loaded cost of producing software that reaches production and stays there, divided by the number of changes that do.

\[
\text{Cost per accepted change} = \frac{\text{model cost} + \text{infrastructure} + \text{engineering time} + \text{review} + \text{rework}}{\text{accepted change units (reached production and stayed there)}}
\]

Measured in dollars per accepted change: the cost of delivered software that stayed delivered.

The clause that matters

An accepted change is one that reached production and stayed there for a defined survival window: by default, 30 days post-merge. A change that is reverted, repaired, or feature-flagged off within that window is not counted in the denominator; the cost incurred to produce it is counted in the numerator. (Teams may use 14 days for a faster cadence or 60–90 days for higher-confidence acceptance; see the FAQ.)
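
The survival-window rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Change` record, its field names, and the three-way result (accepted, rejected, still pending) are assumptions made for the example, as is the choice to treat an invalidation exactly at the deadline as falling inside the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Default survival window from the definition above: 30 days post-merge.
SURVIVAL_WINDOW = timedelta(days=30)


@dataclass
class Change:
    """Hypothetical record for one merged change."""
    merged_at: datetime
    # Earliest time the change was reverted, repaired, or feature-flagged off;
    # None if that never happened.
    invalidated_at: Optional[datetime] = None


def is_accepted(change: Change, now: datetime,
                window: timedelta = SURVIVAL_WINDOW) -> Optional[bool]:
    """Return True if the change survived the window, False if it was
    invalidated inside it, and None if the window is still open."""
    deadline = change.merged_at + window
    # Assumption: an invalidation exactly at the deadline counts as "within".
    if change.invalidated_at is not None and change.invalidated_at <= deadline:
        return False
    if now < deadline:
        return None  # too early to call; do not count it yet
    return True
```

Returning `None` for changes whose window is still open keeps them out of the denominator until they have had the chance to fail, which is one way to avoid overstating acceptance near the end of a measurement period.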

Why this metric

AI did not make software delivery faster. It made code generation faster. The distance between the two — between what organizations expect from AI and what they actually deliver — is the delivery gap.

Token cost, pull-request count, and "AI code share" are activity metrics — they count what was produced, not what was kept. DORA's change failure rate is the closest existing signal, and it captures something real: that some merges fail. But it counts incidents, not the dollar cost of the rework those incidents demand — and post-AI, rework cost has become the dominant hidden line. SPACE's five dimensions are the gold standard for diagnosing developer experience, but they were never designed to roll up to a CFO-facing number.

Cost per accepted change is the outcome metric: the dollar cost of producing software your team actually kept. It is the number to report to the board, and it sits one layer above the diagnostics rather than replacing them.

It is FinOps cost-to-serve moved one layer upstream — the cost to produce delivered software, not the cost to run it.

The formula, expanded

The numerator is a fully-loaded production cost over a measurement window (typically a sprint, a month, or a quarter):

| Component | What it captures |
| --- | --- |
| Model cost | LLM, API, and inference spend attributable to the changes produced. |
| Infrastructure cost | Compute, storage, observability, and tooling overhead for the production loop. |
| Engineering time | Time spent specifying, prompting, integrating, and steering AI work, converted to currency. |
| Review cost | Time spent reviewing, validating, and gating AI-generated work, converted to currency. |
| Rework cost | Cost of fixing, reverting, or repairing changes that failed the "stayed there" test. |

The denominator is the count of accepted change units: changes that reached production and stayed there during the same window, size-normalized so one unit represents a comparable amount of substantive work. A PR of 1–500 lines counts as 1 unit; a larger PR of N lines counts as ⌈ N / 500 ⌉ units. See the FAQ for the rule in detail.
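
As a rough sketch, the size-normalization rule is a ceiling division. The function and constant names below are illustrative, and rejecting PRs with fewer than one changed line is an assumption, since the rule as stated starts at 1 line.

```python
UNIT_SIZE_LOC = 500  # lines per accepted change unit, per the rule above


def change_units(lines_changed: int) -> int:
    """Return ceil(lines_changed / 500) using integer arithmetic."""
    if lines_changed < 1:
        # Assumption: the rule is only defined for PRs of at least one line.
        raise ValueError("lines_changed must be at least 1")
    return (lines_changed + UNIT_SIZE_LOC - 1) // UNIT_SIZE_LOC
```

Under this rule a 500-line PR is one unit and a 501-line PR is two, so the boundary behaviour is worth agreeing on before comparing teams.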

A worked example

A 10-engineer team measures a four-week window. Loaded engineering time is $150/hour.

| Line item | Value |
| --- | --- |
| Model cost (LLM API spend) | $1,200 |
| Infrastructure cost | $400 |
| Engineering time (120 hours × $150) | $18,000 |
| Review cost (40 hours × $150) | $6,000 |
| Rework cost (16 hours × $150) | $2,400 |
| Total cost | $28,000 |
| Merged PRs that stayed in production | 38 |
| After 500-LOC normalization (one PR was 1,800 lines = 4 units; three were 600–1,000 lines = 2 units each; the other 34 were 1 unit each) | 44 accepted change units |
| Cost per accepted change | $636.36 |

That number, reported alongside the volume of activity, is what reveals whether AI is paying for itself. A team shipping 200 PRs but accepting only 44 units has a very different result than a team shipping 50 PRs and accepting 44 units, and the difference is invisible to traditional velocity metrics.
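
The arithmetic of the worked example can be reproduced from its stated inputs with a short script. This is a sketch under assumptions: the class and function names are invented for illustration, and the individual PR sizes are hypothetical stand-ins, because the example gives only the 1,800-line PR exactly and a 600–1,000 line range for three others.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WindowCosts:
    """Hypothetical container for one measurement window's inputs."""
    model: float              # LLM API spend, in dollars
    infrastructure: float     # dollars
    engineering_hours: float
    review_hours: float
    rework_hours: float
    hourly_rate: float        # loaded engineering rate, dollars per hour

    def total(self) -> float:
        labor_hours = (self.engineering_hours + self.review_hours
                       + self.rework_hours)
        return self.model + self.infrastructure + labor_hours * self.hourly_rate


def accepted_units(pr_sizes: Iterable[int], unit_size: int = 500) -> int:
    """Sum ceil(size / unit_size) over PRs that stayed in production."""
    return sum(-(-size // unit_size) for size in pr_sizes)


def cost_per_accepted_change(costs: WindowCosts, pr_sizes: Iterable[int]) -> float:
    units = accepted_units(pr_sizes)
    if units == 0:
        raise ValueError("no accepted change units in this window")
    return costs.total() / units


costs = WindowCosts(model=1200, infrastructure=400, engineering_hours=120,
                    review_hours=40, rework_hours=16, hourly_rate=150)

# 38 surviving PRs. Sizes other than 1,800 are assumed values chosen to fall
# in the stated ranges (600-1,000 lines, or 500 lines and under).
pr_sizes = [1800] + [600, 800, 1000] + [200] * 34

print(f"total cost: ${costs.total():,.2f}")
print(f"accepted change units: {accepted_units(pr_sizes)}")
print(f"cost per accepted change: ${cost_per_accepted_change(costs, pr_sizes):,.2f}")
```

Reverted or repaired PRs are simply left out of `pr_sizes`, while the hours spent on them stay in the cost inputs, which is how failed work raises the result.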

What this metric is designed to catch

What it is not

Origin

[Figure: The Verification Triangle, with vertices intent clarity, verification quality, and cost; the cost vertex is highlighted.]

Cost is the third vertex of the Verification Triangle. Intent clarity and verification quality describe whether the team is building the right thing well; cost describes what it took to get there.

Cost per accepted change was defined in The Delivery Gap (Brenn Hill, 2026) as the cost vertex of that framework: the dollar corner of a triangle whose other corners are intent clarity and verification quality. The verification-quality corner has a home of its own in Eval-Driven Development; for the agents you run, LoopRails (oversight) and BRACE (security) complete the picture. See how to cite, the measurement page, or subscribe at brennhill.substack.com.


This page is the canonical definition. The metric is free to use, adopt, and cite. Refinements and worked examples are welcome via GitHub.