AI FinOps

Framework · For running agents

Cost per accepted action

Cost per accepted change measures the cost of producing trusted software. As teams move from AI that writes code to agents that do work in production, the same question reappears one layer over: not "what did we build and keep," but "what did the agent do, and did it stick?" Cost per accepted action is that metric — the runtime sibling, built on exactly the same bones.

[Figure: the cost per accepted action formula.] Cost per accepted action ($/AA) = (inference + tools + infra + oversight + remediation + failed runs + failure impact) ÷ accepted action units (accepted and stayed accepted). The cost of agent work that stayed done.

The one-line version

Cost per accepted change prices a change that reached production and stayed. Cost per accepted action prices an action an agent performed that was accepted and stayed accepted. It is the same discipline (denominate by kept outcomes, count the fully loaded cost, report in dollars), moved from build-time to run-time.

Why a new denominator

The instinctive way to watch an agent is by its bill: cost per token, cost per request, cost per agent-run. Those are the agent-world equivalent of "lines of code" — activity metrics that rise whether or not the work was any good. An agent that runs ten thousand times but whose actions get overridden, rolled back, or quietly re-done has produced ten thousand runs and far fewer kept outcomes. A per-token dashboard will call that cheap. It is not.

Cost per accepted action borrows the move that makes cost per accepted change honest: put kept value in the denominator. Count the actions that were accepted and stayed accepted, and let the cost of the ones that didn't fall into the numerator as remediation. The result is a single, finance-legible number that tracks the economics of trusted autonomous work, not the volume of it.

The formula, expanded

The numerator is the fully-loaded cost of running the agent over a window — and, as with cost per accepted change, the lines that matter most are the ones a token bill never shows:

| Component | What it captures |
| --- | --- |
| Inference cost | LLM tokens (input, output, cache, reasoning), including retries and multi-step loops. |
| Tool & API cost | External calls the agent makes: search, code execution, RAG / vector, paid third-party APIs. |
| Infrastructure cost | Orchestration runtime, sandboxes, memory / vector stores, observability, queues. |
| Oversight cost | The human-in-the-loop labor: approvals, reviews, the show-and-prove load. As autonomy scales, this becomes the dominant hidden line, exactly as review cost did for AI-assisted coding. |
| Remediation cost | The internal labor to clean up actions that did not stay: rollbacks, human redo, incident response. |
| Failed-run cost | Runs that produced nothing usable but still billed tokens and compute. |
| Failure impact | The downstream financial consequence of actions that failed: escalation to costlier channels, lost or delayed revenue, refunds and credits, SLA penalties, churn, compliance exposure. Distinct from remediation: remediation is what you pay to fix the failure; failure impact is the value destroyed. |

The denominator is the count of accepted action units: consequential agent actions that were accepted and stayed accepted during the window, complexity-normalized so a one-shot classification and a fifty-step autonomous workflow aren't counted as equals. (A natural normalizer is the action's risk grade — more on that below.)

"Stayed accepted" — the rework defense, for agents

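This is the clause that does the work, just like the "stayed there" clause in cost per accepted change. An action counts in the denominator only if, within a survival window, it was not overridden, rolled back or reverted, redone by a human, or handed off after a failed attempt. An action that fails this test drops out of the denominator, and the cost of cleaning it up lands in the numerator.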

That double-hit is the point, and it's inherited straight from cost per accepted change: a shortfall shows up twice — once by shrinking the denominator, once by growing the numerator — so a metric built on it can't be fooled by an agent that ships fast and reliably wrong.

Two nuances worth baking in

Approval is not an override. In a human-in-the-loop system, a reviewer approving an action is the design working, not a failure; the test is whether the action stayed without later correction. And a correct escalation is a win. An agent that recognizes it can't safely handle something and hands off to a human produced a good outcome; only failure handoffs (it tried, got it wrong, a human cleaned up) count against the denominator.
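The survival test and both nuances can be sketched as a small predicate. This is a minimal sketch, not the reference library's API: the record shape, the event names (`approved`, `correct_escalation`, `overridden`, `reverted`, `redone`, `failure_handoff`) and the seven-day default window are all assumptions made for illustration.

```javascript
// Hypothetical record shape (not from any real library):
//   { id, acceptedAt, events: [{ type, at }] }  with times in milliseconds.
// Event types that mean the action did not stay. Assumed names.
const DISQUALIFYING = new Set(["overridden", "reverted", "redone", "failure_handoff"]);

const DAY_MS = 24 * 60 * 60 * 1000;

function stayedAccepted(action, survivalWindowMs = 7 * DAY_MS) {
  // An action that was never accepted cannot count.
  if (action.acceptedAt == null) return false;
  const deadline = action.acceptedAt + survivalWindowMs;
  // "approved" and "correct_escalation" are deliberately absent from the
  // disqualifying set: approval is the design working, and a correct
  // hand-off is a good outcome.
  return !action.events.some(
    (e) => DISQUALIFYING.has(e.type) && e.at >= action.acceptedAt && e.at <= deadline
  );
}

const sample = [
  { id: "a1", acceptedAt: 0, events: [{ type: "approved", at: 0 }] },
  { id: "a2", acceptedAt: 0, events: [{ type: "reverted", at: 2 * DAY_MS }] },
  { id: "a3", acceptedAt: 0, events: [{ type: "reverted", at: 30 * DAY_MS }] },
  { id: "a4", acceptedAt: null, events: [{ type: "failure_handoff", at: 0 }] },
];

console.log(sample.filter((a) => stayedAccepted(a)).map((a) => a.id)); // [ 'a1', 'a3' ]
```

Action a3 is reverted only after the window closes, so it still counts here. How to treat late reverts is a policy choice that the window length encodes.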

A worked example

A fleet of support-and-ops agents over a one-week window. Loaded human-review time runs through the oversight line.

| Item | Value |
| --- | --- |
| Inference cost | $3,000 |
| Tool & API cost | $1,200 |
| Infrastructure cost | $800 |
| Oversight cost (human-in-the-loop) | $9,000 |
| Remediation cost (overridden / rolled-back actions) | $4,000 |
| Failed-run cost | $1,000 |
| Direct operating cost | $19,000 |
| Failure impact (escalations, refunds, lost sales) | $12,000 |
| Total fully-loaded cost | $31,000 |
| Actions attempted | 12,500 |
| Accepted action units (accepted & stayed) | 5,000 |
| Cost per accepted action | $6.20 |

The same window tells three very different stories. A per-run dashboard divides the $31,000 by all 12,500 attempts and reports a cheerful $2.48 a run. Count only what you directly pay and divide by the 5,000 actions that stuck, and you get $3.80. But the honest number includes the consequences of the actions that failed — the escalations, refunds, and lost sales — which lifts it to $6.20, where the single largest line is neither inference nor oversight but failure impact, at 39% of the bill. Nothing here says "don't run the agent." It says the real cost lives in the human load, the cleanup, and above all the downstream consequences — so that's where the next improvement is. Raise the acceptance rate and all three move in your favor at once.
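The three readings of this window are plain arithmetic, and a short script reproduces them from the figures in the worked example. The variable names are illustrative and are not the reference library's API.

```javascript
// Direct cost lines from the worked example, in dollars.
const directLines = {
  inference: 3000,
  tools: 1200,
  infrastructure: 800,
  oversight: 9000,
  remediation: 4000,
  failedRuns: 1000,
};
const failureImpact = 12000;
const actionsAttempted = 12500;
const acceptedActionUnits = 5000;

const directCost = Object.values(directLines).reduce((total, v) => total + v, 0);
const totalCost = directCost + failureImpact;

// Story 1: the per-run view divides by every attempt.
const perRun = totalCost / actionsAttempted;
// Story 2: direct spend only, divided by what stuck.
const perAcceptedDirect = directCost / acceptedActionUnits;
// Story 3: fully loaded, divided by what stuck.
const perAcceptedLoaded = totalCost / acceptedActionUnits;
// Share of the total bill taken by failure impact.
const failureImpactShare = failureImpact / totalCost;

console.log(perRun.toFixed(2));            // 2.48
console.log(perAcceptedDirect.toFixed(2)); // 3.80
console.log(perAcceptedLoaded.toFixed(2)); // 6.20
console.log(Math.round(failureImpactShare * 100) + "%"); // 39%
```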

The line most agent dashboards omit

Failure impact is the hardest component to quantify and the easiest to leave at zero, which is exactly why omitting it is dangerous. A single wrong autonomous action can trigger a refund, an SLA penalty, or a lost deal worth orders of magnitude more than the tokens that produced it. Estimate it as failure rate × average consequence rather than pretend it's nothing; a rough number beats a missing one. This line is also the dollar form of LoopRails' consequence-severity axis (high-consequence actions are the ones to prevent, not merely review), so cost per accepted action is what makes that severity legible to finance.
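The failure rate × average consequence estimate can be sketched in a few lines. The function name and its parameters are hypothetical, and the inputs are made-up illustrative values, not measurements from the worked example.

```javascript
// failureRate: share of attempted actions that fail with a downstream
//   consequence, between 0 and 1 (assumed to be measured from logs).
// averageConsequence: average dollar consequence of one such failure.
function estimateFailureImpact({ attempted, failureRate, averageConsequence }) {
  if (failureRate < 0 || failureRate > 1) {
    throw new RangeError("failureRate must be between 0 and 1");
  }
  // Expected dollars lost per attempted action.
  const perAction = failureRate * averageConsequence;
  // Scaled to the whole window.
  return { perAction, total: perAction * attempted };
}

// Illustrative inputs only: 2% of 12,500 attempts fail, at an assumed $48 each.
const estimate = estimateFailureImpact({
  attempted: 12500,
  failureRate: 0.02,
  averageConsequence: 48,
});

console.log(estimate.total.toFixed(0)); // 12000
```

Even a coarse estimate like this gives the numerator a nonzero line that can be refined as real refund, penalty, and churn data arrives.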

A FinOps operating model

What makes this a FinOps practice and not just a metric is the loop around it. It maps cleanly onto the FinOps Foundation's three phases:

Phase 1: Inform

Tag every action with agent, task-type, risk grade, tenant/team, and outcome (accepted / overridden / escalated / reverted). Attribute spend to accepted outcomes, not the total token bill. This is the visibility layer — showback by team, agent, and task.
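A minimal sketch of that visibility layer, assuming each action is logged as a flat record carrying the tags above plus an attributed cost. The field names, agent names, and dollar amounts are hypothetical.

```javascript
// Hypothetical action log: one record per action, tagged as described above.
const actions = [
  { agent: "refund-bot", taskType: "refund", riskGrade: "G2", team: "support", outcome: "accepted", cost: 4 },
  { agent: "refund-bot", taskType: "refund", riskGrade: "G2", team: "support", outcome: "reverted", cost: 6 },
  { agent: "triage-bot", taskType: "classify", riskGrade: "G0", team: "support", outcome: "accepted", cost: 1 },
  { agent: "deploy-bot", taskType: "restart", riskGrade: "G3", team: "ops", outcome: "overridden", cost: 9 },
];

// Showback by any tag (team, agent, taskType): all spend for the group is
// attributed to its accepted outcomes, not spread across every attempt.
function showback(records, key) {
  const rows = new Map();
  for (const r of records) {
    const row = rows.get(r[key]) ?? { spend: 0, attempted: 0, accepted: 0 };
    row.spend += r.cost;
    row.attempted += 1;
    if (r.outcome === "accepted") row.accepted += 1;
    rows.set(r[key], row);
  }
  return [...rows].map(([name, row]) => ({
    [key]: name,
    ...row,
    // null when nothing was kept: there is no accepted outcome to price.
    costPerAccepted: row.accepted > 0 ? row.spend / row.accepted : null,
  }));
}

console.log(showback(actions, "team"));
```

With these sample records the support team comes out at $5.50 per accepted action, while ops shows spend with no kept outcome, which is exactly the row a per-token view would hide.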

Phase 2: Optimize

Drive cost per accepted action down: cache, right-size the model per risk grade, cap loop depth and, the biggest lever, raise the acceptance rate. Fewer reverts and failure escalations beat cheaper tokens, because they lift the denominator while cutting remediation and failure impact at the same time.
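A quick comparison on the worked example's numbers illustrates why acceptance rate is the bigger lever. Both scenarios are assumptions chosen for illustration: halving inference cost, and a 10% rise in accepted action units with, conservatively, no cost line shrinking at all.

```javascript
const costPerAccepted = (totalCost, acceptedUnits) => totalCost / acceptedUnits;

// Baseline from the worked example.
const baseline = { totalCost: 31000, inferenceCost: 3000, acceptedUnits: 5000 };

// Scenario A (assumption): inference cost halves, nothing else changes.
const cheaperTokens = costPerAccepted(
  baseline.totalCost - baseline.inferenceCost / 2,
  baseline.acceptedUnits
);

// Scenario B (assumption): 500 more actions stay accepted (+10%), and no
// cost line is reduced, which understates the real benefit.
const higherAcceptance = costPerAccepted(
  baseline.totalCost,
  baseline.acceptedUnits + 500
);

console.log(cheaperTokens.toFixed(2));    // 5.90
console.log(higherAcceptance.toFixed(2)); // 5.64
```

Scenario B wins even before counting the remediation and failure impact it would also remove from the numerator, so the real gap is likely wider than this sketch shows.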

Phase 3: Operate

Govern continuously: budgets and anomaly alerts on cost per accepted action per agent and team, and — the real prize — tie any expansion of agent autonomy to the trend, not to adoption. A flat per-seat token cap is a blunt v0 of this; the trend is the steering wheel.
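One way to tie autonomy to the trend is a simple gate over a per-period series of cost per accepted action. This is a sketch under stated assumptions: the policy (a budget check plus a least-squares slope), the function names, and the sample series are all hypothetical, not a rule the framework prescribes.

```javascript
// Least-squares slope of a series against its index (dollars per period).
function trendSlope(series) {
  const n = series.length;
  if (n < 2) throw new RangeError("need at least two periods");
  const meanX = (n - 1) / 2;
  const meanY = series.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  series.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}

// Hypothetical policy: alert when the latest period is over budget;
// otherwise allow more autonomy only while the trend is falling.
function autonomyDecision(weeklyCpaa, budgetPerAction) {
  const latest = weeklyCpaa[weeklyCpaa.length - 1];
  if (latest > budgetPerAction) return "alert: over budget";
  return trendSlope(weeklyCpaa) < 0
    ? "eligible to expand autonomy"
    : "hold autonomy at current level";
}

console.log(autonomyDecision([7.1, 6.8, 6.5, 6.2], 6.5)); // eligible to expand autonomy
console.log(autonomyDecision([6.2, 6.4, 6.9], 6.5));      // alert: over budget
```

The same gate can run per agent and per team, so that autonomy expands where the economics are improving rather than where adoption happens to be highest.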

Pair it with leading indicators

As with cost per accepted change, the headline is a summary; don't use it alone. Report it alongside two or three diagnostics that explain why it moved.

The bridge: oversight you can measure

Cost per accepted action only works if you can observe which actions stayed — and that observability is precisely what a good oversight framework gives you. LoopRails is the natural companion: its RAIL properties make actions Reversible, Authorized, Interruptible, and Logged, and that "Logged" is the audit trail you need to tell an accepted action from a reverted one. Its risk grades (G0–G3) are a ready-made complexity normalizer for the denominator — weight accepted actions by grade instead of inventing a new scale.
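Weighting by grade might look like the sketch below. The source names the grades G0 to G3 but assigns no numbers, so the weights here are placeholders, as is the assumption that a higher grade means a heavier action; the function name is hypothetical too.

```javascript
// Placeholder weights: assumed, not defined by LoopRails or the source.
const GRADE_WEIGHTS = { G0: 1, G1: 2, G2: 4, G3: 8 };

// Sum accepted actions into complexity-normalized units.
function acceptedActionUnits(acceptedActions, weights = GRADE_WEIGHTS) {
  return acceptedActions.reduce((units, action) => {
    const weight = weights[action.riskGrade];
    if (weight === undefined) {
      throw new Error(`unknown risk grade: ${action.riskGrade}`);
    }
    return units + weight;
  }, 0);
}

// Four accepted actions: two trivial, one G2, one G3.
const accepted = [
  { riskGrade: "G0" },
  { riskGrade: "G0" },
  { riskGrade: "G2" },
  { riskGrade: "G3" },
];

const units = acceptedActionUnits(accepted);
console.log(units);      // 14
console.log(70 / units); // 5 (an illustrative $70 spread over 14 units)
```

Whatever weights are chosen, they should stay fixed across reporting periods; changing them moves the metric without any change in the agent's behavior.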

The division of labor is clean: LoopRails decides which actions need a human and proves the oversight catches mistakes; cost per accepted action prices the ones that got kept. Oversight and economics, measuring the same thing — trusted autonomous work — from two sides.

Two more companions each map onto a part of the formula. BRACE supplies the security controls that stop a hijacked or misaligned agent from generating catastrophic failure impact in the first place, which protects the numerator. And eval-driven development, the verification-quality discipline, is how you raise the acceptance rate that lifts the whole denominator. Security, oversight, quality, cost: four lenses on the same goal, and cost per accepted action is the one that puts the others in dollars.

What it is, and is not

Compute it

The calculator has a "For agents" tab: enter the seven cost lines and your accepted-action count and get the number, with a shareable link. The same logic ships in the reference library:

import { costPerAcceptedAction } from 'cost-per-accepted-change';

const cpaa = costPerAcceptedAction({
  inferenceCost: 3000,
  toolCost: 1200,
  infraCost: 800,
  oversightCost: 9000,
  remediationCost: 4000,
  failedRunCost: 1000,
  failureImpactCost: 12000,   // escalations, refunds, lost sales
  acceptedActions: 5000,
});

console.log(cpaa.value);     // 6.2
console.log(cpaa.breakdown); // share-of-total per component



Cost per accepted action is the runtime sibling of cost per accepted change, defined in The Delivery Gap (Brenn Hill, 2026). It's free to use and adapt; refinements and worked examples are welcome via GitHub. See how to cite.