Framework · For running agents
Cost per accepted action
Cost per accepted change measures the cost of producing trusted software. As teams move from AI that writes code to agents that do work in production, the same question reappears one layer over: not "what did we build and keep," but "what did the agent do, and did it stick?" Cost per accepted action is that metric — the runtime sibling, built on exactly the same bones.
Why a new denominator
The instinctive way to watch an agent is by its bill: cost per token, cost per request, cost per agent-run. Those are the agent-world equivalent of "lines of code" — activity metrics that rise whether or not the work was any good. An agent that runs ten thousand times but whose actions get overridden, rolled back, or quietly re-done has produced ten thousand runs and far fewer kept outcomes. A per-token dashboard will call that cheap. It is not.
Cost per accepted action borrows the move that makes cost per accepted change honest: put kept value in the denominator. Count the actions that were accepted and stayed accepted, and let the cost of the ones that didn't fall into the numerator as remediation. The result is a single, finance-legible number that tracks the economics of trusted autonomous work, not the volume of it.
The formula, expanded
The numerator is the fully-loaded cost of running the agent over a window — and, as with cost per accepted change, the lines that matter most are the ones a token bill never shows:
| Component | What it captures |
|---|---|
| Inference cost | LLM tokens — input, output, cache, reasoning — including retries and multi-step loops. |
| Tool & API cost | External calls the agent makes: search, code execution, RAG / vector, paid third-party APIs. |
| Infrastructure cost | Orchestration runtime, sandboxes, memory / vector stores, observability, queues. |
| Oversight cost | The human-in-the-loop labor — approvals, reviews, the show-and-prove load. As autonomy scales, this becomes the dominant hidden line, exactly as review cost did for AI-assisted coding. |
| Remediation cost | The internal labor to clean up actions that did not stay: rollbacks, human redo, incident response. |
| Failed-run cost | Runs that produced nothing usable but still billed tokens and compute. |
| Failure impact | The downstream financial consequence of actions that failed — escalation to costlier channels, lost or delayed revenue, refunds and credits, SLA penalties, churn, compliance exposure. Distinct from remediation: that's what you pay to fix it; this is the value destroyed. |
The denominator is the count of accepted action units: consequential agent actions that were accepted and stayed accepted during the window, complexity-normalized so a one-shot classification and a fifty-step autonomous workflow aren't counted as equals. (A natural normalizer is the action's risk grade — more on that below.)
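Written out, the ratio described above looks roughly like this. The symbols are shorthand introduced here for illustration and do not come from the source: each $C$ term is one row of the component table summed over the window $W$, and $N_{\text{accepted}}(W)$ is the complexity-normalized count of accepted action units.

```latex
\[
\text{CPAA}(W) =
\frac{C_{\text{inference}} + C_{\text{tool}} + C_{\text{infra}} + C_{\text{oversight}}
      + C_{\text{remediation}} + C_{\text{failed-run}} + C_{\text{failure-impact}}}
     {N_{\text{accepted}}(W)}
\]
```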
"Stayed accepted" — the rework defense, for agents
This is the clause that does the work, just like the "stayed there" clause in cost per accepted change. An action counts in the denominator only if, within a survival window, it was not:
- reverted or rolled back — the action was undone;
- overridden or corrected by a human reviewer;
- re-run to get a result that finally stuck;
- re-opened by the end user (the support "re-contact" signal); or
- the cause of an incident, complaint, or compliance issue needing remediation in the window.
Every action the agent completed this window is tested against that list. One that fails is removed from the denominator, and the cost of cleaning it up lands in the numerator as remediation. That double hit is the point, and it's inherited straight from cost per accepted change: a shortfall shows up twice, once by shrinking the denominator and once by growing the numerator, so a metric built on it can't be fooled by an agent that ships fast and reliably wrong.
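A minimal sketch of the survival-window test, assuming each action carries an acceptance timestamp and a list of follow-up events. The event names, record shape, and the seven-day window are all hypothetical; map them to whatever your audit log actually records.

```javascript
// Hypothetical event types, one per disqualifying condition in the list above.
const DISQUALIFYING_EVENTS = new Set([
  "reverted",   // undone or rolled back
  "overridden", // corrected by a human reviewer
  "rerun",      // re-run to get a result that stuck
  "reopened",   // re-opened by the end user
  "incident",   // caused an incident, complaint, or compliance issue
]);

// An action stayed accepted if it was accepted and no disqualifying event
// landed inside the survival window that starts at acceptance.
function stayedAccepted(action, survivalWindowMs) {
  if (action.acceptedAt == null) return false; // never accepted at all
  const deadline = action.acceptedAt + survivalWindowMs;
  return !action.events.some(
    (e) =>
      DISQUALIFYING_EVENTS.has(e.type) &&
      e.at >= action.acceptedAt &&
      e.at <= deadline
  );
}

const DAY_MS = 24 * 60 * 60 * 1000;
const actions = [
  { id: "a1", acceptedAt: 0, events: [] },
  { id: "a2", acceptedAt: 0, events: [{ type: "reverted", at: 2 * DAY_MS }] },
  { id: "a3", acceptedAt: 0, events: [{ type: "reopened", at: 10 * DAY_MS }] }, // after the window
  { id: "a4", acceptedAt: null, events: [] },
];

const kept = actions
  .filter((a) => stayedAccepted(a, 7 * DAY_MS))
  .map((a) => a.id);
console.log(kept); // a1 and a3
```

The window length is a policy choice: a longer window catches more late reversals but delays the number.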
A worked example
A fleet of support-and-ops agents over a one-week window. Loaded human-review time runs through the oversight line.
| Line | Value |
|---|---|
| Inference cost | $3,000 |
| Tool & API cost | $1,200 |
| Infrastructure cost | $800 |
| Oversight cost (human-in-the-loop) | $9,000 |
| Remediation cost (overridden / rolled-back actions) | $4,000 |
| Failed-run cost | $1,000 |
| Direct operating cost | $19,000 |
| Failure impact (escalations, refunds, lost sales) | $12,000 |
| Total fully-loaded cost | $31,000 |
| Actions attempted | 12,500 |
| Accepted action units (accepted & stayed) | 5,000 |
| Cost per accepted action | $6.20 |
The same window tells three very different stories. A per-run dashboard divides the $31,000 by all 12,500 attempts and reports a cheerful $2.48 a run. Count only what you directly pay and divide by the 5,000 actions that stuck, and you get $3.80. But the honest number includes the consequences of the actions that failed — the escalations, refunds, and lost sales — which lifts it to $6.20, where the single largest line is neither inference nor oversight but failure impact, at 39% of the bill. Nothing here says "don't run the agent." It says the real cost lives in the human load, the cleanup, and above all the downstream consequences — so that's where the next improvement is. Raise the acceptance rate and all three move in your favor at once.
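The three figures can be reproduced from the table with plain arithmetic. The variable names below are illustrative only and are not part of the reference library.

```javascript
// Cost lines from the worked example, in dollars.
const costs = {
  inference: 3000,
  tool: 1200,
  infra: 800,
  oversight: 9000,
  remediation: 4000,
  failedRun: 1000,
  failureImpact: 12000,
};
const attempted = 12500;
const accepted = 5000;

const total = Object.values(costs).reduce((sum, c) => sum + c, 0);
const direct = total - costs.failureImpact;

const perRun = total / attempted;                // the per-run dashboard view
const directPerAccepted = direct / accepted;     // direct spend over kept actions
const fullyLoadedPerAccepted = total / accepted; // the headline metric
const failureImpactShare = costs.failureImpact / total;

console.log(perRun.toFixed(2));                          // "2.48"
console.log(directPerAccepted.toFixed(2));               // "3.80"
console.log(fullyLoadedPerAccepted.toFixed(2));          // "6.20"
console.log((failureImpactShare * 100).toFixed(0) + "%"); // "39%"
```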
A FinOps operating model
What makes this a FinOps practice and not just a metric is the loop around it. It maps cleanly onto the FinOps Foundation's three phases:
Inform
Tag every action with agent, task-type, risk grade, tenant/team, and outcome (accepted / overridden / escalated / reverted). Attribute spend to accepted outcomes, not the total token bill. This is the visibility layer — showback by team, agent, and task.
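As a rough illustration of the tagging and showback described above, the sketch below groups tagged action records by one tag and attributes spend to accepted outcomes. The record fields, agent names, and costs are invented for the example; a real pipeline would read them from your own logging and billing data.

```javascript
// Hypothetical tagged action records. Cost is the spend attributed to each action, in dollars.
const taggedActions = [
  { agent: "refund-bot", taskType: "refund", riskGrade: "G2", team: "support", outcome: "accepted", cost: 2 },
  { agent: "refund-bot", taskType: "refund", riskGrade: "G2", team: "support", outcome: "overridden", cost: 4 },
  { agent: "triage-bot", taskType: "triage", riskGrade: "G0", team: "support", outcome: "accepted", cost: 2 },
  { agent: "deploy-bot", taskType: "restart", riskGrade: "G1", team: "ops", outcome: "accepted", cost: 3 },
  { agent: "deploy-bot", taskType: "restart", riskGrade: "G1", team: "ops", outcome: "reverted", cost: 3 },
];

// Group by any tag and divide all spend in the group by its accepted outcomes,
// so the cost of actions that did not stick is carried by the ones that did.
function showbackBy(records, key) {
  const groups = new Map();
  for (const r of records) {
    const g = groups.get(r[key]) ?? { spend: 0, accepted: 0 };
    g.spend += r.cost;
    if (r.outcome === "accepted") g.accepted += 1;
    groups.set(r[key], g);
  }
  return [...groups].map(([name, g]) => ({
    [key]: name,
    spend: g.spend,
    accepted: g.accepted,
    costPerAccepted: g.accepted > 0 ? g.spend / g.accepted : null,
  }));
}

const byTeam = showbackBy(taggedActions, "team");
console.log(byTeam);
// support: spend 8, accepted 2, costPerAccepted 4
// ops:     spend 6, accepted 1, costPerAccepted 6
```

The same function can be called with "agent" or "taskType" as the key to get the other showback views.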
Optimize
Drive cost per accepted action down: cache, right-size the model per risk grade, cap loop depth, and, as the biggest lever, raise the acceptance rate. Fewer reverts and failure escalations beat cheaper tokens, because they lift the denominator while cutting remediation and failure impact at the same time.
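To make the comparison concrete, here is a sketch that applies two hypothetical levers to the worked example's figures. It rests on one loud assumption that the source does not state: remediation and failure impact shrink in proportion to the number of actions that failed to stick. Real fleets will not scale this neatly.

```javascript
const baseline = {
  inference: 3000, tool: 1200, infra: 800, oversight: 9000,
  remediation: 4000, failedRun: 1000, failureImpact: 12000,
  attempted: 12500, accepted: 5000,
};

function cpaa(w) {
  const total = w.inference + w.tool + w.infra + w.oversight +
    w.remediation + w.failedRun + w.failureImpact;
  return total / w.accepted;
}

// Lever A: halve inference spend, everything else unchanged.
const cheaperTokens = { ...baseline, inference: baseline.inference / 2 };

// Lever B: `extra` more actions stay accepted.
// ASSUMPTION: remediation and failure impact scale with the count of misses.
function withMoreAccepted(w, extra) {
  const missesBefore = w.attempted - w.accepted;
  const scale = (missesBefore - extra) / missesBefore;
  return {
    ...w,
    accepted: w.accepted + extra,
    remediation: w.remediation * scale,
    failureImpact: w.failureImpact * scale,
  };
}
const higherAcceptance = withMoreAccepted(baseline, 1000);

console.log(cpaa(baseline).toFixed(2));         // "6.20"
console.log(cpaa(cheaperTokens).toFixed(2));    // "5.90"
console.log(cpaa(higherAcceptance).toFixed(2)); // "4.81"
```

Under that assumption, halving the token bill saves about thirty cents per accepted action, while moving 1,000 actions from missed to kept saves well over a dollar.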
Operate
Govern continuously: budgets and anomaly alerts on cost per accepted action per agent and team, and — the real prize — tie any expansion of agent autonomy to the trend, not to adoption. A flat per-seat token cap is a blunt v0 of this; the trend is the steering wheel.
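One way to tie autonomy to the trend rather than to adoption is a simple gate over the recent history of the metric. The policy below (a budget ceiling plus a flat-or-falling least-squares slope over at least four periods) is a hypothetical example, not a rule taken from the source or from the FinOps Foundation.

```javascript
// Least-squares slope of a series against its index (dollars per period).
function trendSlope(series) {
  const n = series.length;
  const meanX = (n - 1) / 2;
  const meanY = series.reduce((s, y) => s + y, 0) / n;
  let num = 0;
  let den = 0;
  series.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return den === 0 ? 0 : num / den;
}

// Hypothetical policy: expand autonomy only when the latest value is within
// budget AND the trend is flat or falling.
function mayExpandAutonomy(weeklyCpaa, budget) {
  if (weeklyCpaa.length < 4) return false; // too little history to call a trend
  const latest = weeklyCpaa[weeklyCpaa.length - 1];
  return latest <= budget && trendSlope(weeklyCpaa) <= 0;
}

console.log(mayExpandAutonomy([6.2, 5.9, 5.6, 5.1], 6.0)); // true
console.log(mayExpandAutonomy([5.1, 5.4, 5.8, 6.3], 6.5)); // false: within budget but rising
console.log(mayExpandAutonomy([6.2, 5.9], 6.0));           // false: not enough history
```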
Pair it with leading indicators
As with cost per accepted change, the headline is a summary — don't use it alone. Report it alongside two or three diagnostics that explain why it moved:
- Acceptance rate — share of actions accepted and kept. The primary quality signal.
- Override / reversal rate and failure-escalation rate — where the denominator is leaking.
- Autonomy ratio — share of actions executed without a human gate.
- Loop depth and tokens per accepted action — where inference cost is going.
- Machine catch rate — share of bad actions caught by automated gates before they shipped, rather than downstream.
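Several of these diagnostics fall out of the same tagged action log. The sketch below computes four of them; the field names (`outcome`, `humanGated`, `caughtBy`) are assumptions made for the example, and loop depth and tokens per accepted action are left out because they need per-run inference data.

```javascript
// Hypothetical record fields:
//   outcome:    "accepted" | "overridden" | "reverted" | "escalated"
//   humanGated: true if a human approved the action before it executed
//   caughtBy:   "machine" | "human" | "downstream" | null (null for accepted actions)
function leadingIndicators(actions) {
  const n = actions.length;
  if (n === 0) return null;
  const count = (pred) => actions.filter(pred).length;
  const bad = count((a) => a.outcome !== "accepted");
  const machineCaught = count(
    (a) => a.outcome !== "accepted" && a.caughtBy === "machine"
  );
  return {
    acceptanceRate: count((a) => a.outcome === "accepted") / n,
    overrideOrReversalRate:
      count((a) => a.outcome === "overridden" || a.outcome === "reverted") / n,
    autonomyRatio: count((a) => !a.humanGated) / n,
    machineCatchRate: bad === 0 ? null : machineCaught / bad,
  };
}

const sample = [
  { outcome: "accepted", humanGated: false, caughtBy: null },
  { outcome: "accepted", humanGated: false, caughtBy: null },
  { outcome: "accepted", humanGated: true, caughtBy: null },
  { outcome: "accepted", humanGated: false, caughtBy: null },
  { outcome: "overridden", humanGated: true, caughtBy: "human" },
  { outcome: "reverted", humanGated: false, caughtBy: "downstream" },
  { outcome: "escalated", humanGated: false, caughtBy: "machine" },
  { outcome: "reverted", humanGated: false, caughtBy: "machine" },
];

const indicators = leadingIndicators(sample);
console.log(indicators);
// acceptanceRate 0.5, overrideOrReversalRate 0.375, autonomyRatio 0.75, machineCatchRate 0.5
```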
The bridge: oversight you can measure
Cost per accepted action only works if you can observe which actions stayed — and that observability is precisely what a good oversight framework gives you. LoopRails is the natural companion: its RAIL properties make actions Reversible, Authorized, Interruptible, and Logged, and that "Logged" is the audit trail you need to tell an accepted action from a reverted one. Its risk grades (G0–G3) are a ready-made complexity normalizer for the denominator — weight accepted actions by grade instead of inventing a new scale.
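Weighting by grade might look like the sketch below. The weights are placeholders chosen for the example and are not values defined by LoopRails; the counts are invented too. Any real weighting would need to be calibrated against how much work each grade actually represents.

```javascript
// ASSUMPTION: illustrative weights only, not part of the LoopRails grading scheme.
const GRADE_WEIGHTS = { G0: 1, G1: 2, G2: 4, G3: 8 };

// Turn per-grade counts of kept actions into complexity-normalized units.
function acceptedActionUnits(keptCountsByGrade, weights = GRADE_WEIGHTS) {
  return Object.entries(keptCountsByGrade).reduce((units, [grade, count]) => {
    if (!(grade in weights)) throw new Error(`Unknown risk grade: ${grade}`);
    return units + count * weights[grade];
  }, 0);
}

// Invented counts of actions that were accepted and stayed accepted.
const keptByGrade = { G0: 3000, G1: 600, G2: 150, G3: 25 };
const units = acceptedActionUnits(keptByGrade);
console.log(units); // 3000 + 1200 + 600 + 200 = 5000
```

Under this scheme a fleet that keeps fewer but higher-grade actions is not penalized against one that keeps many trivial ones.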
The division of labor is clean: LoopRails decides which actions need a human and proves the oversight catches mistakes; cost per accepted action prices the ones that got kept. Oversight and economics, measuring the same thing — trusted autonomous work — from two sides.
Two more companions map onto the two sides of the ratio. BRACE supplies the security controls that stop a hijacked or misaligned agent from generating catastrophic failure impact in the first place, which protects the numerator. Eval-driven development, the verification-quality discipline described at Eval-Driven Development, is how you raise the acceptance rate that lifts the whole denominator. Security, oversight, quality, cost: four lenses on the same goal, and cost per accepted action is the one that puts the others in dollars.
What it is, and is not
- It is an aggregate, time-series, fleet-or-team-level steering metric — the dollar bottom line of running agents.
- It is not cost-per-token or cost-per-run (those are inputs and diagnostics), a per-agent leaderboard, or an autonomy KPI to chase.
- It is not a replacement for safety gates. It pairs with oversight; it never relaxes it.
Compute it
The calculator has a For agents tab: enter the cost lines and your accepted-action count and get the number, with a shareable link. The same logic ships in the reference library:
```js
import { costPerAcceptedAction } from 'cost-per-accepted-change';

const cpaa = costPerAcceptedAction({
  inferenceCost: 3000,
  toolCost: 1200,
  infraCost: 800,
  oversightCost: 9000,
  remediationCost: 4000,
  failedRunCost: 1000,
  failureImpactCost: 12000, // escalations, refunds, lost sales
  acceptedActions: 5000,
});

console.log(cpaa.value); // 6.2
console.log(cpaa.breakdown); // share-of-total per component
```
Cost per accepted action is the runtime sibling of cost per accepted change, defined in The Delivery Gap (Brenn Hill, 2026). It's free to use and adapt; refinements and worked examples are welcome via GitHub. See how to cite.