Framework · For running agents

Cost per accepted action

By Brenn Hill · June 2026 · ~9 min read

Cost per accepted change measures the cost of producing trusted software. As teams move from AI that writes code to agents that do work in production, the same question reappears one layer over: not "what did we build and keep," but "what did the agent do, and did it stick?" Cost per accepted action is that metric — the runtime sibling, built on exactly the same bones.

The one-line version: Cost per accepted change prices a change that reached production and stayed. Cost per accepted action prices an action an agent performed that was accepted and stayed accepted. It is the same discipline (denominate by kept outcomes, count the fully-loaded cost, report in dollars), moved from build-time to run-time.

Why a new denominator

The instinctive way to watch an agent is by its bill: cost per token, cost per request, cost per agent-run. Those are the agent-world equivalent of "lines of code" — activity metrics that rise whether or not the work was any good. An agent that runs ten thousand times but whose actions get overridden, rolled back, or quietly re-done has produced ten thousand runs and far fewer kept outcomes. A per-token dashboard will call that cheap. It is not.

Cost per accepted action borrows the move that makes cost per accepted change honest: put kept value in the denominator. Count the actions that were accepted and stayed accepted, and let the cost of the ones that didn't fall into the numerator as remediation. The result is a single, finance-legible number that tracks the economics of trusted autonomous work, not the volume of it.

The formula, expanded

The numerator is the fully-loaded cost of running the agent over a window — and, as with cost per accepted change, the lines that matter most are the ones a token bill never shows:

Component	What it captures
Inference cost	LLM tokens — input, output, cache, reasoning — including retries and multi-step loops.
Tool & API cost	External calls the agent makes: search, code execution, RAG / vector, paid third-party APIs.
Infrastructure cost	Orchestration runtime, sandboxes, memory / vector stores, observability, queues.
Oversight cost	The human-in-the-loop labor — approvals, reviews, the show-and-prove load. As autonomy scales, this becomes the dominant hidden line, exactly as review cost did for AI-assisted coding.
Remediation cost	The internal labor to clean up actions that did not stay: rollbacks, human redo, incident response.
Failed-run cost	Runs that produced nothing usable but still billed tokens and compute.
Failure impact	The downstream financial consequence of actions that failed — escalation to costlier channels, lost or delayed revenue, refunds and credits, SLA penalties, churn, compliance exposure. Distinct from remediation: that's what you pay to fix it; this is the value destroyed.

The denominator is the count of accepted action units: consequential agent actions that were accepted and stayed accepted during the window, complexity-normalized so a one-shot classification and a fifty-step autonomous workflow aren't counted as equals. (A natural normalizer is the action's risk grade — more on that below.)
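Written out, the ratio looks roughly like this. The symbols are shorthand introduced here, not notation from the source: each C is one numerator line from the table above, measured over a window T; A_T is the set of actions that were accepted and stayed accepted in that window; and w_a is an assumed complexity weight for action a (for example, one derived from its risk grade).

```latex
\[
\mathrm{CPAA}_T \;=\;
\frac{C_{\text{inference}} + C_{\text{tool}} + C_{\text{infra}} + C_{\text{oversight}}
      + C_{\text{remediation}} + C_{\text{failed-run}} + C_{\text{failure impact}}}
     {\sum_{a \in A_T} w_a}
\]
```

With w_a = 1 for every action, the denominator reduces to a plain count of accepted actions.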

"Stayed accepted" — the rework defense, for agents

This is the clause that does the work, just like the "stayed there" clause in cost per accepted change. An action counts in the denominator only if, within a survival window, it was not:

reverted or rolled back — the action was undone;
overridden or corrected by a human reviewer;
re-run to get a result that finally stuck;
re-opened by the end user (the support "re-contact" signal); or
the cause of an incident, complaint, or compliance issue needing remediation in the window.
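The survival test above can be sketched as a small predicate. This is a minimal illustration under assumptions: the event type names, the record shapes, and the 7-day default window are all hypothetical, since the source does not prescribe a schema or a window length.

```javascript
// Hypothetical event types that disqualify an action, mirroring the list above.
const DISQUALIFYING_EVENTS = new Set([
  "reverted",   // undone or rolled back
  "overridden", // corrected by a human reviewer
  "rerun",      // re-run to get a result that finally stuck
  "reopened",   // re-opened by the end user (re-contact)
  "incident",   // caused an incident, complaint, or compliance issue
]);

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns true if no disqualifying event landed inside the survival window.
// `action.completedAt` and `event.at` are epoch milliseconds (assumed shape).
function stayedAccepted(action, events, survivalWindowDays = 7) {
  const windowEnd = action.completedAt + survivalWindowDays * DAY_MS;
  return !events.some(
    (e) =>
      e.actionId === action.id &&
      DISQUALIFYING_EVENTS.has(e.type) &&
      e.at >= action.completedAt &&
      e.at <= windowEnd
  );
}

// Usage: an approval is not disqualifying; a revert 19 days later only
// matters if the survival window is long enough to include it.
const action = { id: "a1", completedAt: Date.UTC(2026, 5, 1) };
const events = [
  { actionId: "a1", type: "approved", at: Date.UTC(2026, 5, 1, 1) },
  { actionId: "a1", type: "reverted", at: Date.UTC(2026, 5, 20) },
];
console.log(stayedAccepted(action, events));     // true
console.log(stayedAccepted(action, events, 30)); // false
```

The window length is a policy choice: a longer window catches more late reversals but delays when an action can be counted.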

Actions the agent completed this window fall into two groups:

Accepted & stayed: counts in the denominator as accepted action units.
Did not stay: its cleanup and its downstream consequences hit the numerator (remediation + failure impact).

That double-hit is the point, and it's inherited straight from cost per accepted change: a shortfall shows up twice — once by shrinking the denominator, once by growing the numerator — so a metric built on it can't be fooled by an agent that ships fast and reliably wrong.

Two nuances worth baking in. First, approval is not an override: in a human-in-the-loop system, a reviewer approving an action is the design working, not a failure, and the test is whether the action stayed without later correction. Second, a correct escalation is a win: an agent that recognizes it can't safely handle something and hands off to a human produced a good outcome. Only failure handoffs (it tried, got it wrong, a human cleaned up) count against the denominator.
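One way to encode those two nuances is a classifier over action records. The field names and values below are assumptions made for illustration, and treating a correct escalation as a kept outcome is one reading of the passage rather than a rule the source spells out.

```javascript
// Assumed record shape: { review, handoff, correctedLater }
//   review:  "approved" | "overridden" | "none"
//   handoff: "none" | "escalated" | "failed"
//            "escalated" = handed off before acting; "failed" = tried, got it wrong
function countsAsKept(record) {
  if (record.handoff === "failed") return false;    // a human had to clean up
  if (record.review === "overridden") return false; // reviewer corrected the action
  if (record.correctedLater) return false;          // approved or not, it did not stay
  return true; // plain successes, approvals, and correct escalations
}

// Usage
const examples = {
  approvedAndStayed:  { review: "approved",   handoff: "none",      correctedLater: false },
  approvedThenFixed:  { review: "approved",   handoff: "none",      correctedLater: true  },
  correctEscalation:  { review: "none",       handoff: "escalated", correctedLater: false },
  failureHandoff:     { review: "none",       handoff: "failed",    correctedLater: false },
  reviewerOverride:   { review: "overridden", handoff: "none",      correctedLater: false },
};

for (const [name, record] of Object.entries(examples)) {
  console.log(name, countsAsKept(record));
}
```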

A worked example

A fleet of support-and-ops agents over a one-week window. Loaded human-review time runs through the oversight line.

Direct operating cost	$19,000
Inference cost	$3,000
Tool & API cost	$1,200
Infrastructure cost	$800
Oversight cost (human-in-the-loop)	$9,000
Remediation cost (overridden / rolled-back actions)	$4,000
Failed-run cost	$1,000
Failure impact (escalations, refunds, lost sales)	$12,000
Total fully-loaded cost	$31,000
Actions attempted	12,500
Accepted action units (accepted & stayed)	5,000
Cost per accepted action	$6.20

The same window tells three very different stories. A per-run dashboard divides the $31,000 by all 12,500 attempts and reports a cheerful $2.48 a run. Count only what you directly pay and divide by the 5,000 actions that stuck, and you get $3.80. But the honest number includes the consequences of the actions that failed — the escalations, refunds, and lost sales — which lifts it to $6.20, where the single largest line is neither inference nor oversight but failure impact, at 39% of the bill. Nothing here says "don't run the agent." It says the real cost lives in the human load, the cleanup, and above all the downstream consequences — so that's where the next improvement is. Raise the acceptance rate and all three move in your favor at once.
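The three figures can be reproduced directly from the table. The variable names are illustrative; the numbers are the ones in the worked example.

```javascript
// Cost lines from the worked example, in dollars.
const costs = {
  inference: 3000,
  tool: 1200,
  infra: 800,
  oversight: 9000,
  remediation: 4000,
  failedRun: 1000,
  failureImpact: 12000,
};
const attempted = 12500;
const accepted = 5000;

const total = Object.values(costs).reduce((sum, c) => sum + c, 0); // 31000
const direct = total - costs.failureImpact;                        // 19000

const perRun = total / attempted;            // what a per-run dashboard reports
const directPerAccepted = direct / accepted; // direct spend only, kept outcomes
const perAccepted = total / accepted;        // cost per accepted action
const failureImpactShare = costs.failureImpact / total;

console.log(perRun.toFixed(2));            // 2.48
console.log(directPerAccepted.toFixed(2)); // 3.80
console.log(perAccepted.toFixed(2));       // 6.20
console.log(`${Math.round(failureImpactShare * 100)}%`); // 39%
```

The acceptance rate implied by the same table is 5,000 / 12,500, or 40%.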

The line most agent dashboards omit: Failure impact is the hardest component to quantify and the easiest to leave at zero, which is exactly why omitting it is dangerous. A single wrong autonomous action can trigger a refund, an SLA penalty, or a lost deal worth orders of magnitude more than the tokens that produced it. Estimate it as failure rate × average consequence per failure, scaled by the number of actions in the window, rather than pretend it's nothing; a rough number beats a missing one. This line is also the dollar form of LoopRails' consequence-severity axis (high-consequence actions are the ones to prevent, not merely review), so cost per accepted action is what makes that severity legible to finance.
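A rough estimator for that line might look like the sketch below. The function name, parameter names, and input values are all illustrative assumptions, not data from the worked example.

```javascript
// Expected failure impact for a window:
//   actions × failure rate × average dollar consequence per failure.
function estimateFailureImpact({ actions, failureRate, avgConsequence }) {
  if (failureRate < 0 || failureRate > 1) {
    throw new RangeError("failureRate must be between 0 and 1");
  }
  return actions * failureRate * avgConsequence;
}

// Illustrative inputs only: 10,000 actions, 2% fail, $50 average consequence.
const estimate = estimateFailureImpact({
  actions: 10000,
  failureRate: 0.02,
  avgConsequence: 50,
});
console.log(Math.round(estimate)); // 10000
```

If consequences vary widely by task type, running the same estimate per task type and summing the results is likely to be less misleading than a single fleet-wide average.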

A FinOps operating model

What makes this a FinOps practice and not just a metric is the loop around it. It maps cleanly onto the FinOps Foundation's three phases:

Phase 1: Inform

Tag every action with agent, task-type, risk grade, tenant/team, and outcome (accepted / overridden / escalated / reverted). Attribute spend to accepted outcomes, not the total token bill. This is the visibility layer: showback by team, agent, and task.

Phase 2: Optimize

Drive cost per accepted action down: cache, right-size the model per risk grade, cap loop depth, and, the biggest lever, raise the acceptance rate. Fewer reverts and failure-escalations beat cheaper tokens, because they lift the denominator while cutting remediation and failure impact at the same time.

Phase 3: Operate

Govern continuously: set budgets and anomaly alerts on cost per accepted action per agent and team, and (the real prize) tie any expansion of agent autonomy to the trend, not to adoption. A flat per-seat token cap is a blunt v0 of this; the trend is the steering wheel.
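The tagging and showback described under Inform could be sketched as follows. The record fields, agent names, and per-action costs are hypothetical, and the sketch assumes each record already carries its share of fully-loaded cost.

```javascript
// Each record is one tagged action. Field names and values are illustrative.
const actions = [
  { agent: "refund-bot", taskType: "refund", riskGrade: "G2", team: "support", outcome: "accepted",   cost: 4.0 },
  { agent: "refund-bot", taskType: "refund", riskGrade: "G2", team: "support", outcome: "overridden", cost: 9.0 },
  { agent: "triage-bot", taskType: "triage", riskGrade: "G0", team: "ops",     outcome: "accepted",   cost: 1.0 },
  { agent: "triage-bot", taskType: "triage", riskGrade: "G0", team: "ops",     outcome: "accepted",   cost: 2.0 },
];

// Showback: total spend per group divided by accepted outcomes in that group.
function showback(records, key) {
  const groups = new Map();
  for (const r of records) {
    const g = groups.get(r[key]) ?? { cost: 0, accepted: 0 };
    g.cost += r.cost;                              // every action's cost hits the numerator
    if (r.outcome === "accepted") g.accepted += 1; // only kept outcomes reach the denominator
    groups.set(r[key], g);
  }
  const result = {};
  for (const [name, g] of groups) {
    // null signals "nothing kept yet" instead of dividing by zero
    result[name] = g.accepted > 0 ? g.cost / g.accepted : null;
  }
  return result;
}

console.log(showback(actions, "team"));  // { support: 13, ops: 1.5 }
console.log(showback(actions, "agent")); // { 'refund-bot': 13, 'triage-bot': 1.5 }
```

Passing a different key ("agent", "taskType", "riskGrade") gives the other showback cuts from the same tagged records.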

Pair it with leading indicators

As with cost per accepted change, the headline is a summary — don't use it alone. Report it alongside two or three diagnostics that explain why it moved:

Acceptance rate — share of actions accepted and kept. The primary quality signal.
Override / reversal rate and failure-escalation rate — where the denominator is leaking.
Autonomy ratio — share of actions executed without a human gate.
Loop depth and tokens per accepted action — where inference cost is going.
Machine catch rate — share of bad actions caught by automated gates before they shipped, rather than downstream.
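Most of these diagnostics are simple ratios over per-window counts. The sketch below assumes a hypothetical counts object; the field names and sample numbers are illustrative, and loop depth is left out because it needs per-run traces rather than counts.

```javascript
// Counts for one window. All field names are assumptions for illustration.
function leadingIndicators(c) {
  const badActions = c.caughtByGates + c.caughtDownstream;
  return {
    acceptanceRate: c.accepted / c.completed,
    overrideRate: c.overriddenOrReverted / c.completed,
    failureEscalationRate: c.failureEscalations / c.completed,
    autonomyRatio: c.executedWithoutHumanGate / c.completed,
    tokensPerAcceptedAction: c.totalTokens / c.accepted,
    // Share of bad actions stopped by automated gates before shipping.
    machineCatchRate: badActions > 0 ? c.caughtByGates / badActions : null,
  };
}

// Illustrative counts only.
const indicators = leadingIndicators({
  completed: 1000,
  accepted: 800,
  overriddenOrReverted: 150,
  failureEscalations: 50,
  executedWithoutHumanGate: 600,
  totalTokens: 4000000,
  caughtByGates: 30,
  caughtDownstream: 10,
});
console.log(indicators);
```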

The bridge: oversight you can measure

Cost per accepted action only works if you can observe which actions stayed — and that observability is precisely what a good oversight framework gives you. LoopRails is the natural companion: its RAIL properties make actions Reversible, Authorized, Interruptible, and Logged, and that "Logged" is the audit trail you need to tell an accepted action from a reverted one. Its risk grades (G0–G3) are a ready-made complexity normalizer for the denominator — weight accepted actions by grade instead of inventing a new scale.
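Weighting by grade might look like the sketch below. The weight values are placeholders chosen for illustration; the source names the grades G0 through G3 but does not specify weights.

```javascript
// Hypothetical weights per risk grade. The source does not give values.
const GRADE_WEIGHTS = { G0: 1, G1: 2, G2: 4, G3: 8 };

// Converts accepted-action counts per grade into weighted accepted action units.
function acceptedActionUnits(acceptedCountsByGrade, weights = GRADE_WEIGHTS) {
  return Object.entries(acceptedCountsByGrade).reduce((units, [grade, count]) => {
    if (!(grade in weights)) throw new Error(`Unknown risk grade: ${grade}`);
    return units + weights[grade] * count;
  }, 0);
}

// Illustrative counts: 3000*1 + 500*2 + 200*4 + 25*8 = 5000 units.
const units = acceptedActionUnits({ G0: 3000, G1: 500, G2: 200, G3: 25 });
console.log(units); // 5000
```

Whatever weights a team settles on, keeping them fixed across windows matters more than their exact values, since the metric is read as a trend.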

The division of labor is clean: LoopRails decides which actions need a human and proves the oversight catches mistakes; cost per accepted action prices the ones that got kept. Oversight and economics, measuring the same thing — trusted autonomous work — from two sides.

Two more companions each map onto one side of the ratio. BRACE supplies the security controls that stop a hijacked or misaligned agent from generating catastrophic failure impact in the first place, which protects the numerator. And eval-driven development, the verification-quality discipline described at Eval-Driven Development, is how you raise the acceptance rate that lifts the whole denominator. Security, oversight, quality, cost: four lenses on the same goal, and cost per accepted action is the one that puts the others in dollars.

What it is, and is not

It is an aggregate, time-series, fleet-or-team-level steering metric — the dollar bottom line of running agents.
It is not cost-per-token or cost-per-run (those are inputs and diagnostics), a per-agent leaderboard, or an autonomy KPI to chase.
It is not a replacement for safety gates. It pairs with oversight; it never relaxes it.

Compute it

The calculator has a For agents tab: enter the cost lines and your accepted-action count and get the number, with a shareable link. The same logic ships in the reference library:

import { costPerAcceptedAction } from 'cost-per-accepted-change';

const cpaa = costPerAcceptedAction({
  inferenceCost: 3000,
  toolCost: 1200,
  infraCost: 800,
  oversightCost: 9000,
  remediationCost: 4000,
  failedRunCost: 1000,
  failureImpactCost: 12000,   // escalations, refunds, lost sales
  acceptedActions: 5000,
});

console.log(cpaa.value);     // 6.2
console.log(cpaa.breakdown); // share-of-total per component


Cost per accepted action is the runtime sibling of cost per accepted change, defined in The Delivery Gap (Brenn Hill, 2026). It's free to use and adapt; refinements and worked examples are welcome via GitHub. See how to cite.