AI FinOps

For the engineer doing the setup

Instrumentation guide

How to actually wire your stack to produce cost per accepted change every window without rebuilding observability from scratch. Organized by numerator component, with a minimum-viable pipeline at the end.

1. Model cost

The highest-leverage instrumentation choice. Without per-team or per-commit attribution, you can only allocate aggregate model spend by headcount or by guesswork.

Minimum viable

Production

Run an LLM proxy in front of every provider call. The proxy adds: per-request logging, custom tags, retries, budgets, and cache analytics — the data you need to debug a moving model-cost component.

Whichever you pick, ensure each request carries tags for: team, project or repo, actor (user or agent), and purpose (e.g., code-gen, review, chat). Without tags, the proxy gives you nicer aggregate data but the same attribution problem.
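Once every request carries those tags, attribution becomes a group-by. A minimal sketch, assuming your proxy can export its request log as tab-separated lines with the columns team, repo, actor, purpose, cost_usd. That column layout and the function name cost_by_tag are illustrative assumptions, not any particular proxy's format:

```shell
# Sum request cost per tag value from a tab-separated request log on stdin.
# Assumed (hypothetical) columns: 1=team 2=repo 3=actor 4=purpose 5=cost_usd.
# Usage: cost_by_tag COLUMN_NUMBER < requests.tsv
cost_by_tag() {
  awk -F '\t' -v col="$1" '
    { total[$col] += $5 }        # accumulate cost under the chosen tag value
    END { for (k in total) printf "%s\t%.2f\n", k, total[k] }
  ' | sort
}
```

Calling cost_by_tag 1 groups by team and cost_by_tag 4 groups by purpose. Untagged requests show up as an empty key, which is a quick way to see how much spend is still unattributed.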

Per-commit attribution

If you want model cost attributed to specific accepted changes, you need to know which commits an LLM touched. git-ai is the maturing tool here — it links AI-written lines to the agent, model, and transcripts that generated them via a git extension. Combined with provider pricing, this lets you compute the model cost per accepted change unit rather than spread total spend evenly.
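The pricing step itself is simple arithmetic. A sketch, assuming you have already recovered the input and output token counts for the requests behind one change (how you get them depends on your tooling and is not shown here). The function name and the per-million-token price parameters are placeholders; take real rates from your provider's price list:

```shell
# Cost in USD of the model requests behind one change.
# Args: input_tokens output_tokens usd_per_million_input usd_per_million_output
# The prices are parameters, not real rates.
change_model_cost() {
  awk -v tin="$1" -v tout="$2" -v pin="$3" -v pout="$4" \
    'BEGIN { printf "%.4f\n", (tin * pin + tout * pout) / 1000000 }'
}
```

Summing this per change, instead of dividing the monthly bill by the number of changes, is what keeps a few expensive agent runs from being averaged into every small PR.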

2. Infrastructure cost

The compute, storage, observability, CI/CD, and agent-runner overhead attributable to producing changes.

Minimum viable

Production

3. Engineering time

The time team members spend specifying, prompting, integrating, and steering AI work, converted to currency at a loaded hourly rate.

Minimum viable

Use a blended fully-loaded rate × planned capacity. Most organizations already have both numbers from capacity planning.

This overestimates active delivery time slightly, which is the right direction — it absorbs meetings, interruptions, and the real cost of context-switching.
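As a worked example of the rate × capacity arithmetic, here is a small sketch using the same illustrative figures as the monthly script on this page (10 engineers, 4 weeks, 32 planned hours per engineer per week, a rate of 150 per hour). The function name is made up for the example:

```shell
# Engineering cost for one window: engineers x weeks x planned hours x rate.
# All four inputs are whole numbers, so shell integer arithmetic is enough.
eng_cost() {
  echo $(( $1 * $2 * $3 * $4 ))
}

# 10 engineers x 4 weeks x 32 h = 1280 h; 1280 h x 150 = 192000
eng_cost 10 4 32 150
```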

Production

4. Review cost

Time spent reviewing and gating AI-generated work, converted to currency.

Minimum viable

Sample a representative week: ask reviewers to track the time they spend on PR reviews for that week. Multiply by 4 (or however many weeks are in your window). Adjust for known seasonality. Use the team's blended hourly rate.
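The extrapolation can be sketched as one function. The seasonal factor is a hypothetical knob added for this example (below 1 if the sampled week was busier than usual, above 1 if it was quieter); it defaults to 1:

```shell
# Review cost for a window, extrapolated from one sampled week.
# Args: sampled_hours weeks_in_window hourly_rate [seasonal_factor]
review_cost() {
  awk -v h="$1" -v w="$2" -v r="$3" -v s="${4:-1}" \
    'BEGIN { printf "%.2f\n", h * w * r * s }'
}
```

For example, review_cost 10 4 150 extrapolates 10 sampled hours to a four-week window at a rate of 150, and review_cost 10 4 150 0.9 discounts that by 10% for a sample week you know was unusually heavy.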

Production

5. Rework cost

The trickiest component to instrument, and the one most teams ignore — which is exactly why catching it matters.

Mining reverts from git

Three signals to capture, ordered from most reliable to most subjective:

(a) Explicit git revert commits. Built-in syntax; commit messages are prefixed Revert "...".

# All revert commits in a window
git log --grep='^Revert "' \
  --since=2026-04-01 --until=2026-04-30 \
  --pretty=format:'%h %s'

# Or via gh — match GitHub's auto-generated revert PR title pattern.
# (Body-search for "reverts #" is unreliable: GitHub's search tokenizer
# strips the # and over-matches the word "reverts".)
gh pr list -R owner/repo --state merged \
  --search 'merged:2026-04-01..2026-04-30 in:title "Revert"' \
  --json number,title,body,additions,deletions

(b) PRs labeled as fixes or hotfixes. Requires team discipline on labels, but very cheap once in place:

gh pr list --state merged \
  --search 'merged:2026-04-01..2026-04-30 label:fix,hotfix,bug' \
  --json number,title,additions,deletions

(c) Conventional Commits. If your team uses Conventional Commits, you get free typing of every commit (fix:, revert:, feat:). Parse the commit messages directly — without --merges, since merge commits typically have non-conventional messages like Merge pull request #123 ...:

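# Works for any workflow. Do not add `--merges`: squash commits have a single
# parent, so `--merges` would filter them out even though they carry the
# conventional-format message. Add `--no-merges` to skip merge commits explicitly.
# The optional `!` matches breaking-change markers such as `fix!:`.
git log --since=2026-04-01 --until=2026-04-30 \
  --pretty=format:'%s' | grep -E '^(fix|revert)(\([^)]*\))?!?: '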

For each identified revert or fix, capture the hours spent. The cheapest approach is a manual hour estimate for each ticket, made during the quarterly review that covers the window. The most rigorous is to attach a time-spent field to each fix ticket and roll up automatically.

Mining fix tickets from your issue tracker

If your team uses Jira, Linear, or GitHub Issues, fix tickets are usually well-typed.

Tying fixes back to the original change

For rigorous attribution, link each fix back to the PR or commit that introduced the defect.

Tied fixes let you compute the more rigorous "stayed there" check: each merged PR is examined N days after it merged; if a tied fix was merged within those N days, the original is excluded from the denominator and the fix's cost lands in the numerator.
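The check can be sketched with awk over two small exports. Both file layouts and the function name are assumptions for this example: one tab-separated file of merged PRs, one of tied fixes keyed by the original PR number. Dates are converted to day numbers inside awk so the sketch does not depend on GNU or BSD date:

```shell
# Print the merged PRs that "stayed there": no tied fix merged within
# DAYS days of the original merge.
# Usage: surviving_prs MERGES_TSV FIXES_TSV DAYS
#   MERGES_TSV: <pr_number><TAB><merged YYYY-MM-DD><TAB><additions+deletions>
#   FIXES_TSV:  <original_pr_number><TAB><fix merged YYYY-MM-DD>
surviving_prs() {
  awk -F '\t' -v days="$3" '
    # Days since 1970-01-01 for a Gregorian YYYY-MM-DD date.
    function day_number(date,    y, m, d, era, yoe, doy) {
      y = substr(date, 1, 4) + 0; m = substr(date, 6, 2) + 0; d = substr(date, 9, 2) + 0
      if (m <= 2) y--
      era = int(y / 400); yoe = y - era * 400
      doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
      return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
    }
    # First file: remember each merged PR in input order.
    FILENAME == ARGV[1] { order[++n] = $1; row[$1] = $0; merged[$1] = day_number($2); next }
    # Second file: flag originals whose tied fix landed inside the window.
    ($1 in merged) {
      gap = day_number($2) - merged[$1]
      if (gap >= 0 && gap <= days) fixed[$1] = 1
    }
    END { for (i = 1; i <= n; i++) if (!(order[i] in fixed)) print row[order[i]] }
  ' "$1" "$2"
}
```

The output keeps the input columns, so the surviving rows can go straight into the LOC normalization. Fixes that land after the window leave the original PR in the denominator; their cost still belongs in the rework numerator of the window in which they merge.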

The minimum-viable approach

If you have nothing today, start by pulling the revert commits and the bug-labeled PRs in the window, eyeballing each, and assigning a rough hour estimate per fix. A team of 10–30 engineers typically has 5–25 such items in a four-week window; an hour of triage produces a defensible rework-cost number.
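The triage output can be as simple as one line per fix with an hour estimate. A sketch of the roll-up, assuming a tab-separated file of identifier and estimated hours (the file layout and function name are illustrative):

```shell
# Sum estimated hours from stdin (one "<id><TAB><hours>" line per fix)
# and convert to currency at the given hourly rate.
# Usage: rework_cost HOURLY_RATE < fixes.tsv
rework_cost() {
  awk -F '\t' -v rate="$1" '
    $2 != "" { hours += $2; n++ }     # skip lines with no estimate yet
    END { printf "%d fixes, %.1f hours, cost %.2f\n", n, hours, hours * rate }
  '
}
```

Keeping the per-fix file alongside the monthly number makes the estimate auditable: anyone who doubts the total can see which items and which hour guesses produced it.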

6. The denominator — accepted change units

The other half of the metric. Pull merged PRs, apply the 500-LOC normalization, filter by the survival window.

The recipe

# Step 1: list merged PRs in the window
gh pr list --state merged \
  --search 'merged:2026-04-01..2026-04-30' \
  --json number,additions,deletions,mergedAt,title --limit 1000 \
  > merges.json

# Step 2: for each PR, check it hasn't been reverted or fix-followed
#   within the 30-day survival window. Filter merges.json to surviving PRs.

# Step 3: normalize via the 500-LOC rule. Floor of 1 unit per surviving PR
# so binary-only PRs (additions+deletions=0 from gh's perspective) still
# count, matching the library's normalizeChanges() helper.
jq '[.[] | (.additions + .deletions) | (. / 500) | ceil | if . < 1 then 1 else . end] | (add // 0)' merges.json

The full recipe lives in the FAQ; the calculator library exports normalizeChanges() for the LOC step.
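To make the 500-LOC rule concrete, here is the same ceiling-with-a-floor-of-1 arithmetic for a single PR in plain shell, run over a few boundary values. The function name is made up for the example; it mirrors the jq expression in Step 3 and is not the library helper:

```shell
# Accepted change units for one PR under the 500-LOC rule:
# ceil(changed_lines / 500), with a floor of 1 unit.
change_units() (
  units=$(( ($1 + 499) / 500 ))              # integer ceiling division
  if [ "$units" -lt 1 ]; then units=1; fi    # binary-only PRs report 0 changed lines
  echo "$units"
)

for loc in 0 1 500 501 1200; do
  printf '%s changed lines -> %s unit(s)\n' "$loc" "$(change_units "$loc")"
done
```

The boundary cases are the ones worth checking by hand: 0 and 500 changed lines both count as 1 unit, 501 tips over to 2, and 1200 counts as 3.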

Putting it together: the monthly pipeline

A minimum-viable monthly script. Cron it for the first of the month, fetching the prior month's data. Bash arithmetic ($((...))) is integer-only and billing APIs return decimals, so the final composition is done in jq, which handles floats. The script guards the zero-units case explicitly, since division by zero is an error in jq.

#!/usr/bin/env bash
set -euo pipefail

# Run on the 1st of every month, computing the prior month's cost per accepted change.
# Requires: gh (authenticated), jq, and access to your provider billing
# + cloud billing. Adjust dates, repo, team key, and hourly rate.

WINDOW_START="2026-04-01"
WINDOW_END="2026-04-30"
OWNER="my-org"
REPO="my-repo"
HOURLY_RATE=150

# ------- REPLACE BEFORE USE -------
# The two URLs below are placeholders. Fill them in with your provider's
# real billing endpoints, or the script will report $0 for model + infra
# cost (and you'll know to come back here). See documentation:
#   Anthropic Admin API:  https://docs.anthropic.com/en/api/admin-api/usage-cost
#   OpenAI Usage API:     https://platform.openai.com/docs/api-reference/usage
#   AWS Cost Explorer:    aws ce get-cost-and-usage --time-period ...
# ----------------------------------

# 1. Model cost — pull from your provider's billing API. The `|| echo 0`
#    fallback lets the script complete even if the endpoint is unreachable;
#    you'll see $0 for model cost and know what to fix.
MODEL_COST=$(curl -fsS "https://REPLACE_ME/billing?start=$WINDOW_START&end=$WINDOW_END" 2>/dev/null \
  | jq '.cost_usd // 0' 2>/dev/null || echo 0)

# 2. Infra cost: pull from cloud billing. Same fallback pattern.
#    Cost Explorer treats End as exclusive, so pass the day after WINDOW_END.
INFRA_END="2026-05-01"
INFRA_COST=$(aws ce get-cost-and-usage \
  --time-period "Start=$WINDOW_START,End=$INFRA_END" \
  --granularity MONTHLY --metrics AmortizedCost 2>/dev/null \
  | jq '(.ResultsByTime[0].Total.AmortizedCost.Amount | tonumber) // 0' 2>/dev/null \
  || echo 0)

# 3. Engineering time — blended rate × planned capacity (integer math is fine).
ENG_HOURS=1280   # 10 engineers × 4 weeks × 32h
ENG_COST=$((ENG_HOURS * HOURLY_RATE))

# 4. Review cost — sampled week × 4.
REVIEW_HOURS=40
REVIEW_COST=$((REVIEW_HOURS * HOURLY_RATE))

# 5. Rework cost — count fix/hotfix/revert PRs in window, ~2h per fix average.
REWORK_PR_COUNT=$(gh pr list -R "$OWNER/$REPO" --state merged \
  --search "merged:$WINDOW_START..$WINDOW_END label:fix,hotfix,revert" \
  --json number --limit 1000 | jq 'length')
REWORK_COST=$((REWORK_PR_COUNT * 2 * HOURLY_RATE))

# 6. Accepted change units: gh + jq + ceil. `add // 0` guards empty windows.
#    Note: GitHub search returns at most 1000 results per query, so
#    --limit 1000 is the effective cap when --search is used. The check
#    below warns if you hit it; for busier windows, split the window into
#    shorter `merged:` ranges and sum the units.
PRS_JSON=$(gh pr list -R "$OWNER/$REPO" --state merged \
  --search "merged:$WINDOW_START..$WINDOW_END" \
  --json additions,deletions --limit 1000)
if [[ "$(echo "$PRS_JSON" | jq 'length')" -ge 1000 ]]; then
  echo "WARN: gh pr list hit the --limit 1000 cap; UNITS may be truncated." >&2
fi
UNITS=$(echo "$PRS_JSON" \
  | jq '[.[] | (.additions + .deletions) | (. / 500) | ceil | if . < 1 then 1 else . end] | (add // 0)')

# 7. Compose and report. jq does the float arithmetic and the zero-units guard.
jq -nr \
  --argjson model "$MODEL_COST" \
  --argjson infra "$INFRA_COST" \
  --argjson eng "$ENG_COST" \
  --argjson review "$REVIEW_COST" \
  --argjson rework "$REWORK_COST" \
  --argjson units "$UNITS" \
  --arg start "$WINDOW_START" \
  --arg end "$WINDOW_END" '
  ($model + $infra + $eng + $review + $rework) as $total |
  (if $units == 0 then "N/A (no accepted changes in window)"
   else "$\(($total * 100 / $units | round) / 100)"
   end) as $cpac |
  "Window: \($start) to \($end)
Model:    $\($model)
Infra:    $\($infra)
Eng:      $\($eng)
Review:   $\($review)
Rework:   $\($rework)
Total:    $\($total)
Units:    \($units)
Cost per accepted change: \($cpac)"
'

Append the output as a row in the tracker spreadsheet and you have a defensible monthly time series. Refine each step over time as the metric proves its value.
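The script above hard-codes its window and prints a report. Two small helpers can close the loop for a cron run: one derives the prior month's first and last day from a given date, the other appends a row to a CSV file standing in for the tracker. Both function names and the CSV column names are assumptions for this sketch:

```shell
# First and last day of the month before the given date (YYYY-MM-DD).
prior_month_window() (
  y=${1%%-*}; rest=${1#*-}; m=${rest%%-*}
  m=${m#0}                       # strip a leading zero so 08 and 09 are not read as octal
  m=$((m - 1))
  if [ "$m" -eq 0 ]; then m=12; y=$((y - 1)); fi
  case "$m" in
    4|6|9|11) last=30 ;;
    2) if [ $((y % 4)) -eq 0 ] && { [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }
       then last=29; else last=28; fi ;;
    *) last=31 ;;
  esac
  printf '%04d-%02d-01 %04d-%02d-%02d\n' "$y" "$m" "$y" "$m" "$last"
)

# Append one row to a CSV tracker, writing the header first if the file is new.
# Usage: append_row FILE WINDOW_START WINDOW_END TOTAL UNITS COST_PER_CHANGE
append_row() {
  [ -s "$1" ] || echo 'window_start,window_end,total_usd,units,cost_per_accepted_change' > "$1"
  printf '%s,%s,%s,%s,%s\n' "$2" "$3" "$4" "$5" "$6" >> "$1"
}

# Possible use at the top of the monthly script:
#   set -- $(prior_month_window "$(date +%F)")
#   WINDOW_START=$1; WINDOW_END=$2
```

A plain CSV is enough to start with: it imports into any spreadsheet, and it diffs cleanly if you keep it in the repo next to the script.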

Honest caveats

For the broader operational guidance (who runs the measurement, how often, what to report), see the quick-start playbook. For where to push back on common critiques of the metric, see the measurement comparison page.


Found a better tool or a sharper script for any of these components? Open an issue at the repo. The most useful updates to this page come from teams sharing what they built.