Field note · Salesforce · Shopify · Duolingo · IBM
AI-first mandates, and the number that would tell you they're working
In 2024 and 2025, a run of well-known leaders stood up and said some version of the same brave thing: we are going AI-first, and we mean it. These were sincere attempts to lead through a genuinely confusing moment, and several were admirably transparent about it. But almost every mandate set its target on the same kind of number — how much AI gets used, or how much hiring gets avoided — and those are inputs. The question a mandate can't answer on its own is whether the work the AI produced was any good and got kept.
Four mandates, briefly and fairly
Each of these deserves to be described generously, because each was a real attempt to do the right thing with incomplete information.
Salesforce
Marc Benioff said AI was doing "30% to 50% of the work" at Salesforce,[1] and mused that the company was "seriously debating" whether to hire any software engineers in 2025 — a line he later softened, noting his engineers were "hugely augmented" but still very much needed.[2] Support headcount came down from about 9,000 to 5,000, which Salesforce framed as redeployment into sales and customer success rather than a clean layoff.[3] The fair summary: bold adoption, honestly caveated, with the firm later walking the most sweeping claims back toward nuance.
Shopify
Tobi Lütke's memo asked teams to "demonstrate why they cannot get what they want done using AI" before requesting new headcount, and declared "reflexive AI usage is now a baseline expectation."[4] Many read it as a leader saying the quiet part out loud, which took some courage. It's also where the input-versus-outcome tension is clearest: "use AI first" is a usage rule, and the memo's striking productivity figures (a "100x" claim, for one) were Shopify's own and never independently verified.
Duolingo
Luis von Ahn's "AI-first" memo said Duolingo would "gradually stop using contractors to do work that AI can handle."[5] After a sharp public reaction, he clarified — graciously — that AI was "a tool to accelerate what we do, at the same or better level of quality," that the company was "continuing to hire at the same speed as before," and, plainly, "I didn't do that well."[6] It's a genuinely likeable course-correction. Notice, though, that the "same or better quality" reassurance was itself a claim without a number attached.
IBM
IBM is the one to study, because it arguably got the governance right. Arvind Krishna's much-quoted 2023 figure, that AI could replace ~30% of ~26,000 back-office roles over five years, was explicitly a projection of what was automatable, not a layoff,[7] and the widely shared "IBM cut 8,000 HR jobs then rehired after AI failed" story is simply false.[8] What actually happened: AskHR automated ~94% of routine HR inquiries, the work of "a few hundred" roles was displaced, and IBM's total headcount went up, because the savings were reinvested into hiring programmers and salespeople.[8] IBM watched a net outcome, not just an adoption rate.
What every one of them measured
Line the mandates up and the shared shape is unmistakable. The numbers on the marquee were percent of work done by AI, percent of inquiries deflected, headcount avoided, contractors phased out. Every one is an input or a cost-out — a measure of how much the organization leaned on AI, or how much labor it removed. None of them, on its own, says whether what the AI produced was accepted, was good, and stayed good.
That's not a knock on the leaders; it's just how the available dashboards point. "We use AI for 50% of the work" feels like an outcome, and it's reported like one, but it's really a statement about activity — the same trap, one altitude up, that "percent of code written by AI" sets for engineering teams.
What a kept-value denominator adds
A mandate is a lever: use AI more, don't add headcount unless AI can't do it. Levers are fine, and sometimes exactly what a moment needs. But you can't tell whether pulling a lever helped without an outcome number on the other end, and "adoption went up" isn't one. Generalize cost per accepted change to any function and you get that number: cost per accepted unit of work. Take the fully loaded cost of producing all the outputs (resolutions, drafts, code, decisions), add the cost of redoing the ones that didn't stick, and divide by the count that were accepted and stayed accepted.
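One way to make that definition concrete is a small function. This is a minimal sketch, not a formula taken from any of the companies above: the function name, the parameter names, and the sample figures are all assumptions made up for illustration.

```python
def cost_per_accepted_unit(production_cost: float,
                           rework_cost: float,
                           accepted_units: int) -> float:
    """Total spend (production plus rework) divided by the outputs that stuck.

    All names here are hypothetical. "Accepted" means accepted and still
    accepted at the end of whatever observation window you choose.
    """
    if accepted_units <= 0:
        # With nothing accepted, the metric has no meaningful value.
        raise ValueError("no accepted units: the metric is undefined")
    return (production_cost + rework_cost) / accepted_units


# Made-up example: 1,000 drafts cost $5,000 to produce; 200 had to be redone
# at a further $1,000, so 800 stayed accepted.
example = cost_per_accepted_unit(5_000, 1_000, 800)
print(f"${example:.2f} per accepted draft")  # $7.50 per accepted draft
```

The design choice that matters is the denominator: rework goes into the cost on top, but only work that stayed accepted counts underneath, so rising adoption with rising rework cannot flatter the number.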
Put that under a mandate and it suddenly becomes steerable. "AI now does 60% of tier-one support" stops being the headline and becomes an input; the headline becomes "the cost of a resolution that actually sticks fell 18%, and re-contacts held flat" — or, if the mandate is being met on paper while quality quietly leaks, "adoption hit target, but cost per accepted resolution barely moved because escalations rose." Same mandate, but now you know which it is.
A worked reading (illustrative)
Hypothetical numbers — not any real company's — for a support function asked to go AI-first. The mandate's own metric (share of tickets handled by AI) sails past target. Here's that success next to the kept-value view:
| Metric | Before mandate | After mandate |
|---|---|---|
| Share of tickets handled by AI (the mandate metric) | 10% | 65% |
| Tickets handled | 100,000 | 100,000 |
| Re-contacts / escalations within 14 days | 8,000 | 18,000 |
| Resolutions that stayed resolved | 92,000 | 82,000 |
| Direct handling cost | $360,000 | $190,000 |
| Rework cost (escalations, human cleanup) | $40,000 | $80,000 |
| Total cost | $400,000 | $270,000 |
| Cost per accepted resolution | $4.35 | $3.29 |
Read this carefully, because it's deliberately not a "gotcha." The mandate worked: cost per accepted resolution fell from $4.35 to $3.29, a real 24% improvement. And the kept-value lens shows the part the adoption metric hid — re-contacts more than doubled and 10,000 fewer problems actually stuck — so leadership can bank the genuine saving while seeing exactly where to reinvest some of it (in the escalation path) before quality erodes further. That's the whole gift of a denominator: it lets you celebrate the win and steer the risk at the same time, instead of choosing one.
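The arithmetic behind the table above can be checked in a few lines. This is a minimal sketch using the same hypothetical figures; the `Period` class and its field names are illustrative assumptions, not part of any real reporting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """One column of the illustrative table (all figures hypothetical)."""
    tickets: int
    recontacts: int      # re-contacts / escalations within 14 days
    direct_cost: float
    rework_cost: float

    @property
    def accepted(self) -> int:
        # Resolutions that stayed resolved.
        return self.tickets - self.recontacts

    @property
    def total_cost(self) -> float:
        return self.direct_cost + self.rework_cost

    @property
    def cost_per_accepted(self) -> float:
        return self.total_cost / self.accepted


before = Period(tickets=100_000, recontacts=8_000,
                direct_cost=360_000, rework_cost=40_000)
after = Period(tickets=100_000, recontacts=18_000,
               direct_cost=190_000, rework_cost=80_000)

# Relative drop in cost per accepted resolution.
improvement = 1 - after.cost_per_accepted / before.cost_per_accepted

print(f"before: ${before.cost_per_accepted:.2f}")                  # $4.35
print(f"after:  ${after.cost_per_accepted:.2f}")                   # $3.29
print(f"improvement: {improvement:.0%}")                           # 24%
print(f"re-contacts: x{after.recontacts / before.recontacts:.2f}")  # x2.25
print(f"fewer resolutions that stuck: {before.accepted - after.accepted:,}")  # 10,000
```

Keeping the re-contact count as an input, rather than only the final ratio, is what lets the same record show both the saving and the quality leak.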
The takeaway
None of these leaders did anything foolish. They placed real bets under real uncertainty, mostly in public, and several adjusted with grace when the picture filled in. If there's a shared lesson, IBM already embodies it: govern the net outcome, not the adoption rate. Watch what the AI produced that got kept, price the rework on what didn't, and reinvest the genuine savings on purpose.
An AI mandate without a kept-value number is steering by the accelerator pedal — you can feel that you're going faster without knowing if you're getting anywhere. Cost per accepted change, generalized to whatever work the mandate touches, is the speedometer. It doesn't tell you to slow down or speed up; it just tells you the truth about how far the effort is actually carrying you, which is all any of us is really trying to find out.
Sources
1. Entrepreneur, "Salesforce CEO Marc Benioff: AI Is Handling Half of Tasks" (June 2025). Benioff says AI does "30% to 50% of the work" at ~93% accuracy. Self-reported; "work" is undefined and the range is wide. https://www.entrepreneur.com/business-news/salesforce-ceo-marc-benioff-ai-is-handling-half-of-tasks/493870
2. ITPro. Benioff said Salesforce was "seriously debating" whether to hire any software engineers in 2025; he later softened this, saying his engineers were "hugely augmented" but still needed and that the model "cannot operate autonomously." https://www.itpro.com/software/development/maybe-we-arent-going-to-hire-anybody-this-year-marc-benioff-says-salesforce-might-not-hire-any-software-engineers-in-2025-as-the-firm-reaps-the-benefits-of-ai-agents
3. Fortune (Sept 2, 2025). Benioff says Salesforce reduced customer-support headcount from ~9,000 to ~5,000, framed as redeployment: "hundreds" of employees moved into professional services, sales, and customer success rather than a clean ~4,000-person layoff. https://fortune.com/2025/09/02/salesforce-ceo-billionaire-marc-benioff-ai-agents-jobs-layoffs-customer-service-sales/
4. Forrester analysis of Tobi Lütke’s Shopify memo (April 2025): "Before asking for more headcount and resources, teams must demonstrate why they cannot get what they want done using AI," and "Reflexive AI usage is now a baseline expectation." Forrester questioned the "prove a negative" logic and Shopify’s unverified productivity claims. https://www.forrester.com/blogs/what-you-can-learn-from-shopifys-ceos-memo-on-workforce-ai/
5. Entrepreneur. Luis von Ahn’s April 28, 2025 "AI-first" memo: Duolingo would "gradually stop using contractors to do work that AI can handle," with headcount given "only if a team cannot automate more of their work." https://www.entrepreneur.com/business-news/duolingo-ceo-clarifies-ai-stance-after-backlash-read-memo/492141
6. Fortune (May 24, 2025). von Ahn clarifies: "I do not see AI as replacing what our employees do (we are, in fact, continuing to hire at the same speed as before)," calling AI "a tool to accelerate what we do, at the same or better level of quality," and adding "I didn’t do that well." https://fortune.com/2025/05/24/duolingo-ai-first-employees-ceo-luis-von-ahn/
7. Fortune (May 1, 2023). Arvind Krishna said IBM would pause/slow hiring for ~26,000 back-office roles and "could easily see 30% of that getting replaced by AI and automation over a five-year period" (~7,800). This was a projection of what is automatable, not a layoff figure. https://fortune.com/2023/05/01/ibm-ceo-ai-artificial-intelligence-back-office-jobs-pause-hiring/
8. PYMNTS, summarizing the WSJ (May 2025). Krishna: AI replaced the work of "a few hundred" HR staff, but IBM’s total employment has "actually gone up," because the savings funded hiring of programmers, salespeople, and marketers. The viral "8,000 HR jobs cut then rehired" claim is false; IBM’s AskHR automates ~94% of routine HR inquiries. https://www.pymnts.com/artificial-intelligence-2/2025/ibm-ceo-hr-layoffs-due-to-ai-led-to-more-investment-in-other-roles/
This is a field note — a friendly, illustrative reading of the public record, not a commissioned case study and not a scorecard. These leaders made their calls in the open, which is the only reason there's enough here to learn from. Corrections and better public data are genuinely welcome via GitHub.