The Question AI Providers Hope Engineering VPs Never Pose

AI coding usage is rising fast, yet many engineering leaders measure usage instead of results, leaving a significant blind spot. There is a question almost no one in the AI industry asks. Neither OpenAI, Anthropic, Google, nor the many startups selling AI coding tools ask how much AI-generated code actually makes it to production. Not how much code is generated, how many prompts are run, or how many seats are active, but how much survives code review, passes CI, merges, and deploys to reach a customer. AI providers gain nothing from answering that question.

The financial investment in AI is tangible, yet insight into its effectiveness is lacking. The Stanford AI Spend Index, based on data from 140 companies and more than 113,000 developers, puts the median company's spend at $86 per developer per month on AI tools. Top-quartile companies spend more than $195, and some exceed $28,000 per developer per month.

Anthropic recently surpassed $30 billion in annual revenue, up from $9 billion just four months earlier. SemiAnalysis estimates that Claude Code authors 4% of public GitHub commits and projects that the share could exceed 20% by year's end. Linear's CEO declared issue tracking obsolete in March, and 75% of Linear's enterprise workspaces now use coding agents. Yet for all the money and code in motion, no one tracks what actually gets deployed.

AI providers bill on token consumption, not on whether code passes review, merges, deploys, or works in production. The incentives are misaligned: an engineer who prompts an agent seven times to finish a task generates more revenue for the provider than one who gets it right on the first try, even though the second engineer delivers more value to the organization. Engineering leaders usually cannot tell the two apart; they see a single line item for AI spend with no visibility into which tokens became production code and which were waste. This is not a conspiracy, but it is a structural problem that providers have no incentive to fix and that VPs of Engineering can solve themselves.
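The misalignment is easy to see with arithmetic. A minimal sketch, with an invented blended token price and invented session sizes purely for illustration:

```python
# Hypothetical illustration: token billing rewards retries, not outcomes.
# The price and token counts below are invented for this example.
PRICE_PER_1K_TOKENS = 0.015  # assumed blended rate, USD


def session_cost(prompts: int, tokens_per_prompt: int) -> float:
    """Amount the provider bills for one coding session."""
    return prompts * tokens_per_prompt * PRICE_PER_1K_TOKENS / 1000


# Engineer A nails the task on the first prompt; Engineer B retries six times.
cost_a = session_cost(prompts=1, tokens_per_prompt=8_000)
cost_b = session_cost(prompts=7, tokens_per_prompt=8_000)

# Both ship the same feature, but B generates 7x the provider revenue.
print(f"A: ${cost_a:.2f}  B: ${cost_b:.2f}  ratio: {cost_b / cost_a:.0f}x")
```

On the invoice both sessions collapse into one line item; only outcome-level measurement distinguishes them.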

The trend mirrors early cloud computing, when aggressive migrations to AWS and Azure promised efficiency but produced waste. It took years for the FinOps discipline to emerge and expose the 30-40% of cloud spend lost to under-utilization. AI spending is following the same pattern, only growing faster and with wider measurement gaps.

In AI, the same cost-optimization pressure is mounting. Engineering leaders who measure first will optimize faster, negotiate better, and make informed decisions about which tools to keep. Those who delay will keep paying unchecked bills and hoping the output is worth it.

What's missing isn't another dashboard of adoption and seat usage. Engineering teams need to trace AI-generated code from creation to production, with commit-level data showing which agent generated the code, how much humans edited it, and whether it deployed. That is what connects AI spend to production outcomes and reveals a team's true AI efficiency and each vendor's effectiveness.
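Commit-level attribution can start with something as simple as a commit-trailer convention. A minimal sketch, assuming teams record the agent in a `Generated-by:` trailer (a team convention, not a standard; the trailer name and agent names here are illustrative):

```python
# Sketch: count AI-authored commits per agent by reading a "Generated-by"
# commit trailer. Assumes the team adopts that trailer as a convention.
import subprocess
from collections import Counter

# git's pretty-format can extract a single trailer's value per commit.
TRAILER_FORMAT = "%(trailers:key=Generated-by,valueonly)"


def count_agents(log_output: str) -> Counter:
    """Tally non-empty trailer values; each commit prints one line."""
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def ai_commits_by_agent(repo_path: str) -> Counter:
    """Run git log over a repo and count commits per AI agent."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"--format={TRAILER_FORMAT}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_agents(log)
```

Joining these counts against deploy records (which commits reached production) is what turns adoption data into outcome data.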

At Waydev, this connection has been our focus for the past year. Having tracked engineering behavior at scale for nine years for clients like Dropbox, American Express, and PwC, we have adapted our measurement layer to AI's evolution. Our new platform assesses AI adoption, impact, and ROI across the software lifecycle, linking AI agent spend to production outcomes.

The AI industry's implicit claim is that more usage means more value, but usage is not impact, and token consumption is not shipped code. A team that generates 10,000 AI-written lines and ships 2,000 is not necessarily better than one that generates 3,000 and ships 2,500. Yet today's adoption dashboards would rank the first team higher.

This blind spot gets more expensive every quarter. The era of unaudited AI spending is closing, and the engineering leaders who build real measurement now will own the AI ROI conversation later. Those who wait face years of justifying expenses they never fully understood.
