Heavy AI token use does not translate into proportionate output, Jellyfish analysis finds

The survey suggests that how efficiently companies use AI coding tools is becoming more important than whether they adopt them. [Photo: Shutterstock]

As companies tighten controls on artificial intelligence (AI) spending, an analysis has found that developers who use AI coding tools heavily are not necessarily proportionately more productive. In the early days of adoption, simply expanding usage was seen as a competitive edge. Now the benchmark for AI use is shifting toward weighing performance against cost.

Business Insider reported on May 7 that engineering intelligence firm Jellyfish released results from analysing user data for Anthropic's AI coding tool Claude Code. The analysis found that the top 10 percent of developers used about 10 times more AI tokens than mid-level users, but actual output rose only about twofold.
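The gap between those two multiples can be made concrete with simple arithmetic: if token use rises about 10-fold while output rises only about 2-fold, each unit of output consumes roughly five times as many tokens. The minimal Python sketch below uses only the two ratios reported by Jellyfish; the function name is hypothetical and chosen for illustration.

```python
def relative_tokens_per_output(token_multiple: float, output_multiple: float) -> float:
    """Return how many times more tokens a group spends per unit of output
    than the baseline group, given both multiples relative to that baseline."""
    if output_multiple <= 0:
        raise ValueError("output_multiple must be positive")
    return token_multiple / output_multiple


# Ratios as reported: top 10% used ~10x the tokens of mid-level users for ~2x the output.
ratio = relative_tokens_per_output(token_multiple=10.0, output_multiple=2.0)
print(f"Top users spend about {ratio:.0f}x more tokens per unit of output")
```

Because both inputs are approximate ("about"), the result is only a rough indicator of diminishing returns, not a precise efficiency figure.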

Tokens are the units in which AI models process text and instructions, and they are also the basis on which usage is billed. The more AI services are used, the more tokens are consumed, and the higher companies' operating costs become.

Jellyfish said the results show the limits of the so-called "token maxxing" strategy: simply pushing AI usage to extremes does not translate directly into productivity gains. Nicholas Arcolano, head of AI and research at Jellyfish, said, "Companies now demand cost control as well as development speed," adding that "chief financial officers are starting to look directly into AI usage costs."

In the survey, weekly Claude Code usage among the heaviest AI users reached as much as 225 million tokens per person. By contrast, mid-level developers averaged about 32 million tokens a week. The analysis said that while the usage gap was large, the gains in output were relatively limited.

Still, adopting AI coding tools did appear to improve productivity. Jellyfish said that, measured by pull request throughput, a widely used indicator of software development productivity, teams with high AI use handled about 77 percent more pull requests than teams with low AI use.

The analysis said the key question is no longer whether to use AI at all but how to find an efficient range of use. In other words, companies should aim for the point at which productivity per unit of cost is highest, rather than consuming as many tokens as possible.

Arcolano also cautioned against assessing developer productivity by token usage alone. Because token consumption can vary widely when the AI model or its settings change, it may not accurately reflect the work actually done.

He stressed that companies should give more weight to outcome-focused measures such as "cost per pull request" than to total token consumption. "If token costs rise sharply, CFOs ultimately have no choice but to worry," he said, warning that rising AI operating costs could become a burden for management.
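As a rough illustration of how a "cost per pull request" figure might be computed, the Python sketch below divides token spend by the number of merged pull requests over the same period. The weekly token totals echo the figures reported in the survey, but the price per million tokens and the pull request counts are hypothetical values invented for this example; they are not Jellyfish data or Anthropic's actual rates.

```python
# HYPOTHETICAL price: a blended USD rate per million tokens, chosen only for illustration.
PRICE_PER_MILLION_TOKENS = 5.0


def cost_per_pull_request(
    tokens: int,
    merged_prs: int,
    price_per_million: float = PRICE_PER_MILLION_TOKENS,
) -> float:
    """Token spend for a period divided by pull requests merged in that period."""
    if merged_prs <= 0:
        raise ValueError("merged_prs must be positive")
    token_cost = tokens / 1_000_000 * price_per_million
    return token_cost / merged_prs


# Weekly token totals follow the survey; the PR counts (4 and 8) are hypothetical.
mid_level = cost_per_pull_request(32_000_000, merged_prs=4)
heavy_user = cost_per_pull_request(225_000_000, merged_prs=8)
print(f"mid-level: ${mid_level:.2f} per PR, heavy user: ${heavy_user:.2f} per PR")
```

Under these made-up assumptions, the heavy user delivers twice as many pull requests yet pays several times more per pull request, which is the kind of trade-off an outcome-based metric is meant to expose.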

The industry is also seeing more cases in which multiple AI agents run simultaneously to solve the same problem in different ways. For example, five AI agents each generate different code, and a developer picks the best result.

This approach can raise efficiency, but it also adds cost. Arcolano said, "It can still be cheaper than assigning people directly, but it also creates costs, because many of the computed results are never used and are simply discarded."
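The overhead Arcolano describes follows directly from the five-agent example: if every agent consumes a similar number of tokens and only one result is kept, total spend scales with the number of agents and the rest of the output is discarded. The sketch below assumes equal per-run token use, and the figure of 2 million tokens per run is a hypothetical value for illustration only.

```python
def best_of_n_cost(n_agents: int, tokens_per_run: int) -> tuple[int, float]:
    """Return total tokens spent and the share of that spend discarded when
    n agents each attempt the same task and only one result is kept.

    Assumes every run uses roughly the same number of tokens."""
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    total_tokens = n_agents * tokens_per_run
    discarded_share = (n_agents - 1) / n_agents
    return total_tokens, discarded_share


# Five agents as in the example above; 2 million tokens per run is hypothetical.
total, wasted = best_of_n_cost(n_agents=5, tokens_per_run=2_000_000)
print(f"total tokens: {total:,}, share discarded: {wasted:.0%}")
```

Whether that overhead pays off depends on how much better the selected result is than a single attempt, which the spend figures alone cannot show.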

Jellyfish said the technology industry's yardstick for AI use is shifting from using more to using more efficiently. Early on, AI usage itself was seen as a symbol of innovation; now companies must demonstrate both real productivity gains and cost efficiency.

As a result, companies are focusing on bringing more engineers into a stable mid-range of usage rather than encouraging heavy use by a small number of developers. The industry expects competition among AI coding tools to hinge increasingly not on raw performance but on how well they optimise productivity per unit of cost.

Jinju Hong hongjj@d-today.co.kr
