[DigitalToday reporter Jinju Hong] U.S. companies are moving to adopt “model routing,” which assigns different AI models to different tasks, to cut spending on artificial intelligence.
CNBC reported on June 5 that companies are moving away from sending every query to the most powerful large models. Under the new approach, they reserve expensive models for complex work and route simpler tasks to cheaper, faster ones.
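The routing idea described above can be illustrated with a minimal sketch. It is not any company's actual implementation: the model names, prices, keyword list and word-count threshold are all hypothetical, chosen only to show the shape of a rule that sends simple queries to a cheap tier and complex ones to an expensive tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price in dollars


# Hypothetical tiers; names and prices are illustrative only.
CHEAP = Model("small-fast-model", 0.0005)
FRONTIER = Model("frontier-model", 0.015)

# Hypothetical keywords treated as a signal of complex work.
COMPLEX_HINTS = ("refactor", "prove", "design", "debug", "architecture")


def route(query: str, max_simple_words: int = 40) -> Model:
    """Return the cheap model unless the query looks complex.

    A query counts as complex if it is long or contains one of the
    hint keywords. Real routers may use classifiers or other signals.
    """
    text = query.lower()
    is_long = len(text.split()) > max_simple_words
    has_hint = any(hint in text for hint in COMPLEX_HINTS)
    return FRONTIER if (is_long or has_hint) else CHEAP
```

Under these assumptions, a short factual question goes to the cheaper tier, while a request to refactor a codebase goes to the frontier tier. A keyword rule is used here only because it is easy to read; it would misjudge many real queries.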
The change is linked to tighter internal budget controls. Chief financial officers and boards are no longer willing to tolerate inefficient AI spending, and are re-examining whether the highest-priced models are needed for every task. Until recently it was common to use the best-performing model as a default, but as actual bills far exceeded budgets, a push has grown to overhaul cost structures.
Scott Wu, chief executive of Cognition, which developed the coding agent “Devin,” said the savings are large in repetitive work. For standardized tasks, he said, using a model that is “good enough for the job” can improve cost efficiency by 5 to 10 times. As an example, he cited the simple question of who the third U.S. president was. Because the answer is Thomas Jefferson regardless of the model's price, he said, there is little reason to keep deploying high-cost models for such tasks.
Adoption of routing is still minimal. Arvind Jain, CEO of Glean, estimated that about 95 percent of enterprise AI usage still runs on the most expensive frontier models. That means tasks cheaper models could handle are still being sent to the most expensive ones.
The cost burden has not spared big tech companies. Jeetu Patel, chief product officer at Cisco, explained that if token-usage costs are about $200 per employee per week, that comes to about $10,000 per employee a year. A company with 90,000 employees would have to spend about $900 million a year under that structure. Cisco said it has 30,000 engineers developing products that are written in large part by AI, and that actual spending far exceeded its own budget, prompting it to adjust resource allocation. Patel said the company rebuilt its budget to prioritize token usage over other spending.
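Patel's figures can be checked with simple arithmetic. The sketch below assumes a 52-week year, which the article does not state, and the function name is hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumption: costs accrue every week of the year


def annual_token_cost(weekly_cost_per_employee, employees):
    """Annual token spending in dollars for a given headcount."""
    return weekly_cost_per_employee * WEEKS_PER_YEAR * employees


per_employee = annual_token_cost(200, 1)        # 10,400 dollars
company_total = annual_token_cost(200, 90_000)  # 936,000,000 dollars
```

At exactly 52 weeks the totals are $10,400 per employee and $936 million for 90,000 employees, in line with the rounded figures of about $10,000 and $900 million.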
AI companies are also mindful of customers' anxiety. Cognition introduced an “AI productivity guarantee” program. Under the program, if Devin delivers less engineering value than the customer paid for, Cognition will cover usage costs, up to a limit of $10 million, until performance matches the payment. Wu said it was designed to directly address the return-on-investment issue that has become a core industry debate. He said companies should look at how much actual engineering time is saved, rather than at activity metrics such as token consumption or lines of code. He added that “you can spend billions of tokens and do nothing,” and said companies should aim for output, not activity.
The trend could weigh on frontier model providers such as OpenAI and Anthropic. If companies start routing large volumes of simple work to low-cost open-source models from China and elsewhere, high-priced model providers will find it harder to earn revenue across all tasks. Even if they maintain premium pricing, handling only complex and difficult work could shrink their share of total market volume.
That does not mean the value of frontier models themselves disappears. Patel said he expected cutting-edge technology to remain valuable. He added that pricing systems are likely to change. He said labs should make model use more efficient rather than simply charging higher prices.
Ultimately, the question for companies is shifting from whether to keep increasing spending as AI costs rise to how to use AI more intelligently. That increases the likelihood that pricing power will shift from vendors selling premium AI to the companies buying it. Frontier models may still command a premium for the hardest tasks, but the size of that remaining share of work has emerged as a key variable that will shape future valuations of major AI companies.