Red Hat says shift from AI consumer to provider to control costs

[DigitalToday reporter Chi-gyu Hwang] Red Hat is overhauling its platform strategy for the era of AI agents.

A recent report by Techzine said Red Hat's strategy focuses on providing an open-source full-stack platform spanning hardware to agents.

Chris Wright, Red Hat's CTO, said, "The unit cost of AI tokens is falling 75 to 90 percent a year, but token consumption in enterprise environments is rising more than 500 percent a year." He added, "Advanced reasoning models use 10 to 20 times more tokens than standard models, and autonomous AI agents consume 5 times more on top of that. An approach that relies only on external APIs and proprietary frontier models is difficult to sustain financially and operationally over the long term."
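To make the arithmetic behind Wright's figures concrete, the following is a rough back-of-the-envelope sketch in Python. It rests on two assumptions that the article does not spell out: "rising more than 500 percent" is read as consumption growing to at least 6 times the previous year's volume, and "5 times more on top of that" is read as a multiplier applied to the reasoning-model figure. The function name `spend_multiplier` is hypothetical.

```python
def spend_multiplier(unit_cost_drop: float, consumption_growth: float) -> float:
    """Year-over-year change in total token spend.

    unit_cost_drop: 0.75 means the price per token falls 75 percent.
    consumption_growth: 5.0 means token volume rises 500 percent (to 6x).
    Assumption: spend = price per token * tokens consumed.
    """
    return (1.0 - unit_cost_drop) * (1.0 + consumption_growth)


# Range of outcomes using the quoted 75-90 percent price decline
# and the assumed 500 percent volume growth.
best_case = spend_multiplier(0.90, 5.0)   # prices fall fastest
worst_case = spend_multiplier(0.75, 5.0)  # prices fall slowest

# Tokens per task relative to a standard model, assuming the
# agent multiplier stacks on the reasoning-model multiplier.
agent_tokens_low = 10 * 5
agent_tokens_high = 20 * 5

print(f"Spend vs. last year: {best_case:.2f}x to {worst_case:.2f}x")
print(f"Agent tokens vs. standard model: {agent_tokens_low}x to {agent_tokens_high}x")
```

Under these assumptions, total spend lands between roughly 0.6 and 1.5 times the previous year's, so falling prices alone do not guarantee a lower bill, and consumption growth above 500 percent pushes the range higher.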

Wright said, "To control AI costs, companies need to move away from being consumers that pay to use outside services and shift to being providers that run AI directly on their own infrastructure."

Against this backdrop, Red Hat is putting forward 'Red Hat AI Enterprise,' which includes an end-to-end AI stack.

Red Hat AI Enterprise connects five layers, from hardware to agents. Red Hat Enterprise Linux and OpenShift form the infrastructure base, and the inference layer uses vLLM, which has become established as an open-source standard, and llm-d, a distributed inference framework developed by Red Hat. The company said llm-d tripled token throughput in one year and cut time to first response to one-tenth of its previous level.

In the model service layer, the platform provides AI models as shared resources within an organisation through Model as a Service (MaaS).

It validates and supports open-source models such as IBM Granite and Mistral. Rather than using only a single model, users can choose several models depending on the purpose. An AI gateway centrally manages token quotas, team access rights and priorities to prevent small experimental projects from consuming excessive GPUs and disrupting core work.
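The gateway behaviour described above can be sketched as follows. This is a minimal illustration of central quota, access and priority checks, not Red Hat's implementation: the class names `TeamPolicy` and `TokenGateway`, the team names, and the model identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamPolicy:
    """Hypothetical per-team policy held by the gateway."""
    token_quota: int            # maximum tokens the team may consume
    priority: int               # lower number = served first
    allowed_models: set[str]    # models the team may call
    used: int = 0               # tokens consumed so far


class TokenGateway:
    """Toy gateway: checks access rights and quota before admitting a request."""

    def __init__(self, policies: dict[str, TeamPolicy]) -> None:
        self.policies = policies

    def admit(self, team: str, model: str, tokens: int) -> bool:
        policy = self.policies.get(team)
        if policy is None or model not in policy.allowed_models:
            return False  # unknown team or no access to this model
        if policy.used + tokens > policy.token_quota:
            return False  # request would exceed the team's quota
        policy.used += tokens
        return True

    def order(self, pending: list[tuple[str, str, int]]) -> list[tuple[str, str, int]]:
        """Sort pending (team, model, tokens) requests by team priority."""
        return sorted(pending, key=lambda req: self.policies[req[0]].priority)


gateway = TokenGateway({
    "core": TeamPolicy(token_quota=1000, priority=0, allowed_models={"granite"}),
    "lab": TeamPolicy(token_quota=100, priority=5, allowed_models={"mistral"}),
})
```

In this sketch an experimental "lab" team is stopped once it reaches its 100-token quota, while requests from the "core" team are ordered ahead of it, which is the kind of isolation the article attributes to the gateway.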

The top layer is the agent service. Wright said, "The time is rapidly approaching when it will be routine for large companies to operate thousands to tens of thousands of agents simultaneously." He added, "Through AgentOps, Red Hat gives each agent a verified digital identity and applies version control and automated security testing."

Red Hat said it is focusing on a hardware-neutral approach that supports all accelerators, including Nvidia, AMD and Intel. With Nvidia, it is jointly building an AI factory and natively supports Blackwell GPUs.

Wright also said the gap between open-source and proprietary models is narrowing quickly. He noted that Meta's Llama 2 took 8 months to reach the level of the early ChatGPT, while DeepSeek-R1 caught up within 5 months of the release of OpenAI o1. He warned, "An AI strategy that is locked into a single vendor over the long term is very risky."

Chi-gyu Hwang delight@d-today.co.kr
