The centre of gravity in the AI chip race is shifting from standalone GPUs to integrated CPU-GPU designs. The key battleground is no longer GPU performance alone but the ability to bundle GPUs with CPUs and memory into a single package that optimises data flow.
The most prominent semiconductor stocks in the U.S. market recently have come from the CPU camp, not the GPU leaders. Intel and AMD jumped 23.6 percent and more than 25 percent, respectively, in just one week, and their gains since the start of the year, 238.5 percent for Intel and 112.6 percent for AMD, place them among the top performers in the Philadelphia Semiconductor Index. Intel's first-quarter revenue of $13.58 billion beat consensus, with an operating margin of 12.3 percent, while AMD's shares were lifted again as growth in server CPU revenue came into focus.
The rise is being driven by the spread of inference and agentic AI. While training depended on raw GPU computing power, inference and agentic workloads are pushing token generation up sharply, lifting demand for the CPUs and DPUs that distribute and control that traffic. According to Eugene Investment & Securities, the GPU-to-CPU installation ratio has fallen to about 4 to 1 from the previous 8 to 1, meaning twice as many CPUs are needed for the same number of GPUs. Hana Securities likewise expects server CPU demand and the related memory demand to rise in tandem.
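As a rough back-of-the-envelope sketch of what that ratio shift means for procurement (the 100,000-GPU fleet below is a hypothetical figure, not one from either report):

```cuda
// Hypothetical sketch: CPU demand for a fixed GPU fleet as the
// GPU-to-CPU attach ratio moves from 8:1 to 4:1.
#include <cstdio>

int main() {
    const long gpus = 100000;            // assumed fleet size, illustrative only
    const long cpus_at_8to1 = gpus / 8;  // 12,500 CPUs under the old ratio
    const long cpus_at_4to1 = gpus / 4;  // 25,000 CPUs under the new ratio
    printf("CPUs at 8:1: %ld\n", cpus_at_8to1);
    printf("CPUs at 4:1: %ld (%.0fx)\n", cpus_at_4to1,
           (double)cpus_at_4to1 / cpus_at_8to1);  // prints 2x
    return 0;
}
```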
Nvidia's moves encapsulate the shift. After signing a standalone supply contract with Meta in February for its Grace and Vera CPUs, Nvidia unveiled at GTC 2026 in March a tray-type Vera system carrying eight Vera CPUs, aiming to grow CPUs into a product category of their own beyond rack-unit sales bundled with GPUs. The compute blades of its next-generation Rubin Ultra Kyber racks, inserted vertically, each bundle four GPUs with two Vera CPUs, further tightening the physical integration of CPUs and GPUs.
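Taken at face value, the blade composition implies a CPU attach rate inside a Kyber rack even denser than the cluster-level 4-to-1 figure; a minimal sketch (the article gives no blades-per-rack count, so only the per-blade ratio is computed):

```cuda
// Per-blade ratio for the Rubin Ultra Kyber compute blade described above.
#include <cstdio>

int main() {
    const int gpus_per_blade = 4;  // four GPUs per blade, per the article
    const int cpus_per_blade = 2;  // two Vera CPUs per blade, per the article
    // 4 GPUs to 2 CPUs reduces to 2:1, i.e. one CPU for every two GPUs.
    printf("GPU:CPU per blade = %d:%d (= %d:1)\n",
           gpus_per_blade, cpus_per_blade, gpus_per_blade / cpus_per_blade);
    return 0;
}
```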
◆ In an era of surging tokens, twice as many CPUs are needed per GPU
How memory is integrated is also a key variable. Rubin Ultra carries 16 HBM4E packages, expanding memory capacity to as much as 1,024GB per GPU. For the Feynman platform due in 2028, there is talk of additionally applying 3D die stacking, which stacks logic dies and custom HBM vertically. Nvidia's separately unveiled LPX rack, based on Grok LP30, assigns 128GB of SRAM to handle feed-forward network (FFN) operations in the decode stage; it, too, extends the design philosophy of separating the roles of GPU, CPU and memory while integrating them into a single system. In effect, memory has been elevated from a peripheral of CPUs and GPUs to a core component.
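The stated figures imply a per-stack capacity the article does not spell out; a quick derivation (the per-stack number is arithmetic from the article's totals, not a sourced specification):

```cuda
// Implied per-stack HBM4E capacity from the Rubin Ultra figures above.
#include <cstdio>

int main() {
    const int hbm_stacks = 16;    // HBM4E packages per Rubin Ultra GPU
    const int total_gb   = 1024;  // stated maximum memory per GPU, in GB
    printf("Implied capacity per HBM4E stack: %d GB\n",
           total_gb / hbm_stacks);  // 1024 / 16 = 64 GB
    return 0;
}
```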
The gains flow to South Korea's two memory makers. The larger HBM's share of an integrated package, the more profits concentrate in companies with HBM mass-production capability and base-die design expertise. SK Hynix, backed by mass production of HBM3E and HBM4, is positioning itself as a key memory supplier for Nvidia's Rubin and Rubin Ultra platforms. Samsung Electronics takes up one pillar of the integrated package, handling contract foundry production of the LP30 on a 4-nanometre process.
The software stack is being reshaped in the same direction. Nvidia's CUDA and AMD's ROCm are both optimising their libraries around a CPU-GPU unified memory model, shifting from an era in which CPUs and GPUs copied data between separate memory spaces to one in which they share a single memory address space and divide computing tasks between them. Discussion of applying co-packaged optics to NVLink switches from the Feynman platform onward belongs to the same context: relieving the data bottlenecks between CPU, GPU and memory with optical links is what lets the integrated design deliver its efficiency.
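To make the unified-memory shift concrete, here is a minimal CUDA sketch: one managed allocation is written by the CPU, scaled by a GPU kernel, and read back by the CPU without any explicit copies. The scale kernel is an illustrative toy, not code from either vendor's libraries; the older model would instead pair a host buffer with cudaMalloc plus two cudaMemcpy calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: scale each element of the array in place on the GPU.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;

    // One allocation, one address space: valid on both CPU and GPU.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes directly

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);  // GPU uses the same pointer
    cudaDeviceSynchronize();  // make the GPU's writes visible before the CPU reads

    printf("data[0] = %.1f\n", data[0]);  // 2.0, with no cudaMemcpy anywhere
    cudaFree(data);
    return 0;
}
```

What disappears in this model is exactly the staging step the article describes: the separate buffers and transfer calls that previously sat between every CPU and GPU stage of a pipeline.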
An industry official said GPUs are gradually becoming a commodity, and that system performance now hinges on how they are combined with CPUs and memory configurations. The competitive landscape, the official added, will be redrawn around companies that hold both CPU design capability and packaging technology.