AI grid concept diagram (Photo: Nvidia)

At GTC 2026, Nvidia rolled out 7 chips, 5 rack types, an inference OS, an agent framework and a robot data pipeline. The headline keywords alone run into the dozens. But three shifts run through it all: a change in the cost structure of AI computing, a change in software monetisation, and AI's penetration into manufacturing and mobility sites.

Vera Rubin, which cuts cost per token to one-tenth; Groq LPX, which delivers 35 times the inference throughput; and a physical AI market pitched at $100 trillion show what that looks like. Those three numbers are redefining the landscape for semiconductors, software and manufacturing.

Nvidia’s roadmap: falling computing costs to agents to the physical world

Nvidia put its rack-scale next-generation platform, Vera Rubin, front and centre at GTC 2026. Vera Rubin is next-generation infrastructure that integrates 7 chips and 5 rack types into a single AI supercomputer. It can train large-scale MoE models with about a quarter of the GPUs that Blackwell needs. Cost per token drops to one-tenth, and inference throughput per watt improves by up to 10 times. The paradigm has shifted from single-GPU chip specifications to rack-scale integrated design.

The Vera CPU also drew attention. With 88 in-house Olympus cores and up to 1.2 TB/s of LPDDR5X bandwidth, it delivers twice the bandwidth of general-purpose CPUs at half the power. Nvidia said Alibaba, ByteDance, Meta and CoreWeave are moving to adopt it. CEO Jensen Huang also previewed the next architecture, Feynman, and laid out a roadmap including the Rosa CPU, the LP40 LPU and BlueField-5.

Infrastructure rollout supporting the platform is also moving quickly. Nvidia said GPUs installed by its cloud partners (NCPs) in AI factories worldwide have passed a cumulative 1 million, with total AI processing capacity above 1.7 gigawatts, up from 400,000 units and 550 megawatts at last year's GTC. Microsoft Azure, for example, became the first hyperscale cloud to run Vera Rubin NVL72.

South Korean companies were selected as key component suppliers for the platform. Samsung Electronics said Huang signed a Samsung HBM4 wafer at GTC with the handwritten words "AMAZING HBM4". Samsung stressed it is the only company able to supply every memory and storage part used in Vera Rubin, including HBM4, SOCAMM2 and the PM1763 SSD, and it showed a physical HBM4E chip for the first time at this GTC, targeting 16 Gbps per pin and 4.0 TB/s of bandwidth. SK Hynix underlined its own supply-chain position by exhibiting HBM4, HBM3E and SOCAMM2, with Chairman Tae-won Chey and CEO Noh-jung Kwak attending in person.

Vera Rubin, Groq LPX and physical AI: the era of one-tenth token costs

The second shift is a new economic structure created by a sharp drop in inference costs, embodied by the Groq 3 LPX unveiled at GTC 2026. Groq 3 LPX is an inference-acceleration rack that physically combines GPUs and LPUs (language processing units). Nvidia said it achieved 500 tokens per second and $45 per million tokens on a 1 trillion-parameter model, a 35-fold throughput increase over existing systems.
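The reported figures imply some simple derived quantities. A back-of-envelope sketch, using only the numbers in the article (reading the 500 tokens-per-second figure as a per-system rate is an assumption for illustration, not something the article states):

```python
# Back-of-envelope check of the inference economics reported for Groq 3 LPX.
# Inputs are the article's figures; the per-system reading is an assumption.

TOKENS_PER_SEC = 500        # reported throughput on a 1T-parameter model
COST_PER_M_TOKENS = 45.0    # reported cost, USD per million tokens
SPEEDUP = 35                # reported gain vs. existing systems

tokens_per_hour = TOKENS_PER_SEC * 3600                     # 1,800,000
implied_cost_per_hour = tokens_per_hour / 1e6 * COST_PER_M_TOKENS  # $81.00
baseline_tokens_per_sec = TOKENS_PER_SEC / SPEEDUP          # ~14.3 tok/s

print(f"{tokens_per_hour:,} tokens per hour")
print(f"${implied_cost_per_hour:.2f} per hour at the stated token price")
print(f"~{baseline_tokens_per_sec:.1f} tokens/s implied for the baseline")
```

Nothing here is Nvidia's pricing model; it simply shows that the three headline numbers are internally consistent with an hourly serving cost in the tens of dollars and a baseline in the mid-teens of tokens per second.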

Nvidia also presented its software stack. Dynamo 1.0, a distributed OS for AI factories, raises Blackwell GPU inference performance by up to 7 times. OpenClo and NemoClo are agentic AI orchestration frameworks; Nvidia said OpenClo reached 100,000 GitHub stars and drew 2 million visitors in its first week.

Adoption is spreading at companies as well. Nvidia briefing materials show results including 18,000 Salesforce customers, a 40-hour weekly reduction at CrowdStrike, and a 2 to 3 times improvement in inference efficiency at Cisco.

Such inference infrastructure is also spreading beyond the data centre. Nvidia unveiled the DGX Station GB300 for this purpose: with 748 GB of coherent memory and up to 20 petaflops of FP4 performance, it can run a 1 trillion-parameter model on a desktop.

Nvidia said AT&T, T-Mobile, Comcast and Spectrum are building an "AI grid" on its infrastructure. The structure uses about 100,000 distributed network data centres worldwide and more than 100 gigawatts of spare power to run inference close to users, devices and data.

Samsung Foundry's role also stands out: its 4-nanometre process produces Groq's LPUs. Huang's handwritten "Groq Super FAST" on a Samsung Foundry 4-nanometre wafer signals that South Korean companies' role has been formalised in the inference-acceleration supply chain as well.

Operating rooms, factories and roads: physical AI moves into the field

The third shift is AI moving beyond the digital world into physical sites. Nvidia put the physical AI market at $100 trillion, 50 times the size of the roughly $2 trillion IT industry. The execution tool is Blueprint, a physical AI data factory. Built on the Cosmos world model and the Osmo orchestrator, it integrates the full pipeline of data generation, simulation, evaluation and deployment, and is set for release on GitHub in April.

Its scope has expanded from manufacturing and mobility into healthcare. Nvidia said Hyundai Motor Group will expand Level 2-and-above autonomous driving based on Drive Hyperion and cooperate on Motional's Level 4 robotaxis. In healthcare, Nvidia unveiled medical-focused physical AI platforms including Open-H, a 776-hour surgical video dataset; Cosmos-H, for synthetic surgical-data generation; and GR00T-H, a surgical robot action model. CMR Surgical and Johnson & Johnson MedTech are moving to adopt them.

HD Hyundai is building an Omniverse-based industrial digital twin, while Samsung Electronics, SK Hynix and MediaTek are applying Nvidia acceleration technology to EDA software to speed up DRAM and flash production. The structure puts the full pipeline of design, manufacturing and healthcare on the CUDA-X ecosystem.

What Nvidia showed at GTC 2026 is a single flow: falling computing costs secure the economics of agents, which in turn leads to AI penetration of the physical world. South Korean companies have positioned themselves as key partners across memory, foundry and autonomous driving. At the same time, they have also taken on the structural challenge of deeper platform dependence.

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.