Nvidia says the axis of AI semiconductor competition is shifting from chip specifications to end-to-end efficiency. The company also disclosed, for the first time, measured results showing Blackwell-based GPUs running mixture-of-experts (MoE) inference 55 times faster than the prior Hopper generation.
Bryan Catanzaro, vice president of applied deep learning research at Nvidia, said at the "Nemotron Developer Days Seoul 2026" event in Seoul on Monday that "compute is intelligence" and that "a faster model is a smarter model." The message framed future AI advantage as a matter not of single-chip performance but of system efficiency across four stages: pre-training, post-training, inference and agents.
The standout number was Blackwell's MoE inference performance. Catanzaro said Nvidia CEO Jensen Huang had promised at GTC that Blackwell would be 30 times faster than Hopper, but recent benchmark results showed it was actually 55 times faster. He attributed the gap to several years of work on NVL72, the rack-scale design that links GPUs through a low-latency, high-bandwidth NVLink switch fabric, built on Nvidia's view that MoE models bottleneck on the interconnect rather than on computation.
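A rough back-of-envelope calculation shows why the interconnect dominates. The sketch below is illustrative only; the model dimensions, expert counts and bandwidth figures are assumptions, not numbers from Nvidia's talk.

```python
# Back-of-envelope: why MoE inference stresses the GPU interconnect.
# Every figure below is an illustrative assumption, not a measured value.

def moe_traffic_per_token(hidden=8192, top_k=8, layers=60, bytes_per_act=2):
    """Bytes one token moves over the fabric in a single forward pass.

    Each MoE layer does an all-to-all dispatch of the token's activation
    to its top_k experts and a matching all-to-all combine on the way back.
    """
    per_layer = 2 * top_k * hidden * bytes_per_act  # dispatch + combine
    return per_layer * layers

traffic = moe_traffic_per_token()  # ~15.7 MB per token under these assumptions
nvlink_bw = 900e9                  # assumed ~900 GB/s per-GPU NVLink bandwidth
pcie_bw = 64e9                     # assumed ~64 GB/s PCIe Gen5 x16, for contrast

print(f"{traffic / 1e6:.1f} MB moved per token")
print(f"NVLink transfer: {traffic / nvlink_bw * 1e6:.0f} us/token")
print(f"PCIe transfer:   {traffic / pcie_bw * 1e6:.0f} us/token")
```

Under these assumptions the same token pays roughly 14 times more in transfer time over PCIe than over NVLink, which is the kind of gap a dedicated switch fabric is built to close.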
Efficiency gains are also coming from the numerics themselves. Blackwell introduced NVFP4, a new number format that uses 4.75 bits per value. Catanzaro stressed that the Nemotron 3 Super and Ultra models now in development are being pre-trained entirely in 4-bit arithmetic, and called building a world-class model at such low precision a highly challenging attempt.
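For readers unfamiliar with block-scaled low-precision formats, the sketch below shows the basic mechanics: a handful of FP4-representable magnitudes shared within small blocks via a common scale. The block size and the full-precision scale here are simplifications for illustration, not Nvidia's exact recipe.

```python
import numpy as np

# Minimal sketch of block-scaled 4-bit quantization in the spirit of a
# format like NVFP4: FP4 (E2M1) values sharing one scale per small block.
# A real format also quantizes the scale itself (e.g., to FP8); here it
# stays in full precision for simplicity.

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_fp4_blocked(x, block=16):
    """Round each length-`block` group to the FP4 grid with a shared scale."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0                    # avoid divide-by-zero on all-zero blocks
    mags = np.abs(x) / scale
    idx = np.abs(mags[..., None] - FP4_GRID).argmin(axis=-1)  # nearest grid point
    return np.sign(x) * FP4_GRID[idx] * scale  # dequantized view

w = np.random.randn(4, 16).astype(np.float32)
print("max abs error:", np.abs(w - quantize_fp4_blocked(w)).max())
```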
Nvidia Nemotron 3 reaches IMO gold medal-level performance with 30B model
Efficiency improvements are also evident on the software side. Nvidia's latest pre-training dataset reaches the same model quality in one quarter of the training time of the previous version on the same hardware. Catanzaro said the post-training technique PivotLM improved post-training efficiency by about five times by concentrating rollout budgets on key branching points in a model's reasoning path.
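The talk did not detail PivotLM's algorithm, but the stated idea, spending rollouts where the model is most uncertain instead of uniformly, can be sketched in a few lines. The entropy scores and allocation rule below are invented for illustration.

```python
import numpy as np

# Toy sketch of the idea attributed to PivotLM: concentrate rollout
# budget on the most uncertain ("pivot") steps of a reasoning trace
# rather than spreading it uniformly. The scoring and allocation rule
# are invented for illustration; the talk did not specify the algorithm.

def allocate_rollouts(step_entropy, total_budget=64, pivot_frac=0.2):
    """Give every step one rollout, then pour the rest into the pivots."""
    steps = len(step_entropy)
    n_pivots = max(1, int(steps * pivot_frac))
    pivots = np.argsort(step_entropy)[-n_pivots:]   # highest-uncertainty steps
    alloc = np.ones(steps, dtype=int)               # baseline: 1 rollout each
    weights = step_entropy[pivots] / step_entropy[pivots].sum()
    alloc[pivots] += np.floor(weights * (total_budget - steps)).astype(int)
    return alloc

step_entropy = np.random.rand(20)   # stand-in per-step uncertainty scores
print(allocate_rollouts(step_entropy))
```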
A curriculum-based post-training project, Nemotron Cascade, was also unveiled. Applied to the 30 billion-parameter Nemotron 3 Nano model, the method delivered gold medal-level performance on the 2025 editions of the International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI) and the International Collegiate Programming Contest (ICPC). The only other open-source model to reach that level is DeepSeek's 671 billion-parameter model; the roughly 22-fold difference in parameter count is the measure of the efficiency gap.
Partnerships in Korea are also expanding. Nvidia said Korean AI companies including Krafton, LG, Naver and SK Telecom are participating in development based on Nemotron. A Korean-language-focused synthetic dataset, Nemotron Persona Korea, was also unveiled at the event. It contains 7 million fully synthetic personas generated from statistics on Korea's population, language and culture, and includes no personally identifiable information (PII).
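The mechanics of statistics-driven persona synthesis can be sketched simply: sample attributes from population-level distributions rather than from real records, then expand each sample into text. Every distribution and field below is invented for illustration and is not drawn from the actual dataset.

```python
import random

# Minimal sketch of statistics-driven persona synthesis, in the spirit
# of a dataset like Nemotron Persona Korea: draw attributes from
# population-level distributions, never from real individuals. All
# distributions and categories below are invented for illustration.

REGIONS = {"Seoul": 0.18, "Gyeonggi": 0.26, "Busan": 0.06, "Other": 0.50}
AGE_BANDS = {"20s": 0.16, "30s": 0.17, "40s": 0.19, "50s": 0.20, "60+": 0.28}
OCCUPATIONS = ["office worker", "small-business owner", "student",
               "engineer", "teacher", "retired"]

def sample_persona(rng=random):
    """Draw one synthetic persona; no field comes from a real person."""
    return {
        "region": rng.choices(list(REGIONS), weights=list(REGIONS.values()))[0],
        "age_band": rng.choices(list(AGE_BANDS), weights=list(AGE_BANDS.values()))[0],
        "occupation": rng.choice(OCCUPATIONS),
        # A real pipeline would expand this record into natural-language
        # persona text with an LLM conditioned on these attributes, then
        # filter for consistency and PII leakage.
    }

print(sample_persona())
```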