DeepSeek V4 Pro nears GPT-5-level performance, rated best Chinese AI model

[Digital Today reporter Jinju Hong (홍진주)] A U.S. government-affiliated assessment says DeepSeek’s latest model, DeepSeek V4 Pro, lags top U.S. artificial intelligence models by about 8 months.

Online media outlet Gigazine reported on Monday that the Center for AI Standards and Innovation (CAISI), part of the U.S. National Institute of Standards and Technology (NIST), released an evaluation report on DeepSeek V4 Pro. The report judged the model to be the best-performing Chinese AI, but still behind the newest U.S. models.

The key issue is where its performance stands. CAISI evaluated DeepSeek V4 Pro, an open-weight model, across 5 areas and 9 benchmarks and concluded that it is about 8 months behind the leading U.S. models. Although DeepSeek V4 Pro was released in April 2026, the results put its performance on par with OpenAI’s GPT-5, which was released in August 2025.

The gap with rivals in China was clear. DeepSeek V4 Pro scored about 200 points higher than Kimi K2.5, which was cited as the previous top-scoring Chinese AI model. CAISI explained that a 200-point difference in the combined score across the 5 areas means the higher-scoring model is 3 times more likely to solve a given task.
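To make the 200-point figure concrete, here is a minimal Python sketch. It rests on an assumption the article does not confirm: that the combined score behaves like an Elo-style scale on which every 200 points multiplies the odds of solving a task by 3. The function names and the 25 percent baseline are hypothetical, chosen only for illustration, and this is not CAISI's published method.

```python
def odds_ratio(score_diff: float,
               points_per_step: float = 200.0,
               ratio_per_step: float = 3.0) -> float:
    """Odds multiplier implied by a score gap.

    ASSUMPTION: the score is on a log-odds scale where each
    `points_per_step` points multiplies the odds by `ratio_per_step`.
    """
    return ratio_per_step ** (score_diff / points_per_step)


def solve_probability(baseline_prob: float, score_diff: float) -> float:
    """Success probability of the higher-scoring model, given the
    lower-scoring model's success probability on the same task."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio(score_diff)
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Hypothetical task that the lower-scoring model solves 25% of the time.
    print(odds_ratio(200))
    print(solve_probability(0.25, 200))
```

Under this reading, "3 times more likely" refers to odds rather than raw probability, so a task solved 25 percent of the time by the lower-scoring model would be solved about half the time by a model scoring 200 points higher.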

The evaluation covered 5 areas: cyber, software engineering, natural science, abstract reasoning and mathematics. It incorporated 9 tests, including CTF-Archive-Diamond for hacking capability, SWE-Bench Verified for coding, FrontierScience for research-level scientific reasoning, ARC-AGI-2 semi-private for abstract reasoning, and OTIS-AIME-2025 for mathematical reasoning.

Separate from performance, CAISI cited cost competitiveness as a strength. It assessed DeepSeek V4 Pro as more cost-efficient than other AI models with equivalent performance. It also showed higher cost efficiency in 5 of 7 benchmarks than OpenAI’s GPT-5.4 Mini, which CAISI presented as the most cost-efficient among U.S. models. Overall, it was measured at 41 to 53 percent better than GPT-5.4 Mini.

Pricing was also cited as a basis for the assessment. According to figures reported by the developers, DeepSeek V4 Pro costs $1.74 per 1 million input tokens without caching and $0.0145 with caching, and $3.48 per 1 million output tokens. GPT-5.4 Mini was listed at $0.75 per 1 million input tokens without caching, $0.075 with caching, and $4.50 per 1 million output tokens.
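The effect of these list prices depends on the mix of input, cached input and output tokens. The Python sketch below uses only the per-million-token prices quoted above; the `Pricing` class, the `request_cost` function and the example workload are hypothetical and for illustration only. It is not CAISI's cost-efficiency method, which the article does not describe in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1 million tokens, as quoted in the article."""
    input_uncached: float
    input_cached: float
    output: float


DEEPSEEK_V4_PRO = Pricing(input_uncached=1.74, input_cached=0.0145, output=3.48)
GPT_5_4_MINI = Pricing(input_uncached=0.75, input_cached=0.075, output=4.50)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0) -> float:
    """Cost in USD for a workload; `cached_fraction` is the share of
    input tokens billed at the cached rate (an illustrative parameter)."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    total = (uncached * pricing.input_uncached
             + cached * pricing.input_cached
             + output_tokens * pricing.output)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 1M output tokens.
    for name, pricing in [("DeepSeek V4 Pro", DEEPSEEK_V4_PRO),
                          ("GPT-5.4 Mini", GPT_5_4_MINI)]:
        no_cache = request_cost(pricing, 1_000_000, 1_000_000)
        all_cached = request_cost(pricing, 1_000_000, 1_000_000, cached_fraction=1.0)
        print(f"{name}: ${no_cache:.4f} uncached, ${all_cached:.4f} fully cached")
```

For this particular workload the two models cost nearly the same without caching ($5.22 versus $5.25), because DeepSeek V4 Pro's higher uncached input price is offset by its lower output price. Input-heavy workloads without caching would favor GPT-5.4 Mini on list price alone, while output-heavy or heavily cached workloads would favor DeepSeek V4 Pro.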

The report also exposed a gap between DeepSeek’s own performance claims and the external assessment. DeepSeek’s published materials put DeepSeek V4 Pro at a level similar to Claude Opus 4.6 and GPT-5.4, whereas CAISI’s measured results placed it at GPT-5 level.

CAISI also disclosed why some benchmarks were excluded from the cost-efficiency comparison. It said PortBench was not yet supported by its cost comparison method. It also said there was a technical issue during the GPT-5.4 Mini evaluation process for ARC-AGI-2. As a result, the cost comparison was presented on 7 benchmarks rather than all 9.

DeepSeek released its latest model family, DeepSeek V4, in late April 2026. Within the family, DeepSeek V4 Pro is the top model, with 1.6 trillion total parameters. The evaluation shows that Chinese AI has not fully caught up with the leading U.S. tier in performance, but is expanding its presence through open weights and cost competitiveness. It also confirms that, by external verification standards, a gap remains relative to the latest top-tier models.

Jinju Hong hongjj@d-today.co.kr
