The figures show that stronger reasoning performance may not directly lead to improved service reliability. [Photo: Shutterstock]

As reasoning-focused artificial intelligence (AI) models spread rapidly, Chinese AI startup DeepSeek's latest reasoning model, DeepSeek-R1, appears to have logged a much higher hallucination rate than its predecessor. The industry says the results could be a direct warning signal for the AI-based cryptocurrency agent market.

BeInCrypto, a blockchain media outlet, reported on May 11 that AI evaluation firm Vectara compared DeepSeek-R1 with the earlier DeepSeek-V3 using its hallucination assessment system, the HHEM 2.1 benchmark. DeepSeek-R1 recorded a hallucination rate of 14.3 percent, nearly four times higher than DeepSeek-V3's 3.9 percent.

Vectara said it cross-validated the test results using Google's FACTS methodology. The analysis showed DeepSeek-R1 had a stronger tendency across most test settings to add content not in the source text or generate unsupported information.

Vectara pointed to an "excessive helpfulness tendency" as DeepSeek-R1's core problem. It said the model tended to add context or explanations not present in the source text in an overzealous effort to be helpful. Individual sentences may look factual, but once the model arbitrarily links in content not present in the source, the output is classified as a hallucination.
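That classification criterion can be illustrated with a toy grounding check (this is my own simplified illustration, not Vectara's HHEM method): a summary sentence is flagged when it introduces content words that never appear in the source text, even if the sentence reads as plausible on its own.

```python
import re

def ungrounded_words(source: str, summary: str) -> set:
    """Return content words in the summary that never appear in the source.

    A crude stand-in for sentence-level grounding checks: any word the
    summary introduces on its own is a candidate hallucination.
    """
    def tokenize(s):
        return set(re.findall(r"[a-z]+", s.lower()))

    stopwords = {"the", "a", "an", "is", "are", "was", "of", "in", "and", "to"}
    return (tokenize(summary) - tokenize(source)) - stopwords

# The source states only a number; the summary invents a partnership.
src = "DeepSeek-R1 recorded a hallucination rate of 14.3 percent."
out = "DeepSeek-R1 recorded a high rate after a partnership with Google."
print(sorted(ungrounded_words(src, out)))
```

Real evaluators like HHEM use trained models rather than word overlap, but the underlying question is the same: is every claim in the output supported by the source?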

The industry worries the issue could go beyond a simple debate over AI quality and lead to real financial risks. That is because AI agent projects spreading in the cryptocurrency market have adopted structures that combine large language models (LLMs) with trading functions and automation tools.

The market now has various AI agent token projects, including Virtuals Protocol, ai16z and AIXBT. These services automatically carry out tasks such as social media posting, token analysis, investment signal generation, trade execution and market commentary writing. The problem is that if a model generates incorrect information, errors can lead to real on-chain actions.

For example, if an AI generates as fact a partnership that does not exist, an incorrect contract address or inaccurate price data, the investment decision itself can be distorted. In models that plan actions based on multi-step reasoning, an error at the initial stage is likely to spread through the entire decision-making process that follows.
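The compounding effect described above can be sketched with simple probability (a back-of-the-envelope model of my own, assuming independent steps, not a measurement from the article): if each reasoning step is correct with probability p, an n-step chain is error-free only with probability p**n.

```python
def chain_reliability(p_step: float, n_steps: int) -> float:
    """Probability that an n-step reasoning chain contains no error,
    assuming each step succeeds independently with probability p_step."""
    return p_step ** n_steps

# A per-step error rate on the order of DeepSeek-R1's reported 14.3%
# (per-step accuracy ~0.857) degrades quickly over a five-step plan.
print(round(chain_reliability(0.857, 5), 3))  # ~0.462
```

Under this toy assumption, an agent whose individual steps look 85 percent reliable would complete a five-step plan without any error less than half the time, which is why an error at the initial stage is so consequential.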

AIXBT, one such AI agent project, is known for promoting 416 tokens and logging an average return of 19 percent, but it has also been criticized for a structural risk: the model's judgment errors can be passed on to users as-is.

The industry says this is not a problem unique to DeepSeek. It points to reinforcement learning (RL), used to strengthen reasoning ability, as a technique that can increase a model's confidence and response expansiveness but also make it generate incorrect information more assertively.

Meta's chief AI scientist Yann LeCun also has viewed hallucinations in LLMs as a structural limitation. He has argued that hallucinations are unlikely to disappear completely under the current autoregressive LLM architecture.

Some AI labs, by contrast, say hallucination rates can be significantly reduced using retrieval-augmented generation (RAG), post-hoc verification models and fine-tuning techniques. Even so, developers in the field say hallucinations still occur frequently in real operating environments.

Ultimately, experts stress that the key task for the AI agent industry is not a simple performance race, but a "verifiable operating structure". They cite as practical alternatives rechecking model-generated information through separate verification systems, or using more conservative models at the financial execution stage.
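One concrete form of the "verifiable operating structure" experts describe is a guardrail that rechecks model output against an independent source before any financial action. The sketch below is hypothetical: the allowlist, the `safe_execute` function and its behavior are my illustration of the pattern, not the API of any real agent framework.

```python
# Independently maintained allowlist of verified token contracts.
# (The USDC address shown is illustrative; verify against an official
# registry in any real deployment.)
VERIFIED_CONTRACTS = {
    "USDC": "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48",
}

def safe_execute(token: str, claimed_address: str) -> str:
    """Act only when the model's claimed contract address matches the
    allowlist; otherwise block, so a hallucinated address never reaches
    the on-chain execution stage."""
    expected = VERIFIED_CONTRACTS.get(token)
    if expected is None or expected.lower() != claimed_address.lower():
        return "blocked: unverified contract address"
    return f"executed trade on {expected}"

# A hallucinated address is stopped before it causes an on-chain action.
print(safe_execute("USDC", "0xdeadbeef"))
```

The design choice matches the experts' second suggestion as well: the component closest to financial execution is deliberately conservative, trusting a static allowlist over the model's own claims.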

In a post on X, LeCun wrote: "Hallucinations in LLMs are due to the auto-regressive prediction. I think what I call 'Objective Driven AI' will solve the problem: systems that plan their answer by optimizing a number of objective functions *at inference time*." https://t.co/JcR5hItwzJ

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.