Guavy AI Editorial Team

Crypto AI Tokens Face Concerns Over Reasoning Model Hallucinations

Reasoning-style large language models (LLMs) are increasingly used to power crypto AI agent tokens, with hundreds of agents relying on them for autonomous trading, signal generation, and on-chain execution. A recent study, however, highlights a concerning trade-off: the deeper a model reasons, the more likely it is to hallucinate.

DeepSeek-R1, the flagship reasoning model from Chinese lab DeepSeek, was tested using Vectara's HHEM 2.1 benchmark and found to have a 14.3% hallucination rate, nearly four times the 3.9% rate of its non-reasoning predecessor DeepSeek-V3. This raises questions about the trade-off between reasoning depth and accuracy in large language models.
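The gap itself is simple arithmetic; a quick sketch (using the figures reported above, with variable names of our own choosing) shows where the "nearly four times" comparison comes from:

```python
# Hallucination rates reported on Vectara's HHEM 2.1 evaluation (see above).
r1_rate = 14.3   # DeepSeek-R1, reasoning model (%)
v3_rate = 3.9    # DeepSeek-V3, non-reasoning predecessor (%)

# Ratio between the two rates: ~3.7x, i.e. "nearly four times" the V3 rate.
print(f"R1 hallucinates {r1_rate / v3_rate:.1f}x as often as V3")
```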

Vectara's analysis found that R1 tends to 'overhelp,' adding information that does not appear in the source text, which can be factually correct on its own but still counts as a hallucination. This behavior can smuggle unsupported context into otherwise sound answers, a risky trait when an agent's output directly triggers on-chain actions.
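To make the 'overhelp' failure mode concrete, here is a minimal, illustrative grounding check in Python. It is not Vectara's HHEM method; it uses a crude word-overlap heuristic (the function name, threshold, and example sentences are all our own assumptions) purely to show why a sentence that adds information absent from the source gets flagged, even when that sentence happens to be true.

```python
import re

def sentence_supported(sentence: str, source: str, threshold: float = 0.6) -> bool:
    """Crude grounding heuristic: a sentence counts as supported if most of its
    content words also appear in the source text. Illustrative only; real
    hallucination detectors such as HHEM use trained models, not word overlap."""
    words = {w.lower() for w in re.findall(r"[a-zA-Z']+", sentence) if len(w) > 3}
    source_words = {w.lower() for w in re.findall(r"[a-zA-Z']+", source)}
    if not words:
        return True
    return len(words & source_words) / len(words) >= threshold

source = ("DeepSeek-R1 was evaluated on a summarization benchmark "
          "measuring faithfulness to the source.")
answer = [
    "DeepSeek-R1 was evaluated on a summarization benchmark.",        # grounded
    "The benchmark measures faithfulness to the source text.",        # grounded
    "DeepSeek is a Chinese AI lab that also builds non-reasoning models.",  # true, but absent from the source
]

for sent in answer:
    if sentence_supported(sent, source):
        print(f"grounded: {sent}")
    else:
        print(f"not in source (counts as hallucination): {sent}")
```

The third sentence is accurate, yet a faithfulness check against this source would still flag it, which is exactly the pattern Vectara describes in R1's summaries.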

The pattern is not unique to DeepSeek-R1; similar behavior has been observed in reasoning-trained models from other labs. The reinforcement learning process that sharpens chain-of-thought also rewards bolder, more confident generation, which can push hallucination rates higher.