DeepSeek-R1 Hallucinations Raise Concerns for Crypto AI Agents
A new study has highlighted the risks of relying on large language models (LLMs) in the crypto market. Researchers at Vectara found that DeepSeek-R1, a reasoning model developed by Chinese lab DeepSeek, exhibits a hallucination rate of 14.3%, nearly four times the 3.9% rate of its predecessor, DeepSeek-V3.
The study used Vectara's HHEM 2.1 benchmark to evaluate the models and found that R1 tends to 'overhelp' by adding details that do not appear in the source text. This behavior can propagate fabricated facts through a chain of thought, potentially causing harm in autonomous trading and decision-making.
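For readers unfamiliar with how such benchmarks are scored, the sketch below shows one plausible way a hallucination rate could be computed: the share of model summaries whose factual-consistency score against the source text falls below a threshold. The `score_consistency` callable, the toy lexical-overlap scorer, and the 0.5 cutoff are illustrative assumptions, not Vectara's actual HHEM 2.1 implementation.

```python
# Minimal sketch (assumed, not HHEM 2.1): a hallucination rate is the fraction
# of (source, summary) pairs whose consistency score falls below a threshold.
from typing import Callable, List, Tuple


def hallucination_rate(
    pairs: List[Tuple[str, str]],                      # (source_text, model_summary) pairs
    score_consistency: Callable[[str, str], float],    # hypothetical consistency scorer
    threshold: float = 0.5,                            # illustrative cutoff
) -> float:
    """Fraction of summaries judged unsupported by their source text."""
    if not pairs:
        return 0.0
    unsupported = sum(
        1 for source, summary in pairs
        if score_consistency(source, summary) < threshold
    )
    return unsupported / len(pairs)


def toy_scorer(source: str, summary: str) -> float:
    """Purely illustrative scorer: fraction of summary words found in the source."""
    source_words = set(source.lower().split())
    summary_words = summary.lower().split()
    if not summary_words:
        return 1.0
    return sum(word in source_words for word in summary_words) / len(summary_words)


pairs = [
    ("The protocol's fees rose 2% in March.", "Fees rose 2% in March."),
    ("The protocol's fees rose 2% in March.", "Fees rose 20% and the token doubled."),
]
print(f"Hallucination rate: {hallucination_rate(pairs, toy_scorer):.1%}")  # 50.0%
```

In production benchmarks the scorer would be a trained factual-consistency model rather than word overlap, but the aggregation into a single percentage works the same way.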
The findings have raised concerns in the crypto sector, where AI agent tokens rely on LLMs for tasks such as trading, signal generation, and on-chain execution. The authors note that trading off reasoning depth against accuracy is a common issue in LLM development, but they emphasize the need for careful risk management when these models act autonomously.