OpenZeppelin's Audit Exposes Flaws in OpenAI's EVMbench Benchmark
Blockchain security firm OpenZeppelin has conducted an audit of OpenAI's EVMbench, a benchmark designed to evaluate AI models' ability to identify and exploit smart contract vulnerabilities.
The audit found methodological flaws and data contamination in the dataset behind EVMbench, both of which undermine the benchmark's validity. Specifically, OpenZeppelin found that all of the highest-scoring AI agents had likely been exposed to the benchmark's vulnerability reports during pretraining.
This calls the accuracy of the test results into question: the agents may have encountered the answers in their training data rather than finding the vulnerabilities on their own.
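To make the idea of contamination concrete, the sketch below shows one common way such overlap can be estimated: measuring how many word n-grams from a benchmark item also appear in a training document. This is an illustration only, not the method OpenZeppelin used; the function names, the tokenization, and the default n-gram length are all assumptions chosen for the example.

```python
import re


def ngrams(text: str, n: int = 8) -> set:
    """Return the set of lowercase word n-grams in `text` (hypothetical helper)."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item: str, corpus_doc: str, n: int = 8) -> float:
    """Fraction of the benchmark item's n-grams that also occur in the corpus document.

    A value near 1.0 suggests the item appears almost verbatim in the document;
    a value near 0.0 suggests little or no verbatim overlap.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        # Item is shorter than n tokens, so there is nothing to compare.
        return 0.0
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)


# Made-up strings for illustration; not taken from EVMbench.
report = "reentrancy in withdraw function allows draining funds"
scraped_page = "the report notes reentrancy in withdraw function allows draining funds from the vault"
unrelated_page = "integer overflow in mint lets anyone inflate supply"

leaked = overlap_ratio(report, scraped_page, n=3)
clean = overlap_ratio(report, unrelated_page, n=3)
```

A check like this only detects near-verbatim overlap. Paraphrased or summarized reports would not be caught, which is one reason contamination is hard to rule out from the outside.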
OpenZeppelin has called for the testing methodology to be revised so that the benchmark measures whether AI models can actually identify and exploit smart contract vulnerabilities, rather than whether they can recall material seen during training.