NATURE

AI tools are spotting errors in research papers: inside a growing movement

Two new AI tools check for errors in research papers, including in the calculations, methodology and references. Credit: Jose A. Bernat Bacete/Getty

Late last year, media outlets worldwide warned that black plastic cooking utensils contained worrying levels of cancer-linked flame retardants. The risk was found to be overhyped – a mathematical error in the underlying research suggested a key chemical exceeded the safe limit when in fact it was ten times lower than the limit. Keen-eyed researchers quickly showed that an artificial intelligence (AI) model could have spotted the error in seconds.

The incident has spurred two projects that use AI to find mistakes in the scientific literature. The Black Spatula Project is an open-source AI tool that has so far analysed around 500 papers for errors. The group, which has around eight active developers and hundreds of volunteer advisers, hasn’t made the errors public yet; instead, it is approaching the affected authors directly, says Joaquin Gulloso, an independent AI researcher based in Cartagena, Colombia, who helps to coordinate the project. “Already, it’s catching many errors,” says Gulloso. “It’s a huge list. It’s just crazy.”

The other effort is called YesNoError and was inspired by the Black Spatula Project, says founder and AI entrepreneur Matt Schlicht. The initiative, funded by its own dedicated cryptocurrency, has set its sights even higher. “I thought, why don’t we go through, like, all of the papers?” says Schlicht. He says that their AI tool has analysed more than 37,000 papers in two months. Its website flags papers in which it has found flaws – many of which have yet to be verified by a human, although Schlicht says that YesNoError has a plan to eventually do so at scale.

Both projects want researchers to use their tools before submitting work to a journal, and journals to use them before they publish. The aim is to stop mistakes, as well as fraud, from making their way into the scientific literature.

The projects have tentative support from academic sleuths who work in research integrity. But there are also concerns over the potential risks. How well the tools can spot mistakes, and whether their claims have been verified, must be made clear, says Michèle Nuijten, a researcher in metascience at Tilburg University in the Netherlands. “If you start pointing fingers at people and then it turns out that there was no mistake, there might be reputational damage,” she says.

Others add that although there are risks and the projects need to be cautious about what they claim, the goal is the right one. It is much easier to churn out shoddy papers than it is to retract them, says James Heathers, a forensic metascientist at Linnaeus University in Växjö, Sweden. As a first step, AI could be used to triage papers for further scrutiny, says Heathers, who has acted as a consultant for the Black Spatula Project. “It’s early days, but I’m supportive” of the initiatives, he adds.

AI sleuths

Many researchers have dedicated their careers to spotting integrity concerns in papers – and tools to check certain facets of papers already exist. But advocates hope that AI could carry out a wider range of checks in a single shot and handle a larger volume of papers.

Both the Black Spatula Project and YesNoError use large language models (LLMs) to spot a range of errors in papers, including ones of fact as well as in calculations, methodology and referencing.

The systems first extract information, including tables and images, from the papers. They then craft a set of complex instructions, known as a prompt, which tells a ‘reasoning’ model – a specialist type of LLM – what it is looking at and what kinds of error to hunt for. The model might analyse a paper multiple times, either scanning for different types of error each time, or to cross-check results. The cost of analysing each paper ranges from 15 cents to a few dollars, depending on the length of the paper and the series of prompts used.
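To make the multi-pass idea concrete, here is a minimal Python sketch of one way such a loop could be organised. It is not either project's actual code: the category list, the prompt wording, and the names `build_prompt`, `check_paper` and `stub_model` are all hypothetical, and the model is a stand-in function rather than a real LLM call.

```python
from typing import Callable, Dict, List

# Hypothetical categories, based on the kinds of error the article mentions.
ERROR_TYPES = ["factual", "calculation", "methodology", "referencing"]

def build_prompt(paper_text: str, error_type: str) -> str:
    """Tell the model what it is reading and which kind of error to hunt for."""
    return (
        "You are reviewing a research paper.\n"
        f"Look only for {error_type} errors and list each one on its own line.\n"
        "If you find none, reply with the single word NONE.\n\n"
        f"PAPER:\n{paper_text}"
    )

def check_paper(paper_text: str, model: Callable[[str], str]) -> Dict[str, List[str]]:
    """Run one pass per error type and collect the model's findings."""
    findings: Dict[str, List[str]] = {}
    for error_type in ERROR_TYPES:
        reply = model(build_prompt(paper_text, error_type)).strip()
        if not reply or reply == "NONE":
            findings[error_type] = []
        else:
            findings[error_type] = [
                line.strip() for line in reply.splitlines() if line.strip()
            ]
    return findings

def stub_model(prompt: str) -> str:
    """Stand-in for a real reasoning model (assumption): flags one made-up issue."""
    if "calculation errors" in prompt:
        return "Table 2: product of the two stated values is off by a factor of ten"
    return "NONE"

report = check_paper("(extracted paper text would go here)", stub_model)
for error_type, issues in report.items():
    print(error_type, issues)
```

Each flagged item in a report like this would still be only a candidate error until a human with subject expertise confirms it.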

The rate of false positives, instances when the AI claims an error where there is none, is a major hurdle. Currently, the Black Spatula Project’s system is wrong about an error around 10% of the time, says Gulloso. Each alleged error must be checked with experts in the subject, and finding them is the project’s greatest bottleneck, says Steve Newman, the software engineer and entrepreneur who founded the Black Spatula Project.

So far, Schlicht’s YesNoError team has quantified the false positives in only around 100 mathematical errors that the AI found in an initial batch of 10,000 papers. Of the 90% of authors who responded to Schlicht, all but one agreed that the error detected was valid, he says. Eventually, YesNoError is planning to work with ResearchHub, a platform which pays PhD scientists in cryptocurrency to carry out peer review. When the AI has checked a paper, YesNoError will trigger a request to verify the results, although this has not yet started.
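The figures in that check can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: it treats "around 100" as exactly 100 for illustration, and it says nothing about the authors who did not reply.

```python
# Figures reported by Schlicht, with "around 100" taken as exactly 100 (assumption).
flagged_errors = 100
response_rate = 0.90

responders = round(flagged_errors * response_rate)  # authors who replied
disputed = 1                                        # "all but one agreed"
confirmed = responders - disputed

# Share of responding authors who disputed the flagged error.
disputed_share = disputed / responders

print(f"responders: {responders}, confirmed: {confirmed}")
print(f"disputed among responders: {disputed_share:.1%}")
```

On these assumptions, 89 of 90 responding authors confirmed the error, so about 1% of the verified flags were disputed. The roughly ten flags with no reply remain unverified either way.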

False positives

