

Mathematicians issue a major challenge to AI: Show us your work

Frustrated by the AI industry’s claims of proving math results without offering transparency, a team of leading academics has proposed a better way

Image: a close-up of a human eye on the screen of a vintage computer. Credit: Alfred Gescheidt/Getty Images

The race is on to develop an artificial intelligence that can do pure mathematics, and top mathematicians just threw down the gauntlet with an exam of actual, unsolved problems that are relevant to their research. The team is giving AI systems a week to solve the problems.

The effort, called “First Proof,” is detailed in a preprint that was posted last Thursday.

“These are brand-new problems that cannot be found in any LLM’s [large language model’s] training data,” says Andrew Sutherland, a mathematician at the Massachusetts Institute of Technology, who was not involved with the new exam. “This seems like a much better experiment than any I have seen to date,” he adds, referring to the difficulty in testing how well AIs can do math.




The AI industry has become fixated on pure mathematics. Because mathematical proofs follow a checkable sequence of logical steps, their conclusions are true or false beyond any subjective measure. And that may offer a better way to compare LLMs’ prowess than evaluating how convincing their poetry is. Start-ups dedicated to AI for mathematics have recently recruited a number of high-profile mathematicians.
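
To make the “checkable” point concrete, here is a toy example of our own (it is not one of the First Proof problems): a tiny lemma written in the Lean proof assistant, whose kernel verifies every step and either accepts the proof or rejects it outright.

    -- Toy illustration, not a First Proof problem: a tiny lemma whose proof
    -- Lean checks mechanically, step by step.
    theorem and_swap (p q : Prop) : p ∧ q → q ∧ p := by
      intro h           -- assume the hypothesis p ∧ q
      exact ⟨h.2, h.1⟩  -- return it with the two components swapped

Research lemmas are far harder than this, but the principle is the same: the verdict comes from the proof checker, not from anyone’s judgment.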

These efforts have had some early successes: In 2025 an advanced version of Google’s Gemini Deep Think achieved a gold-level score on the International Mathematical Olympiad, an exam for prodigious high schoolers. And in the past few months, an AI has solved multiple “Erdős problems”—a trove of challenges set by the late mathematician Paul Erdős. The start-up Axiom Math made headlines last week for successfully tackling several research-level (though far from groundbreaking) math questions.

But none of these tests were controlled experiments. Olympiad problems aren’t research questions. And LLMs seem to have a tendency to find existing, forgotten proofs deep in the mathematical literature and to present them as original. One of Axiom Math’s recent proofs, for example, turned out to be a misrepresented literature search result.

And some math results that have come from tech companies have raised eyebrows among academics for other reasons, says Daniel Spielman, a professor at Yale University and one of the experts behind the new challenge. “Almost all of the papers you see about people using LLMs are written by people at the companies that are producing the LLMs,” Spielman says. “It comes across as a bit of an advertisement.”

First Proof is an attempt to clear the smoke. To set the exam, 11 mathematical luminaries—including one Fields Medal winner—contributed math problems that had arisen in their research. The experts also uploaded proofs of the solutions but encrypted them. The answers will decrypt just before midnight on February 13.
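
One simple way to arrange that kind of timed reveal is to publish the ciphertext now and release the decryption key only at the deadline. A minimal Python sketch of that idea follows; it is not the project’s actual code, and the library choice, file names, and key handling are illustrative assumptions.

    # Illustrative sketch only: encrypt a proof now, release the key at the
    # deadline. File names and key handling are assumptions for demonstration.
    from cryptography.fernet import Fernet  # third-party "cryptography" package

    # Before the exam: the problem setter encrypts the written-up proof and
    # posts only the ciphertext alongside the problem statement.
    key = Fernet.generate_key()  # kept secret until the deadline
    with open("lemma_proof.txt", "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())
    with open("lemma_proof.enc", "wb") as f:
        f.write(ciphertext)

    # At the deadline: the key is published, so anyone can decrypt the file
    # and confirm the solution was written before the AIs saw the problem.
    recovered = Fernet(key).decrypt(ciphertext)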

None of the proofs is earth-shattering. They’re “lemmas,” a word mathematicians use for the myriad small theorems they prove on the path to a more significant result. Lemmas aren’t typically published as stand-alone papers.

But if an AI were to solve these lemmas, it would demonstrate what many mathematicians see as the technology’s near-term potential: a helpful tool to speed up the more tedious parts of math research.

“I think the greatest impact AI is going to have this year on mathematics is not by solving big open problems but through its penetration into the day-to-day lives of working mathematicians, which mostly has not happened yet,” Sutherland says. “This may be the year when a lot more people start paying attention.”
