
Secret 2025 Math Summit: Experts Battle to Outwit AI in Breakthrough Talks!

By Cameron Aldridge


In mid-May, a secretive gathering of elite mathematicians took place. Thirty top mathematicians from around the globe, including some from the United Kingdom, assembled in Berkeley, California. Their objective was to challenge a sophisticated “reasoning” chatbot with complex mathematical problems they had crafted to assess its capabilities. After two full days of posing professor-level problems, the participants were astounded to find that the bot could solve some of the hardest problems known to have solutions. “Some of my colleagues are actually proclaiming that these models are nearing what might be considered mathematical brilliance,” stated Ken Ono, a mathematician at the University of Virginia who played a leading role and served as a judge at the gathering.

The chatbot in question runs on o4-mini, a cutting-edge reasoning large language model (LLM) developed by OpenAI and trained to perform complex deductive tasks. Google’s counterpart, Gemini 2.5 Flash, showcases similar capabilities. Like the earlier models behind ChatGPT, these systems generate text by predicting the next word in a sequence. However, o4-mini and its peers are leaner, more agile LLMs trained on specialized data sets with more substantial human reinforcement, enabling them to tackle deeper and more intricate mathematical problems than their predecessors.

Previously, OpenAI had commissioned Epoch AI, a nonprofit that benchmarks LLMs, to formulate 300 mathematical questions with unpublished answers to gauge the progress of o4-mini. While traditional LLMs can correctly answer many complex mathematical questions, when Epoch AI posed these new and unfamiliar ones, the most successful models solved fewer than 2 percent of them, suggesting a lack of genuine reasoning capability. o4-mini, however, was set to show a stark contrast in performance.


Elliot Glazer, fresh from earning his Ph.D. in mathematics, was recruited by Epoch AI for a new initiative called FrontierMath, which began in September 2024. The project involved collecting unique mathematical problems across different levels of complexity, ranging from undergraduate exercises to research-level challenges. By April 2025, Glazer found that o4-mini could solve about 20 percent of these questions. He then escalated to a fourth tier, consisting of problems that would challenge even seasoned academic mathematicians. Only a select few people globally could craft, much less solve, such challenges. The participating mathematicians were required to sign confidentiality agreements and communicate solely via the encrypted messaging app Signal, to avoid any potential data leaks that could inadvertently train the LLM and thus corrupt the data set.

Each unsolvable problem would net the proposing mathematician a $7,500 reward. The group progressed slowly in generating such questions. To expedite the process, Epoch AI organized a physical meeting on the weekend of May 17-18. The attendees, divided into groups of six, spent two days creating problems that they could solve but which would stump the AI reasoning bot.

By Saturday night, frustration was apparent in Ono’s demeanor as the bot’s unforeseen mathematical capabilities were disrupting their efforts. “I posed a problem recognized as an open question in number theory—a solid Ph.D.-level problem,” he explained. Within 10 minutes, Ono watched in silence as the bot demonstrated its solution process in real-time. Initially, it took two minutes to review and absorb the relevant literature. It then suggested tackling a simpler version of the problem first, and shortly after, it declared readiness to address the more complex original problem. Within five minutes, it delivered not only a correct solution but a cheeky one. “It even ended by saying, ‘No citation necessary because the mystery number was computed by me!’” Ono, who also consults for Epoch AI, added.


Defeated, Ono quickly messaged the other participants early Sunday, expressing his astonishment at competing with an LLM of such caliber. “I’ve never witnessed such reasoning in models before. It’s akin to what a scientist does. It’s unnerving,” he admitted.

Despite the challenges, the researchers did manage to identify 10 questions that the bot could not solve. Still, they remained shocked at how rapidly AI had advanced within just a year. Ono compared it to collaborating with a “strong colleague.” Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and an early adopter of AI in mathematics, remarked, “This mirrors what an excellent graduate student would accomplish—actually, even more.”

The bot also proved significantly faster than a typical professional mathematician, solving problems in minutes that would take an expert weeks or months.

While the experience was exhilarating, Ono and He expressed concern that o4-mini’s results might be trusted too readily. “There’s proof by induction, proof by contradiction, and then there’s proof by intimidation,” He explained. “If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation.”

By the meeting’s conclusion, the group pondered the future role of mathematicians, contemplating the eventual “tier five”—problems not even the best human mathematicians could solve. If AI reaches this level, the role of mathematicians might dramatically shift towards posing questions and collaborating with reasoning bots to uncover new mathematical truths, akin to interactions between a professor and graduate students. Ono predicts that fostering creativity in higher education will be crucial to sustain the field of mathematics for future generations.


“I’ve been cautioning my colleagues against declaring that generalized artificial intelligence will never arrive—it’s not just a computer,” Ono noted. “I don’t wish to contribute to the hysteria, but in some respects, these large language models are already surpassing most of our top graduate students globally.”
