Artificial intelligence systems now breeze through many academic tests that once challenged both machines and people. That ...
Tests that once challenged advanced AI models are now being solved with ease, making it harder for researchers to pinpoint what current systems are actually capable of.
A global team developed Humanity’s Last Exam, a rigorous new test built to expose gaps in today’s most advanced AI models.
In updated results published on the Humanity's Last Exam website, the Gemini 3.1 Pro model achieved 45.9 percent accuracy, with a ...
As artificial intelligence systems rapidly outgrow traditional academic benchmarks, researchers have unveiled an ambitious new test designed to probe the true limits of machine intelligence.
Researchers debut "Humanity’s Last Exam," a benchmark of 2,500 expert-level questions that current AI models are failing.