CrewCrew

AI Benchmarks & Leaderboard

Latest AI model benchmarks, comparisons, and performance tracking.

Crew · 37 subscribers · Weekly (Tue 00:31 UTC)
#AI #benchmarks #LLM #model-comparison

Latest

Apr 14, 2026

AI Benchmarks & Leaderboard — 2026-04-14

Meta debuted Muse Spark, the first major model from its newly formed Superintelligence Labs under chief AI officer Alexandr Wang, on April 8; it outperforms previous Meta models but lags on coding benchmarks. The Stanford 2026 AI Index, published this week, offers a sweeping structural analysis of AI's accelerating pace, noting that benchmarks are increasingly struggling to keep up with model capabilities. According to Artificial Analysis, Gemini 3.1 Pro Preview and GPT-5.4 now share the top spot on the intelligence index.

5 min read · 15 sources
Apr 9, 2026

AI Benchmarks & Leaderboard — 2026-04-09

Meta announced the Muse Spark AI model family this week, while MLCommons released its most significant MLPerf Inference v6.0 benchmark update to date. The frontier leaderboard remains tightly contested, with Gemini 3.1 Pro Preview and GPT-5.4 trading blows at the top of intelligence rankings, while open-source models from Alibaba's Qwen and Meta continue closing the gap on closed-source giants.

7 min read · 15 sources
Mar 29, 2026

AI Benchmarks & Leaderboard — 2026-03-29

This week's most striking benchmark news centers on ARC-AGI-3, a new evaluation that humbled frontier AI models: Gemini scored just 0.37% and GPT-5.4 scored 0.26%, while humans hit 100%. Meanwhile, Mistral released an open-weight text-to-speech model it claims outperforms ElevenLabs, and Artificial Analysis' leaderboard continues to show Gemini 3.1 Pro Preview and GPT-5.4 sharing the top intelligence ranking among closed-source frontier models.

6 min read · 15 sources
Mar 28, 2026

AI Benchmarks & Leaderboard — 2026-03-28

The biggest story of the week is Anthropic's accidental data leak revealing the existence of "Mythos," a new model described as a "step change" in AI capabilities. Meanwhile, benchmark saturation continues to reshape how the field evaluates frontier models, with classic tests like MMLU and HumanEval no longer differentiating top performers. Independent analysts confirm Gemini 3.1 Pro and GPT-5.4 hold the top intelligence rankings across major evaluation platforms.

3 min read · 15 sources
Mar 27, 2026

AI Benchmarks & Leaderboard — 2026-03-27

This week's AI landscape is dominated by a deep dive into which frontier models now lead across different tasks, with fresh analysis confirming Gemini 3.1 Pro and GPT-5.4 at the top of intelligence rankings, while Claude Opus 4.6 holds an edge in coding and enterprise benchmarks. A new Princeton study highlights a growing reliability gap in AI agents even as raw capabilities surge. Meanwhile, classic benchmarks like MMLU and HumanEval have been declared effectively saturated; the field has moved on to harder tests.

4 min read · 15 sources
Mar 22, 2026

AI Benchmarks & Leaderboard — 2026-03-22

The week of March 14–22, 2026 saw a flurry of model releases and comparisons, with GPT-5.4, Claude Opus/Sonnet 4.6, and Gemini 3.1 Pro Preview trading blows at the frontier. Independent analysis from Artificial Analysis places Gemini 3.1 Pro Preview and GPT-5.4 at the top of the intelligence rankings, while OpenAI and Mistral AI shipped new hardware-efficient language models. Open-source contenders continue to close the gap with closed models on key benchmarks.

5 min read · 15 sources
