Grok 4 Performance Benchmarks

News

xAI hired gig workers to boost Grok on a key AI leaderboard and 'beat' Anthropic's Claude in coding

Internal docs show xAI paid contractors to "hillclimb" Grok's rank on a coding leaderboard above Anthropic's Claude.

China’s Kimi K2 Could Be the Next DeepSeek Moment

“China’s Kimi K2 is having its mini DeepSeek moment: it is now #14 on OpenRouter today, ahead of Grok 4 and GPT-4.1,” Deedy ...

CDOTrends · 8h

New AI Models Push the Envelope Again

As reported by TechCrunch, Grok 4 scored 25.4% on Humanity’s Last Exam without “tools,” outperforming Google’s Gemini 2.5 Pro ...

17h

Grok 4 leapfrogs Claude and DeepSeek in LLM rankings, despite safety concerns

Grok 4 by xAI was released on July 9, and it's surged ahead of competitors like DeepSeek and Claude at LMArena, a leaderboard ...

Grok 4 benchmark results: Tops math, ranks second in coding

Grok 4 is a huge leap from Grok 3, but how good is it compared to other models in the market, such as Gemini 2.5 Pro? We now ...

Modern Engineering Marvels on MSN · 1d

Pentagon’s Grok Gamble: Can Musk’s AI Outpace Rivals and Outrun Controversy?

Is the Pentagon’s $200 million bet on Grok a masterstroke for American AI supremacy, or a high-stakes experiment with untested technology and ethics? The Department of Defense’s selection of xAI’s ...

Is Grok 4 the smartest AI model yet? Why Elon Musk’s new model is winning praise

Grok 4 is leading several notable benchmarks, narrowly beating seasoned players like OpenAI and Google. Since its launch, ...

Cryptopolitan on MSN · 2d

Chinese AI model Kimi K2 undercuts rivals with low prices

In a fresh twist to the growing AI rivalry, Alibaba-backed startup Moonshot has unveiled its latest large language model, ...

Elon Musk's xAI is already shockingly massive

CEO Elon Musk’s year has felt more like a high-stakes reality show. The EV giant finally rolled out its Robotaxis, though not ...

Another Chinese AI model is turning heads

Alibaba-backed startup Moonshot released its Kimi K2 model late Friday night, touting performance that rivals many U.S. ...

TechJuice · 3d

Alibaba-Backed Kimi K2 Disrupts AI Market with Open-Source Power

Moonshot’s Kimi K2 AI model beats GPT-4.1 and Claude in coding benchmarks, offers open-source access, and slashes costs ...
