Live Rankings
12 Models, 8 Categories

Mixed or uncategorized tasks

"Live in this category" means at least one roundtable has been recorded here. "Static preset" is the Sage benchmark file used for this category until roundtables arrive. "Other categories" averages that provider's scores across the tabs where it does appear. Rolling win rates are not shares of a single roundtable, so they do not add up to 100%.
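The note above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a rolling win rate is a model's wins divided by the roundtables it took part in, and that "Other categories" is a plain mean over the tabs where the model has a score. The function names, data layout, and sample numbers are hypothetical and are not taken from the site.

```python
from collections import defaultdict


def rolling_win_rates(roundtables):
    """Assumed definition: wins / roundtables the model took part in."""
    appearances = defaultdict(int)
    wins = defaultdict(int)
    for participants, winner in roundtables:
        for model in participants:
            appearances[model] += 1
        wins[winner] += 1
    return {model: wins[model] / appearances[model] for model in appearances}


def other_categories_average(scores_by_category, model):
    """Assumed definition: mean score over the tabs where the model appears."""
    values = [
        scores[model]
        for scores in scores_by_category.values()
        if model in scores
    ]
    return sum(values) / len(values) if values else None


# Hypothetical sample: each roundtable is (participants, winner).
roundtables = [
    (("GPT", "Claude"), "GPT"),
    (("GPT", "Gemini"), "GPT"),
    (("Claude", "Gemini"), "Claude"),
]
rates = rolling_win_rates(roundtables)
total = sum(rates.values())  # 1.0 + 0.5 + 0.0 = 1.5, i.e. 150%, not 100%

# Hypothetical per-tab scores; a model may be missing from some tabs.
scores_by_category = {
    "Coding": {"GPT": 80, "Claude": 90},
    "Math": {"GPT": 70},
}
gpt_other = other_categories_average(scores_by_category, "GPT")
gemini_other = other_categories_average(scores_by_category, "Gemini")
```

Because each rate has its own denominator (that model's appearances), the rates are independent fractions rather than slices of one total, which is why they need not sum to 100%. A model with no scores in any tab yields `None` here, mirroring a "No data" entry.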

Rank  Model        Queries  Status
1     GPT          0        No data
2     Claude       0        No data
3     Gemini       0        No data
4     Grok         0        No data
5     DeepSeek     0        No data
6     Qwen         0        No data
7     Llama        0        No data
8     Kimi K2.5    0        No data
9     Mistral      0        No data
10    Phi-4        0        No data
11    Command R+   0        No data
12    IBM Granite  0        No data

Performance Trend

Top 3 models over last 4 weeks

Score Comparison

All models in General