Changelog

What's New

Stay up to date with the latest features, improvements, and fixes.

v0.7.0 • 2026-05 • beta

Agentic Parallelism — Beta

•Agentic Roundtable now in beta — multi-model parallel execution across task phases
•Phase results streamed in real time via WebSocket
•Agentic mode accessible via API for Pro users
•Improved phase synthesis: judge model now weighs phase-level contributions
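
The real-time streaming of phase results could look roughly like the sketch below. It assumes that models within one phase run concurrently and that each finished result becomes one JSON event; the changelog does not specify the event schema or the scheduling, so `stream_phase`, `collect_events`, `fake_model`, and the field names are all hypothetical.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, List

# Hypothetical signature for a model call: (model, phase) -> output text.
ModelFn = Callable[[str, str], Awaitable[str]]


async def stream_phase(phase: str, models: List[str], call_model: ModelFn) -> AsyncIterator[str]:
    """Run every model for one phase concurrently; yield a JSON event per finished model."""

    async def run(model: str) -> dict:
        return {"phase": phase, "model": model, "output": await call_model(model, phase)}

    tasks = [asyncio.create_task(run(m)) for m in models]
    # as_completed hands results back in completion order, so fast models report first.
    for finished in asyncio.as_completed(tasks):
        yield json.dumps(await finished)


async def collect_events(phase: str, models: List[str], call_model: ModelFn) -> List[dict]:
    """Drain the stream into a list; a real server would send each event over the socket."""
    return [json.loads(event) async for event in stream_phase(phase, models, call_model)]


async def fake_model(model: str, phase: str) -> str:
    """Offline stand-in for a real model call (assumption, not the product's API)."""
    await asyncio.sleep(0.01)
    return f"{model} finished {phase}"
```

A server would forward each yielded string to the connected WebSocket client instead of collecting the events into a list.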

v0.6.0 • 2026-03 • new

Judge-N Process Roundtable

•Shipped new token-efficient Roundtable mode: Judge-N Process
•Replaces expensive all-vs-all cross-critique with a single judge evaluation pass
•70% reduction in token usage per race versus cross-critique, with equivalent ranking accuracy
•Judge-N is now the default mode; cross-critique available as 'intense' option
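
To see why a single judge pass is cheaper, it helps to count model calls per race. The sketch below assumes that all-vs-all cross-critique means every model reviews every other model's response, followed by one judge call; that reading and both function names are assumptions. It counts calls, not tokens, so it illustrates the scaling and does not reproduce the 70% figure.

```python
def cross_critique_calls(n_models: int) -> int:
    """Model calls for one race under the assumed all-vs-all critique shape."""
    responses = n_models
    critiques = n_models * (n_models - 1)  # each model reviews every other response
    judge = 1
    return responses + critiques + judge


def judge_n_calls(n_models: int) -> int:
    """Model calls for one race when a single judge pass replaces the critiques."""
    responses = n_models
    judge = 1
    return responses + judge
```

With four models this gives 17 calls for cross-critique and 5 for Judge-N. The critique term grows quadratically with the number of models, while the judge pass stays constant.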

v0.5.0 • 2026-02 • new

Agentic Mode — Development Begins

•Started building Agentic Roundtable: multi-model collaboration for complex tasks
•Architecture: parallel initial responses → critique phase → synthesis
•Early internal testing with coding and reasoning tasks
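
The three-phase architecture above can be sketched with `asyncio.gather`. This is a minimal illustration under assumptions: `run_agentic`, `fake_model`, the prompt wording, and the choice to have every model critique the full set of drafts are hypothetical and not taken from the actual implementation.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature for a model call: (model, prompt) -> output text.
ModelFn = Callable[[str, str], Awaitable[str]]


async def run_agentic(task: str, models: List[str], call_model: ModelFn,
                      synthesizer: str) -> Dict[str, object]:
    """Parallel initial responses, then a critique phase, then synthesis."""
    # Phase 1: every model answers the task concurrently.
    drafts = list(await asyncio.gather(*(call_model(m, task) for m in models)))

    # Phase 2: every model critiques the full set of drafts concurrently.
    joined = "\n---\n".join(drafts)
    critique_prompt = f"Task: {task}\nCritique these drafts:\n{joined}"
    critiques = list(await asyncio.gather(*(call_model(m, critique_prompt) for m in models)))

    # Phase 3: one model merges drafts and critiques into a final answer.
    synthesis_prompt = (
        f"Task: {task}\nDrafts:\n{joined}\nCritiques:\n" + "\n---\n".join(critiques)
    )
    final = await call_model(synthesizer, synthesis_prompt)
    return {"drafts": drafts, "critiques": critiques, "final": final}


async def fake_model(model: str, prompt: str) -> str:
    """Offline stand-in for a real model call (assumption, not the product's API)."""
    return f"{model}-out"
```

Each phase waits for the previous one to finish, so the parallelism is within a phase, not across phases.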

v0.4.0 • 2025-11 • new

Sage — Intelligent Query Routing

•Shipped Sage: single-query AI routing powered by live rankings
•NLP classifier routes queries to best model per category automatically
•Sage UI launched — chat interface with model attribution and routing transparency
•Bayesian win-rate smoothing applied to rankings for cold-start stability
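
One common form of Bayesian win-rate smoothing adds pseudo-races from a prior, which keeps a model with very few races from jumping to the top of a category. The sketch below uses that form and then routes to the best smoothed rate. The prior values, `smoothed_win_rate`, `route`, and the shape of `stats` are assumptions; the changelog does not state which prior Sage uses.

```python
from typing import Dict, Tuple


def smoothed_win_rate(wins: int, races: int,
                      prior_rate: float = 0.5, prior_strength: float = 10.0) -> float:
    """Posterior-mean win rate: the prior acts like `prior_strength` extra races."""
    return (wins + prior_rate * prior_strength) / (races + prior_strength)


def route(category: str, stats: Dict[str, Dict[str, Tuple[int, int]]]) -> str:
    """Pick the model with the highest smoothed win rate in a category.

    `stats[category][model]` is a hypothetical (wins, races) pair.
    """
    candidates = stats[category]
    return max(candidates, key=lambda m: smoothed_win_rate(*candidates[m]))
```

For example, a newcomer with 1 win in 1 race has a raw rate of 1.0 but a smoothed rate of 6/11, about 0.55, while a model with 80 wins in 100 races scores 85/110, about 0.77, so the established model is routed to first.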

v0.3.0 • 2025-11 • improvement

Robust Roundtable Mode

•Hardened Roundtable pipeline: retry logic, phase timeouts, graceful degradation
•Cross-critique mode: models evaluate each other's responses before judge scoring
•Workflow state persisted to Redis for real-time progress tracking
•Added WebSocket streaming for live phase updates
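
The hardening described in the first bullet (retry logic, phase timeouts, graceful degradation) can be sketched as a wrapper around a single phase call. The defaults, the exponential backoff, and the names `run_phase` and `flaky` are assumptions for illustration; the pipeline's actual policy is not documented here.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_phase(call: Callable[[], Awaitable[str]], *, timeout_s: float = 30.0,
                    retries: int = 2, backoff_s: float = 0.5) -> Optional[str]:
    """Try a phase call with a timeout; retry on failure; return None if all attempts fail."""
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt < retries:
                # Exponential backoff between attempts.
                await asyncio.sleep(backoff_s * 2 ** attempt)
    # Graceful degradation: the caller continues without this model's output.
    return None


def flaky(failures: int) -> Callable[[], Awaitable[str]]:
    """Test double: fails `failures` times with ConnectionError, then succeeds."""
    state = {"left": failures}

    async def call() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("transient")
        return "ok"

    return call
```

Returning `None` instead of raising lets a race finish with the models that did respond, which is one way to read "graceful degradation".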

v0.2.0 • 2025-10 • new

Rankings Launch

•Shipped live model rankings — win rates computed from real Roundtable races
•Categories: coding, reasoning, math, creative, conversation, and more
•Rankings update in real time after every completed race
•Family-level aggregation: view rankings by model family (GPT, Claude, Gemini, etc.)
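
Family-level aggregation can be sketched by pooling wins and races across the models in a family. The prefix mapping, the pooling rule, and the names `family_of` and `family_win_rates` are assumptions; the changelog does not say how families are defined or weighted.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical mapping from model-name prefix to family label.
FAMILY_PREFIXES = {"gpt": "GPT", "claude": "Claude", "gemini": "Gemini"}


def family_of(model: str) -> str:
    """Map a model name to its family by prefix; unknown names fall into 'Other'."""
    name = model.lower()
    for prefix, family in FAMILY_PREFIXES.items():
        if name.startswith(prefix):
            return family
    return "Other"


def family_win_rates(stats: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Pool (wins, races) per family and return wins / races for each family."""
    totals = defaultdict(lambda: [0, 0])
    for model, (wins, races) in stats.items():
        family = family_of(model)
        totals[family][0] += wins
        totals[family][1] += races
    return {family: w / r for family, (w, r) in totals.items() if r > 0}
```

Pooling weights each model by how many races it has run. Averaging the per-model rates instead would give a rarely raced model the same influence as a heavily raced one.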

v0.1.0 • 2025-09 • launch

RoundtableCI — Initial Build

•Started building RoundtableCI: neutral AI benchmarking via live races
•Core Roundtable engine: multiple models respond, judge model scores, rankings update
•First internal races run across GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro
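
The core loop described above (models respond, a judge scores, rankings update) can be sketched in a few lines. The function `run_race`, the callable signatures, and the single-winner tally are hypothetical simplifications, not the engine's real interface.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures: respond(model, prompt) -> text,
# judge(prompt, responses) -> score per model.
RespondFn = Callable[[str, str], str]
JudgeFn = Callable[[str, Dict[str, str]], Dict[str, float]]


def run_race(prompt: str, models: List[str], respond: RespondFn, judge: JudgeFn,
             tallies: Dict[str, Tuple[int, int]]) -> str:
    """Run one race and update `tallies` in place as model -> (wins, races)."""
    responses = {m: respond(m, prompt) for m in models}
    scores = judge(prompt, responses)
    winner = max(scores, key=scores.get)
    for m in models:
        wins, races = tallies.get(m, (0, 0))
        tallies[m] = (wins + (1 if m == winner else 0), races + 1)
    return winner
```

A win rate for the rankings is then `wins / races` per model, computed from the tallies after each completed race.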