Official Session Summary
Pulled from the live conference page.
Static benchmarks tell only part of the story. At OpenRouter, we observe a different reality—one shaped by billions of real-world requests, rapidly evolving models, and agents operating in production. In this talk, we’ll explore how the shift toward agent-driven workflows is redefining how we evaluate model performance. We’ll look at data from across the stack to understand trends like exploding token usage, longer context windows, and the rise of tool-calling systems. Along the way, we’ll highlight what actually matters in practice: reliability, cost, and the ability for models to take meaningful actions. Beyond benchmarks, you’ll see how real-world usage reveals the true capabilities—and limitations—of modern AI systems.
Speaker Background
Quick context on the person or people on stage.
Founding engineer at OpenRouter, working close to real-world model traffic and the practical tradeoffs among inference, reliability, tool use, and cost.
Why This Slot Matters
A compact framing layer for navigating the conference.
This is one of the more substantive abstract-backed sessions on the schedule. Open it when you need enough context to decide whether to stay in the room.