Official Session Summary
Pulled from the live conference page.
Given the same compute budget, does a single frontier model outperform a system of specialized models? Our research says no. We trained three task-specific models for the subtasks, and under the same budget the multi-model system wins: every frontier model we pair it with hits #1 on SWE-Bench Pro, 15% cheaper and 28% faster than running alone, with just WarpGrep. As frontier models saturate tasks, those tasks should move to smaller models with custom inference engines. The expensive model reasons; the cheap models do the mechanical work. This talk covers the CUDA kernels, RL training, and speculative decoding behind that split, and why it's the natural way intelligence organizes under compute constraints.
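The expensive-model/cheap-model split the abstract describes is the core idea behind speculative decoding: a cheap draft model proposes several tokens, and the expensive target model only verifies them. A minimal greedy sketch, with toy `target`/`draft` functions invented purely for illustration (the talk's actual inference engine is not public):

```python
from typing import Callable, List

Token = str
Model = Callable[[List[Token]], Token]  # greedy next-token predictor

def speculative_decode(target: Model, draft: Model,
                       prompt: List[Token], k: int, max_new: int) -> List[Token]:
    """Cheap draft model speculates k tokens; expensive target model
    verifies them (a single batched pass in a real engine). The longest
    agreeing prefix is accepted, plus one corrected target token, so the
    output is identical to pure target-model greedy decoding."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) Draft speculates k tokens ahead of the current context.
        ctx, spec = list(out), []
        for _ in range(k):
            spec.append(draft(ctx))
            ctx.append(spec[-1])
        # 2) Target checks each speculated position; stop at first mismatch.
        n_ok = 0
        for i in range(k):
            if target(out + spec[:i]) == spec[i]:
                n_ok += 1
            else:
                break
        out += spec[:n_ok]
        if len(out) - len(prompt) >= max_new:
            break
        # 3) Target emits one token itself, guaranteeing progress.
        out.append(target(out))
    return out[len(prompt):][:max_new]

# Toy models: the "frontier" target cycles a fixed pattern; the draft
# agrees with it except at every third context length.
pattern = ["the", "quick", "brown", "fox", "jumps"]
def target(ctx: List[Token]) -> Token:
    return pattern[len(ctx) % len(pattern)]
def draft(ctx: List[Token]) -> Token:
    return "???" if len(ctx) % 3 == 0 else pattern[len(ctx) % len(pattern)]

result = speculative_decode(target, draft, ["<s>"], k=3, max_new=8)
# result matches pure target-model greedy decoding exactly.
```

The verification step is why the target model's cost drops: when the draft is usually right, the target does one check per batch of k tokens instead of one full forward pass per token, without changing the output.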
Speaker Background
Quick context on the person or people on stage.
Founder of Morph LLM, exploring specialized model systems and what multi-model architectures can do better than a single generalist.
Why This Slot Matters
A compact framing layer for navigating the conference.
This is one of the more substantive, abstract-backed sessions on the schedule; worth a read when you need enough context to decide whether to stay in the room.