How to Use This
Tap a talk to open its own page. Speaker chips jump to the speaker index. Built for scanning, not brochure prose.
I used the live AI Engineer Miami schedule plus the visible speaker roster, then rebuilt it in a Focus.AI Labs style. Breaks and lunch stay lightweight; anything with a listed abstract or speaker gets a detail page.
Day 1 / Monday
Opening day: Dax Raad, Dexter Horthy, Kent C. Dodds, Rita Kozlov, Jesse Willman, and more.
Welcome to AI Engineer Miami
Opening Remarks
You Don't Have Any Good Ideas
Everything We Got Wrong About RPI
No Vibes Allowed is the second most-watched talk of the 2025 AI Engineer Code Summit, and among the top five most-watched AI Engineer talks of all time. If you've seen it, you know I'm a big fan of Research / Plan / Implement. For the last six months, we've been working with large orgs (hundreds or thousands of engineers) to adopt advanced context engineering techniques for coding agents. We got a lot wrong along the way, and we're here to share lessons from scaling these techniques to large teams working on a broad variety of complex codebases, with hundreds of repos spanning platform, product, DevOps, and more. We'll cover the shortcomings of previous approaches; how we now break research and planning into more discrete steps; why we use control flow instead of prompting to manage workflows; a fresh perspective on "what is PR slop, where does it come from, and how do we combat it"; and a whole lot more.
What Does Good Taste in DX Look Like in the Age of Agents?
As coding agents become part of the development workflow, great developer experience is no longer just about features, speed, or abstraction—it is about taste. What should feel seamless, what should stay explicit, and how do you build tools at the right level of abstraction so that agents can use them well immediately? This panel will explore what good taste in developer experience looks like in the age of agents: the judgment to create developer experiences that are intuitive, legible, and trustworthy for humans and agents alike.
Morning Break
From Local to Remote: Working with Coding Agents over SSH
Many engineers have already adopted terminal-based coding agents. This talk is about the next step: moving those workflows off your laptop and onto remote machines. I’ll show why I like running coding agents over SSH: remote machines stay up 24/7, don’t compete with local resources, make persistent sessions easier, and create a better environment for long-running or multi-agent work. We’ll cover remote machine choices, tmux workflows, CLI agent options, terminal-friendly browsing and screenshot tools, TUI development, editors, and review workflows. If you already use coding agents in the terminal, this talk is about how to make that setup more durable, scalable, and agent-native.
The Rise of AI Agents in the Wild
Static benchmarks tell only part of the story. At OpenRouter, we observe a different reality—one shaped by billions of real-world requests, rapidly evolving models, and agents operating in production. In this talk, we’ll explore how the shift toward agent-driven workflows is redefining how we evaluate model performance. We’ll look at data from across the stack to understand trends like exploding token usage, longer context windows, and the rise of tool-calling systems. Along the way, we’ll highlight what actually matters in practice: reliability, cost, and the ability for models to take meaningful actions. Beyond benchmarks, you’ll see how real-world usage reveals the true capabilities—and limitations—of modern AI systems.
How to Embed AI Code Quality Gates in Your SDLC
AI is a force multiplier that turns weak standards into architectural chaos. As code review becomes the ultimate bottleneck, engineering teams must bridge the gap between human intuition and machine output. This talk introduces a holistic framework for designing quality-driven systems around AI coding. We'll explore how to codify development best practices into machine-readable guardrails, and walk through how to leverage context engineering at scale so your AI coding tools respect your system's design, preserving long-term maintainability without sacrificing the speed of the AI era. Attendees will learn how to embed quality gates for AI-generated code into the SDLC, and will leave with a practical methodology for encoding best practices into their workflow, where developers already operate (CLI, IDE, and Git).
Lunch
Software Development Now Costs Less Than Minimum Wage
The cost of software development has fallen to $10.42 an hour—less than minimum wage. A burger flipper at Macca's earns more. What does it mean to be a software developer when everyone in the world can develop software? Tools like Cursor have commoditised the knowledge and skill of software development, enabling non-developers to build and ship. In this talk, Geoffrey Huntley shares a cold, stark view of how AI is reshaping the unit economics of business. Drawing on a year of game theory around Ralph Loops, and conversations with venture capitalists in Australia, South Korea and San Francisco, he explores the K-shaped divergence: model-first companies operating as lean apex predators versus incumbents struggling through people transformation.
How to quantize models (without killing quality)
Four-bit quantization has a bad reputation for destroying model quality. While it’s true that post-training quantization in 4-bit integer formats makes models noticeably worse, new microscaling data formats like MXFP4 and NVFP4 deliver on the promise of fast low-precision inference without meaningful quality loss. This talk introduces these data formats along with a shift from quantization as a binary decision to quantization as a granular process with model-level considerations (quantization across weights, activations, KV cache, attention) and layer-level considerations (quantization of input, output, and hidden layers) to help you preserve quality while accessing improved performance and cost characteristics from low-precision inference.
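To make "quantization as a granular process" concrete, here is a small illustrative sketch (not from the talk) of block-wise fake quantization in the spirit of MXFP4: each block of 32 values shares one scale, and each value snaps to the nearest representable 4-bit float (E2M1) magnitude. The 32-element block, the E2M1 level grid, and max-abs scaling follow the published MX format; everything else is a simplified pedagogical model, not a production kernel.

```python
# Illustrative sketch of MXFP4-style block quantization (simplified).
# Each block of 32 values shares one scale; each value snaps to the
# nearest E2M1 (sign + 2 exponent + 1 mantissa bits) magnitude.

E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable magnitudes
BLOCK = 32

def fake_quantize(values):
    """Quantize then dequantize a flat list of floats, block by block."""
    out = []
    for i in range(0, len(values), BLOCK):
        block = values[i:i + BLOCK]
        amax = max(abs(v) for v in block)
        scale = amax / 6.0 if amax > 0 else 1.0  # map the block's max-abs to the top level
        for v in block:
            mag = min(E2M1_LEVELS, key=lambda q: abs(q - abs(v) / scale))
            out.append(mag * scale if v >= 0 else -mag * scale)
    return out

vals = [0.01 * k for k in range(-16, 16)]  # one 32-element block
deq = fake_quantize(vals)
err = max(abs(a - b) for a, b in zip(vals, deq))
```

The per-block scale is what separates microscaling from naive tensor-wide int4: outliers only coarsen their own 32-element neighborhood instead of the whole tensor.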
From Prompt to Production: Maximizing Value with Google's GenMedia Models
Unlock the product potential of Google’s latest generative media models in this focused deep dive. Move beyond the hype and discover how to extract tangible user value from cutting-edge models like Veo, Nano Banana, and Lyria.
From Tickets to PRs: Shipping a Governed Snowflake Ops Agent with LangGraph and MCP
Most AI agents never make it past the experiment phase, especially when they touch sensitive data and regulated workflows. At Pinterest, we built Agent Snowy, a LangGraph‑based agent that automates routine Snowflake data warehouse operational requests end‑to‑end, designed to cut median resolution time from hours down to minutes for supported flows. The agent takes requests from Slack and Jira ticket intake through to generating auditable SQL and GitHub PRs, all without direct write access to production. This talk will walk through how we wired LLMs, the Model Context Protocol (MCP), and existing CI/CD pipelines together, and the concrete guardrails we put in place to keep the system secure and compliant. Attendees will leave with a practical blueprint for turning their own routine operational tickets into safe, auditable agent workflows—without handing the keys to production over to an LLM.
Afternoon Break
Build a Free Agent
Most AI assistants are still just chats with tool access. This talk shows a different approach: using MCP as a personal runtime. I'll walk through how Kody uses `search`, `execute`, and `open_generated_ui` to discover capabilities, run sandboxed workflows, manage and use memory, keep secrets out of prompts, and turn generated interfaces into reusable software. The goal isn't a better model. It's making AI assistants portable, secure, and actually useful across MCP hosts. In this talk, you'll learn that with the right primitives, you can create a highly capable assistant without paying an extra cent for inference.
Building Infrastructure That Scales to a Billion (or Trillion!) Agents
Here's some napkin math: 100 million US knowledge workers, 15% concurrency, one agent each. That's 15 million simultaneous sessions. Now give each person three agents. Now do the rest of the world. Containers can't touch this. So let's talk about what agents actually need: a way to get their own compute instantly without dragging along an entire OS, and the ability to just write code when tool-calling gets awkward (which is often). Dynamic Workers give you isolated execution environments that materialize in milliseconds and vanish when they're done. We'll cover the details of how it works, what it looks like in practice, and what's next.
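The napkin math above can be sketched in a few lines (figures taken directly from the abstract; the three-agent total is the only derived number):

```python
# Napkin math from the abstract: workers x concurrency x agents per person.
us_knowledge_workers = 100_000_000
concurrency = 0.15

sessions_one_agent = int(us_knowledge_workers * concurrency)  # 15 million sessions
sessions_three_agents = sessions_one_agent * 3                # 45 million, US alone
```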
Building a Truly Strong General Model
While domain-specific fine-tuning of models has seen massive improvements, this talk focuses on how some of the frontier labs think about building outstanding generalist models that, whether fine-tuned or used out of the box, provide a great experience. We will talk broadly about what works, what doesn't, and what to keep in mind if you set out on a journey to build a foundation model!
You're using the wrong AI SDK
BAML, Vercel's AI SDK, the Pi Agent SDK, and the OpenCode SDK. Which one should you build with?
Day 2 / Tuesday
Second day: adoption, latency debt, mobile AI, memory, OSS apps, and IDE futures.
Check In and Breakfast
Welcome to Day 2
Transforming Programming Mindsets: Case Studies in Agentic Coding Adoption
In the rapidly evolving landscape of AI, skilled programmers often face an existential crisis, questioning the value of their expertise as their profession adopts agentic coding. This talk delves into the personal journeys of individual engineers working on a high-profile G2i project who not only adapt but thrive through agentic development. Drawing from the human-centered perspective of the counseling profession, these case studies reveal that agentic coding doesn't diminish the value of existing skills; instead, it amplifies and enriches them. Attendees will discover how empathy, adaptability, and deep human insight can enhance AI agent integration, ultimately proving that the richness of human expertise is more valuable than ever. Through these narratives, we’ll explore how AI can be a powerful amplifier of human skill, turning potential anxieties into opportunities for growth and innovation.
Help! We're DEEP in (latency) Debt
In software engineering, "technical debt" refers to the accumulated cost of shortcuts and slop code that works today but creates problems tomorrow. You move fast, auto-accept AI suggestions, and defer the cleanup. Latency debt works the same way. Over the past several years, we've spent enormous resources making AI models more capable, bigger, smarter, more contextually aware. What we haven't done is make the infrastructure keep pace. We optimized for intelligence. We deferred the cost of speed. That bill is now due.
Ambient Generative AI: Deploying Latent Diffusion Models on Mobile NPUs
Most "AI-powered" mobile apps are thin wrappers around hosted APIs, tethered by high latency, cloud costs, and privacy concerns. This talk demonstrates a radical alternative: a 100% offline, sensor-driven image generator running locally on the Samsung Galaxy Z Fold 7. We will explore the technical journey of bridging React Native (Expo) with the device's NPU using ONNX Runtime and the Android Neural Network API (NNAPI). By mapping real-time hardware sensor data (ambient light) to latent space prompts, I demonstrate a new UX pattern for offline "Zero-Prompt" generative experiences. This session is a deep dive into the engineering required to move generative models from data centers to our pockets, plus good practices for scaling that approach to other devices with powerful neural processing units, such as iPhones.
Morning Break
Everything is Models
Given the same compute budget, does a single frontier model outperform a system of specialized models? Our research says no. We trained three task-specific models for the subtasks, and on the same budget, multi-model wins: with just WarpGrep, every frontier model we pair it with hits #1 on SWE-Bench Pro, 15% cheaper and 28% faster than running alone. As frontier models saturate tasks, those tasks should move to smaller models with custom inference engines. The expensive model reasons. The cheap models do the mechanical work. This talk covers the CUDA kernels, RL training, and speculative decoding behind that split, and why it's the natural way intelligence organizes under compute constraints.
Coding Agents Ate the World
A huge portion of the industry spent the last two years building the wrong agents. We got chains, workflows, better chatbots. A hundred VC-funded frameworks gave us the illusion of agency wrapped in brittle control structures and old-school determinism. Meanwhile, the teams actually winning kept arriving at the same uncomfortable answer: give the agent a code environment, and everything else solves itself. Coding agents aren't a developer tool. They never were. Code is the universal execution harness: it composes, it calls, it verifies, it spawns other agents, and it runs at 3am without asking for clarification. No other AI modality does all of that. Not chatbots. Not RAG. Not your carefully orchestrated "multi-agent" workflow (which is really RPA) that demos beautifully and falls apart in week two. In this talk we'll trace the arc from deterministic pipelines to genuine agency, name the wrong turns, and make a claim that will make some people in this room uncomfortable: this is not a developer skill anymore. It's a leadership primitive.
Effective Context Engineering Techniques for AI
AI systems need more than intelligence; they need context. Without it, even the most advanced models can misinterpret information, lose track of details, or arrive at conclusions that don’t hold up. Context engineering is emerging as a discipline that shapes how AI perceives, recalls, and reasons about information. This talk will explore how context provides the foundation for reasoning, problem solving, and explainability in AI. We will look at techniques such as connected memory, contextual retrieval, and graph-based knowledge representation that give large language models a more reliable way to connect information and draw logical conclusions. Attendees will come away with a practical understanding of how to design effective context pipelines that align AI with real-world knowledge and user intent, and why context engineering is becoming a central part of building trustworthy and impactful AI systems.
Lunch
My Robot Thinks You’re a 10: Engineering Zero-Shot Compliments with Reachy Mini
We talk about "agents" constantly, but most are still trapped behind a glass screen. This session explores the engineering challenges of Embodied AI by turning the Reachy Mini into a real-time, multimodal Hype Robot. We will move past simple scripted movements to a solution where the robot perceives the audience and generates contextual, physical responses. I’ll dive into the technical stack required to bridge the gap between high-level LLM reasoning and low-level servo actuation. Attendees will learn how to manage latency in vision-to-action loops and how to build something interesting and fun with an open source robot connected to an LLM.
Your next user won't have eyes
Devtools have always had APIs, but the polish goes to dashboards, keyboard shortcuts for power users, and per-seat pricing. Meanwhile, agents are reading your docs more than humans are, wiring up services, spinning up sub-agents, and digging through logs at 3am while you snooze. They don't need a dashboard. They don't have eyes. This talk is about designing for that user: skills generated from your docs, clipboard payloads agents can act on, and error messages that tell them exactly what to fix. Your role is shifting from operator to architect. Come find out what that looks like.
Kill Your Retrieval Pipeline: Agentic Memory Is the New State of the Art
The retrieval pipeline is the default architecture for AI memory: embed, index, search, rerank, hope the right context makes it through. It's also a dead end. The next paradigm for AI memory is agentic. Instead of building elaborate infrastructure to compensate for what models can't do, let the model do the work. This talk covers how this approach produced a new state of the art on the most widely used long-term memory benchmark in conversational AI, why the field got stuck building workarounds, what it looks like when you stop, and where AI memory is actually heading.
Skills Issues
There is an enormous difference in the quality and efficiency of outcomes when building or using agentic systems. We've talked about context engineering, competing standards and protocols, and waves of interfaces and harnesses declared 'dead' while the next wave of similar shapes and techniques takes the hyped limelight. What's missing from all these conversations is a common, clear-headed, unhyped set of intuitions and expectations we can all rely on to build great agentic things. To enjoy the new good old-fashioned engineering.
Afternoon Break
Using OSS models to build AI apps with millions of users
In this talk, Hassan will share how he builds open source AI apps that reach millions of users, like roomGPT.io (2.9 million users), restorePhotos.io (1.1 million users), Blinkshot.io (1 million visitors), and LlamaCoder.io (1.4 million visitors). He'll recap his journey in AI, demo some of the apps he's built, and dig into his tech stack and code to explain how he builds these apps from scratch. He'll also cover how to market them, along with his top tips and tricks for building great full-stack AI applications quickly and efficiently. This talk starts from first principles and gives you a glimpse into Hassan's workflow of idea -> working app -> many users. Attendees should come out of this session equipped with the resources to build impressive AI applications and an understanding of how they're built and marketed behind the scenes. This will hopefully serve as an educational and inspirational talk that encourages builders to go build cool things.
The Multi-Model Future is Open Source
There are more great AI models than ever, and the best one changes every few weeks. But most AI coding tools were built to lock you into a single provider. In this talk, I will share what we're seeing across thousands of enterprise conversations and the shift toward efficiency and control. The winners will be the tools that give developers the freedom to choose. Multi-model and open source aren't features. They're the foundation of what comes next.
MCP vs. Command Line: A Head-to-Head Evaluation of Agent Tool Integration Patterns
As AI agents become increasingly capable, a critical architectural question emerges: how should we give them access to tools and capabilities? Two competing patterns have gained traction—Model Context Protocol (MCP), which enables dynamic, runtime tool discovery and execution, and the command line, including skill files, which embed structured instructions and best practices directly into the agent's context. But which approach actually produces better outcomes? In this talk, I'll present results from a rigorous evaluation comparing agent performance across both paradigms. I'll show how each approach affects task completion rates and output quality across document generation, data analysis, and multi-step workflows.
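As a purely hypothetical illustration (none of these task names or outcomes are the talk's results), an evaluation like this boils down to scoring the same task set under each integration pattern and comparing completion rates:

```python
# Toy harness for comparing two tool-integration conditions on one task set.
# Task names and pass/fail outcomes below are hypothetical placeholders.
def completion_rate(outcomes):
    """Fraction of tasks that completed successfully."""
    return sum(outcomes.values()) / len(outcomes)

mcp_runs = {"doc_gen": True, "data_analysis": False, "multi_step": True}
cli_runs = {"doc_gen": True, "data_analysis": True, "multi_step": True}

rates = {"mcp": completion_rate(mcp_runs), "cli": completion_rate(cli_runs)}
```

A real study would also score output quality per task and run repeated trials per condition, but the per-task, same-workload pairing is the core of the comparison.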
IDEs are dead. Long live IDEs
We've been hearing a lot about how IDEs are dying. But, are they really dead? My view is that the definition of what an IDE is will change. There are still essential IDE-like features which are necessary for productive software engineers to do their job, but there are also a million other things which are not required anymore. I'll go through some fun examples from my experience at Cursor, and show you some of our explorations for the future of the IDE interface.
Afterparty
Speaker Index
Fast background on the people on stage.

Alisa Fortin
Product Manager at Google DeepMind for Google AI Studio, helping bring frontier generative capabilities into usable product surfaces.

Alvin Pane
Engineering Lead at OutRival, working on memory systems and challenging default assumptions around retrieval-first AI architectures.

Anna Juchnicki
Senior Software Engineer at Pinterest, bringing an enterprise systems perspective on governed agents, infra safety, and operational automation.

Ben Davis
Known from T3 and adjacent developer-media circles, bringing a sharp, opinionated angle on SDK choices and what developers should actually build with.

Ben Vinegar
Founder of Modem and longtime developer tooling leader, known for pragmatic workflows and an operator’s perspective on how tools fit real teams.

Dave Kiss
Senior Community Engineering Lead at Mux, focused on APIs, developer ergonomics, and what it means to design products for agents as first-class users.

David Gomes
Software Engineer at Cursor, working inside one of the most visible AI coding products and thinking about how the IDE is mutating rather than disappearing.

David House
Engineering Manager at G2i, close to real-world adoption stories and the human side of agentic development inside teams.

Dax Raad
Co-founder of OpenCode and a recognizable voice around coding agents, context engineering, and practical AI-assisted software workflows.

Dexter Horthy
CEO and co-founder of HumanLayer, focused on production agent workflows, human-in-the-loop systems, and making AI actions accountable inside real organizations.

Erik Thorelli
Head of DX at CodeRabbit, focused on code review, workflow quality, and the shape of useful agentic engineering primitives.

Gabe Greenberg
CEO and founder of G2i, the engineering talent platform behind AI Engineer events and the broader React Miami / Frontier Tech Week ecosystem.

Geoffrey Huntley
Founder of Latent Patterns, known for strong takes on software economics, AI-enabled development, and the shifting leverage profile of engineering teams.

Guillaume Vernade
GenMedia Developer Advocate at Google DeepMind, focused on helping developers work with Google’s generative media stack in practice.

Hassan El Mghari
Director of Developer Experience at Together AI, known for shipping popular AI apps and speaking concretely about idea-to-product loops.

Jesse Willman
At Cohere leading engineering program work around machine learning, with a vantage point on what it takes to build and ship strong general-purpose models.

Kent C. Dodds
Independent software engineer and educator, widely known in the JavaScript world and increasingly active in agent workflows, DX, and pragmatic tooling.

Laurie Voss
AI Engineer / Head of DevRel at Arize AI, with a strong systems and evaluation lens on tooling patterns, observability, and agent performance.

Lech Kalinowski
Senior AI System Engineer at CallStack, working near the edge of mobile, on-device AI, and practical deployment constraints.

Lena Hall
Senior Director of Developer Relations at Akamai, with a practical bent toward developer platforms, demos that move, and embodied AI experimentation.

Max Stoiber
At OpenAI working on ChatGPT Apps; previously known for frontend infrastructure and developer tooling, with sharp opinions on developer experience and product taste.

Nnenna Ndukwe
AI Principal Developer Advocate at Qodo, working at the intersection of AI-assisted coding, workflow quality, and developer education.

Nyah Macklin
Senior Developer Advocate at Neo4j, bringing graph and knowledge representation ideas into the discussion around AI context and memory.

Philip Kiely
Head of Developer Relations at Baseten, translating model serving and inference tradeoffs into practical guidance for builders shipping AI products.

Rick Blalock
Founder of Agentuity, focused on agent-native product design and the argument that coding environments are becoming the universal runtime for useful agents.

Rita Kozlov
VP of Product at Cloudflare, working on infrastructure that supports internet-scale workloads and the next wave of agentic compute.

Sarah Chieng
Head of DevX at Cerebras, with a systems-level perspective on performance, developer experience, and the real cost of inference bottlenecks.

Shashank Goyal
Founding engineer at OpenRouter, close to real-world model traffic and practical tradeoffs in inference, reliability, tool use, and cost.

Stefan Avram
Head of Business at OpenCode, speaking from the front lines of how enterprises choose coding tools, models, and degrees of openness.

Tejas Bhakta
Founder of Morph LLM, exploring specialized model systems and what multi-model architectures can do better than a single generalist.