A quarterly about machines that think — and the substrates that hold them.
There is a convention in this industry, barely two years old and already calcifying, that an AI agent is a thing which calls tools. You give the model a menu: ten functions, fifty functions, five hundred. It reads the menu, picks a line, fills in the parameters, and waits. You run the function. You hand back the result. Repeat until done, or until the context window gives up, whichever comes first.
This picture is already wrong, and what Cloudflare shipped across the autumn and winter of 2025–26 is the clearest evidence yet that the industry has been building around a false primitive. The real primitive was there the whole time. It is called code.
Large language models have seen millions of lines of real production code. They have seen roughly zero lines of your bespoke tool-call schema. Ask which format they are fluent in. Then stop asking them to speak a pidgin.
That is the argument at the center of this issue — and it turns out to have structural consequences that reach far beyond token accounting. If the agent writes code, the code needs somewhere safe to run. If it runs somewhere safe, that somewhere can be anywhere. If it can be anywhere, the agent is no longer tied to a single session or a single laptop. It becomes, as the Cloudflare team puts it in their Project Think announcement, less like a tool and more like infrastructure.
This issue is our attempt to map what that shift looks like from inside the platform that is building the clearest version of it. We are not neutral about the direction. We are, however, trying to be honest about the trade-offs.
— The Desk
If a language model is a program that completes text, and code is the text it was most carefully trained on, why are we still asking it to fill out forms?
The conventional MCP server, circa late 2024, was a catalog. Each tool was a card in the catalog. Each card consumed some number of tokens in the model's context window every time the model had to reason about what was available. For a small service with five or ten tools, this was tolerable. For anything the size of a real cloud platform, it was a catastrophe in slow motion.
Cloudflare has roughly 2,500 API endpoints. Rendered as individual tool definitions, the full Cloudflare MCP server would consume approximately 1.17 million tokens — more than the context window of every frontier model currently in production. No agent could use it. No agent could even load it.
So the team did something that in retrospect looks obvious. They gave the model two tools. One called search. One called execute. The search tool returns relevant slices of the OpenAPI spec on demand. The execute tool runs JavaScript the model writes.
Everything else — pagination, retries, conditional branching, chaining three API calls and returning only the field you need from the third — happens inside that single execution block. The model does not narrate. It codes.
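To make the two-tool shape concrete, here is a minimal sketch of what a search tool over an API spec could look like. Everything in it is an assumption for illustration: the three-path spec is a toy stand-in rather than Cloudflare's real OpenAPI document, and the `search` function and `toolDefinitions` array are hypothetical names, not the shipped implementation.

```typescript
// Hypothetical miniature of an OpenAPI "paths" object; not Cloudflare's real spec.
type Operation = { summary: string };
type Spec = Record<string, Record<string, Operation>>;

const spec: Spec = {
  "/zones": { get: { summary: "List zones" } },
  "/zones/{zone_id}/firewall/rules": { get: { summary: "List firewall rules" } },
  "/accounts/{account_id}/workers/scripts": { get: { summary: "List Workers scripts" } },
};

// search: return only the operations whose path or summary mentions the query,
// so the model loads a slice of the spec instead of the whole document.
function search(query: string, source: Spec = spec): Spec {
  const needle = query.toLowerCase();
  const hits: Spec = {};
  for (const [path, methods] of Object.entries(source)) {
    for (const [method, op] of Object.entries(methods)) {
      const matches =
        path.toLowerCase().includes(needle) ||
        op.summary.toLowerCase().includes(needle);
      if (matches) {
        hits[path] = { ...hits[path], [method]: op };
      }
    }
  }
  return hits;
}

// The only definitions that sit in context permanently (descriptions are illustrative).
const toolDefinitions = [
  { name: "search", description: "Return matching slices of the API spec." },
  { name: "execute", description: "Run model-written JavaScript in a sandbox." },
];

const firewallSlice = search("firewall");
```

The design point is that the size of `toolDefinitions` stays constant while `spec` grows; only the slice returned by a given search ever enters the context window.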
The numbers are almost comic. A fixed footprint of roughly 1,000 tokens, regardless of how many endpoints the underlying API exposes. A 99.9% reduction from the naïve MCP implementation. And — this is the part that keeps getting buried — it works better. Not just cheaper. Better.
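The arithmetic behind those figures is short enough to check by hand. The sketch below uses only the numbers quoted in this piece (roughly 2,500 endpoints, roughly 1.17 million tokens, roughly 1,000 tokens); the variable names are ours, and the per-endpoint average is a derived figure, not one Cloudflare published.

```typescript
// Figures as quoted in the article; all are approximate.
const endpoints = 2500;
const naiveTokens = 1170000;
const codeModeTokens = 1000;

// Average cost of one tool definition in the naive rendering.
const tokensPerEndpoint = naiveTokens / endpoints;

// Share of the naive footprint that Code Mode removes, as a percentage.
const reductionPercent = (1 - codeModeTokens / naiveTokens) * 100;

// How many times larger the naive footprint is.
const ratio = naiveTokens / codeModeTokens;

const summary = `${tokensPerEndpoint} tokens per endpoint, ${reductionPercent.toFixed(1)}% reduction, ${ratio}x smaller`;
```

On these inputs the average works out to 468 tokens per endpoint, and the reduction rounds to the 99.9% quoted above.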
The reason is training distribution. Every major frontier model has been trained on an ocean of real TypeScript, real Python, real Go. Tool-call schemas are a dialect the model learned from a few hundred synthetic examples inside a fine-tuning set. Asked to chain five tool calls, the model stumbles. Asked to write a function that makes five API calls and returns a summary, the model is on familiar ground. It is writing the kind of code a mid-career engineer writes on a Tuesday afternoon.
This observation, once stated, starts to feel retroactively inevitable. Anthropic arrived at essentially the same pattern independently, published as Code Execution with MCP. The CodeAct paper had pointed in this direction in early 2024. The convergence is the signal.
The elegant part of the Cloudflare implementation is what the sandbox presents to the model. Not a raw API. Not a tool list. A typed JavaScript environment where every tool is a method on a codemode.* namespace, with TypeScript definitions generated automatically from the underlying tool specs.
Tool names with hyphens or dots — common in MCP — are automatically sanitized into valid identifiers. my-server.list-items becomes my_server_list_items. The $refs in OpenAPI specs are pre-resolved before the spec is ever passed into the sandbox. Authentication lives on the host side, never inside the code the model writes.
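Both transformations are simple to sketch. What follows is an illustration under our own assumptions, not the code in Cloudflare's package: `sanitizeToolName` and `resolveRefs` are hypothetical names, the resolver handles only local `#/` pointers, and it skips JSON Pointer escape sequences.

```typescript
// Replace every character that is not legal in a JavaScript identifier with "_".
function sanitizeToolName(name: string): string {
  const cleaned = name.replace(/[^A-Za-z0-9_$]/g, "_");
  // Identifiers cannot start with a digit, so prefix those with "_".
  return /^[0-9]/.test(cleaned) ? `_${cleaned}` : cleaned;
}

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Inline local "$ref" pointers (e.g. "#/components/schemas/Zone") so the
// sandbox receives a spec with no indirection left in it.
function resolveRefs(node: Json, root: Json, seen: string[] = []): Json {
  if (Array.isArray(node)) return node.map((item) => resolveRefs(item, root, seen));
  if (node === null || typeof node !== "object") return node;

  const ref = node["$ref"];
  if (typeof ref === "string" && ref.startsWith("#/")) {
    if (seen.includes(ref)) return node; // leave cyclic references untouched
    let target: Json = root;
    for (const key of ref.slice(2).split("/")) {
      if (target === null || typeof target !== "object" || Array.isArray(target)) return node;
      target = target[key];
      if (target === undefined) return node; // dangling pointer: keep the $ref as-is
    }
    return resolveRefs(target, root, [...seen, ref]);
  }

  const out: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(node)) {
    out[key] = resolveRefs(value, root, seen);
  }
  return out;
}

const sanitized = sanitizeToolName("my-server.list-items");

const doc: Json = {
  components: { schemas: { Zone: { type: "object" } } },
  item: { $ref: "#/components/schemas/Zone" },
};
const resolved = resolveRefs(doc, doc);
```

A production sanitizer would also need a policy for collisions, since `a-b` and `a.b` map to the same identifier under this scheme.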
None of this is surprising engineering, individually. Taken together it represents a rethinking of what we hand the model at inference time — from a menu of discrete operations to a fluent programming environment with all the affordances a senior engineer would expect.
    // The model writes this. It runs in a sealed V8 isolate.
    // No filesystem. No env vars. Outbound fetch disabled.
    const zones = await codemode.cloudflare_zones_list();
    const problematic = zones.filter(z => z.status !== 'active');
    let enabled = 0; // count only the zones where protection was actually turned on
    for (const zone of problematic) {
      const rules = await codemode.cloudflare_firewall_rules_list({ zone_id: zone.id });
      if (rules.length === 0) {
        await codemode.cloudflare_ddos_protection_enable({ zone_id: zone.id, mode: 'high' });
        enabled++;
      }
    }
    return { enabled };
What you see above is, in one respect, deeply unremarkable: it is a piece of TypeScript that any competent engineer could write in a minute. That is precisely the point. The model, writing this, is not performing an exotic feat. It is doing the thing it is best at.
An isolate, not a container. The difference is the whole business model.
The obvious place to run AI-generated code is a container. Spin up a Linux environment. Install what you need. Execute. Tear down. Every AI sandboxing startup in 2024 and early 2025 was essentially a layer of orchestration on top of this basic loop, and most of them were good, in a tolerable-cost, minutes-to-start, megabytes-of-RAM kind of way.
The problem is the math. Containers take hundreds of milliseconds to boot and hundreds of megabytes to hold. At consumer scale (one agent per user and, in Project Think's vision, potentially several agents per user), the unit economics collapse. You cannot keep a warm container for every end user of every application. You cannot spin one up on every request. You can cheat by reusing containers across users, and many people do, but then you have traded away the security property that was the entire point of the sandbox.
What Cloudflare had, quietly, for years, was a different primitive: the V8 isolate. A Worker is an isolate. It starts in milliseconds. It holds in single-digit megabytes. It was engineered originally as a way to run untrusted third-party code at the CDN edge, which means the security posture is battle-tested in ways most container runtimes are not.
The Dynamic Worker Loader, which shipped alongside the Code Mode announcement, lets a Worker instantiate another Worker on the fly, with code specified at runtime, inside a fresh isolate on the same physical machine. No round trip to a warm pool. No sizing decisions. Whatever region the request landed in is the region the sandbox runs in, milliseconds after the model finishes writing.
The sandbox guarantees matter. The default Dynamic Worker has no filesystem. It has no environment variables — a hard guarantee against the prompt-injection pattern of tricking an agent into exfiltrating secrets. Outbound fetch() and connect() are blocked at the runtime level via globalOutbound: null. Anything the code inside the sandbox needs to reach must come through an explicit fetcher handler on the host side. The host keeps the secrets. The sandbox borrows capability, not credentials.
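The phrase "capability, not credentials" can be illustrated in plain TypeScript. This is a sketch of the pattern only: it does not use the Workers runtime, `makeApiCapability` and `modelWrittenCode` are hypothetical names, `api.example.com` is a placeholder host, and a closure on its own is not a security boundary. In the real system the isolate is what enforces the separation.

```typescript
// What the sandboxed code is allowed to see: a function, never the secret.
type Capability = (path: string) => { url: string; authorized: boolean };

// Host side: holds the secret and decides which requests are allowed.
function makeApiCapability(secret: string, allowedPrefix: string): Capability {
  return (path) => {
    if (!path.startsWith(allowedPrefix)) {
      throw new Error(`blocked: ${path} is outside ${allowedPrefix}`);
    }
    // A real host would attach `secret` to an outbound request here.
    // The return value deliberately carries no trace of it.
    return { url: `https://api.example.com${path}`, authorized: secret.length > 0 };
  };
}

// Sandbox side: stands in for code the model wrote. It receives only the capability.
function modelWrittenCode(api: Capability) {
  return api("/zones");
}

const api = makeApiCapability("s3cr3t-token", "/zones");
const result = modelWrittenCode(api);
```

Even if a prompt injection persuaded the generated code to dump everything it can reach, the token is not among the things it can reach; the most it can do is call `api` with paths the host already permits.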
The catch, of course, is that the sandbox runs JavaScript. Workers also support Python and WebAssembly, but for the small snippets a model writes on demand, the load-and-run cost of JS is a fraction of the alternatives. For a human, that is a preference question: plenty of engineers would rather write Python than JS. For a model, the preference does not exist. The model is equally fluent in both languages; the runtimes are not equally cheap.
If you wanted to be uncharitable about this, you would point out that Cloudflare has built a product that plays to the precise strength of the runtime they already operate. You would not be wrong. You would just be missing the larger claim, which is that the shape of the agent economy — one instance per user, dormant most of the time, billions of instances at the tail — happens to align very neatly with what V8 isolates do well and what containers do poorly.
It is possible this alignment will turn out to be a coincidence of timing. It is also possible Cloudflare saw this coming.
| Layer | What it holds | Runtime |
|---|---|---|
| Host Worker | Secrets, auth, fetch handlers, tool implementations | Standard Worker isolate |
| Dynamic Worker (sandbox) | Model-generated code only | Fresh isolate, no FS, no env, no outbound |
| RPC bridge | codemode.* method calls | Workers RPC |
| Tool backend | MCP servers, REST APIs, internal bindings | Host-side, credentialed |
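The bridge row in the table is the least intuitive of the four, so here is one way it could be modeled. This is an in-process sketch, not Workers RPC: a JavaScript `Proxy` stands in for the bridge, and `hostTools`, `makeBridge`, and `callLog` are hypothetical names introduced for the example.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolImpl = (args: ToolArgs) => Promise<unknown>;

// Host side: credentialed tool implementations, keyed by sanitized name.
const hostTools: Record<string, ToolImpl> = {
  zones_list: async () => [{ id: "z1", status: "active" }],
};

// Records every call that crosses the bridge, so the host can audit it.
const callLog: string[] = [];

// The bridge: any property read on `codemode` becomes a forwarded call.
function makeBridge(tools: Record<string, ToolImpl>): Record<string, ToolImpl> {
  return new Proxy({} as Record<string, ToolImpl>, {
    get(_target, prop) {
      const name = String(prop);
      return (args: ToolArgs = {}) => {
        callLog.push(name);
        // Only tools the host registered are reachable; everything else rejects.
        if (!Object.prototype.hasOwnProperty.call(tools, name)) {
          return Promise.reject(new Error(`unknown tool: ${name}`));
        }
        return tools[name](args);
      };
    },
  });
}

// Sandbox side: the generated code sees only this object.
const codemode = makeBridge(hostTools);
const pending = codemode.zones_list({});
```

Because every call passes through one choke point, the host can log, rate-limit, or refuse a call without the generated code being able to route around it.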
An ephemeral agent is a tool. A durable agent is infrastructure. Cloudflare is betting the distinction is structural.
The coding agents of 2025 — Claude Code, Codex, Cursor, a dozen others — established a pattern. An LLM with the ability to read files, write code, execute it, and remember what it learned turns out to look less like a developer tool and more like a general-purpose assistant. People started using them for things that had nothing to do with code. Filing taxes. Negotiating purchases. Running entire business workflows.
Everyone who used them seriously ran into the same walls. They live on your laptop. They cost money whether they are working or not. They require manual setup — dependencies, secrets, authentication — every time. And there is a deeper structural issue the industry has been avoiding: these agents are one-to-one. Each one serves a single user on a single task. The multi-tenant economics that made SaaS work do not apply.
A restaurant, the Project Think announcement notes, has a menu and a kitchen optimized to churn out dishes at volume. An agent is more like a personal chef. Different ingredients, different techniques, different tools every time.
Project Think is Cloudflare's answer, and it is a set of primitives rather than a product. The primitives are what they have observed coding agent infrastructure converging on, abstracted one layer up.
The most interesting idea in Project Think is the notion of an execution ladder — a graduated hierarchy of sandboxes, from cheap and restricted to expensive and powerful, that an agent climbs only when a task demands it. Most operations never leave the lowest rung. A stray task might reach the top. The model chooses where to execute based on what the code needs.
The rungs are not equivalent, and that is the point. Most agent work — the million routine operations — happens on rung 01, the isolate. Climbing to rung 04 is a last resort, for legacy Linux dependencies or binary tools that cannot be reimplemented in JavaScript. The ladder encodes a cost gradient, but it also encodes a security gradient: the higher you climb, the more surface area you expose.
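The selection rule, climb only as far as the task requires, fits in a few lines. Treat the sketch below as an assumption-heavy illustration: only the isolate at rung 01 and the last-resort rung 04 for Linux dependencies and binaries come from the text above, while the middle rungs, every capability label, and the `lowestRung` function are placeholders of our own.

```typescript
type Rung = { level: number; name: string; provides: Set<string> };

// Ordered from cheapest and most restricted to most expensive and most exposed.
// Rungs 02 and 03 and all capability labels are placeholders, not Cloudflare's terms.
const ladder: Rung[] = [
  { level: 1, name: "isolate", provides: new Set(["javascript"]) },
  { level: 2, name: "placeholder-rung-02", provides: new Set(["javascript", "placeholder-a"]) },
  { level: 3, name: "placeholder-rung-03", provides: new Set(["javascript", "placeholder-a", "placeholder-b"]) },
  {
    level: 4,
    name: "rung-04",
    provides: new Set(["javascript", "placeholder-a", "placeholder-b", "linux-binaries"]),
  },
];

// Pick the first (lowest) rung that provides everything the task needs.
function lowestRung(needs: string[]): Rung {
  const rung = ladder.find((candidate) => needs.every((need) => candidate.provides.has(need)));
  if (!rung) {
    throw new Error(`no rung provides: ${needs.join(", ")}`);
  }
  return rung;
}

const routine = lowestRung(["javascript"]);
const legacy = lowestRung(["javascript", "linux-binaries"]);
```

Because the list is ordered by cost and exposure, picking the first match minimizes both at once, which is the property the ladder exists to encode.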
The second primitive worth attention is sub-agents. An agent in Project Think can spawn children — isolated child agents with their own SQLite, their own scratch space, communicating back to the parent over typed RPC. This is structurally different from the "fan out a prompt across N parallel calls" pattern. Sub-agents persist. They can be resumed. They can be forked.
The session model is similarly tree-structured. Messages branch. Branches can be compacted independently. The whole history is full-text searchable. If this reads like a version-controlled conversation, that is roughly the intended mental model: the agent's interaction history is something you navigate, not something that scrolls off the top of a buffer.
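A tree-structured session is an ordinary data structure once written down. The sketch below is a minimal in-memory version under our own assumptions: `SessionTree` and its methods are hypothetical names, substring matching stands in for real full-text search, and compaction is left out entirely.

```typescript
type Message = { id: number; parent: number | null; text: string };

class SessionTree {
  private messages = new Map<number, Message>();
  private nextId = 1;

  // Append a message under `parent`. Adding two children to one parent creates a branch.
  add(text: string, parent: number | null = null): number {
    const id = this.nextId++;
    this.messages.set(id, { id, parent, text });
    return id;
  }

  // Walk from a leaf back to the root to reconstruct one branch, oldest first.
  branch(leaf: number): Message[] {
    const path: Message[] = [];
    let current = this.messages.get(leaf);
    while (current) {
      path.unshift(current);
      current = current.parent === null ? undefined : this.messages.get(current.parent);
    }
    return path;
  }

  // Naive case-insensitive substring search across every branch.
  search(term: string): Message[] {
    const needle = term.toLowerCase();
    return Array.from(this.messages.values()).filter((m) =>
      m.text.toLowerCase().includes(needle)
    );
  }
}

const session = new SessionTree();
const root = session.add("Deploy the staging worker");
const rolling = session.add("Try a rolling deploy", root);
const blueGreen = session.add("Try a blue-green deploy", root);

const blueGreenPath = session.branch(blueGreen).map((m) => m.text);
```

Each branch shares its prefix with its siblings, so abandoning the rolling deploy costs nothing: the other branch still holds the full path back to the original request.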
The most speculative primitive, and the one most likely to seem either inevitable or reckless depending on where you sit, is self-authored extensions. Agents in Project Think can write their own tools at runtime — generate a new capability, register it, use it. The tool does not need to exist ahead of time. The agent discovers a need, writes the code, and extends its own surface.
This is the kind of capability that either makes you excited about emergent agent behavior or deeply uneasy about safety. Cloudflare's answer is that the new tool is just more code in the sandbox; it inherits the same restrictions. That is technically true and philosophically incomplete, but it is a more rigorous starting point than most of what has been shipped under the banner of "agent self-improvement" over the past year.
Some arithmetic from Cloudflare's announcement, because the numbers are the argument.
The chart that accompanies these numbers is not a visualization trick. It is drawn to scale. The sliver of ochre on its second row is what one thousand tokens looks like when one million one hundred seventy thousand occupy the full width.
The part of the stack that gets loudest when it is missing and quietest when it works.
An agent that runs for a single session, against a clean codebase, with a benchmark-sized set of files, can get away with stuffing everything into context. An agent that runs for weeks — against a production system, across interruptions, across model upgrades — cannot. This is the observation that sent Cloudflare's team to build Agent Memory, a dedicated service accessed via Worker binding or REST, engineered for the workloads they kept seeing on their own platform.
The design concept worth noting is the memory profile: a named container of memories that can be attached to an agent, but does not have to be. A team of engineers can share a memory profile across coding agents, so that something one person's agent learned — a convention, an architectural decision, a piece of tribal knowledge — is available to everyone else's agents the next morning. A code review bot and a coding agent can share memory so that review feedback shapes future code generation.
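The sharing model can be sketched without reference to the real service. Nothing below is the Agent Memory API: `MemoryProfile` and `Agent` are hypothetical in-memory classes, and recall is a plain substring match, chosen only to show that the profile exists independently of any one agent.

```typescript
type Memory = { author: string; note: string };

// A named container of memories that exists independently of any agent.
class MemoryProfile {
  readonly name: string;
  private memories: Memory[] = [];

  constructor(name: string) {
    this.name = name;
  }

  remember(author: string, note: string): void {
    this.memories.push({ author, note });
  }

  recall(term: string): Memory[] {
    const needle = term.toLowerCase();
    return this.memories.filter((m) => m.note.toLowerCase().includes(needle));
  }
}

// An agent may be attached to a profile, but does not have to be.
class Agent {
  readonly id: string;
  private profile: MemoryProfile | null;

  constructor(id: string, profile: MemoryProfile | null = null) {
    this.id = id;
    this.profile = profile;
  }

  learn(note: string): void {
    if (this.profile) this.profile.remember(this.id, note);
  }

  recall(term: string): Memory[] {
    return this.profile ? this.profile.recall(term) : [];
  }
}

const teamProfile = new MemoryProfile("platform-team");
const reviewBot = new Agent("review-bot", teamProfile);
const codingAgent = new Agent("coding-agent", teamProfile);
const unattached = new Agent("unattached-agent");

reviewBot.learn("Prefer named exports in shared modules");
const recalled = codingAgent.recall("named exports");
```

The coding agent never saw the review, yet it recalls the convention, because the memory belongs to the profile rather than to the agent that wrote it.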
The bet is not subtle. The knowledge your agents accumulate stops being ephemeral and starts becoming a durable team asset. You are accruing institutional memory as a side effect of using the tool, and that memory is portable across agents and across humans.
The terms you will encounter in Cloudflare's agent documentation, briefly defined.
@cloudflare/agents. Base classes and primitives for stateful agents backed by Durable Objects.
@cloudflare/codemode. Generates TypeScript definitions from your tools, ships them to the model, runs what it writes in a Dynamic Worker.
@cloudflare/think. A base class and set of primitives for long-running, durable coding agents.
Issue № 01, The Code Mode Issue, was reported and assembled in the spring of 2026 from Cloudflare's published announcements, documentation, and open-source repositories.
All technical claims have been verified against primary sources. Editorial framing is the magazine's own. No tool calls were harmed in the writing of this issue.