The model is no longer the unit of infrastructure
Tom · Mon, 27 Apr 2026
Agents are moving past the model endpoint. Computer use, MCP, memory, plugins, sandboxes, and background execution are becoming the real system. That changes where context has to live.
For the last couple of years, most of the agent conversation has been about the model.
Which one writes better code. Which one reasons better. Which one has the larger context window. Which one tops SWE-bench this month. That conversation is still important, but it is no longer enough to explain what is happening.
The model is becoming one component inside a larger system.
The interesting work is moving to the body: the runtime around the model, the tools it can reach, the apps it can operate, the memory it carries between tasks, the sandboxes it can safely execute inside, the events that wake it up, and the permissions that decide what it is allowed to touch.
Once you see that, the current agent race makes more sense.
The body is becoming the product
OpenAI's latest Codex release is a clear example. Codex can now operate a Mac by seeing the screen, clicking, and typing with its own cursor. Multiple agents can work in parallel without taking over the user's focus. It has an in-app browser, image generation, memory, automations, remote devbox support, richer file previews, and more than 90 plugins that combine skills, app integrations, and MCP servers.¹
That is not just a coding assistant getting more features. It is a different abstraction altogether.
The model is still there, obviously, but the useful ‘interface’ is increasingly in the harness around it. Can the agent move through the actual workflow? Can it use the awkward internal tool that never got an API (and probably never will)? Can it test the frontend, reproduce the bug, paste screenshots into a PR, check the docs, and pick the task back up tomorrow without having its hand held?
That work is not solved by a better completion endpoint alone.
Anthropic's Managed Agents post describes the same shift from a different angle. Their architecture explicitly separates the brain from the hands and the session: Claude and its harness, the sandboxes and tools that do work, and the durable log of what happened. They describe harness assumptions going stale as models improve, including a context-reset mitigation that made sense for Sonnet 4.5 and became dead weight with Opus 4.5.²
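To make that separation concrete, here is a cartoon of the decoupling in Python. Every name is invented for illustration; this is not Anthropic's code. The only point it carries is that model-specific assumptions sit in one replaceable layer, away from the tools and the record.

```python
# Cartoon of the brain/hands/session split described above. All class
# names are invented; this is an illustration, not Anthropic's code.
from dataclasses import dataclass, field

@dataclass
class Session:
    """Durable log of what happened; outlives any model or harness version."""
    events: list[str] = field(default_factory=list)

    def log(self, event: str) -> None:
        self.events.append(event)

class Sandbox:
    """The hands: executes work behind an explicit boundary."""
    def run(self, command: str) -> str:
        return f"ran {command!r} in isolation"  # stand-in for real execution

class Harness:
    """The brain's wrapper. Model-specific assumptions live here, so they
    can be swapped out when a new model makes them dead weight."""
    def __init__(self, sandbox: Sandbox, session: Session) -> None:
        self.sandbox, self.session = sandbox, session

    def step(self, task: str) -> None:
        result = self.sandbox.run(task)          # the hands do the work
        self.session.log(f"{task} -> {result}")  # the session keeps the record

harness = Harness(Sandbox(), Session())
harness.step("pytest tests/test_auth.py")
print(harness.session.events)
```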
That is the point. The model moves. The harness moves. The assumptions move.
Agent infrastructure is becoming the system that lets those parts move independently.
Two different bodies
OpenAI and Anthropic are placing diverging bets.
OpenAI is pushing hard on computer use. If software has a graphical interface, the agent can attempt to use it. No vendor has to publish an API, and no internal team has to build an MCP server. The long tail of (legacy) enterprise software suddenly becomes reachable because the agent can drive the same surface a human drives.
Anthropic is pushing hard on explicit interfaces. MCP, connectors, scoped tools, hosted long-horizon agents, durable sessions, sandboxes, and clean boundaries between the agent loop and the execution environment. That approach is more structured. It asks the ecosystem to expose useful interfaces for agents to call.
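As a flavour of what an explicit interface looks like, here is a minimal sketch of a scoped tool built with the MCP Python SDK's FastMCP helper. The deploy_service tool and its allow-list are hypothetical; what matters is the shape: a typed, reviewable boundary the agent calls instead of clicking through a UI.

```python
# Minimal sketch of an explicit agent interface: one scoped MCP tool.
# Uses the MCP Python SDK (FastMCP); the tool itself and its allow-list
# are hypothetical examples, not a real internal API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("deployments")

ALLOWED_SERVICES = {"auth", "billing"}  # hypothetical scope boundary

@mcp.tool()
def deploy_service(service: str, environment: str) -> str:
    """Deploy a named service to staging or production."""
    if service not in ALLOWED_SERVICES:
        return f"refused: {service} is outside this tool's scope"
    if environment not in {"staging", "production"}:
        return f"refused: unknown environment {environment}"
    # A real implementation would call the team's deploy pipeline here.
    return f"queued deploy of {service} to {environment}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for any MCP-capable client
```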
Both approaches make sense and both will matter.
Computer use reaches the messy software that already exists. Structured interfaces are cleaner when they exist and safer when the boundary matters. Most serious teams will end up using both, because most real organisations contain both kinds of work.
Some tasks belong in a sandbox with tightly scoped credentials. Some belong in a browser. Some need a CLI. Some need an MCP server. Some need to click through the vendor portal nobody wants to admit still runs half the operation.
The model is not the unit that contains all of this anymore.
The agent system is.
Context cannot live inside one body
This is where the infrastructure boundary starts to matter.
If the model was the whole product, it was natural to think of context as something that lived with the model: prompt, context window, memory feature, session history. But once agents have different bodies, that stops working.
Your team might use Codex for computer-use workflows, Claude Code for tightly scoped development tasks, Cursor in the editor, a CI agent in the pipeline, and a custom internal agent for deployments. Each of those agents can be good at its own job. Each will also build up a partial view of how your organisation works.
If that context stays inside the tool that observed it, the organisation does not learn. The tool learns.
That is the subtle failure mode. Not that any one provider is doing something wrong. Not that memory features are bad. Memory is useful. Session state is useful. Managed execution is useful.
But tool memory is not organisational memory.
The thing your team needs to preserve is not just "what did this agent do in this session?" It is why the auth service is shaped that way. Which migration is half-finished. Which repo owns the policy nobody remembers. Which deployment path is safe on Fridays. Which generated test failures were real and which were noise. Which instructions helped agents succeed and which ones sent them sideways.
That knowledge has to survive the model. It has to survive the harness. It has to survive the desktop app, the editor, the CI runner, and the provider you happen to be paying this quarter.
Otherwise every new body starts from zero.
The durable layer is the organisation
This is the part that feels under-discussed in the rush to compare agent products.
Benchmarks tell you something about the brain. Product demos tell you something about the body. Neither tells you where the organisation's knowledge should live.
For a single person, it may be fine for context to sit inside one agent app. The app remembers your preferences, your recent work, the tools you use, the patterns you repeat. That is useful, and it will get more useful.
For a team, the problem is different.
The context that matters is shared, contested, versioned, and often outside the codebase. It lives in PRs, ADRs, incident writeups, deployment history, Slack threads, Linear tickets, architecture docs, test failures, and agent runs. Some of it is factual. Some of it is intent. Some of it is governance. Some of it is just scar tissue.
That is not a good fit for a memory feature tied to a single model provider.
It needs to be queryable by whatever agent is doing the work, wherever it is. It needs to be scoped to the repo, service, team, and task. It needs to learn from outcomes, not just from documents someone remembered to write. It needs to be inspectable and correctable by the team. And it needs to travel as the agent layer changes.
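A rough sketch of what that query surface could look like, in Python. Every name here (ContextStore, Scope, Note) is hypothetical; the point is that any agent, whatever its harness, can ask the same scoped question and get the same organisational answer.

```python
# Hypothetical sketch of an agent-agnostic context query. ContextStore,
# Scope, and Note are illustrative names, not an existing API.
from dataclasses import dataclass

@dataclass
class Note:
    text: str    # e.g. "migration 042 is half-finished; keep the old column"
    source: str  # PR, ADR, incident writeup, agent run, Slack thread...

@dataclass
class Scope:
    repo: str
    service: str = ""  # empty fields mean "applies to the whole repo"
    team: str = ""
    task: str = ""

class ContextStore:
    """Shared knowledge layer any agent can query, whatever its harness."""
    def __init__(self) -> None:
        self._notes: list[tuple[Scope, Note]] = []

    def add(self, scope: Scope, note: Note) -> None:
        self._notes.append((scope, note))

    def query(self, scope: Scope) -> list[Note]:
        # Naive scoping: a note matches if its repo agrees and its
        # service is either unset or the same.
        return [
            note for s, note in self._notes
            if s.repo == scope.repo
            and (not s.service or s.service == scope.service)
        ]

# Any agent (Codex, Claude Code, a CI runner) asks the same question:
store = ContextStore()
store.add(Scope(repo="payments", service="auth"),
          Note(text="Friday deploys of auth are blocked by a change freeze",
               source="incident writeup"))
for note in store.query(Scope(repo="payments", service="auth")):
    print(note.text)
```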
The stable thing is not the model.
It is the organisation's accumulated understanding of how the system works.
What this points to
The right abstraction is not "which model has memory?"
It is "where does agent knowledge live?"
The answer should not be inside a single model provider, desktop app, IDE, or hosted runtime. Those systems should be able to use the knowledge layer, contribute back to it, and be replaced without taking the knowledge with them.
That is the layer we think needs to exist.
Open source, because teams need to inspect the layer that represents how they work. Agent-agnostic, because the best agent for a task will keep changing. Git-native, because engineering teams already understand review, history, and ownership there. Self-learning, because no team is going to manually maintain perfect context across every repo, tool, and agent run.
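As a small illustration of the git-native part, here is a hypothetical sketch in which context lives as plain files inside the repo, so review, history, and ownership come from git itself. The .ctx/ path and the file format are invented for this example, and it assumes it runs inside a git checkout.

```python
# Hypothetical "git-native" context: knowledge is committed as plain
# files, so it travels with the repo and goes through normal review.
# The .ctx/ layout is an invented example, not an existing convention.
import subprocess
from pathlib import Path

def record_learning(repo_root: Path, topic: str, lesson: str) -> None:
    """Append a lesson to a per-topic context file and stage it for review."""
    ctx_dir = repo_root / ".ctx"
    ctx_dir.mkdir(exist_ok=True)
    note_file = ctx_dir / f"{topic}.md"
    with note_file.open("a") as f:
        f.write(f"- {lesson}\n")
    # The change enters the ordinary PR flow rather than a hidden store.
    subprocess.run(["git", "add", str(note_file)], cwd=repo_root, check=True)

record_learning(Path("."), "deployments",
                "auth service deploys are unsafe on Fridays (change freeze)")
```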
The model is no longer the unit of infrastructure.
The agent system is. And underneath it, the context layer has to belong to the organisation.
ctx| is the open-source, agent-agnostic, self-learning context layer for AI engineering agent fleets. If you're deploying agents at scale and want the knowledge graph to live with you, not your model provider, get in touch or join the growing waitlist.
This article draws on the following recent research:
- [1] OpenAI (2026) — Codex for (almost) everything — openai.com/index/codex-for-almost-everything
- [2] Martin, L., Cemaj, G., Cohen, M. (2026) — Scaling Managed Agents: Decoupling the Brain from the Hands — Anthropic Engineering Blog — anthropic.com/engineering/managed-agents