Context is the bottleneck. Not the model.

Tom & Jakub · Wed, 1 Apr 2026

Frontier models and bigger context windows make agents more capable — but they don't make them more knowledgeable about how your organisation builds things. The limit is what context exists, whether it's right, and whether it travels.

Every few weeks something impressive drops that moves the frontier of what AI agents can do.

Google Research published TurboQuant recently — a KV cache compression algorithm that squeezes the same context window into a sixth of the memory with no accuracy loss — which we've been trying out locally. MIT CSAIL published Recursive Language Models earlier this year (we wrote about that here), showing agents can navigate 10 million token inputs by treating context as an external environment rather than loading it all at once. Model providers keep shipping larger context windows with each generation.

Each of these is genuinely impressive, especially for teams running local models. And each solves a different problem from the one that actually limits engineering agent fleets.

The problem isn't how much context an agent can hold. It's what context is available to hold in the first place — and whether it's the right context, at the right time, from the right place.


What each advance actually solves

Better models reason better, write better code, make better architectural decisions. But they still don't know how your company builds things. A frontier model in early 2026 has read most of the public internet. It has not read your three-year-old ADR about why you chose PostgreSQL over MongoDB, or the incident post-mortem that explains why your payments service has that unusual retry logic. No amount of reasoning capability substitutes for organisational knowledge that was never in the training data.

KV cache compression like TurboQuant makes context cheaper to hold — shrinking the memory footprint of what's already in the context window by up to 6× without accuracy loss.¹ This is meaningfully useful. Agents can work with longer contexts on the same hardware. Long-context inference becomes economically viable at scales that were previously expensive. But efficient storage of context you already have doesn't fetch context you don't. An agent given 6× more room to think still can't reason about what it hasn't been shown.
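The arithmetic behind why this matters is easy to sketch. The model dimensions below are hypothetical round numbers, not any specific model's, and the 6× factor is simply the paper's headline ratio applied naively:

```python
# Back-of-envelope KV cache sizing. All model dimensions here are
# illustrative placeholders, not a real model's configuration.
def kv_cache_bytes(layers, heads, head_dim, tokens, bytes_per_value):
    # Two tensors per layer (K and V), each shaped [heads, tokens, head_dim].
    return 2 * layers * heads * head_dim * tokens * bytes_per_value

# A hypothetical 32-layer model holding a 128k-token context in fp16:
fp16 = kv_cache_bytes(layers=32, heads=32, head_dim=128,
                      tokens=128_000, bytes_per_value=2)
compressed = fp16 / 6  # applying the reported ~6x compression naively

print(f"fp16 KV cache:   {fp16 / 2**30:.1f} GiB")       # 62.5 GiB
print(f"~6x compressed:  {compressed / 2**30:.1f} GiB")  # ~10.4 GiB
```

At these (made-up) dimensions, a context that needed a multi-GPU node suddenly fits on a single card — which is exactly why the result matters for local deployments, and exactly why it says nothing about where the context comes from.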

RLMs address session memory — the ability of a single agent to navigate a large corpus within a single run. As we've written about separately, this is impressive work on a real problem. But it's scoped to one session, one corpus, one agent. The model that brilliantly navigates your 10 million token codebase today starts from zero tomorrow. And it still requires the right corpus to be identified and loaded in the first place.²
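The core move — treating the corpus as an environment to query rather than a prompt to load — can be caricatured in a few lines. This is a toy with hypothetical names, not MIT's implementation:

```python
# Toy sketch of the RLM idea: the corpus stays outside the context window,
# and the agent reaches into it through navigation primitives on demand.
# Class and method names are illustrative, not the paper's API.
class CorpusEnv:
    def __init__(self, text, chunk_size=4096):
        self.chunks = [text[i:i + chunk_size]
                       for i in range(0, len(text), chunk_size)]

    def peek(self, i):
        # Read one chunk; only this slice ever enters the prompt.
        return self.chunks[i]

    def grep(self, needle):
        # Cheap navigation: find which chunks are worth peeking at.
        return [i for i, c in enumerate(self.chunks) if needle in c]

# Millions of tokens can sit in the environment without being loaded:
corpus = ("boilerplate " * 1000) + "payments retry logic lives here " + ("filler " * 1000)
env = CorpusEnv(corpus)
hits = env.grep("retry logic")  # locate relevant chunks
relevant = env.peek(hits[0])    # inspect only one of them
```

The point of the caricature: `env` is constructed fresh here, every time. Nothing the agent learns while navigating survives the session — which is the gap the rest of this piece is about.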

All three advances expand what agents can do within a session. None of them address what happens before the session starts, or what persists after it ends.


The real problem: context is not portable

Software grows like grass. AI agents are fertiliser.

As execution ceases to be the bottleneck — and it already has — taste and decision-making become the bottleneck. The question shifts from "can the agent build it?" to "does the agent know how we build things here?" And in the absence of a human who knows, you need a knowledge system that does.

That knowledge is currently scattered. It lives in a governance repo that the agent in the application repo doesn't know about. It lives in an ADR written eighteen months ago by an engineer who has since left. It lives in the implicit standards that every senior developer carries in their head and that no junior developer — and no agent — can find. It lives in the delta between what the documentation says and what the codebase actually does.

This isn't a context window problem. The context window doesn't solve what it can't see. The problem is that organisational knowledge isn't portable — it doesn't travel with agents across repos, across tools, across sessions, and across time.


Proprietary tools are making this harder, not easier

Every tool in the engineering stack is racing to add agents. Linear just announced that issue tracking is dead — agents are the native UX. CI/CD pipelines are becoming agentic. IDEs, CLIs, code review tools — all evolving toward AI-native interfaces at different rates, on different timelines, with different scoped views of what your engineering organisation knows.

Each of these agents has access to a slice of institutional context. The Linear agent knows your tickets. The CI agent knows your pipeline. The IDE agent knows the file you have open. None of them know what the others know. None of them know what your governance repo says. None of them can access what an agent run from last week learned.

As the number of agentic tools compounds, so does the fragmentation of context. You end up with a fleet of capable but organisationally blind agents, each operating from a limited view, each solving problems without access to the decisions and constraints that shaped the system they're working in.

Individual repo-level tools — custom markdown file hierarchies, proprietary SaaS tools for cross-repo search — are attempts to address this. Engineers are good at solving their own problems, and some of the solutions are clever. But as we've written about before, these solutions are often custom and too brittle to scale. A CLAUDE.md file that works for three repos becomes a maintenance burden at thirty. A manual knowledge base that reflects reality on day one drifts from it by month three.


Context portability as a category

What's missing is a layer that makes context portable — not just within a session, but across sessions, agents, tools, and time.

You need a system that ingests continuously without requiring humans to decide what goes in. That builds relationships between things — not just stores documents, but understands that ADR-0005 governs the auth service which is owned by the security team that hasn't had a lead since January. That gets marginally more accurate with every agent run, so the tenth run benefits from what the first nine observed. That surfaces what's relevant before the agent asks — because it knows when a certain service is touched, certain constraints come with it. And that travels with the agent wherever it is — IDE, pipeline, issue tracker — through one MCP endpoint.
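Concretely, the "relationships between things" part is closer to a triple store than to document search. A minimal sketch — entity names are made up for illustration, and this is not ctx|'s actual schema:

```python
# Minimal relationship store: facts as (subject, relation, object) triples,
# so an agent can follow edges instead of re-reading documents.
# All entity names below are hypothetical.
triples = [
    ("ADR-0005",      "governs",  "auth-service"),
    ("auth-service",  "owned_by", "security-team"),
    ("security-team", "lead",     None),  # no lead since January
]

def related(entity, relation):
    """Return the objects linked to `entity` by `relation`."""
    return [o for s, r, o in triples if s == entity and r == relation]

# The chain an agent needs before touching the auth service:
services = related("ADR-0005", "governs")      # ['auth-service']
owner = related(services[0], "owned_by")       # ['security-team']
lead = related(owner[0], "lead")               # [None] — the constraint surfaces itself
```

Keyword search over the three source documents would return each fact in isolation; the edges are what let an agent discover, unprompted, that the service it is about to modify is governed by an ADR and owned by a team with no current lead.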

None of that exists today. Parts of it exist in isolation. Nobody has built the whole thing.


The brain needs to be stable

The model you're using will change. The cloud provider best practice you're following will shift. The ticket tool your team just migrated to will have a competitor eating its lunch within eighteen months. Agentic tools are appearing and disappearing faster than procurement can evaluate them.

The one thing that shouldn't keep changing is where your organisational knowledge lives.

That layer needs to be open source — so you trust it and can inspect it. Agent-agnostic — so it doesn't care which model wins next quarter. Git-native — so it fits how engineering teams already work and leaves an audit trail. And self-learning — so nobody has to maintain it.

It's not a feature of your model provider. It's not a feature of your IDE. It's infrastructure. And it should belong to you, not to whoever built the tool you're using this year.


What we're building

That's what ctx| is. Open source, agent-agnostic, self-learning. It ingests from your repos, your agent runs, your deployment patterns, your incident history — and builds a knowledge graph that compounds over time. Every agent, every tool, through one MCP.

Better models will keep arriving. Context windows will keep expanding. All of that makes agents more capable at execution.

It doesn't make them know how your organisation builds things.

That gap gets more expensive the more agents you run. It doesn't close on its own.


ctx| is the open-source, agent-agnostic, self-learning context layer for AI engineering agent fleets. If you're deploying agents at scale and want the knowledge graph to live with you, not your model provider, get in touch or join the growing waitlist.

Join the waitlist


References

This article draws on the following recent research:

  • [1] Zandieh, A., Daliri, M., Hadian, M., Mirrokni, V. (2025). TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate. Google Research. arxiv.org/abs/2504.19874
  • [2] Zhang, A. L., Kraska, T., Khattab, O. (2025). Recursive Language Models. MIT CSAIL. arxiv.org/abs/2512.24601