Proactive context and memory for AI agents
Tom & Jakub · Mon, 23 Mar 2026
A research paper gives the field a shared vocabulary for agent memory — forms, functions, dynamics. Where engineering is, where it is not, and what we think must be built next.
By Tom & Jakub, co-founders of ctx|.
A research paper on agent memory landed recently that's worth reading carefully if you're building in this space.
Memory in the Age of AI Agents: A Survey by Hu, Liu, Yue, Zhang et al. - spanning researchers at NUS, Fudan, Peking University, Oxford, and Georgia Tech - gives the field something it has badly needed: a precise, shared vocabulary for what agent memory actually is, what forms it takes, and what it's for.¹
We want to use that vocabulary to be direct about where the engineering community is, where it isn't, and what we think has to be built next.
The problem with how we've been talking about this
If you've tried to explain agent memory to an engineering team, you've hit this wall: everyone is using different words for different things. Arguments about whether RAG "counts" as memory, or whether a context window is "really" long-term memory, are mostly terminology disputes in technical clothing.
The paper names this directly. The field has suffered from a "proliferation of loosely defined memory terminologies" that has "obscured conceptual clarity." Traditional categories like long-term and short-term memory have "proven insufficient to capture the diversity and dynamics of contemporary agent memory systems."¹
Their replacement framework - forms, functions, dynamics - is the first taxonomy we've read that maps cleanly to what engineering organisations are actually struggling with. It also maps directly to how we think about what ctx| needs to do.
One framing from the survey that we keep coming back to: memory is a first-class primitive for agent intelligence. Without it, agents are brilliant amnesiacs. Capable in the moment, empty at the start of every session.
Forms: what carries memory - and what you can actually control
Three dominant implementations:
Token-level memory is what lives in the context window - prompt history, retrieved documents, injected instructions. It's ephemeral, session-scoped, gone when the session ends. This is the most important form in standard use cases, because it's the primary one developers can directly control. RAG is token-level memory. AGENTS.md files are token-level memory. The impressive Recursive Language Model work out of MIT CSAIL - which we'll write about next - is token-level memory, however cleverly managed.
Parametric memory is baked into model weights during training. It's why GPT-5 knows what PostgreSQL is without being told. You usually don't control this - it comes with the model.
Latent memory encodes information in compressed hidden states - session summaries, compressed context, and the like. It comes from the tooling, such as Claude Code, and the infrastructure developers build around models.
The practical implication: because token-level memory is the layer developers control most directly, it becomes the critical lever. Almost all current engineering effort on agent memory is investment in this layer - smarter prompts, bigger context windows, better retrieval pipelines. These are real improvements to the same form of memory. They're all session-scoped. They all start from zero the next run.
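To make the session-scoped constraint concrete, here is a minimal sketch of token-level memory - all names are illustrative, not a real API. The agent's "memory" is just whatever text gets assembled into the context window, whether that's injected instructions, retrieved documents, or conversation history:

```python
# Minimal sketch of token-level memory: everything the agent "remembers"
# is whatever text we assemble into the context window for this session.
# Class and method names here are illustrative, not a real API.

class Session:
    def __init__(self, system_instructions: str):
        self.history: list[str] = [system_instructions]  # injected instructions

    def add_retrieved(self, documents: list[str]) -> None:
        # RAG is token-level memory: retrieved text is spliced into the prompt.
        self.history.extend(f"[retrieved] {d}" for d in documents)

    def add_turn(self, role: str, text: str) -> None:
        self.history.append(f"{role}: {text}")

    def build_prompt(self) -> str:
        return "\n".join(self.history)

s = Session("Follow the team's AGENTS.md conventions.")
s.add_retrieved(["ADR-012: services expose health checks at /healthz"])
s.add_turn("user", "Add a health check to the billing service.")
prompt = s.build_prompt()
# When the session ends, everything above is gone: the next run starts
# from zero, which is exactly the constraint named in the text.
```

Smarter retrieval and bigger windows improve `build_prompt`, but nothing here survives the session - that is the one-third of the problem this layer covers.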
That's not a criticism of token-level work. It's a constraint worth naming clearly, because the industry has been treating it as the whole problem when it's one third of it.
Functions: your organisational knowledge and your know-how
This is where the taxonomy gets genuinely useful.
Factual memory records factual knowledge from observing the environment and from agents' interactions with users.¹ This is your organisational knowledge - what the organisation knows. Your ADRs, coding standards, ownership maps, runbooks, architectural decisions. The things that, if an agent knew them, it would behave consistently with how your team has decided to build.
Experiential memory allows the system to learn from experiences and "incrementally enhances the agent's problem-solving capabilities through task execution."¹ This is your organisational know-how learning loop. Not what was written down, but what was discovered through doing. Which approaches keep failing on which services. The constraint every senior engineer carries in their head because it bit them eighteen months ago. The pattern that experienced developers recognise immediately and junior developers - and agents - keep relearning from scratch.
Working memory manages the workspace during a task. What the agent is currently reasoning about. Active, ephemeral, gone when the task ends. Parts of working memory can become persistent - a useful observation captured mid-session can be promoted into factual or experiential memory. And parts of factual and experiential memory flow into working memory at the right moment as context.
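The promotion path described above can be sketched in a few lines - the structure below is illustrative, not how ctx| or the survey implements it. Working memory dies with the task; anything promoted into the factual or experiential stores survives it:

```python
# Sketch of the three memory functions and the promotion path: working
# memory is task-scoped, but a useful mid-session observation can be
# promoted into persistent factual or experiential memory. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    factual: list[str] = field(default_factory=list)       # what the org knows
    experiential: list[str] = field(default_factory=list)  # what the org learned by doing

@dataclass
class WorkingMemory:
    store: MemoryStore
    scratch: list[str] = field(default_factory=list)  # ephemeral, task-scoped

    def observe(self, note: str) -> None:
        self.scratch.append(note)

    def promote(self, note: str, kind: str) -> None:
        # A mid-session observation becomes persistent memory.
        target = self.store.factual if kind == "factual" else self.store.experiential
        target.append(note)

store = MemoryStore(factual=["payments-service is owned by team-billing"])
wm = WorkingMemory(store)
wm.observe("retry storm when calling payments-service without backoff")
wm.promote("payments-service needs exponential backoff on retries", "experiential")
del wm  # working memory dies with the task; the promoted note survives
```

The flow also runs the other way: at task start, relevant entries from `store` are loaded into the next task's working memory as context.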
The honest state of play: most engineering organisations have invested in working memory infrastructure, and some have begun building static factual memory (governance repos, structured ADRs). Almost nobody has addressed experiential memory - the layer that actually compounds, that actually makes the organisation smarter over time rather than just better-prompted in the moment.
Dynamics: this is where the real work is
The survey's dynamics framework - formation, evolution, retrieval - is where we spend most of our thinking at ctx|. It's also where we think the field is most practically underdeveloped.
Memory formation is the process by which informational artifacts, or "signals" - agent outputs, tool results, reasoning traces, environmental feedback - are selectively transformed into memory. Not everything gets stored. A well-designed formation operator decides what has future utility and structures it appropriately.
The key requirement is automation: memory formation should not require human curation. The survey identifies "automation-oriented memory design" as a critical frontier precisely because current systems require engineers to manually decide what goes into the knowledge base, how it's structured, and when it's updated.¹ That's a tax that scales badly and fails silently - the knowledge base drifts from reality the moment the team stops maintaining it.
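A formation operator can be sketched as a scoring-and-filtering step over incoming signals. The heuristics below are stand-ins for illustration - a real system would learn or tune this utility function rather than hard-code it:

```python
# Sketch of an automatic formation operator: raw signals (tool results,
# reasoning traces, feedback) are scored for estimated future utility,
# and only the useful ones become structured memories. Heuristics are
# illustrative stand-ins, not a real scoring model.
from dataclasses import dataclass

@dataclass
class Signal:
    kind: str   # e.g. "tool_result", "trace", "feedback"
    text: str

@dataclass
class Memory:
    summary: str
    source_kind: str

def future_utility(sig: Signal) -> float:
    # Stand-in heuristic: failures and explicit feedback tend to matter later.
    score = 0.0
    if sig.kind == "feedback":
        score += 0.6
    if "error" in sig.text.lower() or "failed" in sig.text.lower():
        score += 0.5
    return score

def form_memories(signals: list[Signal], threshold: float = 0.5) -> list[Memory]:
    # Not everything gets stored: only signals above the utility threshold.
    return [Memory(summary=s.text[:120], source_kind=s.kind)
            for s in signals if future_utility(s) >= threshold]

signals = [
    Signal("tool_result", "tests passed in 4.2s"),
    Signal("tool_result", "deploy failed: missing DATABASE_URL in staging"),
    Signal("feedback", "reviewer: we never hardcode regions, read them from config"),
]
memories = form_memories(signals)  # keeps the failure and the feedback, drops the noise
```

The point of the sketch is the shape, not the heuristic: formation is a selective transform from signals to structured memory, running without a human in the loop.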
Memory evolution is how formed memories change: consolidation (merging related fragments), updating (revising in light of new information), and forgetting (discarding what's superseded or no longer relevant).
This is underappreciated. A knowledge base that only grows becomes noisy and contradictory. If an ADR is superseded, agents querying the old version will behave incorrectly. If an ownership map has a departed engineer as a service lead, agents routing questions will hit a dead end. Memory that doesn't evolve is memory that lies. The evolution operators - update, merge, forget, resolve conflicts - are as important as formation, and almost no current system handles them explicitly.
Memory retrieval is how the right context reaches the right agent at the right time. Not just "find the nearest embedding" - but surface what's genuinely relevant to what the agent is trying to do, including context the agent didn't know to ask for.
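The difference between nearest-embedding lookup and retrieval as described here can be sketched as a two-step ranker - score on similarity plus task relevance, then proactively push items the agent didn't ask about. Scoring weights and fields are illustrative assumptions:

```python
# Sketch of retrieval beyond nearest-embedding: candidates are scored on
# similarity to the query *and* relevance to the task at hand, and
# "must-know" items are surfaced even when the agent never asked for them.
# Weights and fields are illustrative assumptions, not a real ranker.
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    similarity: float         # stand-in for an embedding-similarity score
    task_tags: set[str] = field(default_factory=set)
    must_know: bool = False

def retrieve(items: list[MemoryItem], task: str, k: int = 2) -> list[str]:
    def score(item: MemoryItem) -> float:
        return item.similarity + (0.5 if task in item.task_tags else 0.0)

    ranked = sorted(items, key=score, reverse=True)[:k]
    # Proactive step: context the agent didn't know to ask for still gets in.
    pushed = [i for i in items if i.must_know and i not in ranked]
    return [i.text for i in ranked + pushed]

items = [
    MemoryItem("ADR-031: use gRPC between internal services", 0.62, {"design"}),
    MemoryItem("staging DB is refreshed nightly; don't rely on state", 0.31, {"testing"}),
    MemoryItem("payments-service deploys are frozen this week", 0.10, must_know=True),
]
context = retrieve(items, task="design")
# The deploy freeze reaches the agent despite its low similarity score.
```

The last item is the interesting one: pure embedding search would rank the deploy freeze last, yet it's exactly the context an agent about to act should be handed unprompted.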
Most current systems handle retrieval reasonably well at the working memory layer. Formation and evolution - the harder, more operationally demanding problems - receive almost no engineering attention.
Where ctx| operates in this framework
We're not building a better context window or a smarter RAG pipeline. Those are working memory problems with increasingly good solutions.
ctx| operates primarily in the dynamics layer - the part the survey identifies as most underdeveloped and most consequential.
Formation: We ingest signal from your data sources continuously - git repos, ADRs, runbooks, deployment patterns, observability data, agent run traces - and form agent-friendly memory from them automatically. No human curation required. The knowledge graph grows from what actually happens in your engineering organisation, not from what someone remembered to document.
Evolution: We keep the graph current by updating memories when they change, merging related fragments, forgetting what's superseded, and resolving conflicts - including the conflicts between what your governance docs say and what your codebase actually does. Memory that reflects organisational reality rather than aspirational documentation.
Retrieval: The Intelligence MCP surfaces the right context at the right time - not just in response to agent queries, but proactively, when the knowledge graph identifies something the agent should know that it hasn't asked about. The survey describes "proactive memory management" as a critical direction for agent memory research.¹ We know that is achievable now, and it's what separates a knowledge base from an intelligent system.
The knowledge graph is the mechanism that makes factual and experiential memory persistent and queryable. The ingestion pipeline handles formation. The Intelligence MCP handles retrieval. But the evolution operators - the continuous update, merge, forget, and conflict resolution - are what keep the whole system alive rather than slowly stale.
That's the distinction that matters. Memory that is actively managed by the system itself, not crafted and maintained by humans. Memory that is alive.
The question worth sitting with
The survey closes with a claim that should land differently once you've worked through the taxonomy: memory is "a first-class primitive in the design of future agentic intelligence."¹
Most engineering organisations are treating memory as an afterthought - something to bolt on after agents are running, usually as a prompt engineering exercise or a documentation sprint that someone does once and nobody maintains.
The taxonomy suggests a different frame entirely. Memory is infrastructure. It should be designed before the agents, not retrofitted after the problems appear. It should compound from the first run. And it should be managed automatically by the system, not manually by engineers who have other things to do.
The research community has named what's needed with more precision than we've had before. The engineering community hasn't built it yet.
That's the gap we're working in.
ctx| is being built by Tom & Jakub. It has an open-source core, so you can deploy within your own infrastructure or use our managed hosting.
This article draws on the following recent research:
- [1] Hu, Y., Liu, S., Yue, Y., Zhang, G. et al. (2025) — Memory in the Age of AI Agents: A Survey — Forms, Functions and Dynamics — NUS, Fudan University, Peking University, Oxford, Georgia Tech, et al. — arxiv.org/abs/2512.13564