Systemising agent-agnostic harnesses

Last week, OpenAI published a detailed engineering analysis of an internal product built entirely by AI agents: over a million lines of code and 1,500 pull requests, with zero manually typed code, produced by Codex agents over five months.

Birgitta Böckeler at Thoughtworks followed with concise commentary that's worth reading alongside it (shared below).

Both articles circle the same growing problem, and we think it's the most important insight in software engineering right now:

From the agent's point of view, anything it can't access in-context while running effectively doesn't exist.

Everything the OpenAI team built - their knowledge base, their architectural linters, their doc-gardening agents, their version-controlled plans - was engineered to solve a single problem: getting the right context to the agent at the right time. They call it context engineering. We've been calling it the same thing since we started building ctx|.

But here's what the OpenAI post doesn't say loudly enough: they did all of this by hand, for one product, in one repo, with a small elite team who were also the engineers of the harness itself.

That's not a criticism. It simply highlights that this is a proof of concept, and that prototyping their way to the same result is beyond many organisations.

Now the real question begins.

What happens when you have 50 repos? Or 500?

Birgitta asks whether harnesses will become the new service templates - standardised starting points teams fork, evolve, and contribute back to. It's a great analogy. But service templates have a well-known problem: they fork immediately and drift permanently. This is no different from the eternal documentation drift and decay issue (which is now solved for APIs — appear.sh).

Each team shapes them to their context and they stop being templates at all.
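To make fork-and-drift concrete, here is a minimal sketch of how drift between a shared harness template and team forks could be measured. It uses Python's standard difflib; the harness contents, team names, and the drift function are hypothetical illustrations, not part of any tool mentioned in this post.

```python
from difflib import SequenceMatcher


def drift(template: str, fork: str) -> float:
    """Return 0.0 for an identical fork, approaching 1.0 as lines diverge."""
    matcher = SequenceMatcher(None, template.splitlines(), fork.splitlines())
    return 1.0 - matcher.ratio()


# Hypothetical harness contents, one instruction per line.
TEMPLATE = "use pnpm\nrun lint before commit\nlog as JSON\n"
FORKS = {
    "payments": TEMPLATE,
    "auth": "use npm\nrun lint before commit\nlog as JSON\nrotate keys weekly\n",
}

# Score every fork against the original template.
report = {team: round(drift(TEMPLATE, text), 2) for team, text in FORKS.items()}
for team, score in sorted(report.items()):
    print(f"{team}: drift {score}")
```

A line-level diff like this only shows that a fork has moved, not whether the change was a sensible local adaptation, which is part of why manual review stops scaling as forks multiply.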

Now add AI agents to that picture. Agents aren't reading one repo. They're touching multiple services, crossing domain boundaries, encountering years of accumulated decisions that live in Confluence pages, Slack threads, Linear tickets, and the heads of people who left two years ago.

The OpenAI team acknowledged it directly: "That Slack discussion that aligned the team on an architectural pattern? If it isn't discoverable to the agent, it's illegible."

At one repo, you can manually curate what goes in. At org scale, with thousands of agents rather than a handful, curation doesn't scale. You need infrastructure.

Systemising a harness

What OpenAI built is impressive engineering for a single context surface. What enterprises need is a context layer - something that sits behind every agent, across every repo, every tool, every domain - and makes institutional knowledge legible to agents automatically.

That's what we're building with ctx|.

Diagram: Ctx| · Knowledge Infrastructure

One repo was just the proof of concept. OpenAI's harness experiment showed what's possible. Companies need infrastructure to do it at scale, without five months of manual curation.

The Fork & Drift Problem
Harnesses as service templates: what happens next?

[Timeline from t=0 to t=now: one org-wide harness (v1.0) forks into six team harnesses (Payments, Auth, Platform, Data, ML/Ops, Security), each marked "drifted". 6 teams · 6 harnesses · 0 shared context.]

Without shared infrastructure, harnesses behave like service templates: forked immediately, shaped to local context, and drifted permanently. Six teams, six diverging harnesses, zero shared context. The same pattern that broke documentation now breaks agent context.

Systemising a Harness
What OpenAI proved manually at one repo, and what Ctx| does at org scale

Knowledge Base
OpenAI (one repo, manual): Manually maintained. One repo. One team. Curated by hand over five months. Every AGENTS.md written, every ADR linked, every constraint documented by the engineers who built the harness itself. (1 repo · manual)
Ctx| (org scale, systematic): Self-learning graph. Ingests across your entire estate: repos, ADRs, Confluence, Linear, Jira, Slack, Datadog. Learns from agent interactions. Promotes patterns that work. Flags drift automatically. (org-scale · automatic)

Arch. Constraints
OpenAI (one repo, manual): Custom linters & structural tests. Bespoke tooling built for one product's constraints. Works for the team that built them. Doesn't transfer to the next repo, the next team, or the next agent. (bespoke · per-repo)
Ctx| (org scale, systematic): Governed instruction hierarchy. AGENTS.md, skills, MCPs, versioned in git, reviewed in PRs. Promotion and demotion controls route the right rules to the right agents at the right time, across every domain. (git-native · fleet-wide)

Agent Connectivity
OpenAI (one repo, manual): Codex connected to its own toolset. Tight coupling between one agent runtime and one bespoke context surface. Powerful within the experiment. Doesn't generalise to an organisation running multiple agent runtimes. (one runtime · closed)
Ctx| (org scale, systematic): Single MCP for every agent. Cursor, Claude Code, Copilot, custom workspaces, all connected through one MCP interface. One entry point to the organisation's full knowledge graph. Agent-agnostic by design. (all runtimes · open)

Where OpenAI manually maintained a knowledge base in a single repo, ctx| builds a self-learning knowledge graph that ingests across your entire organisation: repos, monorepos, ADRs, dependencies, and the tools your teams use daily, such as Linear, Confluence, Jira, Slack, and Datadog. It learns from agent interactions, promoting patterns that work and flagging drift.
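As a rough illustration of what "promoting patterns that work" could mean mechanically, the sketch below keeps a success tally per pattern and promotes or flags it once enough agent interactions have been observed. This is an assumption-laden toy, not ctx|'s implementation: the PatternLedger class, the thresholds, and the pattern names are all hypothetical, and "flagged" here stands in for any signal that a pattern needs human review.

```python
from collections import defaultdict


class PatternLedger:
    """Tracks how often a pattern led to a successful agent outcome."""

    def __init__(self, min_samples: int = 5, promote_at: float = 0.8, flag_at: float = 0.4):
        self.min_samples = min_samples  # observations needed before judging
        self.promote_at = promote_at    # success rate at or above which we promote
        self.flag_at = flag_at          # success rate at or below which we flag
        self._counts = defaultdict(lambda: [0, 0])  # pattern -> [successes, total]

    def record(self, pattern: str, success: bool) -> None:
        counts = self._counts[pattern]
        counts[0] += int(success)
        counts[1] += 1

    def status(self, pattern: str) -> str:
        successes, total = self._counts.get(pattern, (0, 0))
        if total < self.min_samples:
            return "observing"
        rate = successes / total
        if rate >= self.promote_at:
            return "promoted"
        if rate <= self.flag_at:
            return "flagged"
        return "observing"


# Hypothetical outcomes reported back from agent runs.
ledger = PatternLedger()
for ok in [True, True, True, True, True]:
    ledger.record("use-outbox-for-events", ok)
for ok in [True, False, False, False, False]:
    ledger.record("call-ledger-db-directly", ok)
```

The minimum sample count matters more than the exact thresholds: without it, a single lucky or unlucky run would promote or flag a pattern across the whole organisation.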

Where OpenAI built custom linters and structural tests to enforce architectural constraints, ctx| brings governance to the instruction hierarchy itself - AGENTS.md, skills, MCPs - versioned in git, reviewed in PRs, with promotion and demotion controls so the right rules reach the right agents at the right time.
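One way to picture an instruction hierarchy with promotion and demotion is as rules attached to a scope (repo, domain, or org), where an agent sees every rule whose scope covers the repo it is working in and narrower scopes override broader ones. The sketch below is a simplified model under that assumption; the Rule type, scope names, and example rules are hypothetical and do not describe ctx|'s actual data model.

```python
from dataclasses import dataclass, replace

SCOPES = ["repo", "domain", "org"]  # narrowest to broadest


@dataclass(frozen=True)
class Rule:
    key: str     # what the rule governs, e.g. "logging"
    text: str    # the instruction handed to the agent
    scope: str   # one of SCOPES
    target: str  # repo name, domain name, or "*" for org-wide


def promote(rule: Rule, target: str) -> Rule:
    """Move a rule one level up the hierarchy (repo -> domain -> org)."""
    idx = SCOPES.index(rule.scope)
    if idx == len(SCOPES) - 1:
        return rule  # already org-wide
    return replace(rule, scope=SCOPES[idx + 1], target=target)


def demote(rule: Rule, target: str) -> Rule:
    """Move a rule one level down the hierarchy (org -> domain -> repo)."""
    idx = SCOPES.index(rule.scope)
    if idx == 0:
        return rule  # already repo-local
    return replace(rule, scope=SCOPES[idx - 1], target=target)


def resolve(rules: list[Rule], repo: str, domain: str) -> dict[str, str]:
    """Rules visible to an agent in `repo`; narrower scopes override broader ones."""
    resolved: dict[str, str] = {}
    for scope, target in (("org", "*"), ("domain", domain), ("repo", repo)):
        for rule in rules:
            if rule.scope == scope and rule.target == target:
                resolved[rule.key] = rule.text
    return resolved


# Hypothetical rules for illustration.
RULES = [
    Rule("logging", "Log as JSON", "org", "*"),
    Rule("retries", "Retry idempotent calls 3 times", "repo", "billing-api"),
    Rule("logging", "Log as JSON with trace IDs", "domain", "payments"),
]
```

In this model, promoting the retries rule from the billing-api repo to the payments domain makes it visible to agents in every payments repo, and demoting it reverses that. Because the rules are plain data, each promotion or demotion can be a reviewed change in git rather than an edit to dozens of separate files.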

Where OpenAI connected Codex to its own toolset, ctx| connects every agent - Cursor, Claude Code, Copilot, custom workspaces - through a single MCP interface. A single entry point to your organisation’s full graph. Agent-agnostic.

OpenAI's experiment shows that engineering is rapidly moving from writing code to designing environments, feedback loops, and control systems. ctx| is infrastructure for exactly that shift - not for one product team running a five-month experiment, but for organisations deploying hundreds or thousands of agents across their entire estate.

What this changes about the harness conversation

Birgitta raises a question we find fascinating: will we converge on fewer tech stacks, fewer topologies, because they're easier to harness? Probably yes, at the codebase level. But at the organisational level, enterprises don't get to start from an empty git repository. They have decades of decisions, a myriad of stacks, and technical debt that would drown any static analysis tool.

As with Appear, our design principle is to meet organisations where they are. The knowledge graph ingests what exists. The instruction hierarchy layers on top. Agents get progressively better context as the graph learns, not because someone manually curated every AGENTS.md file across 300 repos.

This is also why OpenAI's framing of "repo-local, versioned artifacts" is necessary but not sufficient when we zoom out. For a single team, repo-local is correct. For an enterprise, the context surface extends across domains and tools. The graph is the critical connective tissue.

What we're taking from this moment

The OpenAI and Thoughtworks articles together mark something of a line in the sand. Serious practitioners are now converging on context engineering as the infrastructure problem. The "just write better prompts" or "prompt designer" phase is behind us. The "just maintain AGENTS.md" phase is already showing its limits as we ask and expect more of our agents. Tolerance for agent mistakes is shrinking as the technology moves through its own hype cycle. What comes next is the infrastructure phase, and that's the phase we built ctx| for.

We're two founders, Jakub and Tom, building this with deep experience in dev tooling and in working with enterprises and startups where the shift is happening in real time. If your team is hitting these problems - or planning for them - we'd love to talk.

ctx| is the open-source, agent-agnostic, self-learning context layer for AI engineering agent fleets. If you're deploying agents at scale and want the knowledge graph to live with you, not your model provider, request SaaS early access.

Request early access · View GitHub repo

References

Articles referenced in this post:

[1] OpenAI — Harness engineering — openai.com
[2] Birgitta Böckeler — Exploring harness engineering — martinfowler.com