Scaling AI on the Map
Two things let a fleet of agents work on one codebase without tearing it apart. The first is a feedback loop tight enough that an agent converges instead of wandering. The second is a substrate durable enough that the map survives the agent, the model upgrade, and the eighteenth month. The map gives you both, and they carry equal weight.
Half one — the loop that makes agents finish
Hand an agent a vague task against a prose spec and you get an open-ended judgment problem. The agent writes something, then has to decide, by re-reading paragraphs, whether what it wrote matches what was meant. That decision is slow, fuzzy, and self-graded — which is exactly how agents stall, loop, or declare victory over nothing.
The typed map turns that judgment call into a mechanical convergence:
agent generates → tsc + `requirements compliance --strict`
→ structured Feature × AC coverage report
→ agent reads the exact gaps
→ fixes them → PASS

The compiler answers in sub-seconds, and `compliance --strict` answers in the shape of a list: these ACs are uncovered, this feature satisfies no requirement, this approved requirement has no implementation. The agent is not asked "do you think this is right?" It is told, precisely, what is missing. A bounded, addressable gap is something an agent can close. An unbounded "does this feel done?" is something it can only guess at. Bounded loops are the difference between an agent that finishes and an agent that wanders, and that bound is exactly what you need to point many agents at one repo and trust the result. (For the longer treatment of the agent experience, see The AI Agent Experience.)
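To make the bounded loop concrete, here is a minimal sketch of it in TypeScript. Everything named here is an assumption for illustration: `CoverageRow`, `converge`, and `closeOneGap` are hypothetical, the report shape is not the tool's real schema, and a toy function stands in for the agent.

```typescript
// Hypothetical row of a Feature × AC coverage report (not the tool's real schema).
interface CoverageRow {
  feature: string;
  ac: string;
  covered: boolean;
}

interface LoopResult {
  status: "PASS" | "FAIL";
  iterations: number;
  remaining: CoverageRow[];
}

// Stand-in for the agent: given the report and its exact gaps, return a revised report.
type AgentStep = (report: CoverageRow[], gaps: CoverageRow[]) => CoverageRow[];

// The gate's answer in list form: every Feature × AC pair still uncovered.
function findGaps(report: CoverageRow[]): CoverageRow[] {
  return report.filter((row) => !row.covered);
}

// Check, hand over the gaps, repeat; stop on PASS or when the budget runs out.
function converge(initial: CoverageRow[], fix: AgentStep, maxIterations: number): LoopResult {
  let report = initial;
  for (let i = 0; i < maxIterations; i++) {
    const gaps = findGaps(report);
    if (gaps.length === 0) {
      return { status: "PASS", iterations: i, remaining: [] };
    }
    report = fix(report, gaps);
  }
  const remaining = findGaps(report);
  return {
    status: remaining.length === 0 ? "PASS" : "FAIL",
    iterations: maxIterations,
    remaining,
  };
}

// Toy agent that closes exactly one reported gap per pass.
const closeOneGap: AgentStep = (report, gaps) => {
  const target = gaps[0];
  if (!target) return report;
  return report.map((row) =>
    row.feature === target.feature && row.ac === target.ac
      ? { feature: row.feature, ac: row.ac, covered: true }
      : row
  );
};

const initialReport: CoverageRow[] = [
  { feature: "FEAT-login", ac: "AC-1", covered: true },
  { feature: "FEAT-login", ac: "AC-2", covered: false },
  { feature: "FEAT-reset", ac: "AC-1", covered: false },
];

const result = converge(initialReport, closeOneGap, 5);
console.log(result.status, result.iterations); // PASS 2
```

The point of the sketch is the stopping rule: the loop ends either when the gap list is empty or when the iteration budget is spent, and in the second case it returns the gaps that remain rather than a self-assessed "done".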
The map also does the other half of the agent's job: it constrains the input. Faced with a typed requirement graph, the model cannot free-wheel — to claim a task done it must land the work as Requirement → Feature → AC, and the compliance gate refuses "done" while the chain to proof has a hole. The agent is forced to auto-codify your intent into the legend before it is allowed to stop. The type system is the leash; the gate is the finish line.
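A rough sketch of what "a hole in the chain" could mean in code follows. The types and the `findHoles` function are hypothetical and only mirror the three kinds of gap named above (uncovered ACs, a feature that satisfies no requirement, an approved requirement with no implementation); the real gate's rules and data model may differ.

```typescript
// Hypothetical, simplified slice of the legend: Requirement → Feature → AC → tests.
interface AcceptanceCriterion {
  id: string;
  tests: string[]; // names of tests that cover this AC
}

interface Feature {
  id: string;
  satisfies: string[]; // requirement ids this feature claims to satisfy
  acs: AcceptanceCriterion[];
}

interface Requirement {
  id: string;
  status: "draft" | "approved";
}

// Return every hole in the chain; an empty list means the agent may stop.
function findHoles(requirements: Requirement[], features: Feature[]): string[] {
  const holes: string[] = [];
  for (const req of requirements) {
    if (req.status !== "approved") continue; // drafts are not gated in this sketch
    const implemented = features.some((f) => f.satisfies.indexOf(req.id) !== -1);
    if (!implemented) holes.push(`${req.id}: approved requirement has no implementation`);
  }
  for (const feat of features) {
    if (feat.satisfies.length === 0) holes.push(`${feat.id}: feature satisfies no requirement`);
    for (const ac of feat.acs) {
      if (ac.tests.length === 0) holes.push(`${feat.id}/${ac.id}: AC is uncovered`);
    }
  }
  return holes;
}

const requirements: Requirement[] = [
  { id: "REQ-1", status: "approved" },
  { id: "REQ-2", status: "approved" },
  { id: "REQ-3", status: "draft" },
];

const features: Feature[] = [
  {
    id: "FEAT-A",
    satisfies: ["REQ-1"],
    acs: [
      { id: "AC-1", tests: ["login.test.ts"] },
      { id: "AC-2", tests: [] },
    ],
  },
  { id: "FEAT-B", satisfies: [], acs: [] },
];

const holes = findHoles(requirements, features);
console.log(holes.length === 0 ? "PASS" : holes.join("\n"));
```

Because each hole is a string that names a specific node, the agent's next step is addressable: add a test for `FEAT-A/AC-2`, link `FEAT-B` to a requirement, or implement `REQ-2`.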
Half two — the substrate that survives month six
A tight loop is worthless if the map evaporates when the conversation ends. This is where the boring properties — the ones that sound like packaging — turn out to be the whole point.
- It is `.md` and typed source. The contract is the artifact. There is no export step, no second system to keep in sync, no rich-text page that drifts from the code. What the human reads and what the machine checks are the same files.
- It is git-able. Every requirement is a diff. A change of intent shows up in a pull request, reviewable line by line, by a human or by another agent reviewing the first. The why gets the same scrutiny as the code, because it lives in the same place.
- It scales via git. Branch it, merge it, fan ten agents across ten features in parallel. There is no central tracker to serialise through, no API quota on your roadmap. Git is the coordination layer the map already lives in.
- It is forkable. `git clone`, then `npx requirements compliance`, reproduces the entire `REQ → … → EVIDENCE` chain on any machine, with zero out-of-band setup. The governance ships with the repo, not in a SaaS you rent and lose access to.
- It is supply-chain controllable. When AI generates specifications, not just code, the specs themselves become a supply chain, and an ungoverned one is how you ship promises no test backs. Here every AI-authored requirement is gated by `compliance --strict`, its drift is hash-detected, and its provenance is a commit with an author and a date. You decide what your agents are allowed to promise on your behalf.
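Hash-detected drift can be pictured with a small sketch: store a fingerprint of the requirement text at approval time, then compare it with a fresh fingerprint later. This is an assumption about the general technique, not a description of the tool. `TrackedRequirement`, `approve`, and `hasDrifted` are hypothetical names, and the FNV-1a-style hash is only a dependency-free stand-in; a real implementation would more plausibly use a cryptographic hash such as SHA-256.

```typescript
// Hypothetical record: a requirement plus the fingerprint taken at approval time.
interface TrackedRequirement {
  id: string;
  text: string;
  approvedHash: string;
}

// FNV-1a-style 32-bit hash over UTF-16 code units; a stand-in, not the tool's real hash.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, keeping 32 unsigned bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Record the text together with its fingerprint at the moment of approval.
function approve(id: string, text: string): TrackedRequirement {
  return { id, text, approvedHash: fingerprint(text) };
}

// Drift: the current text no longer matches the fingerprint taken at approval.
function hasDrifted(req: TrackedRequirement): boolean {
  return fingerprint(req.text) !== req.approvedHash;
}

const approved = approve("REQ-7", "Sessions expire after 30 minutes of inactivity.");

// Someone (or some agent) edits the promise without re-approving it.
const edited: TrackedRequirement = {
  id: approved.id,
  text: "Sessions expire after 90 minutes of inactivity.",
  approvedHash: approved.approvedHash,
};

console.log(hasDrifted(approved), hasDrifted(edited)); // false true
```

In this picture the commit supplies provenance (who changed the text and when), while the fingerprint supplies detection (the text changed since it was approved).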
And the piece that ties durability directly to AI: you can ship .md playbooks that tell the agent how to consume and produce requirements, ACs, and tests — and version them next to the code. That is the part nobody plans for. The model gets upgraded. The teammate leaves. The you-of-eighteen-months-ago is a stranger. The framing does not live in a chat history that evaporates; it lives in files, in git, checked by a gate. Past month six, the map is still the map — and the next agent, on the next model, reads the same legend the last one did.
The whole argument, in one line
We began with a territory that had no map, where AI multiplied output and intent rotted in silence. We made the map a domain — Domain-Driven Design applied to the why — and refused to stop modelling until it touched the proof. We gave it a typed legend, REQ ↔ FEAT ↔ AC ↔ TEST ↔ IMPL ↔ EVIDENCE, loose where it must flex and rigid where it must hold, governed by a Style that ships with a sensible default. And because that map is typed, git-tracked, forkable, and compiler-gated, it is the one thing that lets you hand the keys to an AI fleet without losing the plot.
Stop tracking what you promised — type it. Then let the AI read the map instead of guessing the territory.