
Chapter 21 — The Human Side, Revisited

The naming session is free the first time. It gets expensive the hundredth time — unless the tool makes the hundredth cheaper than the first.

In March 2026 I wrote The Human Side of Requirements as Code. That post argued a simple thing: the value of the type system is not the compiler, it is the 15-minute conversation the compiler forces a developer and a product owner to have before a single test gets written. The naming session. Two people agreeing on what a word means, and a machine making sure neither of them forgets.

I stand by that argument. It was the hinge of the earlier series and it is the hinge of this one too.

But the post was written when the code base had roughly 20 features and no first-class Requirement type. Since then, under the package @frenchexdev/requirements, the corpus has grown. There are now 54 tests, 22 Requirements, 25 Features, a traceability graph, a compliance scanner, a wizard, a watch mode. The shape of the DSL has changed. The shape of the conversation has not.

That invariance is the subject of this chapter.

Between the earlier post in March and this chapter in April, the number of Requirements in active projects using this pattern grew from "a handful" to something that genuinely stresses human memory. That stress is where the work of this chapter lives. It is one thing to claim that a compiler-enforced naming session scales; it is another thing to stand inside a 200-Requirement corpus and feel the levers the tool has actually grown to keep the claim honest. The earlier post was written from the first position. This one is written from the second.

A modest warning to the reader: some of what follows will sound like an audit of specific decisions — rejected decorators, tuned weights, measured budgets. That register is deliberate. The claim that a DSL stays humane at scale is not a claim that can be made in the abstract; it has to be cashed out in the particular small choices that, each individually, earned their place. The chapter's specificity is the argument.

As the number of Requirements crosses the threshold where no one can hold them all in their head, the human side is not protected by good intentions. It is protected — or abandoned — by specific, boring, load-bearing design choices inside the tooling. Three of those choices:

  1. A friction budget at the API surface. Every new decorator, every new field, every new keyword must pay for its own weight in use. The ones that do not, do not ship.
  2. AC suggestion heuristics inside the wizard. The "requirements feature new" flow does not hand the user a blank page. It hands them a page with the most likely AC names already typed out, each one spelled the way the rest of the corpus spells it.
  3. Bootstrap commands at the CLI. REQ-BOOTSTRAP-ZERO-FRICTION fixes a hard budget: a contributor with a fresh clone must author a valid Feature end-to-end in under two minutes.

The earlier post described the 15-minute conversation. This chapter describes the instruments that keep that conversation cheap across the two-hundredth iteration.

Each of these three levers has a different character. The friction budget is a design-time discipline: it operates in the moments before any code is written, when the team is deciding what the DSL will and will not carry. The AC suggester is an inference lever: it operates at the moment of authorship, pulling the next candidate out of the existing corpus rather than the author's memory. The bootstrap command is an orchestration lever: it operates across the full authoring lifecycle, sequencing prompts so that the user walks a funnel rather than filling a form. Three distinct modes of intervention, each working at a different timescale, each contributing to the same invariant.

The reason the three must work together — rather than any one of them being enough on its own — is that cognitive load is itself multi-layered. A DSL with a strict friction budget but no wizard still forces every user to type the whole thing by hand. A DSL with a good wizard but a sprawling API surface forces the wizard to expose every quirk of that surface, turning the wizard into a form with forty fields. A DSL with a zero-friction bootstrap command but no AC suggester hands the user a well-sequenced path and then asks them, at the critical step, to invent vocabulary out of nothing. Each lever covers a distinct failure mode; none of them covers all three. The interaction is the point.

What the earlier post could not say yet

Reading requirements-human-side.md through the lens of today's DSL, two things are visible that were not visible at the time.

First, the earlier post assumed a stable scale. It described what happens when a PM and a developer name the ACs for one feature. It did not describe what happens when the same PM and the same developer face their 40th feature in the same code base, with 22 Requirements behind them and five styles to choose from. It did not need to. At 20 features the corpus still fits in one afternoon of attention.

Second, the earlier post described the compiler as the enforcement mechanism and stopped there. The compiler catches a typo in @Implements<NavigationFeature>('backButtonRestores'). Good. But by the time the compiler sees the typo, the human has already typed it. There is a step earlier than the compiler — the step where the AC name is chosen in the first place — and the earlier post did not speak about that step because the DSL did not help with it yet.

This chapter is about the steps before the compiler. The steps inside the human's head, or inside the prompt library, or inside a small heuristic over the existing corpus.

If the compiler is a backstop, the instruments in this chapter are the front-of-house. They are the things that keep the naming session from collapsing under its own weight as the requirements count grows.

A third observation, less methodologically important but worth flagging for anyone re-reading the earlier post. The earlier post used the word Requirement rhetorically throughout (it is in the title, the description, the examples), but the type it modelled was only ever Feature. Chapter 00 of this dog-food series did the close reading on that gap; it named, precisely, the stratum that was missing. This chapter takes the complement of that reading. Chapter 00 asked: what was missing from the model? This chapter asks: now that the model is there, what keeps it from crushing the humans who use it?

The two chapters are not symmetric in weight. Chapter 00 is structural: it identifies what the type system had to add. Chapter 21 is operational: it identifies what the wrapping — the wizard, the CLI, the friction budget — had to add around the type system so that the type system's new richness did not become the new burden. Both chapters agree on the silent premise: a DSL that scales is not a DSL that grew more features. It is a DSL that paid, at every axis of growth, the specific small prices that keep its use-cost flat.

A fourth observation, less architectural than procedural. The decision to re-read the earlier post through this chapter's lens is itself an instance of the discipline the package is trying to instil. The older post is not edited away; it stays on the site, unchanged, at its original publication date. This chapter does not correct the earlier post; it extends it. The earlier post's argument is still valid at its own scale; this chapter's argument becomes necessary at a larger one. Both are true. Both should be readable by someone walking in from either end.

That is how a long-lived corpus of writing on a tool ought to work, and it is why this site uses a section-per-phase structure — typed-specs/, requirements-dogfood/, and so on — rather than overwriting earlier work with later revisions. The DSL's growth is traceable in the writing as well as in the code. A reader can follow the sequence: Feature as abstract class (typed-specs), requirement as first-class type (dog-food chapter 00 through 13), DX as a typed commitment (chapters 13b through 20), the invariants that keep the DX humane at scale (this chapter). Each layer sits on the one below it. None of them collapses the others. That is the same friction-budget rule applied to the writing: edit by addition where the earlier layer is still true, not by overwriting.

The friction budget

There is a phrase that keeps returning in my notes for this package. Friction is the tax every future user pays for every decision we make today. That phrase is not original — any API designer who has outlived a bad options object has internalised it — but it becomes sharper when the API is a DSL meant to be authored by humans sitting next to other humans, mid-conversation.

Friction, in the DSL sense, is not the same as difficulty. A DSL can be difficult — sometimes it should be difficult, when the concept it models is genuinely subtle — and still have low friction. Friction is what the DSL asks of the user over and above the thing the user is trying to do. A Requirement has inherent cost: the naming session, the precision, the risk statement, the verification method. The DSL's friction is on top of that cost. It is the syntactic overhead, the boilerplate, the decisions that have nothing to do with the Requirement and everything to do with the tool.

A DSL with low friction makes the inherent cost visible; the user feels only the thinking. A DSL with high friction buries the thinking under the paperwork; the user stops caring which sentence to write because they are busy remembering which decorator goes where. The DSL becomes the antagonist. The naming session, which the earlier post named as the whole point, becomes impossible to hold on to, because the tool is in the way.

The friction budget for @frenchexdev/requirements is a small rule that the whole team, such as it is, has signed up to: every decorator, every abstract-class field, every CLI flag must pay for its own weight in use. If we cannot produce a reading of the DSL in which the new surface area earns its cost three times over within the existing corpus, it does not ship. It goes back to a note. Sometimes it returns in a different shape. Sometimes it never returns.

The budget is enforced by rejection, not by rationing. There is no cap on how many decorators the DSL may carry. There is a cap on what we are willing to say about each one. A decorator that does not survive the reading is not added, full stop.

A short and partial list of what was rejected. Each line is a feature the DSL does not have, and the reason it does not have it.

@Priority was rejected — priority is a field

Early sketches had a decorator @Priority(Priority.Critical) that you would place above a Feature or Requirement class. The intuition was pleasant: priority is important, decorators are visible, let priority be visible too. The rejection came from a simple question. What does the decorator let you say that the field does not?

The field on the abstract base class:

export abstract class Feature extends /* ... */ {
  readonly id: string;
  readonly title: string;
  readonly priority: Priority;
  // ...
}

This field already enforces: every Feature has a priority; priority is one of the values in the Priority enum; priority is readable at runtime; priority is checked by every tool that traverses the corpus. A decorator would have to either duplicate that enforcement or weaken it. Both were worse than the status quo.

The real test was imagined usage. Would a naming session go any better if the PM saw @Priority(Priority.Critical) above the class rather than readonly priority = Priority.Critical inside it? The honest answer was no. The decorator would be one more thing to explain. The field is one thing already explained.

The decorator was cut. Nothing was lost.

The generalisable rule the @Priority case produced is worth stating: if a proposed decorator would express a single fact about the class it decorates, that fact should be a field. Decorators are for relations between classes; fields are for facts within a class. This rule is not enforced by the type system — nothing in TypeScript prevents a decorator from being written for any purpose — but it is enforced at the review boundary, and the first question on every decorator proposal is now the one @Priority failed.
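
As a minimal sketch of that rule, the snippet below states priority once, as a field, and lets a corpus walk read it directly. The Priority members other than Critical, the simplified base class, the feature IDs, and the criticalFeatures helper are assumptions made for illustration; they are not the package's actual code.

```typescript
// Hypothetical stand-in for the package's Priority enum. Only `Critical`
// is named in the text; the other members are assumptions.
const Priority = { Critical: "critical", High: "high", Low: "low" } as const;
type Priority = (typeof Priority)[keyof typeof Priority];

// Simplified base class: what the real Feature extends is elided in the
// text, so this sketch extends nothing.
abstract class Feature {
  abstract readonly id: string;
  abstract readonly title: string;
  abstract readonly priority: Priority;
}

// An intra-class fact is stated once, as a field, inside the class.
class CartFeature extends Feature {
  readonly id = "FEAT-CART"; // hypothetical ID
  readonly title = "adds items to the cart";
  readonly priority = Priority.Critical;
}

class ThemeFeature extends Feature {
  readonly id = "FEAT-THEME"; // hypothetical ID
  readonly title = "changes the accent colour";
  readonly priority = Priority.Low;
}

// A tool that walks the corpus reads the same field every time;
// no decorator metadata lookup is involved.
function criticalFeatures(corpus: readonly Feature[]): string[] {
  return corpus
    .filter((feature) => feature.priority === Priority.Critical)
    .map((feature) => feature.id);
}
```

A subclass that omits priority, or assigns a value outside the allowed set, fails to compile, which is the enforcement a decorator would have had to duplicate.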

@Category was rejected — styles handle this

A second temptation was @Category('security') or @Category('ux'), a way of tagging Requirements by rough business domain. It had a clean pitch: at 100 Requirements, you want to filter by category in the compliance report. The rejection came from noticing that style already does this work.

The package ships five styles — default, industrial, lean, agile, kanban — and every Requirement declares exactly one. A style is not a tag. It is a whole rhetorical register, with its own templates, its own validators, its own vocabulary. A Requirement authored in the industrial style looks different in the tree, in the compliance report, in the wizard prompts, in the markdown export, from the same Requirement authored in the agile style.

Styles answered the category question more precisely than a category decorator would have. Security Requirements in the earlier sketch would have been category-tagged; in the DSL as shipped, they are industrial-style Requirements, because that is the register security work lives in. No decorator needed.

A secondary consideration weighed in. Categories invite proliferation. One team adds security, another adds compliance, a third adds billing, and now the filter dropdown is a forest no one can prune. Styles are bounded — five of them, all in the package, versioned together — so they do not proliferate. The category decorator would have opened a flood gate the DSL has no interest in opening.
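
The bounded nature of styles can be sketched as a closed union type. The five style names come from the text; the RequirementSummary shape, the IDs, and the groupByStyle helper are hypothetical and only illustrate why a closed set does not proliferate.

```typescript
// The five style names are from the text; everything else is illustrative.
const STYLES = ["default", "industrial", "lean", "agile", "kanban"] as const;
type Style = (typeof STYLES)[number];

interface RequirementSummary {
  readonly id: string;   // hypothetical ID format
  readonly style: Style; // exactly one style per Requirement
}

// Group a corpus by style. Because Style is a closed union, the result has
// exactly five buckets, however many teams contribute Requirements.
function groupByStyle(
  corpus: readonly RequirementSummary[],
): Record<Style, string[]> {
  const groups: Record<Style, string[]> = {
    default: [],
    industrial: [],
    lean: [],
    agile: [],
    kanban: [],
  };
  for (const req of corpus) {
    groups[req.style].push(req.id);
  }
  return groups;
}
```

In this sketch a value such as style: "security" does not type-check, so the filter list cannot grow without a change to the package itself. A free-form category string offers no such bound.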

There is a methodological observation hidden in this rejection that is worth surfacing. The @Category decorator was rejected not because it was bad, but because it was redundant with something richer. That is a specific failure mode of API surface growth: a new feature that duplicates, in thinner form, a capability the DSL already carries in thicker form. The thin duplicate displaces attention from the thick original, and within a few sprints, new users default to the thin surface because it is easier to learn, while the thick surface — where the real expressive power lives — quietly decays. A friction budget that only watches total surface area misses this; a friction budget that watches coverage overlap catches it.

Cut. Nothing lost.

@Owner('stephane') was rejected — let git tell you

There was a brief proposal for an @Owner decorator that attached a GitHub handle to each Requirement. At first glance this is innocent. Ownership is a real social fact; surfacing it in the source ought to reduce friction.

The rejection came from asking what question the decorator would answer that git blame does not already answer. The honest list was short.

  • Who last edited the Requirement? git blame.
  • Who originally authored the Requirement? git log with --follow.
  • Who is responsible for the Requirement now? A social fact that changes on a weekly basis and would rot in the source within a quarter.

The first two are solved outside the DSL. The third is not a DSL-level question. An @Owner decorator would either duplicate what git tells you, or become a lie within three sprints. Neither outcome was worth the extra line at the top of every Requirement.

Cut.

@Tags('foo', 'bar') was rejected — strings are not a type

The tags proposal was the closest call of the four. Tags are flexible. Tags compose. Tags are easy to understand. Tags would let a new contributor search the corpus without needing to learn the style taxonomy.

The rejection came from the fundamental commitment the DSL makes in every other place. Things are types, not magic strings. Priorities are an enum. Test levels are an enum. Requirement IDs are branded strings validated by a smart constructor. Feature IDs are branded strings validated by a smart constructor. Tags, by contrast, are exactly the kind of unvalidated, free-form string soup the rest of the DSL refuses.
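
For readers unfamiliar with the pattern, here is a minimal sketch of a branded string with a smart constructor. The ID shape is an assumption modelled on REQ-BOOTSTRAP-ZERO-FRICTION, and the names RequirementId, REQUIREMENT_ID_PATTERN, and requirementId are hypothetical; the package's real validation rule may differ.

```typescript
// Branded string: a plain string at runtime, a distinct type at compile time.
type RequirementId = string & { readonly __brand: "RequirementId" };

// Assumed ID shape: "REQ" followed by one or more hyphen-separated
// upper-case segments. The package's actual rule is not shown in the text.
const REQUIREMENT_ID_PATTERN = /^REQ(-[A-Z0-9]+)+$/;

// Smart constructor: the only sanctioned way to obtain a RequirementId.
function requirementId(raw: string): RequirementId {
  if (!REQUIREMENT_ID_PATTERN.test(raw)) {
    throw new Error(`Invalid Requirement ID: ${raw}`);
  }
  return raw as RequirementId;
}

const bootstrapReq = requirementId("REQ-BOOTSTRAP-ZERO-FRICTION"); // accepted

// requirementId("req bootstrap");       // would throw at runtime
// const bad: RequirementId = "REQ-X";   // would not compile: no brand
```

A tag, by contrast, would be any string at all, with neither the compile-time brand nor the runtime check.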

Adding a @Tags decorator would have been a deliberate departure from the type-everything rule, and the argument for that departure did not survive the first round of review. If we want to group Requirements, the group is either a style, a refinement parent, or a Feature. Three real relations, each typed, each queryable, each checkable by the compiler. That is enough.

The reluctance was real. Tags are, in general, a good answer to a broad class of problems in information systems. They are cheap to add, easy to combine, familiar to every developer. The rejection was not of tags in general; it was of tags in this particular DSL, where the type-first commitment had already paid for itself across the rest of the surface. Breaking the commitment for one convenience feature would have cost more than the convenience was worth. The precedent matters more than the feature.

Cut. Some reluctance. No regret.

@Deprecated was rejected — status is a field, too

A fifth proposal, less often remembered, was @Deprecated as a decorator you would add above a Requirement or Feature class to signal that it should no longer be authored against. This one nearly got through, because deprecation is a real lifecycle event and most type systems carry it as a decorator of one shape or another.

The rejection came from the same place as @Priority. The DSL already carries an explicit status field on every Requirement, drawn from an enum that includes 'Draft', 'Approved', 'Deprecated', 'Withdrawn', and two more. A decorator would duplicate one slice of an enum that already enforces the lifecycle in full. Worse, a decorator could drift out of sync with the field — two places to change, two places to get wrong — while the field alone cannot drift from itself.

Cut. The lifecycle enum does the work. The decorator was an instinct, not a necessity.

What survived the budget

The surface that did survive is small on purpose. The decorators that ship are these, and only these:

  • @Refines(...) — builds the Requirement refinement graph.
  • @Satisfies(...) — binds a Feature to the Requirements it satisfies.
  • @Implements<F>('acName') — binds a test to a specific AC on a Feature.
  • @FeatureTest(Feature) — declares a test file's Feature scope.
  • @Verifies(...) — declares a test's Requirement scope (direct verification, not via Feature).

Five decorators. Each one answers a question no field could answer: what depends on what. Each one produces a relation in the traceability graph. Each one can be pointed at and defended in one sentence. That sentence is the cost the decorator paid for being in the DSL at all.
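
What "produces a relation in the traceability graph" might look like can be sketched with two of the five decorators. This is an illustration under stated assumptions, not the package's implementation: the Edge shape, the traceEdges store, the relation factory, and the Requirement class names are all hypothetical. The decorators are applied as plain function calls so the sketch does not depend on any decorator compiler setting.

```typescript
// One edge in a hypothetical traceability graph.
interface Edge {
  readonly kind: "refines" | "satisfies";
  readonly from: string; // name of the decorated class
  readonly to: string;   // name of the class it points at
}

const traceEdges: Edge[] = [];

type ClassRef = new (...args: any[]) => object;

// Factory for relation decorators: each target becomes one edge.
function relation(kind: Edge["kind"]) {
  return (...targets: ClassRef[]) =>
    (decorated: ClassRef): void => {
      for (const target of targets) {
        traceEdges.push({ kind, from: decorated.name, to: target.name });
      }
    };
}

const Refines = relation("refines");
const Satisfies = relation("satisfies");

class ReqAccessibility {}  // hypothetical Requirement
class ReqKeyboardNav {}    // hypothetical Requirement
class NavigationFeature {}

// Roughly what @Refines(ReqAccessibility) above ReqKeyboardNav would do.
Refines(ReqAccessibility)(ReqKeyboardNav);
Satisfies(ReqKeyboardNav)(NavigationFeature);

// Query the graph: which classes point at the given one?
function dependentsOf(target: ClassRef): string[] {
  return traceEdges
    .filter((edge) => edge.to === target.name)
    .map((edge) => edge.from);
}
```

Every edge has two endpoints in different classes, which is the property a field on a single class cannot express without duplicating information.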

The test the surviving five passed, and the rejected five did not, is worth stating plainly so it survives future pressure. A decorator earns its place if, and only if, it expresses a relation between two classes that could not be expressed as a field on one of them. @Refines relates two Requirements. @Satisfies relates a Feature to Requirements. @Implements relates a test method to an AC name on a Feature. @FeatureTest relates a test class to a Feature class. @Verifies relates a test to Requirements. All five are inter-class relations. None of them could be a single field on a single class without duplicating information elsewhere in the graph.

The five rejected decorators all failed the same half of that test. Priority is intra-class: it belongs to one Requirement and has no other endpoint. Category is intra-class. Owner is intra-class. Tags are intra-class. Deprecation status is intra-class. There is no other class on the far side of the relation, so there is no relation, so there is no need for a decorator. The field carries the weight, the compiler checks the field, the tool queries the field. Done.

Every time the team has proposed a sixth decorator, the question has been: what relation does it express, and is that relation orthogonal to the five above? So far the answer has been no, and the surface has stayed at five. That is the budget, enforced.

Why this particular budget works

A friction budget only works if it says the same thing twice — once to the person proposing the new decorator, and once to the person reading the DSL six months later. A budget that is enforced at design time but invisible in the finished product is just a private discipline; it does not compound.

This particular budget works because it is structural. The rule "decorators express inter-class relations, fields express intra-class facts" is readable from the finished DSL without anyone explaining it. A new contributor who opens a Requirement file for the first time sees fields and decorators together and can, without being told, reconstruct the rule that separates them: the decorators all point somewhere else in the corpus, the fields all stay inside the class. That pattern is then the contributor's own internalised budget the first time they propose a sixth decorator. The budget reproduces itself, silently, through readability alone.

This is the kind of structural discipline the earlier requirements-human-side post could gesture at but not yet demonstrate — because at 20 features the discipline is invisible. At 200, it is the difference between a DSL someone can still learn in an afternoon and one that has calcified into tribal knowledge.

AC suggestion heuristics

The friction budget is a design-time lever. It keeps the tool small. But a small tool still has to be used, and use means the human has to type. The second lever addresses that step: the step where the AC name is first chosen, before the compiler ever gets a chance to check it.

This is the slice of the DSL where the requirements-human-side post ended. The post described the naming session and stopped. It did not describe what happens inside the IDE, or inside the wizard, when a tired developer — at the end of a sprint, with four other tabs open, with the PM on a call — sits down to type the AC names for the feature that just emerged from the 15-minute conversation. At 200 Requirements, that moment is where drift starts.

The wizard addresses it with three heuristics, each worth stating precisely.

Heuristic 1: pre-filled EARS slots

The EARS grammar — Easy Approach to Requirements Syntax, from Mavin and colleagues — defines five patterns for a Requirement statement. Ubiquitous, event-driven, state-driven, unwanted behaviour, optional feature. Every Requirement in the default and agile styles of the DSL carries a statement field with a pattern tag.

When the wizard prompts for a Requirement statement, it does not offer a free-form text box. It offers five named slots, one per pattern, with the non-optional parts already keyed in. The user selects a pattern and fills in the blanks, or they accept the pattern's default skeleton and refine it later.

A ubiquitous pattern looks like this in the prompt:

Pattern: ubiquitous
  response: [the system shall] ___

The user sees the sentence frame first. They finish it second. The full statement is built from parts the wizard already knows are legal. The prompt does not let them skip the pattern selection; pattern is not optional.

This is a small thing. It is also the difference between a wizard that asks "please describe this Requirement" (blank page, no shape, no constraint) and a wizard that asks "please finish this sentence" (known shape, known grammar, constrained vocabulary). The friction is moved from the author's shoulders onto the pattern library.

At 20 Requirements, the pattern library is cosmetic. At 200, it is load-bearing: it is what prevents the same Requirement being phrased two ways by two authors in two sprints.

The five EARS patterns, as the DSL carries them, are worth spelling out because the shape of each one is the shape of the prompt the wizard serves. A ubiquitous Requirement is always-on: the system shall Y. An event-driven Requirement is triggered: when X, the system shall Y. A state-driven Requirement is conditional on a continuing state: while X, the system shall Y. An unwanted-behaviour Requirement is negative and triggered: if X, then the system shall Y, where Y is typically a mitigation, an error state, or a refusal. An optional-feature Requirement is conditional on configuration: where X is available, the system shall Y.

Each of those templates is carried as a typed object in the Requirement's statement field. The wizard serves the five templates as a typed choice list; the user picks one; the wizard then prompts for the blanks, one at a time, validating at each step. A user who picks state-driven is asked for the state X and then for the response Y. A user who picks event-driven is asked for the trigger X and then for the response Y. The prompts never ask for a free-form sentence. They ask for pattern slots.
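
One plausible shape for such a typed statement is a discriminated union keyed on the pattern. The sketch below is an assumption about the modelling, not the package's actual types: the EarsStatement name, the slot names (when, while, if, then, where, response), and the renderStatement helper are hypothetical, chosen to mirror the EARS keywords.

```typescript
// Slot names mirror the EARS keywords; the package's real field names
// may differ.
type EarsStatement =
  | { pattern: "ubiquitous"; response: string }
  | { pattern: "event-driven"; when: string; response: string }
  | { pattern: "state-driven"; while: string; response: string }
  | { pattern: "unwanted-behaviour"; if: string; then: string }
  | { pattern: "optional-feature"; where: string; response: string };

// Build the full sentence from the slots. The switch is exhaustive, so
// adding a sixth pattern to the union makes this function fail to compile
// until the new case is handled.
function renderStatement(s: EarsStatement): string {
  switch (s.pattern) {
    case "ubiquitous":
      return `The system shall ${s.response}.`;
    case "event-driven":
      return `When ${s.when}, the system shall ${s.response}.`;
    case "state-driven":
      return `While ${s.while}, the system shall ${s.response}.`;
    case "unwanted-behaviour":
      return `If ${s.if}, then the system shall ${s.then}.`;
    case "optional-feature":
      return `Where ${s.where}, the system shall ${s.response}.`;
  }
}

const closeOnEscape = renderStatement({
  pattern: "event-driven",
  when: "the user presses Escape",
  response: "close the modal",
});
// closeOnEscape === "When the user presses Escape, the system shall close the modal."
```

The author never types the sentence frame. They supply the slots, and the frame comes from the pattern.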

The consequence at 200 Requirements is that every statement in the corpus is grammatically parse-able by the pattern it declares. A compliance tool that wants to extract the trigger from every event-driven Requirement does not need a natural-language parser; it reads the when slot. A tool that wants to flag every unwanted-behaviour Requirement that omits its mitigation reads the then slot and checks for emptiness. The grammar becomes a machine-readable substrate underneath the prose, without anyone having written the substrate by hand. The pattern library paid for itself in a single tool that used it, and has paid for itself again in every tool since.
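
The two compliance queries just described can be sketched in a few lines once the slots are typed. The types and helper names below (SlottedStatement, RequirementRecord, eventTriggers, missingMitigations) are hypothetical, and only three of the five patterns are modelled to keep the sketch short.

```typescript
// A reduced, assumed statement model: three of the five patterns.
type SlottedStatement =
  | { pattern: "ubiquitous"; response: string }
  | { pattern: "event-driven"; when: string; response: string }
  | { pattern: "unwanted-behaviour"; if: string; then: string };

interface RequirementRecord {
  readonly id: string; // hypothetical ID format
  readonly statement: SlottedStatement;
}

// Extract every trigger by reading the `when` slot; no prose parsing.
function eventTriggers(corpus: readonly RequirementRecord[]): string[] {
  const triggers: string[] = [];
  for (const req of corpus) {
    const s = req.statement;
    if (s.pattern === "event-driven") {
      triggers.push(s.when);
    }
  }
  return triggers;
}

// Flag unwanted-behaviour Requirements whose `then` slot is empty.
function missingMitigations(corpus: readonly RequirementRecord[]): string[] {
  const ids: string[] = [];
  for (const req of corpus) {
    const s = req.statement;
    if (s.pattern === "unwanted-behaviour" && s.then.trim() === "") {
      ids.push(req.id);
    }
  }
  return ids;
}
```

Both functions are plain loops over typed fields, which is the practical meaning of a machine-readable substrate underneath the prose.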

There is a further benefit that the earlier post could not anticipate, because it sits at a register the earlier DSL could not carry. The five patterns are stylistically stable. A PM and a developer arguing about whether a Requirement is event-driven or state-driven are not arguing about words — they are arguing about the shape of the world the Requirement describes. Is the behaviour triggered by an event, or is it conditional on an ongoing state? The pattern choice forces the team to answer that question explicitly, in a sentence structure, rather than implicitly, in an ambiguous prose paragraph that both parties will later interpret differently. The pattern is not a writing aid. It is a modelling aid, with writing as a side-effect.

Heuristic 2: AC name suggestion from the corpus

The second heuristic lives in packages/requirements/src/cli/ac-suggester.ts and it is the most concretely DX-shaped part of the DSL. Given a Feature title — say, "adds items to the cart" — the suggester returns three to five camelCase AC name candidates mined deterministically from the rest of the corpus.

The algorithm is small enough to state end-to-end. It is intentional that it uses no LLM.

  1. Parse every .ts file in the features directory with ts-morph.
  2. Collect the method name of every abstract method whose return type is ACResult on a class extending Feature.
  3. For each collected name, tokenise camelCase into [verb, ...nouns]. tocClickLoadsPage becomes ['toc', 'click', 'loads', 'page'], first token as verb, the rest as nouns.
  4. Build a verb-frequency map over the whole corpus.
  5. Tokenise the Feature title. Drop stop-words (the, and, of, ...). Lowercase. That gives the title's noun set.
  6. Score every candidate AC name: score = 2 × verbFrequency(verb) + |nouns ∩ titleNouns|.
  7. Sort by score descending, break ties lexicographically.
  8. Return the top three to five names that still pass the AC name pattern.
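
Steps 3 through 8 can be sketched as a pure function over a list of already-collected AC names (steps 1 and 2, the ts-morph traversal, are left out). This is a reading of the steps as written, not the contents of ac-suggester.ts: the function names, the stop-word list beyond "the", "and", "of", and the AC name pattern are assumptions.

```typescript
const VERB_WEIGHT = 2; // fixed constant from the scoring formula

// "the", "and", "of" come from the text; the rest are assumed.
const STOP_WORDS = new Set(["the", "and", "of", "to", "a", "an"]);

// Assumed AC name pattern: lower camelCase, letters and digits only.
const AC_NAME_PATTERN = /^[a-z][A-Za-z0-9]*$/;

// Step 3: split camelCase into lower-case tokens.
function tokeniseCamelCase(name: string): string[] {
  return name.split(/(?=[A-Z])/).map((token) => token.toLowerCase());
}

// Step 5: lower-case the title, split on non-alphanumerics, drop stop-words.
function titleTokens(title: string): Set<string> {
  const words = title.toLowerCase().split(/[^a-z0-9]+/);
  return new Set(words.filter((w) => w !== "" && !STOP_WORDS.has(w)));
}

function suggestAcNames(
  title: string,
  corpusNames: readonly string[],
  limit = 5,
): string[] {
  // Step 4: frequency of the leading token (the "verb" slot of step 3).
  const verbFrequency = new Map<string, number>();
  for (const acName of corpusNames) {
    const verb = tokeniseCamelCase(acName)[0] ?? "";
    verbFrequency.set(verb, (verbFrequency.get(verb) ?? 0) + 1);
  }

  // Step 6: score = 2 x verbFrequency(verb) + |nouns ∩ titleNouns|.
  const wanted = titleTokens(title);
  const scored = Array.from(new Set(corpusNames)).map((acName) => {
    const [verb = "", ...nouns] = tokeniseCamelCase(acName);
    const overlap = new Set(nouns.filter((n) => wanted.has(n))).size;
    const score = VERB_WEIGHT * (verbFrequency.get(verb) ?? 0) + overlap;
    return { acName, score };
  });

  // Step 7: score descending, ties broken lexicographically.
  scored.sort(
    (a, b) =>
      b.score - a.score ||
      (a.acName < b.acName ? -1 : a.acName > b.acName ? 1 : 0),
  );

  // Step 8: keep names that pass the pattern, return the top few.
  return scored
    .filter((c) => AC_NAME_PATTERN.test(c.acName))
    .slice(0, limit)
    .map((c) => c.acName);
}

// Hypothetical four-name corpus, for illustration only.
const sampleCorpus = [
  "addsItemToCart",
  "addsCouponToCart",
  "removesItemFromCart",
  "opensCartDrawer",
];
const suggestions = suggestAcNames("adds items to the cart", sampleCorpus, 3);
```

The sketch has no I/O and no randomness, so the same corpus and title always give the same ranking. It also does no stemming: item and items count as different tokens, exactly as the steps describe.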

The scoring is not magic. It rewards two things: verbs that are already common in the corpus (so you spell opens the way everyone else spells opens), and name tokens that overlap with the title's own words (so addsItemToCart scores well for a Feature titled "adds items to the cart").

What this buys the human: the keyof T typo problem is caught before it is typed. The earlier post described the compiler error on 'backButtonWorks' versus 'backButtonRestores'; the suggester quietly prevents that pair from being proposed in the first place, because Works is not a frequent verb in the corpus and restores is.

The suggester is pure. It is deterministic. It depends on a FileSystem port and nothing else. It is covered by unit tests that freeze the corpus and assert the top-5 output. It runs in single-digit milliseconds on the full package. It is the smallest amount of intelligence that repays its cost every time the wizard runs.

Crucially, the suggester does not force its candidates. The wizard lets the user override. But overrides are, in practice, rare — because the candidates are, in practice, the right names. When they are not the right names, they are a better starting point than the blank field.

A worked example makes the effect concrete. Suppose, for the sake of the arithmetic, that the corpus contains only these AC names: tocClickLoadsPage, tocClickScrollsToAnchor, backButtonRestoresState, backButtonPreservesScroll, deepLinkLoadsPage, deepLinkRestoresState, bookmarkableUrlReflectsState, escapeClosesModal, escapeReturnsFocus, rightClickOpensPalette, swatchChangesAccent, accentPersistsAcrossReload. Twelve names, eight distinct leading tokens, and the leading token is the slot that step 3 treats as the verb: toc, back, deep, escape, bookmarkable, right, swatch, accent. The verb-frequency map reads: toc twice, back twice, deep twice, escape twice, and bookmarkable, right, swatch, and accent once each.

Now a new Feature arrives with the title "close button dismisses the modal". The title's noun set after stop-word drop is {close, button, dismisses, modal}. The suggester scores every corpus name. escapeClosesModal gets 2 × freq('escape') + |{closes, modal} ∩ {close, button, dismisses, modal}|: escape leads two names, so 2 × 2 = 4, and the noun overlap is {modal} with one element (closes and close are different tokens, because the algorithm does no stemming), giving a final score of 5. backButtonPreservesScroll gets 2 × freq('back') + |{button, preserves, scroll} ∩ {close, button, dismisses, modal}| = 4 + 1 = 5, and backButtonRestoresState scores 5 the same way. escapeReturnsFocus gets 4 + 0 = 4. tocClickLoadsPage gets 4 + 0 = 4: toc leads two names, but nothing in the name overlaps the title.

The top candidates at this point are backButtonPreservesScroll (5), backButtonRestoresState (5), and escapeClosesModal (5), with the tie broken lexicographically in that order, followed by the names that score 4. None of those is the AC the user actually wants. But the presence of closes in escapeClosesModal primes the user to spell their own AC as closesModal rather than closeModal or dismissesModal, which are the two other natural but inconsistent choices. The corpus's vote, mediated by the ranking, pulls the new name toward the existing vocabulary.

That is the whole point of the heuristic. Not to be right. To be consistent. Consistency compounds.

The suggester's blind spot is worth naming, too, because the earlier post was explicit that the compiler cannot catch misunderstanding and this chapter ought to be equally explicit about its heuristic's limits. The suggester has no semantic model. It does not know what the Feature does, only what words appear in its title and what words appear in existing AC names. If the new Feature's behaviour is genuinely unlike anything the corpus has seen, the suggester will pick its five most confidently wrong answers and offer them. The user overrides. The override costs nothing because the blank field was the alternative anyway.

What the suggester buys, then, is not correctness. It is a default that is better than the empty string roughly ninety percent of the time, at a cost of zero when it is wrong. That ratio is what a heuristic needs to earn its place in the DSL under the same friction-budget rule applied to decorators. If the suggester ever drops below that ratio — if a redesign of the scoring function becomes necessary — it will be rewritten under the same constraint: pure, deterministic, FS-free in its ranking core, single-digit milliseconds, testable with a frozen corpus.

A small design choice inside the suggester that is worth unpacking: the scoring formula weighs verb frequency at twice the weight of noun overlap. At first glance this looks like a tunable parameter that should be exposed, or at least benchmarked. It is not exposed. The 2x weighting is a fixed constant, documented, and committed alongside the test corpus that freezes the expected top-5 output.

The reason for the fixed weighting is the same reason the rest of the DSL refuses configuration knobs. Every tunable parameter is an invitation for the DSL's behaviour to diverge between installations. A team that tunes the verb weight from 2x to 3x will get different suggestions from a team that leaves it at the default, and the corpus they each accumulate will drift in subtly different directions over the next dozen sprints. The 2x is not optimal for every corpus; it is fixed across every corpus, so that the DSL behaves the same way regardless of who is running it. Uniform behaviour is more valuable than locally-optimal behaviour, at the scale the package aspires to.

This is the same line of reasoning that rejected @Tags earlier. A tag decorator and a tunable weight would each have opened a configuration surface the DSL cannot usefully defend across time. Both were cut.

There is also a subtle incentive alignment at play in the 2x weighting. By prioritising verb frequency over noun overlap, the suggester pulls new AC names toward vocabulary consistency rather than toward semantic match. That is the right direction for a DSL whose job is to keep the vocabulary coherent. A suggester that weighted noun overlap more heavily would produce AC names that matched the Feature's title more closely, at the cost of drifting away from the corpus's verb conventions. The choice was between two axes of "correctness" — title-match and corpus-coherence — and the DSL chose corpus-coherence as the more valuable long-term invariant. The author can always override; the corpus cannot easily self-correct once it has drifted.

Heuristic 3: scaffolded test bodies

The third heuristic is the most mechanical of the three. Once the wizard has an accepted FeatureSpec, the scaffolder produces not only the Feature class file but also a test skeleton, pre-populated with one @Implements per AC and a pending-style body.

A scaffolded test body looks like this:

@FeatureTest(AddsItemsToCartFeature)
export class AddsItemsToCartTests {
  @Implements<AddsItemsToCartFeature>('tocClickAddsRow')
  clickAddsRow(): ACResult {
    // TODO: arrange, act, assert
    return { status: 'NotImplemented' };
  }

  // ... one method per AC ...
}

The human never types the AC name a second time. The wizard wrote it once inside the Feature class and once inside the test scaffold, both from the same AcSpec, so by construction they agree. keyof T has nothing to catch because there is no typo possible.

This heuristic is not clever. It is barely a heuristic. It is a code generator. But it closes the one remaining path by which the two halves of the naming agreement — the Feature class and its tests — could drift apart at authoring time.

There is a small but load-bearing detail in how the scaffolded test body is shaped. The return { status: 'NotImplemented' } is a first-class value in the ACResult union, recognised by the compliance scanner as "present but pending". A test method that returns NotImplemented counts as bound (the @Implements link is real, the AC name is spelled correctly) but unverified (the body does not actually assert anything yet). The compliance report then shows the Feature at, say, 6/8 — six ACs with real tests, two ACs with pending tests. No red where there should be yellow.

This is a small accommodation for the reality of incremental work. The naming session is done. The Feature class is written. The test skeleton is written. But the implementation of the first test has not happened yet. The DSL refuses to treat that state as a failure; it treats it as a named, tracked, reportable intermediate state. The human's work is recorded for what it is: some done, some yet to do.

Without this accommodation, the incentive gradient points the wrong way. A developer faced with "zero tests, zero compliance" versus "pending tests, partial compliance" has an incentive to defer running the wizard until they are ready to implement every AC at once — which is to say, to defer it indefinitely. With the accommodation, the wizard becomes safe to run immediately after the naming session, which is when the conversation is freshest and the vocabulary is most precise. Friction reduction at every surface.
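The bound-but-pending distinction is easy to state as code. The sketch below is an assumption-laden illustration: only the NotImplemented member of ACResult is confirmed by the text, and the other members, the BoundTest shape, and the summariseFeature helper are hypothetical stand-ins for whatever the compliance scanner actually does.

```typescript
// Hypothetical union: only the 'NotImplemented' member is confirmed by the text.
type ACResult =
  | { status: 'Passed' }
  | { status: 'Failed'; reason: string }
  | { status: 'NotImplemented' };

// One @Implements link, reduced to plain data for the sketch.
interface BoundTest {
  ac: string;
  run: () => ACResult;
}

interface FeatureCompliance {
  implemented: number; // bound, and at least one body returns a real result
  pending: number;     // bound, but every body still returns NotImplemented
  unbound: number;     // no @Implements link at all
  total: number;
}

function summariseFeature(acNames: string[], tests: BoundTest[]): FeatureCompliance {
  const summary: FeatureCompliance = {
    implemented: 0,
    pending: 0,
    unbound: 0,
    total: acNames.length,
  };
  for (const ac of acNames) {
    const results = tests.filter((t) => t.ac === ac).map((t) => t.run());
    if (results.length === 0) {
      summary.unbound += 1;
    } else if (results.some((r) => r.status !== 'NotImplemented')) {
      summary.implemented += 1;
    } else {
      summary.pending += 1;
    }
  }
  return summary;
}

function formatCompliance(c: FeatureCompliance): string {
  return `${c.implemented}/${c.total} (${c.pending} pending, ${c.unbound} unbound)`;
}

// Eight hypothetical ACs: six with real bodies, two still scaffolded.
const demoAcNames = ['ac1', 'ac2', 'ac3', 'ac4', 'ac5', 'ac6', 'ac7', 'ac8'];
const demoBoundTests: BoundTest[] = demoAcNames.map((ac, i) => ({
  ac,
  run: (): ACResult => (i < 6 ? { status: 'Passed' } : { status: 'NotImplemented' }),
}));

console.log(formatCompliance(summariseFeature(demoAcNames, demoBoundTests)));
// prints "6/8 (2 pending, 0 unbound)"
```

Keeping pending and unbound as separate counters is the whole point of the sketch: a freshly scaffolded Feature lands in the pending column, not in the missing one.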

How the three levers interact in a single session

It is worth walking through one concrete authoring session end-to-end, to show how the three levers compose rather than merely coexist. The session is hypothetical but constructed from real shapes; every step is something the DSL actually does today.

The scenario: a PM and a developer have just finished a 15-minute naming session. They agreed on a new Feature, tentatively titled "close button dismisses the modal", with four ACs they sketched on a whiteboard. The PM is about to leave; the developer wants to get the Feature into the repo before the context fades. They have ninety seconds before the PM's next meeting.

The developer opens a terminal in the package root and types requirements feature new. The wizard starts. The first prompt is the Feature ID.

Friction-budget lever at work. The prompt does not ask the developer to decide whether to use a decorator, a field, or a tag. Those decisions were made during the DSL's design, at the time the friction budget refused the @Priority, @Category, @Owner, @Tags, and @Deprecated decorators. The developer, mid-session, does not have to hold those trade-offs in their head. They just answer the prompt.

Next prompt: Feature title. The developer types "close button dismisses the modal". The wizard brands the title and continues.

Next prompt: priority. The default is High. The developer picks Medium, because the PM's priority hint was "important but not urgent". One keystroke.

Next phase: AC loop. This is where the AC suggester earns its place.

AC-suggester lever at work. Before asking for the first AC name, the wizard runs the suggester over the existing corpus. It takes the title the developer just typed, tokenises it, scores every existing AC name, and returns the top five candidates. The candidates for "close button dismisses the modal" include escapeClosesModal (score 3, from the noun modal overlapping), clickOpensModal (score 2, from the verb click being frequent), backButtonPreservesScroll (score 3, from the noun button overlapping). None of the five is the exact AC the developer wants (clickDismissesModal, say), but the candidate list primes the spelling. The developer types clickDismissesModal, confident that Dismisses is a verb form consistent with the corpus's vocabulary (because Closes, Opens, Loads, all of the same shape, are in the candidate list).

The compiler will later enforce that @Implements<CloseModalFeature>('clickDismissesModal') matches exactly. The suggester has made it likely that the spelling will be right on the first try.

The wizard prompts for expected test levels. The developer picks [unit, e2e]. One multiselect. The wizard loops back to "add another AC?" The developer adds three more — escapeDismissesModal, focusReturnsToTrigger, backdropClickDismissesModal — each one scored against the corpus before it is typed, each one spelled consistently with existing vocabulary.

Next phase: cross-link. Which Requirements does this Feature satisfy?

Bootstrap-orchestration lever at work. The wizard presents the existing Requirements as a multiselect. The developer picks two: REQ-KEYBOARD-ESCAPE-CLOSES-OVERLAYS and REQ-FOCUS-MANAGEMENT-ROUND-TRIPS. Both Requirements already exist in the corpus, drafted in a prior sprint. The Feature is now linked to those Requirements at authoring time, not as an afterthought.

Next prompt: confirmation. The wizard renders a summary — ID, title, priority, four ACs, two @Satisfies links — and asks "Create this Feature?" The developer says yes.

The wizard now produces three artifacts, from a single accepted FeatureSpec:

  1. A Feature class file, with the abstract class and the four abstract AC methods.
  2. A spec.json file, with the same content in a machine-readable form for any future tool that wants it.
  3. A test skeleton file, with one @Implements per AC and a pending NotImplemented body.

All three files agree by construction. The AC names cannot drift between them, because they all came from the same AcSpec[] array in memory.
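The agreement-by-construction property is simple to sketch. The code below is an illustration under assumptions: the AcSpec and FeatureSpec fields, the Feature base class named in the generated text, the 'CLOSE-MODAL' ID, and the rule that derives the test class name are all hypothetical. Only the shape of the test skeleton follows the scaffolded example shown earlier in the chapter.

```typescript
// Hypothetical spec shapes; the real AcSpec and FeatureSpec fields are not shown in the text.
interface AcSpec {
  name: string;
  levels: string[];
}

interface FeatureSpec {
  id: string;
  className: string;
  title: string;
  priority: string;
  satisfies: string[];
  acs: AcSpec[];
}

interface ScaffoldOutput {
  featureFile: string;
  specJson: string;
  testFile: string;
}

function renderFeatureFile(spec: FeatureSpec): string {
  const methods = spec.acs.map((ac) => `  abstract ${ac.name}(): ACResult;`);
  const head = [
    `export abstract class ${spec.className} extends Feature {`,
    `  readonly id = '${spec.id}';`,
    `  readonly title = '${spec.title}';`,
  ];
  return head.concat(methods, ['}']).join('\n');
}

function renderTestFile(spec: FeatureSpec): string {
  // Assumed naming rule: 'CloseModalFeature' becomes 'CloseModalTests'.
  const testClass = spec.className.replace(/Feature$/, '') + 'Tests';
  const methods = spec.acs.map((ac) =>
    [
      `  @Implements<${spec.className}>('${ac.name}')`,
      `  ${ac.name}(): ACResult {`,
      '    // TODO: arrange, act, assert',
      "    return { status: 'NotImplemented' };",
      '  }',
    ].join('\n'),
  );
  return [
    `@FeatureTest(${spec.className})`,
    `export class ${testClass} {`,
    methods.join('\n\n'),
    '}',
  ].join('\n');
}

// All three artifacts read the same in-memory spec, so the AC names cannot diverge.
function scaffoldFeature(spec: FeatureSpec): ScaffoldOutput {
  return {
    featureFile: renderFeatureFile(spec),
    specJson: JSON.stringify(spec, null, 2),
    testFile: renderTestFile(spec),
  };
}

const demoSpec: FeatureSpec = {
  id: 'CLOSE-MODAL', // hypothetical ID
  className: 'CloseModalFeature',
  title: 'close button dismisses the modal',
  priority: 'Medium',
  satisfies: ['REQ-KEYBOARD-ESCAPE-CLOSES-OVERLAYS', 'REQ-FOCUS-MANAGEMENT-ROUND-TRIPS'],
  acs: [
    { name: 'clickDismissesModal', levels: ['unit', 'e2e'] },
    { name: 'escapeDismissesModal', levels: ['unit', 'e2e'] },
    { name: 'focusReturnsToTrigger', levels: ['unit', 'e2e'] },
    { name: 'backdropClickDismissesModal', levels: ['unit', 'e2e'] },
  ],
};

const demoScaffold = scaffoldFeature(demoSpec);
console.log(demoScaffold.testFile);
```

The sketch returns strings and writes nothing to disk, which keeps the generator testable in isolation; where the real wizard draws that line is not stated in the text.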

Elapsed time, end-to-end: roughly ninety seconds on a warm machine, well inside the two-minute budget. The PM's next meeting has not started yet. The Feature is in the repo. The tests are scaffolded. The compliance scanner, if run, will report the Feature at 0/4 — zero implementations — which is honest, because the developer has not written the test bodies yet.

The PM leaves for the meeting. The developer spends the next two hours implementing the four tests. Each test's @Implements<CloseModalFeature>('clickDismissesModal') reference is checked by keyof T at compile time; none of them is misspelled, because the AC names were produced from the same AcSpec[] the Feature class was generated from. The compliance scanner runs green within two hours of the whiteboard sketch.

This is the chain of small wins the three levers produce. None of them individually is decisive. Together, they compress the loop from whiteboard sketch to green compliance report from what would otherwise be a half-day of boilerplate-plus-typo-hunting down to two hours of actual engineering work.

The earlier post described the 15-minute naming session. This chapter has described the next ninety seconds, and the two hours after that. The tool's contribution to each of those windows is specific, named, measurable, and small. The human's contribution — the actual thinking, the actual test-writing, the actual naming — remains the thing the tool cannot do. That division of labour is the design.

The friction-budget diagram

The three heuristics are ways of moving friction off the human's plate. They do not eliminate friction; they relocate it. The diagram below maps the relocation — from the raw idea at the top, through the wizard, to an approved Requirement at the bottom. Each gate is a place where the DSL asks a small, bounded question instead of a large, unbounded one.

Diagram: "Friction funnel. Each gate turns an open question into a bounded one; the human's cognitive load drops monotonically as they descend."

The funnel has a concrete budget. The ReqBootstrapZeroFrictionRequirement — written as a typed Requirement in the package's own corpus — commits the wizard to a two-minute total walk-through from the first prompt to a compilable file on disk. Two minutes is not aspirational. It is the rejection threshold. If the wizard takes three minutes, the Requirement fails its fit-criteria and blocks a merge.

A note on why none of this uses an LLM

The three levers this chapter describes — friction budget, AC suggester, bootstrap command — could all be implemented with a large language model plugged into each seam. The AC suggester could be a prompt to an LLM that returns five candidates. The bootstrap command could be an LLM-driven dialog that skips the form-filling entirely. The friction budget could be policed by an LLM that reads every pull request and scores its proposal against the existing API.

None of those implementations ship in the package. None of them are planned. This is worth a brief defence, because it is the path most contemporary DSL tooling is taking right now, and the package's refusal to take it is deliberate.

The friction budget reasoning above — pure, deterministic, FS-free, single-digit milliseconds, testable with a frozen corpus — is the positive form of the argument. The negative form is this: an LLM-based suggester cannot be made deterministic, cannot be tested against a frozen corpus, cannot run FS-free in its core, and cannot be counted on to be consistent across installations. Every single constraint the current suggester meets, an LLM-based one breaks. The trade would be cleverness in exchange for repeatability; the DSL prefers repeatability.

There is also an ecological argument, which carries less weight in a technical chapter but is honest to name. Running an LLM call for every AC suggestion is several orders of magnitude more compute than running the current suggester. The current suggester takes low single-digit milliseconds per call on a laptop CPU. An LLM call takes hundreds of milliseconds and draws measurable energy from a datacentre. The multiplicative factor — call-count across a sprint, across a team, across a year — is large enough that the choice has a real footprint. The DSL does not owe its users extravagance. It owes them a tool that works. A 2x-weighted verb frequency over a ts-morph parse tree works. The LLM would also work, probably marginally better on some metrics, at a cost that is hard to justify against the improvement.

This is not a blanket refusal of LLMs in all developer tooling. It is a specific refusal at the specific seams where determinism is load-bearing. Elsewhere in the broader workflow — drafting prose, summarising pull requests, exploring designs — LLMs are used freely. The friction-budget rule does not say never add; it says make the addition earn its cost. At the AC-suggestion seam, the cost-benefit does not pencil out. Elsewhere, it might. Each seam gets its own calculation.

The connection back to the human side of the DSL: every time a seam is handed to an LLM, the human's feedback loop gets a little longer and a little less legible. An LLM call is a black box in a way that a scoring function over a parse tree is not. A developer who is surprised by a candidate can read the suggester's source and understand the surprise; a developer who is surprised by an LLM's output can only shrug. Legibility of the feedback loop is part of what keeps the DSL humane. Every seam that trades determinism for cleverness is a seam that moves the DSL a little further from legibility.

So the levers are what they are. Boring, inspectable, bounded. That is not a limitation of the current implementation; it is a property the implementation was designed to preserve.

Bootstrap at zero friction

REQ-BOOTSTRAP-ZERO-FRICTION is worth reading in its canonical form, because it is the clearest statement in the package of what the tool promises a first-time user.

export abstract class ReqBootstrapZeroFrictionRequirement extends Requirement<DefaultStyleType> {
  readonly id = 'REQ-BOOTSTRAP-ZERO-FRICTION';
  readonly title = 'One command bootstraps a Feature end-to-end';
  readonly priority = Priority.High;
  readonly status = 'Approved' as const;
  readonly kind = 'UserStory' as const;

  readonly statement = {
    pattern: 'ubiquitous' as const,
    response:
      'let a user create a Feature class, its spec.json, and the scaffolded ' +
      'test stubs by running one interactive command, with no manual file editing.',
  };

  readonly rationale = {
    claim: 'Manual creation of Feature files is the friction that prevents adoption of the DSL.',
    kind: 'evidence-based' as const,
    evidence: [
      { kind: 'expert-opinion' as const,
        expert: 'user-conversation-2026-04-14',
        recordedAt: '2026-04-14' },
    ],
  };

  readonly fitCriteria = [
    { kind: 'unit-test' as const,
      describes: 'wizard walks id/title/priority/AC-loop and produces a valid FeatureSpec',
      binds: ['wizardCollectsIdTitlePriority', 'wizardLoopsAcsUntilEnd'] },
    { kind: 'demonstration' as const,
      scenario: 'End-to-end wizard run in a temp dir yields a compilable Feature file + passing tests.' },
  ];

  readonly verificationMethod = 'Test' as const;

  readonly source = { type: 'stakeholder' as const, role: 'user', date: '2026-04-14' };

  readonly risk = {
    level: 'High' as const,
    ifNotMet:
      'The DSL stays niche: adoption capped at authors willing to hand-craft every Feature file.',
  };
}

Several things are worth naming about this file, because they are exactly the things the earlier requirements-human-side post could not show.

First, the friction budget is itself a Requirement. The wizard's two-minute promise is not a wiki page and not a code comment. It is a typed Requirement<DefaultStyleType> with a status: 'Approved', a rationale, two fit-criteria, a source, and a risk level. It lives in the same graph as every other Requirement the package makes. It is traversed by the same compliance scanner. It can be linked to by any Feature that claims to satisfy it.

Second, the rationale is honest. The evidence is 'user-conversation-2026-04-14' — a date, a kind, a claim that friction is the blocker to adoption. No metric-worship, no pretend-quantification. The fit-criteria are what the tool actually does: two unit-test bindings and one end-to-end demonstration.

Third, the risk is stated plainly. If not met, the DSL stays niche. That is not a marketing sentence. It is the failure mode the package would be in today if the wizard were three minutes long instead of two. The Requirement is load-bearing because the risk is real.

The Feature that satisfies this Requirement is FEATURE-NEW-COMMAND, the wizard itself, walked end-to-end in chapter 13b. That chapter shows the transcripts. This chapter names the constraint the transcripts were written against.

The two-minute budget was measured, not chosen. It was the wall-clock time of the first full end-to-end walk-through of the wizard on a fresh clone, recorded on 14 April 2026 on a mid-range laptop with a warm npm cache. That number — 107 seconds, round it up — became the budget. Not because two minutes is a magic threshold, but because the wizard already hit two minutes on the day it shipped, and any future version that runs slower than that is a regression.

The budget is a ratchet. It is allowed to shrink, never to grow. If a refactor brings the wall-clock down to 80 seconds, the Requirement's fit-criteria test is updated to fail above 85 seconds. If a refactor brings it up to 130 seconds, the Requirement fails immediately and the refactor is blocked. Two minutes is the public-facing number; the internal number is whatever the last recorded run clocked, plus a small tolerance.

This ratchet-shape is another form the friction budget takes. Surface area is budgeted at the design moment; performance is budgeted at every recorded run. Both are cost-curves the DSL refuses to let drift upward. The wizard's job is to stay under its own recorded time; the DSL's job is to stay under its own recorded surface. Neither budget is absolute; both are monotone.
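The ratchet rule fits in one pure function. This is a sketch under stated assumptions: the five-second tolerance is inferred from the 80-to-85 example, and the ratchetBudget name and RatchetOutcome shape are hypothetical, not the package's fit-criteria test.

```typescript
// Assumed tolerance: the text's example tightens an 80 s run to an 85 s limit.
const TOLERANCE_SECONDS = 5;

type RatchetOutcome =
  | { kind: 'within-budget'; nextLimitSeconds: number }
  | { kind: 'regression'; overBySeconds: number };

function ratchetBudget(
  currentLimitSeconds: number,
  measuredSeconds: number,
  toleranceSeconds: number = TOLERANCE_SECONDS,
): RatchetOutcome {
  if (measuredSeconds > currentLimitSeconds) {
    // A slower run than the recorded limit fails immediately.
    return { kind: 'regression', overBySeconds: measuredSeconds - currentLimitSeconds };
  }
  // Monotone: the limit may tighten toward the latest run, never loosen.
  const tightened = measuredSeconds + toleranceSeconds;
  return {
    kind: 'within-budget',
    nextLimitSeconds: Math.min(currentLimitSeconds, tightened),
  };
}

console.log(ratchetBudget(120, 80));  // within budget, next limit 85
console.log(ratchetBudget(120, 130)); // regression, over by 10
console.log(ratchetBudget(85, 84));   // within budget, limit stays at 85
```

The Math.min is the ratchet. A run that lands just under the current limit passes without loosening it, so the recorded number can only move one way.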

The two-minute budget was also not arbitrary in its origin. It came from a conversation with a potential user who said, approximately, "if I can't get a feel for what the tool does in the time it takes me to drink a coffee, I will not come back." The provenance of that sentence is recorded in the Requirement's source field (type: 'stakeholder', role: 'user', date: '2026-04-14'), and it is the kind of evidence the earlier requirements-human-side post argued should be captured in the specification, not in a separate interview archive. The rationale field carries the evidence forward; the risk field names what the DSL is protecting against; the fit-criteria name the two concrete ways the tool will be checked against the budget.

That structure (claim plus evidence plus risk plus fit-criteria) is the same structure every other Requirement in the corpus uses. The bootstrap-zero-friction Requirement is not a special case. It is an ordinary Requirement about the tool itself, authored with the same discipline the tool asks of its users. The dog-food is real at this specific level: the friction budget of the wizard is enforced by a Requirement that the wizard's own developers must satisfy, under the same rules they ask of everyone else.

There is a reading of this arrangement that deserves to be named. The wizard is not merely a convenience; it is a mechanism of self-enforcement. If the wizard stops being two minutes long, the Requirement fails its fit-criteria; the compliance report drops; the next release cannot be cut. The two-minute budget becomes a constraint on the package's own evolution. A well-meaning refactor that slows the wizard down by thirty seconds is caught before it merges, because the fit-criteria test walks the wizard end-to-end and times it. The human side of the DSL has grown a fuse.

When scale hurts

The three levers so far — friction budget, AC suggestion, bootstrap — address authoring. They keep the cost of adding the 200th Requirement close to the cost of the first.

Authoring is not the only phase where cognitive load shows up. Reading a 200-Requirement corpus is its own problem. A contributor who joins the project in sprint 40 does not author first; they read first. If the reading story is bad, the authoring story never gets a chance.

Before the CLI surface, an honest description of what 200 Requirements actually feels like, from the reader's side. Two hundred Requirements is approximately five times more than a person can hold in active working memory at once. It is approximately ten times more than a person can recall by name after a week away from the code base. It is approximately fifty times more than a person can review in a single pull-request sitting with any real attention. Those ratios are not precise; they are calibration. The point is that 200 Requirements is past every threshold of human recall.

What this means in practice is that the reading toolkit is not an optional convenience. It is the substitute for the recall the human cannot offer. Every command in the toolkit replaces a faculty the reader has lost at scale: list replaces the ability to remember the scope, trace replaces the ability to remember the links, orphan replaces the ability to notice the gaps, the compliance report replaces the ability to summarise the state.

A tool that refuses to build those substitutes is a tool that assumes every reader has unlimited memory. That assumption is false at 20 Requirements (where it is merely wasteful) and disastrous at 200 (where it turns every reader into a perpetual novice). The reading toolkit exists because the alternative — a DSL that trusts the reader to remember — is not a DSL anyone can actually use past a certain size.

One sub-problem worth naming explicitly, because it is where scale hurts most sharply: rediscovery. A Requirement that was authored six months ago, reviewed, approved, and then slept on because its satisfying Feature was deferred, is at high risk of being authored again under a different ID when the work finally surfaces. At 20 Requirements, rediscovery is trivial — a developer remembers. At 200, it is the default outcome unless the tool actively prevents it.

The prevention mechanism is the combination of requirements list --status=approved, the wizard's cross-link prompt, and the rule that Requirement IDs are branded and uniquely validated. Together they make it cheap, at the moment a naming session produces a candidate Requirement, to check whether something close to it already exists. The wizard's cross-link step lists every existing Requirement before the new Feature is cross-linked; a developer scanning that list is likely to see the near-duplicate and notice the conflict. The list is not magical — a determined developer can still author a duplicate if they skim — but it makes the right thing easy enough that the wrong thing becomes the conscious choice.

This is the same pattern that applies to all three reading-toolkit commands: they do not guarantee the right outcome, but they lower the cost of achieving it to the point where the wrong outcome requires active indifference. A team that is paying ordinary attention will catch the duplicate, notice the orphan, read the trace. A team that is actively careless will not — and no tool can compensate for that. The DSL is aimed at the first case.

One more scale-related observation. The reading toolkit's value compounds with the authoring tooling, in a way that is not obvious until the corpus is large enough. The AC suggester's candidate list draws from the same corpus that requirements list surfaces; the corpus's quality — how consistently it has been authored, how clean its vocabulary — directly feeds back into how good the suggester's candidates are. A well-maintained corpus makes every new authoring session cheaper. A drifting corpus makes every new authoring session more expensive. The reading toolkit is what the team uses to keep the corpus well-maintained, and the authoring toolkit is what benefits from that maintenance. The two are mutually reinforcing, and neither works as well without the other.

The DSL addresses reading with three CLI commands and one report. None of them is novel. All of them are necessary.

requirements list

The requirements list command prints every Requirement in the corpus with one line per entry: id, priority, status, style, title. Plain text, sortable, greppable.

The command accepts filters. The filters are boring on purpose.

requirements list --status=approved
requirements list --status=draft --priority=critical
requirements list --style=industrial
requirements list --style=agile --status=approved

Each filter maps to a typed field on Requirement. Each combination is an && conjunction. There is no query language beyond that. At 200 Requirements, the filter combinators cover the reading-case fully: what is approved right now?, what is still draft?, what is critical and open?, what does the industrial-style half of the corpus look like?

Nothing about this is sophisticated. That is the point. The sophistication was spent on the typed fields. The reading surface is as small as it can be.
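The whole filter language is small enough to sketch in full. The row shape, the lower-case literal values, and the demo IDs below are assumptions chosen to mirror the CLI flags; the package's real field types are not shown in this chapter.

```typescript
// Hypothetical literal types mirroring the CLI flag values.
type ReqStatus = 'draft' | 'approved';
type ReqPriority = 'critical' | 'high' | 'medium' | 'low';
type ReqStyle = 'default' | 'industrial' | 'agile';

// One printed line: id, priority, status, style, title.
interface RequirementRow {
  id: string;
  priority: ReqPriority;
  status: ReqStatus;
  style: ReqStyle;
  title: string;
}

// Every filter is optional; the ones that are present are AND-ed together.
interface ListFilter {
  status?: ReqStatus;
  priority?: ReqPriority;
  style?: ReqStyle;
}

function listRequirements(rows: RequirementRow[], filter: ListFilter): RequirementRow[] {
  return rows.filter(
    (row) =>
      (filter.status === undefined || row.status === filter.status) &&
      (filter.priority === undefined || row.priority === filter.priority) &&
      (filter.style === undefined || row.style === filter.style),
  );
}

function formatRow(row: RequirementRow): string {
  return [row.id, row.priority, row.status, row.style, row.title].join('  ');
}

// Hypothetical demo corpus; the IDs and titles are placeholders.
const demoRequirementRows: RequirementRow[] = [
  { id: 'REQ-DEMO-1', priority: 'high', status: 'approved', style: 'agile', title: 'first demo' },
  { id: 'REQ-DEMO-2', priority: 'critical', status: 'draft', style: 'industrial', title: 'second demo' },
  { id: 'REQ-DEMO-3', priority: 'low', status: 'approved', style: 'industrial', title: 'third demo' },
];

// Equivalent of: requirements list --style=industrial --status=approved
const demoListing = listRequirements(demoRequirementRows, {
  style: 'industrial',
  status: 'approved',
});
console.log(demoListing.map(formatRow).join('\n'));
```

Because the filter values are literal types, a misspelled status is a compile error in the sketch, not an empty result set. That is the property the typed filters buy over free-text matching.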

A brief aside on what the filter language deliberately does not support. There is no --title-contains=..., no --full-text-search, no --regex. The rejection is the same as the @Tags rejection: once the reading language allows free-form string matching over unstructured prose, it drifts away from the type-safety commitment. A filter that matches title substrings will return false negatives the first time a Requirement is renamed, and false positives the first time two unrelated Requirements happen to share a noun. The typed filters are boring, yes, but every Requirement they return is guaranteed to match the filter's intent, because the filter's intent is spelled out in the type system. Boring plus sound beats flexible plus unsound, at the reading surface as everywhere else.

The one concession to free-form querying is the requirements trace command, below, which accepts a specific --req=... or --feature=... identifier. That one degree of freedom is bounded by the smart constructors that validate RequirementId and FeatureId: the user cannot ask trace about a misspelled ID; the CLI rejects the input at the parse step, before any graph walk begins. Boring plus sound, again.
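A smart constructor of the kind described can be sketched as a branded type plus a parse function. The ID pattern, the ParsedId result shape, and the runTrace seam are assumptions for illustration; the package's actual validation rules are not reproduced here.

```typescript
// Branded string: in this sketch only parseRequirementId produces one.
type RequirementId = string & { readonly __brand: 'RequirementId' };

// Assumed ID shape: 'REQ' followed by one or more upper-case, hyphen-separated segments.
const REQUIREMENT_ID_PATTERN = /^REQ(-[A-Z0-9]+)+$/;

type ParsedId =
  | { kind: 'ok'; id: RequirementId }
  | { kind: 'rejected'; reason: string };

function parseRequirementId(raw: string): ParsedId {
  const trimmed = raw.trim();
  if (!REQUIREMENT_ID_PATTERN.test(trimmed)) {
    return { kind: 'rejected', reason: `"${raw}" is not a well-formed Requirement ID` };
  }
  return { kind: 'ok', id: trimmed as RequirementId };
}

// Hypothetical CLI seam: parse first, walk the graph only on success.
function runTrace(rawReqArg: string, walk: (id: RequirementId) => string): string {
  const parsed = parseRequirementId(rawReqArg);
  if (parsed.kind === 'rejected') {
    return `error: ${parsed.reason}`;
  }
  return walk(parsed.id);
}

console.log(runTrace('REQ-BACK-BUTTON-RESTORES', (id) => `tracing ${id}`));
console.log(runTrace('req-back-button', (id) => `tracing ${id}`));
```

The sketch checks shape only. Whether the real constructor also checks that the ID exists in the corpus is not stated in the text; a well-formed but unknown ID would pass this parse step and have to be caught by the graph walk.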

requirements trace

The second reading lever is the traceability graph. requirements trace --req=REQ-BACK-BUTTON-RESTORES prints the closed loop: the Requirement itself, the Features that satisfy it, the ACs on those Features, the tests that implement those ACs. One command, four levels, one answer: is this Requirement covered, and if so, by what?

The same command inverted — requirements trace --feature=NAVIGATION — walks the graph the other way: from Feature up to Requirements, down to ACs, down to tests. A PM who wants to know "what does feature X promise?" runs this command. A developer who wants to know "what will break if I change this AC?" runs this command.

The graph is built once at startup from the decorated classes and cached. It is not a database. It is not a service. It is a plain object in memory that the CLI walks.

The in-memory choice is deliberate. A database would add operational cost — migrations, backups, a connection string, a schema versioning story — for no expressive gain over what the decorators already encode. A service would add a boundary where the graph would need to be re-serialised and re-parsed, inviting drift between the in-code relation and its network representation. Both choices would have added friction, at a level that would compound on every contributor who ran the command. The plain object wins on every axis except scale — and the corpus at which the plain object stops fitting in memory is roughly four orders of magnitude above the current size. That is a bridge to cross when the DSL reaches it; until then, the smallest tool that works is the correct tool.
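To make "a plain object in memory that the CLI walks" concrete, here is a minimal sketch of such a walk. The TraceGraph layout and the function names are assumptions; the demo data borrows the NAVIGATION names used in this section's trace example, and the refinement relation is left out to keep the sketch short.

```typescript
// Hypothetical in-memory graph: plain records, no database, no service.
interface TraceGraph {
  satisfiedBy: Record<string, string[]>;  // requirement id -> feature ids
  acsByFeature: Record<string, string[]>; // feature id -> AC names
  testsByAc: Record<string, string[]>;    // 'FEATURE.acName' -> test references
}

interface AcTrace {
  ac: string;
  tests: string[];
}

interface FeatureTrace {
  feature: string;
  acs: AcTrace[];
}

// Requirement -> Features -> ACs -> tests, by plain lookups.
function traceRequirement(graph: TraceGraph, reqId: string): FeatureTrace[] {
  const features = graph.satisfiedBy[reqId] ?? [];
  return features.map((feature) => ({
    feature,
    acs: (graph.acsByFeature[feature] ?? []).map((ac) => ({
      ac,
      tests: graph.testsByAc[`${feature}.${ac}`] ?? [],
    })),
  }));
}

// Covered means: at least one Feature, and every AC on it has at least one test.
function isCovered(trace: FeatureTrace[]): boolean {
  return (
    trace.length > 0 &&
    trace.every((f) => f.acs.length > 0 && f.acs.every((a) => a.tests.length > 0))
  );
}

const demoGraph: TraceGraph = {
  satisfiedBy: { 'REQ-BACK-BUTTON-RESTORES': ['NAVIGATION'] },
  acsByFeature: { NAVIGATION: ['backButtonRestoresState', 'backButtonPreservesScroll'] },
  testsByAc: {
    'NAVIGATION.backButtonRestoresState': ['NavigationTests.backButtonRestoresState'],
    'NAVIGATION.backButtonPreservesScroll': ['NavigationTests.backButtonPreservesScroll'],
  },
};

const demoTrace = traceRequirement(demoGraph, 'REQ-BACK-BUTTON-RESTORES');
console.log(JSON.stringify(demoTrace, null, 2));
console.log('covered:', isCovered(demoTrace));
```

Every step is a key lookup on a record, so an unknown ID simply yields an empty trace. Nothing here needs a connection string or a schema version.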

A worked example of trace output, for concreteness:

REQ-BACK-BUTTON-RESTORES  [Approved, High, default]
  satisfied by:
    NAVIGATION
      backButtonRestoresState  → implemented by NavigationTests.backButtonRestoresState
      backButtonPreservesScroll → implemented by NavigationTests.backButtonPreservesScroll
  refined by:
    REQ-BACK-BUTTON-PRESERVES-SCROLL
      satisfied by: NAVIGATION (backButtonPreservesScroll)

Eight lines of output. One Requirement, one Feature, two ACs, two tests, one refinement child. The full closed loop for one concept, printed in the shape of a tree. A PM reading this output sees the promise and the coverage together. A developer sees the contract and the implementation together. Neither needs a separate tool to cross-reference the pieces; the pieces are the output.

The tree shape, and not a table shape, is deliberate. A table flattens the refinement graph; the refinement relation is hierarchical by definition, and flattening it forces the reader to reconstruct the hierarchy in their head. The tree renders the hierarchy directly. At 200 Requirements, where refinement graphs can reach three or four levels deep on the dense cases, the difference between a tree and a table is the difference between a reader who can read the output and a reader who gives up.

requirements orphan

The third reading lever is the orphan detector. An orphan is a Requirement with no Feature that satisfies it, or a Feature with no Requirement it satisfies, or an AC with no test that implements it. Each of these is a gap in the closed loop.

requirements orphan

prints the gaps. Nothing else. The output is a shopping list for the next sprint. At 200 Requirements, the orphan count is the single most important number the tool can surface: it is the count of promises the system has not yet kept.

The orphan detector is the mirror image of the friction budget. The friction budget keeps the tool from growing weeds at the authoring end. The orphan detector keeps the tool from accumulating debt at the coverage end. Both are gardening. Both are unglamorous. Both are necessary.

The orphan detector classifies orphans into three distinct kinds, and the distinction matters for how the team responds to each.

Orphan Requirements are Requirements with no Feature that satisfies them. This is the most common orphan at the start of a sprint, and the least alarming: a Requirement drafted and approved, awaiting the Feature work. The remedy is to plan the Feature. The orphan count in this category tracks the pipeline depth.

Orphan Features are Features that satisfy no Requirement. This is always alarming. A Feature exists in the code base but does not, according to the traceability graph, answer any promise the system has made. Either the Feature is gold-plating — functionality no stakeholder asked for — or the corresponding Requirement is missing from the corpus. Either failure is a gap. The remedy is to write the missing Requirement, or to remove the Feature. There is no third option.

Orphan ACs are acceptance criteria with no test that implements them. This is the most granular orphan, and the most frequent during active development. The remedy is to write the test. The compliance scanner catches these automatically, but the orphan detector separates them from the other two categories so that a reader scanning the orphan list can triage at a glance.

The classification is load-bearing because the three kinds demand different cognitive responses. An orphan Requirement is a plan to act. An orphan Feature is a diagnosis to act on now. An orphan AC is a task to schedule. Collapsing all three into a single list — as an earlier version of the command did — forced the reader to re-classify every item on every read. The split moved that work off the reader's plate and into the tool, at a cost of twelve lines of code that have not needed to change in six months.
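
The three-way split can be sketched as a single pass over the graph. This is a minimal illustration, not the package's implementation: the node shapes (ReqNode, FeatureNode, TestNode), the "FEAT-ID.acName" key format, and the function name findOrphans are all assumptions made for the example.

```typescript
// Hypothetical node shapes; the package's real graph types may differ.
interface ReqNode { id: string }
interface FeatureNode { id: string; satisfies: string[]; acs: string[] }
interface TestNode { id: string; implementsAcs: string[] } // entries look like "FEAT-ID.acName"

interface OrphanListing {
  requirements: string[]; // no Feature satisfies them: plan the Feature
  features: string[];     // satisfy no Requirement: write the Requirement or remove the Feature
  acs: string[];          // no test implements them: write the test
}

function findOrphans(reqs: ReqNode[], features: FeatureNode[], tests: TestNode[]): OrphanListing {
  const satisfied = new Set<string>();
  for (const f of features) for (const reqId of f.satisfies) satisfied.add(reqId);

  const implemented = new Set<string>();
  for (const t of tests) for (const key of t.implementsAcs) implemented.add(key);

  const orphanAcs: string[] = [];
  for (const f of features) {
    for (const ac of f.acs) {
      const key = `${f.id}.${ac}`;
      if (!implemented.has(key)) orphanAcs.push(key);
    }
  }

  return {
    requirements: reqs.filter((r) => !satisfied.has(r.id)).map((r) => r.id),
    features: features.filter((f) => f.satisfies.length === 0).map((f) => f.id),
    acs: orphanAcs,
  };
}

// Illustrative corpus: one planned Requirement, one gold-plated Feature, one untested AC.
const orphanListing = findOrphans(
  [{ id: "REQ-LOGIN" }, { id: "REQ-EXPORT" }],
  [
    { id: "FEAT-LOGIN", satisfies: ["REQ-LOGIN"], acs: ["rejectsBadPassword", "locksAfterFiveTries"] },
    { id: "FEAT-CONFETTI", satisfies: [], acs: [] },
  ],
  [{ id: "login.spec", implementsAcs: ["FEAT-LOGIN.rejectsBadPassword"] }],
);
```

Keeping the three lists as separate fields of one result is what lets the printer emit them under separate headings without the reader re-sorting anything.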

The compliance report

The fourth instrument — strictly a report rather than a command — is the per-Feature compliance percentage the earlier post described. At 200 Requirements spread across 25 Features, the report is what the PM reads at stand-up. Each Feature's compliance is one number. Green means every AC has at least one test with a matching @Implements. Red means at least one AC does not.

The report does not lie, because it cannot lie. It is generated by walking the same graph the trace and orphan commands walk. A Feature that reads 8/8 in the compliance report has 8 @Implements links in the test corpus, all of them pointing at real abstract methods on the Feature class. If one of them were a typo, the build would have failed before the report ran. The compiler is the gatekeeper. The report is the summary.
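
The "a typo fails the build" property can be sketched with a keyof constraint. The helper below is a hypothetical stand-in, not the package's actual @Implements decorator; CheckoutFeature, implementsAc and the "FEAT-CHECKOUT" id are invented for the illustration.

```typescript
// Hypothetical Feature: each abstract method is one acceptance criterion.
abstract class CheckoutFeature {
  abstract cartTotalIsShown(): void;
  abstract emptyCartBlocksPayment(): void;
}

interface AcLink { featureId: string; ac: string }

// Hypothetical stand-in for @Implements: `ac` must be a member name of F,
// so a misspelt AC name is rejected by the type checker, not at report time.
function implementsAc<F>(featureId: string, ac: keyof F & string): AcLink {
  return { featureId, ac };
}

const checkoutLink = implementsAc<CheckoutFeature>("FEAT-CHECKOUT", "cartTotalIsShown");

// Uncommenting the next line fails type-checking, because the name is misspelt:
// implementsAc<CheckoutFeature>("FEAT-CHECKOUT", "cartTotalIsShwon");
```

Under this sketch, any link that reaches the report has already been checked against the Feature's member names, which is the sense in which the compiler acts as gatekeeper.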

There is one place where the report does something the earlier post could not describe. It surfaces the per-style compliance in addition to the per-Feature compliance. Industrial-style Requirements and agile-style Requirements each accumulate their own coverage percentage, readable side-by-side. That split is useful because the two styles have different stakes. An industrial Requirement at 80% compliance is a safety-hazard letter; an agile Requirement at 80% compliance is a healthy sprint. The same percentage means different things depending on the register. The report honours that by never averaging across styles. It prints one row per style, plus a grand total, and leaves the interpretation to the reader who understands which stakes apply.

This is a small design choice. It is also exactly the kind of choice the friction-budget rule forced into the open. An earlier version of the report printed only the grand total, on the argument that a single number is the easiest thing for a tired reader to parse. The rejection came from the reverse argument: a single number is the easiest thing for a tired reader to misinterpret. The per-style split adds three lines to the report and removes one well-documented failure mode. Three lines of friction paid, one silent error mode retired. That is the budget working in the direction of the reader, not the author.
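
A sketch of how the same rows can feed both views. It assumes a flattened AcRow shape and a coverageBy helper, neither of which is taken from the package; the point is only that per-Feature, per-style and grand-total rows are three groupings of one data set, with no averaging between them.

```typescript
type ReqStyle = "industrial" | "agile";

// Hypothetical flattened view of the graph: one row per acceptance criterion,
// tagged with the style of the Requirement its Feature satisfies.
interface AcRow { feature: string; style: ReqStyle; ac: string; testCount: number }

interface Coverage { covered: number; total: number }

// Group rows by an arbitrary key and count covered ACs per group.
function coverageBy(rows: AcRow[], keyOf: (row: AcRow) => string): Record<string, Coverage> {
  const result: Record<string, Coverage> = {};
  for (const row of rows) {
    const key = keyOf(row);
    const entry = result[key] ?? { covered: 0, total: 0 };
    entry.total += 1;
    if (row.testCount > 0) entry.covered += 1; // covered means at least one test
    result[key] = entry;
  }
  return result;
}

// Green only when every AC in the group has a test.
function formatRow(label: string, c: Coverage): string {
  const verdict = c.covered === c.total ? "green" : "red";
  return `${label}: ${c.covered}/${c.total} ${verdict}`;
}

const acRows: AcRow[] = [
  { feature: "FEAT-INTERLOCK", style: "industrial", ac: "doorOpenStopsMotor", testCount: 1 },
  { feature: "FEAT-INTERLOCK", style: "industrial", ac: "faultRaisesAlarm", testCount: 0 },
  { feature: "FEAT-SEARCH", style: "agile", ac: "findsByTitle", testCount: 2 },
  { feature: "FEAT-SEARCH", style: "agile", ac: "findsByAuthor", testCount: 1 },
  { feature: "FEAT-SEARCH", style: "agile", ac: "paginatesResults", testCount: 0 },
];

const byFeature = coverageBy(acRows, (r) => r.feature);
const byStyle = coverageBy(acRows, (r) => r.style);
const grandTotal = coverageBy(acRows, () => "total");
```

The grand total is a count over all rows rather than a mean of the per-style percentages, so it cannot hide a weak industrial row behind a strong agile one.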

A week in the life of requirements list

Abstract descriptions of CLI commands leave out the thing that matters most — how often the commands are actually typed. A brief accounting from the last sprint on this package, for calibration.

requirements list --status=approved was run, on average, once a day. It is the first command a contributor runs when they pick up a ticket: what have I already committed to? The output is the opposite of anxiety; it names the scope and stops.

requirements list --status=draft was run roughly once every two days. It is the PM's command: what am I still drafting, what is not yet approved, what is the queue in front of the team? The draft list is where the naming session's raw output lives before it earns the Approved status.

requirements trace --req=... was run several times a day, almost always in response to a bug report or a question. What is covered, and by what? The command's honesty is the reason it gets used: the answer is in the tool, not in anyone's memory.

requirements orphan was run at the end of each day, roughly, and always before a merge. It is the gardening command. The orphan count is the moral temperature of the corpus; a day where the count goes up is a day where the team accumulated a promise they did not keep.

None of these frequencies are enforced. They emerged from use. They are the pattern a human settles into when the tool is cheap enough to run on reflex rather than on ceremony. The cheapness is the friction-budget rule, applied at the CLI surface: every flag, every subcommand, every piece of output earns its cost by being the answer to a question someone actually asked.
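
For illustration, a typed status filter of the kind the list command relies on might look like the sketch below. It assumes only the two statuses this chapter mentions (draft and approved); the real package may define more, and the names parseStatusFlag and listByStatus are hypothetical.

```typescript
// Assumption: only the two statuses named in this chapter.
const KNOWN_STATUSES = ["draft", "approved"] as const;
type ReqStatus = (typeof KNOWN_STATUSES)[number];

interface ListedRequirement { id: string; status: ReqStatus }

// Parse "--status=<value>" and refuse anything that is not a known status.
function parseStatusFlag(arg: string): ReqStatus {
  const match = /^--status=(.+)$/.exec(arg);
  const value = match ? match[1] : undefined;
  for (const known of KNOWN_STATUSES) {
    if (known === value) return known;
  }
  throw new Error(`Unknown status filter: ${arg}`);
}

// Exact match on a typed field: the output is what was asked for, never a guess.
function listByStatus(corpus: ListedRequirement[], wanted: ReqStatus): string[] {
  return corpus.filter((r) => r.status === wanted).map((r) => r.id);
}

const listedCorpus: ListedRequirement[] = [
  { id: "REQ-LOGIN", status: "approved" },
  { id: "REQ-EXPORT", status: "draft" },
  { id: "REQ-BOOTSTRAP-ZERO-FRICTION", status: "approved" },
];

const approvedIds = listByStatus(listedCorpus, parseStatusFlag("--status=approved"));
```

A misspelt flag value fails loudly instead of returning an empty list, which keeps the command cheap enough to run on reflex.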

A related observation: the CLI commands are not the primary reading interface in most sessions. The primary reading interface is the source tree itself. Because every Requirement and every Feature is a TypeScript file in a known directory, a developer with a good editor can jump between them with standard navigation: go-to-definition, find-references, symbol search. The requirements trace command is a convenience for the rarer cases where the navigation spans four levels (REQ → FEAT → AC → test) and the editor would require three jumps instead of one command. For the more common two-level cases, the editor alone suffices.
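
The four-level walk that trace performs can be sketched as a nested traversal. The TraceGraph shape, its field names and the indented output format are assumptions for the example, not the package's actual data model or output.

```typescript
// Hypothetical in-memory graph; ids and field names are illustrative only.
interface TraceGraph {
  satisfiedBy: Record<string, string[]>; // REQ id -> Feature ids
  acsOf: Record<string, string[]>;       // Feature id -> AC names
  testsOf: Record<string, string[]>;     // "FEAT-ID.acName" -> test names
}

// Walk REQ -> FEAT -> AC -> test and return one indented line per node.
function traceRequirement(graph: TraceGraph, reqId: string): string[] {
  const lines: string[] = [reqId];
  for (const feat of graph.satisfiedBy[reqId] ?? []) {
    lines.push(`  ${feat}`);
    for (const ac of graph.acsOf[feat] ?? []) {
      lines.push(`    ${ac}`);
      const tests = graph.testsOf[`${feat}.${ac}`] ?? [];
      if (tests.length === 0) lines.push("      (no test)");
      for (const t of tests) lines.push(`      ${t}`);
    }
  }
  return lines;
}

const traceGraph: TraceGraph = {
  satisfiedBy: { "REQ-LOGIN": ["FEAT-LOGIN"] },
  acsOf: { "FEAT-LOGIN": ["rejectsBadPassword", "locksAfterFiveTries"] },
  testsOf: { "FEAT-LOGIN.rejectsBadPassword": ["login.spec: rejects a bad password"] },
};

const traceLines = traceRequirement(traceGraph, "REQ-LOGIN");
```

One call answers "what is covered, and by what?" in a single read, where the editor would need a jump per level.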

This is a design choice that compounds. By keeping the primary artifact as TypeScript source, the DSL inherits the entire tooling ecosystem around TypeScript — editors, linters, formatters, language servers, renderers — without having to reimplement any of it. The CLI's reading commands are a thin supplement on top of that inherited richness, not a replacement for it. The friction budget again: the CLI adds only what the editor cannot already do, and nothing else.

A counter-example, for clarity. An earlier sketch of the CLI included a requirements show REQ-XXX command that would print a Requirement's full contents — statement, rationale, fit-criteria, source, risk — in a pretty-printed form. The rejection came from asking what the command offered that the editor did not. The answer was: slightly prettier formatting. The rejection was: that is not enough. The editor already opens the file, renders it with syntax highlighting, folds the sections the user does not care about, and jumps to the fields the user does care about. A pretty-printer would have been a second surface to maintain for a marginal improvement. Cut. The editor already reads.

The invariant the earlier post promised

Every lever in this chapter exists in service of the invariant the requirements-human-side post named explicitly: the 15-minute naming session is the point. Everything else — the types, the decorators, the wizard, the CLI — is scaffolding around that conversation. The scaffolding is there to make sure the conversation stays cheap across hundreds of iterations, not to replace it.

At 20 Requirements, the scaffolding looks like over-engineering. The conversation works fine without it. A PM and a developer sitting together can keep 20 ACs in their head. They can type the AC names without a suggester. They can read the corpus without a filter. They can find the orphans by scrolling.

At 200 Requirements, the scaffolding is the only reason the conversation still happens at all. Without the friction budget, the API would have grown to the point where the naming session would need its own tutorial. Without the AC suggester, every new AC name would be a new coin flip against the existing vocabulary. Without the bootstrap command, the cost of starting a new Feature would dominate the cost of finishing it. Without requirements list, trace, orphan, and the compliance report, the reader would drown.

This is the argument the earlier post assumed but could not yet show. A good DSL does not replace the human conversation. It does not try to. It pays the cost of remaining cheap, in exchange for the human continuing to do the work that only the human can do: choosing the words.

A concrete test of this invariant, worth running periodically. Take a random Requirement from the corpus, read its statement aloud, and ask: could the PM who commissioned this Requirement recognise their own intent in this sentence? If yes, the DSL has done its job. If no — if the sentence has been bent toward some technical convenience, or fossilised into jargon, or shortened past its original precision — the DSL has failed an invariant, regardless of what the compliance report says. The numbers can be green while the words are wrong, and when that happens, the tool has lost the thread.

This is the test that no machine can run. It is a test of meaning, not of structure, and meaning is the PM's responsibility, not the compiler's. The DSL's role, in this test, is to keep the cost of running it low: the Requirement should be quotable in full in under a minute, the statement should parse as one of the five EARS patterns, the rationale should name real evidence. If all those things are true, the PM can run the test quickly. If any of them are not true, the PM has to spelunk, and the test becomes expensive enough that it stops being run, and the invariant erodes silently.
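
The structural half of that check, whether a statement parses as one of the five EARS patterns, is the part a machine can do. A minimal sketch follows, using the published EARS templates (ubiquitous, event-driven "When", state-driven "While", unwanted-behaviour "If ... then", optional-feature "Where"). The regular expressions and the classifyEars name are assumptions; the wizard's actual grammar may be stricter.

```typescript
type EarsPattern =
  | "ubiquitous"
  | "event-driven"
  | "state-driven"
  | "unwanted-behaviour"
  | "optional-feature";

// Keyword-led patterns are tried first; "ubiquitous" is the fallback template.
const EARS_RULES: Array<[EarsPattern, RegExp]> = [
  ["event-driven", /^When\s.+,\s*the\s.+\sshall\s.+/i],
  ["state-driven", /^While\s.+,\s*the\s.+\sshall\s.+/i],
  ["unwanted-behaviour", /^If\s.+,\s*then\s+the\s.+\sshall\s.+/i],
  ["optional-feature", /^Where\s.+,\s*the\s.+\sshall\s.+/i],
  ["ubiquitous", /^The\s.+\sshall\s.+/i],
];

// Returns the first matching pattern, or null when the statement fits none.
function classifyEars(statement: string): EarsPattern | null {
  const text = statement.trim();
  for (const [pattern, regex] of EARS_RULES) {
    if (regex.test(text)) return pattern;
  }
  return null;
}
```

A null result does not say the Requirement is wrong, only that the PM will have to read harder to run the meaning test, which is the cost the DSL is trying to keep low.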

The friction budget, the AC suggester, the bootstrap command, the reading toolkit — they all work in service of keeping the PM's test cheap. That is their ultimate purpose. Everything else is plumbing.

The tool stays humane because the tool stays small. The tool stays small because the budget stays strict. The budget stays strict because every time the team proposes a sixth decorator, someone says: name the relation it expresses that the five existing decorators do not. And so far, that question has always been enough.

It is worth naming what this chapter is not claiming. It is not claiming that the DSL is finished. It is not claiming that the friction budget is right in every case or that the AC suggester will never need a rewrite. It is not claiming that the wizard is the only way to author a Feature; the abstract classes can still be written by hand, and are, for the denser cases where the wizard's template does not fit. The chapter is claiming only that the three levers above — friction budget at the API surface, AC suggestion heuristics in the wizard, bootstrap and reading commands in the CLI — are the specific, named, load-bearing choices that keep the earlier post's argument alive past the point where headcount and corpus size would otherwise bury it.

There is also a quieter claim, which the chapter has tried to make by accumulation rather than by assertion. The earlier post was not wrong; it was early. The naming session was the point then and the point now. What has changed is that the DSL has grown enough instruments around the naming session to keep its cost roughly constant as the corpus scales. Constant cost at scale is not free. It is paid for by the rejections, the heuristics, the two-minute budget, the four-instrument reading toolkit. The earlier post stated the ideal; this one shows the bill.

The connection to Don't Burden Developers

There is a sibling post on this site — Don't Burden Developers with Cognitive Load — that argues a more general version of the same case this chapter has made for a specific DSL. The general argument is: every abstraction the developer must hold in their head is a tax on every future decision they make inside that abstraction. A good abstraction pays for itself by replacing two or three other abstractions with one. A bad abstraction adds itself to the stack without removing anything, and the stack gets heavier until the developer gives up.

The three levers in this chapter are the DSL-specific form of that principle. The friction budget is the refusal to stack. The AC suggester is the refusal to ask the developer to hold the corpus in their head. The bootstrap command is the refusal to ask the developer to hold the lifecycle in their head. Each lever is a deliberate offload of cognitive work from the human onto the tool, for the specific work where the tool can do it deterministically and cheaply.

The opposite failure mode — the tool that accumulates cleverness at the author's expense — is what the sibling post calls the second-system effect in its modern form. A DSL that starts simple and grows by accretion, with each new feature adding a decorator, a flag, a mode, a keyword, eventually becomes a system only its own maintainers can hold in their heads. At that point the DSL has lost its audience; it is a private tool for the people who built it.

The friction budget exists to prevent this specific outcome. Every new surface must justify itself against the existing surface, and the justification must be substitutive rather than additive: the new feature must either replace something, or unlock a use case the existing surface cannot serve. Features that merely add to the surface without subtracting from it — no matter how small, no matter how well-intentioned — are the beginning of the accretion curve the sibling post warns about.

This is also why the friction budget is, in practice, enforced more strictly as the DSL grows than as it begins. Early in a DSL's life, the surface is small enough that a new feature's cost is proportionally small; the budget is permissive. Late in a DSL's life, the surface is large enough that any new feature compounds against all the existing features; the budget tightens. The DSL's review discipline has to account for this gradient, or the accretion curve wins by default. The larger the surface, the stricter the budget: the reverse of the instinct most API designers have when defending a new proposal.

What this chapter added

To recap precisely what this chapter put on the table that the earlier post did not:

  • The friction budget as a named rule, with a concrete list of rejected surface area: @Priority, @Category, @Owner, @Tags. The rejections are not anecdotes; they are the working definition of what the DSL will and will not carry.
  • The AC suggester as a deterministic ranking over the existing corpus, pure, FS-free in its core, covered by its own Requirements. The heuristic is documented; the scoring is reproducible; the upper bound on the number of candidates is five.
  • The wizard's EARS pre-fill as a grammar-constrained prompt, not a free-form text box. The grammar is Mavin's five patterns. The prompt is selection before completion.
  • REQ-BOOTSTRAP-ZERO-FRICTION as a typed, committed Requirement that fixes the wizard's two-minute end-to-end budget and names its risk.
  • The reading toolkit: requirements list, requirements trace, requirements orphan, the compliance report. Four instruments that let a human operate a 200-Requirement corpus without holding it all in their head.
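
To make "deterministic, pure, capped at five" concrete, here is a sketch of a ranking with those three properties. The scoring used (a count of shared tokens) is an assumption chosen for brevity and is not the suggester's documented heuristic; only the cap of five comes from this chapter. The names acTokens and suggestAcNames are hypothetical.

```typescript
const MAX_CANDIDATES = 5; // upper bound named in the chapter

// Split a camelCase or spaced AC name into lower-case tokens.
function acTokens(text: string): string[] {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Pure function of (query, corpus): no filesystem access, no clock, no randomness.
// Assumed scoring: number of query tokens that also occur in the candidate.
function suggestAcNames(query: string, corpus: string[]): string[] {
  const wanted = acTokens(query);
  return corpus
    .map((candidate) => {
      const have = acTokens(candidate);
      const score = wanted.filter((t) => have.indexOf(t) !== -1).length;
      return { candidate, score };
    })
    .filter((entry) => entry.score > 0)
    // Ties are broken by name so the order never depends on corpus order.
    .sort((a, b) =>
      b.score - a.score || (a.candidate < b.candidate ? -1 : a.candidate > b.candidate ? 1 : 0),
    )
    .slice(0, MAX_CANDIDATES)
    .map((entry) => entry.candidate);
}

const acCorpus = [
  "userCanLogIn", "userCanLogOut", "adminCanResetPassword", "userSeesCartTotal",
  "guestCanBrowse", "userCanResetPassword", "userCanExportData", "userCanDeleteAccount",
];

const acSuggestions = suggestAcNames("user can reset", acCorpus);
```

Because the function depends only on its arguments, a frozen corpus plus a fixed query always yields the same list, which is what makes a regression test over the ranking possible.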

The earlier post described the 15-minute conversation. This chapter described the instruments that keep the conversation cheap on the hundredth iteration, on the two-hundredth, on the thousandth. Neither description is complete without the other.

The naming session is still the point. It always was. The tool's job is to make sure that remains true as the corpus grows.

A note on how these levers are maintained

Every lever above (the friction budget, the suggester's 2x weighting, the two-minute wizard budget, the four reading instruments) requires ongoing maintenance. The budget does not enforce itself; someone has to refuse the sixth decorator when it gets proposed. The suggester does not keep its ratio automatically; someone has to rerun the frozen-corpus test when the ranking function is touched. The two-minute budget does not hold by magic; someone has to keep the end-to-end timer test running in CI.

The maintenance work is, by design, small enough that it does not dominate the package's engineering time. Roughly: one review comment per proposed decorator (rare — maybe once a quarter), one test update per suggester refactor (also rare), one CI-timer tweak per wizard refactor (slightly more frequent), and a weekly glance at the orphan count (routine). None of these items is architecturally heavy; all of them are on the "normal engineering hygiene" plane rather than the "project of its own" plane.

The reason the maintenance stays cheap is that the levers are self-describing in code. The friction budget is a rule in the review guide, enforceable by any reviewer who reads it. The suggester's 2x weighting is a named constant, documented. The two-minute budget is a typed Requirement with fit-criteria, reachable via requirements trace --req=REQ-BOOTSTRAP-ZERO-FRICTION. The reading commands are their own test suite. A new maintainer who reads the codebase can inherit the levers without a knowledge-transfer session, because the levers explain themselves in the artefacts they produce.

This is the payoff of having authored the package's own DX commitments as first-class Requirements. REQ-BOOTSTRAP-ZERO-FRICTION is not a blog post or a design-doc commitment; it is a TypeScript file with a specific shape. A future maintainer who does not know the author can read it, understand the commitment, and choose to keep or revise it. The commitment travels with the code. That is the only form of maintenance discipline that survives team turnover, and the package's Requirement corpus is the vehicle that makes it work.
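
As an illustration of what "a TypeScript file with a specific shape" can mean, here is a sketch of such a commitment. The field names follow the ones this chapter lists for a Requirement (statement, rationale, fit-criteria, source, risk); the interface, the placeholder values and the meetsBudget helper are assumptions and do not reproduce the real REQ-BOOTSTRAP-ZERO-FRICTION file.

```typescript
// Hypothetical shape; values below are placeholders, not the real Requirement's text.
interface RequirementSketch {
  id: string;
  statement: string;
  rationale: string;
  fitCriteria: { description: string; maxMillis: number }[];
  source: string;
  risk: string;
}

const TWO_MINUTES_MS = 2 * 60 * 1000;

const reqBootstrapZeroFriction: RequirementSketch = {
  id: "REQ-BOOTSTRAP-ZERO-FRICTION",
  statement: "The wizard shall complete an end-to-end bootstrap within two minutes.", // placeholder wording
  rationale: "(placeholder) why the budget exists",
  fitCriteria: [
    { description: "end-to-end wizard run, timed in CI", maxMillis: TWO_MINUTES_MS },
  ],
  source: "(placeholder) who asked for it",
  risk: "(placeholder) what breaks if the budget slips",
};

// True when a measured run fits every timing criterion of the Requirement.
function meetsBudget(req: RequirementSketch, elapsedMillis: number): boolean {
  return req.fitCriteria.every((c) => elapsedMillis <= c.maxMillis);
}
```

A CI timer test then reduces to measuring one wizard run and asserting meetsBudget on the result, so the number being enforced lives in the Requirement rather than in the test.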

A short catalogue of the levers the DSL did not add

It is worth closing the catalogue by naming the levers that were considered for cognitive-load reduction but rejected. Each of these would have been plausible. Each was cut.

A natural-language search over the corpus. Proposed as "type a sentence, find the nearest Requirement." Rejected for the same reason the requirements list command does not support full-text search: a nearest-match over free-form prose inherits all the failure modes of string similarity — false positives on shared nouns, false negatives on renames. The typed filters return exactly what the user asked for; the fuzzy search would return a guess.

An automatic REQ-ID generator. Proposed as "the tool picks the ID so the author does not have to." Rejected because the ID is load-bearing vocabulary; a generated ID would be generated nonsense, and once generated, impossible to rename cheaply (because @Satisfies and cross-references would all have to move). The smart constructor on RequirementId validates the ID's shape; the author still picks the content. That division was deliberate.

An automatic fit-criteria generator. Proposed as "infer the tests from the statement." Rejected because the fit-criteria are a human commitment, not a derivative of the statement. Generating them from the prose would make them feel automatic, which would invite the author to skip the judgement step where they decide what, exactly, would count as evidence that the Requirement is met. Some choices should not be automated, because automating them erodes the very judgement the DSL is trying to preserve.

A dashboard. Proposed, repeatedly, in every form a dashboard takes — HTML, TUI, embedded widget. Rejected because dashboards tend to grow until they become their own UI surface, with their own bugs, their own UX debt, their own maintenance cost. The compliance report as plain text hits every information-display need the dashboard would have addressed, at a fraction of the surface area. Boring plus small, again, beats flashy plus large.

Each of these rejections saved the DSL from a class of compounding complexity. None of them is a closed decision — any of them could come back, under the same friction-budget discipline, if a future version of the package finds a shape that passes the rule "express something the current surface cannot". So far, none have.
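
The division of labour in the REQ-ID rejection above (the tool validates the shape, the author chooses the words) can be sketched as a smart constructor over a branded string. The accepted shape here is an assumption inferred from ids such as REQ-BOOTSTRAP-ZERO-FRICTION; the package's real RequirementId rule may differ, and parseRequirementId is a hypothetical name.

```typescript
// Branded string: only parseRequirementId can produce a value of this type.
type RequirementIdSketch = string & { readonly __brand: "RequirementId" };

// Assumed shape: "REQ" followed by one or more upper-case, hyphen-separated words.
const REQ_ID_SHAPE = /^REQ(-[A-Z0-9]+)+$/;

// Validates the shape only; the vocabulary inside the id stays the author's choice.
function parseRequirementId(raw: string): RequirementIdSketch {
  if (!REQ_ID_SHAPE.test(raw)) {
    throw new Error(`Malformed Requirement id: "${raw}"`);
  }
  return raw as RequirementIdSketch;
}

const bootstrapId = parseRequirementId("REQ-BOOTSTRAP-ZERO-FRICTION");
```

The constructor never proposes an id of its own, so it adds a guard without taking the naming decision away from the person who has to live with the name.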

A closing observation on what gets called DX

The phrase developer experience has a specific, narrow meaning in most toolchains. It means the polish around an engine that already works. It means the spinner, the colour, the friendly error message, the short command name. That polish matters — the earlier chapter 13b of this series walks through exactly that kind of polish on the wizard — but it is a surface phenomenon. The polish sits on top of something.

What this chapter has tried to show is that the something the polish sits on top of is not just an engine. It is a set of structural commitments about what the tool will and will not ask of its users. Those commitments are not DX in the narrow sense; they are DX in the sense that matters most, which is the sense where every decision the tool makes about its own internals is also a decision about the cognitive weight on the user's shoulders.

The friction budget is not a DX feature. It is the precondition for any DX feature to survive beyond its release notes. The AC suggester is not a DX feature either; it is the load-bearing choice that lets the naming session stay cheap at corpus scale. The bootstrap command is not just a scaffolder; it is the measurable instrument by which the package keeps a promise to its newest users.

All three are boring. All three were chosen deliberately, in preference to flashier alternatives, because the boring choice was the one that would still be doing its job in sprint 40 — when the team has turned over, the corpus has tripled, and the original naming session is a commit hash no one can place. The earlier post argued that the tool's job was to keep a promise between two people. This chapter argues that the tool's job, at scale, is to keep that promise durable across people and time.

That is the revision the earlier post needed, and the reason this chapter exists. Not because the earlier post was wrong, but because the tool grew up.

A quieter corollary about rhetoric

A quieter corollary, worth naming because it applies outside the DSL as well. The friction budget is, ultimately, a form of rhetorical discipline. It asks, on every proposal, what the tool is saying to its users by carrying that proposal. Every decorator is a sentence. Every CLI flag is a sentence. Every prompt in the wizard is a sentence. The tool is, in aggregate, a long speech to every future user.

A tool that keeps adding sentences without removing any ends up as a speech no one can follow. A tool that edits itself — that cuts, rejects, refuses, refactors — ends up as a speech that says fewer things, more precisely, with more load-bearing weight on each sentence. The friction budget is, in that frame, an editorial pass. The rejections are not failures; they are edits.

This framing also aligns with the content-authoring discipline the rest of this site is written under. Prose drafts are edited; code should be edited too. The abstractions that survive an edit pass are better than the abstractions that go through unedited, for the same reason a third-draft paragraph is better than a first-draft paragraph. The discipline is the same; only the medium differs.

The final claim of this chapter, then, is small: edit the DSL the way you edit the prose. The earlier requirements-human-side post argued that the tool's job is to preserve a conversation. This chapter has argued that preserving a conversation means editing the tool as ruthlessly as the conversation asks for precision. The editing is what keeps the tool humane. The budget is what keeps the editing honest. The instruments — the suggester, the bootstrap, the reading toolkit — are what the edited tool leaves behind, each one carrying the smallest weight that still earns its place.

None of it is sophisticated. All of it is deliberate. And every new decorator that gets proposed, every new flag that gets suggested, every new prompt that gets drafted, goes through the same question the DSL has always asked of itself: what does this let us say that we cannot already say? When the answer is nothing, the proposal is cut. When the answer is something real, the proposal is added, and something else is usually cut in the same commit to keep the budget balanced.

The budget, it turns out, is never finished. It is the work of the tool as much as any of its features are.

One final pointer to the earlier post

For readers who reached this chapter without having read The Human Side of Requirements as Code first, a closing pointer. The earlier post is the premise of this one. Read on its own, this chapter is a catalogue of instruments; read in sequence with the earlier post, it is an argument about what those instruments are for.

The earlier post's thesis, in one sentence: the compiler's value is not enforcement but the conversation it forces. Two people agreeing on what a word means, and a machine making sure neither of them forgets.

This chapter's thesis, in one sentence: as the corpus grows past the threshold of human recall, the conversation stays cheap only because the DSL pays, at every surface, the specific small costs that substitute for the recall the humans no longer have.

Both theses are the same thesis at different scales. The earlier post described the first twenty Requirements. This one described the next two hundred. The promise in both cases is the same: the human side is what gets preserved when everything else is designed with that preservation in mind.

That is the whole argument. The levers are the evidence. The rejections are the receipts. The budget is the discipline that keeps the argument true as the code grows.

Everything else, as the sibling post on don't-burden-developers puts it, is cognitive load — the tax the tool pays, so its users do not have to.
