A Style is a tiny compiler. Test it like one.
There's a moment you hit, six months into building a typed DSL, when you realise that the thing you've been treating as a "small library" is actually a tiny compiler — front-end, type system, error reporter, the whole shape — and that testing it the way you'd test a feature is exactly how you ship a system that lies to its own users.
This post is about that moment, and what changed in the way I write tests for @frenchexdev/requirements after I had it. It uses the two reference Styles that ship in the box — KanbanStyle (src/styles/kanban.ts) and IndustrialStyle (src/styles/industrial.ts) — as the worked examples. Their tests live in test/unit/kanban-style.test.ts and test/unit/industrial-style.test.ts. Both clear a 98 % per-file coverage gate. Neither uses describe or it.
The full how-to is in the package docs (HOW-TO-TEST-YOUR-STYLE.md). This post is the why, with three failure modes I'd have shipped if I'd kept testing my Style the way I test my features, a deep dive on the reporter, the dog-fooding recursion that holds the whole thing together, an honest comparison to Zod, the property-based tests that catch the bugs example tests can't see, the CI configuration that wires everything together, and the questions a reader is likely to ask that the post would otherwise leave dangling.
The realisation
Here's what a RequirementStyle looks like, structurally:
interface RequirementStyle {
readonly vocabulary: StyleVocabulary; // data
readonly validators: StyleValidators; // pure rules
readonly templates: StyleTemplates; // skeletons
readonly reporter: RequirementReporter; // rendering
}
Look at it without the Requirement prefix and squint:
- vocabulary → the lexer's keyword table. The set of allowed kinds, statuses, risk levels, source types.
- validators → the type-checker. Pure functions that walk a candidate spec and emit errors.
- templates → the parser snippets / standard library. Pre-filled skeletons the wizard offers.
- reporter → the emitter / pretty-printer. The thing that produces the auditable artefact people read.
That isn't a metaphor. The CLI literally invokes these in the order a compiler invokes lex / parse / typecheck / emit. The wizard is the parser front-end; compliance --strict is the type-checker pass; the markdown reporter is the codegen.
The compiler analogy, in detail
Spelled out concretely with the actual code paths the CLI runs:
Lex — the wizard prompt loop. When a user types requirement new, the CLI walks vocabulary.requirementKinds and renders an interactive select. The user picks Safety. The CLI walks vocabulary.statementPatterns filtered for the chosen kind, renders another select, the user picks safety-function. Then for each slot of the chosen pattern, the CLI prompts for a value. Every prompt is sourced from the vocabulary, every choice is constrained to declared values — exactly the way a lexer constrains tokens to a declared keyword table. The user can't enter a Safety requirement with kind Safetie because the prompt won't offer it.
Parse — schema acceptance into the spec object. The wizard's accumulated answers are assembled into a candidate spec object: { kind: 'requirement', requirementKind: 'Safety', statement: { pattern: 'safety-function', action: '…', demand: '…', … }, … }. This is the parse tree. The shape isn't checked yet — that's the next stage — but the structural assembly happens here.
Type-check — validators.validateSpec(spec). This is where the discipline rules fire. The validator walks the candidate spec, applies the per-kind rules (Safety → SIL, safety-function → SIL VM, Regulatory → standards source), and returns a ValidationResult with an array of errors keyed by JSON path. Errors at this stage halt the wizard with a typed message; the user fixes them and re-submits. This is the stage that makes the difference between a compliant Requirement and a free-text claim of compliance, and it's the stage that's most expensive to leave untested.
Emit — reporter.renderMarkdown(spec) and the spec-to-source serialisation. Two emitters in one. The first writes the canonical TypeScript file under requirements/ — that's the source-code emit, the thing the codebase depends on at compile time. The second renders human-readable markdown for documentation, audit packages, change-control briefs. Two different output languages, same input AST. Every compiler with multiple back-ends has this shape.
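The four stages above can be sketched end to end with a toy Style. This is a minimal illustration under stated assumptions, not the package's CLI: ToyStyle, compile, and the answers shape are hypothetical names, and the real interfaces carry far more fields.

```typescript
// Hypothetical, heavily simplified shapes; the real package's types are richer.
type ValidationError = { path: string; message: string };
type ValidationResult = { ok: boolean; errors: ValidationError[] };
type Spec = { kind: 'requirement'; requirementKind: string; risk?: { level: string } };

interface ToyStyle {
  vocabulary: { requirementKinds: readonly string[] };
  validators: { validateSpec(spec: Spec): ValidationResult };
  reporter: { renderMarkdown(spec: Spec): string };
}

const toyStyle: ToyStyle = {
  vocabulary: { requirementKinds: ['Functional', 'Safety'] },
  validators: {
    validateSpec(spec) {
      const errors: ValidationError[] = [];
      // One discipline rule only: Safety implies SIL1..SIL4.
      if (spec.requirementKind === 'Safety' && !/^SIL[1-4]$/.test(spec.risk?.level ?? '')) {
        errors.push({ path: 'risk.level', message: 'Safety requires SIL1-SIL4' });
      }
      return { ok: errors.length === 0, errors };
    },
  },
  reporter: {
    renderMarkdown: (spec) =>
      `## ${spec.requirementKind} requirement\n\nRisk: ${spec.risk?.level ?? 'n/a'}\n`,
  },
};

type CompileResult =
  | { stage: 'lex' | 'typecheck'; errors: ValidationError[] }
  | { stage: 'emit'; markdown: string };

function compile(
  style: ToyStyle,
  answers: { requirementKind: string; riskLevel?: string },
): CompileResult {
  // Lex: only kinds declared in the vocabulary are accepted.
  if (!style.vocabulary.requirementKinds.includes(answers.requirementKind)) {
    return {
      stage: 'lex',
      errors: [{ path: 'requirementKind', message: `unknown kind ${answers.requirementKind}` }],
    };
  }
  // Parse: assemble the candidate spec (the "parse tree"); nothing is checked yet.
  const spec: Spec = { kind: 'requirement', requirementKind: answers.requirementKind };
  if (answers.riskLevel !== undefined) spec.risk = { level: answers.riskLevel };
  // Type-check: the discipline rules fire here.
  const result = style.validators.validateSpec(spec);
  if (!result.ok) return { stage: 'typecheck', errors: result.errors };
  // Emit: render the human-readable artefact.
  return { stage: 'emit', markdown: style.reporter.renderMarkdown(spec) };
}
```

In this sketch a misspelled kind never reaches the validator, a Safety spec with NonSIL stops at the type-check stage, and only a spec that passes both reaches the reporter.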
The validator runs three times. This is the operational detail that makes the analogy more than decorative:
- Wizard time. The user is typing; the validator runs after each prompt's submit. Errors here halt the wizard. Failure mode caught: live typos.
- CI compliance time. Every *.ts file under requirements/ is loaded; the validator runs against each spec. Errors here fail CI. Failure mode caught: drift between the file in the repo and the current vocabulary (e.g. the file was valid when written, but a vocabulary change has since invalidated it).
- Test time. The Style test suite invokes the validator against synthetic fixtures. Errors here fail the build. Failure mode caught: regressions in the validator itself.
The same function, three contexts, three classes of bug caught. Skip any one and a class of bug ships silently. (And skip the test-time invocation specifically — which is what this post is about — and the other two contexts are running an unverified type-checker, which means they catch what they happen to catch and miss what they happen to miss, with no warning.)
Once you see it that way, the testing strategy a compiler engineer would write — exhaust every branch of the type-checker, fuzz the lexer, snapshot the emitter, never trust an untested defensive ?? — is exactly what you need. And the testing strategy most TypeScript developers default to — happy path + a couple of edge cases, structural assertions on the output — is exactly what you can't get away with.
I had been writing tests in the feature-developer register: a few describe blocks per file, an it('rejects bad input') and an it('accepts good input'), call it 70 % coverage and ship. For features, that's defensible. For a Style, it ships lies in the Requirements file.
Lies in the Requirements file
Here's the failure mode that reframed it for me. Imagine a Safety requirement, written by someone in a hurry — Friday afternoon, end of sprint, three more requirements to file before the weekend — that ends up in the repo:
class CloseValveOnOverpressure extends Requirement<IndustrialStyleType> {
readonly id = 'REQ-S1';
readonly title = 'Close XV-101 on PAHH-101 trip';
readonly requirementKind = 'Safety';
readonly risk = { level: 'NonSIL', ifNotMet: 'pressure release' };
readonly verificationMethod = 'UnitTest';
// …
}
Two things are wrong here. The Style validator is supposed to catch both:
- A Safety requirement with risk.level: NonSIL should be rejected: IEC 61511 requires SIL 1-4.
- A safety-function statement with verificationMethod: UnitTest should be rejected: you cannot unit-test a SIF.
If the validator doesn't catch them, this Requirement passes every CI gate. The wizard accepts it. compliance --strict reports it as compliant. The TypeScript compiles. The PR is approved with two reviewers. The next compliance dashboard shows 100 %.
Six months later, the audit happens.
Auditor: I see REQ-S1 is listed as a Safety-Instrumented Function for the overpressure scenario on V-101. What's its SIL rating?
You: Let me check. [opens the file] It says NonSIL.
Auditor: That can't be right. A SIF without a SIL rating isn't a SIF. Per IEC 61511 you need at least SIL 1, and the LOPA worksheet I saw earlier suggested SIL 2 is the right answer here. Can you show me the SIL determination record?
You: There isn't one in the file. The validator was supposed to require it.
Auditor: The validator that runs in your CI?
You: Yes.
Auditor: But the validator didn't reject this Requirement, did it? It's been in the repo for six months. Your CI has been green that whole time.
You: I think the test for that rule was deleted in a refactor.
Auditor: So your evidence of compliance is a test that doesn't exist, gating a rule that doesn't fire, on a Requirement that claims compliance. Is that a fair summary?
That's the failure mode. It isn't "my code has a bug". It's "the file I'm using as evidence of compliance is lying, and I have no way to know". The Requirement looks well-formed. The TypeScript compiles. The CI is green. And the rule it claims to satisfy was never actually checked.
A compiler engineer would call this an unsoundness. The type system promises a property (Safety implies SIL ∈ {1,2,3,4}); a missing test on the validator means the type system silently breaks the promise on certain inputs. Compiler people lose sleep over this — Rust's unsafe audits, Swift's exhaustivity-checker bug bounties, TypeScript's strictNullChecks migration are all about closing exactly this gap. DSL authors should lose sleep over it too.
The Style tests are the only thing standing between "someone wrote a free-text Safety requirement at 2 AM on a Friday" and "that requirement is now in the audit binder unchallenged". They are not a polish layer. They are the integrity layer.
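To make the unsoundness concrete, here is a small sketch that puts a sound validator next to the same validator after a hypothetical refactor has dropped its Safety branch. All names (soundValidator, refactoredValidator, rejectsNonSilSafety) are illustrative, not taken from the package.

```typescript
type ValidationError = { path: string; message: string };
type Spec = { requirementKind: string; risk?: { level: string } };
type Validator = (spec: Spec) => ValidationError[];

const SIL_LEVELS = ['SIL1', 'SIL2', 'SIL3', 'SIL4'];

// Sound: the promise "Safety implies SIL1..SIL4" is actually enforced.
const soundValidator: Validator = (spec) => {
  const errors: ValidationError[] = [];
  if (spec.requirementKind === 'Safety' && !SIL_LEVELS.includes(spec.risk?.level ?? '')) {
    errors.push({ path: 'risk.level', message: 'Safety requires SIL1-SIL4 (IEC 61511)' });
  }
  return errors;
};

// Unsound: a hypothetical refactor lost the Safety branch. Nothing crashes,
// nothing warns; every spec is simply accepted.
const refactoredValidator: Validator = (_spec) => [];

const happyPathSpec: Spec = { requirementKind: 'Functional' };
const fridayAfternoonSpec: Spec = { requirementKind: 'Safety', risk: { level: 'NonSIL' } };

// The happy-path check cannot tell the two validators apart.
function acceptsHappyPath(validate: Validator): boolean {
  return validate(happyPathSpec).length === 0;
}

// Only a test that targets the rule itself separates them.
function rejectsNonSilSafety(validate: Validator): boolean {
  return validate(fridayAfternoonSpec).some((e) => e.path === 'risk.level');
}
```

Both validators pass acceptsHappyPath; only soundValidator passes rejectsNonSilSafety. A suite that contains only the first kind of check stays green through the refactor.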
The four surfaces, four failure modes
The framing dictates the discipline. Each of the four sub-interfaces has its own failure mode when untested, and its own kind of test:
| Surface | Untested failure mode | Test class |
|---|---|---|
| vocabulary (data) | Workflow refers to undeclared state; risk matrix uses unknown level; pattern slot conflicts with a template id. | Data-invariant tests: expect(states.has(transition.from)) over the in-memory constants. |
| validators (rules) | Domain rule silently accepts malformed spec; rejects valid spec; unknown enum slips through. | Branch coverage tests: one test per if arm, both branches, error message regex'd. |
| templates (skeletons) | Wizard offers a template referencing a non-existent pattern; findTemplate returns undefined for a declared id. | Round-trip + cross-surface tests: findTemplate(t.id) === t plus skeleton-pattern presence. |
| reporter (rendering) | Markdown emits [object Object]; loses a slot; explodes on a missing optional field; produces silent empty string. | Render-shape tests: exhaust every pattern × every defensive branch. |
KanbanStyle's test file holds 44 test methods for ~500 lines of style code. IndustrialStyle's holds 9 dense methods for ~530 lines — denser because each method exercises three or four sub-cases. The ratio of test-to-code matters less than the ratio of branches covered to branches that exist. If your validator has 30 if branches and 25 of them are tested, your Style has 5 silent unsoundnesses.
The rest of the post walks each surface in order — vocabulary first (the cheapest tests, the ones first-time Style authors most often skip), then the two-validator architecture, then three worked failure modes for validator rules, then the reporter, then templates × vocabulary coupling, then the meta-layer (compliance graph, dog-fooding, cross-Style smoke), then property-based testing, then the things this post deliberately doesn't cover.
Vocabulary tests — the cheapest, most load-bearing
Vocabulary tests are pure data assertions over constants. There's no function to mock, no branch to exhaust — just expect(setOfStates.has(transition.from)).toBe(true). They take three lines, run in microseconds, and catch the single most common Style bug: the vocabulary lying about its own shape.
Three vocabulary invariants every Style needs to test:
Invariant 1 — Status workflow well-formedness. Every transition's from and to must be a declared state. The initial state must be declared. Every terminal state must be declared.
@Verifies<RequirementCommandsFeature>('projectCanDefineCustomStatusWorkflow')
'every transition references a declared state'() {
const states = new Set(KANBAN_VOCABULARY.statusWorkflow.states);
for (const t of KANBAN_VOCABULARY.statusWorkflow.transitions) {
expect(states.has(t.from), `from=${t.from}`).toBe(true);
expect(states.has(t.to), `to=${t.to}`).toBe(true);
}
expect(states.has(KANBAN_VOCABULARY.statusWorkflow.initial)).toBe(true);
for (const term of KANBAN_VOCABULARY.statusWorkflow.terminal ?? []) {
expect(states.has(term)).toBe(true);
}
}
The failure mode this prevents: someone renames Selected to Pulled in the states array, forgets to update the transitions array, and the wizard now offers a Backlog → Selected transition that targets a state that doesn't exist. When the user picks it, the next prompt loop has no valid next-state to offer and either crashes or silently locks the workflow. The test catches it on the next CI run, with a message naming the offending endpoint, something like to=Selected: expected false to be true.
This is also pitfall PF6 in HOW-TO-CREATE-YOUR-STYLE.md. The first time I shipped a custom Style without it, I lost an afternoon to exactly this bug.
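The same invariant can be written as a plain function that returns every dangling reference at once, which is handy when a rename breaks several edges. This is a sketch: the StatusWorkflow shape mirrors the fields the test above reads (states, transitions, initial, terminal), and danglingReferences is a hypothetical helper, not part of the package.

```typescript
// Assumed shape, inferred from the fields the test above touches.
interface StatusWorkflow {
  states: readonly string[];
  initial: string;
  terminal?: readonly string[];
  transitions: readonly { from: string; to: string }[];
}

// Collects every reference to a state that is not declared in `states`.
function danglingReferences(wf: StatusWorkflow): string[] {
  const states = new Set(wf.states);
  const problems: string[] = [];
  for (const t of wf.transitions) {
    if (!states.has(t.from)) problems.push(`from=${t.from}`);
    if (!states.has(t.to)) problems.push(`to=${t.to}`);
  }
  if (!states.has(wf.initial)) problems.push(`initial=${wf.initial}`);
  for (const term of wf.terminal ?? []) {
    if (!states.has(term)) problems.push(`terminal=${term}`);
  }
  return problems;
}

// Hypothetical workflow after renaming Selected to Pulled in `states` only.
const renamedWorkflow: StatusWorkflow = {
  states: ['Backlog', 'Pulled', 'Done'],
  initial: 'Backlog',
  terminal: ['Done'],
  transitions: [
    { from: 'Backlog', to: 'Selected' },
    { from: 'Selected', to: 'Done' },
  ],
};

const problems = danglingReferences(renamedWorkflow);
// problems is ['to=Selected', 'from=Selected']
```

A test can then assert that the returned list is empty, and the failure output names every stale edge in one run instead of stopping at the first.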
Invariant 2 — Risk matrix references declared levels. If your Style has a 2-D risk matrix (probability × impact → level), the cells must reference levels that exist in the flat list:
@Verifies<RequirementCommandsFeature>('projectCanDefineCustomRiskTaxonomy')
'riskTaxonomy cells reference declared levels'() {
const levels = new Set(KANBAN_VOCABULARY.riskTaxonomy.levels);
for (const cell of KANBAN_VOCABULARY.riskTaxonomy.matrix!.cells) {
expect(levels.has(cell.level)).toBe(true);
}
}
Same shape of bug, same shape of test. If you don't have a matrix, skip this one but assert levels.length > 0 instead: an empty levels list means the wizard's risk-level select is empty and the user can't pick anything.
Invariant 3 — Patterns and source kinds have at least one required slot. A statementPattern or sourceKind with no required slots lets the wizard accept an empty record:
@Verifies<RequirementCommandsFeature>('fromFeatureFlagPreFillsFromExistingFeature')
'every sourceKind exposes at least one required slot'() {
for (const sk of KANBAN_VOCABULARY.sourceKinds) {
expect(sk.slots.some(s => s.required), sk.kind).toBe(true);
}
}
@Verifies<RequirementCommandsFeature>('requirementNewPromptsEarsPattern')
'every statementPattern declares slots and a templated string'() {
for (const p of KANBAN_VOCABULARY.statementPatterns) {
expect(p.slots.length, p.pattern).toBeGreaterThan(0);
expect(p.template, p.pattern).toContain('{');
}
}
The expect(p.template).toContain('{') is a stand-in for "the template will produce something interpolated". A pattern with declared slots but a literal-string template is a silent reporter bug: the slot values are accepted by the wizard but never appear in the rendered output. The data-invariant test catches it on the next CI run, with a message naming the offending pattern.
These three invariants together take about thirty lines of test code. They're the entire vocabulary test surface for a typical Style. They will save you several hours of debugging the first time someone refactors the vocabulary and forgets to update one of the cross-references.
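If you want something stricter than the toContain('{') stand-in, one option is to check that every declared slot actually appears in the template. The sketch below assumes two things the post does not confirm: that each slot has a name field, and that placeholders are written as {name}. The helper slotsMissingFromTemplate is hypothetical.

```typescript
// Assumed shape: `name` and the `{name}` placeholder syntax are illustrative guesses.
interface StatementPattern {
  pattern: string;
  slots: readonly { name: string; required: boolean }[];
  template: string;
}

// Returns the declared slots that never appear as `{name}` in the template.
function slotsMissingFromTemplate(p: StatementPattern): string[] {
  return p.slots
    .map((s) => s.name)
    .filter((name) => !p.template.includes(`{${name}}`));
}

const wellFormed: StatementPattern = {
  pattern: 'service-request',
  slots: [
    { name: 'actor', required: true },
    { name: 'request', required: true },
  ],
  template: 'As {actor}, I request {request}.',
};

// Still contains '{', so the weaker check passes, but `request` is never rendered.
const lossy: StatementPattern = {
  ...wellFormed,
  template: 'As {actor}, I request something.',
};

const missingInWellFormed = slotsMissingFromTemplate(wellFormed); // []
const missingInLossy = slotsMissingFromTemplate(lossy); // ['request']
```

The trade-off is coupling: this check bakes the placeholder syntax into the test, so it only makes sense if that syntax is stable in your Style.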
validateStatement vs validateSpec — two functions, two test surfaces
The validator interface has two methods, not one:
interface StyleValidators {
validateStatement(statement: unknown): ValidationResult;
validateSpec(spec: unknown): ValidationResult;
}
This is an architectural distinction that a casual reader (or a casual test author) misses. The two functions have different scopes, different invocation contexts, and different test responsibilities.
validateStatement validates only the statement sub-object — the EARS-shaped sentence (or whatever pattern the Style declares). Its scope is "is this a well-formed statement under one of the Style's declared patterns?". It runs per prompt in the wizard, immediately after the user finishes typing the slot values for a chosen pattern. The wizard rejects on error, lets the user fix, re-submits.
validateSpec validates the entire spec object — kind, status, risk, statement, source, verificationMethod, the whole shape. Its scope is "is this a well-formed Requirement under the Style's discipline rules, given that the statement inside it is also well-formed?". It runs at submit time in the wizard (after every prompt has been answered), and at CI time against every file in requirements/. By contract, it also calls validateStatement internally on spec.statement and propagates those errors into its own error list.
// Conceptually, in the Style:
validateSpec(spec) {
  const errors = [];
  // … per-kind discipline rules …
  // validateStatement already reports its paths under `statement.*`
  // (its own tests assert 'statement.pattern'), so the errors are merged
  // as-is; prefixing again would yield 'statement.statement.pattern'.
  const stmtResult = this.validateStatement(spec.statement);
  errors.push(...stmtResult.errors);
  return { ok: errors.length === 0, errors };
}
What this means for tests: each function has its own test surface. A test of validateStatement should isolate the statement and exercise pattern × slot combinations:
@Verifies<RequirementCommandsFeature>('requirementNewPromptsEarsPattern')
'validateStatement accepts fully-filled service-request'() {
const r = KANBAN_VALIDATORS.validateStatement(fullyFilled('service-request'));
expect(r.ok).toBe(true);
}
@Verifies<RequirementCommandsFeature>('requirementNewPromptsEarsPattern')
'validateStatement rejects unknown pattern'() {
const r = KANBAN_VALIDATORS.validateStatement({ pattern: 'bogus' });
expect(r.ok).toBe(false);
expect(r.errors[0]!.path).toBe('statement.pattern');
}
A test of validateSpec exercises spec-level discipline rules (per-kind branches, source rules, status rules), not statement-level slot presence:
@Verifies<RequirementCommandsFeature>('requirementNewRequiresNonEmptyFitCriterion')
'validateSpec: Expedite without expedite-pull pattern is rejected'() {
// Spec is well-formed; statement is well-formed; the cross-cutting
// discipline rule (Expedite kind requires expedite-pull pattern) is
// what's being tested.
const r = KANBAN_VALIDATORS.validateSpec({
kind: 'requirement', requirementKind: 'Expedite', status: 'Backlog',
statement: fullyFilled('service-request'), // valid statement, wrong pattern for kind
});
expect(r.errors.some(e => e.path === 'statement.pattern' && /expedite-pull/.test(e.message))).toBe(true);
}
And one test verifies the propagation contract, namely that validateSpec actually delegates to validateStatement and merges the errors with the right path prefix:
@Verifies<RequirementCommandsFeature>('requirementNewWizardCollectsAllFields')
'validateSpec propagates validateStatement errors with statement.* path prefix'() {
const r = INDUSTRIAL_VALIDATORS.validateSpec({
kind: 'requirement', requirementKind: 'Functional', status: 'Draft',
statement: { pattern: 'ubiquitous' }, // missing required `response` slot
});
expect(r.ok).toBe(false);
expect(r.errors.some(e => e.path.startsWith('statement.'))).toBe(true);
}
Without the propagation test, a refactor that swaps validateSpec to a different statement-validation path (say, inlining a custom check instead of delegating) silently breaks the contract, and every validateStatement-only test still passes, because it's testing the function directly. Only the propagation test catches the architectural drift.
The Kanban test file has nine tests on validateStatement (one per pattern, plus three negative-shape tests) and ~15 on validateSpec (per-kind discipline, source rules, propagation, hygiene). The split mirrors the split in the production code.
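The propagation contract can also be expressed as a generic check: every error validateStatement reports must reappear in validateSpec's result. The sketch below is self-contained and uses toy validators; it assumes, as the rejection test above suggests, that validateStatement already reports paths under statement.*. The names delegating, drifted, and propagates are hypothetical.

```typescript
type ValidationError = { path: string; message: string };
type ValidationResult = { ok: boolean; errors: ValidationError[] };

interface Validators {
  validateStatement(statement: unknown): ValidationResult;
  validateSpec(spec: unknown): ValidationResult;
}

const toResult = (errors: ValidationError[]): ValidationResult => ({
  ok: errors.length === 0,
  errors,
});

// Toy statement validator: one known pattern with one required slot.
function validateStatement(statement: unknown): ValidationResult {
  const s = (statement ?? {}) as Record<string, unknown>;
  const errors: ValidationError[] = [];
  if (s.pattern !== 'ubiquitous') {
    errors.push({ path: 'statement.pattern', message: 'unknown pattern' });
  } else if (!s.response) {
    errors.push({ path: 'statement.response', message: 'response is required' });
  }
  return toResult(errors);
}

// Honours the contract: spec validation delegates to validateStatement.
const delegating: Validators = {
  validateStatement,
  validateSpec(spec) {
    const s = (spec ?? {}) as Record<string, unknown>;
    return toResult([...validateStatement(s.statement).errors]);
  },
};

// Drifted: a hypothetical refactor inlined a weaker check instead of delegating.
const drifted: Validators = {
  validateStatement,
  validateSpec(spec) {
    const s = (spec ?? {}) as Record<string, unknown>;
    const stmt = (s.statement ?? {}) as Record<string, unknown>;
    return toResult(
      stmt.pattern ? [] : [{ path: 'statement.pattern', message: 'pattern is required' }],
    );
  },
};

// True when every statement-level error path also shows up at spec level.
function propagates(v: Validators, spec: { statement: unknown }): boolean {
  const specPaths = v.validateSpec(spec).errors.map((e) => e.path);
  return v.validateStatement(spec.statement).errors.every((e) => specPaths.includes(e.path));
}

const specMissingResponse = { statement: { pattern: 'ubiquitous' } };
```

Here propagates(delegating, specMissingResponse) holds and propagates(drifted, specMissingResponse) does not, even though validateStatement is identical in both and its own tests would pass either way.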
Worked failure mode #1 — The Anderson rule
David J. Anderson's flow rule for the Intangible class of service in Kanban is one of those rules that sounds soft until you've watched a team ignore it for six months.
The rule: an Intangible work item — strategic investment, technical debt repayment, learning experiment — must declare a quantitative threshold at which it will be re-evaluated. Without it, Intangible items accumulate indefinitely as "important but not urgent" and starve the system. The board fills with strategic intent that nobody pulls.
KanbanStyle encodes this in the validator. From kanban.ts, conceptually:
if (s.requirementKind === 'Intangible') {
const stmt = s.statement as Record<string, unknown>;
if (!stmt?.threshold) {
errors.push({
path: 'statement.threshold',
message: 'Intangible class of service requires a re-evaluation threshold (Anderson rule)',
});
}
}
Three lines, one rule, one citation. Now the test (kanban-style.test.ts:198):
@Verifies<RequirementCommandsFeature>('requirementNewRequiresNonEmptyFitCriterion')
'validateSpec: Intangible with empty threshold is rejected (Anderson rule)'() {
const r = KANBAN_VALIDATORS.validateSpec({
kind: 'requirement', requirementKind: 'Intangible', status: 'Backlog',
statement: { pattern: 'intangible-investment', work: 'w', benefit: 'b', threshold: '' },
});
expect(r.errors.some(e =>
e.path === 'statement.threshold' && /Anderson/.test(e.message)
)).toBe(true);
}
@Verifies<RequirementCommandsFeature>('requirementNewRequiresNonEmptyFitCriterion')
'validateSpec: Intangible with explicit threshold accepts discipline branch'() {
const r = KANBAN_VALIDATORS.validateSpec({
kind: 'requirement', requirementKind: 'Intangible', status: 'Backlog',
statement: { pattern: 'intangible-investment', work: 'w', benefit: 'b', threshold: 'when NPS > 70' },
});
expect(r.errors.some(e => e.path === 'statement.threshold')).toBe(false);
}
Two tests. They look mechanical, but four design choices in those eleven lines are doing real work.
Choice 1: The test name cites the source. (Anderson rule) is in the method name. When the test fails six months from now and someone reads it cold, they have a search term. They google "Anderson Kanban Intangible threshold", land on the classes-of-service material in Kanban: Successful Evolutionary Change, and understand in two minutes why the rule exists. Without that citation, the test name says "rejects empty threshold" and the maintainer's first instinct is "looks like an over-eager validation, let me delete it".
Choice 2: The error message contents are tested. /Anderson/.test(e.message) requires the validator's error message to also cite Anderson. The user who hits the error in the wizard sees something like threshold required (Anderson rule on intangible class of service). They can google it too. This is a tiny piece of UX armoured by a test: the message can't drift to 'invalid' or 'wrong' or any other UX-hostile shorthand without breaking the build.
Choice 3: The fixture sets threshold: '' explicitly, not undefined. This catches the validator typo where someone wrote if (stmt.threshold === undefined) instead of if (!stmt.threshold). Empty string is falsy but is not undefined; if the test only used undefined, both implementations would pass and the production code would silently accept threshold: '' as valid.
Choice 4: The positive control ('when NPS > 70') doesn't just check r.ok === true, it checks specifically that statement.threshold isn't in the errors list. Why? Because some other unrelated rule could be failing on this fixture, and r.ok === false would hide a regression in the threshold check itself. The test isolates the one rule under test from background noise.
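The reasoning behind the empty-string fixture can be checked mechanically. The sketch below compares the intended falsy check with the narrower undefined-only check over three fixtures and reports which fixtures tell them apart; all names are illustrative.

```typescript
type Statement = { threshold?: string };

// Intended rule: a missing or empty threshold is rejected.
const rejectsFalsy = (s: Statement): boolean => !s.threshold;

// Narrower variant: only a missing threshold is rejected, so '' slips through.
const rejectsOnlyUndefined = (s: Statement): boolean => s.threshold === undefined;

const fixtures: { name: string; statement: Statement }[] = [
  { name: 'missing', statement: {} },
  { name: 'empty', statement: { threshold: '' } },
  { name: 'filled', statement: { threshold: 'when NPS > 70' } },
];

// A fixture is useful here only if the two implementations disagree on it.
const distinguishing = fixtures
  .filter((f) => rejectsFalsy(f.statement) !== rejectsOnlyUndefined(f.statement))
  .map((f) => f.name);
// distinguishing is ['empty']
```

Only the empty fixture separates the two implementations, which is why a suite that tests just the missing case cannot detect the narrower check.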
The failure mode I would have shipped without these tests: someone refactors the validator into a switch statement, accidentally drops the Intangible branch, all the other tests still pass (because they don't touch Intangible), and from that day forward every Intangible Requirement passes validation with no threshold. The Kanban board fills with strategic intent. The team stops pulling Intangible work. The discipline that the Style was supposed to enforce is now silently absent, and there's no way to know without re-reading the validator source.
The test prevents exactly that. It costs eleven lines.
Worked failure mode #2 — The safety-function chain
The Industrial example is structurally more interesting because it isn't one rule, it's a chain of three cooperating rules, and skipping the test on any one of them breaks the whole chain.
The three rules:
- A Safety requirement must have risk.level ∈ {SIL1, SIL2, SIL3, SIL4}. Never NonSIL.
- A safety-function statement must have verificationMethod ∈ {SILProofTest, SILValidation, Certification}. Never UnitTest.
- A Regulatory requirement must have a typed source ∈ {iec-standard, iso-standard, isa-standard, regulation}. Never stakeholder.
Dropping any one of these rules would let you ship a malformed Safety requirement. Together they form a small provenance system: the requirement is rated (rule 1), the rating has a verification path (rule 2), and the regulation it derives from is cited (rule 3). Remove any one and the chain breaks; the requirement still looks complete to a casual reader, but the audit trail has a hole.
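As a rough picture of how the three rules sit side by side, here is a self-contained sketch. It is not the code in industrial.ts: the Spec shape, the checkChain name, and the verificationMethod and source.type error paths are assumptions made for illustration (only risk.level appears as a path in the tests quoted in this post).

```typescript
type ValidationError = { path: string; message: string };

// Assumed, simplified spec shape for illustration only.
interface Spec {
  requirementKind: string;
  risk?: { level: string };
  verificationMethod?: string;
  statement?: { pattern: string };
  source?: { type: string };
}

const SIL_LEVELS = ['SIL1', 'SIL2', 'SIL3', 'SIL4'];
const SIF_VERIFICATION = ['SILProofTest', 'SILValidation', 'Certification'];
const REGULATORY_SOURCES = ['iec-standard', 'iso-standard', 'isa-standard', 'regulation'];

function checkChain(spec: Spec): ValidationError[] {
  const errors: ValidationError[] = [];
  // Rule 1: a Safety requirement carries a SIL rating.
  if (spec.requirementKind === 'Safety' && !SIL_LEVELS.includes(spec.risk?.level ?? '')) {
    errors.push({ path: 'risk.level', message: 'Safety requires SIL1-SIL4' });
  }
  // Rule 2: a safety-function statement has a SIL-grade verification path.
  if (
    spec.statement?.pattern === 'safety-function' &&
    !SIF_VERIFICATION.includes(spec.verificationMethod ?? '')
  ) {
    errors.push({ path: 'verificationMethod', message: 'a SIF cannot be verified this way' });
  }
  // Rule 3: a Regulatory requirement cites a typed standards source.
  if (spec.requirementKind === 'Regulatory' && !REGULATORY_SOURCES.includes(spec.source?.type ?? '')) {
    errors.push({ path: 'source.type', message: 'Regulatory requires a standards source' });
  }
  return errors;
}

// The REQ-S1 example from earlier in the post, reduced to the relevant fields.
const reqS1: Spec = {
  requirementKind: 'Safety',
  risk: { level: 'NonSIL' },
  verificationMethod: 'UnitTest',
  statement: { pattern: 'safety-function' },
};

const reqS1Paths = checkChain(reqS1).map((e) => e.path);
// reqS1Paths is ['risk.level', 'verificationMethod']
```

Each if is an independent branch, which is exactly why each needs its own test: deleting one leaves the other two, and their tests, untouched.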
Here's the test for rule 1 (industrial-style.test.ts:47):
@Verifies<RequirementCommandsFeature>('projectCanDefineCustomRiskTaxonomy')
'Safety requirement with NonSIL risk is rejected, SIL2 is accepted'() {
const bad = INDUSTRIAL_VALIDATORS.validateSpec(
baseSafetySpec({ risk: { level: 'NonSIL', ifNotMet: 'x' } }),
);
expect(bad.ok).toBe(false);
expect(bad.errors.some(e => e.path === 'risk.level')).toBe(true);
// Also: risk missing entirely
const missing = INDUSTRIAL_VALIDATORS.validateSpec(baseSafetySpec({ risk: undefined }));
expect(missing.ok).toBe(false);
expect(missing.errors.some(e => e.path === 'risk.level')).toBe(true);
const good = INDUSTRIAL_VALIDATORS.validateSpec(baseSafetySpec());
expect(good.ok, JSON.stringify(good)).toBe(true);
}
Three sub-cases packed into one method:
- NonSIL is rejected: the literal rule.
- Missing risk entirely is rejected: the corner case where someone forgets the field. This catches the risk?.level vs risk.level typo silently letting undefined through.
- A valid SIL2 baseline is accepted: the positive control.
The JSON.stringify(good) in the assertion message is a small detail that pays off the day the test fails: you immediately see which error caused the failure, not just expected true got false. Optional-chaining-style typos in the validator are exactly the kind of bug that produces a silent-pass on production data and a green CI; the missing-field case is what catches them.
Rule 2 is similar in shape, with one notable twist — the enumerative-positive loop:
for (const vm of ['SILProofTest', 'SILValidation', 'Certification']) {
const good = INDUSTRIAL_VALIDATORS.validateSpec(baseSafetySpec({ verificationMethod: vm }));
expect(good.ok, `${vm}: ${JSON.stringify(good)}`).toBe(true);
}
If a future maintainer "simplifies" the allowed list to just SILProofTest, this loop fails specifically on SILValidation and Certification with a clear message naming each one. Without enumerating, a single expect(good.ok).toBe(true) on SILProofTest would pass and the regression on the other two methods would ship silently.
Rule 3 (industrial-style.test.ts:84) follows the same triplet pattern: rejected wrong type, rejected missing, enumerated accepts.
The chain matters because the three rules share a vocabulary. If someone renames iec-standard to iec in the vocabulary but forgets to update rule 3, then every Regulatory requirement with source.type: 'iec-standard' is suddenly rejected, and the migration is invisible until somebody tries to add a new requirement and the wizard rejects it. The enumerative-positive loop on rule 3 catches this on the next CI run — not six weeks later when someone notices.
The chain is the system. Each rule is one test (with sub-cases). Skipping any one of them means a Safety requirement can ship that looks correct and is structurally hollow.
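The rename scenario can be sketched in a few lines. Everything here is hypothetical: the source-type ids other than iec-standard, the RULE3_ACCEPTED set, and validateRegulatorySource stand in for whatever rule 3 actually looks like in src/styles/industrial.ts.

```typescript
interface RuleResult { ok: boolean; errors: { path: string }[] }

// Stand-in for rule 3, with its own hard-coded list still using the pre-rename id.
const RULE3_ACCEPTED = new Set(['iec-standard', 'iso-standard', 'isa-standard', 'regulation']);

function validateRegulatorySource(type: string | undefined): RuleResult {
  const ok = type !== undefined && RULE3_ACCEPTED.has(type);
  return { ok, errors: ok ? [] : [{ path: 'source.type' }] };
}

// Vocabulary after the hypothetical rename iec-standard -> iec.
const vocabularyAfterRename = ['iec', 'iso-standard', 'isa-standard', 'regulation'];

// Enumerative-positive loop: every declared value must be accepted, and each
// failure is reported by name rather than as a bare "expected true".
const rejected = vocabularyAfterRename.filter(t => !validateRegulatorySource(t).ok);
console.log(rejected);
```

A single positive assertion on 'regulation' would stay green here; enumerating the vocabulary is what surfaces the one renamed value.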
Worked failure mode #3 — Standard CoS and flow-driven sources
The third example is qualitatively different because it isn't about safety or compliance — it's about methodological purity. It encodes the rule that distinguishes Kanban from "Scrum with a board".
In Kanban's classes-of-service model, the Standard class is the default lane: the work that flows through the system at the system's natural cadence. The discipline rule: a Standard requirement must come from a flow signal, not from an ad-hoc stakeholder ask. The legal source types are flow-metric, replenishment, backlog-refinement, cycle-time-analysis, cumulative-flow-diagram. Forbidden: stakeholder, ops-incident, executive-request.
Why? Because the moment Standard work starts flowing in from "the VP needs this", you've stopped doing Kanban — you're doing push-based delivery with WIP labels. The whole point of Kanban is that work is pulled based on observed flow signals, not pushed based on positional authority. The validator rule is the encoded version of David J. Anderson's distinction between commitment (a flow event the team accepted) and demand (a request someone made to the team).
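As a rough sketch, a rule of this kind could look like the following. The function name checkStandardSource and the SpecLike shape are assumptions made for illustration; only the list of flow-driven source types comes from the text above.

```typescript
// Hypothetical, trimmed-down spec shape.
interface SpecLike { requirementKind?: string; source?: { type?: string } }
interface RuleError { path: string; message: string }

// The legal flow-driven source types listed above.
const FLOW_SOURCES = new Set([
  'flow-metric',
  'replenishment',
  'backlog-refinement',
  'cycle-time-analysis',
  'cumulative-flow-diagram',
]);

function checkStandardSource(spec: SpecLike): RuleError[] {
  // The rule only applies to the Standard class of service.
  if (spec.requirementKind !== 'Standard') return [];
  const type = spec.source?.type;
  // A missing source is rejected just like a non-flow source.
  if (type === undefined || !FLOW_SOURCES.has(type)) {
    return [{
      path: 'source.type',
      message: `Standard work needs a flow-driven source, got ${String(type)}`,
    }];
  }
  return [];
}

const specs: SpecLike[] = [
  { requirementKind: 'Standard', source: { type: 'ops-incident' } }, // wrong source
  { requirementKind: 'Standard' },                                   // missing source
  { requirementKind: 'Standard', source: { type: 'flow-metric' } },  // flow-driven
];
const errorCounts = specs.map(s => checkStandardSource(s).length);
console.log(errorCounts);
```

The three specs mirror the rejected-wrong, rejected-missing, accepted triplet used throughout this post.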
The test (kanban-style.test.ts:220):
@Verifies<RequirementCommandsFeature>('requirementNewRequiresNonEmptyFitCriterion')
'validateSpec: Standard without flow-driven source is rejected'() {
const r = KANBAN_VALIDATORS.validateSpec({
kind: 'requirement', requirementKind: 'Standard', status: 'Backlog',
statement: fullyFilled('service-request'),
source: { type: 'ops-incident' }, // not a flow source
});
expect(r.errors.some(e => e.path === 'source.type')).toBe(true);
}
@Verifies<RequirementCommandsFeature>('requirementNewRequiresNonEmptyFitCriterion')
'validateSpec: Standard with no source at all is rejected'() {
const r = KANBAN_VALIDATORS.validateSpec({
kind: 'requirement', requirementKind: 'Standard', status: 'Backlog',
statement: fullyFilled('service-request'),
});
expect(r.errors.some(e => e.path === 'source.type')).toBe(true);
}
@Verifies<RequirementCommandsFeature>('requirementNewRequiresNonEmptyFitCriterion')
'validateSpec: Standard with flow-metric source accepts'() {
const r = KANBAN_VALIDATORS.validateSpec({
kind: 'requirement', requirementKind: 'Standard', status: 'Backlog',
statement: fullyFilled('service-request'),
source: { type: 'flow-metric' },
});
expect(r.errors.some(e => e.path === 'source.type')).toBe(false);
expect(r.ok).toBe(true);
}
Same triplet pattern: rejected wrong source, rejected missing source, accepted flow-driven source.
What makes this rule interesting — and what makes the test for it interesting — is that the failure mode is invisible at the requirement level. A Safety requirement with risk: NonSIL is obviously wrong if you know IEC 61511; an outsider can spot it. A Standard requirement with source: { type: 'ops-incident' } looks completely fine to anyone who isn't a Kanban practitioner. The validator rule is the only thing that protects the methodological invariant.
This is also where the asymmetric pair of assertions in the third test really earns its keep. Look at the last two lines:
expect(r.errors.some(e => e.path === 'source.type')).toBe(false);
expect(r.ok).toBe(true);
The first assertion checks that the source-type rule didn't fire. The second checks that nothing else fired either. Without the second assertion, the test could be passing with r.ok === false because some unrelated rule (e.g. a typo in the service-request pattern's slot list) is now failing on every Standard test. The catch-all r.ok === true is the canary; the targeted path check is the diagnostic.
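A tiny simulation shows why the pair matters. The result object and the 'statement.slots' path below are invented for illustration; they stand in for a validator result where an unrelated rule has started failing.

```typescript
interface ValidationResult { ok: boolean; errors: { path: string }[] }

// Simulated result: the source rule is satisfied, but an unrelated statement rule fired.
const result: ValidationResult = { ok: false, errors: [{ path: 'statement.slots' }] };

// Targeted check (the diagnostic): on its own, this looks healthy.
const sourceRuleFired = result.errors.some(e => e.path === 'source.type');
console.log(sourceRuleFired);

// Catch-all check (the canary): this is the one that reveals the unrelated failure.
console.log(result.ok);
```

A test asserting only the first value would pass against this result; adding the second assertion turns it red.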
The failure mode I would have shipped without this test: someone adds ops-incident to the legal source list "because operations needs to file work too", the discipline rule loses its meaning, and within a quarter the team is doing Scrum-with-WIP under a Kanban label. The test prevents the silent broadening of the source enum — and the test name ('Standard without flow-driven source is rejected') is the rule statement that explains why the broadening is wrong, in language a future maintainer will understand without needing to read Anderson.
Three failure modes, three different rule families: process discipline (Anderson), regulatory compliance (IEC 61511), methodological purity (flow-driven sources). All three encoded as validator branches. All three protected by tests that are also their own documentation.
The reporter as emitter — negative-data-as-positive-test
So far I've been talking about the validator. The reporter is the other half of the unsoundness story, and it has a different shape of failure mode.
A validator's failure is a false negative: an invalid spec passes. A reporter's failure is silent corruption: a valid spec renders as [object Object], or an empty string, or with a slot value missing. The validator failure is silent because the validator never spoke. The reporter failure is silent because the reporter spoke but lied: the markdown looks plausible until you look closely.
The Kanban reporter test for the ticket card (kanban-style.test.ts:344) is the most interesting test in the package, because it's structured around deliberately malformed input — every fixture entry is wrong in a different way, and each one has a comment naming the defensive branch in the reporter that it exercises.
Here's the fixture in full:
const spec = {
kind: 'requirement',
id: 'REQ-FD',
title: 'Regulator deadline',
requirementKind: 'FixedDate',
status: 'Analysis',
priority: 'High',
verificationMethod: 'SlaMet',
statement: { pattern: 'fixed-date-commitment', deadline: '2026-06-30', deliverable: 'report', penalty: 'fine' },
fitCriteria: [
{ kind: 'unit-test', description: 'schema validates', status: 'passed' },
{ kind: 'metric', description: 'filed on time', status: 'pending' },
{ kind: 'orphan' }, // exercises description ?? kind fallback
{ }, // exercises '(unnamed criterion)' branch
],
risk: { level: 'DelayHarmFixedDate' },
statusHistory: [
{ to: 'Backlog', at: '2026-01-01' },
{ state: 'Selected' }, // exercises e.state fallback, missing at
{ }, // exercises '?' fallback
],
};
Read the comments. Each fitCriterion is a different shape of malformation:
- { kind: 'unit-test', description: 'schema validates', status: 'passed' } is the well-formed positive control.
- { kind: 'metric', description: 'filed on time', status: 'pending' } is the well-formed pending control.
- { kind: 'orphan' } has no description. The reporter's code path is criterion.description ?? criterion.kind. Without this fixture, that ?? is uncovered; the next refactor could turn it into criterion.description ?? '(no description)' and silently lose the kind fallback for every criterion in the wild missing a description.
- { } has no kind either. The reporter's code path is criterion.description ?? criterion.kind ?? '(unnamed criterion)'. Without this fixture, the final ?? '(unnamed criterion)' is uncovered; the reporter could silently emit - [ ] undefined and look fine in single-checkbox cases.
Same pattern for statusHistory:
- { to: 'Backlog', at: '2026-01-01' } is well-formed.
- { state: 'Selected' } uses an alternate field name (state instead of to) and is missing the at timestamp. It exercises the entry.to ?? entry.state fallback and the missing-at branch.
- { } exercises the ultimate '?' fallback.
Then the assertions, every one of them anchored to one of those branches:
const md = KANBAN_REPORTER.renderMarkdown(spec);
expect(md).toContain('# [FixedDate] REQ-FD — Regulator deadline');
expect(md).toContain('**Due**: 2026-06-30');
expect(md).toContain('**Penalty if missed**: fine');
expect(md).toContain('## Statement');
expect(md).toContain('## Done when');
expect(md).toContain('- [x] schema validates'); // kind: 'unit-test', status: 'passed'
expect(md).toContain('- [ ] filed on time'); // status: 'pending'
expect(md).toContain('- [ ] orphan'); // description ?? kind
expect(md).toContain('- [ ] (unnamed criterion)'); // description ?? kind ?? '(unnamed criterion)'
expect(md).toContain('**Delay profile**: DelayHarmFixedDate');
expect(md).toContain('_history_:');
expect(md).toContain('Backlog@2026-01-01'); // to + at, normal
expect(md).toContain('Selected'); // to ?? state
expect(md).toContain('?'); // ultimate fallback
This is the negative-data-as-positive-test technique. The test isn't checking that good input renders correctly; other tests do that. It's checking that degenerate input doesn't crash and produces sensible-looking output. Reporter code is full of ?? '…' defaults, and without tests like this, those defaults silently rot. A maintainer reads the reporter, sees criterion.description ?? criterion.kind, thinks "the kind fallback is never hit in practice, let me simplify to just criterion.description ?? '(none)'", and ships. Six months later somebody files a Requirement whose criteria have no descriptions, every criterion in their report says (none) instead of citing the criterion's kind, and the audit reads like nobody knew what they were measuring.
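For readers who want to see the defensive branches themselves, here is a minimal sketch of fallback chains of the kind described. The helper names renderCriterion and renderHistoryEntry are hypothetical, and the real KANBAN_REPORTER may be structured quite differently; the sketch only reproduces the ?? chains quoted in this section.

```typescript
// Hypothetical, loosely-typed shapes matching the fixture entries above.
interface Criterion { kind?: string; description?: string; status?: string }
interface HistoryEntry { to?: string; state?: string; at?: string }

function renderCriterion(c: Criterion): string {
  const box = c.status === 'passed' ? '[x]' : '[ ]';
  // description, then kind, then a fixed placeholder
  const label = c.description ?? c.kind ?? '(unnamed criterion)';
  return `- ${box} ${label}`;
}

function renderHistoryEntry(e: HistoryEntry): string {
  // to, then the alternate field name state, then '?'
  const state = e.to ?? e.state ?? '?';
  // the timestamp is optional; omit the @ suffix when it is missing
  return e.at ? `${state}@${e.at}` : state;
}

const criteria: Criterion[] = [
  { kind: 'unit-test', description: 'schema validates', status: 'passed' },
  { kind: 'metric', description: 'filed on time', status: 'pending' },
  { kind: 'orphan' },
  {},
];
const history: HistoryEntry[] = [
  { to: 'Backlog', at: '2026-01-01' },
  { state: 'Selected' },
  {},
];

console.log(criteria.map(renderCriterion).join('\n'));
console.log(history.map(renderHistoryEntry).join(' → '));
```

Each fixture entry drives exactly one arm of a ?? chain, which is why removing any entry leaves a branch that no assertion constrains.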
There's a deeper point here. The traditional way to catch this kind of bug is mutation testing: take the production code, mutate every ?? operator (a ?? b → a, then a ?? b → b), re-run the test suite, see which mutants survive. A surviving mutant marks a branch whose behaviour no assertion pins down, whether or not some test happens to execute it. Mutation testing is rigorous but expensive: the suite is re-run once per mutant, which can mean something like a 100x runtime cost, and the tooling is complex.
The negative-data-as-positive-test approach gives you much of the value of mutation testing at a fraction of the cost: by deliberately constructing fixtures that exercise each defensive branch, you're effectively pre-mutating the input space. If a ?? branch is reachable by any reasonable input, a fixture written this way reaches it; if it isn't, the fact that no fixture can reach it is itself information (the branch is dead code, delete it).
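The relationship to mutation testing can be shown with one hand-written mutant. This is only an illustration of the idea, not a mutation-testing tool; original and mutant are hypothetical stand-ins for the reporter's label expression.

```typescript
interface Criterion { kind?: string; description?: string }

// The label expression as described in this section.
const original = (c: Criterion): string =>
  c.description ?? c.kind ?? '(unnamed criterion)';

// Hand-written mutant: the kind fallback has been "simplified" away.
const mutant = (c: Criterion): string =>
  c.description ?? '(unnamed criterion)';

const fixtures: Criterion[] = [
  { kind: 'unit-test', description: 'schema validates' }, // well-formed: both agree
  { kind: 'orphan' },                                      // degenerate: they differ
  {},                                                      // fully empty: both agree
];

// A fixture "kills" the mutant when the two implementations disagree on it.
const killers = fixtures.filter(f => original(f) !== mutant(f));
console.log(killers);
```

Only the description-less entry distinguishes the two versions, which is exactly the entry a suite built solely from well-formed data would lack.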
The cost: one well-commented fixture per reporter function, with as many entries as the reporter has defensive branches. The Kanban ticket card test takes ~40 lines and pushes the reporter to 100 % branch coverage. That's the trade.
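The single test method pins four fallback branches (description ?? kind, the '(unnamed criterion)' default, to ?? state, and the '?' default) next to their well-formed controls: fourteen assertions against one fixture. That's the density you want in reporter tests.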
Templates × vocabulary — the cross-surface bug
Templates are the smallest of the four surfaces, and the one I gave the least attention in the table above. There's a reason: the interesting template bug isn't a template bug at all — it's a cross-surface coupling bug between templates and vocabulary, and neither template tests nor vocabulary tests catch it alone.
Here's the bug. A template's skeleton typically references a statementPattern by string id:
{
id: 'expedite-blocker',
label: 'Expedite (blocking incident)',
description: 'A pull-class request triggered by an active blocker on another flow.',
skeleton: {
requirementKind: 'Expedite',
statementPattern: 'expedite-pull', // ← string reference into vocabulary.statementPatterns
priority: 'Critical',
defaultFitCriterionKinds: ['unit-test', 'incident-ticket'],
},
},
The string 'expedite-pull' is a reference into vocabulary.statementPatterns. If someone renames expedite-pull to expedite-pull-v2 in the vocabulary and forgets to update this template, then:
- Vocabulary tests still pass: every state in the workflow references a declared state, every matrix cell references a declared level, every pattern declares slots.
- Template tests still pass: findTemplate(t.id) === t for every declared template id.
- Validator tests still pass: every pattern in the vocabulary is exercised by fullyFilled.
- Reporter tests still pass: every pattern is rendered.
The bug only surfaces at wizard runtime: the user runs requirement new --template expedite-blocker, the wizard pre-fills statementPattern: 'expedite-pull', then prompts for the slots of that pattern — and finds no pattern with that id in the vocabulary. The wizard either crashes or prompts for an empty slot list and submits a malformed statement that the validator then rejects. Either way, the template is broken on first use and nobody noticed.
The cross-surface test that closes the gap:
@Verifies<RequirementCommandsFeature>('requirementNewWizardCollectsAllFields')
'every template skeleton uses a declared statementPattern'() {
const patterns = new Set(KANBAN_VOCABULARY.statementPatterns.map(p => p.pattern));
for (const t of KANBAN_TEMPLATES.templates) {
if (t.skeleton.statementPattern) {
expect(patterns.has(t.skeleton.statementPattern), `template ${t.id}`).toBe(true);
}
}
}
@Verifies<RequirementCommandsFeature>('requirementNewWizardCollectsAllFields')
'every template skeleton uses declared kind / status / verificationMethod'() {
const kinds = new Set(KANBAN_VOCABULARY.requirementKinds);
const statuses = new Set(KANBAN_VOCABULARY.statusWorkflow.states);
const vms = new Set(KANBAN_VOCABULARY.verificationMethods);
for (const t of KANBAN_TEMPLATES.templates) {
if (t.skeleton.requirementKind) expect(kinds.has(t.skeleton.requirementKind), t.id).toBe(true);
if (t.skeleton.status) expect(statuses.has(t.skeleton.status), t.id).toBe(true);
if (t.skeleton.verificationMethod) expect(vms.has(t.skeleton.verificationMethod), t.id).toBe(true);
}
}
Two tests, both walking the templates list and asserting that every string reference into the vocabulary points to a declared value. Add a template, the loops cover it. Rename a vocabulary value, the next CI run catches every template that referenced the old name.
This is the third example of a recurring shape in Style testing: the bug that requires testing the relationship between two surfaces, not either surface in isolation. The first two were in the validator chain (rule 2 references rule 1's output via risk.level; rule 3 references the source taxonomy). The template-vocabulary coupling is the cleanest example because it's purely structural — pure string-reference well-formedness, no runtime semantics.
If you publish a Style with these two cross-surface tests, the templates × vocabulary coupling is closed for life; nobody can break it without breaking the build. If you don't, the coupling fails silently every time the vocabulary is touched.
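If you are writing your own Style, the same check generalises to any string reference from a template skeleton into the vocabulary. The helper below is a sketch under assumptions: danglingReferences and TemplateLike are hypothetical names, not exports of the package.

```typescript
// Hypothetical minimal template shape: an id plus a free-form skeleton.
interface TemplateLike { id: string; skeleton: Record<string, unknown> }

// Returns one message per template whose `field` points at an undeclared value.
function danglingReferences(
  templates: readonly TemplateLike[],
  field: string,
  declared: ReadonlySet<string>,
): string[] {
  const problems: string[] = [];
  for (const t of templates) {
    const value = t.skeleton[field];
    // Absent fields are fine; only present-but-undeclared strings are reported.
    if (typeof value === 'string' && !declared.has(value)) {
      problems.push(`${t.id}.${field} -> ${value}`);
    }
  }
  return problems;
}

const templates: TemplateLike[] = [
  { id: 'expedite-blocker', skeleton: { requirementKind: 'Expedite', statementPattern: 'expedite-pull' } },
];
// Vocabulary after the hypothetical rename to expedite-pull-v2.
const patterns = new Set(['expedite-pull-v2', 'service-request']);

const dangling = danglingReferences(templates, 'statementPattern', patterns);
console.log(dangling);
```

Returning the offending template ids, rather than a boolean, keeps the failure message as specific as the t.id labels in the tests above.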
The compliance graph: tests as Feature spec
Now the part that was hardest to internalise. None of these tests use describe or it. Every one of them uses @FeatureTest + @Verifies:
@FeatureTest(RequirementCommandsFeature)
class IndustrialStyleTests {
@Verifies<RequirementCommandsFeature>('projectCanDefineCustomRiskTaxonomy')
'Safety requirement with NonSIL risk is rejected, SIL2 is accepted'() { /* … */ }
}
This isn't a stylistic preference and it isn't an aesthetic. The decorators (src/decorators.ts) wire each test method into a graph. The graph has three node types:
- Features (e.g. RequirementCommandsFeature): abstract classes that declare a list of named acceptance criteria as method names.
- AC anchors: the strings passed to @Verifies<F>('ac-name'). Type-checked: you can't anchor to an AC the Feature doesn't declare.
- Tests: the methods themselves.
compliance --strict walks the graph and fails CI on either of two conditions:
- Orphan AC: a Feature declares an AC that no test verifies. The Feature claims to do X, no test proves X.
- Orphan test: a describe/it test that doesn't anchor to any AC. The test exists in the void; whatever it verifies isn't part of any Feature's contract.
The implication for Style testing is sharp: if I write a test for the Anderson rule using it('rejects empty threshold', …), it runs and goes green. The Anderson rule still gets enforced at runtime. But the Feature that owns the AC requirementNewRequiresNonEmptyFitCriterion has no test attached to that AC — the compliance report shows it as un-verified, CI fails, and I'm forced to either delete the AC (lying about Feature scope) or re-anchor the test (the right move).
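The two orphan conditions reduce to a pair of set differences. The sketch below is a simplified model, not the package's implementation: FeatureNode, TestNode and findOrphans are hypothetical names, and the real graph is built by the decorators rather than from plain arrays.

```typescript
// Hypothetical node shapes for a toy compliance graph.
interface FeatureNode { name: string; acs: string[] }
interface TestNode { name: string; anchor?: { feature: string; ac: string } }
interface OrphanReport { orphanAcs: string[]; orphanTests: string[] }

function findOrphans(features: FeatureNode[], tests: TestNode[]): OrphanReport {
  // Every Feature#AC pair that at least one test anchors to.
  const verified = new Set<string>();
  for (const t of tests) {
    if (t.anchor) verified.add(`${t.anchor.feature}#${t.anchor.ac}`);
  }
  // Orphan ACs: declared by a Feature, verified by nothing.
  const orphanAcs: string[] = [];
  for (const f of features) {
    for (const ac of f.acs) {
      const key = `${f.name}#${ac}`;
      if (!verified.has(key)) orphanAcs.push(key);
    }
  }
  // Orphan tests: tests that carry no anchor at all.
  const orphanTests = tests.filter(t => t.anchor === undefined).map(t => t.name);
  return { orphanAcs, orphanTests };
}

const features: FeatureNode[] = [
  { name: 'RequirementCommandsFeature', acs: ['requirementNewRequiresNonEmptyFitCriterion'] },
];
// The Anderson-rule test written with a bare it(...): it runs, but has no anchor.
const tests: TestNode[] = [{ name: 'rejects empty threshold' }];

const report = findOrphans(features, tests);
console.log(report);
```

In this model the un-anchored test produces both failures at once: the AC is unverified and the test itself is an orphan.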
This is what makes the test names readable as English sentences:
'Safety requirement with NonSIL risk is rejected, SIL2 is accepted'
'safety-function statement requires SIL verification method'
'Regulatory requirement requires iec/iso/isa/regulation source'
'validateSpec: Intangible with empty threshold is rejected (Anderson rule)'
'validateSpec: Standard without flow-driven source is rejected'
These aren't test descriptions. They're rule statements: IEC 61511 and Anderson made executable.
What a Feature actually looks like
Here's where the abstraction bottoms out. A Feature is an abstract class whose method names are the acceptance criteria:
import { Feature } from '@frenchexdev/requirements';
export abstract class RequirementCommandsFeature extends Feature {
readonly id = 'FEAT-REQ-CMD';
readonly title = 'Requirement Commands';
readonly description = 'CLI commands for creating, listing, validating, and reporting on Requirements.';
// Each abstract method IS an acceptance criterion. The method name is the AC id.
abstract requirementNewPromptsEarsPattern(): void;
abstract requirementNewRequiresNonEmptyFitCriterion(): void;
abstract requirementNewWizardCollectsAllFields(): void;
abstract requirementNewLoopsChildFeaturesUntilEnd(): void;
abstract requirementListShowsCoveragePerRequirement(): void;
abstract requirementShowReportUsesInverseTerms(): void;
abstract projectCanDefineCustomStatusWorkflow(): void;
abstract projectCanDefineCustomRiskTaxonomy(): void;
abstract fromFeatureFlagPreFillsFromExistingFeature(): void;
abstract requirementSyncReusesDiffCore(): void;
// … 14 more …
}
Three structural choices doing real work:
- Abstract methods, not concrete ones. The Feature can never be instantiated. It exists purely as a type, a name, and a list of AC method names. Concrete subclasses don't exist; the Feature's only consumers are the decorators, which check test anchors against its method names at the type level (abstract members emit no runtime code, so there is nothing on the prototype to read).
- Method names ARE the acceptance criteria. There's no separate acceptanceCriteria: string[] array. The method names, accessed via TypeScript's keyof, are what @Verifies<F>('name') validates against. Add an abstract method, you've added an AC; remove one, the corresponding @Verifies calls fail to type-check.
- Method bodies are absent (abstract). The Feature describes what the system must do, not how. The "how" lives in the implementation; the "verification" lives in the tests. A Feature with concrete method bodies would conflate three things that should stay separate.
The decorator @Verifies<RequirementCommandsFeature>('requirementNewRequiresNonEmptyFitCriterion') works by reading the type parameter at compile time and constraining the string argument to the union of the Feature's abstract method names. TypeScript will refuse to compile if you pass an unknown name. You literally cannot anchor a test to an AC that doesn't exist.
This is also why Features must be abstract class, not interface. Decorators need a runtime reference (the class identifier) to register the test against; an interface evaporates at compile time. The two-layer structure — abstract class for runtime identity + abstract methods for AC names — is what makes the compliance graph work.
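Both halves of that structure can be sketched without decorator syntax. This is an assumption-laden illustration: DemoFeature, AcName, FeatureClass, registry and the plain function verifies are hypothetical, and the real @Verifies takes the Feature as a type parameter rather than as an argument.

```typescript
abstract class Feature {
  abstract readonly id: string;
}

// Hypothetical Feature with two acceptance criteria.
abstract class DemoFeature extends Feature {
  readonly id = 'FEAT-DEMO';
  abstract firstCriterion(): void;
  abstract secondCriterion(): void;
}

// Compile-time half: keep only keys whose value is a () => void method.
// Plain keyof would also admit `id`, so non-method members are filtered out.
type AcName<F> = {
  [K in keyof F]: F[K] extends () => void ? K : never;
}[keyof F] & string;

// Runtime half: an abstract class still exists as a value, so it can key a registry.
type FeatureClass<F extends Feature> = abstract new () => F;
const registry = new Map<Function, string[]>();

function verifies<F extends Feature>(feature: FeatureClass<F>, ac: AcName<F>): void {
  const anchored = registry.get(feature) ?? [];
  anchored.push(ac);
  registry.set(feature, anchored);
}

verifies(DemoFeature, 'firstCriterion'); // compiles: a declared AC
// verifies(DemoFeature, 'thirdCriterion'); // would not compile: no such method
// verifies(DemoFeature, 'id');             // would not compile: not a method

console.log(registry.get(DemoFeature));
```

An interface could supply the AcName half, but there would be no DemoFeature value to pass as the registry key, which is the constraint the paragraph above describes.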
What the green compliance report looks like
Here's a fragment of what compliance --strict renders for RequirementCommandsFeature, edited for length:
═══ RequirementCommandsFeature ════════════════════════════════════════════
24 acceptance criteria, 24 verified, 0 orphan ✓
┌─ projectCanDefineCustomRiskTaxonomy
│ Verified by:
│ ✓ IndustrialStyleTests
│ 'Safety requirement with NonSIL risk is rejected, SIL2 is accepted'
│ ✓ KanbanStyleTests
│ 'riskTaxonomy cells reference declared levels'
│
├─ requirementNewPromptsEarsPattern
│ Verified by:
│ ✓ IndustrialStyleTests
│ 'safety-function statement requires SIL verification method'
│ ✓ KanbanStyleTests
│ 'validateStatement accepts fully-filled service-request'
│ 'validateStatement accepts fully-filled fixed-date-commitment'
│ 'validateStatement accepts fully-filled expedite-pull'
│ 'validateStatement accepts fully-filled intangible-investment'
│ … (4 more)
│
├─ requirementNewRequiresNonEmptyFitCriterion
│ Verified by:
│ ✓ KanbanStyleTests
│ 'validateSpec: Intangible with empty threshold is rejected (Anderson rule)'
│ 'validateSpec: Standard without flow-driven source is rejected'
│ 'validateSpec: Expedite with empty cod is rejected'
│ 'validateSpec: FixedDate with empty deadline is rejected'
│ … (12 more)
│
├─ projectCanDefineCustomStatusWorkflow
│ Verified by:
│ ✓ KanbanStyleTests
│ 'status workflow declares Testing→Development and Ready→Development rework loops'
│ ✓ IndustrialStyleTests
│ 'Regulatory requirement requires iec/iso/isa/regulation source'
│
…
═══ Total ═════════════════════════════════════════════════════════════════
12 Features, 187 ACs, 187 verified, 0 orphan ACs, 0 orphan tests ✓
What the red compliance report looks like
This is the part that matters in practice, because it's what you see when you've broken the contract. Two failure shapes:
Shape 1 — orphan AC (someone added an AC method to a Feature but no test verifies it):
═══ RequirementCommandsFeature ════════════════════════════════════════════
24 acceptance criteria, 23 verified, 1 orphan ✗
┌─ requirementShowExportsToPdf
│ ⚠ ORPHAN AC — no test anchors to this acceptance criterion
│ Declared at: requirements/features/requirement-commands.ts:47
│ Suggested fixes:
│ 1. Write a test method @Verifies<RequirementCommandsFeature>('requirementShowExportsToPdf')
│ 2. Or: remove the abstract method if the AC is no longer in scope
│
…
═══ Errors ════════════════════════════════════════════════════════════════
1 orphan AC across 1 Feature
Run with --explain to see traceability graph for each orphan
CI failed: orphan ACs must be either verified or removed.
Shape 2: orphan test (someone wrote a describe/it test that doesn't anchor):
═══ Test discovery ════════════════════════════════════════════════════════
⚠ ORPHAN TESTS — 3 tests not anchored to any Feature AC
test/unit/some-other-thing.test.ts
describe('quick check', () => { it('works', …) })
⚠ 'quick check > works' — no @FeatureTest / @Verifies anchor
test/unit/wip.test.ts
it('debug helper', …)
⚠ 'debug helper' — no @FeatureTest / @Verifies anchor
═══ Errors ════════════════════════════════════════════════════════════════
3 orphan tests detected
compliance --strict requires every test to anchor to a Feature AC
CI failed: rewrite tests to use @FeatureTest + @Verifies, or move them
to a documented test/scratch/ directory excluded from compliance.
The red output is the dual of the green: every line is named, the file path is given, and the suggested fix is concrete. This is what makes the gate operationally usable rather than just structurally correct — when you break it, the report tells you exactly what to do.
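Both red shapes come from the same set difference between declared ACs and test anchors. The sketch below shows that check in isolation. The DeclaredAc, DiscoveredTest and checkGraph names are hypothetical and chosen for illustration; the package's real registry is not shown in this post.

```typescript
// Hypothetical shapes for illustration; not the package's real registry types.
interface DeclaredAc { feature: string; ac: string }
interface DiscoveredTest { name: string; anchor?: DeclaredAc }
interface GraphReport { orphanAcs: DeclaredAc[]; orphanTests: string[] }

const keyOf = (a: DeclaredAc): string => `${a.feature}#${a.ac}`;

function checkGraph(declared: DeclaredAc[], tests: DiscoveredTest[]): GraphReport {
  // Collect every AC key that at least one test anchors to.
  const verified = new Set<string>();
  for (const t of tests) if (t.anchor) verified.add(keyOf(t.anchor));
  return {
    // Shape 1: declared ACs that no test anchors to.
    orphanAcs: declared.filter((a) => !verified.has(keyOf(a))),
    // Shape 2: discovered tests that carry no anchor at all.
    orphanTests: tests.filter((t) => !t.anchor).map((t) => t.name),
  };
}

const report = checkGraph(
  [
    { feature: 'RequirementCommandsFeature', ac: 'requirementShowExportsToPdf' },
    { feature: 'RequirementCommandsFeature', ac: 'requirementNewPromptsEarsPattern' },
  ],
  [
    {
      name: 'prompts for an EARS pattern',
      anchor: { feature: 'RequirementCommandsFeature', ac: 'requirementNewPromptsEarsPattern' },
    },
    { name: 'quick check > works' },
  ],
);

// A strict gate would exit non-zero when either list is non-empty.
const strictPasses = report.orphanAcs.length === 0 && report.orphanTests.length === 0;
```

Because each entry keeps its Feature, AC name or test name, the report can name the offending line instead of printing a bare count.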
The dog-fooding recursion (the Meta angle)
There's a recursion in this system that took me a while to fully see. It has three levels.
Level 1 — The package's own tests use the package's own decorators. @FeatureTest and @Verifies are exported from src/decorators.ts. The tests in test/unit/ import them from there and use them. If I break the decorator system (change the registration semantics, mistype a generic constraint, accidentally make @Verifies a no-op), the test suite either stops compiling or stops anchoring, and the gate goes red either way. There's no escape hatch where I can write a quick it('this still works', …) to check whether the decorator is broken; the verification mechanism is the broken thing.
Level 2 — The package's own requirements files describe the package itself, using the package's own vocabulary. Under packages/requirements/requirements/, there are TypeScript files describing RequirementCommandsFeature, ComplianceReportFeature, TraceabilityGraphFeature and so on — using the very Requirement<DefaultStyleType> and Feature base classes the package exports. So the package's specifications of its own features are written in the language the package implements.
Level 3 — The Style tests verify a Style that's used to spec the very Features the tests anchor to. When IndustrialStyleTests @Verifies an AC of RequirementCommandsFeature, it's claiming "this test for the Industrial Style proves a property of the Requirement Commands Feature". And RequirementCommandsFeature is itself a Feature written using the package's own DSL. So the test for the Style verifies a Feature that's specified in the very system the Style is part of.
This is dog-fooding pushed to the point of being a load-bearing structural property. There is no parallel universe where I "first build the system, then write tests in a separate framework, then describe the features in a separate document". Everything is built using everything else. Three benefits, one cost.
Benefit 1 — drift is structurally impossible. A spec document that drifts away from the code is the bane of every requirements engineering effort I've ever been near. Specs grow stale because the cost of updating them is uncoupled from the cost of changing the code. In this package, the spec is in TypeScript files that import from the same module the code does, so any rename, any signature change, any vocabulary update breaks the spec at compile time. The spec can't drift because it's the same kind of artefact as the code.
Benefit 2 — the test suite stress-tests the public API. Every test method I write exercises the package's public exports under realistic conditions. If the export shape is awkward, the tests are awkward to write — and I notice immediately, not when the first external user files a GitHub issue. A whole class of "this API is a mess" feedback gets caught by me writing my own tests against my own API.
Benefit 3 — the tests are executable documentation for the package's users. Anyone considering the package can read test/unit/industrial-style.test.ts and see exactly how to use the package, what the test pattern looks like, what AC anchoring looks like in practice. The test files are the most honest examples possible — they're what I actually use, not what I wrote for the README.
Cost — the bootstrap problem. The first version of @FeatureTest couldn't be tested using @FeatureTest, because @FeatureTest didn't exist yet. The first version of the package's own requirements files couldn't be written using the package's vocabulary, because the vocabulary wasn't shipped yet. So the first commits used describe/it and free-text comments, then I migrated as the abstractions stabilised. Bootstrapping a self-describing system is genuinely awkward; you have to write the tooling, then rewrite your tools' tests using the tooling, then rewrite your tools' specs using the tooling, then ship. There is no shortcut. (And once you've done it, the temptation to never go through that pain again is what holds the recursion together — you become extremely reluctant to accept any contribution that doesn't fit the dog-fooded shape, because every escape hatch undoes the structural drift-prevention.)
This is the deep reason the package is opinionated about @FeatureTest/@Verifies to the point of forbidding describe/it outright. The forbidding isn't aesthetic. It's that every describe/it test is a small puncture in the drift-prevention property, and enough small punctures and the property stops holding. The recursion is either complete or it isn't. There's no halfway.
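The "drift is structurally impossible" claim rests on anchors being typed against the Feature class rather than written as free strings. The sketch below shows one way to get that, assuming the anchor parameter is constrained to the Feature's member names. Feature, AcName and the plain function verifies are hypothetical stand-ins; the real @Verifies decorator may be typed differently.

```typescript
// Hypothetical stand-ins; the package's real Feature and @Verifies types are not shown here.
abstract class Feature {}

abstract class RequirementCommandsFeature extends Feature {
  abstract requirementShowExportsToPdf(): void;
  abstract requirementNewPromptsEarsPattern(): void;
}

// Only string-named members of F are valid anchors.
type AcName<F extends Feature> = Extract<keyof F, string>;

// Plain-function analogue of @Verifies<F>('acName'): records the anchor and returns it.
const anchors: string[] = [];
function verifies<F extends Feature>(ac: AcName<F>): string {
  anchors.push(ac);
  return ac;
}

verifies<RequirementCommandsFeature>('requirementShowExportsToPdf'); // compiles

// Rename or delete the abstract method and every stale anchor stops compiling:
// @ts-expect-error 'requirementShowExportsPdf' is not a member of the Feature
verifies<RequirementCommandsFeature>('requirementShowExportsPdf');
```

The protection is purely at compile time: the misspelt call still runs if type checking is skipped, which is why the gate has to run the type checker as well as the tests.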
Cross-Style consistency — the smoke contract
The package ships five Styles: DefaultStyle, IndustrialStyle, LeanStyle, AgileStyle, KanbanStyle. Each has its own dedicated test file with hundreds of assertions. But there's an invariant across all of them that no per-Style test can express: every Style satisfies the RequirementStyle interface uniformly.
This is the meta-test, and it lives in test/unit/styles-smoke.test.ts. Its job: iterate over every Style in the registry and assert the universals — every Style has a non-empty id, a semver version, a vocabulary with at least one of each shape, validators that handle the trivial cases, a reporter that doesn't crash on the simplest valid spec.
import { expect } from 'vitest';
import { FeatureTest, Verifies } from '../../src/decorators';
import { ALL_STYLES } from '../../src/styles/registry';
// Import path assumed from the "Declared at" line of the orphan-AC report.
import { RequirementCommandsFeature } from '../../requirements/features/requirement-commands';

@FeatureTest(RequirementCommandsFeature)
class StylesSmokeTests {
  // Universal 1: identity fields and the four sub-interfaces exist on every Style.
  @Verifies<RequirementCommandsFeature>('requirementSyncReusesDiffCore')
  'every shipped Style has the four sub-interfaces wired through identity'() {
    for (const style of ALL_STYLES) {
      expect(style.id, `${style.id}: id`).toMatch(/^@?[\w-]+(\/[\w-]+)?$/);
      expect(style.version, `${style.id}: version`).toMatch(/^\d+\.\d+\.\d+/);
      expect(style.vocabulary, `${style.id}: vocabulary`).toBeDefined();
      expect(style.validators, `${style.id}: validators`).toBeDefined();
      expect(style.templates, `${style.id}: templates`).toBeDefined();
      expect(style.reporter, `${style.id}: reporter`).toBeDefined();
    }
  }

  // Universal 2: no vocabulary list is empty.
  @Verifies<RequirementCommandsFeature>('requirementNewWizardCollectsAllFields')
  'every Style declares non-empty vocabulary lists'() {
    for (const style of ALL_STYLES) {
      const v = style.vocabulary;
      expect(v.requirementKinds.length, `${style.id}: requirementKinds`).toBeGreaterThan(0);
      expect(v.statusWorkflow.states.length, `${style.id}: states`).toBeGreaterThan(0);
      expect(v.statementPatterns.length, `${style.id}: statementPatterns`).toBeGreaterThan(0);
      expect(v.sourceKinds.length, `${style.id}: sourceKinds`).toBeGreaterThan(0);
      expect(v.verificationMethods.length, `${style.id}: verificationMethods`).toBeGreaterThan(0);
    }
  }

  // Universal 3: validators handle the two trivial inputs.
  @Verifies<RequirementCommandsFeature>('requirementNewRequiresNonEmptyFitCriterion')
  'every Style validator rejects null and accepts a non-requirement spec'() {
    for (const style of ALL_STYLES) {
      expect(style.validators.validateSpec(null).ok, `${style.id}: null`).toBe(false);
      expect(style.validators.validateSpec({ kind: 'feature' }).ok, `${style.id}: feature`).toBe(true);
    }
  }
}
void StylesSmokeTests;
Three tests, one loop each, every Style covered. The assertion-message argument (`${style.id}: ...`) is the load-bearing detail: when the test fails on LeanStyle, the message says LeanStyle: requirementKinds: Expected length to be greater than 0 (received 0) — you know which Style is broken without having to bisect.
This is the test that catches a new Style that doesn't conform. If somebody adds a sixth Style with a missing verificationMethods list, this test fails on the next CI run with a clear message naming the new Style. Without it, the per-Style test suites would all pass (because each tests its own Style), and the broken Style would only fail at runtime when somebody first tried to use it.
The pattern generalises: any time you have N implementations of an interface, write a smoke test that loops over the N implementations and checks the K interface universals. The cost is one test file with K assertions. The benefit is that the (N+1)th implementation is automatically covered the moment it's added to the registry, and any drift from the interface contract fails immediately with a named diagnostic.
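Stripped of the Style specifics, the N-implementations-by-K-universals loop can be sketched as below. The Codec interface, the three implementations and the smoke function are hypothetical, invented only to show the shape; they stand in for RequirementStyle and ALL_STYLES.

```typescript
// Hypothetical interface and implementations, standing in for RequirementStyle and ALL_STYLES.
interface Codec {
  readonly id: string;
  encode(input: string): string;
  decode(input: string): string;
}

const upper: Codec = { id: 'upper', encode: (s) => s.toUpperCase(), decode: (s) => s.toLowerCase() };
const reverse: Codec = {
  id: 'reverse',
  encode: (s) => s.split('').reverse().join(''),
  decode: (s) => s.split('').reverse().join(''),
};
// Deliberately broken: decode does not undo encode.
const lossy: Codec = { id: 'lossy', encode: (s) => s.slice(0, 2), decode: (s) => s };

// The K universals, each a named predicate over one implementation.
const UNIVERSALS: ReadonlyArray<readonly [string, (c: Codec) => boolean]> = [
  ['id is non-empty', (c) => c.id.length > 0],
  ['decode(encode(x)) round-trips', (c) => c.decode(c.encode('abc')) === 'abc'],
];

// Returns one diagnostic per (implementation, universal) failure, prefixed with the id.
function smoke(registry: readonly Codec[]): string[] {
  const failures: string[] = [];
  for (const impl of registry) {
    for (const [name, holds] of UNIVERSALS) {
      if (!holds(impl)) failures.push(`${impl.id}: ${name}`);
    }
  }
  return failures;
}
```

Adding lossy to the registry is the only change needed for it to be checked, and the failure string names it directly.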
Why not just Zod?
The obvious question for a TypeScript reader: why not just use Zod? Zod is excellent, mature, idiomatic, and has shape validation with .refine() for custom rules. The package would be a third the size if it used Zod under the hood. So why not?
The honest answer comes in three parts.
Part 1 — Zod gives you shape validation. A Style needs cited deontic rules.
Here's what the Anderson rule looks like in Zod:
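import { z } from 'zod';

// Shape-only encoding of the rule: every slot must be a non-empty string.
const IntangibleStatement = z.object({
  pattern: z.literal('intangible-investment'),
  work: z.string().min(1),
  benefit: z.string().min(1),
  threshold: z.string().min(1),
});

// One union member per requirement kind; only Intangible is shown.
const Spec = z.discriminatedUnion('requirementKind', [
  z.object({ requirementKind: z.literal('Intangible'), statement: IntangibleStatement }),
  // …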
]);
This works. It rejects an Intangible statement with empty threshold. But notice what's missing:
- No citation. The error message is whatever Zod's default message is — typically String must contain at least 1 character(s) at "statement.threshold". The Anderson rule has been encoded but the user has no idea where it comes from. The error tells them that the field is required, not why.
- No traceability. The validator runs, the error fires, the wizard rejects the input — but there's no link between the schema definition and any Feature acceptance criterion. If the rule is silently removed in a refactor, no compliance report will tell you. Zod's schema is a leaf — it's not part of a graph.
- No discoverability. A new user reading the schema sees threshold: z.string().min(1) and has no way to know this is a non-trivial methodological rule with a name and a chapter in a book. It looks like an arbitrary required-field check. The next refactor will treat it like one.
The Style validator with the citation and the tested error message is doing something Zod fundamentally doesn't do: it's encoding the rule with its rationale and gating any future change to the rule on a test that proves the rationale survived.
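For contrast, here is a minimal sketch of what a cited rule can look like as a plain function. The RuleError shape, the validateIntangible name and the message wording are assumptions made for illustration; the package's real validator and its exact error text are not reproduced here.

```typescript
// Hypothetical result shape; the package's real validation result type is not shown in this post.
interface RuleError { rule: string; field: string; message: string }
type Result = { ok: true } | { ok: false; errors: RuleError[] };

interface Statement { pattern: string; [slot: string]: string | undefined }

function validateIntangible(stmt: Statement): Result {
  // The rule only applies to the intangible-investment pattern.
  if (stmt.pattern !== 'intangible-investment') return { ok: true };
  if (!stmt['threshold']) {
    return {
      ok: false,
      errors: [{
        rule: 'Anderson',
        field: 'statement.threshold',
        // The message names the rule, so a regex assertion can pin it down.
        message: 'Anderson rule: an Intangible statement must state a threshold.',
      }],
    };
  }
  return { ok: true };
}
```

A test can then assert /Anderson/.test(message) on the first error, which turns the citation itself into part of the tested contract.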
Part 2 — Zod has no compliance graph.
When you run your Zod tests, you get green/red. When you run compliance --strict against the package, you get the rendered report shown above — every Feature, every AC, every test that verifies it, with cross-package linkages. Zod could plausibly be wrapped in a custom decorator system to produce something similar, but at that point you've rebuilt the AC graph in user-land and you're using Zod purely as the leaf-level shape checker. Which is fine — and indeed the package could be refactored to use Zod under the hood for the leaf-level shape checks while retaining the AC graph and the per-Style validator architecture on top. That refactor hasn't happened because the per-Style validators are pretty thin (~40 lines each excluding the reporter), and the cost of maintaining a Zod adapter layer for them isn't obviously cheaper than the current approach.
Part 3 — Zod's .refine() doesn't compose across files the way separate validators do.
The Industrial validator's three-rule chain (Safety→SIL, safety-function→SIL VM, Regulatory→source) is implemented as three separate if blocks in one function. In Zod, you'd write it as three .refine() calls on the discriminated union. That works, but the refinements live on the schema, which lives in industrial.ts, which means a third party who wants to add a fourth rule has to fork the schema. With the current architecture, the validator function is just a function — a third party can write a wrapping validator that runs the Industrial validator first, then their own additional rules, and compose them at the Style level. Zod doesn't naturally compose this way; refinements are baked into the schema definition.
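The wrapping-validator idea can be sketched in a few lines. Everything here is hypothetical: the Validator type, the compose helper, the one-rule stand-in for the shipped validator and the third party's house rule are assumptions made to show the composition, not the package's API.

```typescript
// Hypothetical types; only the composition technique is the point.
type Spec = Record<string, unknown>;
type Result = { ok: boolean; errors: string[] };
type Validator = (spec: Spec) => Result;

// Runs validators in order and merges their errors; ok only if all of them pass.
function compose(...validators: Validator[]): Validator {
  return (spec) => {
    const results = validators.map((v) => v(spec));
    const errors: string[] = [];
    for (const r of results) errors.push(...r.errors);
    return { ok: results.every((r) => r.ok), errors };
  };
}

// Stand-in for a shipped validator (one rule only, for brevity).
const base: Validator = (spec) =>
  spec['requirementKind'] === 'Safety' && spec['sil'] === undefined
    ? { ok: false, errors: ['Safety requirements must declare a SIL'] }
    : { ok: true, errors: [] };

// A third party's additional, hypothetical house rule.
const houseRule: Validator = (spec) =>
  typeof spec['owner'] === 'string' && spec['owner'] !== ''
    ? { ok: true, errors: [] }
    : { ok: false, errors: ['House rule: every requirement needs an owner'] };

const validateSpec = compose(base, houseRule);
```

The base validator is never edited or forked; the extra rule lives in the third party's own file and is wired in where the Style is assembled.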
The honest conclusion: Zod is the right tool for shape validation in a typical TypeScript project. The Style architecture is doing more than shape validation — it's encoding cited deontic rules with traceable test verification — and that "more" is exactly what makes the package useful for compliance work. If your needs stop at shape validation, use Zod. If they extend to "and prove the rule is enforced and tell me where the rule comes from and link it to a Feature", you need something Style-shaped. The package gives you that.
(There's a separate world of io-ts / runtypes / Yup comparisons that has the same shape: each library is excellent at what it does, none of them give you the AC graph or the cited error contract. The differentiation isn't "we built a better validator", it's "we built a different kind of artefact".)
Property-based testing for Styles
Example-based tests catch bugs you thought of when you wrote the test. Property-based tests catch bugs you didn't.
For Styles, two properties are particularly worth encoding. The package uses fast-check (property.test.ts for the cross-cutting properties; per-Style property tests are recommended additions).
Property 1 — Every fully-filled statement of every declared pattern is accepted.
@Verifies<RequirementCommandsFeature>('requirementNewPromptsEarsPattern')
'fast-check: any well-formed statement is accepted by validateStatement'() {
  fc.assert(fc.property(
    // Draw a pattern from the vocabulary itself, not from a hand-written list.
    fc.constantFrom(...KANBAN_VOCABULARY.statementPatterns),
    fc.string({ minLength: 1, maxLength: 50 }),
    (schema, value) => {
      // Fill every required slot of the drawn pattern with the generated value.
      const stmt: Record<string, string> = { pattern: schema.pattern };
      for (const slot of schema.slots) if (slot.required) stmt[slot.name] = value;
      return KANBAN_VALIDATORS.validateStatement(stmt).ok === true;
    },
  ));
}
What this catches: the day a new pattern is added to KANBAN_VOCABULARY.statementPatterns but the validator's switch statement doesn't have a case for it. Example tests using fullyFilled('service-request') only cover the patterns mentioned by name in the test file. The property test, by reading statementPatterns directly, automatically covers every new pattern from the moment it's added to the vocabulary. No "I forgot to add a test for the new pattern" bug can ever ship.
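The forgotten-switch-case failure is easy to reproduce without any test library. In the sketch below the vocabulary, the slot names for service-request and the validator are hypothetical and much smaller than the real Kanban ones; the point is only that a fixture built by walking the declared slots reaches every pattern.

```typescript
// Hypothetical vocabulary; the real KANBAN_VOCABULARY has more patterns and fields.
interface Slot { name: string; required: boolean }
interface PatternSchema { pattern: string; slots: Slot[] }

const statementPatterns: PatternSchema[] = [
  { pattern: 'service-request', slots: [{ name: 'who', required: true }, { name: 'what', required: true }] },
  {
    pattern: 'intangible-investment',
    slots: [
      { name: 'work', required: true },
      { name: 'benefit', required: true },
      { name: 'threshold', required: true },
    ],
  },
];

// Validator with a per-pattern switch and, deliberately, no case for the second pattern.
function validateStatement(stmt: Record<string, string>): { ok: boolean } {
  switch (stmt['pattern']) {
    case 'service-request':
      return { ok: Boolean(stmt['who']) && Boolean(stmt['what']) };
    default:
      return { ok: false }; // unknown pattern
  }
}

// Builds a valid statement by walking the declared slots, not a hand-written literal.
function fullyFilled(schema: PatternSchema, value = 'x'): Record<string, string> {
  const stmt: Record<string, string> = { pattern: schema.pattern };
  for (const slot of schema.slots) if (slot.required) stmt[slot.name] = value;
  return stmt;
}

// Patterns the validator rejects even when every required slot is filled.
const unhandled = statementPatterns
  .filter((schema) => !validateStatement(fullyFilled(schema)).ok)
  .map((schema) => schema.pattern);
```

Here unhandled names the pattern the switch forgot, with no test ever mentioning that pattern by name.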
Property 2 — renderStatement is deterministic and contains every required slot value.
@Verifies<RequirementCommandsFeature>('requirementListShowsCoveragePerRequirement')
'fast-check: renderStatement is deterministic and slot-preserving'() {
  fc.assert(fc.property(
    fc.constantFrom(...KANBAN_VOCABULARY.statementPatterns),
    fc.dictionary(fc.string({ minLength: 1, maxLength: 30 }), fc.string({ minLength: 1, maxLength: 30 })),
    (schema, slotValues) => {
      // Fill every declared slot, falling back to a recognisable default.
      const stmt: Record<string, string> = { pattern: schema.pattern };
      for (const slot of schema.slots) {
        stmt[slot.name] = slotValues[slot.name] ?? `default-${slot.name}`;
      }
      const r1 = KANBAN_REPORTER.renderStatement(stmt);
      const r2 = KANBAN_REPORTER.renderStatement(stmt);
      // Determinism
      if (r1 !== r2) return false;
      // Slot preservation
      for (const slot of schema.slots) {
        if (!r1.includes(stmt[slot.name])) return false;
      }
      return true;
    },
  ));
}
This covers two properties at once: determinism (calling the reporter twice with the same input produces identical output — catches accidental introduction of a Math.random() or Date.now() or non-stable iteration order) and slot preservation (every slot value the user provided makes it into the rendered output — catches the day someone "simplifies" the template substitution loop and accidentally drops a slot).
The combination of these two properties with the example-based reporter tests gives you near-mutation-testing-quality coverage of the reporter without the runtime cost of mutation testing. The example tests cover the defensive branches and the specific output shape; the properties cover the universals (every pattern, every slot, every input).
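To make the slot-preservation failure concrete, the sketch below pairs a small template renderer with a "simplified" variant that only substitutes the first placeholder. The Pattern shape, the template text and both renderers are hypothetical; the real Kanban reporter is not reproduced here.

```typescript
// Hypothetical pattern shape and renderers, loosely modelled on the description above.
interface Pattern { pattern: string; template: string; slots: string[] }
type Render = (p: Pattern, stmt: Record<string, string>) => string;

const serviceRequest: Pattern = {
  pattern: 'service-request',
  template: 'As {who}, I need {what} by {when}.',
  slots: ['who', 'what', 'when'],
};

// Replaces every {slot} placeholder with the statement's value for that slot.
const render: Render = (p, stmt) =>
  p.template.replace(/\{(\w+)\}/g, (_m: string, name: string) => stmt[name] ?? '');

// The "simplified" version: without the g flag only the first placeholder is replaced.
const renderFirstOnly: Render = (p, stmt) =>
  p.template.replace(/\{(\w+)\}/, (_m: string, name: string) => stmt[name] ?? '');

// Both properties, checked for one pattern and one statement.
function holds(p: Pattern, stmt: Record<string, string>, r: Render): boolean {
  const a = r(p, stmt);
  const b = r(p, stmt);
  if (a !== b) return false; // determinism
  return p.slots.every((s) => {
    const value = stmt[s];
    return value !== undefined && a.includes(value); // slot preservation
  });
}

const stmt = { pattern: 'service-request', who: 'ops', what: 'a runbook', when: 'Friday' };
```

An example test that only inspects the first slot would pass against both renderers; the slot-preservation check does not.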
A third property worth considering — the round-trip law:
'fast-check: validateSpec passes ⟹ renderRequirement produces non-empty output'
This is a coupling property: it asserts that anything the validator accepts, the reporter can render. If the validator accepts a spec and the reporter then crashes on it, you've shipped a Style that's internally inconsistent — the wizard would let users save Requirements that the reporter then refuses to display. Catching that requires generating valid specs (using the validator as the oracle), which is a more involved fast-check setup, but for production Styles where the validator and reporter are in active development, the round-trip property is the one that catches the cross-component drift.
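A dependency-free sketch of the round-trip law follows, with a small seeded generator standing in for fast-check arbitraries. The Spec shape, validateSpec, renderRequirement and the generator are all hypothetical; a real setup would use fast-check and the Style's own validator and reporter.

```typescript
// Hypothetical Style pieces; a tiny seeded generator stands in for fast-check arbitraries.
interface Spec { id: string; kind: 'Standard' | 'FixedDate'; deadline?: string }

const validateSpec = (s: Spec): boolean =>
  s.id.length > 0 && (s.kind !== 'FixedDate' || s.deadline !== undefined);

const renderRequirement = (s: Spec): string =>
  s.kind === 'FixedDate' ? `# [FixedDate] ${s.id} (due ${s.deadline})` : `# ${s.id}`;

// Linear congruential generator: deterministic, so failures are reproducible.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Generates both valid and invalid specs; the validator sorts them out.
function randomSpec(next: () => number): Spec {
  const spec: Spec = {
    id: next() < 0.2 ? '' : `REQ-${Math.floor(next() * 1000)}`,
    kind: next() < 0.5 ? 'Standard' : 'FixedDate',
  };
  if (next() < 0.5) spec.deadline = '2030-01-01';
  return spec;
}

// Round-trip law: validateSpec(s) implies renderRequirement(s) is non-empty.
function roundTripHolds(trials: number, seed: number): boolean {
  const next = lcg(seed);
  let checked = 0;
  for (let i = 0; i < trials; i++) {
    const spec = randomSpec(next);
    if (!validateSpec(spec)) continue; // the validator is the oracle
    checked++;
    if (renderRequirement(spec).trim() === '') return false;
  }
  return checked > 0; // guard against a vacuous pass
}
```

The checked counter matters: a generator that never produces a valid spec would otherwise make the property pass without testing anything.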
The cost of these properties is low (10-30 lines each, runtime in the low-millisecond range for ~100 trials), and they catch shapes of bug example tests can't see. They are the cheapest way to push your coverage report past 98 % and start catching the bugs that 98 % branch coverage doesn't capture (input-distribution bugs, cross-component coupling bugs, "I forgot to handle the new case" bugs).
Why no snapshot tests?
A reader from the JS testing world will ask: why no snapshot tests on the markdown reporter? It seems like the obvious move — render a known-good spec, snapshot the output, fail the build if the snapshot drifts. The package deliberately avoids this. Three reasons.
Reason 1 — Snapshots get rubber-stamped on update.
The dynamic of every snapshot-based test suite I've ever seen: a test fails, the maintainer runs vitest -u (or jest -u), the snapshot is now whatever the code produces, the test passes. The "test" is a no-op — it's verifying that the code produces what the code produces. Snapshots only catch unintentional changes; the moment a maintainer updates them as a chore, they catch nothing. This is the "snapshots-as-rubber-stamp" problem and it's structural, not a discipline issue.
The Kanban ticket-card test asserts expect(md).toContain('# [FixedDate] REQ-FD — Regulator deadline') — that's a contract about the H1 format that survives updates. A snapshot would only assert that the output is unchanged, which is a weaker contract: it doesn't say anything about what the output should look like, only that it shouldn't change.
Reason 2 — Snapshots couple to formatting, not contract.
If I add a blank line to the markdown reporter, every snapshot breaks. Every. Snapshot. The maintainer regenerates them all in one keystroke. The blank-line change wasn't a breaking change to anyone consuming the markdown — it was a formatting tweak — but it forces a noisy test update that desensitises maintainers to snapshot diffs. Two months later, a real breaking change happens (a slot value drops out), the maintainer runs the update flag from muscle memory because the diff "looks like just formatting", and the regression ships.
The contract assertions don't have this problem. expect(md).toContain('REQ-FD') doesn't care about whitespace, blank lines, or section ordering. It cares about the load-bearing fact: REQ-FD must appear in the output. Add a blank line, the test still passes; drop the id from the output, the test fails with a specific message.
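The difference between the two kinds of check fits in a few lines. The three markdown strings below are hypothetical renderer outputs invented for the comparison, and snapshotMatches is a stand-in for byte-for-byte snapshot comparison rather than any test runner's actual implementation.

```typescript
// Hypothetical renderer output before and after a harmless formatting tweak.
const before = '# [FixedDate] REQ-FD\nDeadline: 2030-01-01\n';
const after = '# [FixedDate] REQ-FD\n\nDeadline: 2030-01-01\n';
// A real regression: the id has dropped out of the heading.
const regressed = '# [FixedDate]\n\nDeadline: 2030-01-01\n';

// Stand-in for a snapshot check: any byte difference is a failure.
const snapshotMatches = (stored: string, actual: string): boolean => stored === actual;

// Contract check: the id appears and the H1 starts with the class of service.
const contractHolds = (md: string): boolean =>
  md.includes('REQ-FD') && /^# \[FixedDate\]/m.test(md);

const verdicts = {
  snapshotOnTweak: snapshotMatches(before, after),           // false: noisy failure
  snapshotOnRegression: snapshotMatches(before, regressed),  // false: same signal as the tweak
  contractOnTweak: contractHolds(after),                     // true: formatting is free
  contractOnRegression: contractHolds(regressed),            // false: the real break is caught
};
```

The snapshot gives the same red signal for the tweak and the regression, which is exactly what makes the update flag feel safe to run.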
Reason 3 — Snapshots don't anchor to the compliance graph.
A snapshot test passes or fails. It doesn't say which AC of which Feature it verifies. Even if you wrap the snapshot in a @Verifies decorator, the compliance report has to render the test name — and "renders the FixedDate ticket card matches snapshot" is a less informative AC verification than "renderMarkdown renders ticket card with fixed-date deadline and penalty prominently". Snapshots produce test names that describe the mechanism of verification, not the content of what's verified.
Where snapshots would be defensible: if the reporter output were a published artefact (an XML schema, a JSON wire format, a regulatory submission file) where any byte change is a contract change, snapshots would make sense — every byte is the contract. The Kanban ticket card markdown isn't that. It's human-readable output with formatting freedom. Contract assertions over toContain, toMatch, and structural regex are the right fit.
This isn't a universal rule — snapshots are excellent for some test surfaces. They're wrong for Style reporters specifically, for the three reasons above. Worth saying explicitly because the absence is otherwise unexplained.
On size, scope, and what this post doesn't cover
A reader who has gotten this far is reasonably thinking: "that's a lot of test code for a 500-line file." It is. Three honest acknowledgments.
The size objection is real. Two test files at 500+ lines each, plus property tests, plus the smoke test, plus the cross-surface coupling tests. Total surface for both reference Styles: ~1200 lines of test for ~1000 lines of production code. A 1.2:1 ratio.
That ratio is normal for compiler-flavoured code and below average for compliance-critical code. The TypeScript compiler's test suite is roughly 10:1 against the compiler proper. The Rust compiler's is similar. The reason is that compilers have many small branches (every operator, every keyword, every statement form) and each branch is its own potential unsoundness — same shape as a Style's discipline rules. A 1.2:1 ratio for a Style is not over-investment; it's barely meeting the bar.
If 1.2:1 feels like too much for your project, the question to ask is: what's the cost of a malformed Requirement in your domain? If the answer is "we'll catch it in code review", you don't need a Style — use plain TypeScript and skip the discipline. If the answer is "an audit finding, a regulatory action, a safety incident, or a compliance fine", 1.2:1 is cheap insurance.
What this post deliberately doesn't cover. Three things worth naming explicitly so adopters know to look elsewhere:
- fitCriterionAdapters — the live-data integration point. Each Style can ship adapters that wire fitCriteria to real metrics platforms (Datadog query, SonarQube quality gate, Grafana SLO, GitHub Actions check). Testing those adapters is its own discipline (network mocking, recorded fixtures, contract tests against the upstream platform's API) and isn't covered here. The reference Styles ship with fitCriterionAdapters: [] — adapters are downstream concerns. Look at the adapter test patterns in test/unit/property.test.ts for the contract shape.
- Brownfield migration. What if you have an existing requirements codebase using the natural pattern as a free-text fallback? The Style's validators warn but don't reject; tests for that warning are different from tests for hard-fail validation rules. The HOW-TO-CREATE-YOUR-STYLE.md pitfall PF3 covers the warning-visibility discipline. A full brownfield-migration testing playbook is its own post; this one is for greenfield Style authoring.
- Reporter performance. Validators run hundreds of times per CI invocation (once per Requirement file × once per Feature × once per --strict walk). For repos with thousands of Requirements, this matters. The current implementation is microsecond-scale per spec on plain objects, so the real cost is dominated by file I/O. No dedicated performance tests exist; the smoke tests indirectly assert "doesn't take pathologically long" by virtue of running in CI under a time budget. If your repo grows past 10k Requirements, you'll want explicit benchmarks — also out of scope here.
The post focuses on the per-Style author's discipline: vocabulary, validators, templates, reporter, testing each, anchoring to ACs, the dog-fooding recursion. That's the load-bearing knowledge for shipping a Style that doesn't lie. The three areas above are downstream concerns where the discipline is similar but the test patterns differ.
CI configuration — the gates spelled out
The post has been waving in the direction of "the CI gates" without showing the configuration. Here's what actually wires the discipline together.
vitest.config.ts — per-file thresholds. Coverage is enforced per file, not globally. A global "98 % coverage" gate is misleading because a single 100 %-covered file (e.g. a constants file) can mask multiple sub-50 %-covered files (the actually-interesting code). Per-file thresholds prevent this:
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      reporter: ['text', 'html', 'lcov'],
      // Vitest 1.0+ expects every threshold, global and per-glob, under `thresholds`.
      thresholds: {
        // Check the floor below against each file, not against the aggregate:
        perFile: true,
        // Global floor: anything not explicitly listed must clear this.
        lines: 90, branches: 90, functions: 90, statements: 90,
        // Stricter per-file floors for the load-bearing surfaces:
        'src/styles/default.ts': { lines: 98, branches: 98, functions: 98, statements: 98 },
        'src/styles/industrial.ts': { lines: 98, branches: 98, functions: 98, statements: 98 },
        'src/styles/lean.ts': { lines: 98, branches: 98, functions: 98, statements: 98 },
        'src/styles/agile.ts': { lines: 98, branches: 98, functions: 98, statements: 98 },
        'src/styles/kanban.ts': { lines: 98, branches: 98, functions: 98, statements: 98 },
        'src/decorators.ts': { lines: 100, branches: 100, functions: 100, statements: 100 },
      },
    },
  },
});
Two things to notice:
- Branches and lines are different numbers. A one-line conditional such as value ?? fallback, or an if with no else, is a single line but 2 branches. 100 % line coverage with 50 % branch coverage means you executed every line but only ever took one side of those conditionals. For validators, branches is what matters — every if arm is a rule. Configuring only lines: 98 lets a Style ship with half its rules untested.
- Decorators are at 100 %. They're the load-bearing infrastructure that the entire compliance graph depends on. Anything below 100 % means a code path in the decorator system is unverified, and that means the compliance graph itself might silently mis-anchor. The cost of getting decorators to 100 % is one extra negative test per branch; the cost of not getting them to 100 % is the entire downstream guarantee.
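The masking effect behind the per-file argument is plain arithmetic. The sketch below uses two hypothetical files with invented counts: a large, fully covered constants file and a small, badly covered validator file. The helper names are illustrative and are not part of Vitest.

```typescript
// Hypothetical coverage numbers, chosen to make the masking effect visible.
interface FileCoverage { file: string; covered: number; total: number }

const pct = (covered: number, total: number): number =>
  total === 0 ? 100 : (100 * covered) / total;

// Aggregate check: one percentage over the summed counts.
function passesGlobal(files: FileCoverage[], floor: number): boolean {
  const covered = files.reduce((n, f) => n + f.covered, 0);
  const total = files.reduce((n, f) => n + f.total, 0);
  return pct(covered, total) >= floor;
}

// Per-file check: names every file below the floor.
function failingFiles(files: FileCoverage[], floor: number): string[] {
  return files.filter((f) => pct(f.covered, f.total) < floor).map((f) => f.file);
}

const run: FileCoverage[] = [
  { file: 'src/constants.ts', covered: 900, total: 900 },  // 100 %
  { file: 'src/validators.ts', covered: 45, total: 100 },  // 45 %
];

const aggregatePasses = passesGlobal(run, 90); // (900 + 45) / 1000 = 94.5 %
const offenders = failingFiles(run, 90);
```

The aggregate clears a 90 % floor at 94.5 % while the file that actually holds the rules sits at 45 %; only the per-file check names it.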
The compliance --strict integration. Vitest verifies the tests. compliance --strict verifies the graph. They're different gates that catch different failures:
// package.json
{
  "scripts": {
    "test:unit": "vitest run --coverage",
    "test:compliance": "requirements compliance --strict",
    "test:all": "npm run test:unit && npm run test:compliance"
  }
}
Run them in order. vitest first (fast feedback on broken tests); compliance --strict second (slower, requires all tests to be discovered first). CI runs test:all. Both must pass.
What a failing CI looks like in practice. Three failure modes a maintainer will hit:
- Coverage threshold breach — vitest exits with an error along the lines of ERROR: Coverage for branches (97.4%) does not meet "src/styles/kanban.ts" threshold (98%). Fix: find the uncovered branch in the coverage HTML report (coverage/index.html) and add the missing test.
- Orphan AC — compliance --strict prints the orphan-AC report shown above, exits non-zero. Fix: write a test anchoring to the un-verified AC, or remove the AC from the Feature if it's no longer in scope.
- Orphan test — compliance --strict prints the orphan-test report, exits non-zero. Fix: add @FeatureTest and @Verifies to the test class/method, or move the test to a documented test/scratch/ directory excluded from compliance discovery.
None of this is exotic. The point is the combination: vitest enforces coverage, compliance --strict enforces traceability, and the per-file thresholds make the global numbers meaningful. Skip any one and the gate weakens to the point of being decorative.
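A small worked example shows why the per-file thresholds are what make the global number meaningful. The file names and branch counts below are made up for illustration; only the arithmetic is the point.

```typescript
// Made-up branch counts, for illustration only.
interface FileCoverage {
  file: string;
  covered: number;
  total: number;
}

const files: FileCoverage[] = [
  { file: "src/big-module.ts", covered: 400, total: 400 },
  { file: "src/small-validator.ts", covered: 12, total: 20 },
];

const pct = (covered: number, total: number): number => (covered / total) * 100;

// Global figure: sum of covered branches over sum of all branches.
const globalPct = pct(
  files.reduce((sum, f) => sum + f.covered, 0),
  files.reduce((sum, f) => sum + f.total, 0),
);

const THRESHOLD = 98;

// A global-only gate looks at the average.
const globalGatePasses = globalPct >= THRESHOLD;

// A per-file gate looks at each file on its own.
const filesBelowThreshold = files
  .filter((f) => pct(f.covered, f.total) < THRESHOLD)
  .map((f) => f.file);
```

Under these made-up numbers the global figure is 412 / 420, about 98.10 %, so a global-only gate passes while src/small-validator.ts sits at 60 %. The per-file check is the one that names it.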
What I changed in my own habits
A few practices I now apply to every Style I write — including third-party Styles I'm reviewing as PRs:
1. Vocabulary tests first. Three lines of FSM well-formedness, three lines of risk-matrix membership, three lines of "every pattern has slots". Cheapest tests in the suite, biggest stake when missing. Write them before any validator test.
2. The fixture reads the vocabulary, not a literal. The Kanban tests have a fullyFilled(pattern) helper that builds a valid statement by walking the pattern's declared slots. New slot, all tests automatically pick it up.
3. Triplet pattern: wrong / missing / accepted. Every discipline rule gets three test methods (or three sub-cases): rejected on the wrong value, rejected on the missing field, accepted on the valid baseline. Missing field catches optional-chaining typos; positive control catches false-positive regressions.
4. Error messages are tested contracts. /Anderson/.test(e.message) makes the user-visible error message a tested string. Every load-bearing validator error should have at least one regex assertion on its message that names the rule.
5. Reporter tests with deliberately malformed fixtures. Every ?? and ? in the reporter is a defensive branch. One fixture entry per branch, with a comment naming the branch. The Kanban ticket-card test is the reference shape.
6. Cross-surface tests for templates × vocabulary. Loop over templates, assert every string reference resolves into the vocabulary. Closes the cross-surface coupling bug for life.
7. Cross-Style smoke tests. Loop over ALL_STYLES, assert the universals. Catches new Styles that don't conform to the interface contract.
8. Bundle uses referential equality. expect(KanbanStyle.vocabulary).toBe(KANBAN_VOCABULARY) (toBe, not toEqual). Catches the silently-bad refactor that copies the vocabulary into the bundle instead of referencing it.
9. Test file ends with void TestClassName;. Without it, the bundler tree-shakes the class, decorators don't register, tests silently don't run, CI is green and lying.
10. Coverage gate at 98 % per file, branches included. Line coverage alone is misleading; branch coverage is what matters for validators. Per-file thresholds prevent global averages from masking under-covered files.
11. Property test per Style: at least one fc.constantFrom(...statementPatterns). Catches "I added a pattern and forgot to test it" automatically. Cheapest insurance against future-pattern bugs.
12. Test names are rule statements, not test descriptions. 'Safety requirement with NonSIL risk is rejected, SIL2 is accepted' is a sentence about IEC 61511. 'rejects bad input' is a test description. Write the rule.
13. Both gates run in CI. vitest --coverage AND compliance --strict. They catch different failure modes; either gate alone is incomplete.
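Habits 2 to 4 compose naturally, and a compact sketch may make that easier to see. Everything below is a toy stand-in: TOY_VOCABULARY, slotsOf, check, the two patterns and this validateStatement are hypothetical, not the shipped Kanban code, and the real fullyFilled helper may well differ. Plain throw-based checks replace the test runner so the block stays self-contained.

```typescript
// Toy stand-ins for illustration only; not the shipped KanbanStyle.
type Statement = { pattern: string; slots: Record<string, string | undefined> };

const TOY_VOCABULARY: { statementPatterns: Record<string, { slots: readonly string[] }> } = {
  statementPatterns: {
    "user-story": { slots: ["role", "goal", "benefit"] },
    "job-story": { slots: ["situation", "motivation", "outcome"] },
  },
};

function slotsOf(pattern: string): readonly string[] {
  const declared = TOY_VOCABULARY.statementPatterns[pattern];
  if (!declared) throw new Error(`Unknown pattern "${pattern}"`);
  return declared.slots;
}

// Habit 2: the fixture walks the declared slots instead of hard-coding them.
function fullyFilled(pattern: string): Statement {
  const slots: Record<string, string | undefined> = {};
  for (const slot of slotsOf(pattern)) slots[slot] = `sample ${slot}`;
  return { pattern, slots };
}

// Toy rule: every declared slot must be present and non-blank.
function validateStatement(s: Statement): Error[] {
  return slotsOf(s.pattern)
    .filter((slot) => !s.slots[slot]?.trim())
    .map((slot) => new Error(`Slot rule: "${slot}" is required by "${s.pattern}"`));
}

function check(condition: boolean, rule: string): void {
  if (!condition) throw new Error(`Rule not upheld: ${rule}`);
}

// Habit 3: wrong / missing / accepted, for every pattern and every slot.
let checkedSlots = 0;
for (const pattern of Object.keys(TOY_VOCABULARY.statementPatterns)) {
  const baseline = fullyFilled(pattern);
  check(validateStatement(baseline).length === 0, `${pattern}: fully filled statement is accepted`);

  for (const slot of slotsOf(pattern)) {
    const wrong = validateStatement({ ...baseline, slots: { ...baseline.slots, [slot]: "   " } });
    const missing = validateStatement({ ...baseline, slots: { ...baseline.slots, [slot]: undefined } });

    // Habit 4: the message is part of the contract, so assert on it by regex.
    check(wrong.some((e) => /Slot rule/.test(e.message)), `${pattern}: blank ${slot} is rejected`);
    check(missing.some((e) => /Slot rule/.test(e.message)), `${pattern}: missing ${slot} is rejected`);
    checkedSlots += 1;
  }
}
```

Adding a pattern or a slot to the toy vocabulary extends the loop with no change to the test code, which is the property habit 2 is after.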
Where this lives
The full how-to with the 30-item publication checklist is in the package docs:
- HOW-TO-TEST-YOUR-STYLE.md: the discipline document, ~5000 words, every test pattern with code and rationale.
- HOW-TO-CREATE-YOUR-STYLE.md: the companion doc on writing a Style in the first place, with a worked LegalStyle.
- The reference test suites: kanban-style.test.ts (44 methods, 98 % coverage), industrial-style.test.ts (9 dense methods, 98 % coverage), styles-smoke.test.ts (cross-Style universals).
- The cross-cutting property tests: property.test.ts.
If you're writing your own Style — a LegalStyle for deontic engineering, a MedicalStyle for IEC 62304, an AutomotiveStyle for ISO 26262, a FinancialStyle for Basel / MiFID, or anything else — the how-to is the operational manual. This post is the why.
Closing
A RequirementStyle isn't a feature. It's a tiny compiler front-end whose output is the Requirements file someone will hand to an auditor. The bar that makes sense for a feature — happy path covered, a couple of error cases, ship it — is exactly the bar that produces unsoundnesses in a Style validator: rules that exist in the vocabulary but aren't enforced, error messages that drift, defensive ?? branches that silently swallow malformed input, templates that reference renamed patterns, new Styles that don't conform to the interface, snapshot tests that get rubber-stamped on every formatting change.
The discipline that fixes it is borrowed wholesale from compiler engineering: write the vocabulary tests first because they're cheap and load-bearing; split your validator tests by validateStatement vs validateSpec because the architectural split is real; exhaust every discipline rule with the wrong / missing / accepted triplet; fix the error message as a tested contract; exercise the reporter with deliberately malformed input; close the cross-surface template × vocabulary coupling with one loop-test; smoke-test all Styles uniformly via the registry; fast-check the universals; configure per-file coverage thresholds with branches included; run both vitest --coverage and compliance --strict in CI; and never let a describe/it slip in because the compliance graph is the spec.
The three failure modes I walked through (the Anderson rule on Intangible class of service, the safety-function → SIL → standards chain, and the flow-driven sources rule for Standard CoS) are real ones. I would have shipped all three if I'd kept testing the way I used to. The eleven lines of test for the Anderson rule, the three-rule chain for the Industrial example, and the triplet for flow-driven sources are the difference between a Requirements file you can hand to an auditor and a Requirements file that lies on your behalf.
The reporter tests with the deliberately malformed fixtures are what stops the rendered markdown from quietly emitting [object Object] or (none) or ? where a thoughtful default should be. The cross-surface templates × vocabulary test closes the coupling bug that no per-surface test catches. The cross-Style smoke contract makes new Styles automatically covered the moment they enter the registry. The compliance graph is what stops the test suite from drifting away from the Feature ACs it claims to verify. The dog-fooding recursion is what stops the package's own specifications from drifting away from the package's own implementation. The property-based tests are what catch the bugs you didn't think of — particularly the new patterns added six months from now by someone who isn't reading every test file before committing.
The choice not to use Zod isn't because Zod is bad — it's because cited validation, with traceable AC anchoring and tested error message contracts, is a different shape of artefact from shape validation. Both have their place. The Style architecture sits in the place where compliance work actually lives: rules whose rationale is part of the rule. The choice not to use snapshots isn't because snapshots are bad — it's because rubber-stamped snapshots are decorative, and the Style reporter's contract is structural enough that toContain and toMatch express it more precisely than a frozen byte-string.
That's what testing a Style is for. It's why the test suite for a Style is bigger than the Style itself, why every test name reads like a sentence from a domain textbook, why the compliance report renders test names next to AC names, why the per-file coverage gate is at 98 % with branches included, and why the package would rather have you write a third-party Style with the full discipline than accept a contribution that uses describe/it.
A Style is a tiny compiler. Test it like one.
Companion: HOW-TO-TEST-YOUR-STYLE.md. Source for the package: packages/requirements. Both Styles ship with the box; the tests are read-aloud-able as the specs of their respective domains. Comparisons to Zod, io-ts and runtypes appreciated as PRs against the docs — adversarial benchmarks especially welcome.