Part 11 — Testing strategy: 22 ACs across 5 Features
The previous nine articles walked the engine and the example. This article walks the test apparatus that ties the example back to its acceptance criteria. Five Feature classes declare 22 acceptance-criteria methods, twenty test files in packages/sourcegen-example/test/unit/ bind every method via @Verifies, and a 208-line industrial-sandbox.ts harness materialises the canonical input on a temp filesystem per test. The whole apparatus dog-foods @frenchexdev/requirements-core — the same Feature/Requirement framework described in cmf-design article 13 — and the same one the closing-the-loop series describes for hexagonal traceability.
The argument of this article is operational. Given a multi-stage codegen pipeline with the engine invariants articulated in Parts 03–06, given the ten generators walked in Parts 07–10, what does it take to verify the result is correct? Four properties: idempotence (a second run is a no-op), determinism (byte-identical output across runs), hand-edit detection (verify-mode catches drift), convergence (fixpoint terminates within maxIterations). All four are asserted, all four are tied to bound @Feature/@Requirement artefacts, all four can be reproduced by a consumer reading the test files. After this article the consumer holds the operational story end-to-end.
The five Features and twenty-two acceptance criteria
The five Feature classes live in requirements/features/, one file per family:
| Feature | id | Bindings count |
|---|---|---|
| IndustrialWeavingFeature | SGE-INDUSTRIAL-WEAVING | 4 |
| EntityScaffoldingFeature | SGE-ENTITY-SCAFFOLDING | 4 |
| FsmTypedDispatchFeature | SGE-FSM-TYPED-DISPATCH | 4 |
| ModuleWiringFeature | SGE-MODULE-WIRING | 4 |
| DeterminismAndVerifiabilityFeature | SGE-DETERMINISM-VERIFIABILITY | 6 |
Four Features with four ACs each, plus one with six, give 22 acceptance criteria in total (4 × 4 + 6 = 22). Each Feature class extends Feature from @frenchexdev/requirements-core and is decorated with @Satisfies(<Requirement>) to wire it back to its anchor Requirement. The Feature for entity scaffolding (entity-scaffolding.feature.ts) is representative:
@Satisfies(ReqEntityScaffoldingRequirement)
export abstract class EntityScaffoldingFeature extends Feature {
  readonly id = 'SGE-ENTITY-SCAFFOLDING';
  readonly title = '@Entity → typed Repository + DTO + Mapper + Validator';
  readonly priority = Priority.High;
  readonly enabled = true;

  abstract repositoryShapeMatchesEntityFields(): ACResult;
  abstract dtoUsesDistributiveOmitAndBranded(): ACResult;
  abstract mapperRoundTripsEntity(): ACResult;
  abstract validatorEnforcesPkAndUnique(): ACResult;
}
Three observations.
The first is the abstract keyword on every AC method. The Feature class does not implement the ACs; it declares them. The implementation is in the test file, where each AC is bound via @Verifies to a real test method. The Feature class is a contract — here are the four behaviours this Feature claims to deliver — and the test file is the proof.
The second is the return type ACResult. Every AC method returns an object of shape { ok: boolean; reason?: string }. This is the canonical assertion type used by the requirements framework; a consumer can run a Feature programmatically (calling each AC in turn) and aggregate the results. The mechanic is described in closing-the-loop article 06. For purposes of this series, the relevant property is that every AC has a typed signature and a test file binds it.
The third is the enabled flag. A Feature can be temporarily disabled by setting enabled = false, in which case its ACs are not required. The example sets enabled = true everywhere (every Feature is in scope); the flag exists for the case where a Feature is being incubated and not yet ready to gate CI. Discovery of disabled Features is part of the dog-fooding story; for this series the flag is on for all five.
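To make the second and third observations concrete, here is a minimal sketch of how a Feature could be run programmatically. It is not the requirements-core API: FeatureLike, FeatureReport, and runFeature are hypothetical names introduced for illustration. Only the { ok, reason? } shape and the enabled flag come from the text above.

```typescript
// Shape of an AC result as described in the article.
type ACResult = { ok: boolean; reason?: string };

// Hypothetical stand-in for a concrete Feature: an id, the enabled flag,
// and a map of AC name -> function returning an ACResult.
interface FeatureLike {
  readonly id: string;
  readonly enabled: boolean;
  readonly acs: Record<string, () => ACResult>;
}

// Hypothetical aggregate report for one Feature.
interface FeatureReport {
  featureId: string;
  skipped: boolean; // true when the Feature is disabled
  passed: string[];
  failed: { ac: string; reason: string }[];
}

// Call every AC in turn and aggregate the results.
// A disabled Feature is reported as skipped and its ACs are not run.
function runFeature(feature: FeatureLike): FeatureReport {
  const report: FeatureReport = {
    featureId: feature.id,
    skipped: !feature.enabled,
    passed: [],
    failed: [],
  };
  if (!feature.enabled) return report;
  for (const [ac, run] of Object.entries(feature.acs)) {
    const result = run();
    if (result.ok) report.passed.push(ac);
    else report.failed.push({ ac, reason: result.reason ?? "no reason given" });
  }
  return report;
}

// Demo data: the failure reason is invented for the example.
const demoFeature: FeatureLike = {
  id: "SGE-ENTITY-SCAFFOLDING",
  enabled: true,
  acs: {
    mapperRoundTripsEntity: () => ({ ok: true }),
    validatorEnforcesPkAndUnique: () => ({ ok: false, reason: "unique constraint not checked" }),
  },
};

const demoReport = runFeature(demoFeature);
console.log(demoReport);
```

The report names each failing AC alongside its reason, which is the property that lets a failing run point at a specific claim rather than at a whole Feature.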
The @Verifies binding
A test file binds an AC to a test method. The pattern from the requirements framework, used in every test file in test/unit/:
import { Verifies, FeatureTest } from '@frenchexdev/requirements-core';
import { EntityScaffoldingFeature } from '../../requirements/features/entity-scaffolding.feature.js';
@FeatureTest(EntityScaffoldingFeature)
export class EntityScaffoldingTests {
  @Verifies(EntityScaffoldingFeature, 'repositoryShapeMatchesEntityFields')
  testRepositoryShape() {
    /* ... assertions ... */
  }

  /* three more @Verifies methods, one per AC */
}
The @Verifies(<Feature>, <method>) decorator binds the test method to a specific AC. The framework can then introspect the test file at compile time (or runtime, with a discovery pass) and compute which ACs are bound and which are unbound. An unbound AC is a coverage hole; the framework's compliance tooling surfaces it.
The dog-fooding pays off here. The example is both a consumer of the engine and a consumer of the requirements framework — every claim the example makes about what it does is bound to a typed AC, every AC is bound to a test method. A consumer reading the example can trace from "the example says it converges in three iterations" through IndustrialWeavingFeature.fixpointReachedInThreeIterations() to a specific test method that asserts outcome.iterations === 3.
The framework is described in cmf-design article 13 and feature-tracking. For purposes of this series, the relevant fact is that it exists and that the example uses it consistently.
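The bound-versus-unbound computation can be sketched without the framework. The registry below is an assumption made for illustration, not the requirements-core implementation: recordBinding stands in for what a @Verifies decorator might record, and it is written as a plain function call so the sketch does not depend on decorator compiler settings. The test name testMapperRoundTrip is hypothetical.

```typescript
// Hypothetical binding registry: featureId -> (AC name -> bound test names).
const bindings = new Map<string, Map<string, string[]>>();

// Record that a test method verifies a given AC of a given Feature.
function recordBinding(featureId: string, ac: string, testName: string): void {
  const perFeature = bindings.get(featureId) ?? new Map<string, string[]>();
  const tests = perFeature.get(ac) ?? [];
  tests.push(testName);
  perFeature.set(ac, tests);
  bindings.set(featureId, perFeature);
}

// An AC that is declared on the Feature but has no bound test is a coverage hole.
function unboundAcs(featureId: string, declaredAcs: readonly string[]): string[] {
  const perFeature = bindings.get(featureId);
  return declaredAcs.filter((ac) => !perFeature?.has(ac)).sort();
}

// The four ACs declared by EntityScaffoldingFeature in the article.
const declaredAcs = [
  "repositoryShapeMatchesEntityFields",
  "dtoUsesDistributiveOmitAndBranded",
  "mapperRoundTripsEntity",
  "validatorEnforcesPkAndUnique",
];

// Bind only two of the four, leaving two holes.
recordBinding("SGE-ENTITY-SCAFFOLDING", "repositoryShapeMatchesEntityFields", "testRepositoryShape");
recordBinding("SGE-ENTITY-SCAFFOLDING", "mapperRoundTripsEntity", "testMapperRoundTrip");

const holes = unboundAcs("SGE-ENTITY-SCAFFOLDING", declaredAcs);
console.log(holes);
```

Here the two ACs without a binding are reported by name, which is the information a compliance report needs in order to flag a coverage hole.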
The industrial sandbox
The 208-line test/helpers/industrial-sandbox.ts is the test harness that every test file uses to materialise the canonical input on a temporary filesystem. Three responsibilities:
The first is temp directory management. The harness uses fs.mkdtempSync(path.join(os.tmpdir(), 'sge-industrial-')) to create a fresh root per test. Every test gets an isolated filesystem; tests cannot interfere with each other; cleanup happens in the cleanup() method (typically called from a Vitest afterEach). The temp-directory pattern is standard but the discipline of one fresh root per test is what makes the pipeline tests deterministic — a failed test cannot leave stale files for the next test to trip over.
The second is canonical input materialisation. The harness defines DEFAULT_INPUT_FILES (industrial-sandbox.ts:75-139) — the same inputs walked in Part 02, templated as TypeScript strings. When a test calls makeIndustrialSandbox(), the harness writes those strings to the temp filesystem, reproducing the canonical pipeline input. Two side files round out the materialisation: a templated attributes.ts (the eight decorator declarations) and a templated runtime/types.ts (the Repository/Mapper/Validator interfaces and the EntityId<K> template-literal type). The harness uses templates rather than referring to packages/sourcegen-example/src/ directly to keep tests independent of any specific filesystem layout — the harness is self-contained.
The third is generator wiring. The harness exports DEFAULT_GENERATORS (the ten production generator instances in lexicographic order) and constructs a SourceGenConfig that points at the temp filesystem. A test that wants to override the generator list (for instance, to test what happens with only stages 0, 20, and 30) passes generators: [...] to makeIndustrialSandbox. The default list is the production list; deviations are explicit.
The harness also exports readGenerated(outDir), a helper that returns the contents of every generated file as a Record<filename, contents>. Tests use this to assert disk state directly. Two tests can call readGenerated on the same outDir and compare hash sets — that is how fiveRunsByteIdentical is implemented.
The pattern is mature and standard for codegen testing. The 208-line harness encodes a meaningful amount of "right defaults" and lets each individual test focus on its specific assertion.
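A much smaller cousin of the harness shows the temp-root and materialisation mechanics. This is a sketch under assumptions, not the contents of industrial-sandbox.ts: makeMiniSandbox and MiniSandbox are hypothetical names, the "generated" subdirectory is an arbitrary choice, and this readGenerated handles only a flat directory. Generator wiring is left out entirely.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical reduced sandbox: temp root, output directory, cleanup hook.
interface MiniSandbox {
  root: string;
  outDir: string;
  cleanup(): void;
}

// Materialise templated input files under a fresh temp root.
function makeMiniSandbox(inputFiles: Record<string, string>): MiniSandbox {
  // One fresh root per call, so tests never share files.
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "sge-industrial-"));
  for (const [relPath, contents] of Object.entries(inputFiles)) {
    const abs = path.join(root, relPath);
    fs.mkdirSync(path.dirname(abs), { recursive: true });
    fs.writeFileSync(abs, contents, "utf8");
  }
  const outDir = path.join(root, "generated"); // assumed location for this sketch
  fs.mkdirSync(outDir, { recursive: true });
  return {
    root,
    outDir,
    cleanup: () => fs.rmSync(root, { recursive: true, force: true }),
  };
}

// Read every file directly under outDir into a filename -> contents record.
// Flat directory only; names are visited in sorted order.
function readGenerated(outDir: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const name of fs.readdirSync(outDir).sort()) {
    const abs = path.join(outDir, name);
    if (fs.statSync(abs).isFile()) result[name] = fs.readFileSync(abs, "utf8");
  }
  return result;
}

// Usage: materialise one input, pretend a generator wrote one file, read it back.
const sandbox = makeMiniSandbox({ "src/order.ts": "export class Order {}\n" });
fs.writeFileSync(path.join(sandbox.outDir, "order.repository.ts"), "// generated\n", "utf8");
const generated = readGenerated(sandbox.outDir);
sandbox.cleanup(); // in a Vitest suite this would typically live in afterEach
console.log(Object.keys(generated));
```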
The four asserted properties
All twenty-two ACs aggregate, at the engine level, into four properties. Each property is asserted by at least one specific test method.
Property 1 — Idempotence
A second run on identical inputs is a no-op at the disk level. Asserted by runFixpoint(cfg) followed by a second runFixpoint(cfg) on the same outDir. The second run's outcome.iterations should be 1: the runner exits after the first iteration that produces no new emissions, so no further confirming iteration is needed. The disk state should be unchanged at the bit level, verifiable by hashing every generated file before and after.
A subtle property: idempotence is not the same as byte-identity across runs. Idempotence says "running twice does not change the disk after the first run"; byte-identity says "running twice produces the same file contents". The two coincide for a deterministic engine, and the example's tests assert both — but they are conceptually distinct.
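The hash-before-and-after check can be illustrated with an in-memory stand-in for outDir. The sketch assumes a write-if-changed policy; applyEmissions and hashDisk are hypothetical helpers and the file contents are invented. It models only the disk-level no-op, not the engine's runFixpoint.

```typescript
import { createHash } from "node:crypto";

// In-memory stand-in for outDir: filename -> contents.
type Disk = Map<string, string>;

// Write only files whose contents differ; return how many writes happened.
function applyEmissions(disk: Disk, emissions: Record<string, string>): number {
  let writes = 0;
  for (const [name, contents] of Object.entries(emissions)) {
    if (disk.get(name) !== contents) {
      disk.set(name, contents);
      writes++;
    }
  }
  return writes;
}

// Hash the whole disk state, independent of insertion order.
function hashDisk(disk: Disk): string {
  const hash = createHash("sha256");
  for (const name of Array.from(disk.keys()).sort()) {
    hash.update(name).update("\0").update(disk.get(name) ?? "").update("\0");
  }
  return hash.digest("hex");
}

const emissions = {
  "order.dto.ts": "export type OrderDto = { id: string };\n",
  "order.mapper.ts": "export const toDto = (x: { id: string }) => x;\n",
};

const disk: Disk = new Map();
const firstWrites = applyEmissions(disk, emissions);
const hashAfterFirst = hashDisk(disk);
const secondWrites = applyEmissions(disk, emissions);
const hashAfterSecond = hashDisk(disk);

console.log({ firstWrites, secondWrites, unchanged: hashAfterFirst === hashAfterSecond });
```

The first application writes both files; the second writes none and leaves the hash untouched, which is idempotence in the sense used above.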
Property 2 — Determinism (byte-identity across runs)
Five runs on identical inputs produce sha256-identical files. Asserted by fiveRunsByteIdentical() — the test runs the pipeline five times against fresh sandboxes, hashes every output, asserts all five hash sets are equal. The "five" is intentional (Part 05 walks the rationale): two runs could in principle ping-pong between two attractor states; five runs on a deliberately complex pipeline is strong evidence of true determinism.
The test fails immediately if any generator inadvertently introduces non-determinism: a Date.now(), an unsorted iteration over a Map, a hash function that depends on insertion order. The test is a regression boundary; a future change that breaks determinism fails the test before it can ship.
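The Map hazard is easy to reproduce in isolation. The sketch below is not taken from the example's generators; emitUnsorted and emitSorted are hypothetical emitters, and the rotating discovery order is a simulated stand-in for whatever varies between real runs. It compares the hash sets of five simulated runs.

```typescript
import { createHash } from "node:crypto";

const sha256 = (text: string): string => createHash("sha256").update(text).digest("hex");

// Fragile: output order follows Map insertion order, which follows discovery order.
function emitUnsorted(entities: Map<string, string>): string {
  return Array.from(entities.entries())
    .map(([name, table]) => `export const ${name}Table = "${table}";`)
    .join("\n");
}

// Stable: keys are sorted before emission, so insertion order is irrelevant.
function emitSorted(entities: Map<string, string>): string {
  return Array.from(entities.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, table]) => `export const ${name}Table = "${table}";`)
    .join("\n");
}

// Simulate five runs that discover the same entities in rotating orders.
const pairs: [string, string][] = [
  ["Order", "orders"],
  ["Customer", "customers"],
  ["Invoice", "invoices"],
];
const runs = [0, 1, 2, 3, 4].map(
  (i) => new Map<string, string>([...pairs.slice(i % 3), ...pairs.slice(0, i % 3)]),
);

const sortedHashes = new Set(runs.map((m) => sha256(emitSorted(m))));
const unsortedHashes = new Set(runs.map((m) => sha256(emitUnsorted(m))));

// One distinct hash means byte-identical output across all five runs.
console.log(sortedHashes.size, unsortedHashes.size); // prints: 1 3
```

A hash set of size one is the pass condition; the unsorted emitter yields three distinct hashes for the three distinct rotations.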
Property 3 — Hand-edit detection (verify-mode)
A clean run followed by a hand-edit followed by verify surfaces SG0010. Asserted by verifyDetectsHandEdit() — the test runs the pipeline, manually edits one of the generated files (changing a comment, say, or an identifier name), runs verify mode, asserts the diagnostic list contains an SG0010 for that specific file. This is the property that lets a CI pipeline catch teammates who edit generated files directly.
The complementary property, that a clean run followed by verify returns ok, is asserted by verifyAfterRunIsClean(). Together the two assert that verify mode neither over-triggers on clean output nor under-triggers on hand-edits.
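Both directions of the verify check fit in a few lines once the engine output and the disk contents are modelled as records. This is a sketch under assumptions: the Diagnostic shape and the verify function are hypothetical, and only the SG0010 code comes from the article. Treating a missing file as drift is also an assumption of the sketch.

```typescript
// Hypothetical diagnostic shape; only the SG0010 code comes from the article.
interface Diagnostic {
  code: "SG0010";
  file: string;
  message: string;
}

// Compare what the engine would emit (in memory) with what is on disk.
// Sketch assumption: a missing file counts as drift, like a modified one.
function verify(expected: Record<string, string>, onDisk: Record<string, string>): Diagnostic[] {
  const diagnostics: Diagnostic[] = [];
  for (const file of Object.keys(expected).sort()) {
    if (onDisk[file] !== expected[file]) {
      diagnostics.push({ code: "SG0010", file, message: "on-disk contents differ from engine output" });
    }
  }
  return diagnostics;
}

const expectedOutput = {
  "order.mapper.ts": "export const toDto = (x: unknown) => x;\n",
  "order.validator.ts": "export const validate = () => true;\n",
};

// Clean run: disk matches the engine's view exactly.
const cleanResult = verify(expectedOutput, { ...expectedOutput });

// Hand-edit: someone prepended a comment to one generated file.
const editedResult = verify(expectedOutput, {
  ...expectedOutput,
  "order.validator.ts": "// tweaked by hand\nexport const validate = () => true;\n",
});

console.log(cleanResult.length, editedResult.map((d) => `${d.code} ${d.file}`));
```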
Property 4 — Convergence
The pipeline reaches its fixpoint within maxIterations. Asserted by fixpointReachedInThreeIterations() — outcome.iterations === 3 for the canonical example. The complementary property — MAX_ITERATIONS aborts a non-converging pipeline — would require a synthetic divergent generator and is not currently in the test suite (the live generators are designed to converge). The forward assertion is what matters operationally.
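A toy model makes both directions of the convergence property tangible, including the synthetic divergent generator that the suite does not currently contain. Everything here is an assumption for illustration: runToyFixpoint is not the engine's runFixpoint, the generators are invented, and the counting convention (the final no-op iteration is counted) is chosen to match the iteration counts quoted in this article.

```typescript
// Toy model of a fixpoint loop; NOT the engine's runFixpoint.
type Files = Map<string, string>;
type ToyGenerator = (snapshot: ReadonlyMap<string, string>) => Record<string, string>;

function runToyFixpoint(
  files: Files,
  generators: ToyGenerator[],
  maxIterations: number,
): { iterations: number } {
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const snapshot = new Map(files); // generators see the state at iteration start
    let changed = false;
    for (const generate of generators) {
      for (const [name, contents] of Object.entries(generate(snapshot))) {
        if (files.get(name) !== contents) {
          files.set(name, contents);
          changed = true;
        }
      }
    }
    if (!changed) return { iterations: iteration }; // first no-op iteration ends the run
  }
  throw new Error(`no fixpoint within ${maxIterations} iterations`);
}

// Stage A emits unconditionally; stage B reacts to A's output one iteration later.
const stageA: ToyGenerator = () => ({ "a.ts": "export const a = 1;\n" });
const stageB: ToyGenerator = (snap) =>
  snap.has("a.ts") ? { "b.ts": "export const b = 2;\n" } : {};

// Synthetic divergent generator: its output grows on every iteration.
const divergent: ToyGenerator = (snap) => ({ "counter.ts": (snap.get("counter.ts") ?? "") + "x" });

const store: Files = new Map();
const firstRun = runToyFixpoint(store, [stageA, stageB], 10);
const secondRun = runToyFixpoint(store, [stageA, stageB], 10);

let aborted = false;
try {
  runToyFixpoint(new Map(), [divergent], 10);
} catch {
  aborted = true;
}

console.log(firstRun.iterations, secondRun.iterations, aborted); // prints: 3 1 true
```

The two-stage chain needs two productive iterations plus one confirming no-op, and a rerun on the same state exits after a single iteration. The divergent generator never settles, so the guard throws.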
Composing the four
A consumer's CI pipeline runs npx vitest run --coverage. The Vitest runner discovers the twenty test files, runs the bound @Verifies methods, surfaces failures in a structured report. A passing CI run is evidence that all four properties hold; a failing CI run names the specific AC that broke. The four properties together are the operational acceptance test for the example as a whole.
sourcegen verify and tsc --noEmit on outDir
The unit tests assert engine invariants. A second layer of verification asserts consumability: the generated output must itself be valid TypeScript that a consumer can compile.
The first half of that layer is sourcegen verify, walked in Part 05. The command runs the pipeline in memory and compares to disk. A consumer's CI pipeline runs sourcegen verify after sourcegen run to assert that the just-committed disk state matches what the engine would produce on a hypothetical re-run — i.e., that the commit hasn't been corrupted by any subsequent step.
The second half is tsc --noEmit on outDir. The acceptance criterion generatedOutputTypechecks() in req-determinism-and-verifiability.ts:41 asserts that the full generated tree under outDir passes tsc --noEmit. This catches the case where a generator's emission accidentally produces invalid TypeScript — for example, an import path that doesn't resolve, a type reference to a non-existent symbol, a syntax error in a template literal. The tsc --noEmit pass is fast (no emission, just type-checking) and definitive.
In a consumer's CI pipeline, the natural composition is:
# 1. Generate (or fail)
npx sourcegen run --config sourcegen.config.ts
# 2. Verify on-disk matches the engine's view
npx sourcegen verify --config sourcegen.config.ts
# 3. Type-check the generated output
npx tsc --noEmit -p tsconfig.generated.json
# 4. Run the unit tests
npx vitest run --coverage
The four steps fail independently: a failing tsc --noEmit does not mean sourcegen verify would fail, and vice versa. They cover complementary classes of bug. A CI pipeline runs all four and surfaces the specific failure. A consumer who picks up @frenchexdev/ts-codegen-pipeline for their own pipeline replicates the four-step pattern in their own CI definition.
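One way to surface the specific failure is to run every step regardless of earlier results and report the failed ones by name. The orchestrator below is a hypothetical sketch, not part of the library: Step, commandStep, and runAll are invented names, and whether to continue after a failed generate step is a policy choice left to each CI definition. The command lines are the ones listed above.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical step abstraction: a name plus a function reporting success.
interface Step {
  name: string;
  run: () => boolean;
}

// Wrap a command line as a step; it passes when the process exits with code 0.
// (On Windows, invoking npx this way may additionally need { shell: true }.)
function commandStep(name: string, command: string, args: string[]): Step {
  return { name, run: () => spawnSync(command, args, { stdio: "inherit" }).status === 0 };
}

// Run every step, even after a failure, and return the names of the failed ones.
function runAll(steps: Step[]): string[] {
  return steps.filter((step) => !step.run()).map((step) => step.name);
}

// The four steps from the article, constructed but not executed here.
const ciSteps: Step[] = [
  commandStep("generate", "npx", ["sourcegen", "run", "--config", "sourcegen.config.ts"]),
  commandStep("verify", "npx", ["sourcegen", "verify", "--config", "sourcegen.config.ts"]),
  commandStep("typecheck", "npx", ["tsc", "--noEmit", "-p", "tsconfig.generated.json"]),
  commandStep("unit tests", "npx", ["vitest", "run", "--coverage"]),
];

// Stubbed steps show the reporting behaviour without spawning processes.
const failedSteps = runAll([
  { name: "generate", run: () => true },
  { name: "verify", run: () => true },
  { name: "typecheck", run: () => false },
  { name: "unit tests", run: () => true },
]);

console.log(ciSteps.map((s) => s.name), failedSteps);
```

In a real CI script, runAll(ciSteps) would be followed by a non-zero exit when the returned list is non-empty.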
What the testing strategy does not (yet) cover
Three properties a reader might expect in a comprehensive testing apparatus, with notes on where they sit.
First, performance benchmarks. The example's tests assert correctness, not speed. The pipeline runs in well under a second on the canonical input, but the test suite does not measure or assert that. A future revision could add a req-tscp-performance.ts requirement ("the canonical pipeline runs in under 1000ms on standard hardware") and a benchmark test; the current suite does not. The omission is deliberate — performance characterisation deserves its own series, and the engine's design makes it easy to add later.
Second, fuzz testing of the input space. The tests use the canonical input plus a small number of variants (empty input, conflict scenarios for SG0020, dangling references for SG-DANGLING-LIFECYCLE). They do not generate random inputs. A property-based testing approach using fast-check would generate decorated classes within the schema bounds and assert that the engine handles every input shape gracefully. The library README explicitly mentions fast-check integration in the TestStubGenerator's "Tier 2" emissions; the example's test suite does not currently use it but could.
Third, integration with a downstream consumer. The tests assert that the example itself converges, type-checks, verifies. They do not assert that a downstream consumer of the example's outputs (a hypothetical CRUD application built on top of OrderRepository, OrderDto, OrderMapper, OrderValidator) compiles and runs. The omission is structural — there is no downstream consumer in the workspace yet. When Ide.Dsl-TS lands, the expectation is that its test suite will verify integration with the engine and the example, closing the loop end-to-end.
Bridge
Part 12 is the capstone. With the engine, the example, and the test apparatus all on the table, the closing article steps back to ask what was borrowed from Roslyn (and what was deliberately not), what the next consumer (Ide.Dsl-TS) inherits from this foundation, and how the ontology-parameterized DSL pattern composes with the engine's invariants. Part 12 closes the series.
The Feature for this article is FEAT-TSGEN-11 in assets/features.ts. Acceptance criteria: Feature/Requirement/Verifies pattern introduced; industrial-sandbox harness explained; idempotence/determinism/hand-edit/convergence properties asserted; sourcegen verify + tsc --noEmit as CI pair explained. Each section above maps to one of those ACs.