
Part 04 — Strictly additive: invariants under load

The previous article showed how virtFS lets ten producers see each other's outputs through a fixpoint loop. This article addresses the dual question: with ten producers writing into a shared filesystem, what stops them from stepping on each other? The answer is a small contract (two explicit error codes plus a prune-on-delete pass at commit time) and a single load-bearing principle: producer identity is the unit of conflict resolution.

The contract is enforced at three boundaries: at the API surface (the absence of delete and truncate on VirtFs), at the path level (SG0040 InvalidEmissionPath rejects writes outside outDir), and at the content level (SG0020 EmissionDivergence rejects two distinct producers writing different contents to the same path). All three behaviours are visible in roughly seventy lines of code and all three are bound to fit criteria in packages/sourcegen-example/requirements/requirements/req-determinism-and-verifiability.ts. After this article the reader holds the engine's safety story.

Why "strictly additive" and not "transactional"

A reader coming from database or filesystem worlds might expect the safety story to be transactional — a generator opens a transaction, writes some files, commits or rolls back. The TS port deliberately does not work that way. It works additively: a generator can only call addSource; there is no delete, no truncate, no rename, no move. The whole API surface for mutation is one method, and that method only adds files (or updates a file the same producer already wrote).

The reason is given precisely in req-tscp-strictly-additive-emission.ts:

"Source-of-truth boundaries (human-authored vs machine-generated) must remain visible in git diff. Mutation of human sources destroys intent without audit, makes fixpoint convergence undecidable (a generator that mutates a file another generator reads can oscillate indefinitely), and turns recovery into chirurgical git revert."

Three reasons in one sentence, each load-bearing.

The first is auditability. With strict additivity, every line in outDir was emitted by some producer in some run; every line outside outDir is human-authored. The boundary is a directory boundary, visible in git diff at filename level — User.dto.generated.ts is generated, User.entity.ts is not. A reviewer can read a PR and immediately tell which lines need scrutiny and which lines are derived. Mutation breaks this; if a generator can rewrite User.entity.ts, the line that was changed could be intent or could be machine-noise, and the reviewer has to dig into the generator's source to find out which.

The second is fixpoint decidability. The fixpoint loop terminates when an iteration produces no new emissions. No new emissions is well-defined as long as the only operation generators can perform is "add or update a file under outDir". If generators could mutate arbitrary files, two generators with conflicting opinions about a user file could oscillate forever — generator A rewrites the file to state X, generator B rewrites it to state Y, generator A rewrites it back to X, and so on. The fixpoint check would have no way to know whether convergence was reached or whether the loop was simply trapped in a stable cycle. With strict additivity, oscillation is impossible: a file is either present in virtFS at content C or it is not, and a divergent emission is rejected by SG0020 rather than allowed to overwrite. The loop terminates or it errors; it never oscillates.

The third is recovery. With strict additivity, recovery from a botched generator is rm -rf outDir/ — no surgery, no git revert, no manual reconciliation. The disk state is rebuildable from the input by definition. A pipeline that mutated user files would not have this property; recovery would require a git stash, git revert, or worse, depending on how interleaved the mutations were with the user's own work. The contract says generators are not allowed to put you in that situation.
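
The termination argument in the second point can be made concrete with a minimal sketch. The types, the runToFixpoint name, the iteration budget, and the '30-dto' id are assumptions made for illustration, not the engine's actual API; in particular, the real engine returns SG0020 as a diagnostic, whereas this sketch simply throws.

```typescript
// Simplified, hypothetical shapes; the engine's real signatures are not shown in this article.
type Emission = { relPath: string; contents: string; producerId: string };
type Generator = {
  id: string;
  run: (visible: ReadonlyMap<string, string>) => Emission[];
};

function runToFixpoint(
  generators: Generator[],
  maxIterations = 10, // guard added for the sketch only
): { files: Map<string, string>; iterations: number } {
  const files = new Map<string, string>(); // relPath -> contents
  const owners = new Map<string, string>(); // relPath -> producerId
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    let anyEmitted = false;
    for (const gen of generators) {
      for (const e of gen.run(files)) {
        const previous = files.get(e.relPath);
        if (previous === e.contents) continue; // identical contents: no-op
        const owner = owners.get(e.relPath);
        if (owner !== undefined && owner !== e.producerId) {
          // A divergent write is refused; it never overwrites.
          throw new Error(`SG0020: '${owner}' and '${e.producerId}' diverge on '${e.relPath}'`);
        }
        files.set(e.relPath, e.contents);
        owners.set(e.relPath, e.producerId);
        anyEmitted = true;
      }
    }
    // An iteration that adds or changes nothing is the fixpoint.
    if (!anyEmitted) return { files, iterations: iteration };
  }
  throw new Error(`no fixpoint after ${maxIterations} iterations`);
}

const dto: Generator = {
  id: '30-dto',
  run: () => [
    { relPath: 'User.dto.generated.ts', contents: 'export interface UserDto {}\n', producerId: '30-dto' },
  ],
};

// The barrel re-exports whatever it can currently see, so it changes once the DTO exists.
const barrel: Generator = {
  id: '99-index.barrel',
  run: (visible) => {
    const names = Array.from(visible.keys())
      .filter((p) => p !== 'index.generated.ts')
      .sort();
    const contents = names.map((p) => `export * from './${p.replace(/\.ts$/, '')}';`).join('\n');
    return [{ relPath: 'index.generated.ts', contents, producerId: '99-index.barrel' }];
  },
};

const converged = runToFixpoint([barrel, dto]);
console.log(converged.iterations); // 3: emit, barrel update, then a silent pass
```

Iteration 1 emits both files, iteration 2 lets the barrel update its own emission, and iteration 3 changes nothing, which is what the loop treats as convergence.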

SG0040 — path-escape rejection

The first error code is SG0040 InvalidEmissionPath, raised when a generator attempts to write outside outDir. The validation is in virtfs/path-validation.ts:

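// Assumed import: the excerpt relies on Node's path module.
import * as path from 'node:path';

// Diagnostic and makeSG0040 are defined elsewhere in the package.
export function validateRelPath(relPath: string, producerId: string): Diagnostic | null {
  // Rule 1: an empty path almost certainly signals a bug in the generator.
  if (relPath.length === 0) {
    return makeSG0040(relPath, producerId, 'empty path');
  }
  // Rule 2: absolute paths, in POSIX form or Windows drive-letter form.
  if (path.isAbsolute(relPath) || /^[a-zA-Z]:[/\\]/.test(relPath)) {
    return makeSG0040(relPath, producerId, 'absolute path');
  }
  // Rule 3: normalise with forward slashes, then reject anything that climbs above outDir.
  const posixized = relPath.replace(/\\/g, '/');
  const normalized = path.posix.normalize(posixized);
  if (normalized === '..' || normalized.startsWith('../')) {
    return makeSG0040(relPath, producerId, 'path escapes outDir');
  }
  return null;
}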

Three rejection cases, exhaustive. The first is the empty string — useful as a defensive check, since a generator that returns an empty path almost certainly has a bug elsewhere. The second is an absolute path, in either POSIX (/foo) or Windows (C:\foo) form. The third is a relative path that, after normalisation, escapes outDir via .. traversal. The check is conservative: anything not provably safe is rejected.

The boundary defended by SG0040 is the boundary defended by strict additivity itself. A generator that could write to ../src/foo.ts could mutate user code through path manipulation; a generator that could write to /etc/passwd could do worse. The check does not trust the producer; it does not interrogate the producer's intent; it tests the path against a small set of explicit rejection rules and produces a diagnostic if any of them match. The producer that would have written outside outDir is named in the diagnostic (producerId), so the operator can identify the offending generator without bisecting.

What the check does not do is more interesting. It does not consult a real filesystem; the validation is purely on the relPath string. It does not check for symlinks; the in-memory virtFS doesn't have symlinks. It does not check for filesystem case-collisions on case-insensitive volumes; that check is in the commit phase (Part 06) where it actually matters. The path validation in virtfs/path-validation.ts is fast, deterministic, and free of I/O — exactly what a hot path through addSource needs.
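
The three rejection rules can be exercised on a few sample paths. The sketch below is a dependency-free stand-in for validateRelPath, not the library's code: rejectionReason is a hypothetical name, it returns a plain string instead of a Diagnostic, and it replaces path.posix.normalize with a depth counter, which gives the same answer for the traversal rule because a path escapes exactly when some prefix of it climbs above the starting directory. It also treats any leading slash or backslash as absolute, which is an assumption of the sketch.

```typescript
// Returns the rejection reason, or null when the path is accepted.
function rejectionReason(relPath: string): string | null {
  if (relPath.length === 0) return 'empty path';
  // Assumption of this sketch: a leading slash or backslash, or a drive letter, means absolute.
  if (relPath.startsWith('/') || relPath.startsWith('\\') || /^[a-zA-Z]:[/\\]/.test(relPath)) {
    return 'absolute path';
  }
  let depth = 0; // number of directories below outDir at the current segment
  for (const segment of relPath.replace(/\\/g, '/').split('/')) {
    if (segment === '' || segment === '.') continue; // no movement
    depth += segment === '..' ? -1 : 1;
    if (depth < 0) return 'path escapes outDir'; // climbed above outDir
  }
  return null;
}

const samples = [
  'User.dto.generated.ts',           // accepted
  'models/../User.dto.generated.ts', // accepted: stays inside outDir
  '',                                // empty path
  '/etc/passwd',                     // absolute path
  'C:\\foo',                         // absolute path (Windows form)
  '../src/foo.ts',                   // path escapes outDir
  'a/../../b.ts',                    // path escapes outDir
  'a\\..\\..\\b.ts',                 // path escapes outDir (backslash traversal)
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), '->', rejectionReason(sample) ?? 'ok');
}
```

Like the original, the check is pure string manipulation: no filesystem access, no symlink resolution, and the same result on every run.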

SG0020 — producer divergence

The second error code is SG0020 EmissionDivergence, raised when two distinct producers attempt to write different contents to the same relPath. The check is in virtfs/internal.ts:81-99:

const existing = state.emissions.get(file.relPath);
if (existing) {
  if (existing.contents === file.contents) {
    // Idempotent overlap — no-op (does NOT append to journal).
    return { newOrChanged: false };
  }
  if (existing.producerId !== file.producerId) {
    // Cross-producer divergence (SG0020) — refuse, virtFs unchanged.
    const diag: Diagnostic = {
      severity: 'error',
      code: 'SG0020',
      message: `EmissionDivergence: producers '${existing.producerId}' and '${file.producerId}' both emit different contents for '${file.relPath}'`,
      file: file.relPath,
      producerId: file.producerId,
    };
    return { newOrChanged: false, diagnostics: [diag] };
  }
  // Same producer updating its own emission with new contents — allowed.
}

The decision tree here is the heart of the additive contract. There are three cases, in order of frequency.

The most common case is the idempotent overlap: the same producer (or a different one) writes the same contents to the same path. This is allowed and silent — newOrChanged: false, no diagnostic, no journal entry. It happens routinely in a converged fixpoint: every generator runs again on the third iteration, every emission is byte-identical to the second iteration, every addSource is a no-op. Without this rule the loop would never terminate (every iteration would record new emissions in the journal even when nothing changed); with this rule, the convergence check if (!anyEmitted) is well-defined.

The second case is same producer, new contents. This is allowed without a diagnostic — the producer is updating its own emission, presumably because new information became available since the last iteration. The barrel generator in the example does this on iteration 1: it emitted a barrel in iteration 0, the registry was added in iteration 1, the barrel re-emits to absorb the registry. Both emissions have the same producerId: '99-index.barrel', different contents. The second emission overwrites the first in the emissions Map; the journal records both because both were newOrChanged: true from the producer's perspective.

The third case is the cross-producer divergence: two different producers write different contents to the same path. This is the contract violation — neither producer can know whose intent should win, and the engine refuses to choose. The rejection is hard: the second emission is not applied (virtFS state unchanged), SG0020 is raised, and the run will abort at the next error check in the runner. Both producer ids are surfaced in the diagnostic so the operator can identify the conflict.

The asymmetry is deliberate. Same producer, new contents is allowed because a generator's iterations across the fixpoint represent the same author's evolving understanding — the producer can be trusted to have a coherent view of what it should emit at this path. Different producer, new contents is rejected because two producers writing to the same path are a category error; either the path is mis-allocated (one of them should be writing somewhere else) or the producers have an unstated dependency that should be made explicit. The contract refuses to paper over the ambiguity.

A test case for this exact rule lives in req-determinism-and-verifiability.ts, bound to conflictingProducersFailWithSG0020() (line 40). The test spins up a synthetic configuration with two producers that target the same path, runs the pipeline, and asserts both outcome.ok === false and that an SG0020 diagnostic is present.
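
The three-way decision tree can be reproduced in a small, self-contained sketch. SketchVirtFs, its journal field, and the read helper are hypothetical names chosen for illustration, and the '50-mapper' id is an assumption; only the branching mirrors the excerpt above. Note that the class exposes no delete, truncate, or rename.

```typescript
type EmittedFile = { relPath: string; contents: string; producerId: string };
type Diagnostic = { severity: 'error'; code: string; message: string; file: string; producerId: string };
type AddResult = { newOrChanged: boolean; diagnostics?: Diagnostic[] };

// Hypothetical in-memory store whose only mutating method is addSource.
class SketchVirtFs {
  private readonly emissions = new Map<string, EmittedFile>();
  readonly journal: EmittedFile[] = [];

  addSource(file: EmittedFile): AddResult {
    const existing = this.emissions.get(file.relPath);
    if (existing) {
      // Case 1: identical contents, whoever the producer is. Silent no-op.
      if (existing.contents === file.contents) return { newOrChanged: false };
      // Case 3: a different producer with different contents. Refuse, state unchanged.
      if (existing.producerId !== file.producerId) {
        return {
          newOrChanged: false,
          diagnostics: [{
            severity: 'error',
            code: 'SG0020',
            message: `EmissionDivergence: producers '${existing.producerId}' and '${file.producerId}' both emit different contents for '${file.relPath}'`,
            file: file.relPath,
            producerId: file.producerId,
          }],
        };
      }
      // Case 2: the same producer updating its own emission falls through.
    }
    this.emissions.set(file.relPath, file);
    this.journal.push(file);
    return { newOrChanged: true };
  }

  read(relPath: string): string | undefined {
    return this.emissions.get(relPath)?.contents;
  }
}

const virtFs = new SketchVirtFs();
const relPath = 'index.generated.ts';
const first = virtFs.addSource({ relPath, contents: 'v1', producerId: '99-index.barrel' });
const repeat = virtFs.addSource({ relPath, contents: 'v1', producerId: '50-mapper' });
const update = virtFs.addSource({ relPath, contents: 'v2', producerId: '99-index.barrel' });
const clash = virtFs.addSource({ relPath, contents: 'v3', producerId: '50-mapper' });

console.log(first.newOrChanged, repeat.newOrChanged, update.newOrChanged); // true false true
console.log(clash.diagnostics?.[0].code, virtFs.read(relPath)); // SG0020 v2
```

The rejected write leaves the stored contents at 'v2', and the journal holds two entries: the initial emission and the same-producer update.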

Prune-on-delete

The third behaviour is not an error code — it is a quiet pass at commit time. When the pipeline runs on input I and emits files F, then runs again on input I' (where some entity has been deleted) and emits files F' (where the corresponding entity-children have not been emitted), the commit phase removes from disk every file in F \ F'. The user's User.entity.ts was deleted; the User.dto.generated.ts, User.repository.generated.ts, User.mapper.generated.ts, User.validator.generated.ts are no longer emitted; on commit they are removed from disk.

The mechanism is in virtfs/commit.ts, called via:

const preCommitFiles = await scanExistingGeneratedFiles(outDir, fs);
await commitTo({
  outDir,
  emissions: handle.getFinalEmissions().values(),
  preCommitFiles,
  fs,
});

scanExistingGeneratedFiles enumerates every file currently on disk under outDir that ends in .generated.ts. commitTo writes every emission to disk, then removes any pre-existing generated file not in the current emissions set. The diff between what was on disk before and what is being committed now is the prune set.

The behaviour is conservative in two ways. First, only files matching the .generated.ts suffix are pruned; user files in outDir (rare but legal) are left alone. Second, the prune happens only after the commit succeeds; if any rename in the commit phase fails, the prune is skipped, leaving disk state recoverable. Part 06 walks the commit phase in detail.
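
The prune set itself is a plain set difference, restricted to the generated suffix. The sketch below shows only that computation; computePruneSet is a hypothetical helper, not the signature of commitTo or scanExistingGeneratedFiles, and it performs no disk I/O.

```typescript
// Files that were on disk before the commit, carry the generated suffix,
// and are absent from the current emissions: the F \ F' of the text above.
function computePruneSet(
  preCommitFiles: readonly string[],
  emittedRelPaths: readonly string[],
  suffix = '.generated.ts',
): string[] {
  const emitted = new Set<string>(emittedRelPaths);
  return preCommitFiles
    .filter((relPath) => relPath.endsWith(suffix)) // never touch user files
    .filter((relPath) => !emitted.has(relPath))    // keep what is still emitted
    .sort();                                       // stable order for logs and tests
}

// Before: User and Order both had generated children; a user note also lives in outDir.
const onDiskBefore = [
  'User.dto.generated.ts',
  'User.repository.generated.ts',
  'Order.dto.generated.ts',
  'notes.md',
];
// After User.entity.ts is deleted, only Order's children are emitted.
const emittedNow = ['Order.dto.generated.ts'];

const toPrune = computePruneSet(onDiskBefore, emittedNow);
console.log(toPrune); // [ 'User.dto.generated.ts', 'User.repository.generated.ts' ]
```

In a real commit phase this list would be acted on only after every write has succeeded, which is the second conservative property described above.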

The acceptance criterion in req-determinism-and-verifiability.ts:39 reads: "Deleting an input entity file removes its generated children on next commit." — bound to the test method deletedInputRemovesGenerated().

What the additive contract forbids — and the alternatives

A generator cannot mutate user code. A generator cannot rename or move a generated file. A generator cannot empty or partially overwrite a file another producer wrote. The list is short and exclusionary, and a TS team coming from ts-patch or ts-transformer-keys will immediately ask: how do I extend a user class without mutating the file it's in?

The answer is the standard Roslyn answer in TS clothing: extend the class from the outside, through inheritance, mixins, module augmentation, or re-export. The library README lists these four alternatives (README.md:127-133):

  • Inheritance: the generator emits class FooCompanion extends Foo { /* extra */ } in Foo.companion.generated.ts. Consumers import FooCompanion and get the extended behaviour. The user's Foo is untouched.
  • Mixin composition: the generator emits a function withFooBar<T>(Base: T): T & FooBar that the user calls explicitly. The user's class declaration is untouched; the user opts in by composing the mixin at usage sites.
  • TS module augmentation: the generator emits declare module './foo' { interface Foo { extra(): void } }. This is the closest analogue to C# partial class and is the technique used by libraries like @types/express to extend interfaces declared in third-party packages. Module augmentation is type-only; it does not add runtime behaviour, but for many code-generation use cases (additional method signatures, branded types) that is exactly what is needed.
  • Re-export with extra symbols: the generator emits export { Foo } from './foo'; export const FOO_META = { /* ... */ };. The user imports the generated module and gets both the original symbol and the new ones.
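
The first two alternatives and the extra-symbol half of the fourth fit in a single file, so they can be sketched together. Everything in the block is illustrative: Foo stands in for a hand-written class, and the member names (describe, bar) and the FOO_META shape are assumptions, not output of the real generators. Module augmentation is left out because it needs at least two modules to demonstrate.

```typescript
// Stands in for the user's hand-written class; the generator never edits it.
class Foo {
  readonly name: string;
  constructor(name: string) {
    this.name = name;
  }
  greet(): string {
    return `hello ${this.name}`;
  }
}

// Inheritance: what a generator might emit into Foo.companion.generated.ts.
class FooCompanion extends Foo {
  describe(): string {
    return `Foo(${this.name})`;
  }
}

// Mixin composition: the generator emits the function, the user opts in by calling it.
type Constructor<T = {}> = new (...args: any[]) => T;

function withFooBar<TBase extends Constructor>(Base: TBase) {
  return class extends Base {
    bar(): string {
      return 'bar';
    }
  };
}

const FooWithBar = withFooBar(Foo);

// Extra symbol alongside the original class, as in the re-export alternative.
const FOO_META = { className: 'Foo', fields: ['name'] } as const;

const companion = new FooCompanion('a');
const mixed = new FooWithBar('b');
console.log(companion.greet(), companion.describe()); // hello a Foo(a)
console.log(mixed.greet(), mixed.bar());              // hello b bar
console.log(FOO_META.className);                      // Foo
```

In each case the declaration of Foo is byte-for-byte what the user wrote; the added behaviour lives entirely in code that could sit in a separate generated file.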

The example uses a fifth pattern, slightly different: declare-and-import. The DTO generator (stage 30) emits export interface UserDto { /* ... */ } in User.dto.generated.ts and the repository generator (stage 20) emits export interface UserRepository extends Repository<User, UserKey> { /* ... */ } in User.repository.generated.ts. The user's User class stays exactly as the user wrote it. Consumers import the DTO from the generated DTO file, the Repository from the generated repository file, the entity from the user's source. Three files for three concerns; one of them is human-authored, the other two are machine-authored.

A team that genuinely needs to mutate a user file, for instance to insert a method into a class declaration, should not use this library for that purpose. The right tool is a separate codemod CLI, with explicit user consent at invocation time. The requirement is explicit (req-tscp-strictly-additive-emission.ts:36-37): "If a real use case demands mutation, it must be exposed as a separate codemod CLI (cf. requirements-lib/rename-core), not folded into the source-gen pipeline."

How the contract holds under load

A reader might suspect the additive contract is fragile under load — that it is easy to demonstrate on a two-stage pipeline but breaks down at ten. The example's test suite asserts the opposite by construction.

The pipeline runs ten producers against the canonical input. Each producer's emissions overlap in conventional ways: stage 50 (mapper) reads stage 30 (dto) and emits to a different relPath; stage 60 (validator) reads both 30 and 40 and emits to a third relPath; stage 0 (registry, iter 1) reads stage 20 (repository) and emits to a fourth relPath. The partitioning of relPaths across producers is disjoint — no two producers ever target the same ${ClassName}.${role}.generated.ts. The configuration is by construction conflict-free.
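
The disjointness claim can also be checked statically, before any generator runs. The sketch below is a hypothetical helper, not part of the library: ProducerPlan, findOverlaps, and the producer ids are assumptions, and only the ${ClassName}.${role}.generated.ts naming scheme comes from the text. It is stricter than SG0020, since it flags any shared path even if the two producers would emit identical contents.

```typescript
type ProducerPlan = { producerId: string; roles: string[] };
type Overlap = { relPath: string; producerIds: string[] };

function relPathFor(className: string, role: string): string {
  return `${className}.${role}.generated.ts`;
}

// Lists every relPath that more than one producer plans to write.
function findOverlaps(plans: ProducerPlan[], classNames: string[]): Overlap[] {
  const owners = new Map<string, string[]>();
  for (const plan of plans) {
    for (const role of plan.roles) {
      for (const className of classNames) {
        const relPath = relPathFor(className, role);
        const ids = owners.get(relPath) ?? [];
        ids.push(plan.producerId);
        owners.set(relPath, ids);
      }
    }
  }
  return Array.from(owners.entries())
    .filter(([, ids]) => ids.length > 1)
    .map(([relPath, producerIds]) => ({ relPath, producerIds }));
}

const plans: ProducerPlan[] = [
  { producerId: '20-repository', roles: ['repository'] },
  { producerId: '30-dto', roles: ['dto'] },
  { producerId: '50-mapper', roles: ['mapper'] },
  { producerId: '60-validator', roles: ['validator'] },
];

const clean = findOverlaps(plans, ['User', 'Order']);
console.log(clean.length); // 0: one role per producer, so no shared paths

// A second producer claiming the 'dto' role collides on every class.
const rogue = findOverlaps([...plans, { producerId: '55-dto-v2', roles: ['dto'] }], ['User']);
console.log(rogue); // [ { relPath: 'User.dto.generated.ts', producerIds: [ '30-dto', '55-dto-v2' ] } ]
```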

The test suite does not take this on trust; it verifies the property with tests rather than relying on the structure of the configuration. The fit criteria in req-determinism-and-verifiability.ts include conflictingProducersFailWithSG0020: a synthetic configuration is set up with a deliberate collision, and the test asserts the engine surfaces SG0020. The contract is checked from both sides: the conflict-free configuration runs cleanly five times in a row (fiveRunsByteIdentical), and the conflicting configuration aborts deterministically (conflictingProducersFailWithSG0020).

The two assertions together close the loop: the contract holds when respected, and is enforced when violated. A consumer of @frenchexdev/ts-codegen-pipeline can rely on either side.

Bridge

Part 05 walks the second pillar of determinism: the four-line banner that wraps every emission, the content hash that detects hand-edits, and the verify command that turns those primitives into a CI gate. Banners are the visible marker of the additive contract — every machine-emitted file carries one; every human-emitted file does not — and the hash is the proof that the engine's output has not drifted between the time it was emitted and the time verify is run.

The Feature for this article is FEAT-TSGEN-04 in assets/features.ts. Acceptance criteria: SG0040 path-escape contract stated; SG0020 producer-divergence contract stated; prune-on-delete semantics explained; additive contract contrasted with mutation-free alternatives. Each section above maps to one of those ACs.
