Part 05 — Banners, hashes, and verify mode
The previous article walked the additive contract — what stops ten producers from stepping on each other through the engine's path validation, divergence detection, and prune-on-delete behaviour. This article walks the second pillar of the safety story: deterministic output. Every generated file carries a four-line banner whose third line identifies the producer and the SHA-256 hash of the body. The hash is taken over the body alone — no timestamps, no iteration counts, no run-id — which means two consecutive runs on identical inputs produce byte-identical output on disk. That property is the precondition for the sourcegen verify command, which runs the pipeline in memory, compares the in-memory state to disk, and surfaces a diagnostic for every file that has drifted.
After this article the reader holds the determinism story end-to-end. The full code is roughly seventy lines (plus thirty more for the verify driver), all in packages/ts-codegen-pipeline/src/emit/banner.ts and emit/verify.ts. The acceptance criteria are bound in req-determinism-and-verifiability.ts.
The banner — four lines, fully deterministic
Every generated file begins with a four-line envelope. The wrapping function is in emit/banner.ts:22-26:
export function withBanner(body: string, meta: BannerMeta): string {
  const hash = sha256Short(body);
  const metaLine = `// producer: ${meta.producerId} contentHash: sha256:${hash}`;
  return [BANNER_OPEN, BANNER_HEADER, metaLine, BANNER_CLOSE, body].join('\n');
}
A concrete output, ahead of the explanation. The repository generator (stage 20, producer id 20-entity.repository) emits the following banner for Order.repository.generated.ts:
// <auto-generated>
// GENERATED BY @frenchexdev/ts-codegen-pipeline — DO NOT EDIT
// producer: 20-entity.repository contentHash: sha256:a3f7e2b4c1d0
// </auto-generated>
import type { Repository } from '../runtime/types.js';
import type { Order } from '../entities/order.entity.js';
import type { OrderKey } from './Order.dto.generated.js';
export interface OrderRepository extends Repository<Order, OrderKey> {}
Four observations.
The first is that the banner is exactly four lines. Open marker, human-readable header, machine-readable meta line, close marker. The body that follows is whatever the producer rendered, byte for byte. The banner is structurally trivial — lines.slice(0, 4) is the banner, lines.slice(4) is the body — which makes the round-trip cheap.
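The split described in the first observation can be sketched as a small round-trip. This is a minimal, self-contained reconstruction, not the code in banner.ts: the constant values are copied from the example output, sha256Short is assumed to be Node's node:crypto SHA-256 over the UTF-8 body truncated to twelve hex characters (matching the cited .slice(0, 12)), and splitBanner is a hypothetical helper name.

```typescript
import { createHash } from 'node:crypto';

// Assumed constants, copied from the example output; the real definitions
// live in banner.ts. '\u2014' is the em dash used in the header line.
const BANNER_OPEN = '// <auto-generated>';
const BANNER_HEADER = '// GENERATED BY @frenchexdev/ts-codegen-pipeline \u2014 DO NOT EDIT';
const BANNER_CLOSE = '// </auto-generated>';

interface BannerMeta {
  producerId: string;
}

// Assumption: SHA-256 over the UTF-8 body, first twelve hex characters.
function sha256Short(body: string): string {
  return createHash('sha256').update(body, 'utf8').digest('hex').slice(0, 12);
}

function withBanner(body: string, meta: BannerMeta): string {
  const metaLine = `// producer: ${meta.producerId} contentHash: sha256:${sha256Short(body)}`;
  return [BANNER_OPEN, BANNER_HEADER, metaLine, BANNER_CLOSE, body].join('\n');
}

// Hypothetical helper: lines 1-4 are the banner, everything after is the body.
function splitBanner(contents: string): { banner: string[]; body: string } {
  const lines = contents.split('\n');
  return { banner: lines.slice(0, 4), body: lines.slice(4).join('\n') };
}

const body = 'export interface OrderRepository {}\n';
const file = withBanner(body, { producerId: '20-entity.repository' });
const parts = splitBanner(file);
console.log(parts.banner.length, parts.body === body); // 4 true
```

Because the body is rejoined with the same '\n' separator it was split on, the round-trip is lossless even for bodies that contain '\r\n' line endings.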
The second is that the meta line carries only two facts: the producer id and a SHA-256 hash of the body. There is no timestamp. There is no iteration index. There is no run id. There is no host name. There is no human-readable date. The deliberate absence of those fields is what makes two runs on identical inputs produce byte-identical disk state. A timestamp would defeat byte-identity; an iteration index would defeat byte-identity if the convergence count ever shifted; a run id is by definition non-deterministic. None of those three are present.
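The second observation can be checked directly. In the sketch below, metaLine mirrors the meta-line format from the example output, while metaLineWithTimestamp is a deliberately wrong, hypothetical variant (nothing the engine ships) that appends an ISO timestamp. Fixed Date values stand in for two runs one second apart so that the result is reproducible.

```typescript
import { createHash } from 'node:crypto';

function sha256Short(body: string): string {
  return createHash('sha256').update(body, 'utf8').digest('hex').slice(0, 12);
}

// Mirrors the meta line in the example output: producer id + body hash only.
function metaLine(producerId: string, body: string): string {
  return `// producer: ${producerId} contentHash: sha256:${sha256Short(body)}`;
}

// Hypothetical, deliberately wrong variant: adds a field that changes every run.
function metaLineWithTimestamp(producerId: string, body: string, now: Date): string {
  return `${metaLine(producerId, body)} generatedAt: ${now.toISOString()}`;
}

const body = 'export const x = 1;\n';
console.log(metaLine('p', body) === metaLine('p', body)); // true

// Two runs one second apart, simulated with fixed clocks.
const run1 = metaLineWithTimestamp('p', body, new Date(0));
const run2 = metaLineWithTimestamp('p', body, new Date(1000));
console.log(run1 === run2); // false
```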
The third is that the hash is truncated to twelve hex characters (banner.ts:71: .slice(0, 12)). Twelve hex characters carry forty-eight bits, so the chance that an edited body happens to reproduce the hash already recorded in its banner is one in 2^48, roughly one in 2.8 × 10^14. That is astronomically unlikely, and the meta line still stays under seventy-two columns. Long enough to be safe; short enough to read and grep.
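The arithmetic behind the third observation, as a short check. The single-match figure is the one that matters for verify, which only ever compares a body against its own recorded hash; the birthday estimate for many distinct bodies is included for contrast, and the file count of 10,000 is an illustrative assumption, not a figure from the engine.

```typescript
// Each hex character encodes 4 bits.
const bits = 12 * 4; // 48
const space = 2 ** bits; // 281,474,976,710,656

// Probability that an edited body hashes to the same 12-hex prefix that is
// already recorded in its banner.
const singleMatch = 1 / space;

// Birthday approximation for at least one collision among n distinct bodies:
// p ≈ n(n - 1) / 2 / 2^48 ≈ n^2 / 2^49.
function birthdayCollision(n: number): number {
  return (n * (n - 1)) / 2 / space;
}

console.log(space.toExponential(2)); // 2.81e+14
console.log(singleMatch.toExponential(2)); // 3.55e-15
console.log(birthdayCollision(10000).toExponential(2));
```

Even across ten thousand files, the chance that any two bodies share a prefix stays well below one in a million.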
The fourth is that BANNER_OPEN and BANNER_CLOSE are line comments wrapping an XML-style tag (banner.ts:3-5): // <auto-generated> and // </auto-generated>. The choice is deliberate: the <auto-generated> marker is an established .NET convention, and tools such as the Roslyn analyzers in Visual Studio and ReSharper treat files carrying it as generated code, typically suppressing analyzer and style warnings for them. The TS port inherits the convention because it is already in operators' muscle memory.
The hash — what it covers and what it does not
The hash is computed over the body and only the body (banner.ts:23: sha256Short(body)). It does not cover the banner itself. If it did, the hash would have to appear inside its own input, a fixed point that no practical computation can find for SHA-256. The verify-mode check (banner.ts:58-68) extracts the body, recomputes its SHA-256, and compares the first twelve hex characters to the value in the meta line:
export function verifyBanneredHash(banneredContents: string): boolean {
  let body: string;
  let meta: ParsedBanner;
  try {
    body = extractBody(banneredContents);
    meta = extractMeta(banneredContents);
  } catch {
    return false;
  }
  return sha256Short(body) === meta.contentHash;
}
The check tells the operator one fact: the body, as it currently sits on disk, is consistent with the hash recorded in its own banner. In other words, nobody changed the body without also recomputing the hash; the engine, or anything else using the same hash function, produced the two together. It does not say who edited the body, or when, or what diff would be needed to bring it back into agreement with the engine; those are tasks for git, not for the engine. What the engine guarantees is the binary outcome: hash matches, or hash does not match.
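The article shows verifyBanneredHash but not the two parsers it calls. Below is a minimal sketch of what extractBody and extractMeta might look like. The ParsedBanner shape, the regular expression, and the exact validation rules are assumptions inferred from the meta-line format, not the code in banner.ts. Any malformed banner throws, and verifyBanneredHash turns that into false, matching the try/catch shown above.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical shape; the article names ParsedBanner but does not show it.
interface ParsedBanner {
  producerId: string;
  contentHash: string; // twelve lowercase hex characters
}

const BANNER_OPEN = '// <auto-generated>';
const BANNER_CLOSE = '// </auto-generated>';
// Assumed grammar of line 3, inferred from the example meta line.
const META_RE = /^\/\/ producer: (\S+) contentHash: sha256:([0-9a-f]{12})$/;

function sha256Short(body: string): string {
  return createHash('sha256').update(body, 'utf8').digest('hex').slice(0, 12);
}

// Structural check only: open marker on line 1, close marker on line 4.
// Line 2 (the human-readable header) is never inspected.
function bannerLines(contents: string): string[] {
  const lines = contents.split('\n');
  if (lines.length < 4 || lines[0] !== BANNER_OPEN || lines[3] !== BANNER_CLOSE) {
    throw new Error('missing or malformed banner');
  }
  return lines;
}

function extractMeta(contents: string): ParsedBanner {
  const match = META_RE.exec(bannerLines(contents)[2]);
  if (match === null) throw new Error('malformed meta line');
  return { producerId: match[1], contentHash: match[2] };
}

function extractBody(contents: string): string {
  return bannerLines(contents).slice(4).join('\n');
}

function verifyBanneredHash(banneredContents: string): boolean {
  let body: string;
  let meta: ParsedBanner;
  try {
    body = extractBody(banneredContents);
    meta = extractMeta(banneredContents);
  } catch {
    return false;
  }
  return sha256Short(body) === meta.contentHash;
}

const body = 'export {};\n';
const file = [BANNER_OPEN, '// any header text', `// producer: p contentHash: sha256:${sha256Short(body)}`, BANNER_CLOSE, body].join('\n');
console.log(verifyBanneredHash(file)); // true
console.log(verifyBanneredHash(file.replace('export {}', 'export { x }'))); // false
console.log(verifyBanneredHash('no banner here')); // false
```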
Three properties fall out of the body-only design.
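First, the banner is rewritable without changing the hash. If a future revision of the engine changes the human-readable header line ("GENERATED BY ... DO NOT EDIT"), already-generated files are not invalidated: their bodies are unchanged, so their hashes still match and none of them is mistaken for a hand-edit. Because sourcegen verify starts by comparing full file contents, it will still report those files as regeneration drift until the next sourcegen run rewrites their banners, but that is one routine regeneration rather than a repair. This makes the banner format itself a soft contract that can evolve without breaking already-generated files.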
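A small demonstration of the first property, under simplifying assumptions: wrap and the one-regex hashMatches check are hypothetical stand-ins for withBanner and verifyBanneredHash, and the two header strings are invented. The hash check passes under both headers, while a full-file comparison still sees two different files.

```typescript
import { createHash } from 'node:crypto';

const sha256Short = (s: string): string =>
  createHash('sha256').update(s, 'utf8').digest('hex').slice(0, 12);

// Hypothetical: wrap a body with a caller-chosen header line.
function wrap(body: string, header: string): string {
  const meta = `// producer: demo contentHash: sha256:${sha256Short(body)}`;
  return ['// <auto-generated>', header, meta, '// </auto-generated>', body].join('\n');
}

// Simplified stand-in for verifyBanneredHash; the header line is never read.
function hashMatches(contents: string): boolean {
  const lines = contents.split('\n');
  const m = /contentHash: sha256:([0-9a-f]{12})$/.exec(lines[2] ?? '');
  return m !== null && m[1] === sha256Short(lines.slice(4).join('\n'));
}

const body = 'export const answer = 42;\n';
const v1File = wrap(body, '// GENERATED BY engine v1 - DO NOT EDIT');
const v2File = wrap(body, '// Generated code: do not edit by hand');

console.log(hashMatches(v1File), hashMatches(v2File)); // true true
console.log(v1File === v2File); // false
```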
Second, the hash is an integrity check, not a signature: it does not prove who wrote the file. An attacker with write access to the project could overwrite the file body and recompute the hash. The hash detects accidental edits and regeneration drift; it does not protect against malicious actors. That is the right scope: guarding against malicious actors is the job of source control and its access controls, and the engine does not take on that role.
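The limit described in the second property, sketched with hypothetical helpers (bannered and hashMatches are illustrative names, not the engine's API): an edit that leaves the banner alone is caught, while an edit that rebuilds the banner around the new body passes.

```typescript
import { createHash } from 'node:crypto';

const sha256Short = (s: string): string =>
  createHash('sha256').update(s, 'utf8').digest('hex').slice(0, 12);

// Hypothetical: wrap a body in a four-line banner with a fresh hash.
function bannered(body: string): string {
  const meta = `// producer: demo contentHash: sha256:${sha256Short(body)}`;
  return ['// <auto-generated>', '// header', meta, '// </auto-generated>', body].join('\n');
}

// Simplified stand-in for verifyBanneredHash.
function hashMatches(contents: string): boolean {
  const lines = contents.split('\n');
  const m = /contentHash: sha256:([0-9a-f]{12})$/.exec(lines[2] ?? '');
  return m !== null && m[1] === sha256Short(lines.slice(4).join('\n'));
}

const clean = bannered('export const limit = 10;\n');
// Edit the body, leave the banner alone: the recorded hash no longer matches.
const careless = clean.replace('limit = 10', 'limit = 99');
// Edit the body and rebuild the banner around it: the check cannot tell.
const deliberate = bannered('export const limit = 99;\n');

console.log(hashMatches(careless)); // false
console.log(hashMatches(deliberate)); // true
```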
Third, two different generators producing structurally similar bodies have different hashes because the bodies differ in detail. Two different runs of the same generator on the same inputs produce identical bodies (the determinism property), and therefore identical hashes. This is what makes byte-identity verifiable: a clean run, followed by a verify, returns ok: true, diagnostics: [].
The verify command: two error codes
The sourcegen verify command runs the entire pipeline in memory, without writing to disk, and then compares the in-memory virtFS state with whatever is currently on disk. The implementation is in emit/verify.ts:31-85. It surfaces two error codes, SG0010 and SG0011, the second of which covers three sub-cases.
SG0010 — file edited by hand
A disk file's content differs from the engine's in-memory output, and the disk file's banner hash does not match its own body. This means someone edited the file directly and left the banner's hash as it was. (A hand-edit that also recomputes the hash is indistinguishable from clean engine output and is reported as SG0011 instead.) The check (emit/verify.ts:54-65):
if (diskContents === file.contents) continue;
// Differ — distinguish hand-edit from regeneration drift.
const hashConsistent = verifyBanneredHash(diskContents);
diagnostics.push({
  severity: 'error',
  code: hashConsistent ? 'SG0011' : 'SG0010',
  message: hashConsistent
    ? `RegenerationDrift: ${file.relPath} differs from current generation`
    : `GeneratedFileEditedByHand: ${file.relPath}`,
  /* ... */
});
The semantic distinction is sharp. If the banner hash matches the body, the body has not been hand-edited (the body and its hash were produced together by the engine), so the disagreement with virtFS is regeneration drift: the user changed sources, did not re-run, and the file on disk is out of date. If the banner hash does not match the body, someone edited the body without recomputing the hash, which is the signature of a hand-edit. SG0010 is hand-edit, SG0011 is drift.
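The branch above reduces to a pure function of two strings. The sketch below is a hypothetical extraction, not the code in verify.ts: classifyDiffering, bannered, and the simplified hashMatches helper are illustrative names, and the real loop also fills in the paths and other diagnostic fields elided as /* ... */.

```typescript
import { createHash } from 'node:crypto';

type VerifyCode = 'SG0010' | 'SG0011';

const sha256Short = (s: string): string =>
  createHash('sha256').update(s, 'utf8').digest('hex').slice(0, 12);

// Hypothetical: wrap a body in a four-line banner with a fresh hash.
function bannered(body: string): string {
  const meta = `// producer: demo contentHash: sha256:${sha256Short(body)}`;
  return ['// <auto-generated>', '// header', meta, '// </auto-generated>', body].join('\n');
}

// Simplified stand-in for verifyBanneredHash.
function hashMatches(contents: string): boolean {
  const lines = contents.split('\n');
  const m = /contentHash: sha256:([0-9a-f]{12})$/.exec(lines[2] ?? '');
  return m !== null && m[1] === sha256Short(lines.slice(4).join('\n'));
}

// null: identical. SG0011: the body still matches its own hash, so the file is
// clean engine output that has gone stale. SG0010: the body no longer matches.
function classifyDiffering(diskContents: string, expected: string): VerifyCode | null {
  if (diskContents === expected) return null;
  return hashMatches(diskContents) ? 'SG0011' : 'SG0010';
}

const v1 = bannered('export const version = 1;\n');
const v2 = bannered('export const version = 2;\n');
console.log(classifyDiffering(v2, v2)); // null
console.log(classifyDiffering(v1, v2)); // SG0011 (clean but stale)
console.log(classifyDiffering(v1.replace('= 1', '= 3'), v2)); // SG0010 (body edited, banner not)
```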
SG0011 — regeneration drift (three sub-cases)
The drift code covers three sub-cases: the disk file differs from virtFS but the banner hash is consistent (the file was a clean engine output that has gone stale because user sources changed); the disk file is missing (some operation removed it without re-running the engine); a disk file exists that virtFS does not produce (its input was deleted; the next commit would purge it).
All three cases mean the same operational thing: the disk and the engine disagree. The fix in every case is to run sourcegen run (which would commit the new state) and re-verify. The three sub-cases share the SG0011 code because their resolution is identical, but the diagnostic message disambiguates them.
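All three sub-cases, plus the hand-edit case, can be sketched as one pass over two maps. This is a hypothetical reconstruction under stated assumptions: the disk map is assumed to hold only files the engine generated earlier (for example, everything under outDir), the messages for the missing and orphaned cases are invented wording, and hashMatches is a simplified stand-in for verifyBanneredHash.

```typescript
import { createHash } from 'node:crypto';

interface Diagnostic {
  severity: 'error';
  code: 'SG0010' | 'SG0011';
  message: string;
}

const sha256Short = (s: string): string =>
  createHash('sha256').update(s, 'utf8').digest('hex').slice(0, 12);

// Simplified stand-in for verifyBanneredHash: line 3 holds the hash, lines 5+ the body.
function hashMatches(contents: string): boolean {
  const lines = contents.split('\n');
  const m = /contentHash: sha256:([0-9a-f]{12})$/.exec(lines[2] ?? '');
  return m !== null && m[1] === sha256Short(lines.slice(4).join('\n'));
}

function bannered(body: string): string {
  const meta = `// producer: demo contentHash: sha256:${sha256Short(body)}`;
  return ['// <auto-generated>', '// header', meta, '// </auto-generated>', body].join('\n');
}

// expected: relPath -> contents the engine would write (the in-memory virtFS).
// disk:     relPath -> contents of previously generated files now on disk.
function verifyTrees(expected: Map<string, string>, disk: Map<string, string>) {
  const diagnostics: Diagnostic[] = [];
  const push = (code: Diagnostic['code'], message: string) =>
    diagnostics.push({ severity: 'error', code, message });
  for (const [relPath, contents] of expected) {
    const onDisk = disk.get(relPath);
    if (onDisk === undefined) {
      push('SG0011', `RegenerationDrift: ${relPath} is missing on disk`);
    } else if (onDisk !== contents) {
      if (hashMatches(onDisk)) push('SG0011', `RegenerationDrift: ${relPath} differs from current generation`);
      else push('SG0010', `GeneratedFileEditedByHand: ${relPath}`);
    }
  }
  for (const relPath of disk.keys()) {
    if (!expected.has(relPath)) push('SG0011', `RegenerationDrift: ${relPath} is no longer produced`);
  }
  return { ok: diagnostics.length === 0, diagnostics };
}

const expected = new Map([
  ['Order.repository.generated.ts', bannered('export const v = 2;\n')],
  ['User.repository.generated.ts', bannered('export const u = 1;\n')],
]);
const disk = new Map([
  ['Order.repository.generated.ts', bannered('export const v = 1;\n')], // clean but stale
  ['Invoice.repository.generated.ts', bannered('export {};\n')], // input deleted
]);
const result = verifyTrees(expected, disk);
console.log(result.ok, result.diagnostics.map((d) => d.code)); // false [ 'SG0011', 'SG0011', 'SG0011' ]
```

A clean tree, where every expected file is on disk byte-for-byte and nothing extra remains, yields { ok: true, diagnostics: [] }.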
Composing SG0010 and SG0011 in a CI gate
A CI pipeline running sourcegen verify interprets the two codes the same way: any non-zero exit from the command means "the disk and the engine do not agree". The build is failed; the operator is asked to either re-run sourcegen run and commit the result (if the user changed sources), or to revert the hand-edit (if a teammate is mistakenly editing generated files). The two error codes do not need separate handling in the CI definition; they need separate handling only in the post-mortem, when figuring out what to fix.
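The CI side needs only a boolean, while the codes matter for the post-mortem. A minimal sketch, assuming a diagnostics array shaped like the objects verify.ts pushes (severity, code, message). exitCodeFor and summarize are hypothetical helper names, and severity is typed loosely because the article only shows 'error'.

```typescript
// Hypothetical diagnostic shape, modelled on the object verify.ts pushes.
interface Diagnostic {
  severity: string; // only 'error' appears in the article
  code: string;
  message: string;
}

// The CI gate: any error fails the build, whatever its code.
function exitCodeFor(diagnostics: readonly Diagnostic[]): 0 | 1 {
  return diagnostics.some((d) => d.severity === 'error') ? 1 : 0;
}

// The post-mortem: count diagnostics per code to decide what to fix.
function summarize(diagnostics: readonly Diagnostic[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const d of diagnostics) counts[d.code] = (counts[d.code] ?? 0) + 1;
  return counts;
}

const diagnostics: Diagnostic[] = [
  { severity: 'error', code: 'SG0011', message: 'RegenerationDrift: Order.repository.generated.ts differs from current generation' },
  { severity: 'error', code: 'SG0010', message: 'GeneratedFileEditedByHand: User.repository.generated.ts' },
];
console.log(exitCodeFor(diagnostics)); // 1
console.log(summarize(diagnostics)); // { SG0011: 1, SG0010: 1 }
```

In a Node CLI, the result would typically be assigned to process.exitCode so that the CI runner sees the failure without the gate inspecting individual codes.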
The acceptance criterion in req-determinism-and-verifiability.ts:38 reads: "Hand-editing a generated file then verify-only surfaces SG0010." — bound to verifyDetectsHandEdit(). The five-run byte-identity test on line 36 — "Five runs produce identical sha256 per file." — is the corresponding "happy path" assertion: clean run, no hand-edit, no drift, all hashes stable.
The five-run determinism assertion
The byte-identity property is not a corollary; it is a separately tested invariant. The fit criterion fiveRunsByteIdentical asserts that five consecutive runs on identical inputs produce sha256-identical output per file. Five, not two, because comparing two runs only catches drift that appears on the second invocation. A defect that depends on state accumulated across runs, such as output the engine reads back on a later pass, can leave the first two runs identical and change the third. Five runs without drift is strong evidence that the engine has reached a stable fixed point that survives repeated invocation.
The implementation is in the test file test/unit/determinism-verifiability.test.ts (196 lines, walked in Part 11). The mechanic: spin up the canonical sandbox, run sourcegen run five times, take a sha256 of every file under outDir after each run, assert all five hash sets are identical. If any hash changes between runs, the test fails with a precise diff of which file's hash drifted.
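The mechanic can be sketched with Node's standard library. This is not the test in determinism-verifiability.test.ts: hashTree and driftedPaths are hypothetical names, and fakeRun is a stand-in that writes fixed contents where the real test invokes sourcegen run against the canonical sandbox.

```typescript
import { createHash } from 'node:crypto';
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join, relative } from 'node:path';

// relPath -> SHA-256 (full hex) of the file's bytes, for every file under root.
function hashTree(root: string): Map<string, string> {
  const hashes = new Map<string, string>();
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        hashes.set(relative(root, full), createHash('sha256').update(readFileSync(full)).digest('hex'));
      }
    }
  };
  walk(root);
  return hashes;
}

// Paths whose hash differs from the first run, including files that appear or
// disappear in a later run. An empty result means every run was byte-identical.
function driftedPaths(runs: readonly Map<string, string>[]): string[] {
  if (runs.length === 0) return [];
  const first = runs[0];
  const drifted = new Set<string>();
  for (const run of runs.slice(1)) {
    for (const path of new Set([...first.keys(), ...run.keys()])) {
      if (first.get(path) !== run.get(path)) drifted.add(path);
    }
  }
  return [...drifted].sort();
}

// Hypothetical stand-in for one `sourcegen run`: writes the same bytes every time.
function fakeRun(outDir: string): void {
  mkdirSync(join(outDir, 'repositories'), { recursive: true });
  writeFileSync(join(outDir, 'repositories', 'Order.repository.generated.ts'), 'export {};\n');
}

const outDir = mkdtempSync(join(tmpdir(), 'sg-determinism-'));
const runs: Map<string, string>[] = [];
for (let i = 0; i < 5; i++) {
  fakeRun(outDir);
  runs.push(hashTree(outDir));
}
console.log(driftedPaths(runs)); // []
```

Returning the drifted paths rather than a bare boolean is what gives a failing test its precise diff of which file's hash changed.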
The five-run assertion is what makes byte-identity a property of the engine rather than a property of one lucky run. A consumer of @frenchexdev/ts-codegen-pipeline can rely on the byte-identity claim because it has been verified five times on a deliberately complex pipeline with ten producers and a backward edge.
What the banner system does not do
Three properties a reader might expect, and the reasons they are not provided.
First, the banner does not carry version information. Future revisions of the engine could in principle write a different banner format; old files with the previous format would still be parseable (the parser is structural — open marker, header, meta line, close marker — and tolerant of header text changes), but new files would have the new format. The decision to omit a version number from the banner is deliberate: an engine that wants to evolve its banner format can do so without breaking already-generated files, and without committing to a long backward-compatibility tail. If the format ever breaks compatibility severely enough to need versioning, the version is added then; until then, the absence is the simpler choice.
Second, the banner does not carry a list of inputs the producer read. A more elaborate banner could include "inputs: [User.entity.ts:5d3a, Order.entity.ts:4f7c]" and turn into a precise dependency graph. The choice not to do this is one of scale: every generator would need to track its inputs precisely, every banner line would grow with the number of inputs read, and the verify command would have to re-validate input hashes too. The simpler design — body-hash only — is sufficient for the safety properties the engine claims, and the elaborate version is left for a future variant if the need arises.
Third, the verify command is not a substitute for signing in CI. A CI pipeline that runs sourcegen verify is checking that the files on disk agree with what the engine would generate, not that commits are authentic. Git commit signatures, build provenance, and supply-chain attestations are upstream concerns, and they are not what sourcegen verify is for. A CI gate composes sourcegen verify with whatever its existing trust boundary is; verify is one input to that gate, not the whole gate.
Bridge
Part 06 walks the third pillar of the safety story: the atomic commit. With ten producers writing into virtFS and a fixpoint loop terminating on convergence, the commit phase is what turns the in-memory state into the on-disk state — and it does so atomically, all-or-nothing, leaving prior disk state intact if any single rename fails. After Part 06 the engine's three invariants (additive, deterministic, atomic) are all on the table.
The Feature for this article is FEAT-TSGEN-05 in assets/features.ts. Acceptance criteria: four-line banner layout described; no-timestamp determinism rationale stated; sourcegen verify drift-detection explained; hand-edit detection via content hash explained. Each section above maps to one of those ACs.