Part 05 — DRY and the LanguageIR as contract
The Don't-Repeat-Yourself principle is easy to state, easy to overapply, and genuinely hard to use well. This article takes a concrete baseline — an actual VSCode extension's package.json, annotated field by field — and shows what repetition looks like in practice, so that the LanguageIR we introduce next can be judged on what it deduplicates rather than on aesthetic preference. It then states the IR shape as a proposal, names the emitters as strategies over that IR, and closes on the two constraints that keep DRY from devouring itself: knowing when to stop factoring, and planning the IR's evolution from the start.
A VSCode package.json teardown
A minimal DSL extension's manifest looks roughly like the block below — the structure a hand-written extension for @frenchexdev/requirements would ship with today. The layout is faithful to the VSCode contribution-points documentation; the content is illustrative.
Proposal (design-in-public) — annotated baseline:
    {
      "name": "requirements-ide",
      "displayName": "Requirements IDE",
      "version": "0.1.0",
      "engines": { "vscode": "^1.85.0" },
      "categories": ["Programming Languages"],
      "main": "./dist/extension.js",
      "activationEvents": ["onLanguage:requirements"],
      "contributes": {
        "languages": [{
          "id": "requirements",
          "extensions": [".req.ts"],
          "aliases": ["Requirements", "req"],
          "configuration": "./language-configuration.json"
        }],
        "grammars": [{
          "language": "requirements",
          "scopeName": "source.requirements",
          "path": "./syntaxes/requirements.tmLanguage.json"
        }],
        "snippets": [{
          "language": "requirements",
          "path": "./snippets/requirements.json"
        }],
        "commands": [{
          "command": "requirements.compliance",
          "title": "Requirements: run compliance --strict"
        }],
        "taskDefinitions": [{
          "type": "requirements",
          "required": ["command"],
          "properties": {
            "command": { "type": "string", "description": "Subcommand" }
          }
        }]
      }
    }
Read against the @Language declaration of requirements-ide.spec.ts, the duplication is obvious. The string "requirements" appears seven times (language id, activation event, grammar.language, snippet.language, task.type, command prefix, extension name stem). The scope name "source.requirements" appears twice (manifest and grammar JSON). The file extension .req.ts appears in the manifest, and the TextMate grammar's injection selectors depend on the same knowledge. The command id requirements.compliance appears in the manifest and again, as a string literal in src/extension.ts, when the generated TypeScript TaskProvider registers the task. For a single DSL, this is tolerable. Across five DSLs, each with three to five commands, six to twelve tokens, and one to three snippets, the repetition is no longer a nuisance: it is a drift surface. Every "requirements" string that a refactor misses becomes a broken link between the manifest and the code, invisible at build time and noticed only at runtime.
The three categories of repetition map onto three derivation rules:
- Fields mechanically derivable from the spec. The language id (from @Language({id})), the extensions (@Language({extensions})), the scope name (@Language({scopeName})), the grammar path (fixed convention: syntaxes/<id>.tmLanguage.json), the snippets path (snippets/<id>.json), the language-configuration path. These are the fields the meta-DSL writes without asking.
- Fields that require projecting a collection. The commands array (one entry per @Executor method), the task-definition types (one per executor), the activation events (one per language, plus one per executor command that should auto-start). Collection projection, not substitution.
- Fields that must stay explicit in the spec. The extension version string (project-controlled, not spec-controlled), the engines compatibility range (toolchain-controlled), the publisher id (organisational), the marketplace categories. The meta-DSL does not invent these; the spec's @Language options carry them or the forge's CLI flags do.
The first two categories are what DRY buys. The third is the reminder that some repetition between manifest and code is accurate — it encodes a decision the spec author owns and the forge must pass through faithfully.
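The three rules can be sketched as a single pure function. This is a minimal sketch, not the forge's actual manifest emitter: `LanguageSpec`, `ExplicitFields`, and `deriveManifest` are hypothetical names, and the command-id convention `<id>.<command>` is an assumption read off the baseline manifest above. Task definitions are left out for brevity.

```typescript
// Hypothetical spec shape: only the fields named in the three categories.
interface LanguageSpec {
  id: string;
  extensions: string[];
  scopeName: string;
  aliases: string[];
  executors: { command: string; label: string }[];
}

// Fields the spec author owns; they are passed through, never derived.
interface ExplicitFields {
  version: string;
  engines: { vscode: string };
  categories: string[];
}

function deriveManifest(spec: LanguageSpec, explicit: ExplicitFields) {
  const id = spec.id;
  return {
    // Category 3: explicit fields, copied faithfully.
    version: explicit.version,
    engines: explicit.engines,
    categories: explicit.categories,
    // Category 2: one activation event per language.
    activationEvents: [`onLanguage:${id}`],
    contributes: {
      // Category 1: substitution of the id into fixed conventions.
      languages: [{
        id,
        extensions: spec.extensions,
        aliases: spec.aliases,
        configuration: "./language-configuration.json",
      }],
      grammars: [{
        language: id,
        scopeName: spec.scopeName,
        path: `./syntaxes/${id}.tmLanguage.json`,
      }],
      snippets: [{ language: id, path: `./snippets/${id}.json` }],
      // Category 2: one command entry per executor.
      commands: spec.executors.map((e) => ({
        command: `${id}.${e.command}`,
        title: e.label,
      })),
    },
  };
}

const manifest = deriveManifest(
  {
    id: "requirements",
    extensions: [".req.ts"],
    scopeName: "source.requirements",
    aliases: ["Requirements", "req"],
    executors: [{ command: "compliance", label: "Requirements: run compliance --strict" }],
  },
  { version: "0.1.0", engines: { vscode: "^1.85.0" }, categories: ["Programming Languages"] },
);
```

In this sketch the string "requirements" is typed once, in the spec; every other occurrence in the output is computed from it.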
The LanguageIR as a contract
The LanguageIR is the data type the extractor produces and every emitter consumes. Its value is not "a clever internal structure"; its value is that there is exactly one and it is a plain object.
Proposal (design-in-public) — the IR shape:
    interface LanguageIR {
      readonly schemaVersion: '2026-04-14';
      readonly language: {
        id: string;
        extensions: readonly string[];
        scopeName: string;
        aliases: readonly string[];
        features: readonly string[]; // dog-food: Feature IDs this DSL claims
      };
      readonly tokens: readonly TokenRecord[];
      readonly rules: readonly RuleRecord[];
      readonly snippets: readonly SnippetRecord[];
      readonly lspFeatures: readonly LspFeatureRecord[];
      readonly executors: readonly ExecutorRecord[];
    }

    interface TokenRecord {
      readonly name: string;
      readonly pattern: string; // RegExp source; flags forbidden
      readonly scope: string;   // TextMate scope, derived if omitted
    }

    interface SnippetRecord {
      readonly prefix: string;
      readonly body: readonly string[]; // lines with ${n:placeholders}
      readonly description?: string;
    }

    interface LspFeatureRecord {
      readonly capability: 'diagnostics' | 'hover' | 'completion' | 'definition';
      readonly methodName: string; // the @LspFeature-decorated method
    }

    interface ExecutorRecord {
      readonly command: string;
      readonly label: string;
      readonly taskKind?: 'build' | 'test' | 'run';
      readonly methodName: string;
    }
Five invariants shape the IR:
- Serialisable. Every field is a string, a number, a boolean, or a nested record or array of those. No RegExp objects, no Dates, no class instances. The IR can be JSON-stringified, committed to a snapshot file, and diffed across runs.
- Immutable. Every array is readonly. Emitters cannot mutate the IR they receive; if an emitter needs a derived view (say, tokens grouped by scope prefix), it computes it locally.
- Flat. No back-references between records. A token does not point at a rule; a rule cites a token by name. The IR is a bag of records with symbolic links, not a graph of shared-object references.
- Versioned. The schemaVersion field is the first field of every IR instance. Part of the DRY discipline is that the version lives in the IR, not in the emitters and not in the extractor: it is the IR's self-description.
- Complete. Anything an emitter needs is in the IR. An emitter that wants to know the Feature IDs this DSL claims reads ir.language.features; it does not reach back into spec.ts. The IR is the contract; going behind it is a bug.
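The Serialisable invariant is cheap to check mechanically. The sketch below assumes a hypothetical helper, `isPlainData`, that accepts only strings, finite numbers, booleans, arrays, and plain objects; the token name, pattern, and scope in the example are illustrative, not taken from the real spec.

```typescript
// Returns true when a value is built only from strings, finite numbers,
// booleans, arrays, and plain objects: the subset that survives a JSON
// round trip unchanged.
function isPlainData(value: unknown): boolean {
  if (typeof value === "string" || typeof value === "boolean") return true;
  if (typeof value === "number") return isFinite(value);
  if (Array.isArray(value)) return value.every((item) => isPlainData(item));
  if (typeof value === "object" && value !== null) {
    // Class instances such as RegExp or Date have a different prototype.
    if (Object.getPrototypeOf(value) !== Object.prototype) return false;
    const record = value as Record<string, unknown>;
    return Object.keys(record).every((key) => isPlainData(record[key]));
  }
  // null, undefined, functions, symbols, and bigints are rejected.
  return false;
}

// A fragment that respects the invariant: the pattern is RegExp *source*.
const goodFragment = {
  schemaVersion: "2026-04-14",
  tokens: [
    { name: "keyword", pattern: "\\bfeature\\b", scope: "keyword.control.requirements" },
  ],
};

// A fragment that violates it: the pattern is a live RegExp object.
const badFragment = {
  schemaVersion: "2026-04-14",
  tokens: [{ name: "keyword", pattern: /\bfeature\b/, scope: "keyword.control.requirements" }],
};

const roundTripped = JSON.parse(JSON.stringify(goodFragment));
```

A check of this kind fits naturally at the extractor's exit, so that a stray RegExp is rejected before any snapshot is written rather than silently serialised as an empty object.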
The IR is not clever. That is the point. A sophisticated IR — one that carries derived views, cached projections, normalisation — is an IR that does work, which means it has logic, which means it has bugs, which means its bugs become the emitters' bugs. A dumb IR is a boring data structure, and the emitters can be judged on their own.
Emitters as strategies over the IR
Given one IR, the meta-DSL's fan of emitters is a direct application of the Strategy pattern. Each emitter knows how to project the IR into one output artefact, and no emitter knows about any other's artefact. The seven built-ins the series commits to:
| Emitter | Reads from IR | Writes | Drawn from |
|---|---|---|---|
| Manifest | language, snippets.length, executors, lspFeatures.length > 0 | package.json | contributes.* fields |
| Grammar | tokens, rules | syntaxes/<id>.tmLanguage.json | TextMate JSON |
| Snippets | snippets | snippets/<id>.json | VSCode snippet JSON |
| Language-config | language (brackets/comments inferred or declared) | language-configuration.json | VSCode language configuration |
| LSP server | lspFeatures, tokens | src/server.ts | vscode-languageserver scaffolding |
| Extension host | language, executors, lspFeatures.length > 0 | src/extension.ts | vscode-languageclient + activation |
| Task provider | executors | src/taskProvider.ts (referenced from host) | vscode.TaskProvider scaffolding |
Strategy here means: each emitter implements the Emitter port introduced in Part 04 (name, outputs, emit(ir, fs)), swaps in and out behind the registry, and runs in any order — they do not communicate with each other. The only ordering constraint is that the host entry references the task provider, so the two files must coexist in the final sibling project, but each emitter writes its own file independently and the reference is by static path, not by cross-emitter call.
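A minimal sketch of that independence, under stated assumptions: `MiniIR`, `FileSink`, and the exact signature of `outputs` are hypothetical stand-ins, since the real port is defined in Part 04, and the two emitters write only skeletal JSON.

```typescript
// Hypothetical stand-ins for the IR and the file-system port.
interface MiniIR {
  readonly language: { readonly id: string; readonly scopeName: string };
  readonly snippets: readonly { prefix: string; body: readonly string[] }[];
}
interface FileSink {
  write(path: string, content: string): void;
}

// The port shape named in the text: name, outputs, emit(ir, fs).
interface Emitter {
  readonly name: string;
  outputs(ir: MiniIR): readonly string[];
  emit(ir: MiniIR, fs: FileSink): void;
}

const grammarEmitter: Emitter = {
  name: "grammar",
  outputs: (ir) => [`syntaxes/${ir.language.id}.tmLanguage.json`],
  emit(ir, fs) {
    const grammar = { scopeName: ir.language.scopeName, patterns: [] };
    fs.write(`syntaxes/${ir.language.id}.tmLanguage.json`, JSON.stringify(grammar, null, 2));
  },
};

const snippetsEmitter: Emitter = {
  name: "snippets",
  outputs: (ir) => [`snippets/${ir.language.id}.json`],
  emit(ir, fs) {
    const byPrefix: Record<string, { prefix: string; body: readonly string[] }> = {};
    for (const s of ir.snippets) byPrefix[s.prefix] = { prefix: s.prefix, body: s.body };
    fs.write(`snippets/${ir.language.id}.json`, JSON.stringify(byPrefix, null, 2));
  },
};

// Runs every emitter against the same IR; no emitter sees another's output.
function runAll(emitters: readonly Emitter[], ir: MiniIR): Record<string, string> {
  const files: Record<string, string> = {};
  const sink: FileSink = { write: (path, content) => { files[path] = content; } };
  for (const emitter of emitters) emitter.emit(ir, sink);
  return files;
}

const ir: MiniIR = {
  language: { id: "requirements", scopeName: "source.requirements" },
  snippets: [{ prefix: "feat", body: ["feature ${1:name}"] }],
};
const forward = runAll([grammarEmitter, snippetsEmitter], ir);
const reverse = runAll([snippetsEmitter, grammarEmitter], ir);
```

Because each emitter touches only its own path, `forward` and `reverse` hold the same files with the same contents; that property is what lets the registry swap emitters in and out freely.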
The premature-factoring trap
DRY's failure mode is well-documented: eliminating apparent repetition between two things that are incidentally similar produces a shared abstraction that breaks, painfully, the first time the two things need to diverge. The meta-DSL has three places where this trap is visible in advance, and where the design explicitly refuses to factor.
Trap one — shared "TextMate-ish" emitter. The grammar emitter writes TextMate JSON; the snippets emitter writes a different JSON with overlapping vocabulary (scope, pattern, content). A reflexive DRY move would define a JsonDslEmitter base class and have both inherit from it. This is wrong. TextMate scopes and VSCode snippet scopes are both called "scope" but are unrelated concepts — a TextMate scope is a dotted string the editor uses for colour lookup; a snippet scope is a language id. Sharing a base would couple two unrelated evolutions. The right shape is two independent emitters that happen to both write JSON.
Trap two — shared LSP-handler scaffolding between features. The four LSP handlers (diagnostics, hover, completion, definition) all register with the same vscode-languageserver connection object and all appear to follow the same "receive request, call user method, return result" skeleton. It is tempting to factor a registerHandler(capability, userMethod) helper. The factoring is fine only as long as the four capabilities keep identical shapes. The first time one of them has to push results instead of answering a request (diagnostics already does: publishDiagnostics is a notification, not a request), or has to register dynamically at runtime, the helper splits back into four, with worse names than if it had never existed. Defer the factoring until the fourth capability is identical.
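The divergence can be made concrete with a small sketch. `MiniConnection` is a hypothetical two-method stand-in modelled on the request/notification split of the Language Server Protocol, not the vscode-languageserver API itself; the method strings are the protocol's own names.

```typescript
// Hypothetical minimal connection: requests are answered, notifications are pushed.
interface MiniConnection {
  onRequest(method: string, handler: (params: unknown) => unknown): void;
  sendNotification(method: string, params: unknown): void;
}

// Only three of the four capabilities fit the request/response skeleton.
type RequestCapability = "hover" | "completion" | "definition";

function registerRequest(
  conn: MiniConnection,
  capability: RequestCapability,
  userMethod: (params: unknown) => unknown,
): void {
  conn.onRequest(`textDocument/${capability}`, userMethod);
}

// Diagnostics has a different shape: the server decides when to send them.
function publishDiagnostics(
  conn: MiniConnection,
  uri: string,
  diagnostics: readonly unknown[],
): void {
  conn.sendNotification("textDocument/publishDiagnostics", { uri, diagnostics });
}

// In-memory connection that records which methods were used, for demonstration.
function createRecordingConnection() {
  const requests: string[] = [];
  const notifications: string[] = [];
  const conn: MiniConnection = {
    onRequest: (method) => { requests.push(method); },
    sendNotification: (method) => { notifications.push(method); },
  };
  return { conn, requests, notifications };
}

const recorded = createRecordingConnection();
registerRequest(recorded.conn, "hover", () => null);
registerRequest(recorded.conn, "completion", () => []);
registerRequest(recorded.conn, "definition", () => null);
publishDiagnostics(recorded.conn, "file:///example.req.ts", []);
```

A single registerHandler(capability, userMethod) would have to special-case diagnostics on its first day, which is the signal that the four shapes are not yet one shape.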
Trap three — shared "DSL author options" struct. Every emitter needs a few authoring knobs (verbose logging, a dry-run flag, an output directory). A single EmitterOptions struct accreted over time collects every knob every emitter ever wanted, and becomes the place where "how did this field get set?" becomes impossible to answer. The right shape is that each emitter's constructor takes the options it needs; options shared across emitters are passed explicitly to each.
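One way to picture the explicit alternative, as a sketch only: the class names, option interfaces, and output directory below are hypothetical, and each emitter declares exactly the knobs it reads.

```typescript
// Each emitter owns a small options type listing only what it uses.
interface GrammarEmitterOptions {
  readonly outDir: string;
}
interface ManifestEmitterOptions {
  readonly outDir: string;
  readonly dryRun: boolean;
}

class GrammarEmitter {
  constructor(private readonly options: GrammarEmitterOptions) {}
  targetPath(id: string): string {
    return `${this.options.outDir}/syntaxes/${id}.tmLanguage.json`;
  }
}

class ManifestEmitter {
  constructor(private readonly options: ManifestEmitterOptions) {}
  targetPath(): string {
    return `${this.options.outDir}/package.json`;
  }
  // A dry run computes paths but must not write anything.
  shouldWrite(): boolean {
    return !this.options.dryRun;
  }
}

// The shared knob is passed explicitly to each emitter that needs it.
const outDir = "generated/requirements-ide"; // hypothetical location
const grammarOut = new GrammarEmitter({ outDir });
const manifestOut = new ManifestEmitter({ outDir, dryRun: true });
```

With this shape, the answer to "how did this field get set?" is always the constructor call a few lines above.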
The meta-rule is: factor in response to three repetitions of the same shape, not two. Two looks like a pattern; three proves it. Factoring on two is the move that produces most of the regret in our field.
Versioning the IR
The LanguageIR.schemaVersion field is not ornamental. The IR will evolve: a future article might add a semanticTokens record type, a future emitter might require an lspFeatures[].options field, a future DSL might need multi-document scope rules the current IR cannot express. The versioning policy, stated once here so every later emitter can assume it:
- The version is a date literal, not a SemVer number. SemVer for data schemas invites the fiction that "breaking" and "non-breaking" are distinguishable; dates just record when the shape changed.
- Forward-compatible additions — a new optional field on an existing record, a new optional record array — bump the date but do not break existing emitters. Emitters read what they know; unknown fields are ignored.
- Breaking changes (removing a field, renaming a field, changing a field's type) also bump the date, and require a migration: a pure function migrate_<oldDate>_to_<newDate>(old): new committed to the forge alongside the type definition. The extractor emits only the current version; old IRs kept as snapshots are migrated on read.
- The schema version is validated on entry to every emitter. An emitter writes its required version as a constant and refuses to run against any other. This is a paranoid check, since the extractor is the only producer and always writes the current version, but it catches the failure mode where an out-of-tree emitter, written against an old forge, is plugged into a new one.
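The last two rules can be sketched together. Everything specific here is an assumption made for illustration: the older date 2026-01-10, the renamed field (scope to scopeName), and the helper names `readSnapshot` and `assertVersion` are hypothetical. Dates are written with underscores in the function name because hyphens are not valid in TypeScript identifiers.

```typescript
// Hypothetical older shape, kept only so snapshots can be migrated.
interface IR_2026_01_10 {
  schemaVersion: "2026-01-10";
  language: { id: string; scope: string };
}
// Current shape (reduced to the fields this sketch needs).
interface IR_2026_04_14 {
  schemaVersion: "2026-04-14";
  language: { id: string; scopeName: string };
}

// Pure migration: builds a new object and leaves the old one untouched.
function migrate_2026_01_10_to_2026_04_14(old: IR_2026_01_10): IR_2026_04_14 {
  return {
    schemaVersion: "2026-04-14",
    language: { id: old.language.id, scopeName: old.language.scope },
  };
}

// Snapshots are migrated on read; unknown versions are rejected.
function readSnapshot(json: string): IR_2026_04_14 {
  const raw = JSON.parse(json) as { schemaVersion?: unknown };
  switch (raw.schemaVersion) {
    case "2026-04-14":
      return raw as IR_2026_04_14;
    case "2026-01-10":
      return migrate_2026_01_10_to_2026_04_14(raw as IR_2026_01_10);
    default:
      throw new Error(`Unsupported IR schemaVersion: ${String(raw.schemaVersion)}`);
  }
}

// Emitter-side guard: the required version is a constant in the emitter.
const REQUIRED_VERSION = "2026-04-14";
function assertVersion(ir: { schemaVersion: string }): void {
  if (ir.schemaVersion !== REQUIRED_VERSION) {
    throw new Error(`Emitter requires ${REQUIRED_VERSION}, got ${ir.schemaVersion}`);
  }
}

const oldSnapshot = JSON.stringify({
  schemaVersion: "2026-01-10",
  language: { id: "requirements", scope: "source.requirements" },
});
const migrated = readSnapshot(oldSnapshot);
assertVersion(migrated); // passes: the snapshot was migrated on read
```

Keeping the migration next to the type definition means the old interface can be deleted in the same change that retires the last snapshot written in that shape.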
This policy is how packages/requirements already treats its $schemaVersion field on specs. The meta-DSL inherits the pattern verbatim — one more example of transposing a monorepo convention rather than inventing a new one.
What the IR does not replace
A closing honesty box. The IR is the single source of truth for what the DSL is. It is not the source of truth for:
- What the extension's version number is. That lives in package.json, derived from the forge's CLI flags or the invoking project's own package.json. The IR does not carry marketplace metadata.
- What the user's project convention is. Which folder under the repo the generated extension lives in, what the package name prefix is, whether the extension ships bundled via esbuild or unbundled: all forge-level concerns, not IR-level.
- How the forge logs its work. Log messages, progress reporting, error formatting are the Logger port's responsibility, not the IR's.
Drawing the line explicitly is what keeps the IR from becoming a god-object-by-accretion. Everything that is about the DSL is in the IR. Everything that is about running the forge is outside it. Part 06 takes that clean split and uses it to build a test strategy that verifies the emitters without ever invoking ide-forge build end to end.
Build counterpart
The IR-as-contract posture is the pivot of the companion build series, Ide.Dsl — Build. Build 03 — LanguageIR as contract gives the IR an ajv-validated JSON Schema, branded primitives, and a date-as-version ($schemaVersion: '2026-04-14'); Build 02 — The extractor is the sole writer; Build 07 — LSP server emitter is one of several readers — never mutating, never crosstalking with siblings. The "one writer, many readers" discipline pinned down here is what makes the build series's emitter-per-article structure possible.