Part 07 — Syntax micro-DSL: tokens, scopes, tmLanguage, semantic tokens

The Syntax micro-DSL adds colour and structure recognition on top of the language identity declared in article 06. It produces two layered outputs: a static TextMate grammar JSON file that VSCode applies on every keystroke without involving any server, and an optional semantic-tokens overlay that resolves cross-file or context-dependent colour through the language server. The two layers are deliberately separate because they pay different costs and serve different needs: the TextMate layer is cheap and limited; the semantic layer is rich and round-tripped through the LSP. A well-designed DSL uses both, with TextMate carrying the load on tokens whose colour can be decided locally and semantic tokens reserved for the cases TextMate cannot express.

Concern

VSCode's highlighting system has two layers. The first, TextMate, is a regex-based recogniser that runs in the editor process: each line is scanned with the language's grammar, tokens are matched, and scopes are assigned. The grammar lives in a JSON file declared via contributes.grammars. It is fast, it has no awareness of cross-file context, and it has well-known limitations — it cannot tell a free identifier from a known symbol, it cannot colour a property reference based on the type of its receiver, it cannot follow a chain of extends clauses to recognise an inherited member. These limitations are why the second layer exists.
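To make the first layer concrete, here is a minimal sketch of a line-scoped, regex-driven recogniser. It is not VSCode's actual tokenizer (which uses the Oniguruma regex engine and a stack of begin/end rules); the `Rule` and `ScopedToken` types and the `tokenizeLine` function are hypothetical names used only for illustration.

```typescript
// Hypothetical, simplified model of a TextMate-style rule: a regex plus the scope it assigns.
interface Rule {
  match: RegExp;
  scope: string;
}

interface ScopedToken {
  start: number;
  end: number;
  text: string;
  scope: string;
}

// Scan one line. At each position the first rule (in declaration order) that
// matches exactly there wins; if none matches, advance by one character.
function tokenizeLine(line: string, rules: Rule[]): ScopedToken[] {
  const tokens: ScopedToken[] = [];
  let pos = 0;
  while (pos < line.length) {
    let matched = false;
    for (const rule of rules) {
      // The sticky flag pins the match to `pos` while \b can still see the previous character.
      const re = new RegExp(rule.match.source, 'y');
      re.lastIndex = pos;
      const m = re.exec(line);
      if (m && m[0].length > 0) {
        tokens.push({ start: pos, end: pos + m[0].length, text: m[0], scope: rule.scope });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) pos += 1;
  }
  return tokens;
}

const rules: Rule[] = [
  { match: /\bPriority\.(Low|Medium|High|Critical)\b/, scope: 'constant.language.priority.requirements' },
  { match: /@(Feature|Satisfies|Refines|FeatureTest|Verifies|Expects|Exclude)\b/, scope: 'storage.type.annotation.requirements' },
];

const lineTokens = tokenizeLine('@Feature({ priority: Priority.High })', rules);
```

The sketch sees only the characters of the current line, which is exactly the limitation described above: it can recognise the shape of an identifier but has no way to know whether that identifier resolves to a declared symbol elsewhere.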

The second, semantic tokens, is an LSP capability (the textDocument/semanticTokens request family: full, full/delta and range) where the language server provides per-token classifications based on actual analysis. The editor merges the semantic-token overlay with the TextMate-derived scopes; rules in the user's theme can target either source. Semantic tokens cost a round-trip per relevant edit, so they are reserved for the scopes TextMate cannot reasonably express.
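What travels over the wire in a semantic-tokens response is a flat integer array: five numbers per token (line delta, start-character delta, length, token-type index, modifier bitset), with positions relative to the previous token. The sketch below shows that encoding as defined by the LSP specification; the `AbsoluteToken` and `Legend` types and the `encodeSemanticTokens` function are illustrative names, not part of any library.

```typescript
interface AbsoluteToken {
  line: number; // zero-based line
  char: number; // zero-based start character
  length: number;
  type: string;
  modifiers: string[];
}

interface Legend {
  tokenTypes: string[];
  tokenModifiers: string[];
}

// Encode tokens into the relative five-integer form used by LSP semantic tokens.
function encodeSemanticTokens(tokens: AbsoluteToken[], legend: Legend): number[] {
  const sorted = [...tokens].sort((a, b) => a.line - b.line || a.char - b.char);
  const data: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const t of sorted) {
    const typeIndex = legend.tokenTypes.indexOf(t.type);
    if (typeIndex < 0) continue; // type not announced in the legend: skip the token
    let modifierBits = 0;
    for (const m of t.modifiers) {
      const bit = legend.tokenModifiers.indexOf(m);
      if (bit >= 0) modifierBits |= 1 << bit;
    }
    const deltaLine = t.line - prevLine;
    // The start character is relative only when the token stays on the same line.
    const deltaChar = deltaLine === 0 ? t.char - prevChar : t.char;
    data.push(deltaLine, deltaChar, t.length, typeIndex, modifierBits);
    prevLine = t.line;
    prevChar = t.char;
  }
  return data;
}

const legend: Legend = { tokenTypes: ['enumMember'], tokenModifiers: ['readonly'] };
const encoded = encodeSemanticTokens(
  [
    { line: 2, char: 4, length: 10, type: 'enumMember', modifiers: ['readonly'] },
    { line: 2, char: 20, length: 10, type: 'enumMember', modifiers: [] },
    { line: 5, char: 1, length: 9, type: 'enumMember', modifiers: ['readonly'] },
  ],
  legend,
);
// encoded is [2, 4, 10, 0, 1,  0, 16, 10, 0, 0,  3, 1, 9, 0, 1]
```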

The Syntax micro-DSL gives the DSL author one declarative surface that produces both layers. Tokens whose colour can be decided locally compile to TextMate patterns; tokens whose colour requires cross-file resolution compile to semantic-token classifications served by the LSP. The author does not have to assign each token to a layer by hand: the micro-DSL's default heuristics (regex predictability, presence of @ReferenceLink resolution) make the choice, and an explicit override is available.
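The allocation rule can be pictured as a small decision function. This is only a sketch of the idea, under loud assumptions: the `TokenDeclaration` shape, the `resolvesReference` flag (standing in for "an @ReferenceLink resolution is involved"), the `layer` override field, and the order of the checks are all hypothetical and are not taken from the micro-DSL's actual implementation.

```typescript
type Layer = 'textmate' | 'semantic';

// Hypothetical declaration shape, for illustration only.
interface TokenDeclaration {
  name: string;
  pattern?: RegExp; // a regex whose match can be decided from the line alone
  resolvesReference?: boolean; // assumption: set when @ReferenceLink resolution is needed
  layer?: Layer; // explicit override by the author
}

function chooseLayer(decl: TokenDeclaration): Layer {
  if (decl.layer) return decl.layer; // the explicit override always wins
  if (decl.resolvesReference) return 'semantic'; // needs cross-file knowledge
  if (decl.pattern) return 'textmate'; // locally decidable
  return 'semantic'; // no local pattern: defer to the server
}

const priorityLayer = chooseLayer({ name: 'priority-literal', pattern: /\bPriority\.(Low|Medium|High|Critical)\b/ });
const referenceLayer = chooseLayer({ name: 'feature-reference', resolvesReference: true });
const forcedLayer = chooseLayer({ name: 'feature-id', pattern: /FEATURE-\d+/, layer: 'semantic' });
```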

The Surface

import { Token, SemanticToken } from '@frenchexdev/ide-dsl-syntax';
import { Concept, Property } from '@frenchexdev/ide-dsl-kernel';

@Concept({ id: 'cmf.req.Feature' })
export class Feature {
  @Property({ type: 'string', constraint: /^FEATURE-\d+$/ })
  @Token('feature-id', { scope: 'entity.name.tag.feature.requirements' })
  id!: string;
}

// File-level token declarations (static, regex-driven)
@Token('priority-literal', {
  pattern: /\bPriority\.(Low|Medium|High|Critical)\b/,
  scope: 'constant.language.priority.requirements',
})
export class PriorityLiteralToken {}

@Token('decorator-keyword', {
  pattern: /@(Feature|Satisfies|Refines|FeatureTest|Verifies|Expects|Exclude)\b/,
  scope: 'storage.type.annotation.requirements',
})
export class DecoratorKeywordToken {}

// Cross-file semantic token (resolved by the LSP)
@SemanticToken('feature-reference', {
  classifies: (node) => node.kind === 'FeatureRef',
  kind: 'enumMember',
  modifiers: ['readonly'],
})
export class FeatureReferenceToken {}

@Token is the static, regex-driven shape: a name, a regex pattern (or a binding to a @Property constraint when colocated on a Concept field), a TextMate scope. The scope name follows the TextMate convention; standard scopes (storage.type, entity.name.tag, constant.language, ...) inherit theme colours; custom suffixes refine without overriding the base. @SemanticToken is the dynamic shape: a classifier function over the kernel AST, plus a semantic-token kind and modifiers from the LSP standard taxonomy. The classifier runs server-side, on every relevant edit, and contributes to the overlay the editor merges.

The colocation pattern matters. Putting @Token('feature-id', { ... }) directly on the Feature.id property keeps the highlighting rule next to the data it colours. The kernel's extractor reads both decorators on the same field and the Syntax micro-DSL pairs them: the constraint regex from @Property({ constraint }) becomes the TextMate pattern automatically, with the scope from @Token providing the colour. In the emitted grammar the constraint's ^ and $ anchors appear as \b word boundaries, since the pattern has to match inside a line rather than against a whole value. This is one of the article 04 mass-production examples: one declarative fact (constraint) feeds both Diagnostics (article 11) and Syntax (here).
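The pairing of a constraint with a TextMate pattern can be sketched as a small conversion function. The helper name `constraintToMatch` is hypothetical, and replacing anchors with word boundaries is an assumption inferred from the feature-id entry in the emitted grammar, not a documented rule of the micro-DSL.

```typescript
// Hypothetical helper: turn a whole-value constraint into an in-line match string.
// Assumption: ^ and $ anchors are swapped for \b word boundaries, which is only
// appropriate when the pattern starts and ends with word characters.
function constraintToMatch(constraint: RegExp): string {
  let source = constraint.source;
  if (source.startsWith('^')) source = source.slice(1);
  // Drop a trailing end anchor, but keep an escaped dollar sign (\$).
  if (source.endsWith('$') && !source.endsWith('\\$')) source = source.slice(0, -1);
  return `\\b${source}\\b`;
}

const featureIdMatch = constraintToMatch(/^FEATURE-\d+$/);
// featureIdMatch holds the characters \bFEATURE-\d+\b;
// JSON.stringify doubles each backslash when the grammar file is written.
const featureIdJson = JSON.stringify(featureIdMatch);
```

The same constraint object stays usable as a validator for the whole value, while the derived string is what a grammar emitter would write under "match".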

Kernel boundary

What this micro-DSL takes from the kernel:

  • The Structure Model (every Concept and its properties, with constraints).
  • The current AST at server time (for @SemanticToken classifiers).
  • The LanguageRegistry from article 06's Language micro-DSL (to find the scopeName to use as the grammar root).

What it gives back:

  • The TextMate grammar JSON file, registered with the Extension host's emitter chain.
  • A semantic-token contribution declared on the LSP host's contribution registry, picked up at server startup.

The boundary is one-way for the static layer (Syntax reads kernel, emits artefacts, no kernel write) and partially round-tripped for the semantic layer (the LSP host calls back into Syntax's classifiers per request, the classifiers walk the kernel AST). The classifiers are pure functions over kernel state — they never mutate.
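The per-request callback can be sketched as a read-only walk over the AST. The `AstNode`, `SemanticTokenDecl` and `ClassifiedToken` shapes below are hypothetical stand-ins for the kernel's real types, which this article does not show; only the `classifies`, `kind` and `modifiers` fields mirror the @SemanticToken options from the Surface section.

```typescript
// Hypothetical AST node shape; the kernel's actual node type is not shown in this article.
interface AstNode {
  readonly kind: string;
  readonly line: number;
  readonly char: number;
  readonly length: number;
  readonly children?: readonly AstNode[];
}

interface SemanticTokenDecl {
  name: string;
  classifies: (node: AstNode) => boolean;
  kind: string;
  modifiers: readonly string[];
}

interface ClassifiedToken {
  line: number;
  char: number;
  length: number;
  kind: string;
  modifiers: readonly string[];
}

// Depth-first walk that reads nodes and never writes to them.
function classify(root: AstNode, decls: readonly SemanticTokenDecl[]): ClassifiedToken[] {
  const out: ClassifiedToken[] = [];
  const visit = (node: AstNode): void => {
    const decl = decls.find((d) => d.classifies(node));
    if (decl) {
      out.push({ line: node.line, char: node.char, length: node.length, kind: decl.kind, modifiers: decl.modifiers });
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return out;
}

const decls: SemanticTokenDecl[] = [
  { name: 'feature-reference', classifies: (n) => n.kind === 'FeatureRef', kind: 'enumMember', modifiers: ['readonly'] },
];

const ast: AstNode = {
  kind: 'File', line: 0, char: 0, length: 0,
  children: [
    { kind: 'Decorator', line: 3, char: 0, length: 10, children: [{ kind: 'FeatureRef', line: 3, char: 11, length: 10 }] },
    { kind: 'FeatureRef', line: 7, char: 4, length: 9 },
  ],
};

const classified = classify(ast, decls);
```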

Emitted artefacts

// syntaxes/requirements.tmLanguage.json (generated)
{
  "$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
  "name": "Requirements",
  "scopeName": "source.ts.requirements",
  "patterns": [
    { "include": "#priority-literal" },
    { "include": "#decorator-keyword" },
    { "include": "#feature-id" },
    { "include": "source.ts" }
  ],
  "repository": {
    "priority-literal": {
      "match": "\\bPriority\\.(Low|Medium|High|Critical)\\b",
      "name": "constant.language.priority.requirements"
    },
    "decorator-keyword": {
      "match": "@(Feature|Satisfies|Refines|FeatureTest|Verifies|Expects|Exclude)\\b",
      "name": "storage.type.annotation.requirements"
    },
    "feature-id": {
      "match": "\\bFEATURE-\\d+\\b",
      "name": "entity.name.tag.feature.requirements"
    }
  }
}

// package.json contributions (merged)
{
  "contributes": {
    "grammars": [
      {
        "language": "requirements",
        "scopeName": "source.ts.requirements",
        "path": "./syntaxes/requirements.tmLanguage.json"
      }
    ],
    "semanticTokenScopes": [
      {
        "language": "requirements",
        "scopes": {
          "enumMember.readonly": ["entity.name.tag.feature.requirements"]
        }
      }
    ]
  }
}

The TextMate file includes source.ts at the end of its patterns array, a grammar inheritance trick: when a requirements pattern and a TypeScript pattern match at the same position, the earlier (requirements) pattern wins, and everything else falls through to TypeScript's grammar so that ordinary TS constructs (keywords, identifiers, types) keep their highlighting. This is what makes a .req.ts file readable as both "a requirements file" and "a TypeScript file" depending on which scope rule wins.
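The shape of the generated grammar follows mechanically from the token declarations. The sketch below assembles it with the fallback include placed last; `StaticToken`, `TmGrammar` and `buildGrammar` are hypothetical names, and the real emitter (which also writes the "$schema" field and the kernel Banner) is not shown in this article.

```typescript
interface StaticToken {
  name: string;
  match: string; // regex source, before JSON escaping
  scope: string;
}

interface TmGrammar {
  name: string;
  scopeName: string;
  patterns: { include: string }[];
  repository: Record<string, { match: string; name: string }>;
}

// Own patterns first, base-language fallback last, so local rules win ties.
function buildGrammar(name: string, scopeName: string, tokens: StaticToken[], fallbackScope: string): TmGrammar {
  const repository: Record<string, { match: string; name: string }> = {};
  for (const t of tokens) {
    repository[t.name] = { match: t.match, name: t.scope };
  }
  return {
    name,
    scopeName,
    patterns: [...tokens.map((t) => ({ include: `#${t.name}` })), { include: fallbackScope }],
    repository,
  };
}

const grammar = buildGrammar(
  'Requirements',
  'source.ts.requirements',
  [
    { name: 'priority-literal', match: '\\bPriority\\.(Low|Medium|High|Critical)\\b', scope: 'constant.language.priority.requirements' },
    { name: 'feature-id', match: '\\bFEATURE-\\d+\\b', scope: 'entity.name.tag.feature.requirements' },
  ],
  'source.ts',
);

const grammarJson = JSON.stringify(grammar, null, 2);
```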

The semanticTokenScopes block maps semantic-token selectors back to TextMate scope names. VSCode uses the mapping as a fallback when the active theme has no rule for the semantic token itself, so a theme that colours entity.name.tag.feature will also colour enumMember.readonly tokens in this language, keeping the visual presentation consistent across the two layers.
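The selector key in that block is the token kind followed by its modifiers, joined with dots. A minimal sketch of deriving the contribution from semantic-token declarations follows; `SemanticMapping` and `buildSemanticTokenScopes` are hypothetical names, and pairing each semantic token with an explicit list of fallback scopes is an assumption about how the emitter is fed.

```typescript
interface SemanticMapping {
  kind: string;
  modifiers: string[];
  fallbackScopes: string[]; // TextMate scopes a theme may already colour
}

interface SemanticTokenScopesEntry {
  language: string;
  scopes: Record<string, string[]>;
}

// Build one semanticTokenScopes entry; the selector is "kind.modifier1.modifier2...".
function buildSemanticTokenScopes(language: string, mappings: SemanticMapping[]): SemanticTokenScopesEntry {
  const scopes: Record<string, string[]> = {};
  for (const m of mappings) {
    const selector = [m.kind, ...m.modifiers].join('.');
    scopes[selector] = m.fallbackScopes;
  }
  return { language, scopes };
}

const scopesEntry = buildSemanticTokenScopes('requirements', [
  { kind: 'enumMember', modifiers: ['readonly'], fallbackScopes: ['entity.name.tag.feature.requirements'] },
]);
```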

Both files carry kernel Banners.

Composition with peers

  • Language (article 06) — supplies the scopeName and id.
  • LSP host (article 20) — registers the semantic-token classifier as a textDocument/semanticTokens handler.
  • Hover (article 10) — independently inspects tokens at the cursor; both micro-DSLs see the same kernel AST node and their roles do not overlap (Hover renders content; Syntax classifies for colour).
  • Completion (article 08) — reads constraint regexes the Syntax micro-DSL has already computed for @Property({ constraint }) patterns; the regex computation is shared via the kernel's Structure Model, not duplicated.

The composition is asymmetric: Syntax is heavily consumed by other micro-DSLs (everyone needs to know what tokens look like), heavily consumes the kernel, and depends on no other micro-DSL except Language, which supplies the scopeName and id.

MPS aspect referent

The relevant MPS aspect is Editor, specifically the cell colour rules sub-aspect. MPS lets a language author attach colour metadata to the cells used in projection editing; we adopt the per-Concept declarative shape (colour rules colocated with the data they colour), drop the projectional cell binding (we do not project, we render text), and translate the metadata to TextMate + semantic tokens. The conceptual move — colour is data on the metamodel, not separate config — is the contribution we keep from MPS; the implementation is entirely VSCode-shaped.

Boundary justification

Why is highlighting not in the LSP host? The TextMate layer must run synchronously on every keystroke; round-tripping through an LSP server would add latency the editor cannot tolerate. The semantic layer is served by the LSP host, but the host only routes; it does not own the classification logic. Splitting the concerns lets the static layer ship without an LSP dependency at all (a consumer can use Syntax with no language server and still get fast highlighting).

Why is the static layer not in the Language micro-DSL? Because the unit of evolution is different (article 06 boundary justification, restated). The language identity changes rarely; the syntax grammar evolves continuously. Coupling them means every grammar change risks an accidental identity change. Splitting them lets the identity stabilise while syntax keeps moving.

Requirements

FEAT-MICRODSL-07 in assets/features.ts:

  • tokenDeclarationSurfaceShown — the Surface section shows @Token and @SemanticToken with colocation and standalone forms.
  • tmLanguageEmissionPathDescribed — the Emitted artefacts section walks both files, names the grammar inheritance trick, names the Banner.
  • semanticTokensFallbackJustified — the Concern section spells out TextMate's limitations and why semantic tokens exist; the Surface section shows the binding.
  • boundaryAgainstHighlightingInLspExplained — the Boundary justification section names the keystroke-latency argument.

Article 08 picks up with Completion, the first of the LSP-routed micro-DSLs.
