Phase 2 — Extraction via TypeScript Compiler API

Phase 1 filled in missing transitions by reading JSDoc diagrams, canTransition switches, and state assignments. Every machine now has an explicit transitions array in its @FiniteStateMachine decorator. Phase 2 reads those decorators — and everything else they declare — to produce a structured graph of the entire SPA.

The input is a set of TypeScript source files. The output is data/state-machines.json: 43 machine nodes, 10 adapter nodes, 27 edges, and a generation timestamp. This file is the single source of truth for the interactive explorer (Phase 3), the composition graph, the topology scanner (Part VII), and the compliance report. Every downstream consumer reads from this JSON. No downstream consumer reads the TypeScript source directly.

This part documents the graph schema, the DI seams that keep extraction pure and testable, the AST-walking functions that read decorator metadata, the as const unwrapping that recovers literal types from TypeScript's AST representation, the three edge-building algorithms, the composition root that ties everything together, and the final JSON output with a real excerpt from this codebase.

By the end, you will understand: why the extractor is 995 lines of pure TypeScript Compiler API code with zero side effects, how each field in the @FiniteStateMachine decorator maps to a graph property, and how import declarations between machine files become structural edges in the composition graph.

The Graph Schema

The extractor produces a StateMachineGraph. Three types compose it: MachineNode, AdapterNode, and GraphEdge. Two node categories, three edge kinds. The schema is flat — no nesting, no circular references, no opaque IDs. Every id is a human-readable string prefixed by its category: machine:page-load-state, adapter:SidebarMask.

interface StateMachineGraph {
  generatedAt: string;          // ISO timestamp
  machines:    MachineNode[];   // 43 entries
  adapters:    AdapterNode[];   // 10 entries
  edges:       GraphEdge[];     // 27 entries
}

The generatedAt field serves two purposes. First, it tells the developer when the graph was last rebuilt — a stale graph is a common source of confusion ("the explorer shows the old states"). Second, it provides a deterministic value for tests: the ExtractorEnv.clock function is injected, so unit tests pass () => '2026-01-01T00:00:00.000Z' and snapshot assertions are stable.

MachineNode

Every src/lib/*.ts file that carries a @FiniteStateMachine decorator produces one MachineNode. The extractor reads both the decorator properties and the module's exported symbols.

interface MachineNode {
  id:          string;            // 'machine:page-load-state'
  name:        string;            // 'page-load-state'
  file:        string;            // 'src/lib/page-load-state.ts'
  exports:     string[];          // every exported symbol
  functions:   string[];          // subset of exports that are functions or const factories
  states:      string[];          // from decorator: ['idle', 'loading', ...]
  events:      string[];          // from decorator: ['startLoad', 'markRendering', ...]
  description?: string;           // one-line from decorator
  transitions: TransitionEntry[]; // from decorator or AST inference
  emits:       string[];          // CustomEvent names this FSM dispatches
  listens:     string[];          // CustomEvent names this FSM listens to
  guards:      string[];          // named guard predicates
  feature?:    { id: string; ac: string }; // requirement link
  scope?:      'singleton' | 'scoped' | 'transient';
  category:    'machine';         // discriminant
}

The id is always 'machine:' + basename where basename is the filename without .ts. This makes IDs predictable: given the file src/lib/copy-feedback-state.ts, the ID is machine:copy-feedback-state. No lookup table needed.
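The rule is small enough to sketch as a standalone function. The helper name below is hypothetical, and the sketch assumes the path uses forward slashes; only the 'machine:' + basename rule comes from the text.

```typescript
// Hypothetical helper name; the rule ('machine:' + filename without .ts)
// is the one described above.
function machineIdFromPath(relPath: string): string {
  // Assumes forward slashes, so the last segment is the filename.
  const filename = relPath.split('/').pop() ?? relPath;
  return 'machine:' + filename.replace(/\.ts$/, '');
}

console.log(machineIdFromPath('src/lib/copy-feedback-state.ts'));
// machine:copy-feedback-state
```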

The exports array lists every top-level exported declaration — functions, variables, types, interfaces. The functions array is a strict subset: only exported functions and const variable declarations (which in this codebase are always factory functions). The distinction matters for the explorer's per-machine popover, which shows "transition methods" — the functions list — without the noise of re-exported type aliases and interfaces.

The transitions array uses TransitionEntry:

interface TransitionEntry {
  method: string;   // the function/method that triggers the transition
  to:     string;   // target state
  from:   string;   // source state, or '*' for wildcard
}

This maps directly from the decorator's { from, to, on } entries — the on field becomes method. The name change is deliberate: in the decorator, on reads as "on this event"; in the graph, method reads as "this method triggers the transition." Different audiences, different vocabulary.

AdapterNode

Adapters live in src/app-shared.ts. Each follows one of three module shapes: an arrow-function IIFE (const SidebarMask = (() => { ... })()), a function-expression IIFE (const SidebarResize = (function() { ... })()), or a plain object literal (const TerminalDots = { init() { ... } }). The extractor detects all three patterns and produces one AdapterNode per match.

interface AdapterNode {
  id:        string;       // 'adapter:SidebarMask'
  name:      string;       // 'SidebarMask'
  file:      string;       // 'src/app-shared.ts'
  machines:  string[];     // machine IDs this adapter references
  emits:     string[];     // CustomEvent names this adapter dispatches
  listens:   string[];     // CustomEvent names this adapter listens to
  category:  'adapter';    // discriminant
}

The machines array is built by walking the IIFE body for identifiers that match exported symbols from any machine. If the adapter body references createScrollSpyMachine and createSidebarResizeState, the extractor resolves those identifiers to machine:scroll-spy-machine and machine:sidebar-resize-state via a pre-built export index.

The emits and listens arrays are inferred from AST patterns — dispatchEvent(new CustomEvent('sidebar-mask-change')) and addEventListener('dblclick', handler). Standard DOM events (click, scroll, resize, keydown, etc.) are excluded via a 40-entry deny set. Only application-level custom events appear.
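The filtering step can be sketched in isolation. The snippet assumes the candidate names have already been collected from the AST. The helper name and the four-entry set are illustrative; the real deny set is described as having 40 entries and is not reproduced here.

```typescript
// Illustrative subset: only the four names the text mentions explicitly.
const STANDARD_DOM_EVENTS: ReadonlySet<string> = new Set([
  'click', 'scroll', 'resize', 'keydown',
]);

// Hypothetical helper: keep only names that are not standard DOM events.
function keepCustomEvents(names: readonly string[]): string[] {
  return names.filter(name => !STANDARD_DOM_EVENTS.has(name));
}

console.log(keepCustomEvents(['click', 'sidebar-mask-change', 'resize']));
// [ 'sidebar-mask-change' ]
```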

GraphEdge

Edges connect nodes. Three kinds:

interface GraphEdge {
  from:       string;                         // node id
  to:         string;                         // node id
  kind:       'imports' | 'event' | 'composes';
  eventName?: string;                         // only when kind === 'event'
}

Kind	Meaning	Detection
`imports`	Adapter references a machine's exported symbol	Identifier scan in adapter IIFE body
`event`	Node A emits event X, node B listens to event X	Cross-reference emits/listens arrays
`composes`	Machine A imports from machine B's module	Import declaration scan between machine files

The eventName field is only set for event edges. It carries the string name of the custom event — 'app-ready', 'sidebar-mask-change', 'scrollspy-active' — so the explorer can label the edge.
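A consumer might turn that field into a label along these lines. This is a sketch under assumptions: the edgeLabel function is hypothetical, and the explorer's actual rendering code is not shown in this part.

```typescript
interface GraphEdge {
  from: string;
  to: string;
  kind: 'imports' | 'event' | 'composes';
  eventName?: string;
}

// Hypothetical labeling rule: event edges show the event name,
// every other edge falls back to its kind.
function edgeLabel(edge: GraphEdge): string {
  return edge.kind === 'event' && edge.eventName ? edge.eventName : edge.kind;
}

console.log(edgeLabel({
  from: 'adapter:SidebarMask',
  to: 'machine:terminal-dots-state',
  kind: 'event',
  eventName: 'sidebar-mask-change',
}));
// sidebar-mask-change
```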

Edge deduplication uses priority: if both an imports edge and a composes edge connect the same (from, to) pair, only the composes edge survives. The priority order is composes > event > imports. This prevents the graph from showing two parallel edges between a coordinator and its sub-machine. The composes edge records ownership, so it is the more informative of the two.

The ExtractorEnv Interface

The extractor has zero side effects. No fs.readFile. No new Date(). No process.argv. Everything the extractor needs from the outside world is injected through a single interface:

interface ExtractorEnv {
  libSources:       LibSource[];     // pre-loaded src/lib/*.ts files
  appSharedSource:  LibSource;       // pre-loaded src/app-shared.ts
  clock:            () => string;    // deterministic timestamp
}

Each LibSource is a pair of { relPath: string; text: string } — the path relative to the repo root (with forward slashes) and the raw TypeScript source text. The CLI shell (scripts/extract-state-machines.ts) reads the filesystem and builds these objects. The unit tests build them from string literals:

// In a unit test — no filesystem, no clock, no side effects
const env: ExtractorEnv = {
  libSources: [
    { relPath: 'src/lib/my-machine.ts', text: `
      import { FiniteStateMachine } from './finite-state-machine';

      @FiniteStateMachine({
        states: ['idle', 'active'] as const,
        events: ['activate'] as const,
        transitions: [
          { from: 'idle', to: 'active', on: 'activate' },
        ] as const,
        emits:   [] as const,
        listens: [] as const,
        guards:  [] as const,
        description: 'Test machine.',
        feature: { id: 'TEST', ac: 'activates' } as const,
      })
      export class MyMachineFsm {}

      export function createMyMachine() { /* ... */ }
    ` },
  ],
  appSharedSource: { relPath: 'src/app-shared.ts', text: '' },
  clock: () => '2026-01-01T00:00:00.000Z',
};

const graph = buildGraph(env);
expect(graph.machines).toHaveLength(1);
expect(graph.machines[0].states).toEqual(['idle', 'active']);

This design makes extraction trivially testable. The test suite for state-machine-extractor.ts uses in-memory source strings exclusively. No filesystem mocks. No temporary files. No cleanup. The test runner loads source text from describe blocks and asserts against the returned graph structure. If the TypeScript Compiler API changes behavior across versions, the tests catch it immediately — they exercise the same code paths as production.

The clock function is the simplest seam. Production code passes () => new Date().toISOString(). Tests pass a constant. The generatedAt field in the output is deterministic in tests and current in production. One function parameter eliminates an entire category of flaky snapshot failures.

There is also an AsyncFileReader interface and a loadLibSources helper for the CLI shell:

interface AsyncFileReader {
  readFile(absPath: string): Promise<string>;
}

async function loadLibSources(
  libDir:         string,
  filenames:      readonly string[],
  reader:         AsyncFileReader,
  relPathPrefix:  string = 'src/lib',
): Promise<LibSource[]> {
  const reads = filenames.map(async (f) => {
    const text = await reader.readFile(path.join(libDir, f));
    return { relPath: `${relPathPrefix}/${f}`, text };
  });
  return Promise.all(reads);
}

The CLI shell enumerates src/lib/ via readdir, passes the filenames to loadLibSources with an fs.promises-backed reader, and hands the result to buildGraph. The reader must request 'utf8' explicitly, because fs.promises.readFile resolves to a Buffer when no encoding is given. The library never touches the filesystem. The boundary is crisp: IO in the shell, pure computation in the library.
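The shell side of that boundary might look roughly like the sketch below. The names utf8Reader and loadFromDisk are hypothetical, and the real scripts/extract-state-machines.ts is not shown in this part. The one detail worth noting is that the reader pins the encoding so that readFile yields a string.

```typescript
import { promises as fs } from 'node:fs';
import * as path from 'node:path';

interface LibSource { relPath: string; text: string; }
interface AsyncFileReader { readFile(absPath: string): Promise<string>; }

// Same shape as the loadLibSources helper shown above.
async function loadLibSources(
  libDir: string,
  filenames: readonly string[],
  reader: AsyncFileReader,
  relPathPrefix: string = 'src/lib',
): Promise<LibSource[]> {
  return Promise.all(filenames.map(async (f) => ({
    relPath: `${relPathPrefix}/${f}`,
    text: await reader.readFile(path.join(libDir, f)),
  })));
}

// Hypothetical adapter: without an encoding, fs.promises.readFile
// resolves to a Buffer, which would not satisfy Promise<string>.
const utf8Reader: AsyncFileReader = {
  readFile: (absPath) => fs.readFile(absPath, 'utf8'),
};

// Hypothetical shell helper: enumerate *.ts files, then delegate.
async function loadFromDisk(libDir: string): Promise<LibSource[]> {
  const filenames = (await fs.readdir(libDir))
    .filter(f => f.endsWith('.ts'))
    .sort();
  return loadLibSources(libDir, filenames, utf8Reader);
}
```

Sorting the filenames is an extra assumption made here so that the loaded order does not depend on the filesystem.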

hasFiniteStateMachineDecorator — The Opt-In Signal

Not every file in src/lib/ is a machine. Some are pure helpers — render-image.ts, heading-sections.ts, parse-mermaid-directives.ts. These files export functions but carry no @FiniteStateMachine decorator. The extractor must distinguish machines from non-machines.

The old approach was a deny list: NON_MACHINE_LIB_FILES, a hand-maintained array of filenames to skip. Every time a developer added a new helper under src/lib/, they had to remember to add its name to the deny list. Forgetting meant the extractor would scan it, find no decorator, produce a machine node with empty states and transitions, and inject noise into the graph. The deny list was a maintenance burden that scaled linearly with the number of non-machine files.

The new approach is an opt-in detector. The extractor parses each file's AST and looks for a @FiniteStateMachine decorator on a class. Only files with the decorator are enrolled as machines. New helper files cost nothing to add — they are invisible to the extractor by default.

function hasFiniteStateMachineDecorator(sourceFile: ts.SourceFile): boolean {
  let found = false;

  function checkDecorator(decorator: ts.Decorator): boolean {
    const expr = decorator.expression;
    // Shape 1: @FiniteStateMachine (bare, no arguments)
    if (ts.isIdentifier(expr)) return expr.text === 'FiniteStateMachine';
    // Shape 2: @FiniteStateMachine({ ... }) (call expression)
    if (ts.isCallExpression(expr) && ts.isIdentifier(expr.expression)) {
      return expr.expression.text === 'FiniteStateMachine';
    }
    return false;
  }

  function visit(node: ts.Node): void {
    if (found) return;
    const decorators = ts.canHaveDecorators(node)
      ? ts.getDecorators(node)
      : undefined;
    if (decorators && decorators.some(checkDecorator)) {
      found = true;
      return;
    }
    ts.forEachChild(node, visit);
  }
  visit(sourceFile);
  return found;
}

Two shapes are accepted: the bare @FiniteStateMachine (unlikely in practice but handled for robustness) and the call expression @FiniteStateMachine({ ... }) which every real machine uses. The walker short-circuits on the first match — once found is true, it stops visiting children.

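An important subtlety: this function walks the AST, not the source text. A naive text.includes('@FiniteStateMachine') approach would match comments, strings, and the decorator definition file itself (finite-state-machine.ts contains the string FiniteStateMachine in its export declaration). The AST walker only inspects actual Decorator nodes, so text inside comments and string literals can never trigger a match. The check is by name only: it does not verify that the identifier resolves to the project's decorator, and it accepts the decorator on any decoratable node, not just classes.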
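The difference between the two approaches is easy to observe. The sketch below uses a compact copy of the detector and two made-up source strings. It parses with ts.createSourceFile directly, which is an assumption about what parseSource does internally.

```typescript
import * as ts from 'typescript';

// Compact version of the detector above, for a self-contained demo.
function hasFsmDecorator(sf: ts.SourceFile): boolean {
  let found = false;
  const isFsm = (d: ts.Decorator): boolean => {
    const e = d.expression;
    if (ts.isIdentifier(e)) return e.text === 'FiniteStateMachine';
    return ts.isCallExpression(e)
      && ts.isIdentifier(e.expression)
      && e.expression.text === 'FiniteStateMachine';
  };
  const visit = (node: ts.Node): void => {
    if (found) return;
    const decs = ts.canHaveDecorators(node) ? ts.getDecorators(node) : undefined;
    if (decs?.some(isFsm)) { found = true; return; }
    ts.forEachChild(node, visit);
  };
  visit(sf);
  return found;
}

const parse = (text: string): ts.SourceFile =>
  ts.createSourceFile('demo.ts', text, ts.ScriptTarget.Latest, true);

// Made-up inputs: a helper that only mentions the decorator in a comment,
// and a file that actually applies it.
const onlyAComment = `// TODO: add @FiniteStateMachine to this helper later\nexport const x = 1;`;
const realMachine = `@FiniteStateMachine({ states: ['idle'] as const })\nexport class DemoFsm {}`;

console.log(onlyAComment.includes('@FiniteStateMachine'), hasFsmDecorator(parse(onlyAComment)));
// true false
console.log(realMachine.includes('@FiniteStateMachine'), hasFsmDecorator(parse(realMachine)));
// true true
```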

The isMachineSource function wraps this check with file extension filtering:

function isMachineSource(relPath: string, text: string): boolean {
  const base = path.basename(relPath);
  if (!base.endsWith('.ts')) return false;
  if (base.endsWith('.d.ts')) return false;
  const sf = parseSource(relPath, text);
  return hasFiniteStateMachineDecorator(sf);
}

Declaration files (.d.ts) are excluded — they carry type information but not decorators. Only real .ts source files pass through.

The filterMachineSources function applies this filter to the full set of loaded sources:

function filterMachineSources(
  sources: readonly LibSource[]
): LibSource[] {
  return sources.filter(s => isMachineSource(s.relPath, s.text));
}

This is the first filter in the buildGraph composition root. Of the ~60 files in src/lib/, exactly 43 pass. The other ~17 are helpers with no decorator.

extractFsmDecoratorData — Reading the Decorator AST

Once a file is confirmed to contain the decorator, the extractor reads every property from it. The extractFsmDecoratorData function walks the AST, finds the @FiniteStateMachine({ ... }) call expression on a class whose name ends with Fsm, and extracts the object literal argument.

The function returns a FsmDecoratorData structure:

interface FsmDecoratorData {
  states:       string[];
  events:       string[];
  transitions:  DecoratorTransitionEntry[];  // { from, to, on }
  emits:        string[];
  listens:      string[];
  guards:       string[];
  description?: string;
  feature?:     { id: string; ac: string };
  scope?:       'singleton' | 'scoped' | 'transient';
}

The extraction logic visits each property assignment in the object literal and dispatches on the property name:

function extractFsmDecoratorData(sf: ts.SourceFile): FsmDecoratorData | null {
  let result: FsmDecoratorData | null = null;

  function visit(node: ts.Node): void {
    if (result) return;  // short-circuit after first match
    if (ts.isClassDeclaration(node) && node.name?.text.endsWith('Fsm')) {
      const decorators = ts.canHaveDecorators(node)
        ? ts.getDecorators(node)
        : undefined;
      if (decorators) {
        for (const dec of decorators) {
          if (!ts.isCallExpression(dec.expression)) continue;
          const callee = dec.expression;
          if (!ts.isIdentifier(callee.expression)) continue;
          if (callee.expression.text !== 'FiniteStateMachine') continue;
          const arg0 = callee.arguments[0];
          if (!arg0 || !ts.isObjectLiteralExpression(arg0)) continue;

          // ... property extraction loop (see below)
        }
      }
    }
    ts.forEachChild(node, visit);
  }

  visit(sf);
  return result;
}

The class name convention — *Fsm — is important. A file might contain multiple classes (rare but possible). Only the one ending in Fsm is treated as the machine's companion class. This convention is documented in the FSM tooling skill and enforced by review.

Inside the property extraction loop, each property assignment is matched by name. The val variable holds the unwrapped initializer (after as const stripping — see next section):

for (const prop of arg0.properties) {
  if (!ts.isPropertyAssignment(prop) || !ts.isIdentifier(prop.name)) continue;
  const propName = prop.name.text;
  const val = unwrapAsConst(prop.initializer);

  if (propName === 'states')   states.push(...readStringArray(val));
  if (propName === 'events')   events.push(...readStringArray(val));
  if (propName === 'emits')    emits.push(...readStringArray(val));
  if (propName === 'listens')  listens.push(...readStringArray(val));
  if (propName === 'guards')   guards.push(...readStringArray(val));

  if (propName === 'transitions' && ts.isArrayLiteralExpression(val)) {
    for (const el of val.elements) {
      const entry = unwrapAsConst(el as ts.Expression);
      if (!ts.isObjectLiteralExpression(entry)) continue;
      let from = '', to = '', on = '';
      for (const p of entry.properties) {
        if (!ts.isPropertyAssignment(p) || !ts.isIdentifier(p.name)) continue;
        const v = unwrapAsConst(p.initializer);
        if (!ts.isStringLiteral(v)) continue;
        if (p.name.text === 'from') from = v.text;
        if (p.name.text === 'to')   to   = v.text;
        if (p.name.text === 'on')   on   = v.text;
      }
      if (from && to && on) transitions.push({ from, to, on });
    }
  }

  if (propName === 'description' && ts.isStringLiteral(prop.initializer)) {
    description = prop.initializer.text;
  }

  if (propName === 'feature' && ts.isObjectLiteralExpression(val)) {
    let fid = '', fac = '';
    for (const p of val.properties) {
      if (!ts.isPropertyAssignment(p) || !ts.isIdentifier(p.name)) continue;
      const v = unwrapAsConst(p.initializer);
      if (!ts.isStringLiteral(v)) continue;
      if (p.name.text === 'id') fid = v.text;
      if (p.name.text === 'ac') fac = v.text;
    }
    if (fid && fac) feature = { id: fid, ac: fac };
  }

  if (propName === 'scope' && ts.isStringLiteral(prop.initializer)) {
    const v = prop.initializer.text;
    if (v === 'singleton' || v === 'scoped' || v === 'transient') scope = v;
  }
}

Five property types follow the same pattern: call readStringArray(val) on the unwrapped value. The transitions property requires a nested loop because it is an array of object literals, each with from, to, and on string fields. The feature property is a single object literal with id and ac. The description and scope properties are plain string literals.

The function returns null if no @FiniteStateMachine decorator is found on any *Fsm class. This is safe — the caller (extractMachineFromSource) handles the null case by falling back to union-type detection for states and empty arrays for everything else.

as const Unwrapping

Every @FiniteStateMachine decorator in this codebase uses as const assertions:

@FiniteStateMachine({
  states: ['idle', 'loading', 'rendering', 'postProcessing', 'done', 'error'] as const,
  events: ['startLoad', 'markRendering', 'markPostProcessing', 'markDone', 'markError'] as const,
  transitions: [
    { from: 'idle', to: 'loading', on: 'startLoad' },
    // ...
  ] as const,
  emits:   ['app-ready', 'toc-headings-rendered'] as const,
  listens: [] as const,
  guards:  ['staleGeneration'] as const,
  // ...
})

The as const assertion is essential for the type system. Without it, TypeScript widens ['idle', 'loading'] to string[], and the machine's state type becomes string instead of 'idle' | 'loading'. Part III documented why this matters.

But as const also affects the AST. In TypeScript's AST representation, ['idle', 'loading'] as const is not an ArrayLiteralExpression — it is an AsExpression wrapping an ArrayLiteralExpression. If readStringArray received the raw node from the property assignment, it would check ts.isArrayLiteralExpression(node), get false (because the node is an AsExpression), and return an empty array. Every states/events/emits/listens/guards property would extract as [].

The unwrapAsConst helper strips the assertion:

function unwrapAsConst(node: ts.Expression): ts.Expression {
  return ts.isAsExpression(node) ? node.expression : node;
}

One line. One isAsExpression check. One level of unwrapping. The result is the underlying ArrayLiteralExpression (or ObjectLiteralExpression for feature, or StringLiteral for nested values). Every call site that reads a property value calls unwrapAsConst first:

const val = unwrapAsConst(prop.initializer);  // strips 'as const'
states.push(...readStringArray(val));          // now sees ArrayLiteralExpression

The readStringArray helper then iterates the array elements:

function readStringArray(val: ts.Expression): string[] {
  const out: string[] = [];
  if (ts.isArrayLiteralExpression(val)) {
    for (const el of val.elements) {
      if (ts.isStringLiteral(el)) out.push(el.text);
    }
  }
  return out;
}

Non-string elements are silently skipped. This is defensive: the decorator's type system already enforces that these arrays contain only string literals, so non-string elements would be a type error. But the extractor is a build-time tool that reads the AST of potentially malformed source — it should not crash on unexpected input.

In the raw AST, the property assignment's initializer is an AsExpression node that wraps the array. Without unwrapping, readStringArray receives the AsExpression, fails the isArrayLiteralExpression check, and returns [].

After unwrapAsConst, readStringArray receives the ArrayLiteralExpression directly and extracts both string literals.

This pattern recurs throughout the extractor. The transitions loop unwraps each array element (const entry = unwrapAsConst(el)). The feature property unwraps the object literal. Nested property values inside transition entries unwrap their string literals. Eleven call sites, one helper, zero missed as const assertions.
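The wrapping can be checked directly by parsing a one-property object literal and inspecting the initializer before and after unwrapping. The statesInitializer helper and the cfg variable are scaffolding for this demo only; unwrapAsConst is the helper shown above.

```typescript
import * as ts from 'typescript';

function unwrapAsConst(node: ts.Expression): ts.Expression {
  return ts.isAsExpression(node) ? node.expression : node;
}

// Demo scaffolding: parse `const cfg = { states: ... }` and return
// the initializer of the first property.
function statesInitializer(text: string): ts.Expression {
  const sf = ts.createSourceFile('demo.ts', text, ts.ScriptTarget.Latest, true);
  const stmt = sf.statements[0];
  if (!ts.isVariableStatement(stmt)) throw new Error('expected a variable statement');
  const init = stmt.declarationList.declarations[0].initializer;
  if (!init || !ts.isObjectLiteralExpression(init)) throw new Error('expected an object literal');
  const prop = init.properties[0];
  if (!ts.isPropertyAssignment(prop)) throw new Error('expected a property assignment');
  return prop.initializer;
}

const raw = statesInitializer(`const cfg = { states: ['idle', 'loading'] as const };`);
const unwrapped = unwrapAsConst(raw);

console.log(ts.isAsExpression(raw), ts.isArrayLiteralExpression(raw));
// true false
console.log(ts.isArrayLiteralExpression(unwrapped));
// true
```

Note that the helper strips exactly one AsExpression layer. Other wrappers, such as parentheses, would pass through unchanged.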

extractMachineFromSource — The Per-File Composition

The per-file extraction function combines decorator reading with module-level export scanning:

function extractMachineFromSource(source: LibSource): MachineNode {
  const sf = parseSource(source.relPath, source.text);
  const exportsList: string[] = [];
  const functions:   string[] = [];
  const unionStates: string[] = [];

  function visit(node: ts.Node): void {
    // Exported function declarations
    if (ts.isFunctionDeclaration(node) && hasExportModifier(node) && node.name) {
      exportsList.push(node.name.text);
      functions.push(node.name.text);
    }
    // Exported variable statements (const factories)
    if (ts.isVariableStatement(node) && hasExportModifier(node)) {
      for (const decl of node.declarationList.declarations) {
        if (!ts.isIdentifier(decl.name)) continue;
        exportsList.push(decl.name.text);
        functions.push(decl.name.text);
      }
    }
    // Exported type aliases — and state union detection
    if (ts.isTypeAliasDeclaration(node) && hasExportModifier(node)) {
      const name = node.name.text;
      exportsList.push(name);
      if (/State$/.test(name) && ts.isUnionTypeNode(node.type)) {
        for (const t of node.type.types) {
          if (ts.isLiteralTypeNode(t) && ts.isStringLiteral(t.literal)) {
            unionStates.push(t.literal.text);
          }
        }
      }
    }
    // Exported interfaces
    if (ts.isInterfaceDeclaration(node) && hasExportModifier(node)) {
      exportsList.push(node.name.text);
    }
    ts.forEachChild(node, visit);
  }
  visit(sf);

  // Decorator data takes priority over union-type detection
  const decoratorData = extractFsmDecoratorData(sf);
  const states = decoratorData ? decoratorData.states : unionStates;
  const events = decoratorData?.events ?? [];

  // Decorator transitions take priority over AST inference
  let transitions: TransitionEntry[];
  if (decoratorData?.transitions && decoratorData.transitions.length > 0) {
    transitions = decoratorData.transitions.map(t =>
      ({ method: t.on, from: t.from, to: t.to })
    );
  } else {
    const stateSet = new Set(states);
    transitions = stateSet.size > 0 ? extractTransitions(sf, stateSet) : [];
  }

  const base = path.basename(source.relPath).replace(/\.ts$/, '');
  return {
    id:          'machine:' + base,
    name:        base,
    file:        source.relPath,
    exports:     Array.from(new Set(exportsList)),
    functions:   Array.from(new Set(functions)),
    states,
    events,
    description: decoratorData?.description,
    transitions,
    emits:       decoratorData?.emits   ?? [],
    listens:     decoratorData?.listens ?? [],
    guards:      decoratorData?.guards  ?? [],
    feature:     decoratorData?.feature,
    scope:       decoratorData?.scope,
    category:    'machine',
  };
}

The priority chain is explicit:

States: decorator states array wins. If no decorator, fall back to union-type detection (export type FooState = 'idle' | 'active' where the type name ends with State).
Transitions: decorator transitions array wins. If the decorator is missing, or declares no transitions (or an empty array), fall back to extractTransitions, the AST-based inference from Part VIII.
Everything else: decorator properties or empty defaults. No fallback for emits, listens, guards, description, feature, or scope — those are only available from the decorator.

The extractTransitions fallback is the bridge from Phase 1 to Phase 2. Phase 1 infers transitions and patches the decorator. After the patch, the decorator has explicit transitions and the fallback is never reached. But if Phase 1 was skipped (e.g., a developer ran the extractor directly without build:state-graph), the fallback ensures the graph still has transition data — albeit less reliable than the Phase 1 output.
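The hasExportModifier helper used by extractMachineFromSource is not shown in this part. A minimal version, assuming it does nothing more than look for an export keyword among the node's modifiers, might be:

```typescript
import * as ts from 'typescript';

// Assumed implementation; the real helper is not shown in this part.
function hasExportModifier(node: ts.Node): boolean {
  if (!ts.canHaveModifiers(node)) return false;
  const mods = ts.getModifiers(node);
  return mods?.some(m => m.kind === ts.SyntaxKind.ExportKeyword) ?? false;
}

// Made-up source: two exported declarations and one internal function.
const sf = ts.createSourceFile(
  'demo.ts',
  `export function createDemo() {}\nfunction internal() {}\nexport type DemoState = 'idle' | 'active';`,
  ts.ScriptTarget.Latest,
  true,
);

const exported = sf.statements
  .filter(s => hasExportModifier(s))
  .map(s => (ts.isFunctionDeclaration(s) || ts.isTypeAliasDeclaration(s)) ? s.name?.text : undefined);

console.log(exported);
// [ 'createDemo', 'DemoState' ]
```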

Edge Building

Three edge types, three detection algorithms. Each operates on a different data source: adapter IIFE bodies (imports), emits/listens arrays across all nodes (event), and import declarations between machine files (composes).

imports Edges

When the extractor processes src/app-shared.ts, it identifies adapter modules by their naming convention (PascalCase variable assigned an IIFE or object literal) and scans each IIFE body for identifiers that match exported symbols from any machine.

The buildMachineExportIndex function builds a lookup table: exported symbol name -> machine id. First-write-wins if two machines export the same name (rare but handled):

function buildMachineExportIndex(
  machines: ReadonlyArray<MachineNode>
): Map<string, string> {
  const index = new Map<string, string>();
  for (const m of machines) {
    for (const exp of m.exports) {
      if (!index.has(exp)) index.set(exp, m.id);
    }
  }
  return index;
}

Then extractAdaptersFromSource walks each adapter's initializer, collects every identifier that appears in the index, and creates an imports edge for each:

const referenced = new Set<string>();
function walk(n: ts.Node): void {
  if (ts.isIdentifier(n)) {
    const machineId = machineExportIndex.get(n.text);
    if (machineId) referenced.add(machineId);
  }
  ts.forEachChild(n, walk);
}
walk(decl.initializer);

for (const machineId of referenced) {
  edges.push({ from: adapterId, to: machineId, kind: 'imports' });
}

The result: adapter:SidebarResize imports machine:scroll-spy-machine, machine:sidebar-resize-state, and machine:zoom-pan-state. The edge tells the graph that SidebarResize is the wiring layer for those three machines.

event Edges

Event edges are cross-cutting. They connect any node that emits an event to every node that listens to the same event, regardless of whether the nodes are machines or adapters. The algorithm is a two-pass join: first index listeners by event name, then match every emitter against that index:

// Build a listener index: event name -> node ids that listen
const listenersByEvent = new Map<string, string[]>();
for (const n of nodes) {
  for (const ev of n.listens) {
    const arr = listenersByEvent.get(ev) ?? [];
    arr.push(n.id);
    listenersByEvent.set(ev, arr);
  }
}

// For each emitter, create an edge to every listener of the same event
for (const n of nodes) {
  for (const ev of n.emits) {
    const listeners = listenersByEvent.get(ev) ?? [];
    for (const listenerId of listeners) {
      if (listenerId !== n.id) {  // no self-loops
        edges.push({
          from: n.id,
          to: listenerId,
          kind: 'event',
          eventName: ev,
        });
      }
    }
  }
}

The nodes array includes both adapters and machines. This is critical: some events flow machine-to-machine (e.g., machine:page-load-state emits toc-headings-rendered, machine:scroll-spy-machine listens to it) while others flow adapter-to-machine (e.g., adapter:SidebarMask emits sidebar-mask-change, machine:terminal-dots-state listens to it).

Self-loops are excluded: if a machine declares both emits: ['x'] and listens: ['x'], no edge is created from it to itself. This prevents visual clutter in the explorer without losing information — a machine that emits and listens to the same event is a coordinator, and its internal behavior is already visible in the transitions.
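A small worked run makes both rules concrete. The sketch wraps the two loops in a function (the buildEventEdges name is an assumption) and feeds it trimmed-down nodes. The first two ids and the event name come from the examples above, with their other events omitted. The third node is hypothetical and exists only to exercise the self-loop rule.

```typescript
interface EventNode { id: string; emits: string[]; listens: string[]; } // trimmed-down node
interface EventEdge { from: string; to: string; kind: 'event'; eventName: string; }

function buildEventEdges(nodes: readonly EventNode[]): EventEdge[] {
  // Pass 1: index listeners by event name.
  const listenersByEvent = new Map<string, string[]>();
  for (const n of nodes) {
    for (const ev of n.listens) {
      const arr = listenersByEvent.get(ev) ?? [];
      arr.push(n.id);
      listenersByEvent.set(ev, arr);
    }
  }
  // Pass 2: join each emitter against the index, skipping self-loops.
  const edges: EventEdge[] = [];
  for (const n of nodes) {
    for (const ev of n.emits) {
      for (const listenerId of listenersByEvent.get(ev) ?? []) {
        if (listenerId !== n.id) {
          edges.push({ from: n.id, to: listenerId, kind: 'event', eventName: ev });
        }
      }
    }
  }
  return edges;
}

const eventEdges = buildEventEdges([
  { id: 'machine:page-load-state',    emits: ['toc-headings-rendered'], listens: [] },
  { id: 'machine:scroll-spy-machine', emits: [], listens: ['toc-headings-rendered'] },
  { id: 'machine:demo-coordinator',   emits: ['demo-tick'], listens: ['demo-tick'] }, // hypothetical
]);

console.log(eventEdges.map(e => `${e.from} -[${e.eventName}]-> ${e.to}`));
// [ 'machine:page-load-state -[toc-headings-rendered]-> machine:scroll-spy-machine' ]
```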

composes Edges

Composition is detected by scanning import declarations between machine files. If button-tooltip-state.ts imports from ./tooltip-state, and tooltip-state is a known machine name, the extractor creates a composes edge from machine:button-tooltip-state to machine:tooltip-state.

function extractMachineCompositions(
  sources: readonly LibSource[],
  machines: ReadonlyArray<MachineNode>,
): GraphEdge[] {
  const machineNames = new Set(machines.map(m => m.name));
  const edges: GraphEdge[] = [];

  for (const src of sources) {
    const machine = machines.find(m => m.file === src.relPath);
    if (!machine) continue;
    const sf = parseSource(src.relPath, src.text);

    ts.forEachChild(sf, node => {
      if (
        ts.isImportDeclaration(node) &&
        ts.isStringLiteral(node.moduleSpecifier)
      ) {
        const spec = node.moduleSpecifier.text;
        if (spec.startsWith('./') && spec !== './finite-state-machine') {
          const target = spec.replace('./', '');
          if (machineNames.has(target) && target !== machine.name) {
            edges.push({
              from: machine.id,
              to:   'machine:' + target,
              kind: 'composes',
            });
          }
        }
      }
    });
  }

  return edges;
}

The ./finite-state-machine import is excluded — every machine imports the decorator definition, and that import is structural plumbing, not composition. Only relative imports that resolve to other machine files are treated as composition edges.
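The specifier rule can be isolated as a pure function to see which imports qualify. The compositionTarget helper is hypothetical and is not part of the extractor. It mirrors the conditions in extractMachineCompositions under the assumption that machine files import each other by bare relative name.

```typescript
// Hypothetical helper isolating the specifier rule shown above.
function compositionTarget(
  spec: string,
  selfName: string,
  machineNames: ReadonlySet<string>,
): string | null {
  if (!spec.startsWith('./') || spec === './finite-state-machine') return null;
  const target = spec.slice(2); // './tooltip-state' -> 'tooltip-state'
  if (target === selfName || !machineNames.has(target)) return null;
  return 'machine:' + target;
}

const names = new Set(['tooltip-state', 'button-tooltip-state']);

console.log(compositionTarget('./tooltip-state', 'button-tooltip-state', names));
// machine:tooltip-state
console.log(compositionTarget('./finite-state-machine', 'button-tooltip-state', names));
// null
console.log(compositionTarget('./tooltip-state.js', 'button-tooltip-state', names));
// null
```

The last call shows a consequence of matching on the bare name: a specifier that carries a file extension does not match any machine name, so it produces no edge.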

The composition edges in this codebase:

From	To	Meaning
`machine:button-tooltip-state`	`machine:tooltip-state`	Button tooltip delegates to generic tooltip
`machine:explorer-coordinator`	`machine:explorer-detail-state`	Explorer coordinator owns detail panel
`machine:explorer-coordinator`	`machine:explorer-filter-state`	Explorer coordinator owns filter
`machine:explorer-coordinator`	`machine:explorer-selection-state`	Explorer coordinator owns selection
`machine:explorer-coordinator`	`machine:machine-popover-state`	Explorer coordinator owns popover
`machine:hot-reload-indicator`	`machine:hot-reload-observable`	Indicator composes observable
`machine:theme-coordinator`	`machine:accent-preview-state`	Theme coordinator owns preview
`machine:toc-breadcrumb-state`	(downstream)	Breadcrumb composes sub-machine
`machine:toc-tooltip-state`	`machine:tooltip-state`	TOC tooltip delegates to generic tooltip
`machine:tour-coordinator`	`machine:tour-toc-orchestrator`	Tour coordinator owns TOC step logic

These edges reveal the architecture's layering: coordinators compose sub-machines, and shared machines (like tooltip-state) are composed by multiple consumers. This is the same pattern as Part V's coordinator discussion, now made visible in the graph.
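The layering can be read off the edge list mechanically. The sketch below groups the rows of the table by target and reports every machine composed by more than one consumer. consumersByTarget is a hypothetical helper, and the toc-breadcrumb-state row is left out because its target is not spelled out in the table.

```typescript
interface CompositionEdge { from: string; to: string; }

// Rows copied from the table above (the toc-breadcrumb-state row is omitted).
const compositionEdges: CompositionEdge[] = [
  { from: 'machine:button-tooltip-state', to: 'machine:tooltip-state' },
  { from: 'machine:explorer-coordinator', to: 'machine:explorer-detail-state' },
  { from: 'machine:explorer-coordinator', to: 'machine:explorer-filter-state' },
  { from: 'machine:explorer-coordinator', to: 'machine:explorer-selection-state' },
  { from: 'machine:explorer-coordinator', to: 'machine:machine-popover-state' },
  { from: 'machine:hot-reload-indicator', to: 'machine:hot-reload-observable' },
  { from: 'machine:theme-coordinator', to: 'machine:accent-preview-state' },
  { from: 'machine:toc-tooltip-state', to: 'machine:tooltip-state' },
  { from: 'machine:tour-coordinator', to: 'machine:tour-toc-orchestrator' },
];

// Group the composing machines under each composed target.
function consumersByTarget(edges: CompositionEdge[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const e of edges) {
    const list = index.get(e.to) ?? [];
    list.push(e.from);
    index.set(e.to, list);
  }
  return index;
}

const consumers = consumersByTarget(compositionEdges);

// Shared machines: targets with more than one consumer.
const sharedMachines = Array.from(consumers.entries())
  .filter(([, from]) => from.length > 1)
  .map(([to]) => to);
```

On these rows, sharedMachines contains only machine:tooltip-state, composed by both button-tooltip-state and toc-tooltip-state.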

Edge Deduplication

After all three edge types are computed, the edges array may contain duplicates. The most common case: a coordinator imports from a sub-machine file (producing an imports-like edge from the adapter layer) and also composes it (producing a composes edge from the composition scanner). Both edges connect the same (from, to) pair. The graph should show only the more informative edge.

The deduplication uses a priority map:

const EDGE_PRIORITY: Record<GraphEdge['kind'], number> = {
  composes: 2,
  event:    1,
  imports:  0,
};

const edgeMap = new Map<string, GraphEdge>();
for (const e of edges) {
  const key = `${e.from}->${e.to}:${e.eventName ?? ''}`;
  const existing = edgeMap.get(key);
  if (!existing || EDGE_PRIORITY[e.kind] > EDGE_PRIORITY[existing.kind]) {
    edgeMap.set(key, e);
  }
}

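composes outranks event, which outranks imports. The priority only arbitrates between edges that share a key. Because the key includes the eventName, an event edge that carries a name never collides with the unnamed composes or imports edge between the same pair, so in practice the map resolves composes against imports. Including the eventName also keeps distinct event edges between the same pair from merging: two edges with different event names are not duplicates, they represent two independent communication channels.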
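A small run shows how the key and the priority map interact. The sketch below wraps the loop above in a function and feeds it four edges between the same pair. The GraphEdge type here is a minimal local copy for illustration, dedupeEdges is a hypothetical wrapper name, and machine:a and machine:b are placeholder ids.

```typescript
type EdgeKind = 'imports' | 'event' | 'composes';

// Minimal local copy of the edge shape, for illustration only.
interface GraphEdge {
  from: string;
  to: string;
  kind: EdgeKind;
  eventName?: string;
}

const EDGE_PRIORITY: Record<EdgeKind, number> = {
  composes: 2,
  event: 1,
  imports: 0,
};

// Same loop as above, wrapped so it can be called on a sample.
function dedupeEdges(edges: GraphEdge[]): GraphEdge[] {
  const edgeMap = new Map<string, GraphEdge>();
  for (const e of edges) {
    const key = `${e.from}->${e.to}:${e.eventName ?? ''}`;
    const existing = edgeMap.get(key);
    if (!existing || EDGE_PRIORITY[e.kind] > EDGE_PRIORITY[existing.kind]) {
      edgeMap.set(key, e);
    }
  }
  return Array.from(edgeMap.values());
}

const deduped = dedupeEdges([
  { from: 'machine:a', to: 'machine:b', kind: 'imports' },
  { from: 'machine:a', to: 'machine:b', kind: 'composes' },
  { from: 'machine:a', to: 'machine:b', kind: 'event', eventName: 'x' },
  { from: 'machine:a', to: 'machine:b', kind: 'event', eventName: 'y' },
]);
```

Three edges survive. The composes edge replaces the imports edge that shares its key, while the two named event edges each sit under their own key.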

The Composition Root — buildGraph

All the pieces connect in buildGraph, the single entry point that the CLI shell calls:

function buildGraph(env: ExtractorEnv): StateMachineGraph {
  // 1. Filter machine sources via AST decorator detection
  const machines = env.libSources
    .filter(s => isMachineSource(s.relPath, s.text))
    .map(extractMachineFromSource)
    .sort((a, b) => a.name.localeCompare(b.name));

  // 2. Detect machine-to-machine composition
  const machineSources = env.libSources
    .filter(s => isMachineSource(s.relPath, s.text));
  const compositionEdges = extractMachineCompositions(machineSources, machines);

  // 3. Extract adapters and import edges from app-shared.ts
  const machineExportIndex = buildMachineExportIndex(machines);
  const { adapters, edges } = extractAdaptersFromSource(
    env.appSharedSource, machineExportIndex
  );

  // 4. Merge composition edges
  edges.push(...compositionEdges);

  // 5. Build event edges across all nodes
  const nodes = [...adapters, ...machines.map(m => ({
    id: m.id, emits: m.emits, listens: m.listens,
  }))];
  // ... listener index + emitter loop (shown above)

  // 6. Deduplicate edges
  // ... priority map (shown above)

  // 7. Sort deterministically
  adapters.sort((a, b) => a.name.localeCompare(b.name));
  dedupedEdges.sort((a, b) =>
    a.from.localeCompare(b.from) ||
    a.to.localeCompare(b.to) ||
    (a.eventName ?? '').localeCompare(b.eventName ?? '')
  );

  return {
    generatedAt: env.clock(),
    machines,
    adapters,
    edges: dedupedEdges,
  };
}

Seven steps. No side effects. The function takes an ExtractorEnv and returns a StateMachineGraph. The CLI shell serializes the result to data/state-machines.json with JSON.stringify(graph, null, 2). The extractor itself never touches the filesystem and never chooses a serialization format; paths reach it only as the relPath strings carried by each source.

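The pipeline is deterministic: same source text, same clock value, same output. Machines, adapters, and edges are each sorted with localeCompare on names and IDs. After deduplication no two edges share the same (from, to, eventName) triple, so the edge comparator never sees a tie. There is no dependence on iteration order and no random element. Two runs on the same source with the same clock value produce byte-identical JSON.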
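One way to check the order-independence claim is to sort the same edges from two different starting orders and compare the serialized results. The sketch below reuses the comparator from step 7 on three edges taken from this codebase. The Edge type is a minimal local copy, and compareEdges is a hypothetical name for the inline comparator.

```typescript
// Minimal local copy of the edge shape, for illustration only.
interface Edge {
  from: string;
  to: string;
  kind: string;
  eventName?: string;
}

// Same comparison chain as the sort in step 7.
function compareEdges(a: Edge, b: Edge): number {
  return (
    a.from.localeCompare(b.from) ||
    a.to.localeCompare(b.to) ||
    (a.eventName ?? '').localeCompare(b.eventName ?? '')
  );
}

const inputOrder: Edge[] = [
  { from: 'machine:page-load-state', to: 'machine:scroll-spy-machine',
    kind: 'event', eventName: 'toc-headings-rendered' },
  { from: 'machine:explorer-coordinator', to: 'machine:explorer-detail-state',
    kind: 'composes' },
  { from: 'adapter:SidebarMask', to: 'machine:terminal-dots-state',
    kind: 'event', eventName: 'sidebar-mask-change' },
];
const reversedOrder = inputOrder.slice().reverse();

// Sort copies so the inputs stay untouched, then serialize as the CLI shell does.
const jsonA = JSON.stringify(inputOrder.slice().sort(compareEdges), null, 2);
const jsonB = JSON.stringify(reversedOrder.slice().sort(compareEdges), null, 2);
const firstFrom = inputOrder.slice().sort(compareEdges)[0].from;
```

Both strings are identical, because every pair of edges differs in at least one of the compared fields.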

The Output: state-machines.json

The CLI shell serializes the graph to data/state-machines.json. As of this build: 3,732 lines, 85 KB, 43 machines, 10 adapters, 27 edges. Here is an excerpt showing two machines, one adapter, and three edges:

{
  "generatedAt": "2026-04-12T14:46:20.519Z",
  "machines": [
    {
      "id": "machine:page-load-state",
      "name": "page-load-state",
      "file": "src/lib/page-load-state.ts",
      "exports": [
        "PageLoadState", "PageLoadCallbacks",
        "PageLoadMachine", "isStale", "createPageLoadMachine"
      ],
      "functions": ["isStale", "createPageLoadMachine"],
      "states": ["idle", "loading", "rendering", "postProcessing", "done", "error"],
      "events": ["startLoad", "markRendering", "markPostProcessing", "markDone", "markError"],
      "description": "Tracks SPA page-load lifecycle from fetch to post-processing.",
      "transitions": [
        { "method": "startLoad",          "from": "idle",           "to": "loading" },
        { "method": "markRendering",      "from": "loading",        "to": "rendering" },
        { "method": "markPostProcessing", "from": "rendering",      "to": "postProcessing" },
        { "method": "markDone",           "from": "postProcessing", "to": "done" },
        { "method": "markError",          "from": "*",              "to": "error" }
      ],
      "emits": ["app-ready", "toc-headings-rendered"],
      "listens": [],
      "guards": ["staleGeneration"],
      "feature": { "id": "PAGE-LOAD", "ac": "fullLifecycle" },
      "category": "machine"
    },
    {
      "id": "machine:scroll-spy-machine",
      "name": "scroll-spy-machine",
      "file": "src/lib/scroll-spy-machine.ts",
      "exports": [
        "HeadingPosition", "DetectionMode", "TransitionInput",
        "SpyState", "detectByScroll", "detectActiveSlug",
        "resolveToTocEntry", "transition"
      ],
      "functions": ["detectByScroll", "detectActiveSlug", "resolveToTocEntry", "transition"],
      "states": ["mouse", "scroll"],
      "events": ["transition", "setDetectionMode"],
      "description": "Detects the active heading under the cursor or top-of-viewport; drives sidebar highlight + URL hash updates.",
      "transitions": [
        { "method": "setDetectionMode", "from": "mouse",  "to": "scroll" },
        { "method": "setDetectionMode", "from": "scroll", "to": "mouse" }
      ],
      "emits": ["scrollspy-active"],
      "listens": ["toc-headings-rendered", "toc-animation-done"],
      "guards": [],
      "feature": { "id": "SPY", "ac": "transitionComputesState" },
      "category": "machine"
    }
  ],
  "adapters": [
    {
      "id": "adapter:SidebarMask",
      "name": "SidebarMask",
      "file": "src/app-shared.ts",
      "machines": [],
      "emits": ["sidebar-mask-change"],
      "listens": ["dblclick"],
      "category": "adapter"
    }
  ],
  "edges": [
    {
      "from": "adapter:SidebarMask",
      "to": "machine:terminal-dots-state",
      "kind": "event",
      "eventName": "sidebar-mask-change"
    },
    {
      "from": "machine:page-load-state",
      "to": "machine:scroll-spy-machine",
      "kind": "event",
      "eventName": "toc-headings-rendered"
    },
    {
      "from": "machine:explorer-coordinator",
      "to": "machine:explorer-detail-state",
      "kind": "composes"
    }
  ]
}

Three things to note in this excerpt:

The page-load-state machine has 6 states, 5 events, 5 transitions, and emits 2 custom events. Its feature link ties it to the PAGE-LOAD requirement with acceptance criterion fullLifecycle. The guards array names the staleGeneration predicate — the generation counter from Part IV.

The scroll-spy-machine has only 2 states (mouse and scroll) but 4 exported functions. Not every export is a state transition — detectByScroll, detectActiveSlug, and resolveToTocEntry are query functions that compute values without changing state. The functions array includes them because they are exported const declarations or function declarations. The transitions array correctly excludes them because they do not assign to the state variable.

The event edge from machine:page-load-state to machine:scroll-spy-machine on toc-headings-rendered represents a real architectural dependency: when page load finishes rendering, it signals the scroll spy to recompute heading positions. The edge is derived from the decorator metadata — page-load-state.emits includes 'toc-headings-rendered' and scroll-spy-machine.listens includes 'toc-headings-rendered'. No manual wiring.

fsm-composition.json — The D3-Ready Derivative

The interactive explorer (Part X) needs a simplified graph for its D3 force-directed layout. The full state-machines.json is too detailed — it contains exports, functions, transitions, and other properties that the force layout does not use. A separate build step produces data/fsm-composition.json with only the fields D3 needs:

{
  "generatedAt": "2026-04-12T14:46:20.519Z",
  "nodes": [
    {
      "id": "machine:accent-palette-state",
      "name": "accent-palette-state",
      "kind": "machine",
      "file": "src/lib/accent-palette-state.ts",
      "feature": { "id": "ACCENT", "ac": "rightClickOpensPalette" }
    },
    {
      "id": "machine:app-readiness-state",
      "name": "app-readiness-state",
      "kind": "machine",
      "file": "src/lib/app-readiness-state.ts",
      "scope": "singleton",
      "emits": ["app-ready", "app-route-ready"]
    }
  ],
  "edges": [
    {
      "from": "machine:app-readiness-state",
      "to": "machine:tour-coordinator",
      "kind": "event",
      "eventName": "app-ready"
    },
    {
      "from": "machine:explorer-coordinator",
      "to": "machine:explorer-detail-state",
      "kind": "composes"
    }
  ]
}

Each node carries id, name, kind (machine or adapter), file, and optionally feature, scope, emits. No transitions, no exports, no functions, no guards. Each edge carries from, to, kind, and optionally eventName. The format is directly consumable by D3's force simulation: nodes are the force nodes, edges are the links, kind drives the link color and style.

The composition graph is not a separate extraction — it is a projection of state-machines.json. The build step reads the full graph, strips the unnecessary fields, and writes the simplified version. One source of truth, two views.
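The projection itself can be sketched as a field-picking function. This is an illustration under assumptions, not the actual build step: toCompositionNode is a hypothetical name, the kind field is assumed to come from the full graph's category, and optional fields are assumed to be dropped when absent or empty, which matches the two nodes in the excerpt.

```typescript
interface FeatureLink { id: string; ac: string; }

// Subset of the full node shape, enough to show what gets stripped.
interface FullNode {
  id: string;
  name: string;
  file: string;
  category: 'machine' | 'adapter';
  emits: string[];
  feature?: FeatureLink;
  scope?: string;
  transitions?: unknown[];
  exports?: string[];
  functions?: string[];
  guards?: string[];
}

interface CompositionNode {
  id: string;
  name: string;
  kind: 'machine' | 'adapter';
  file: string;
  feature?: FeatureLink;
  scope?: string;
  emits?: string[];
}

// Copy only the fields the force layout uses.
function toCompositionNode(n: FullNode): CompositionNode {
  const out: CompositionNode = { id: n.id, name: n.name, kind: n.category, file: n.file };
  if (n.feature) out.feature = n.feature;
  if (n.scope) out.scope = n.scope;
  if (n.emits.length > 0) out.emits = n.emits;
  return out;
}

// Input trimmed from the page-load-state entry shown earlier.
const projected = toCompositionNode({
  id: 'machine:page-load-state',
  name: 'page-load-state',
  file: 'src/lib/page-load-state.ts',
  category: 'machine',
  emits: ['app-ready', 'toc-headings-rendered'],
  feature: { id: 'PAGE-LOAD', ac: 'fullLifecycle' },
  exports: ['PageLoadState'],
  functions: ['isStale'],
  guards: ['staleGeneration'],
  transitions: [{ method: 'startLoad', from: 'idle', to: 'loading' }],
});
const projectedKeys = Object.keys(projected).join(',');
```

The result keeps id, name, kind, file, feature, and emits, and carries none of the transition, export, function, or guard data.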

The state-machine-extractor.ts Module

The entire extraction lives in a single file: scripts/lib/state-machine-extractor.ts. 995 lines of pure TypeScript Compiler API code. No filesystem calls. No process arguments. No side effects.

The module exports:

Export	Kind	Purpose
`StateMachineGraph`	interface	Top-level output schema
`MachineNode`	interface	Machine node schema
`AdapterNode`	interface	Adapter node schema
`GraphEdge`	interface	Edge schema
`ExtractorEnv`	interface	DI seam for the composition root
`LibSource`	interface	Source file pair (path + text)
`AsyncFileReader`	interface	IO abstraction for the CLI shell
`TransitionEntry`	interface	Single transition (method, from, to)
`FsmDecoratorData`	interface	Extracted decorator properties
`DecoratorTransitionEntry`	interface	Transition as written in decorator (from, to, on)
`parseSource`	function	TypeScript AST parser wrapper
`hasFiniteStateMachineDecorator`	function	Decorator detector
`isMachineSource`	function	File + decorator compound check
`filterMachineSources`	function	Batch filter
`loadLibSources`	function	Async file reader for CLI shell
`extractMachineFromSource`	function	Per-file extraction
`extractTransitions`	function	AST-based transition inference (fallback)
`buildMachineExportIndex`	function	Export symbol lookup table
`extractMachineCompositions`	function	Import-based composition detection
`extractAdaptersFromSource`	function	Adapter + import edge extraction
`buildGraph`	function	Composition root

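Twenty-one exports. The extraction functions are pure: data in, data out. loadLibSources is the exception in kind: it is the asynchronous loader for the CLI shell, paired with the AsyncFileReader IO abstraction. The unit test suite (test/unit/state-machine-extractor.test.ts) exercises the exports with in-memory source strings and meets the 98%+ coverage gate enforced by the vitest configuration.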

The design follows one principle: the extractor is a compiler. It reads source text (input), builds an AST (intermediate representation), extracts structured data (semantic analysis), and returns a graph that the CLI shell writes as JSON (code generation). The TypeScript Compiler API is not a convenience; it is the right tool for the job. Regex-based extraction would break on string literals containing decorator-like text, on commented-out decorators, on multi-line expressions, and on as const assertions. The AST handles all of these correctly because the TypeScript parser already handles them. The extractor inherits the parser's correctness for free.
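The difference is easy to reproduce on a decoy file. The sketch below compares a naive text match with an AST walk, assuming the decorator sits on a class declaration and a TypeScript version that provides ts.getDecorators (4.8 or later). hasFsmDecoratorSketch and NAIVE are hypothetical names; this is not the extractor's own hasFiniteStateMachineDecorator.

```typescript
import * as ts from 'typescript';

// Naive text match, shown only as the failure case.
const NAIVE = /@FiniteStateMachine\s*\(/;

// AST-based detection: look for a class decorated with @FiniteStateMachine(...).
function hasFsmDecoratorSketch(fileName: string, text: string): boolean {
  const sf = ts.createSourceFile(fileName, text, ts.ScriptTarget.Latest, true);
  let found = false;
  const visit = (node: ts.Node): void => {
    if (found) return;
    if (ts.isClassDeclaration(node)) {
      for (const d of ts.getDecorators(node) ?? []) {
        const call = d.expression;
        if (
          ts.isCallExpression(call) &&
          ts.isIdentifier(call.expression) &&
          call.expression.text === 'FiniteStateMachine'
        ) {
          found = true;
        }
      }
    }
    ts.forEachChild(node, visit);
  };
  visit(sf);
  return found;
}

// Decoy: the decorator text appears only in a comment and in a string literal.
const decoy = [
  "// @FiniteStateMachine({ states: ['idle'] })",
  "export const note = '@FiniteStateMachine({})';",
  'export class Plain {}',
].join('\n');

// Real: the decorator is actually applied to a class.
const real = [
  "@FiniteStateMachine({ states: ['idle'] })",
  'export class Real {}',
].join('\n');

const regexOnDecoy = NAIVE.test(decoy);
const astOnDecoy = hasFsmDecoratorSketch('decoy.ts', decoy);
const astOnReal = hasFsmDecoratorSketch('real.ts', real);
```

The regex reports a match on the decoy because it sees the commented-out text. The AST walk ignores it, since comments are trivia rather than nodes and the string literal is not a decorator, and it still detects the real decorator.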

What Phase 2 Produces for Phase 3

Phase 2's output is Phase 3's input. The state-machines.json file contains everything the SVG renderer needs:

Machine nodes with transitions — the renderer draws a statechart per machine, with boxes for states and arrows for transitions.
Adapter nodes with machine references — the renderer draws adapter boxes connected to their machines.
Event edges with labels — the renderer draws labeled arrows between nodes that communicate via custom events.
Composition edges — the renderer draws structural arrows from coordinators to sub-machines.
Feature links — the renderer colors nodes by requirement coverage (linked vs orphan).

No additional computation is needed between Phase 2 and Phase 3. The renderer reads the JSON, computes the layout, and writes SVG. The data contract between phases is the StateMachineGraph interface — stable, versioned, tested.

The full pipeline, from source to interactive explorer:

Phase 1 ensures every decorator has transitions. Phase 2 reads every decorator and builds the graph. Phase 3 renders the graph into an interactive explorer. Three phases, three responsibilities, three build commands, one unified pipeline: npm run build:state-graph.

The JSON file is also consumed by the topology scanner (Part VII) which reads emits and listens to detect drift, by the compliance report which reads feature to check requirement coverage, and by the composition graph builder which projects the full graph into the D3-ready format. Phase 2 is the hub. Everything downstream reads from it.
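As an illustration of how thin a downstream consumer can be, the linked versus orphan split used for requirement coverage needs nothing but the feature field. The sketch below is hypothetical (splitByFeature is not a function from this codebase) and uses the two nodes from the fsm-composition.json excerpt exactly as they appear there.

```typescript
interface NodeWithFeature {
  id: string;
  feature?: { id: string; ac: string };
}

// Partition node ids by whether they carry a feature link.
function splitByFeature(nodes: NodeWithFeature[]): { linked: string[]; orphan: string[] } {
  const linked: string[] = [];
  const orphan: string[] = [];
  for (const n of nodes) {
    (n.feature ? linked : orphan).push(n.id);
  }
  return { linked, orphan };
}

// The two nodes from the excerpt: one with a feature link, one shown without.
const coverage = splitByFeature([
  { id: 'machine:accent-palette-state',
    feature: { id: 'ACCENT', ac: 'rightClickOpensPalette' } },
  { id: 'machine:app-readiness-state' },
]);
```

Here accent-palette-state lands in linked and app-readiness-state, which has no feature field in the excerpt, lands in orphan.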

Continue to Part X: Phase 3 — SVG Rendering, Cache, and the Interactive Explorer →
