The Testability Wall
The AST scanner described in Part I needs a specific kind of codebase to work well: one where tests import the code they verify directly, through clean import paths the scanner can resolve. If a test reaches production code through side effects, global state, or jsdom's simulated DOM, the scanner sees nothing.
Before the refactor, the codebase had 232 sync IO calls — fs.readFileSync, child_process.execSync, require() with dynamic paths — scattered through src/lib/ and scripts/lib/. Factories in src/lib/ imported document, window, localStorage, and fetch directly. Testing them required jsdom for everything. Coverage sat at ~43%.
The problem was not just coverage. It was traceability. When a test creates a JSDOM instance, injects HTML, and then calls a function that internally reaches document.querySelector(), the scanner sees only the test's import of JSDOM from jsdom (external, filtered). The production function is reached through a side effect, not an import. The binding is invisible.
The Hexagonal Port Extraction
The fix was systematic: every external capability became a typed port in src/lib/external.ts.
```typescript
// src/lib/external.ts — narrow typed interfaces
export interface FileSystem {
  readFile(path: string, encoding: 'utf8'): Promise<string>;
  writeFile(path: string, content: string): Promise<void>;
  exists(path: string): Promise<boolean>;
  readdir(dir: string): Promise<string[]>;
  stat(path: string): Promise<{ isFile(): boolean }>;
  unlink(path: string): Promise<void>;
  mkdir(path: string, opts?: { recursive: boolean }): Promise<void>;
}

export interface Logger {
  info(msg: string, ...args: unknown[]): void;
  warn(msg: string, ...args: unknown[]): void;
  error(msg: string, ...args: unknown[]): void;
}

export interface Scheduler {
  setTimeout(cb: () => void, ms: number): number;
  clearTimeout(id: number): void;
  requestAnimationFrame(cb: () => void): number;
  cancelAnimationFrame(id: number): void;
}
```

Twelve port interfaces in total: ClassListLike, ElementLike, Clipboard, Scheduler, Clock, Fetcher, WsClient, FileSystem, CommandRunner, ProcessEnv, OutputSink, Logger. Composite dep bags combine them: BrowserDeps for DOM factories, NodeDeps for CLI tools.
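The article names BrowserDeps and NodeDeps without showing them, so here is a minimal sketch of what such composite bags might look like; the port shapes are trimmed copies of the interfaces above, and which ports each bag bundles is an assumption, not the project's actual definition:

```typescript
// Hypothetical composite dep bags. Which ports each bag includes is assumed.
interface Logger {
  info(msg: string, ...args: unknown[]): void;
}
interface FileSystem {
  readFile(path: string, encoding: 'utf8'): Promise<string>;
}
interface Scheduler {
  setTimeout(cb: () => void, ms: number): number;
  clearTimeout(id: number): void;
}
interface WindowLike {
  scrollY: number;
  addEventListener(type: string, cb: () => void): void;
}

// CLI cores take NodeDeps; DOM factories take BrowserDeps.
type NodeDeps = { fs: FileSystem; logger: Logger };
type BrowserDeps = { window: WindowLike; scheduler: Scheduler };

// A core can declare only the slice it needs, so fakes stay tiny:
function makeGreeter(deps: Pick<NodeDeps, 'logger'>) {
  return { greet: (name: string) => deps.logger.info(`hello ${name}`) };
}

// A one-line fake satisfies the dependency: no mocking framework involved.
const captured: string[] = [];
makeGreeter({ logger: { info: (m) => captured.push(m) } }).greet('world');
```

The Pick<NodeDeps, 'logger'> parameter type keeps each factory honest about which capabilities it actually uses.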
Every factory function takes its capabilities as a parameter:
```typescript
// Before: coupled to globals
export function createScrollSpyMachine() {
  const observer = new IntersectionObserver(...);
  window.addEventListener('scroll', ...);
  // untestable without jsdom
}

// After: injected deps
export function createScrollSpyMachine(deps: {
  window: WindowLike;
  scheduler: Scheduler;
}) {
  // testable with fakes: { scrollY: 0, addEventListener: () => {} }
}
```

The pattern is the same everywhere: scripts/lib/*-core.ts contains pure logic consuming injected ports. scripts/*.ts is a thin shell (~30-60 lines) that wires the real adapters, parses argv, and calls the core. Tests instantiate the core with fakes — no mocking frameworks, no jsdom, no network. An in-memory FileSystem is a plain object with readFile: (p) => files.get(p).
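That in-memory fake can be spelled out in full. A sketch against the FileSystem port above, covering three of its methods; the behavior on missing paths (throwing an ENOENT-style error) is an assumption:

```typescript
// Minimal in-memory FileSystem fake backed by a Map.
// The interface mirrors a subset of the port in src/lib/external.ts;
// missing-file behavior is assumed, not taken from the project.
interface FileSystem {
  readFile(path: string, encoding: 'utf8'): Promise<string>;
  writeFile(path: string, content: string): Promise<void>;
  exists(path: string): Promise<boolean>;
}

function makeFakeFs(seed: Record<string, string> = {}): FileSystem {
  const files = new Map(Object.entries(seed));
  return {
    readFile: async (p) => {
      const content = files.get(p);
      if (content === undefined) throw new Error(`ENOENT: ${p}`);
      return content;
    },
    writeFile: async (p, content) => { files.set(p, content); },
    exists: async (p) => files.has(p),
  };
}

// Usage: seed files as data; no disk, no mocking framework.
const fakeFs = makeFakeFs({ '/config.json': '{"ok":true}' });
```

Every test gets a fresh, isolated filesystem for the cost of one object literal.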
A Concrete Factory: ScrollSpyMachine Before and After
The scroll spy machine detects which heading is currently visible as the user scrolls. Before the refactor, it reached directly into the DOM:
```typescript
// BEFORE — coupled to globals, untestable without jsdom
export function createScrollSpyMachine() {
  let activeSlug: string | null = null;
  window.addEventListener('scroll', () => {
    const headings = document.querySelectorAll('[data-slug]');
    for (const h of headings) {
      if (h.getBoundingClientRect().top < 100) {
        activeSlug = h.getAttribute('data-slug');
      }
    }
  });
  return { getActiveSlug: () => activeSlug };
}
```

Testing this required new JSDOM(html), setting up a fake scroll position, dispatching a scroll event, and hoping jsdom's getBoundingClientRect() returned something useful (it does not — jsdom has no layout engine). The test was fragile, slow, and the AST scanner saw only import { JSDOM } from 'jsdom' — no binding to the production code.
After the refactor:
```typescript
// AFTER — deps injected, testable with a 5-line fake
export function createScrollSpyMachine(deps: {
  window: WindowLike;
  scheduler: Scheduler;
}): SpyMachine {
  let activeSlug: string | null = null;
  // Uses deps.window.addEventListener, deps.scheduler.requestAnimationFrame
  // No direct DOM access — the caller provides the capabilities
  return { getActiveSlug: () => activeSlug, transition, /* ... */ };
}
```

The test provides a fake window with scrollY: 200 and a list of heading positions as data:
```typescript
@Verifies<ScrollSpyFeature>('activeHeadingUpdatesOnScroll')
'detects active heading based on scroll position'() {
  const machine = createScrollSpyMachine({
    window: { scrollY: 200, addEventListener: () => {} } as WindowLike,
    scheduler: { requestAnimationFrame: (cb) => { cb(); return 0; } } as Scheduler,
  });
  const result = machine.transition({ headings: [{ id: 'intro', top: 0 }, { id: 'body', top: 150 }], /* ... */ });
  expect(result.activeSlug).toBe('body');
}
```

No jsdom. No layout engine. No flaky scroll events. The test imports createScrollSpyMachine from src/lib/scroll-spy-machine.ts — the scanner resolves it, and the AC is bound. The hexagonal pattern made both testability and traceability possible in one move.
The 232 Sync Calls: Six Agents in Parallel
Migrating 232 sync IO calls is not a task for a single afternoon. It was done by six AI agents running in parallel, each on a disjoint scope:
| Agent | Scope | Violations migrated |
|---|---|---|
| 1 | `scripts/lib/mermaid-*.ts` | 34 |
| 2 | `scripts/lib/compliance-*.ts` | 41 |
| 3 | `scripts/cli/core/*.ts` | 28 |
| 4 | `scripts/cli/commands/*.ts` | 19 |
| 5 | `src/lib/*.ts` (browser factories) | 62 |
| 6 | `test/**/*.ts` (test helpers) | 48 |
Each agent followed the same pattern: read the sync call, identify the capability needed (filesystem, command runner, logger), check if the port exists in external.ts (create it if not), replace the direct call with a port method, and write the corresponding test with a fake.
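As an illustration of that mechanical step, here is a hedged before/after sketch of one migration; the function name loadManifest and the manifest format are invented for the example, not taken from the codebase:

```typescript
// Hypothetical migration of one sync IO call to the FileSystem port.
interface FileSystem {
  readFile(path: string, encoding: 'utf8'): Promise<string>;
}

// Before (what the scanner flags):
//   const raw = fs.readFileSync(manifestPath, 'utf8');

// After: the capability arrives through the injected port.
async function loadManifest(
  deps: { fs: FileSystem },
  manifestPath: string,
): Promise<string[]> {
  const raw = await deps.fs.readFile(manifestPath, 'utf8');
  return raw.split('\n').filter((line) => line.length > 0);
}

// The corresponding test fake is a plain object, no mocking framework.
const fakeFs: FileSystem = {
  readFile: async () => 'a.ts\nb.ts\n',
};
```

The call site changes shape (sync to async), which is why the migration had to touch callers too and benefited from disjoint per-agent scopes.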
No agent touched another agent's scope. No merge conflicts. The result: 232 violations went to 0 in a single coordinated session. The sync-usage scanner (described below) ensures no one re-introduces a sync call.
The Coverage That Would Not Flush
One friction point deserves its own section because it illustrates why hexagonal architecture is not just about testability — it is about observability.
After extracting the ports, vitest's v8 coverage provider still refused to write the coverage report. Tests passed, but the coverage/ directory stayed empty. The cause: scripts/lib/build-js.ts printed a console.log("✓ src/foo.ts → js/foo.js") line for every TypeScript entry it compiled. During tests that exercised the build pipeline, this flooded stdout with hundreds of lines. The coverage reporter, which also writes to stdout, could not flush its output.
The fix was the same hexagonal pattern: inject a Logger port into the build-js core. In production, the logger prints to the console. In tests, the logger is a silent no-op:
```typescript
// test
const silentLogger: Logger = {
  info: () => {},
  warn: () => {},
  error: () => {},
};
const result = await makeBuildJs({ logger: silentLogger, fs: fakeFs }).run();
```

The build-js module stopped flooding stdout during tests. vitest flushed the coverage. Coverage jumped from "not reported" to 99.75% on the first successful run.
This is a case study in how SOLID principles solve bugs you did not anticipate. The Logger port was introduced for testability — so tests would not pollute the console. But it also fixed a coverage reporting bug that had nothing to do with testability. The Logger port separated two concerns (build progress output and test coverage output) that shared a single channel (stdout). The Dependency Inversion Principle resolved a race condition that no one had diagnosed as a race condition.
If a module cannot call console.log directly, it cannot accidentally break another module's stdout contract. The hexagonal rule is not just a testing convenience — it is a structural guarantee about side-effect isolation.
The Sync-Usage Scanner
Extracting 232 sync IO calls is a one-time effort. Preventing new ones from creeping in is a permanent problem. Discipline does not scale — discipline means "someone remembers to check."
The solution is another AST scanner: scripts/lib/sync-usage-scanner.ts. It walks src/**, scripts/**, and test/** looking for:
- `fs.readFileSync`, `fs.writeFileSync`, and every other `*Sync` method on `fs`
- `child_process.execSync`, `child_process.spawnSync`
- `require()` calls with non-literal arguments (dynamic requires)
It is itself a feature — SYNC-USAGE-TRACKING with 5 ACs — verified by @Verifies tests that assert zero violations across each scope. If anyone re-introduces a readFileSync call anywhere in the codebase, the test fails. This is a ratchet, not discipline.
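The real scanner walks the AST; as a rough sketch of the detection rules only, here is a regex-based stand-in (the actual scripts/lib/sync-usage-scanner.ts uses AST nodes, not regexes, and its exact rules and output shape may differ):

```typescript
// Simplified stand-in for the sync-usage rules: a line-by-line regex scan
// rather than the project's AST walk. Rule names are illustrative.
interface Violation {
  line: number;
  rule: string;
}

function scanSyncUsage(source: string): Violation[] {
  const rules: Array<[string, RegExp]> = [
    ['fs-sync', /\bfs\.\w+Sync\s*\(/],
    ['child-process-sync', /\bchild_process\.(execSync|spawnSync)\s*\(/],
    // require(x) where the argument does not start with a string literal
    ['dynamic-require', /\brequire\s*\(\s*[^'")]/],
  ];
  const violations: Violation[] = [];
  source.split('\n').forEach((text, i) => {
    for (const [rule, re] of rules) {
      if (re.test(text)) violations.push({ line: i + 1, rule });
    }
  });
  return violations;
}

const report = scanSyncUsage([
  "const a = fs.readFileSync('x', 'utf8');",
  "const b = require(dynamicPath);",
  "const c = await deps.fs.readFile('x', 'utf8');",
].join('\n'));
```

Note that the third line, which goes through the injected port, is not flagged: port methods are async by construction, so the ratchet and the architecture enforce each other.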
```
$ npx tsx scripts/scan-sync-usage.ts
scan-sync-usage: 0 violations in src/
scan-sync-usage: 0 violations in scripts/
scan-sync-usage: 0 violations in test/
```

Coverage Hardening: 43% to 99.75%
With hexagonal ports in place, every factory became testable with cheap fakes. The coverage leap was not one heroic effort — it was the structural consequence of making testability the default:
- 159 test files, up from 19 in V1
- 2,642 tests, up from ~644
- vitest thresholds: 98% statements, 98% functions, 95% branches, 99% lines for `src/lib/**`
- scripts-specific thresholds: 99% lines for `compliance-core.ts`, 98% for other cores
The remaining 0.25% uncovered code is thin adapter code — IIFE wrappers in src/app-static.ts and src/app-dev.ts that create real document / fs instances and pass them to factories. These are exercised only by E2E tests (Playwright), by design. The hexagonal rule: test the port, not the adapter.
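For contrast with the fakes shown earlier, here is a sketch of what such a thin real adapter might look like on the Node side; the actual wiring in src/app-static.ts and src/app-dev.ts is not shown in this article, and the exists() workaround is an assumption:

```typescript
// Hypothetical real-FileSystem adapter: thin glue over node:fs, exercised by
// E2E tests rather than unit tests. Shape mirrors a subset of the port above.
import { promises as fsp } from 'node:fs';

interface FileSystem {
  readFile(path: string, encoding: 'utf8'): Promise<string>;
  writeFile(path: string, content: string): Promise<void>;
  exists(path: string): Promise<boolean>;
}

const realFs: FileSystem = {
  readFile: (p, enc) => fsp.readFile(p, enc),
  writeFile: (p, content) => fsp.writeFile(p, content),
  // fs.promises has no exists(); stat-and-catch is a common workaround
  exists: (p) => fsp.stat(p).then(() => true, () => false),
};
```

There is nothing here worth unit-testing: every method is a one-line delegation, which is exactly why "test the port, not the adapter" holds.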
The coverage structure mirrors the architecture. vitest.config.js explicitly includes only the testable cores:
```js
coverage: {
  include: [
    'src/lib/**/*.ts',
    'scripts/lib/**/*.ts',
    'scripts/cli/commands/lib/**/*.ts',
    'scripts/cli/core/**/*.ts',
    'scripts/cli/tui/machines/**/*.ts',
  ],
}
```

Everything outside this list — thin shells, IIFEs, workers, IO adapters — is excluded. The coverage number means what it says: 99.75% of the code that can be unit-tested is unit-tested.
Why Clean Import Graphs Feed Traceability
The hexagonal architecture and the AST scanner are not independent improvements. They reinforce each other:
Before hex arch: A test imports jsdom, creates a fake DOM, and calls a function that internally reads document.querySelector('.toc-item'). The scanner sees import { JSDOM } from 'jsdom' — external module, filtered. The production function is reached through a side effect. The binding is invisible. The AC appears "unbound" in the compliance report.
After hex arch: A test imports createScrollSpyMachine from src/lib/scroll-spy-machine.ts and passes a fake WindowLike. The scanner sees the import, resolves it to a repo-relative path, confirms it is in scope, and binds the AC to the source file. The traceability is mechanically verifiable because the dependency graph is explicit.
This is the deeper reason the refactor had to be done as a whole: fixing the scanner without fixing the import graphs would have left hundreds of methods "empty." Fixing the import graphs without the scanner would have improved testability but not traceability. The two changes compose: clean imports make AST inference possible, and AST inference makes clean imports valuable beyond just testability.
Next: Part V — AI-Driven Self-Implementation shows what happens when the loop is closed tight enough that an AI agent can operate inside it.