Part VI: Testing Pure State Machines
98% statement coverage. 600ms unit test suite. Zero flaky tests. Zero DOM dependencies. The state machines are the most-tested code in the project because they are the easiest code to test.
The Payoff
The entire point of extracting state machines with callback injection (Parts II-IV) was testability. The entire point of migrating to TypeScript (Part V) was to make the interfaces compile-time contracts. This part shows the payoff: a testing strategy that's fast, thorough, and -- crucially -- not fragile.
The testing pyramid has three layers. Each layer catches a different class of bugs:
- Unit tests catch logic errors in state transitions, guard clauses, and pure helpers.
- Property-based tests catch edge cases that example-based tests miss (empty strings, unicode, extreme values).
- E2E tests catch integration errors where the wiring layer connects machines to the real DOM.
The Unit Test Setup Pattern
Every state machine test follows the same setup() pattern. Here's the canonical example from spa-nav-state.test.ts:
import { describe, it, expect, vi } from 'vitest';
import {
createSpaNavMachine,
classifyNavigation,
type SpaNavCallbacks,
type SpaNavState,
} from '../../src/lib/spa-nav-state';
function setup() {
const stateChanges: SpaNavState[] = [];
const callbacks: SpaNavCallbacks = {
onStateChange: vi.fn((state: SpaNavState) => {
stateChanges.push(state);
}),
scrollToHash: vi.fn(),
toggleHeadings: vi.fn(),
startFetch: vi.fn(),
closeHeadings: vi.fn(() => false),
swapContent: vi.fn(),
updateActiveItem: vi.fn(),
pushHistory: vi.fn(),
postSwap: vi.fn(),
};
const machine = createSpaNavMachine(callbacks);
return { machine, callbacks, stateChanges };
}
Three things returned:
- machine: the state machine instance, ready to receive events.
- callbacks: all callbacks wrapped in vi.fn() (Vitest mock functions). Every call is recorded: arguments, call count, call order.
- stateChanges: an array that captures every state transition. Instead of checking machine.getState() (which only shows the current state), you can verify the full transition history.
This pattern gives tests complete visibility into the machine's behavior without touching the DOM.
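To make concrete what the mock functions contribute, here is a minimal hand-rolled sketch of a call recorder. It is not how vi.fn() is implemented; `recorder` and its `calls` property are hypothetical names, used only to illustrate that every call is captured so the test can assert on it afterwards.

```typescript
// Hypothetical stand-in for a mock function: wraps an implementation
// and records the arguments of every call.
type Recorder<A extends unknown[], R> = ((...args: A) => R) & { calls: A[] };

function recorder<A extends unknown[], R>(impl: (...args: A) => R): Recorder<A, R> {
  const fn = ((...args: A): R => {
    fn.calls.push(args); // remember the arguments before delegating
    return impl(...args);
  }) as Recorder<A, R>;
  fn.calls = [];
  return fn;
}

// Usage: two callbacks in the spirit of setup().
const closeHeadings = recorder(() => false); // reports "no headings were open"
const startFetch = recorder((path: string) => path.length);

startFetch('/skills');
closeHeadings();
```

After these two calls, `startFetch.calls` holds one entry with the argument '/skills', which is the kind of evidence toHaveBeenCalledWith checks against.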
Testing State Transitions
describe('full navigation', () => {
it('should walk through idle → fetching → swapping → settled', () => {
const { machine, callbacks, stateChanges } = setup();
machine.navigate('/skills', '/about', null, '/skills');
expect(machine.getState()).toBe('fetching');
expect(callbacks.startFetch).toHaveBeenCalledWith('/skills');
machine.fetchComplete('<h1>Skills</h1>');
// closeHeadings returned false, so skip closingHeadings
expect(stateChanges).toEqual(['fetching', 'swapping', 'settled']);
expect(callbacks.swapContent).toHaveBeenCalledWith('<h1>Skills</h1>', '/skills');
expect(callbacks.pushHistory).toHaveBeenCalledWith('/skills', false);
expect(callbacks.postSwap).toHaveBeenCalledWith('/skills', null);
});
});
The test drives the machine through a complete navigation cycle by calling methods in sequence: navigate(), then fetchComplete(). After each call, it asserts the state and which callbacks were invoked.
The stateChanges array is key. It captures the full sequence: ['fetching', 'swapping', 'settled']. This verifies not just the final state but every intermediate transition. If the machine accidentally transitions through an unexpected state, the array will show it.
Testing the closingHeadings Branch
it('should include closingHeadings when headings are open', () => {
const { machine, callbacks, stateChanges } = setup();
(callbacks.closeHeadings as ReturnType<typeof vi.fn>).mockReturnValue(true);
machine.navigate('/skills', '/about', null, '/skills');
machine.fetchComplete('<h1>Skills</h1>');
expect(machine.getState()).toBe('closingHeadings');
expect(stateChanges).toEqual(['fetching', 'closingHeadings']);
machine.transitionEnd();
expect(stateChanges).toEqual(['fetching', 'closingHeadings', 'swapping', 'settled']);
});
By mocking closeHeadings to return true, the test forces the machine into the closingHeadings branch. Then it calls transitionEnd() to simulate the CSS animation completing. The test verifies the full transition sequence including the animation state.
Testing Guard Clauses
Guard clauses are the most important thing to test. They ensure the machine rejects invalid transitions:
describe('guard clauses', () => {
it('should ignore fetchComplete when not fetching', () => {
const { machine, callbacks } = setup();
// Machine is in 'idle' — fetchComplete should be a no-op
machine.fetchComplete('<h1>Nope</h1>');
expect(machine.getState()).toBe('idle');
expect(callbacks.swapContent).not.toHaveBeenCalled();
});
it('should ignore transitionEnd when not in closingHeadings', () => {
const { machine, callbacks } = setup();
machine.navigate('/skills', '/about', null, '/skills');
expect(machine.getState()).toBe('fetching');
// transitionEnd should be ignored in 'fetching' state
machine.transitionEnd();
expect(machine.getState()).toBe('fetching');
});
});
The pattern: call a method that shouldn't work in the current state, then verify that (a) the state didn't change and (b) no callbacks were fired. This is the negative testing that prevents the race conditions from Part I.
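The transition, branch, and guard tests constrain the machine quite tightly. Below is a minimal sketch of a machine that would satisfy them. It is not the code in spa-nav-state.ts: `createNavMachineSketch` is a hypothetical name, navigate() is simplified to a single argument, and only four of the callbacks are modeled.

```typescript
type NavState = 'idle' | 'fetching' | 'closingHeadings' | 'swapping' | 'settled';

interface NavCallbacks {
  onStateChange(state: NavState): void;
  startFetch(path: string): void;
  closeHeadings(): boolean; // true when a close animation has started
  swapContent(html: string, path: string): void;
}

function createNavMachineSketch(cb: NavCallbacks) {
  let state: NavState = 'idle';
  let target = '';
  let html = '';

  const enter = (next: NavState): void => {
    state = next;
    cb.onStateChange(next);
  };

  const swap = (): void => {
    enter('swapping');
    cb.swapContent(html, target);
    enter('settled');
  };

  return {
    getState: (): NavState => state,
    navigate(path: string): void {
      target = path;
      enter('fetching');
      cb.startFetch(path);
    },
    fetchComplete(body: string): void {
      if (state !== 'fetching') return; // guard: unexpected event is a no-op
      html = body;
      if (cb.closeHeadings()) enter('closingHeadings'); // wait for transitionEnd()
      else swap();
    },
    transitionEnd(): void {
      if (state !== 'closingHeadings') return; // guard
      swap();
    },
  };
}

// Usage with plain recording callbacks.
const history: NavState[] = [];
const swaps: string[] = [];
const nav = createNavMachineSketch({
  onStateChange: (s) => { history.push(s); },
  startFetch: () => {},
  closeHeadings: () => false,
  swapContent: (_html, path) => { swaps.push(path); },
});

nav.fetchComplete('<h1>Nope</h1>'); // ignored: machine is idle
nav.navigate('/skills');
nav.transitionEnd();                // ignored: machine is fetching
nav.fetchComplete('<h1>Skills</h1>');
```

Each guard is a single early return, which is why the negative tests need nothing more than one call and one assertion.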
Guard Clause Testing for CopyFeedbackMachine
The CopyFeedbackMachine has an explicit transition table. Testing it:
describe('invalid transitions', () => {
it('should not allow succeed() from idle', () => {
const { machine } = setup();
machine.succeed();
expect(machine.getState()).toBe('idle');
});
it('should not allow fail() from idle', () => {
const { machine } = setup();
machine.fail();
expect(machine.getState()).toBe('idle');
});
it('should not allow copy() → succeed() → succeed()', () => {
const { machine } = setup();
machine.copy();
machine.succeed();
expect(machine.getState()).toBe('success');
machine.succeed(); // Already in success — should be no-op
expect(machine.getState()).toBe('success');
});
});
Each test verifies that one invalid transition is rejected. Together, they exercise the canTransition() guard function from every invalid starting state.
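A transition table of this kind can be sketched as a plain record from each state to the states it may move to. The state names 'copying' and 'error', the reset() method, and `createCopyFeedbackSketch` are assumptions for illustration; only idle, success, copy(), succeed(), fail(), and canTransition() appear in the tests.

```typescript
type CopyState = 'idle' | 'copying' | 'success' | 'error';

// Hypothetical table: every state lists its legal successors.
const transitions: Record<CopyState, readonly CopyState[]> = {
  idle: ['copying'],
  copying: ['success', 'error'],
  success: ['idle'],
  error: ['idle'],
};

function canTransition(from: CopyState, to: CopyState): boolean {
  return transitions[from].indexOf(to) !== -1;
}

function createCopyFeedbackSketch() {
  let state: CopyState = 'idle';
  const go = (to: CopyState): void => {
    if (canTransition(state, to)) state = to; // invalid moves are silent no-ops
  };
  return {
    getState: (): CopyState => state,
    copy: (): void => go('copying'),
    succeed: (): void => go('success'),
    fail: (): void => go('error'),
    reset: (): void => go('idle'),
  };
}
```

Because every method funnels through one guard, a test per invalid (state, method) pair is enough to cover the whole table.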
Guard Clause Testing for PageLoadMachine
The generation counter is a different kind of guard -- it checks identity, not state:
describe('stale generation', () => {
it('should reject markRendering with wrong generation', () => {
const { machine, callbacks } = setup();
const gen1 = machine.startLoad();
const gen2 = machine.startLoad(); // Bumps generation
const result = machine.markRendering(gen1); // Stale!
expect(result).toBe(false);
expect(callbacks.onStale).toHaveBeenCalledWith(gen1, gen2);
expect(machine.getState().state).toBe('loading'); // Unchanged
});
});
The test starts two loads, then tries to advance the first one. The machine detects the stale generation, calls onStale, and returns false. The state stays in loading (for the second load).
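The generation guard can be sketched in a few lines. This is an illustration under assumptions, not the PageLoadMachine source: the 'rendering' state name is inferred from markRendering(), and `createPageLoadSketch` is a hypothetical name.

```typescript
type LoadState = 'idle' | 'loading' | 'rendering';

interface PageLoadCallbacks {
  onStale(staleGeneration: number, currentGeneration: number): void;
}

function createPageLoadSketch(cb: PageLoadCallbacks) {
  let state: LoadState = 'idle';
  let generation = 0;
  return {
    getState: () => ({ state, generation }),
    startLoad(): number {
      generation += 1; // every new load invalidates all earlier ones
      state = 'loading';
      return generation;
    },
    markRendering(gen: number): boolean {
      if (gen !== generation) {
        cb.onStale(gen, generation); // identity check failed: caller is outdated
        return false;
      }
      if (state !== 'loading') return false; // ordinary state guard
      state = 'rendering';
      return true;
    },
  };
}

// Usage: the first load is superseded before it can render.
const staleReports: Array<[number, number]> = [];
const loader = createPageLoadSketch({
  onStale: (stale, current) => { staleReports.push([stale, current]); },
});
const first = loader.startLoad();
const second = loader.startLoad();
const firstAccepted = loader.markRendering(first);
const secondAccepted = loader.markRendering(second);
```

The caller holds the token returned by startLoad(), so a slow response can only advance the machine if no newer load has started in the meantime.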
Testing Pure Helpers
Pure functions get their own test blocks. No setup() needed:
describe('classifyNavigation', () => {
it('should return hashScroll for same page with hash', () => {
expect(classifyNavigation('/about', '/about', '#education')).toBe('hashScroll');
});
it('should return toggleHeadings for same page without hash', () => {
expect(classifyNavigation('/about', '/about', null)).toBe('toggleHeadings');
});
it('should return fullNavigation for different page', () => {
expect(classifyNavigation('/skills', '/about', null)).toBe('fullNavigation');
});
it('should return fullNavigation for different page even with hash', () => {
expect(classifyNavigation('/skills', '/about', '#typescript')).toBe('fullNavigation');
});
});
Four tests. Four inputs. Four expected outputs. No mocks. No setup. This is what pure functions buy you: trivially testable logic.
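For reference, a function consistent with those four cases fits in three lines. This is a sketch, not the implementation in spa-nav-state.ts; the parameter names are assumptions inferred from the call sites.

```typescript
type NavigationKind = 'hashScroll' | 'toggleHeadings' | 'fullNavigation';

function classifyNavigationSketch(
  targetPath: string,
  currentPath: string,
  hash: string | null,
): NavigationKind {
  // A different page always wins, whether or not a hash is present.
  if (targetPath !== currentPath) return 'fullNavigation';
  // Same page: a hash means scroll to it, no hash means toggle the headings.
  return hash ? 'hashScroll' : 'toggleHeadings';
}
```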
The detectByScroll function from ScrollSpyMachine:
describe('detectByScroll', () => {
it('should return last heading at or above threshold', () => {
const headings = [
{ id: 'h1', top: 0 },
{ id: 'h2', top: 100 },
{ id: 'h3', top: 200 },
];
expect(detectByScroll(headings, 150)).toBe('h2');
});
it('should return first heading when none are above threshold', () => {
const headings = [{ id: 'h1', top: 100 }];
expect(detectByScroll(headings, 50)).toBe('h1');
});
it('should return null for empty headings', () => {
expect(detectByScroll([], 100)).toBeNull();
});
});
Geometry as plain objects. No DOM. No scroll events. The function receives { id, top }[] and returns a heading id, or null when there are no headings.
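A sketch of a function with this contract, assuming the headings arrive sorted by ascending top (the tests only use sorted input). `detectByScrollSketch` and `HeadingPosition` are hypothetical names; the real ScrollSpyMachine helper may differ.

```typescript
interface HeadingPosition {
  id: string;
  top: number;
}

function detectByScrollSketch(headings: HeadingPosition[], threshold: number): string | null {
  if (headings.length === 0) return null;
  let active = headings[0].id; // fallback when no heading has reached the threshold
  for (const heading of headings) {
    if (heading.top <= threshold) active = heading.id; // later matches override earlier ones
  }
  return active;
}
```

Because the input is a plain array, the wiring layer is free to build it from getBoundingClientRect(), offsetTop, or test fixtures.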
Property-Based Testing with fast-check
Example-based tests verify specific inputs. Property-based tests verify invariants -- statements that should hold for all possible inputs.
The property-based.test.ts file uses fast-check to generate random inputs and verify that invariants hold:
import fc from 'fast-check';
import { slugify } from '../../src/lib/helpers';
describe('slugify invariants', () => {
it('should never throw for any string input', () => {
fc.assert(
fc.property(fc.string(), (input) => {
expect(() => slugify(input)).not.toThrow();
})
);
});
it('should always produce lowercase output', () => {
fc.assert(
fc.property(fc.string(), (input) => {
const result = slugify(input);
expect(result).toBe(result.toLowerCase());
})
);
});
it('should never produce consecutive dashes', () => {
fc.assert(
fc.property(fc.string(), (input) => {
const result = slugify(input);
expect(result).not.toMatch(/--/);
})
);
});
it('should be idempotent', () => {
fc.assert(
fc.property(fc.string(), (input) => {
const once = slugify(input);
const twice = slugify(once);
expect(twice).toBe(once);
})
);
});
});
fast-check generates 100 random strings per test by default. Each one is checked against the invariant. If any string violates the invariant, fast-check shrinks it to the smallest counterexample and reports it.
These tests found a real bug: slugify(' ') (all spaces) produced an empty string, which broke URL generation. The fix: return a default slug for empty results.
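A minimal sketch of a slugify that would satisfy the invariants and include the empty-result fix. The real helper in src/lib/helpers may work differently: the regexes, the 'untitled' default, and the small pseudo-random generator below are assumptions. The generator is a crude, dependency-free stand-in for what fast-check automates, and it does no shrinking.

```typescript
function slugifySketch(input: string): string {
  const slug = input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // any run of other characters becomes one dash
    .replace(/^-+|-+$/g, '');    // trim leading and trailing dashes
  return slug === '' ? 'untitled' : slug; // default slug for empty results
}

// Deterministic pseudo-random strings (Park-Miller generator) over a small alphabet.
function randomStrings(count: number, seed = 42): string[] {
  const alphabet = 'abcXYZ019 -_!\u00e9\t';
  let s = seed;
  const next = (): number => {
    s = (s * 48271) % 2147483647;
    return s;
  };
  const out: string[] = [];
  for (let i = 0; i < count; i++) {
    const length = next() % 12;
    let str = '';
    for (let j = 0; j < length; j++) str += alphabet.charAt(next() % alphabet.length);
    out.push(str);
  }
  return out;
}

function holdsInvariants(input: string): boolean {
  const once = slugifySketch(input);
  return (
    once === once.toLowerCase() && // always lowercase
    !/--/.test(once) &&            // never consecutive dashes
    slugifySketch(once) === once   // idempotent
  );
}

const failures = randomStrings(200).filter((s) => !holdsInvariants(s));
```

An empty `failures` array means no generated input violated the invariants for this sketch; it says nothing about inputs outside the chosen alphabet, which is the gap a real generator closes.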
Other properties tested:
- matchScore() returns 0 for empty query (with arbitrary target strings)
- parseFrontmatter() returns valid YAML for any input (never throws)
- buildHierarchicalSlug() always produces a valid slug even from unicode input
Property-based tests complement example tests. Examples are readable -- they document the expected behavior. Properties are thorough -- they find edge cases humans miss.
More Property Invariants
The property-based test file covers more than just slugify. Here are other invariants tested:
describe('matchScore invariants', () => {
it('should return 0 for empty query', () => {
fc.assert(
fc.property(fc.string(), (target) => {
expect(matchScore('', target)).toBe(0);
})
);
});
it('should return non-negative scores', () => {
fc.assert(
fc.property(fc.string(), fc.string(), (query, target) => {
expect(matchScore(query, target)).toBeGreaterThanOrEqual(0);
})
);
});
it('should score exact match higher than partial', () => {
fc.assert(
fc.property(
fc.string({ minLength: 3 }).filter(s => /[a-z]/i.test(s)),
(str) => {
const exact = matchScore(str, str);
const partial = matchScore(str.slice(0, Math.ceil(str.length / 2)), str);
expect(exact).toBeGreaterThanOrEqual(partial);
}
)
);
});
});
describe('buildHierarchicalSlug invariants', () => {
it('should produce valid slug format (no consecutive dashes, no leading/trailing dashes)', () => {
fc.assert(
fc.property(fc.string(), fc.string(), (parent, child) => {
const slug = buildHierarchicalSlug(parent, child);
expect(slug).not.toMatch(/^-/); // No leading dash
expect(slug).not.toMatch(/-$/); // No trailing dash
expect(slug).not.toMatch(/---/); // No triple dashes (double is separator)
})
);
});
});
The filter on fc.string() narrows generated strings to those that are meaningful for the test. Without the filter, fast-check would also generate strings with no letters at all (only whitespace, digits, or punctuation) -- valid inputs but not useful for testing "exact match scores higher than partial."
When Property Tests Caught Bugs
Bug 1: slugify returned '' for whitespace-only input. The property "never produces empty output for non-empty input" caught this. Empty input ('') producing empty output is fine. But slugify(' ') (all spaces) also produced '', which broke URL generation. The fix: return 'untitled' for empty results after stripping.
Bug 2: buildHierarchicalSlug with unicode parents. The property "no triple dashes" caught that buildHierarchicalSlug('café', 'intro') produced 'caf---intro' because the é was stripped, leaving a trailing dash that merged with the -- separator. The fix: strip trailing dashes from each segment before joining.
These are exactly the kind of bugs that example-based tests miss because nobody thinks to test slugify(' ') or buildHierarchicalSlug('café', 'intro'). Property-based testing generates inputs that humans don't consider.
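The second fix can be sketched as follows. This is an illustration, not the project's helper: `slugSegment`, the regexes, and the handling of empty segments (dropped, with an 'untitled' fallback when nothing remains) are assumptions. Only the '--' separator and the "strip trailing dashes per segment" rule come from the description above.

```typescript
function slugSegment(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // non-ASCII letters such as é collapse into a dash
    .replace(/^-+|-+$/g, '');    // the fix: no leading or trailing dash on a segment
}

function buildHierarchicalSlugSketch(parent: string, child: string): string {
  // Assumption: empty segments are dropped so the separator never dangles.
  const parts = [slugSegment(parent), slugSegment(child)].filter((p) => p !== '');
  return parts.length > 0 ? parts.join('--') : 'untitled';
}

const fixed = buildHierarchicalSlugSketch('caf\u00e9', 'intro');
```

Without the trimming step, the parent segment would end in a dash and the join would produce three dashes in a row, which is exactly what the "no triple dashes" property rejects.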
Coverage Thresholds as Quality Gates
The vitest.config.js enforces coverage thresholds:
coverage: {
provider: 'v8',
include: ['src/lib/**/*.ts', 'scripts/build-static.js'],
thresholds: {
'src/lib/**/*.ts': {
statements: 98,
branches: 95,
functions: 98,
lines: 99,
},
'scripts/build-static.js': {
statements: 100,
branches: 100,
functions: 100,
lines: 100,
},
},
},
If any coverage metric drops below its configured threshold, npm test fails. This is a ratchet -- the thresholds can go up but never down.
Why 98/95/98/99?
Statements at 98%: Some lines are truly unreachable in unit tests -- error callbacks that only fire in integration scenarios, TypeScript exhaustiveness default cases that can't actually execute.
Branches at 95%: TypeScript exhaustiveness checks generate unreachable branches in the compiled JavaScript output. V8 sees the generated default: throw new Error("unreachable") branch and reports it as uncovered. The 95% threshold accommodates these synthetic branches.
Functions at 98%: Every exported function must be tested. The 2% slack is for internal utility functions that are tested transitively.
Lines at 99%: Nearly every line must execute during tests. The 1% slack accounts for the same exhaustiveness issues as branches.
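The exhaustiveness pattern behind those uncovered branches looks roughly like this. `needsFetch` and `assertNever` are hypothetical names chosen for illustration; the project's own switch statements may be shaped differently.

```typescript
type NavigationKind = 'hashScroll' | 'toggleHeadings' | 'fullNavigation';

// Accepts only `never`, so the call below compiles only while the switch is exhaustive.
function assertNever(value: never): never {
  throw new Error('unreachable: ' + String(value));
}

function needsFetch(kind: NavigationKind): boolean {
  switch (kind) {
    case 'hashScroll':
    case 'toggleHeadings':
      return false;
    case 'fullNavigation':
      return true;
    default:
      // No well-typed caller can reach this line, so a unit test cannot
      // execute it without a cast, and coverage reports the branch as uncovered.
      return assertNever(kind);
  }
}
```

Adding a fourth member to NavigationKind without handling it turns the default branch into a compile error, which is the point of keeping a branch that never runs.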
Why 100% for build-static.js?
The build script is critical infrastructure. A bug in the build script can silently corrupt the entire site output. 100% coverage means every code path in the build pipeline is exercised by tests -- including error handling, edge cases in frontmatter parsing, and the minification pipeline.
V8 ignore markers are used sparingly in the build script to exclude JavaScript artifacts that V8 counts as branches but aren't real decision points:
/* v8 ignore next 2 */
if (false) { /* unreachable, but V8 counts it */ }
These are documented and minimal. The goal is 100% real coverage, not 100% metric gaming.
E2E Tests with Playwright
Unit tests verify the state machines. E2E tests verify the wiring layer -- the integration of machines with the DOM, events, and CSS.
Dual Target
The Playwright config supports two test targets:
const servers = {
dev: { command: 'npx serve . -p 4001', port: 4001, url: 'http://localhost:4001' },
static: { command: 'npx serve public -p 4000', port: 4000, url: 'http://localhost:4000' },
};
- TEST_TARGET=dev: tests against the development server (source files, unminified)
- TEST_TARGET=static: tests against the static build (minified, production-like)
The same test suite runs against both targets. This catches bugs where the static build process changes behavior -- minification breaking variable names, CSS purging removing used classes, etc.
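One way the target selection could be wired is sketched below. `resolveServer` is a hypothetical helper, and the choice of 'static' as the default when TEST_TARGET is unset is an assumption; the actual config may pick differently.

```typescript
type TestTarget = 'dev' | 'static';

interface ServerConfig {
  command: string;
  port: number;
  url: string;
}

const servers: Record<TestTarget, ServerConfig> = {
  dev: { command: 'npx serve . -p 4001', port: 4001, url: 'http://localhost:4001' },
  static: { command: 'npx serve public -p 4000', port: 4000, url: 'http://localhost:4000' },
};

// Takes the raw environment value as a parameter so the lookup stays a pure function.
function resolveServer(raw: string | undefined): ServerConfig {
  const target = raw === undefined ? 'static' : raw; // assumed default
  if (target === 'dev' || target === 'static') return servers[target];
  throw new Error('Unknown TEST_TARGET: ' + target);
}
```

In a Playwright config this would be called once, for example with process.env.TEST_TARGET, and the result fed into the web server and base URL settings. Failing loudly on an unknown value avoids silently testing the wrong build.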
Representative Test: Navigation
test('should navigate between pages', async ({ page }) => {
await page.goto('/');
// Click a TOC item
await page.click('[data-path="content/about.md"]');
// Verify URL changed
await expect(page).toHaveURL(/#content\/about\.md/);
// Verify content loaded
await expect(page.locator('#content h1')).toContainText('Stéphane');
// Verify TOC highlight
await expect(page.locator('[data-path="content/about.md"]'))
.toHaveClass(/active/);
});
This test exercises the full navigation pipeline: click → SpaNavMachine → fetch → PageLoadMachine → DOM swap → TOC update. It doesn't test the state machines directly (that's what unit tests do). It tests that the wiring layer correctly connects the machines to the DOM.
Representative Test: Copy Button
test('should show success feedback after copy', async ({ page }) => {
await page.goto('/#content/blog/this-website.md');
// Find a code block with a copy button
const copyBtn = page.locator('.copy-btn').first();
await expect(copyBtn).toBeVisible();
// Grant clipboard permissions
await page.context().grantPermissions(['clipboard-write']);
// Click copy
await copyBtn.click();
// Verify success feedback (✅)
await expect(copyBtn).toContainText('✅');
// Wait for auto-reset (2 seconds)
await expect(copyBtn).toContainText('📋', { timeout: 3000 });
});
This test verifies the full CopyFeedbackMachine lifecycle through the DOM: click → copying → success → auto-reset. The timeout: 3000 accommodates the 2-second reset delay plus assertion overhead.
Test Configuration
export default defineConfig({
workers: 4,
fullyParallel: true,
retries: 2,
timeout: 15000,
use: {
headless: true,
screenshot: 'only-on-failure',
},
projects: [
{ name: 'chromium', use: { browserName: 'chromium' } },
],
});
4 parallel workers. Chromium only (this is a portfolio site, not a cross-browser library). 2 retries for flake resilience. Screenshots on failure for debugging.
Representative Test: Scroll Spy
test('should highlight active TOC item on scroll', async ({ page }) => {
await page.goto('/#content/blog/this-website.md');
// Scroll down past the first heading
await page.evaluate(() => {
const content = document.getElementById('content');
content?.scrollTo({ top: 800 });
});
// Wait for scroll spy to update
await page.waitForTimeout(300);
// Verify that a TOC item has the active class
const activeItems = page.locator('.toc-item.active');
await expect(activeItems).toHaveCount(1);
// Scroll further
await page.evaluate(() => {
const content = document.getElementById('content');
content?.scrollTo({ top: 2000 });
});
await page.waitForTimeout(300);
// Verify exactly one TOC item is active after the second scroll
const newActive = page.locator('.toc-item.active');
await expect(newActive).toHaveCount(1);
});
This test exercises the ScrollSpyMachine through the DOM. It doesn't test the machine's detectByScroll() function directly -- that's what unit tests do. It tests that the wiring layer correctly reads scroll positions, calls the machine, and applies the CSS class.
Representative Test: Keyboard Navigation
test('should toggle help modal with ? key', async ({ page }) => {
await page.goto('/');
// Press ? to open help
await page.keyboard.press('Shift+/'); // ? is Shift+/
// Verify help modal is visible
const helpModal = page.locator('.help-modal');
await expect(helpModal).toBeVisible();
// Press Escape to close
await page.keyboard.press('Escape');
await expect(helpModal).not.toBeVisible();
});
This exercises the KeyboardNavState's handleKey() and handleEscape() through the real DOM. The unit test verifies that handleKey('?', { ctrl: false, alt: false }, new Set()) returns { type: 'toggleHelp' }. The E2E test verifies that the wiring layer actually opens and closes the modal.
Test Results and Reporting
Both unit and E2E tests produce structured output:
// vitest.config.js
reporters: ['default', 'html'],
outputFile: {
html: `${runDir}/unit/index.html`,
},
// playwright.config.js
reporter: [
['list'],
['html', { outputFolder: `${runDir}/report`, open: 'never' }],
['json', { outputFile: `${runDir}/results.json` }],
],
Each test run gets a timestamped directory under test-results/. The HTML reports are browseable, the JSON results are machine-readable. The compliance scanner consumes the JSON to verify requirement coverage.
Requirements Tracking with Decorators
The test suite uses custom TypeScript decorators to link tests to feature requirements:
import { FeatureTest, Implements } from '../../requirements/decorators';
import { CopyButtons } from '../../requirements/features/copy-buttons';
@FeatureTest(CopyButtons)
class CopyButtonTests {
@Implements<CopyButtons>('shows clipboard emoji in idle state')
testIdleLabel() {
const machine = createCopyFeedback({ resetDelay: 2000 });
expect(getLabel(machine.getState())).toBe('📋');
}
@Implements<CopyButtons>('shows checkmark on success')
testSuccessLabel() {
const machine = createCopyFeedback({ resetDelay: 2000 });
machine.copy();
machine.succeed();
expect(getLabel(machine.getState())).toBe('✅');
}
}
The @FeatureTest(CopyButtons) decorator marks the class as a test suite for the CopyButtons feature. The @Implements<CopyButtons>('acceptance criterion') decorator links each test to a specific acceptance criterion.
A compliance scanner (scripts/compliance-report.ts) reads the decorator metadata and generates a report:
Feature: CopyButtons
✓ shows clipboard emoji in idle state → CopyButtonTests.testIdleLabel
✓ shows checkmark on success → CopyButtonTests.testSuccessLabel
✓ shows X on error → CopyButtonTests.testErrorLabel
✓ auto-resets after delay → CopyButtonTests.testAutoReset
✗ handles rapid double-click → (no test found)
Coverage: 4/5 acceptance criteria (80%)
This is the traceability system described in the Typed Specs series. It ensures that every acceptance criterion has at least one test, and that orphan tests (tests that don't map to any requirement) are flagged for review.
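The summary line of such a report is simple arithmetic over the criterion-to-test mapping. The sketch below assumes a hypothetical `CriterionResult` shape and `coverageLine` helper; it does not reflect how scripts/compliance-report.ts actually collects or stores the decorator metadata.

```typescript
interface CriterionResult {
  criterion: string;
  test: string | null; // null when no test declares this criterion
}

function coverageLine(results: CriterionResult[]): string {
  const covered = results.filter((r) => r.test !== null).length;
  // Assumption: a feature with no criteria counts as fully covered.
  const percent = results.length === 0 ? 100 : Math.round((covered / results.length) * 100);
  return `Coverage: ${covered}/${results.length} acceptance criteria (${percent}%)`;
}

const copyButtons: CriterionResult[] = [
  { criterion: 'shows clipboard emoji in idle state', test: 'CopyButtonTests.testIdleLabel' },
  { criterion: 'shows checkmark on success', test: 'CopyButtonTests.testSuccessLabel' },
  { criterion: 'shows X on error', test: 'CopyButtonTests.testErrorLabel' },
  { criterion: 'auto-resets after delay', test: 'CopyButtonTests.testAutoReset' },
  { criterion: 'handles rapid double-click', test: null },
];

const summary = coverageLine(copyButtons);
```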
Speed
The entire unit test suite runs in ~600ms. No browser startup. No DOM initialization. No network requests. Each test creates a machine, calls methods, and asserts state. The V8 engine runs thousands of state transitions per second.
Compare this to E2E tests: ~30 seconds for 10 spec files. Playwright needs to launch a browser, navigate to pages, wait for animations, and verify DOM state. E2E tests are necessary but slow. Unit tests are fast and should catch most bugs.
No Flaky Tests
The state machines have zero flaky tests because they have zero dependencies on timing, network, or DOM rendering. A machine transition is synchronous. vi.fn() records calls synchronously. The test asserts synchronously. There's nothing to wait for, nothing to race, nothing to retry.
The E2E tests have occasional flakes (animation timing, network latency) -- that's why they have 2 retries. But the unit tests have 0 retries. They either pass or fail, deterministically, every time.
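One way to keep even a delayed reset deterministic is to inject the scheduling itself. The sketch below is an assumption about how that could look, not the project's createCopyFeedback API: createResettableFeedback and the scheduleReset callback are hypothetical names. The machine never calls setTimeout, so the test decides when the "timer" fires.

```typescript
type FeedbackState = 'idle' | 'success';

// Hypothetical callbacks: the machine asks its caller to schedule the reset
// instead of touching a real timer.
interface FeedbackCallbacks {
  onStateChange: (state: FeedbackState) => void;
  scheduleReset: (delayMs: number, fire: () => void) => void;
}

function createResettableFeedback(resetDelay: number, cb: FeedbackCallbacks) {
  let state: FeedbackState = 'idle';
  const set = (next: FeedbackState) => {
    state = next;
    cb.onStateChange(next);
  };
  return {
    getState: (): FeedbackState => state,
    succeed() {
      if (state !== 'idle') return; // guard: already showing feedback
      set('success');
      cb.scheduleReset(resetDelay, () => {
        if (state === 'success') set('idle');
      });
    },
  };
}

// Test side: record everything synchronously, fire the reset by hand.
const seen: FeedbackState[] = [];
const scheduled: Array<{ delayMs: number; fire: () => void }> = [];

const feedback = createResettableFeedback(2000, {
  onStateChange: (s) => { seen.push(s); },
  scheduleReset: (delayMs, fire) => { scheduled.push({ delayMs, fire }); },
});

feedback.succeed();
const stateAfterSuccess = feedback.getState();
for (const job of scheduled) job.fire(); // stands in for 2000ms passing
const stateAfterReset = feedback.getState();
```

Nothing here waits on a clock, so there is nothing to race. In the browser, the wiring layer would pass a scheduleReset that wraps setTimeout.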
Regression Prevention
The coverage thresholds enforce that new machines get tested before merge. If someone adds a new state machine to src/lib/ without writing tests, the coverage for src/lib/**/*.ts drops below 98% and the build fails.
This is a structural enforcement, not a process enforcement. It doesn't require code review to catch missing tests. It doesn't require a CI bot to comment. The build breaks. The developer writes tests. The build passes.
Design Feedback
If a state machine is hard to test, it's probably poorly designed. Testing difficulty is a signal:
- Too many states? → Decompose into smaller machines (like the TOC cluster in Part IV).
- Can't inject a dependency? → Add it to the callbacks interface.
- Need DOM to trigger a transition? → Extract the logic into a pure function.
The test suite is the first consumer of the state machine API. If the test is awkward, the API is awkward. Fix the API, not the test.
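As an illustration of the third point, here is a sketch of logic pulled out of the DOM into a pure function. pickActiveHeading is a hypothetical helper, not one of the series' machines; the assumption is that the wiring layer measures heading positions and passes plain numbers in.

```typescript
// Hypothetical helper: returns the index of the last heading whose top edge is at or
// above the reading line, or -1 when the reader is still above the first heading.
// headingTops is assumed to be sorted in ascending order.
function pickActiveHeading(
  headingTops: readonly number[],
  scrollY: number,
  offset = 0,
): number {
  const line = scrollY + offset;
  let active = -1;
  for (let i = 0; i < headingTops.length; i++) {
    if (headingTops[i] <= line) active = i;
    else break; // sorted input: no later heading can be above the line
  }
  return active;
}

// No DOM fixtures needed: the inputs are just numbers.
const tops = [100, 500, 900];
const beforeFirst = pickActiveHeading(tops, 0);     // -1
const atSecond = pickActiveHeading(tops, 500);      // 1
const withOffset = pickActiveHeading(tops, 850, 80); // 2, because 850 + 80 >= 900
```

The DOM reads stay in the wiring layer, where E2E tests cover them. The decision logic becomes a function that a unit test can call directly.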
The Testing Strategy Summarized
| Layer | Tool | What it tests | Speed | Flakiness |
|---|---|---|---|---|
| Unit | Vitest | State transitions, guard clauses, pure helpers | ~600ms | Zero |
| Property | fast-check | Invariants across random inputs | ~200ms | Zero |
| E2E | Playwright | Wiring layer + DOM integration | ~30s | Low (2 retries) |
| Compliance | Custom scanner | Test ↔ requirement mapping | ~100ms | Zero |
TypeScript catches type errors at compile time. Unit tests catch logic errors at test time. Property tests catch edge cases at test time. E2E tests catch integration errors at test time. The compliance scanner catches coverage gaps at build time.
Each layer has a different cost/benefit ratio. TypeScript is free (no runtime cost). Unit tests are cheap (600ms). Property tests are cheap (200ms). E2E tests are expensive (~30s). The strategy allocates most testing to the cheapest layers and reserves the expensive layer for what the cheap layers can't test: real browser integration.
The Test Commands
The full test suite runs through npm scripts:
# Unit tests only (fast)
npm test # vitest run — ~600ms
# E2E against dev server
npm run test:e2e:dev # playwright with TEST_TARGET=dev
# E2E against static build
npm run test:e2e:static # playwright with TEST_TARGET=static
# Compliance report
npm run test:compliance # scans @FeatureTest/@Implements decorators
# Everything
npm run test:smoke # unit + e2e combined
npm run test:all # smoke + compliance + visual + a11y + perf
The CI pipeline runs test:all. Local development typically runs npm test (unit only) on every save and test:e2e:dev before committing.
1. Test the Machine, Not the Framework
The state machines are tested in isolation. The tests don't import Vitest matchers for DOM assertions. They don't use jsdom. They don't mock window or document. They create a machine, call methods, and check state.
This is only possible because the machines have zero DOM dependencies. If the machines read document.getElementById() or called element.scrollIntoView(), every test would need DOM fixtures. The callback injection pattern (Parts II-IV) is what makes the testing strategy possible.
2. Coverage Is a Ratchet, Not a Target
The thresholds started at whatever the first test suite achieved (around 85%). Each time we added tests, we raised the threshold. The current 98/95/98/99 represents the high-water mark. It can go up but never down.
This prevents "coverage regression" where someone adds untested code and the percentage slowly declines. The threshold is enforced by the build, not by code review.
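The ratchet rule itself is small enough to sketch. The code below is an illustration under assumptions: failingMetrics and ratchet are hypothetical helpers, and the numbers are made up for the example rather than taken from the project's configuration.

```typescript
type Metric = 'statements' | 'branches' | 'functions' | 'lines';
type Thresholds = Record<Metric, number>;

const METRICS: Metric[] = ['statements', 'branches', 'functions', 'lines'];

// Metrics that fell below their threshold; a non-empty result should fail the build.
function failingMetrics(thresholds: Thresholds, measured: Thresholds): Metric[] {
  return METRICS.filter((m) => measured[m] < thresholds[m]);
}

// Raise each threshold to the whole percent just reached, never lower it.
function ratchet(thresholds: Thresholds, measured: Thresholds): Thresholds {
  const next = { ...thresholds };
  for (const m of METRICS) {
    next[m] = Math.max(thresholds[m], Math.floor(measured[m]));
  }
  return next;
}

// Illustrative numbers only.
const current: Thresholds = { statements: 85, branches: 85, functions: 85, lines: 85 };
const measured: Thresholds = { statements: 91.4, branches: 84.2, functions: 90, lines: 92.7 };

const failing = failingMetrics(current, measured); // ['branches']
const raised = ratchet(current, measured);
// raised: statements 91, branches 85 (kept, not lowered), functions 90, lines 92
```

Vitest's coverage.thresholds.autoUpdate option automates a similar raise-only update of the config file. Whether this project uses that option or raises the numbers by hand is not stated here.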
3. Property Tests Are Cheap Insurance
The property-based.test.ts file is one file with ~400 lines. It runs in ~200ms. It has found 2 real bugs that example-based tests missed (empty-string edge case in slugify, unicode handling in buildHierarchicalSlug). The cost/benefit ratio is excellent.
The key is choosing the right invariants. "Never throws" is always a good invariant. "Idempotent" is good for normalizing functions. "Output length ≤ input length" is good for string transformers. Start with obvious invariants and add more as you discover edge cases.
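The three invariants above can be checked with very little code. The sketch below uses a hand-rolled seeded generator so it runs without dependencies; in the real suite, fast-check supplies the generators and shrinks failing inputs. The slugify shown here is an illustrative stand-in, not the project's implementation.

```typescript
// Illustrative slugify, not the project's implementation.
function slugify(input: string): string {
  return input
    .replace(/[^A-Za-z0-9]+/g, '-') // collapse each run of other characters into one dash
    .replace(/^-+|-+$/g, '')        // trim dashes at both ends
    .toLowerCase();                 // only ASCII is left, so length cannot change
}

// Small seeded generator (linear congruential) so any failure is reproducible.
function makeRandom(seed: number): () => number {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 0x100000000;
  };
}

function randomString(rand: () => number): string {
  const length = Math.floor(rand() * 20); // 0..19, so empty strings occur
  let out = '';
  for (let i = 0; i < length; i++) {
    // Mix printable ASCII with arbitrary UTF-16 code units to exercise unicode handling.
    const code =
      rand() < 0.7 ? 32 + Math.floor(rand() * 95) : Math.floor(rand() * 0x10000);
    out += String.fromCharCode(code);
  }
  return out;
}

// Returns a description of every violated invariant; an empty array means all runs passed.
function checkInvariants(runs: number, seed: number): string[] {
  const rand = makeRandom(seed);
  const failures: string[] = [];
  for (let i = 0; i < runs; i++) {
    const input = randomString(rand);
    try {
      const slug = slugify(input);
      if (slugify(slug) !== slug) failures.push(`not idempotent: ${JSON.stringify(input)}`);
      if (slug.length > input.length) failures.push(`output grew: ${JSON.stringify(input)}`);
    } catch {
      failures.push(`threw: ${JSON.stringify(input)}`);
    }
  }
  return failures;
}

const failures = checkInvariants(1000, 42);
```

Note the order of operations in this slugify: lowercasing happens last, after everything outside ASCII letters and digits has been replaced. Lowercasing first would let a few unicode characters expand into two code units, and the "output length ≤ input length" invariant is exactly the kind of check that exposes that.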
4. E2E Tests Are Integration Tests
E2E tests don't verify state machine logic -- unit tests do that. E2E tests verify that:
- The wiring layer connects machines to the DOM correctly
- CSS animations trigger transitionend events that the wiring layer forwards to machines
- User events (clicks, scrolls, key presses) reach the correct machine methods
- Multiple machines compose correctly through the wiring layer
This is a different testing responsibility. E2E tests are slower, flakier, and harder to debug. But they catch a class of bugs that unit tests can't: integration errors where the glue code is wrong.
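The glue in question is usually only a few lines. Here is a sketch of one such piece, assuming a machine that exposes a transitionEnd() method; wireTransitionEnd and TransitionAware are hypothetical names, and a bare EventTarget stands in for a DOM element.

```typescript
// Hypothetical slice of a machine's public surface.
interface TransitionAware {
  transitionEnd(): void;
}

// Forward transitionend events to the machine; returns a function that removes the listener.
function wireTransitionEnd(target: EventTarget, machine: TransitionAware): () => void {
  const listener = () => machine.transitionEnd();
  target.addEventListener('transitionend', listener);
  return () => target.removeEventListener('transitionend', listener);
}

// Stand-in for a DOM element. EventTarget and Event also exist as globals in Node.
let forwarded = 0;
const panel = new EventTarget();
const unwire = wireTransitionEnd(panel, {
  transitionEnd: () => { forwarded += 1; },
});

panel.dispatchEvent(new Event('transitionend')); // forwarded to the machine
unwire();
panel.dispatchEvent(new Event('transitionend')); // listener removed, not forwarded
```

A check like this only proves that the listener is attached. Whether the browser actually fires transitionend depends on the CSS, the element, and the animation, which is why this layer needs a real browser.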
5. The Test Suite Is Documentation
Reading the test file for a machine is often the fastest way to understand it. The test names describe the behavior:
✓ should start in idle state
✓ should transition to fetching on navigate
✓ should skip closingHeadings when no headings are open
✓ should wait for transitionEnd in closingHeadings
✓ should abort back to idle when fetch fails
✓ should ignore fetchComplete when not in fetching state
✓ should ignore transitionEnd when not in closingHeadings state
Each test is a sentence. Together, they form a specification. When the specification changes, the tests change first, then the implementation follows.
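Each of those sentences tends to map onto one guard clause or one transition. The sketch below is a reduced, hypothetical machine written to satisfy the test names above. It is not the real SpaNavMachine: the callback set is trimmed, and it assumes closeHeadings() returns true when a close animation was started.

```typescript
type NavState = 'idle' | 'closingHeadings' | 'fetching';

// Reduced, hypothetical callback set; the real SpaNavCallbacks has more members.
interface NavCallbacks {
  closeHeadings: () => boolean; // assumed: true when a close animation was started
  startFetch: (url: string) => void;
  swapContent: (html: string) => void;
}

function createNavSketch(cb: NavCallbacks) {
  let state: NavState = 'idle';
  let pendingUrl = '';
  return {
    getState: (): NavState => state,
    navigate(url: string) {
      if (state !== 'idle') return; // guard: one navigation at a time
      pendingUrl = url;
      if (cb.closeHeadings()) {
        state = 'closingHeadings'; // wait for transitionEnd before fetching
        return;
      }
      state = 'fetching'; // no headings open: skip closingHeadings
      cb.startFetch(url);
    },
    transitionEnd() {
      if (state !== 'closingHeadings') return; // guard: ignore stray events
      state = 'fetching';
      cb.startFetch(pendingUrl);
    },
    fetchComplete(html: string) {
      if (state !== 'fetching') return; // guard: ignore late or duplicate completions
      cb.swapContent(html);
      state = 'idle';
    },
    fetchFailed() {
      if (state !== 'fetching') return;
      state = 'idle'; // abort back to idle
    },
  };
}

const calls: string[] = [];
const nav = createNavSketch({
  closeHeadings: () => false,
  startFetch: (url) => { calls.push(`fetch ${url}`); },
  swapContent: () => { calls.push('swap'); },
});

nav.fetchComplete('<p>late</p>'); // ignored: not in fetching state
nav.transitionEnd();              // ignored: not in closingHeadings state
nav.navigate('/about');           // idle -> fetching, headings step skipped
```

Every "should ignore" line in the list corresponds to an early return at the top of a method, which is why the test names read so much like the code.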
Series Conclusion
This series walked through 15 state machines, from a 47-line font size manager to a 163-line scroll orchestrator. The arc was deliberate: simple → complex, always the same pattern.
The core idea is not new. Finite state machines predate most of us. What's worth sharing is the specific implementation pattern -- factory functions, closure state, callback injection -- and the infrastructure that makes it work: TypeScript for type safety, esbuild for browser bundling, Vitest for fast tests, and Playwright for integration verification.
The site you're reading right now runs on these 15 machines. Every link click, every scroll, every tooltip, every copy button, every keyboard shortcut -- all driven by machines with no DOM dependencies, tested at 98%+ coverage by a unit suite that runs in under a second.
No framework. No xstate. Just closures, callbacks, and guard clauses.
Appendix: All Test Files
For reference, here is the complete list of test files that cover the state machines and related infrastructure:
Unit Tests (test/unit/)
| File | Lines | Coverage target |
|---|---|---|
| scroll-spy-machine.test.ts | 310 | src/lib/scroll-spy-machine.ts |
| copy-feedback-state.test.ts | 231 | src/lib/copy-feedback-state.ts |
| spa-nav-state.test.ts | 366 | src/lib/spa-nav-state.ts |
| headings-panel-machine.test.ts | 280 | src/lib/headings-panel-machine.ts |
| keyboard-nav-state.test.ts | 195 | src/lib/keyboard-nav-state.ts |
| property-based.test.ts | 400 | Cross-cutting invariants |
| build-static.test.ts | 890 | scripts/build-static.js |
| build-static-io.test.ts | 1,140 | scripts/build-static.js (I/O) |
| compliance.test.ts | 180 | Requirements scanning |
| frontmatter.test.ts | 220 | Frontmatter parsing |
| link-prefetch.test.ts | 150 | Link prefetching |
E2E Tests (test/e2e/)
| File | Coverage target |
|---|---|
| navigation.spec.ts | SpaNavMachine wiring |
| scroll-spy.spec.ts | ScrollSpyMachine wiring |
| copy-buttons.spec.ts | CopyFeedbackMachine wiring |
| font-size.spec.ts | FontSizeManager wiring |
| keyboard-nav.spec.ts | KeyboardNavState wiring |
| theme.spec.ts | AccentPaletteMachine wiring |
| overlays.spec.ts | Modal priority (Escape chain) |
| mobile.spec.ts | Responsive sidebar behavior |
| search.spec.ts | Search interaction |
| hire-modal.spec.ts | Hire modal flow |
Each machine test file covers one machine; the remaining unit test files cover cross-cutting invariants and the build and content infrastructure. Each E2E test file covers one user-facing behavior that exercises one or more machines through the DOM. Together, they form a complete verification of the architecture described in Parts I-IV.