Part III: Multi-Layer Quality
644 tests across six layers. Every one passes. But which ones verify the search feature? The testing phase built confidence in the code — and exposed a gap in the process.
The Testing Architecture
After the SPA and static generation phases, the site had features worth specifying and output worth testing. The next step was building a comprehensive test suite — not just unit tests, but tests at every level where bugs can hide.
Six layers, each catching different classes of bugs.
Layer 1: Unit Tests (Vitest)
Pure logic extracted into importable modules — slugify, frontmatter parsing, search scoring, mermaid config. 235 tests with enforced 100% coverage gates on every module:
```js
// vitest.config.js — coverage thresholds that only ratchet up
coverage: {
  include: ['js/lib/**'],
  thresholds: {
    branches: 100,
    functions: 100,
    lines: 100,
    statements: 100,
  },
},
```

Unit tests run in under 2 seconds. They catch logic errors immediately.
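As a sketch of the kind of pure, importable logic these modules contain — this frontmatter splitter is a hypothetical stand-in, not the site's actual `js/lib/` implementation:

```js
// Hypothetical frontmatter splitter — a stand-in for the kind of pure,
// dependency-free function that lives in js/lib/ and is easy to unit test.
function splitFrontmatter(text) {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  if (!match) return { frontmatter: null, body: text };
  return { frontmatter: match[1], body: match[2] };
}

const doc = '---\ntitle: Hello\n---\n# Heading\n';
const { frontmatter, body } = splitFrontmatter(doc);
console.log(frontmatter); // "title: Hello"
console.log(body);        // "# Heading\n"
```

Because the function takes a string and returns a plain object, a test needs no DOM, no server, and no mocking — which is what keeps the whole layer under 2 seconds.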
Layer 2: Property-Based Tests (fast-check)
Hand-written tests check known cases. Property-based tests generate thousands of random inputs and verify invariants hold:
- Slugify always produces URL-safe strings
- Frontmatter parser handles any valid YAML
- Search scoring is monotonically ordered (exact > prefix > word > fuzzy)
- HTML escaping is idempotent
fast-check found a real bug: an orphaned h3 branch in the slug hierarchy that no hand-written test covered. That single discovery pushed branch coverage from 75% to 100%.
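A hand-rolled sketch of the idea behind the first invariant — fast-check adds smarter generators and automatic shrinking of failing cases, and the slugify here is a hypothetical stand-in, not the site's module:

```js
// Hypothetical slugify — stands in for the real js/lib/ module.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // anything not URL-safe becomes a dash
    .replace(/^-+|-+$/g, '');    // strip leading/trailing dashes
}

// Generate arbitrary (including non-ASCII) strings.
function randomString(len) {
  let s = '';
  for (let i = 0; i < len; i++) {
    s += String.fromCharCode(Math.floor(Math.random() * 0xffff));
  }
  return s;
}

// Invariant: slugify always produces a URL-safe string.
for (let i = 0; i < 1000; i++) {
  const out = slugify(randomString(Math.floor(Math.random() * 40)));
  if (!/^[a-z0-9-]*$/.test(out)) {
    throw new Error(`invariant violated: ${JSON.stringify(out)}`);
  }
}
console.log('1000 random inputs: slug invariant holds');
```

The property states what must be true of *every* output, so inputs no human would think to type — control characters, emoji, lone surrogates — get thrown at the code for free.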
Layer 3: E2E Tests (Playwright)
54 tests running in real browsers with 6 parallel workers. These catch interaction bugs that unit tests can't:
- Clicking a TOC item loads the page with a fade transition
- The back button restores the previous page state
- Keyboard shortcuts work (`?` for help, `Ctrl+K` for search)
- The hire modal opens, validates, and submits
E2E tests run against both the dev server (runtime rendering) and the static build (pre-rendered HTML). Same tests, two targets — catching discrepancies between the rendering paths.
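One way to express the two-target setup is Playwright projects — a sketch assuming illustrative names and ports, not the repository's actual config:

```js
// playwright.config.js — one suite, two targets (names/ports are illustrative)
export default {
  projects: [
    // Runtime rendering: the dev server builds pages in the browser.
    { name: 'dev-server',   use: { baseURL: 'http://localhost:5173' } },
    // Pre-rendered HTML: a static file server over the build output.
    { name: 'static-build', use: { baseURL: 'http://localhost:4173' } },
  ],
};
```

Every test that navigates via `baseURL` then runs twice, and a behavior that only holds on one rendering path shows up as a failure in exactly one project's column.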
Layer 4: Visual Regression (Playwright Screenshots)
229 full-page stitched screenshots across:
- Desktop viewport
- 4 mobile devices (iPhone SE, iPhone 14, Pixel 5, iPad Mini)
- 4 themes (dark, light, high-contrast dark, high-contrast light)
Any pixel drift from the baseline triggers a failure. This catches CSS regressions that no assertion can describe — the kind of bugs where "it looks wrong" is the only specification.
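Mechanically, the gate reduces to a pixel-diff ratio check — a minimal sketch of the idea, not Playwright's actual comparator (which also tolerates anti-aliasing artifacts):

```js
// Fraction of RGBA pixels that differ between two same-sized buffers.
function diffRatio(baseline, candidate) {
  if (baseline.length !== candidate.length) return 1; // size change: fail
  let changed = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    // Compare one RGBA pixel at a time.
    if (baseline[i]     !== candidate[i]     ||
        baseline[i + 1] !== candidate[i + 1] ||
        baseline[i + 2] !== candidate[i + 2] ||
        baseline[i + 3] !== candidate[i + 3]) changed++;
  }
  return changed / (baseline.length / 4);
}

const base = new Uint8ClampedArray(400);       // 100 blank pixels
const next = Uint8ClampedArray.from(base);
next[0] = 255;                                 // one drifted pixel in 100
console.log(diffRatio(base, next));            // 0.01
console.log(diffRatio(base, next) <= 0.01 ? 'pass' : 'fail'); // "pass"
```

A 0.01 ratio means one changed pixel per hundred sits exactly at the boundary; anything beyond that fails the baseline.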
Layer 5: Accessibility (axe-core + Contrast Matrix)
121 tests including:
- axe-core scanner on every page (catches WCAG violations)
- Contrast matrix: 8 accent colors x 3 modes x sample pages = 72+ assertions
- ARIA role verification on interactive elements
- Skip-to-content link presence
The contrast matrix is particularly interesting — it's a combinatorial explosion that would be impossible to test by hand. Each of the 8 curated accent palettes must pass WCAG AA contrast against both dark and light backgrounds, plus high-contrast mode.
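The check behind each cell of that matrix is the standard WCAG relative-luminance formula — sketched here with illustrative colors, not the site's actual palettes:

```js
// WCAG 2.x relative luminance from an [r, g, b] triple (0–255 channels).
function luminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map((c) => {
    c /= 255;
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging 1:1 to 21:1.
function contrast(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// Every accent must clear WCAG AA for normal text (4.5:1) on every background.
const accents = [[255, 140, 0], [80, 200, 255]];     // illustrative accents
const backgrounds = [[18, 18, 18], [250, 250, 250]]; // dark, light
for (const accent of accents) {
  for (const bg of backgrounds) {
    const ratio = contrast(accent, bg);
    console.log(`${accent} on ${bg}: ${ratio.toFixed(2)}:1`,
                ratio >= 4.5 ? 'AA pass' : 'AA fail');
  }
}
```

Nesting the loops over accents, modes, and pages is exactly what generates the 72+ assertions — the code enumerates the combinatorial space that a human reviewer cannot.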
Layer 6: Performance
5 tests measuring:
- Page load time (first contentful paint)
- SPA navigation speed (page-to-page transition)
- Scroll spy latency (heading tracking responsiveness)
These prevent speed regressions as features are added.
Coverage Gates as Ratchets
Every coverage metric is a ratchet — it can only tighten, never loosen:
- Unit test line coverage: 100% (enforced)
- Unit test branch coverage: 100% (enforced)
- Visual regression baseline: pixel-perfect (0.01 threshold)
- Accessibility: zero critical axe violations (enforced)
If you add code, you add tests. If you break a visual baseline, you approve the new one explicitly. The gates prevent regression.
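The ratchet idea can be sketched as a check that compares fresh numbers against a stored floor and only ever raises it — illustrative, not the project's actual tooling:

```js
// Fail if any metric drops below its recorded floor; otherwise tighten
// the floor to the new value. Floors move in one direction only.
function ratchet(baseline, current) {
  const updated = { ...baseline };
  for (const [metric, floor] of Object.entries(baseline)) {
    const now = current[metric];
    if (now < floor) {
      throw new Error(`${metric} dropped: ${now}% < ratchet floor ${floor}%`);
    }
    updated[metric] = Math.max(floor, now); // tighten, never loosen
  }
  return updated;
}

const floors = { branches: 100, lines: 100 };
console.log(ratchet(floors, { branches: 100, lines: 100 }));
```

Once a metric reaches 100%, the only way to merge code is to keep it there — the gate encodes the policy so no reviewer has to.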
The Gap That Remained
644 tests. All passing. Six layers. Coverage gates enforced. By any conventional measure, this website is well-tested.
But one question remained unanswerable: "Is the search feature fully tested?"
The tests were organized by technical concern:
```
test/
├── unit/    ← logic tests
├── e2e/     ← interaction tests
├── a11y/    ← accessibility tests
├── visual/  ← screenshot tests
└── perf/    ← speed tests
```

Not by feature:
```
test/
├── search/         ← ??? doesn't exist
├── navigation/     ← ??? doesn't exist
├── accessibility/  ← ??? doesn't exist
└── ...
```

The search feature had tests scattered across unit/ (search scoring), e2e/ (search interaction), and a11y/ (search accessibility). But there was no single place that said "search has 5 acceptance criteria, and here are the tests for each one."
The link between features and tests was a markdown table in a blog post. Human-maintained. Already drifting.
What Was Needed
Not a reorganization of test directories — that would break the layered architecture. What was needed was a cross-cutting index that linked tests to features regardless of which directory they lived in.
That's what typed specifications provide. Not a replacement for test organization, but a layer on top that answers the feature question.
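As a sketch of what such an index looks like — plain data here rather than the typed specifications of Part IV, with hypothetical criteria and file names:

```js
// Illustrative cross-cutting index: each acceptance criterion lists the
// tests that verify it, regardless of which layer's directory they live in.
// Criteria and file names are hypothetical.
const searchFeature = {
  feature: 'search',
  criteria: [
    {
      criterion: 'results rank exact > prefix > word > fuzzy',
      tests: [{ layer: 'unit', file: 'test/unit/search-scoring.test.js' }],
    },
    {
      criterion: 'Ctrl+K opens the search overlay',
      tests: [{ layer: 'e2e', file: 'test/e2e/search.spec.js' }],
    },
    {
      criterion: 'overlay is operable with a screen reader',
      tests: [{ layer: 'a11y', file: 'test/a11y/search.spec.js' }],
    },
  ],
};

// "Is search fully tested?" becomes a query, not a guess:
const gaps = searchFeature.criteria.filter((c) => c.tests.length === 0);
console.log(gaps.length === 0 ? 'every criterion has tests' : 'gaps found');
```

The test directories stay layered; the index cuts across them, so an uncovered criterion is machine-detectable instead of living in a drifting markdown table.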
Previous: Part II: Building the Foundation
Next: Part IV: Features as Abstract Classes — the type system becomes the specification language.