Build Pipeline: From 10 Seconds to 100 Milliseconds

The build was doing 530 pages of work to deliver 1 page of change. Fixing that required tearing out every readFileSync in the pipeline first — and because the hexagonal ports already existed, the whole migration landed in a day with zero test failures.

The 10-Second Build

The static site generator for this CV site is started with npm run build:static, which runs:

npm run validate:links          # npx tsx process #1
npm run build:state-graph       # npx tsx processes #2, #3, #4
npx tsx scripts/build-static.ts # npx tsx process #5

Five separate npx tsx invocations, each paying ~0.5–1.5 seconds of TypeScript compiler cold start. Then inside build-static.ts, the pipeline runs sequentially: read toc.json, build the HTML template, spawn a worker pool for 530 pages, wait for all pages, resolve mermaid diagrams, collect git dates, write 1060 files (.html + .fragment.html per page), generate sitemaps, prune orphans. Every step waits for the previous one. Every page is re-rendered from scratch. Every build.

On my machine, the timing bar chart looks like this:

| Step | Time |
|---|---|
| TS transpile | ~1.5s |
| CSS bundle | ~0.1s |
| Render 530 pages | ~3.5s |
| Mermaid SVGs | ~0.1s |
| Git dates | ~0.5s |
| Write files | ~0.3s |
| Total | ~8–12s |

For a site with 530 markdown pages, that's not catastrophic. But when I'm editing a single blog post in the watcher and I want to see my change, 10 seconds is an eternity. The watcher already has --only=file.md support that filters down to one page — but even that one page goes through the full pipeline ceremony: read toc.json, build template, spawn a worker thread, resolve mermaid, write files.

The real problem isn't any single slow step. It's that every step runs every time, regardless of what changed.

Compound Decomposition

I broke the pipeline into 18 atomic units and drew the dependency graph:

Diagram: The build pipeline decomposes into 18 compounds. Everything at t=0 is independent. The critical sequential chain is assets → template → changed pages → write.

The key insight: HTML page generation doesn't need validate:links or state-graph results. It only needs toc.json (already on disk), the template (built from index.html + asset hashes), and the .md content. Everything else is independent. Validation, FSM extraction, git dates, font copying — all of these can run in parallel at t=0, overlapping with the critical path.
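The dependency shape described above can be sketched as plain promise composition. This is a minimal sketch, not the actual orchestrator: every function name in the Compounds interface is a hypothetical stand-in for the real compound, and only the ordering matters.

```typescript
// Hypothetical stand-ins for the real compounds; only the dependency shape matters.
interface AssetHashes {
  cssHash: string;
  jsHash: string;
}

interface Compounds {
  validateLinks: () => Promise<void>;
  buildStateGraph: () => Promise<void>;
  collectGitDates: () => Promise<void>;
  copyFonts: () => Promise<void>;
  buildAssets: () => Promise<AssetHashes>;
  buildTemplate: (hashes: AssetHashes) => Promise<string>;
  renderChangedPages: (template: string) => Promise<string[]>;
  writePages: (pages: string[]) => Promise<void>;
}

async function runBuild(c: Compounds): Promise<void> {
  // Critical path: assets -> template -> changed pages -> write.
  const criticalPath = async (): Promise<void> => {
    const hashes = await c.buildAssets();
    const template = await c.buildTemplate(hashes);
    const pages = await c.renderChangedPages(template);
    await c.writePages(pages);
  };

  // Everything independent starts at t=0 and overlaps with the critical path.
  // Putting every promise into one Promise.all means a failure in any branch
  // rejects the build instead of becoming an unhandled rejection.
  await Promise.all([
    c.validateLinks(),
    c.buildStateGraph(),
    c.collectGitDates(),
    c.copyFonts(),
    criticalPath(),
  ]);
}
```

The point of the shape is that validation and git dates never sit in front of page rendering; they only have to finish before the build as a whole reports success.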

But the bigger insight is at the page level. If a page's inputs haven't changed — its .md content, the TOC structure, and the template — then its output hasn't changed either. Skip it. Don't re-render 529 unchanged pages to deliver 1 changed page.
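As a rough sketch of that skip decision: hash each page's inputs and compare against the hash recorded by the previous build. The hash formula is the one given in the roadmap later in this post; the PageInput type, the function names, and the in-memory Map of previous hashes are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

// Hypothetical page shape for this sketch.
interface PageInput {
  slug: string;
  mdContent: string;
}

// Input hash = sha256(mdContent + tocHash + templateHash).
function pageInputHash(mdContent: string, tocHash: string, templateHash: string): string {
  return sha256(mdContent + tocHash + templateHash);
}

// Returns only the pages whose inputs differ from the previous build.
function pagesToRender(
  pages: readonly PageInput[],
  tocHash: string,
  templateHash: string,
  previous: ReadonlyMap<string, string>, // slug -> input hash from the last build
): PageInput[] {
  return pages.filter(
    (p) => previous.get(p.slug) !== pageInputHash(p.mdContent, tocHash, templateHash),
  );
}
```

A page with no entry in the previous map is always rendered, and a change to the TOC or template hash invalidates every page at once, which is exactly why the template has to be stable.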

The Date.now() Trap

One subtlety makes page caching impossible until it is fixed: the template.

// buildTemplate() — line 742 of the old code
`<link rel="stylesheet" href="../../app.min.css?v=${buildTs}">`
// ...
`<script src="../../app.min.js?v=${buildTs}"></script>`

Where buildTs comes from Date.now() in the watcher, or readPublishedTimestamp() in production. Either way, every build gets a new value. The template changes. Since the template is an input to every page's render, every page's hash changes. Every page gets re-rendered. The entire incremental caching strategy collapses because of one Date.now().

The fix: replace the timestamp with a content hash of the actual CSS and JS output. If app.min.css hasn't changed, its hash hasn't changed, the template is byte-identical, and pages whose .md content is also unchanged can be skipped entirely.

// After — deterministic content hashes
`<link rel="stylesheet" href="../../app.min.css?v=${cssHash}">`
`<script src="../../app.min.js?v=${jsHash}"></script>`
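A minimal sketch of how cssHash and jsHash might be derived, assuming the minified CSS and JS are available as strings or bytes. The contentHash and assetTags names and the 10-character truncation are illustrative choices, not the site's actual code.

```typescript
import { createHash } from "node:crypto";

// Short, deterministic hash of an asset's bytes. The truncation length is an
// arbitrary choice for this sketch; the full digest works just as well.
function contentHash(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex").slice(0, 10);
}

// Same asset bytes in, byte-identical tags out.
function assetTags(css: string, js: string): string {
  const cssHash = contentHash(css);
  const jsHash = contentHash(js);
  return [
    `<link rel="stylesheet" href="../../app.min.css?v=${cssHash}">`,
    `<script src="../../app.min.js?v=${jsHash}"></script>`,
  ].join("\n");
}
```

Because the hash depends only on the asset bytes, two builds with identical CSS and JS produce an identical template, which is the property the page cache needs.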

This is the version.txt distinction: the ?v= parameter is a browser cache-buster (must change when the file changes, must NOT change when it doesn't). The version.txt file is the "update available" notification system (changes when work publish runs). Two separate concerns, previously tangled by a shared timestamp.

The Async Foundation

Before any of the caching work can land, one obstacle has to go: the entire build pipeline uses sync IO.

// The old interface — every method is synchronous
interface BuildStaticIO {
  readFile:   (p: string) => string;        // blocks the event loop
  writeFile:  (p: string, data: string) => void;
  exists:     (p: string) => boolean;
  mkdir:      (p: string) => void;
  readDir:    (p: string) => string[];
  // ... 11 methods total
}

This was a pragmatic compromise from the early days of the build pipeline. It worked — the code was clean, testable with mock IO, and fast enough for a sequential pipeline. But it blocks the event loop on every call. You can't Promise.all three independent file reads. You can't start CSS hashing while JS is still transpiling. The entire compound decomposition is dead on arrival if every IO call is synchronous.

The fix is a full migration to the async FileSystem port that already existed in src/lib/external.ts for the rest of the codebase. Instead of the monolithic BuildStaticIO, the pipeline now receives a BuildDeps composition of three narrow ports:

interface BuildDeps {
  readonly fs: FileSystem;        // all file I/O (async)
  readonly runner: CommandRunner;  // shell commands (async)
  readonly logger: Logger;         // console output
}
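With an async port, independent reads can be issued together instead of one after another. The sketch below assumes a reduced slice of the FileSystem port containing only readFile with the (path, 'utf8') call shape used in this post; the loadTemplateInputs name and the exact file paths are hypothetical.

```typescript
// Reduced, assumed slice of the FileSystem port: only what this sketch needs.
interface ReadOnlyFs {
  readFile(p: string, encoding: "utf8"): Promise<string>;
}

interface TemplateInputs {
  toc: unknown;
  indexHtml: string;
  css: string;
}

// Three independent reads start at the same time; the function resumes once
// all of them have resolved, or rejects as soon as one of them fails.
async function loadTemplateInputs(fs: ReadOnlyFs): Promise<TemplateInputs> {
  const [tocJson, indexHtml, css] = await Promise.all([
    fs.readFile("toc.json", "utf8"),
    fs.readFile("index.html", "utf8"),
    fs.readFile("app.min.css", "utf8"),
  ]);
  return { toc: JSON.parse(tocJson), indexHtml, css };
}
```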

The migration touched 62 call sites across build-static.ts:

| Old pattern | New pattern |
|---|---|
| io.readFile(p) | await deps.fs.readFile(p, 'utf8') |
| io.writeFile(p, d) | await deps.fs.writeFile(p, d) |
| io.exists(p) | await deps.fs.exists(p) |
| io.log(...) | deps.logger.info(...) |
| io.warn(...) | deps.logger.warn(...) |
| io.execAsync(cmd) | deps.runner.exec(cmd) |

Every function that gained an await became async. Every caller of those functions gained an await. The cascade rippled through cleanDir, copyDirFiltered, buildTemplate, buildPage, copyFonts, pruneOrphanHtml, and a dozen others. The sync timed() helper was replaced by the async timedAsync() everywhere.
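For illustration, an async timing helper could look roughly like this. The real timedAsync() signature is not shown in this post, so the parameter list here (a label, the async function, and a log callback) is an assumption.

```typescript
// Hypothetical shape of an async timing helper.
async function timedAsync<T>(
  label: string,
  fn: () => Promise<T>,
  log: (msg: string) => void,
): Promise<T> {
  const start = performance.now();
  try {
    // Awaiting inside try means the finally block runs after fn settles,
    // so the elapsed time covers the whole async operation.
    return await fn();
  } finally {
    log(`${label}: ${(performance.now() - start).toFixed(1)}ms`);
  }
}
```

The difference from a sync helper is the await inside the try: without it, the timer would stop as soon as the promise was created rather than when the work finished.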

The test file, 797 lines of build-static-io.test.ts, migrated from createMockIO() (sync in-memory store) to createMockDeps() (same store, async methods). Every mock method of the form (p) => store[p] became async (p) => store[p].

Also migrated: mermaid-manifest.ts (the ManifestIO interface went from sync to async), find-missing-mermaids.ts, and content-fragment.test.ts which had its own sync mock.

Result: 2739 tests, all green, all coverage gates passing. The old build-static-sync-io.ts file was deleted.

The Hexagonal Payoff

The reason this migration took a day instead of a week is that the hexagonal architecture had already done the hard work.

The FileSystem port existed. The CommandRunner port existed. The Logger port existed. All three lived in src/lib/external.ts, the same file where Clipboard, Scheduler, Fetcher, and every other external-world port live. Every factory in src/lib/ already consumed these ports by injection. Every test already mocked them.

The build pipeline was the last holdout. It had its own BuildStaticIO interface — similar in spirit to FileSystem but synchronous and monolithic (11 methods in one bag instead of three narrow interfaces). Migrating it to the shared ports was not a design challenge. It was a mechanical substitution, done by following compiler errors from top to bottom.

The FileSystem interface needed only two additions:

export interface FileSystem {
  // ... existing 7 methods ...
  copyFile?(src: string, dst: string): Promise<void>;  // NEW
  rm?(path: string): Promise<void>;                     // NEW (recursive)
}

Both optional, so zero existing implementations broke. The defaultBuildDeps implements them via fs.promises.copyFile and fs.promises.rm. Tests that don't need them don't provide them.
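Because the new methods are optional, a caller has to handle an implementation that lacks them. One way that could look is sketched below, using a reduced slice of the port. The copyWithFallback helper and its read-then-write fallback are assumptions for illustration, and the fallback is only suitable for text files.

```typescript
// Reduced, assumed slice of the FileSystem port.
interface FileSystemSlice {
  readFile(p: string, encoding: "utf8"): Promise<string>;
  writeFile(p: string, data: string): Promise<void>;
  copyFile?(src: string, dst: string): Promise<void>; // optional, as in the port
}

// Hypothetical helper: use the native copy when the implementation has one,
// otherwise fall back to read + write.
async function copyWithFallback(fs: FileSystemSlice, src: string, dst: string): Promise<void> {
  if (fs.copyFile) {
    await fs.copyFile(src, dst);
    return;
  }
  await fs.writeFile(dst, await fs.readFile(src, "utf8"));
}
```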

The test mock factory is 40 lines of code that creates an in-memory async FileSystem backed by a Record<string, string> store, a Set<string> of directories, and arrays for log/warn capture. It's the exact same pattern as the old sync mock, with async prepended to every method. The store is the same. The assertions are the same. The only difference is await before every call in the test body.
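A condensed sketch of what such a factory might look like, under stated assumptions: the FileSystem and Logger interfaces below are reduced stand-ins for the real ports in src/lib/external.ts, the presence of mkdir on the port is assumed, the CommandRunner is left out for brevity, and the returned object shape is illustrative.

```typescript
// Reduced, assumed port shapes for this sketch.
interface FileSystem {
  readFile(p: string, encoding: "utf8"): Promise<string>;
  writeFile(p: string, data: string): Promise<void>;
  exists(p: string): Promise<boolean>;
  mkdir(p: string): Promise<void>;
}

interface Logger {
  info(...args: unknown[]): void;
  warn(...args: unknown[]): void;
}

function createMockDeps(initial: Record<string, string> = {}) {
  const store: Record<string, string> = { ...initial }; // path -> file content
  const dirs = new Set<string>();                       // created directories
  const logs: string[] = [];
  const warns: string[] = [];
  const has = (p: string): boolean => Object.prototype.hasOwnProperty.call(store, p);

  const fs: FileSystem = {
    readFile: async (p) => {
      if (!has(p)) throw new Error(`ENOENT: ${p}`);
      return store[p];
    },
    writeFile: async (p, data) => { store[p] = data; },
    exists: async (p) => has(p) || dirs.has(p),
    mkdir: async (p) => { dirs.add(p); },
  };

  const logger: Logger = {
    info: (...args) => { logs.push(args.map(String).join(" ")); },
    warn: (...args) => { warns.push(args.map(String).join(" ")); },
  };

  // The store and capture arrays are returned so tests can assert on them directly.
  return { deps: { fs, logger }, store, dirs, logs, warns };
}
```

A test then seeds the store, awaits the function under test with deps, and asserts on store, logs, and warns, which is the same flow as with the old sync mock.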

This is the SOLID thesis made concrete: when every dependency is an injected abstraction, changing the abstraction's async-ness is a find-and-replace, not a rewrite.

What's Next

The async foundation is in place. Here's what comes next, in order:

1. Per-page cache files: each page's rendered result (HTML + fragment + mermaid blocks) cached in .build-cache/pages/{slug}.json. No monolithic JSON file; each page is independent. Input hash = sha256(mdContent + tocHash + templateHash). Cache hit → skip rendering entirely.

2. Input-tree hashing for assets: hash the CSS source tree (css/style.css + css/github-dark.min.css). If unchanged, skip CleanCSS. Same for JS (hash src/*.ts + src/lib/*.ts), fonts (hash fonts/*.woff2), images, statics. Each compound checks its inputs and skips if the hash matches.

3. Content-hash cache-busters: replace Date.now() with sha256(app.min.css) in the template ?v= parameter. Template becomes stable when assets don't change. Pages become cacheable.

4. Single-process build: import validate:links and build:state-graph as functions instead of spawning 4 extra npx tsx processes. Run them in parallel with page generation. Save 2–4 seconds of cold-start overhead.

5. Parallel orchestration: collectGitDates starts at t=0 (it's just git log, no dependency on anything). Asset builds run in parallel with validation and state-graph. Template blocks only on CSS+JS hashes. Page generation starts as soon as the template is ready.
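The input-tree hashing idea from the list above could be sketched as follows. This is an assumption-laden illustration: hashInputTree and runIfChanged are hypothetical names, and the files are passed in as an in-memory map rather than read from disk.

```typescript
import { createHash } from "node:crypto";

// Hash a set of input files (path -> content). Paths are sorted so the result
// does not depend on discovery order, and a NUL byte separates fields so that
// ("ab", "c") and ("a", "bc") cannot collide.
function hashInputTree(files: ReadonlyMap<string, string>): string {
  const hash = createHash("sha256");
  for (const path of [...files.keys()].sort()) {
    hash.update(path).update("\0").update(files.get(path) ?? "").update("\0");
  }
  return hash.digest("hex");
}

// Run a compound only when its input hash differs from the recorded one.
// Returns true if the compound ran, false if it was skipped.
async function runIfChanged(
  inputHash: string,
  lastHash: string | undefined,
  run: () => Promise<void>,
): Promise<boolean> {
  if (inputHash === lastHash) return false;
  await run();
  return true;
}
```

Each compound would keep its own recorded hash, so an unchanged CSS tree skips CleanCSS even when the JS tree has changed.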

The projected numbers:

| Scenario | Today | After |
|---|---|---|
| Full build (cache miss) | ~10s | ~3.5s |
| Incremental (1 file changed) | ~10s | ~100ms |
| Watcher (modify .md) | ~2s | ~100ms |

The 100ms budget for a single-file change breaks down as: asset hash check (~10ms) → template (cached, ~0ms) → render 1 page (~50ms) → write 2 files (~5ms). That is roughly 65ms in total, which leaves some headroom under the target. No worker pool overhead for small batches. No mermaid resolution if no mermaid blocks. No sitemap regeneration if no pages added or deleted.

The 8–12 second build was never slow because of any single step. It was slow because every step ran every time. The fix isn't making steps faster — it's making them not run at all when they don't need to.
