Content Fragments: A 10× Lighter SPA Navigation in One Afternoon
The SPA was fetching 105 KB per click to use 10 KB of it. Fixing it was the easy part. The reason it was easy is the actual story.
The 105 KB Problem
I opened DevTools on my own site, clicked a link in the TOC, and watched the Network panel: qwant.html, 105 KB, text/html. Every single click — hover-prefetched or not — pulled down a full page shell to feed a single outputEl.innerHTML = … assignment. Everything else was noise: the <head> and its meta tags, the JSON-LD structured data, the entire inline TOC rendered per page, the footer, the script tags. My own SPA was treating each navigation as a cold page load minus the parse cost — which is roughly the worst of both worlds.
The code in src/app-static.ts made the waste explicit. On every click it called fetch(href) for the full .html, parsed the response with new DOMParser().parseFromString(html, 'text/html'), fished out a single element with getElementById('markdown-output'), copied its innerHTML into the live DOM, and threw the rest away. Ninety percent of the bytes I paid for on the wire, and a full Document allocation on the main thread, all to move a substring from one element to another. This had been fine for a while because the site is small and fast anyway, but once the TOC started weighing several kilobytes inlined into every page, and once I started seriously prefetching on hover, the cost curve turned wrong.
The fix, stated abstractly, is obvious: serve just the <article> content as a separate file and have the SPA fetch that. The cold visit / SEO / crawler path still hits the full .html, so there's no regression in indexability. But "serve a subset" is the kind of change that tends to metastasize — it usually means touching the build pipeline, the HTTP routing, the client fetch wiring, the history management, the test suite, and probably half a dozen things you didn't expect. I braced for a weekend of chasing edge cases.
It took an afternoon.
Before — 105 KB / click:
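A condensed sketch of the old path, with the fetch and the parser injected so the shape is visible — the function name and signature here are illustrative, not the repo's exact code; in the browser, `parse` would be `(h) => new DOMParser().parseFromString(h, 'text/html')`:

```typescript
// Old navigation path (sketch): fetch the whole page, parse it, keep one element.
type ParsedDoc = { getElementById(id: string): { innerHTML: string } | null };

async function fetchArticleHtmlOld(
  href: string,
  fetchText: (url: string) => Promise<string>,
  parse: (html: string) => ParsedDoc,
): Promise<string> {
  const fullPage = await fetchText(href);               // ~105 KB on the wire
  const doc = parse(fullPage);                          // full Document allocated
  const article = doc.getElementById('markdown-output');
  if (article === null) throw new Error(`no #markdown-output in ${href}`);
  return article.innerHTML;                             // ~10 KB actually kept
}
```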
After — ~10 KB / click:
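The new path under the same assumptions — the fragment's body is the article, so nothing is parsed and nothing is thrown away:

```typescript
// New navigation path (sketch): fetch the pre-built fragment; use the body as-is.
async function fetchArticleHtmlNew(
  href: string,
  fetchText: (url: string) => Promise<string>,
): Promise<string> {
  const fragmentHref = href.replace(/\.html$/, '.fragment.html');
  return fetchText(fragmentHref);   // ~10 KB, no DOMParser, no discarded shell
}
```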
The Insight: One Variable Already Held the Answer
The build pipeline for this site runs markdown through a Node worker pool — one worker per page, all sharing an immutable TOC and an index.html template via workerData. Each worker calls renderPage() in scripts/lib/page-renderer.ts, which parses the markdown with marked, wraps headings into <div data-slug> sections, rewrites internal links, collects mermaid blocks into placeholder tokens, then plugs the resulting HTML into the template via page.replace('%%CONTENT_HTML%%', () => html) and a handful of sibling replacements for meta tags, TOC, and JSON-LD.
Read that again: plugs the resulting HTML into the template. The content HTML exists, as a local variable called html, one line before it gets injected into the full page. It is already separated from the shell; the shell was just pasted around it seconds later. The fragment I wanted to serve was never missing — it was only ever one return statement away from existing.
So the whole pipeline rewrite reduced to: capture that variable, add it to BuildPageResult, and let the main thread write it to a second file. No regex extraction, no DOM parsing, no second pass over anything. The worker was already doing exactly the work I needed. I just had to stop throwing away the intermediate result.
This is the part of the refactor that looks cheap in hindsight but wasn't free. The only reason renderPage() has a clean local for the content is that someone (me, a few months ago) extracted it from an inline monolith into a pure function with explicit intermediates. Had the template substitution been tangled up in the same statement as the markdown render — const page = template.replace('%%CONTENT_HTML%%', marked.parse(content)) — there would have been no variable to grab. Every disciplined extraction compounds later.
Pipeline Rewrite, Minimal Blast Radius
The concrete diff on the build side is almost embarrassingly small. First, extend BuildPageResult in scripts/lib/page-renderer.ts with a contentHtml: string field — the rendered article content with mermaid placeholders still in it. Second, inside renderPage(), insert const contentHtml: string = html immediately before the template wrapping begins, and add it to the return object. The sync-only buildPage() in scripts/build-static.ts (used by the unit tests to avoid a worker pool) gets the exact same two-line treatment for symmetry.
The only tricky bit is mermaid. Mermaid code blocks get replaced by placeholder tokens like %%MERMAID_PLACEHOLDER_abc123%% during renderPage(), then the actual SVGs are rendered in parallel on the main thread via Puppeteer, and finally replaceMermaidPlaceholders() substitutes the tokens back into the full HTML string. The function is pure and stateless on its first argument, which means I can apply it twice — once to result.page for the full HTML file, once to result.contentHtml for the fragment — with the same mermaidBlocks and the same mermaidResults. The placeholders are identical in both strings because the second is a substring of the first. No extra rendering, no extra Puppeteer calls, no extra book-keeping.
The write phase in scripts/build-static.ts gains two lines inside the existing loop: push the fragment into the same filesToWrite array that already drives io.writeFilesParallel. Hundreds of files, still written in parallel, still under the same I/O abstraction that the tests mock. The orphan pruner gains a sibling cleanup (when a source .md disappears we unlink its .html AND its .fragment.html) and a filter to skip the .fragment.html files when scanning for pages. That's the whole build-side change — maybe 40 lines of real diff.
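The pruner's sibling rule is mechanical; a sketch of the mapping it needs (the helper names are mine, not the repo's):

```typescript
// When a source .md disappears, both build outputs become orphans.
function orphanOutputsFor(markdownPath: string): string[] {
  const htmlPath = markdownPath.replace(/\.md$/, '.html');
  return [htmlPath, htmlPath.replace(/\.html$/, '.fragment.html')];
}

// Fragments must also be excluded when the pruner scans for "pages".
const isFragmentFile = (p: string): boolean => p.endsWith('.fragment.html');
```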
Before — one write per page:
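A condensed sketch of the old shape — the content local exists, then dies inside the template substitution (the real function takes many more inputs; this keeps only the two that matter here):

```typescript
// Before: the rendered article `html` is consumed by the template and never returned.
// The function replacer avoids String.replace's special $-patterns in the content.
function renderPageBefore(template: string, html: string): { page: string } {
  const page = template.replace('%%CONTENT_HTML%%', () => html);
  return { page };   // the shell-free string is gone
}
```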
After — captured local, dual write, same mermaid pass:
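The after-shape, plus the double mermaid pass — `replaceMermaidPlaceholders` here is a stand-in with an assumed signature; the real one lives in the build scripts:

```typescript
interface BuildPageResult { page: string; contentHtml: string }

// After: capture the local and return it alongside the full page.
function renderPageAfter(template: string, html: string): BuildPageResult {
  const contentHtml = html;  // already separated from the shell
  const page = template.replace('%%CONTENT_HTML%%', () => contentHtml);
  return { page, contentHtml };
}

// Pure in its first argument, so it can run twice with the same SVG map.
function replaceMermaidPlaceholders(html: string, svgs: Map<string, string>): string {
  let out = html;
  for (const [token, svg] of svgs) out = out.split(token).join(svg);
  return out;
}

// Same mermaid results, applied to both output strings — no extra rendering.
function finalize(r: BuildPageResult, svgs: Map<string, string>): BuildPageResult {
  return {
    page: replaceMermaidPlaceholders(r.page, svgs),
    contentHtml: replaceMermaidPlaceholders(r.contentHtml, svgs),
  };
}
```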
The SOLID Payoff: Nothing Broke Downstream
Here is where the previous work started paying interest. The SPA navigation machine in src/lib/spa-nav-state.ts is a pure finite state machine — no DOM, no fetch, no timers. It orchestrates the fetch → swap → settle → scroll flow entirely through injected callbacks. It has no opinion about what a "fetched HTML" is; it just passes a string through to the swapContent callback and trusts the caller to do the right thing. So changing the payload from a full HTML page to an article fragment required zero changes to the machine, zero changes to its tests, and zero risk of breaking the dozen navigation edge cases it already handles (hash scroll, heading toggle, popstate, abort, error, back-to-back clicks).
The prefetch manager in src/lib/link-prefetch.ts was the same story. It's a state machine over per-href entries — idle → loading → loaded | error — and the fetch function is injected via options.fetch. I wrote that manager months ago specifically so it could be tested without touching the network, which also happens to mean its caller decides what URL to hit. The whole content-fragment migration on the client side is a single function in src/app-static.ts that rewrites foo.html to foo.fragment.html before calling window.fetch, wrapped around the existing fetch closure. Hover prefetching and click navigation both go through the same wiring point, so they both switched over automatically — hover pre-loading is now so cheap that the cost/benefit math on it has completely flipped.
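Because both consumers share one injected fetch, the migration is a single wrapper. A sketch — the inline regex rewrite here is the shape of the change, not the shared module the post describes later:

```typescript
type FetchText = (url: string) => Promise<string>;

// Wrap the existing fetch closure: every .html request becomes a .fragment.html request.
function withFragmentRewrite(baseFetch: FetchText): FetchText {
  return (url) => baseFetch(url.replace(/\.html$/, '.fragment.html'));
}

// Hover prefetch and click navigation both receive the wrapped function, so both
// switch over at once, e.g. (hypothetical call site):
//   createPrefetchManager({ fetch: withFragmentRewrite(rawFetch) })
```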
The startFetch callback used to build a Document with DOMParser, extract the <article> with getElementById, and pass the innerHTML downstream. Now it passes the fetched text directly to outputEl.innerHTML — one allocation instead of two, no parser ever touched. I kept a fallback branch that refetches the full .html and DOMParses it if the .fragment.html is missing, strictly as transition insurance during the first deploy; once the fragments are universally in the cache, that branch can be deleted without ceremony.
DRY Between Build and Client
Two tiny shared modules in src/lib/ carry the weight of the "one source of truth" invariant between the build and the client.
src/lib/page-title.ts — thirteen lines — exports formatPageTitle(pageTitle: string): string, which produces ${pageTitle} — Stéphane Erard (with trim, empty-fallback, and dedup when the title already equals the site title). It is imported by buildMetaTags() in the worker, so the <title> tag baked into every static .html comes from this function; it is also imported by the SPA's swapContent, so document.title = formatPageTitle(tocItem.title) uses the same format at runtime. No way for the server-rendered and client-rendered titles to drift — the format exists exactly once, and any change would break both sides identically.
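A hedged reconstruction of what those thirteen lines plausibly look like, from the behavior described (trim, empty fallback, dedup against the site title):

```typescript
const SITE_TITLE = 'Stéphane Erard';

// One format, imported by both the build worker and the SPA's swapContent.
function formatPageTitle(pageTitle: string): string {
  const title = pageTitle.trim();
  if (title === '' || title === SITE_TITLE) return SITE_TITLE;
  return `${title} — ${SITE_TITLE}`;
}
```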
src/lib/fragment-path.ts — sixteen lines — exports fragmentPathForHtmlPath(), which maps content/blog/foo.html to content/blog/foo.fragment.html (with a fail-fast throw on paths that don't end in .html). The build uses it to name the file it writes; the client uses it inside rewriteToFragmentHref() to compute the URL it fetches. Same function, both sides, no hand-rolled string manipulation anywhere. If the convention ever changes — say, to a flat public/fragments/ directory — that change lives in exactly one file and is picked up automatically at both ends of the pipe.
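The mapping itself, reconstructed from the described behavior — the idempotence guard is my reading of the Feature's AC list further down, not confirmed source:

```typescript
function fragmentPathForHtmlPath(htmlPath: string): string {
  if (!htmlPath.endsWith('.html')) {
    throw new Error(`fragmentPathForHtmlPath: expected an .html path, got "${htmlPath}"`);
  }
  if (htmlPath.endsWith('.fragment.html')) return htmlPath; // assumed: already a fragment
  return `${htmlPath.slice(0, -'.html'.length)}.fragment.html`;
}
```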
The client doesn't parse any HTML to find the title anymore, either. It looks up the current page in toc.json — which the SPA already loads at boot for the sidebar TOC — and reads item.title directly. The toc contains, for every page, every field I could conceivably need for navigation chrome: title, description, tags, date, icon, the full heading outline. Carrying metadata on top of the fragment would have been redundant with information already sitting in memory.
Tests Caught Everything Before I Noticed
The test side of this repo uses a decorator-based feature-tracking DSL: abstract Feature classes under requirements/features/*.ts declare acceptance criteria as abstract methods, and test classes decorated with @FeatureTest(MyFeature) implement individual ACs via @Implements<MyFeature>('acName'). Tests are auto-registered by the decorators — no describe/it boilerplate. I wrote about this machinery in Hardening the Test Pipeline.
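The enforcement reduces to a registry plus a scan. An illustrative sketch of the invariant only — the repo's real machinery uses class decorators and lives in the test infrastructure; none of these function bodies are its actual code:

```typescript
interface FeatureSpec { name: string; acs: string[] }

const implemented = new Map<string, Set<string>>();

// What @Implements ultimately amounts to: record that this feature's AC has a test.
function registerImplements(feature: string, ac: string): void {
  if (!implemented.has(feature)) implemented.set(feature, new Set());
  implemented.get(feature)!.add(ac);
}

// What the compliance scanner enforces: every declared AC has a registration.
function missingAcs(features: FeatureSpec[]): string[] {
  const missing: string[] = [];
  for (const f of features) {
    for (const ac of f.acs) {
      if (!implemented.get(f.name)?.has(ac)) missing.push(`${f.name}.${ac}`);
    }
  }
  return missing;
}
```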
For this change I added three new Feature classes: PageTitleFeature (four ACs around format, dedup, trim, empty fallback), FragmentPathFeature (five ACs around the path mapping, idempotence, and fail-fast), and ContentFragmentFeature (four ACs covering the worker's contentHtml output, its consistency with the article substring in the full page, the use of the shared title formatter, and the derivation of fragment paths from HTML paths). Each Feature got a test file in test/unit/ implementing every AC. The compliance scanner runs in CI (and locally in my pre-push hook) and fails the build if any AC is left without an @Implements pointing at it, so the structure forces honesty.
The test pipeline caught two things I would have missed. First, the strict tsc gate on test/tsconfig.json rejected my mock IO for the content-fragment test because it was missing copyFile and execAsync — two methods on the IO interface that my test didn't care about but that the type required anyway. TypeScript had me adding three lines to the mock before any test actually ran. Second, the compliance scanner caught that I'd initially hand-written describe()/it() registration boilerplate that duplicated the auto-registration — the CI rule against bare describe() use sent me back to delete it. Two small corrections, both surfaced automatically, both before I had to notice them myself.
Full suite at the end: 88 test files, 1717 tests, all green. The refactor touched the worker pipeline, the main thread write phase, the orphan pruner, the client fetch wiring, and the DOM swap logic — and nothing else had to change. Every existing test kept passing, and the new tests landed on top of a machinery that made them almost mechanical to write.
The Numbers
Payload size: 105 KB per XHR navigation before, roughly 10 KB after. The fragment is exactly the article innerHTML with no shell, no TOC, no meta, no scripts. The compressed size is whatever the article itself compresses to — it's mostly text, and Vercel already serves it brotli-encoded. Hover prefetching — which used to budget 100 KB per hover as a fixed tax on anyone moving their mouse across the TOC — now costs approximately what it costs to serve the content the reader was probably going to see anyway. The rational thing is to prefetch earlier and more aggressively, which is exactly what happens now, for free.
Diff size: about 200 lines across twelve files. Two new 13-to-20-line lib modules (src/lib/page-title.ts, src/lib/fragment-path.ts); one new field on BuildPageResult; one captured local in renderPage(); a dozen lines of write-phase changes in scripts/build-static.ts and a sibling cleanup in the orphan pruner; one rewrite function and one simplified startFetch in src/app-static.ts; three Feature classes; three test files; three lines in requirements/index.ts. No FSM changes. No template changes. No routing config. No deploy changes.
Total time: roughly half an afternoon, including reading the code to confirm the pipeline shape, writing the plan, making the edits, fixing two TypeScript errors, removing the registration boilerplate the compliance scanner didn't want, and writing this post.
The Math: What a Full Site Crawl Actually Costs
The 105 KB / 10 KB figures from qwant.html are one data point. Once the change shipped I ran the build and measured every output file on disk. The site has 517 published pages. Average uncompressed size of a full .html: ~590 KB. Average uncompressed size of a .fragment.html: ~53 KB. The eleven-to-one ratio holds across the corpus — qwant.html was actually on the smaller end, because it has no Mermaid SVGs inlined.
A "complete visit" of the site means: arrive on one page (cold load, full HTML, including the inlined TOC and the bootstrap JS), then navigate to all 516 others through the SPA. That's 1 bootstrap + 516 XHRs.
Before, every XHR pulled a full page:
```
1 × 590 KB    (bootstrap, full HTML)
516 × 590 KB  (XHR, full HTML each)
─────────────
517 × 590 KB  ≈ 305 MB per complete visit
```

After, the bootstrap is unchanged (cold load still serves the full HTML for SEO and crawlers), but every subsequent click pulls only the article fragment:

```
1 × 590 KB    (bootstrap, full HTML)
516 × 53 KB   (XHR, fragment only)
─────────────
≈ 590 KB + 27 MB ≈ 28 MB per complete visit
```

Savings per complete visit: ~277 MB. A factor of 10.9×, end-to-end. The bootstrap is the same on both sides, so the only thing that changed is the 516 follow-up requests — and those collapsed from 305 MB to 27 MB.
Compression doesn't change the ratio. Vercel serves brotli by default, and both the full pages and the fragments are dominated by the same article text, so the compressed numbers shrink in lockstep — full pages land around 100 KB on the wire, fragments around 9 KB, which is exactly the 105 / 10 figures from the opening paragraph, just measured at a single representative page.
Now extrapolate. A visitor who explores the entire site is rare; the more interesting numbers are aggregates over time. Assume modest traffic — say 100 complete-visit-equivalents per day (a complete-visit-equivalent being any combination of clicks summing to 517 page views, regardless of whether one user did all 517 or 100 users did ~5 each: bandwidth doesn't care):
| Window | Before | After | Saved |
|---|---|---|---|
| 1 day | 100 × 305 MB ≈ 30.5 GB | 100 × 28 MB ≈ 2.8 GB | ~27.7 GB / day |
| 1 month | 30 × 30.5 GB ≈ 915 GB | 30 × 2.8 GB ≈ 84 GB | ~831 GB / month |
| 1 year | 365 × 30.5 GB ≈ 11.1 TB | 365 × 2.8 GB ≈ 1.0 TB | ~10.1 TB / year |
At a more realistic small-site rate of 10 complete-visit-equivalents per day, divide everything by ten: 3 GB / day → ~280 MB / day, 91 GB / month → 8.4 GB / month, 1.1 TB / year → 102 GB / year, saving roughly 1 TB of pointless egress per year.
Even at one complete-visit-equivalent per day — a personal site nobody visits but a single obsessive reader — the math is 111 GB / year → 10 GB / year, a hundred-gigabyte saving on a static site that fits in a tarball.
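The per-visit figures driving all of these windows reduce to a few multiplications (numbers from the post; decimal megabytes, matching its rounding):

```typescript
const PAGES = 517;
const FULL_KB = 590;   // average full .html
const FRAG_KB = 53;    // average .fragment.html

const beforeMB = (PAGES * FULL_KB) / 1000;                  // bootstrap + 516 full pages
const afterMB = (FULL_KB + (PAGES - 1) * FRAG_KB) / 1000;   // bootstrap + 516 fragments
const savedMB = beforeMB - afterMB;                         // ≈ 277 MB per complete visit
const ratio = beforeMB / afterMB;                           // ≈ 10.9×
```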
Put plainly: for every megabyte the site actually needs to deliver, the old design was shipping eleven. Ninety percent of the bandwidth was wasted on the wire, and ninety percent of the work on the client (DOMParser allocating a full Document per click) was wasted on the CPU. Whether the absolute numbers matter to your hosting bill or not, the ratio is the same — and on Vercel's free tier, where the bandwidth allowance is 100 GB / month, the difference between 91 GB and 8.4 GB is the difference between "one bad week from a quota wall" and "indefinitely free."
Source vs. shell vs. fragment, totalled across the site
The bandwidth math above is one half of the story. The other half is what those bytes actually are. Below is the totals table for every markdown file under content/ — 513 source files — measured against the corresponding .html and .fragment.html produced by the build, broken down by top-level section so the composition is visible at a glance.
| Section | Files | Markdown | Full HTML | Fragment HTML | full ÷ md | frag ÷ md |
|---|---|---|---|---|---|---|
| blog/ | 472 | 11.37 MB | 281.03 MB | 26.77 MB | 24.7× | 2.35× |
| philosophy/ | 8 | 353 KB | 4.80 MB | 479 KB | 13.6× | 1.36× |
| projects/ | 12 | 88 KB | 6.62 MB | 160 KB | 75.0× | 1.81× |
| skills/ | 10 | 41 KB | 5.45 MB | 67 KB | 132.4× | 1.63× |
| experience/ | 10 | 32 KB | 5.43 MB | 45 KB | 168.2× | 1.38× |
| education/ | 6 | 8.6 KB | 3.24 MB | 12.5 KB | 375.2× | 1.45× |
| (root) | 1 | 4.9 KB | 544 KB | 6.1 KB | 110.4× | 1.24× |
| **Total** | 513 | 11.87 MB | 303.84 MB | 27.49 MB | 25.6× | 2.32× |
Two things jump out.
First: a fragment is barely bigger than its markdown source. 27.49 MB of fragments for 11.87 MB of markdown — a 2.32× expansion. That ratio is essentially the cost of turning # heading into <h2>heading</h2>, of inlining mermaid SVGs, and of wrapping headings into <div data-slug> sections. There is almost no shell in a fragment. It is the markdown, rendered, and that's it. The 2.35× on blog/ is dragged up by Mermaid SVGs (which can be tens of kilobytes per diagram); the small content sections sit between 1.24× and 1.81×, which is what pure-text rendering overhead actually looks like.
Second: a full HTML file is 25.6× the size of its markdown source, on average across the whole corpus. For tiny pages like experience/* or education/*, the ratio climbs into the hundreds — 168×, 375× — because the page-shell cost (head, JSON-LD, TOC, footer, scripts) is roughly constant per page, so a 1 KB markdown stub blows up into a 540 KB HTML file. The shell is the same shell every time. That's exactly what was being shipped on every XHR navigation under the old design: ~290 MB of the same shell, repeated 517 times, glued around 27 MB of actual content.
Subtract: 303.84 MB − 27.49 MB ≈ 276 MB of pure shell across the full site build. That number — 276 MB of bytes that exist solely to be discarded by the SPA on every XHR — is the structural waste the content-fragments change made disappear. The bandwidth saving per visit (~277 MB, computed independently above) lines up almost exactly with the on-disk size of the shell, which is the right cross-check: what the SPA was over-fetching is precisely what the build was over-bundling.
There's a second-order effect that the bandwidth math doesn't capture: hover prefetching used to be a luxury budget item — every mouseover on the TOC speculatively spent ~100 KB on a page the user might never visit. Now it costs ~9 KB per hover, which is below the noise floor of any reasonable connection. The economically rational prefetch policy went from "be conservative, only prefetch high-confidence intent signals" to "prefetch everything you can see, the cost is gone." That's not a 10× speedup of the existing flow; it's a category change in what the prefetcher is allowed to do.
What I Actually Wrote
None of this should be surprising. It isn't, to anyone who's been disciplined about their code for any length of time — which is exactly the point. The hard, invisible work that made today's refactor cheap was done in sessions long before today. Someone extracted the markdown renderer from build-static.ts into a pure scripts/lib/page-renderer.ts with a typed BuildPageResult. Someone wrote the prefetch manager with an injected fetch option so it could be tested without the network. Someone wrote the SPA navigation as a pure state machine with injected callbacks. Someone hardened the test pipeline so that new Features get auto-registered, ACs get enforced, and tsc fails fast on missing properties. Every one of those small investments is why I spent half an afternoon on this instead of a weekend.
SOLID and DRY don't feel expensive when you're writing them. Separation of concerns feels slow because you're writing two files where one would have "worked." Fetch injection feels unnecessary because the URL is right there. A shared title helper feels silly for thirteen lines of code. But when you come back six months later to change something upstream — to swap out a payload format, to add a new artifact alongside an existing one, to rewire a fetch target — the compound interest of all those small decisions is the difference between an afternoon's diff and a refactor that takes over your week.
The meta-twist of this post is that I'm writing it inside the very machinery it describes. When you navigate to this page from another post on the site, your browser is fetching content-fragments.fragment.html, not content-fragments.html. The build emitted both. The SPA picked the lighter one. Open DevTools on the way in if you want to see the trick land.