The Loop: Claude Code + Quality Gates as a Self-Reinforcing Test Augmentation Cycle
An AI that writes tests without a quality gate is a random text generator. A quality gate without an AI to feed it is a number nobody reads. Put them together and you get something neither can do alone: a self-reinforcing cycle that grinds coverage upward until the gate says stop.
The Problem: The Coverage Plateau
Every team I've worked with hits the same wall. You adopt a testing culture, write unit tests for your new features, maybe even enforce coverage in CI. Coverage climbs to 60%, then 70%. Then it stops.
Not because anyone decided to stop. Because the remaining 30% is the hard part — the error paths nobody wants to think about, the branch conditions buried in complex state machines, the edge cases that require elaborate setup. Writing tests for the easy 70% is pleasant. Writing tests for the next 20% is tedious. Writing tests for the last 10% feels like punishment.
So the coverage report becomes write-only. Generated every build, read by nobody. The dashboard exists. The number doesn't move.
And here's the cruel part: even when coverage is "high," it can lie. Line coverage tells you "this line was executed during a test." It does not tell you "this test would fail if this line changed." A test that calls a function without asserting anything achieves 100% line coverage and detects exactly zero bugs. That's where mutation testing comes in — and where the gap between "covered" and "tested" becomes visible.
The real problem isn't that teams lack testing tools. It's that someone needs to:
- Read the coverage report
- Understand which lines are uncovered and why
- Read the source code to understand the logic
- Write a test that exercises the missing path
- Run it, check coverage again
- Repeat
That "someone" is expensive, gets bored, and has features to ship.
The Insight: Close the Loop
What if that "someone" isn't a someone?
The ingredients have been sitting on the table for years:
- Quality gates produce machine-readable output — JSON coverage reports, XML Cobertura files, Stryker mutation JSON, structured summaries with per-method metrics
- Claude Code can read files, understand code context, write code, run commands, and read the output — all inside the project, not in a browser window
- Thresholds define an objective target — not "write more tests" but "reach 95% branch coverage and 80% mutation score"
The missing connection was always: who reads the report and acts on it?
The answer: the same AI agent that can read the code. The quality gate is the objective. Claude is the executor. The human is the architect who sets the bar and reviews the output.
┌─────────────────────────────────────────────────────────────────┐
│ THE LOOP │
│ │
│ Human sets threshold │
│ │ │
│ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌──────────────────┐ │
│ │ Claude │────►│ Run tests │───►│ Read coverage + │ │
│ │ writes │ │ + collect │ │ mutation reports │ │
│ │ tests │ │ metrics │ │ (machine- │ │
│ └───────────┘ └───────────┘ │ readable JSON) │ │
│ ▲ └────────┬─────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────────────┐ │
│ │ │ Gate passes? │ │
│ │ └───────┬──┬───────┘ │
│ │ NO │ │ YES │
│ │ │ │ │
│ │ ┌─────────────────┐ │ ▼ │
│ │ │ Identify │◄──────┘ ┌────────────┐ │
│ └─────────│ uncovered lines │ │ Done. │ │
│ │ + branches │ │ Human │ │
│ └─────────────────┘ │ reviews + │ │
│ │ ratchets │ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────┘
This is The Loop. It's not a framework. It's not a product. It's a workflow pattern I've used hundreds of times across a 57-project .NET monorepo and a TypeScript CV website. It works because each component does what it's best at:
- The human decides what quality means (thresholds, which metrics matter, when to ratchet)
- The quality gate measures objectively (no opinions, no fatigue, no shortcuts)
- Claude does the grinding (reads reports, reads code, writes tests, iterates)
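The cycle above can be sketched as a small driver. This is a minimal sketch, not real tooling: `runGate`, `identifyGaps`, and `writeTests` are hypothetical stand-ins for the quality-gate CLI, its report parser, and the agent respectively.

```typescript
// Minimal sketch of The Loop as a driver. All three callbacks are
// hypothetical stand-ins: runGate wraps a quality-gate command,
// identifyGaps parses its machine-readable report, and writeTests
// is the agent acting on those gaps.
type GateResult = { passed: boolean; score: number; report: unknown };

interface LoopDeps {
  runGate: () => GateResult;                   // e.g. wraps `dotnet quality-gate test`
  identifyGaps: (report: unknown) => string[]; // uncovered lines, surviving mutants
  writeTests: (gaps: string[]) => void;        // the agent writes targeted tests
}

function runLoop(deps: LoopDeps, maxIterations = 10): GateResult {
  let result = deps.runGate();
  let iterations = 0;
  // Iterate until the gate passes or the safety cap is reached.
  while (!result.passed && iterations < maxIterations) {
    const gaps = deps.identifyGaps(result.report);
    deps.writeTests(gaps);
    result = deps.runGate();
    iterations++;
  }
  return result;
}
```

The `maxIterations` cap matters: the objective threshold is the loop's termination condition, and the cap is the fallback when the gap is too large for one session.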
The Setup
In my .NET ecosystem, the quality gate is FrenchExDev.Net.QualityGate — a Roslyn-powered tool that analyzes syntax trees and enforces thresholds for complexity, coupling, cohesion, coverage, and mutation testing. Each project has a quality-gate.yml:
# quality-gate.yml — Injectable project
solution: FrenchExDev.Net.Injectable.slnx
gates:
max-cyclomatic-complexity: 15
max-cognitive-complexity: 20
max-class-coupling: 55
max-inheritance-depth: 5
min-maintainability-index: 55
max-lcom: 25
max-distance-from-main-sequence: 1.0
min-test-quality-score: 0.80 # Combined coverage + mutation
That last line — min-test-quality-score: 0.80 — is the key. It's not just coverage. It's a combined score of line coverage and mutation testing. You can't game it with empty assertions.
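The session transcript below reports 78% line coverage and 64% mutation score combining to 0.71, which is consistent with a plain average. Here is a sketch of that combination — the unweighted mean is an assumption; the real QualityGate formula may weight the two differently:

```typescript
// Hedged sketch: combine line coverage and mutation score into one
// test-quality score. An unweighted mean reproduces the numbers in
// the session transcript (0.78, 0.64 -> 0.71); the actual
// QualityGate weighting is an assumption here.
function testQualityScore(lineCoverage: number, mutationScore: number): number {
  return (lineCoverage + mutationScore) / 2;
}

function gatePasses(score: number, threshold = 0.80): boolean {
  return score >= threshold;
}
```

With coverage at 0.78 and mutation at 0.64 this yields 0.71 and the 0.80 gate fails; lifting the mutation score to 0.86 would bring the combined score to 0.82 and pass.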
A Typical Session
Here's what The Loop looks like in practice. I open Claude Code in the Injectable project directory:
me: Run the quality gate. Let's see where we stand.
claude: Running `dotnet quality-gate test`...
┌─────────────────────────────────────────────────┐
│ QualityGate Report — Injectable │
├─────────────────────────────────────────────────┤
│ Cyclomatic Complexity 12/15 ✓ PASS │
│ Cognitive Complexity 18/20 ✓ PASS │
│ Class Coupling 42/55 ✓ PASS │
│ LCOM4 8/25 ✓ PASS │
│ Test Quality Score 0.71/0.80 ✗ FAIL │
│ Line Coverage 78% │
│ Mutation Score 64% │
└─────────────────────────────────────────────────┘
Gate FAILED. Test quality score is 0.71, threshold is 0.80.
Reading report.json for details...
Lowest coverage areas:
- InjectableEmitter.cs: 62% lines, 48% branches
Missing: lines 87-112 (error handling for malformed attributes)
Missing: lines 145-163 (generic type parameter resolution)
- InterfaceContractModel.cs: 71% lines, 55% branches
Missing: lines 34-52 (nullable reference type handling)
Surviving mutants (12 total):
- InjectableEmitter.cs:94 — `if (x > 0)` mutated to `if (x >= 0)` — SURVIVED
- InjectableEmitter.cs:103 — `return null` mutated to `return ""` — SURVIVED
- InterfaceContractModel.cs:41 — `== null` mutated to `!= null` — SURVIVED
...
me: Fix InjectableEmitter first. Follow the existing test patterns.
claude: Reading InjectableEmitter.cs...
Reading existing test file InjectableEmitterTests.cs...
I see the test pattern: in-memory Roslyn compilations with
fake source text, assertions on emitted code via
`VerifyGeneratedOutput()`. Let me add tests for the missing
paths.
[writes 6 new test methods targeting error handling + generics]
Running `dotnet test`... 47 passed, 0 failed.
Running `dotnet quality-gate test`...
Test Quality Score: 0.76/0.80 — still below threshold.
3 mutants still surviving in InjectableEmitter.cs.
The surviving mutant at line 94 needs a boundary test.
The mutation changes `> 0` to `>= 0`, so I need a test
where the parameter count is exactly 0.
[writes boundary test for parameterCount == 0]
Running `dotnet quality-gate test`...
Test Quality Score: 0.82/0.80 ✓ PASS
Three iterations. Maybe 15 minutes. The quality gate went from 0.71 to 0.82. The human (me) made one decision: "Fix InjectableEmitter first." Claude did the rest.
The .NET Flow
Why It Works for .NET
Roslyn gives Claude precise context. The QualityGate report.json doesn't just say "coverage is low." It says "method EmitRegistration in InjectableEmitter.cs has 48% branch coverage, and mutant at line 94 survived because no test exercises the zero-parameter path." That's actionable. Claude reads the source, understands the branch, and writes the test.
The test infrastructure matters too. Projects in my ecosystem use in-memory Roslyn compilations — you create fake source text, run the generator, and assert on the emitted code. No filesystem, no process spawning, no flakiness. Claude learns this pattern from existing tests and replicates it. The patterns come from the human. The volume comes from Claude.
The TypeScript Flow
The Setup
This website — the one you're reading — has a comprehensive test suite with Vitest coverage gates that act as the quality gate:
// vitest.config.js — the quality gate for this website
thresholds: {
// Pure state machine logic — high bar
'src/lib/**/*.ts': {
statements: 98,
branches: 95,
functions: 98,
lines: 99,
},
// Build pipeline — total coverage via dependency injection
'scripts/build-static.js': {
statements: 100,
branches: 100,
functions: 100,
lines: 100,
},
}
98% statements. 95% branches. 99% lines. For the build pipeline: 100% across the board. These aren't aspirational — they're enforced. npx vitest run returns exit code 1 if any threshold is violated.
A Typical Session
me: I just refactored the scroll spy state machine. Run
the tests, let's see the damage.
claude: Running `npx vitest run --coverage`...
Test Files 144 passed (144)
Tests 2812 passed (2812)
Coverage:
src/lib/scroll-spy.ts
Statements: 94.2% (threshold: 98%) ✗
Branches: 88.1% (threshold: 95%) ✗
Functions: 100% (threshold: 98%) ✓
Lines: 94.8% (threshold: 99%) ✗
Coverage thresholds not met. Reading coverage report...
Uncovered branches in scroll-spy.ts:
- Line 47: `else` branch when headingElements is empty
- Line 63: early return when scrollY < firstHeadingOffset
- Lines 78-82: fallback when no heading is in viewport
- Line 91: `else` branch when activeId === previousId
me: Cover those branches. Check the existing scroll-spy
tests for the pattern.
claude: Reading test/unit/scroll-spy.test.ts...
The existing tests use a mock DOM factory:
`createMockHeadings(count, offsets)` that returns fake
elements with getBoundingClientRect stubs. I'll follow
this pattern.
[writes 4 new test cases]
Running `npx vitest run --coverage`...
src/lib/scroll-spy.ts
Statements: 98.5% ✓
Branches: 95.2% ✓
Functions: 100% ✓
Lines: 99.1% ✓
All thresholds met.
me: Now run compliance.
claude: Running `npx tsx scripts/compliance-report.ts --strict`...
97 features, 829 acceptance criteria, 100% coverage.
Total tests linked to ACs: 2812 (2757 unit + 55 e2e)
Runtime coverage warnings: 0
Unbound features: 0
Orphan source files: 0
Quality gate: PASS
Two iterations for the coverage gate. One pass for compliance. The refactored state machine is tested, the thresholds hold, and the typed specifications confirm that every acceptance criterion is still linked to a test.
Beyond Coverage: The Compliance Scanner
The Vitest thresholds catch line and branch coverage. But coverage doesn't tell you whether you're testing the right things. That's what the compliance scanner does.
This website has 97 typed feature specifications with 829 acceptance criteria, verified by 2812 tests (2757 unit + 55 e2e). Each test is linked to features via @Implements decorators. The compliance scanner reads the features, scans test files for decorator references, and builds a coverage matrix:
ID Title Total Covered TU E2E % src
──────────────────────────────────────────────────────────────────────────────────────────────
✓ NAV SPA Navigation + Deep Links 8 8 4 4 100% src 100% (1 file)
✓ THEME Theme Switching 5 5 5 0 100% src 100% (1 file)
✓ SEARCH Search 5 5 5 0 100% src 100% (1 file)
✓ SPY Scroll Spy 12 12 6 6 100% src 100% (1 file)
✓ HOT-RELOAD WebSocket Hot Reload 43 43 43 0 100% src 100% (7 files)
✓ TEST-BINDINGS-INF Test-driven bindings inference 23 23 23 0 100% src 100% (1 file)
...
Features: 97 active
Acceptance criteria: 829/829 ACs covered (100%)
Total tests linked to ACs: 2812 (2757 unit + 55 e2e)
Orphan source files: 0
Quality gate: PASS
If Claude writes tests that satisfy the coverage gate but miss an acceptance criterion, the compliance scanner catches it. Two gates, two dimensions: coverage measures breadth, compliance measures intent. The Handoff article goes deeper into how the AST scanner infers the traceability graph transitively — proving that each test actually calls the code it claims to verify.
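The linkage mechanism can be sketched as a registry from acceptance-criteria ids to tests. The real @Implements decorator in the site's codebase surely differs; this sketch uses a plain function and illustrative ids (e.g. "SPY-1") to stay self-contained:

```typescript
// Hedged sketch of an @Implements-style link between tests and
// acceptance criteria: a registry mapping AC ids to the tests that
// claim to verify them. Names and ids here are illustrative, not
// the real decorator from the article's codebase.
const acRegistry = new Map<string, string[]>();

// Register that a named test covers the given acceptance criteria.
function implementsAc(testName: string, ...acIds: string[]): void {
  for (const id of acIds) {
    const tests = acRegistry.get(id) ?? [];
    tests.push(testName);
    acRegistry.set(id, tests);
  }
}

// The compliance scan then reduces to: which ACs have no linked test?
function uncoveredAcs(allAcIds: string[]): string[] {
  return allAcIds.filter((id) => !acRegistry.has(id));
}
```

The scanner's "829/829 ACs covered" line is exactly `uncoveredAcs(allAcIds).length === 0`, computed over the full feature set.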
Why This Works: The Three Prerequisites
The Loop isn't magic. It works because three conditions are met simultaneously:
| Prerequisite | .NET Example | TypeScript Example | Why It Matters |
|---|---|---|---|
| Machine-readable reports | report.json with per-method metrics | V8 coverage JSON + compliance JSON | Claude needs structured data, not a dashboard screenshot |
| Code-reading AI agent | Claude reads Roslyn-analyzed source | Claude reads TS modules + test files | Not a chatbot — an agent that works inside the project |
| Objective threshold | min-test-quality-score: 0.80 in YAML | branches: 95 in vitest.config.js | The gate defines "done" — without it, the loop has no termination condition |
Remove any one of these and the system breaks:
- No machine-readable reports? Claude can't know what's missing. It would have to guess, and guessing means writing redundant tests that cover already-covered paths.
- No code-reading agent? The reports exist but nobody acts on them. We're back to write-only dashboards.
- No objective threshold? The loop has no termination condition. "Write more tests" is not a goal — "reach 95% branch coverage" is.
This is why dotnet quality-gate check outputs JSON and why Vitest has a json-summary reporter. Machine-readable output isn't a nice-to-have. It's the interface between the gate and the agent.
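As a concrete example of that interface, here is a sketch of reading Vitest's json-summary output (coverage/coverage-summary.json, the istanbul format with a "total" entry plus per-file percentage fields) to list the files an agent should target next:

```typescript
// Sketch: consume istanbul/Vitest `json-summary` output and list
// the files whose branch coverage sits below a threshold — the
// exact work list an agent needs for the next iteration.
type Metric = { total: number; covered: number; skipped: number; pct: number };
type FileSummary = { lines: Metric; statements: Metric; functions: Metric; branches: Metric };
type CoverageSummary = Record<string, FileSummary>;

function filesBelowBranchThreshold(summary: CoverageSummary, threshold: number): string[] {
  return Object.entries(summary)
    .filter(([file]) => file !== "total") // skip the aggregate entry
    .filter(([, m]) => m.branches.pct < threshold)
    .map(([file]) => file);
}

// Usage, assuming the default Vitest output path:
//   const raw = readFileSync("coverage/coverage-summary.json", "utf8");
//   console.log(filesBelowBranchThreshold(JSON.parse(raw), 95));
```

A summary percentage per file is enough to pick the target; the per-line detail comes from the companion coverage-final.json report.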
The Ratchet: Thresholds Only Tighten
Quality gates are not aspirational goals. They are ratchets. They only move in one direction: up.
The Loop accelerates the ratchet. Before The Loop, tightening a threshold meant a human had to write the missing tests. That's a cost in time and motivation. Now, tightening a threshold means telling Claude "the bar is now 85%." The cost is one sentence.
Here's what the progression looks like on a real project:
| Week | Threshold | Before | After | Iterations | Human Intervention |
|---|---|---|---|---|---|
| Week 1 | 60% | 52% | 63% | 2 | Set initial threshold |
| Week 3 | 75% | 63% | 77% | 3 | Ratcheted to 75%, reviewed new tests |
| Week 5 | 85% | 77% | 87% | 4 | Ratcheted to 85%, rewrote 2 naive tests |
| Week 7 | 95% | 87% | 96% | 5 | Ratcheted to 95%, added property-based tests |
Notice the pattern: as the threshold climbs, the iterations increase. The easy coverage is fast. The hard coverage takes more cycles and more human review. That's expected — and that's where the human's judgment matters most.
The human reviews every ratchet step. Some tests Claude writes at the 60% level are acceptable. At the 95% level, you're in edge-case territory where semantic correctness matters more than structural coverage. That's when I rewrite tests, add property-based invariants with fast-check, or redesign the test strategy entirely.
The Loop doesn't replace the human. It changes what the human does — from writing tests to reviewing and directing them.
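The ratchet itself is mechanical enough to automate. A minimal sketch, under the assumption that the next threshold is the measured coverage minus a small slack margin, floored, and never below the current threshold:

```typescript
// Hedged sketch of a coverage ratchet: the threshold only tightens.
// The candidate is the measured value minus a slack margin (to
// absorb incidental fluctuation), floored to a whole percentage;
// Math.max guarantees the ratchet never loosens.
function ratchet(currentThreshold: number, measured: number, slack = 1): number {
  const candidate = Math.floor(measured - slack);
  return Math.max(currentThreshold, candidate);
}
```

With the Week 1 numbers from the table, ratchet(60, 63) proposes 62; if measurement ever dips below the threshold, the function returns the threshold unchanged rather than lowering the bar.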
Addressing Skepticism
I've heard every objection. Let me address them head-on.
"AI just generates trivial assertions"
Without a quality gate, yes. Claude will happily write expect(result).toBeDefined() and call it a day. That's why the quality gate exists.
But the real answer is mutation testing. A trivial assertion lets mutants survive. Consider:
// Source code
public int CalculateDiscount(int quantity)
{
if (quantity > 10)
return quantity * 2;
return quantity;
}
A naive test:
[Fact]
public void CalculateDiscount_Returns_Value()
{
var result = CalculateDiscount(15);
Assert.True(result > 0); // trivial — always true for positive input
}
This achieves 100% line coverage. But Stryker mutates quantity > 10 to quantity >= 10 — and the test still passes. The mutant survives. The min-test-quality-score gate fails.
Claude reads the Stryker report, sees the surviving mutant, and writes:
[Fact]
public void CalculateDiscount_BoundaryAt10_NoDiscount()
{
var result = CalculateDiscount(10);
Assert.Equal(10, result); // boundary: exactly 10 → no discount
}
[Fact]
public void CalculateDiscount_Above10_DoubleDiscount()
{
var result = CalculateDiscount(11);
Assert.Equal(22, result); // 11 → 11 * 2 = 22
}
Now the mutant dies. The gate passes. Mutation testing is the antidote to trivial assertions, and Claude reads mutation reports as naturally as coverage reports.
"AI doesn't understand the business domain"
Correct. Claude doesn't know that CalculateDiscount is a pricing rule with tax implications. That's why the human designs the architecture, writes the typed feature specifications, and reviews the output.
Claude writes tests that satisfy structural quality gates — coverage and mutation scores. The human ensures semantic quality — that the right behaviors are tested, that the assertions match business intent, and that the typed specs link tests to acceptance criteria.
The division of labor is clear:
BEFORE (All Human) AFTER (The Loop)
═══════════════════ ═════════════════════
Human decides what to test Human decides what to test
Human writes the test Claude writes the test
Human runs the test Claude runs the test
Human reads the report Claude reads the report
Human fixes the gap Claude fixes the gap
Human gets bored at 70% Claude doesn't get bored
Coverage plateaus Coverage meets gate
Human reviews + approves
The first line is identical. The human always decides what to test. The Loop automates the tedious part: writing, running, reading, fixing.
"AI tests are brittle"
Tests that test implementation details are brittle regardless of who writes them. If your project tests mock.Verify(x => x.CallDatabase(), Times.Exactly(3)), those tests will break when you refactor — whether a human or an AI wrote them.
Claude follows the project's existing test patterns. If the project uses fakes and in-memory compilations (as QualityGate does), Claude writes tests that way. If the project uses dependency-injected mock I/O (as this website's build pipeline does), Claude uses that pattern. The conventions come from the human. The volume comes from Claude.
Want non-brittle tests? Write non-brittle patterns. Claude will replicate them.
"You're just gaming coverage numbers"
This objection assumes coverage is the only metric. It's not.
The QualityGate's min-test-quality-score averages coverage and mutation score. The website's compliance scanner checks that every acceptance criterion has a linked test. Together, they form a three-dimensional quality measure:
- Coverage — "was this code executed during tests?"
- Mutation score — "would this test catch a bug in this code?"
- Compliance — "are we testing the right features?"
You cannot game all three simultaneously. High coverage with low mutation score means weak assertions — the gate fails. High mutation score with missing compliance means you're testing the wrong things — the scanner fails. The Loop converges on genuine test quality because the gates measure it from multiple angles.
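The combined verdict can be sketched as a single function over the three dimensions. The averaging of coverage and mutation mirrors the min-test-quality-score gate described earlier; the exact combination rule is illustrative:

```typescript
// Sketch of the three-dimensional gate decision. Inputs are ratios
// in [0, 1] plus acceptance-criteria counts; the coverage+mutation
// average against 0.80 mirrors the article's gate, and compliance
// requires every AC to have a linked test. The combination rule
// itself is an illustrative assumption.
interface QualityInputs {
  lineCoverage: number;  // "was this code executed during tests?"
  mutationScore: number; // "would this test catch a bug in this code?"
  acsCovered: number;    // acceptance criteria with a linked test
  acsTotal: number;      // "are we testing the right features?"
}

function gateVerdict(q: QualityInputs, scoreThreshold = 0.80): string[] {
  const failures: string[] = [];
  if ((q.lineCoverage + q.mutationScore) / 2 < scoreThreshold) {
    failures.push("test-quality-score");
  }
  if (q.acsCovered < q.acsTotal) {
    failures.push("compliance");
  }
  return failures; // empty array => gate passes
}
```

Gaming one dimension shows up in another: perfect line coverage with weak assertions still fails on the averaged score, because the mutation term drags it down.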
The Compound Effect: 57 Projects
On a single project, this is useful. Across a 57-project monorepo, it's transformative.
Each project in FrenchExDev has its own quality-gate.yml. Each has its own thresholds, calibrated to the project's maturity. A new library starts at 60%. A stable library sits at 95%. Claude reads the project structure, finds the existing test patterns, and adapts.
me: Run quality gates across all Injectable projects.
claude: [runs dotnet quality-gate check for each project]
Results:
┌──────────────────────────────────────┬──────────┬───────────┐
│ Project │ Score │ Status │
├──────────────────────────────────────┼──────────┼───────────┤
│ Injectable.Lib │ 0.91 │ ✓ PASS │
│ Injectable.Abstractions │ 0.88 │ ✓ PASS │
│ Injectable.SourceGenerator.Lib │ 0.74 │ ✗ FAIL │
│ Injectable.Microsoft.SourceGenerator │ 0.68 │ ✗ FAIL │
│ Injectable.Tests │ N/A │ test proj │
└──────────────────────────────────────┴──────────┴───────────┘
2 projects below threshold (0.80):
- SourceGenerator.Lib: 0.74 (gap: 0.06)
- Microsoft.SourceGenerator: 0.68 (gap: 0.12)
me: Fix SourceGenerator.Lib first. Smaller gap.
claude: Reading SourceGenerator.Lib source and test files...
Found 23 existing tests in InjectableCollectorTests.cs
and EmitterTests.cs. Pattern: in-memory compilation
with SyntaxFactory, assertions on generated source text.
[iteration 1: writes 8 tests for uncovered branches]
Score: 0.78. Still 2 surviving mutants.
[iteration 2: writes boundary tests for mutant at line 112]
Score: 0.81. Gate passes.
Moving to Microsoft.SourceGenerator...
Reading source and test files...
Found 15 existing tests. Same compilation pattern,
plus Microsoft DI container assertions.
[iteration 1: writes 11 tests]
Score: 0.75.
[iteration 2: writes 6 more tests targeting nullable handling]
Score: 0.79.
[iteration 3: writes property-based test for generic resolution]
Score: 0.83. Gate passes.
me: Good. Review the property-based test before I commit.
Five iterations total across two projects. Claude read the patterns from existing tests, wrote new ones that matched, and iterated until both gates passed. My contribution: two sentences of direction and a final review.
This is what it looks like to expand what a single developer can realistically build and maintain. Not replacing judgment — compressing the distance between deciding what quality means and achieving it.
What Claude Gets Right (and Wrong)
Intellectual honesty demands this section. The Loop isn't perfect.
What Claude Gets Right
- Pattern replication. Give Claude 5 tests as examples and it will write 50 more in the same style. Test patterns are highly regular, which plays to AI's strength.
- Coverage grinding. Claude will patiently write the 47th test for the 47th branch without complaining. Humans check out mentally around test #12.
- Report reading. Claude parses JSON coverage reports and Stryker mutation reports accurately. It maps line numbers to source code, identifies the specific branch or mutant, and targets the gap.
- Boundary detection. Once Claude reads a mutation report showing `> 0` mutated to `>= 0`, it reliably writes boundary tests at 0, -1, 1, `int.MaxValue`. The mutation report teaches it what matters.
What Claude Gets Wrong
- Semantic assertions. Claude will assert that a function returns "something" but may not assert the right "something." At low thresholds (60-75%), this is fine — you're building coverage mass. At high thresholds (90%+), you need to review assertions for business meaning.
- Over-mocking. If existing tests use mocks, Claude will use mocks everywhere — even when an integration test would be more valuable. The human needs to set the pattern correctly.
- Test naming. Claude writes descriptive but sometimes redundant test names. I regularly rename tests during review. This is cosmetic, not structural.
- Complex state setup. For state machines with 15+ states and complex transition guards, Claude sometimes writes tests that achieve coverage through unrealistic state combinations. The quality gate catches this indirectly (surviving mutants), but human review catches it faster.
The pattern is clear: Claude handles structural quality (coverage, mutation killing) well. Semantic quality (does this test make business sense?) requires human review. The Loop doesn't eliminate the human — it changes what the human does.
What You Need to Run The Loop
1. A Machine-Readable Quality Gate
For .NET: FrenchExDev.Net.QualityGate with report.json output. Or any tool that produces Cobertura XML + Stryker JSON.
For JavaScript/TypeScript: Vitest with json-summary reporter and coverage thresholds in vitest.config.js. Or Jest with --coverageReporters=json-summary and coverageThreshold in jest.config.js.
For other stacks: Any tool that (a) produces machine-readable output and (b) returns exit code 1 on failure.
The key requirement: the report must be granular enough for Claude to identify specific uncovered lines and branches. A summary that says "coverage is 72%" is useless. A report that says "line 47 of scroll-spy.ts: uncovered branch in else clause" is actionable.
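That granularity is exactly what istanbul's detailed report (coverage-final.json, as emitted by Vitest and nyc) provides: a branchMap with source locations and a parallel array of hit counts. A sketch of extracting the "line 47: uncovered branch" list from it:

```typescript
// Sketch: walk istanbul's coverage-final.json for one file and list
// branches with a zero hit count, with their line numbers — the
// actionable granularity The Loop needs. Field names follow the
// istanbul format (branchMap/b); types are simplified here.
type Loc = { start: { line: number; column: number }; end: { line: number; column: number } };
type FileCoverage = {
  branchMap: Record<string, { type: string; locations: Loc[] }>;
  b: Record<string, number[]>; // hit count per branch location
};

function uncoveredBranches(cov: FileCoverage): { line: number; type: string }[] {
  const out: { line: number; type: string }[] = [];
  for (const [id, branch] of Object.entries(cov.branchMap)) {
    const hits = cov.b[id] ?? [];
    hits.forEach((count, i) => {
      if (count === 0 && branch.locations[i]) {
        out.push({ line: branch.locations[i].start.line, type: branch.type });
      }
    });
  }
  return out;
}
```

Each entry in the result maps directly to a missing test: an agent reads the source at that line, identifies the unexercised side of the branch, and writes a case for it.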
2. Claude Code with Project Access
Not a chatbot. Not copy-paste from a browser. Claude Code — the CLI agent that reads files, runs commands, writes code, and iterates. The Loop requires an agent that can:
- Run test commands (dotnet test, npx vitest run)
- Read coverage report files (JSON, XML)
- Read source code to understand what needs testing
- Write test files following existing patterns
- Re-run tests to verify
All of this happens inside the project. Claude sees the same files you do.
3. Existing Test Patterns
Claude needs patterns to follow. If you have zero tests, write the first 10 yourself. Establish:
- How test files are organized (one per module? one per feature?)
- What assertion style you use (xUnit? NUnit? Vitest's expect?)
- How dependencies are handled (mocks? fakes? DI?)
- How test data is created (factories? builders? literals?)
Claude will replicate whatever you give it. The quality of The Loop's output is directly proportional to the quality of the patterns you seed it with.
4. Start Low, Ratchet Up
Don't set 95% on day one. If your current coverage is 52%, set the threshold at 60%. Run The Loop. Review the tests. If they're good, ratchet to 70%. Repeat.
Each ratchet step is a conversation:
me: Coverage is at 77%. I'm raising the threshold to 85%.
The failing gate should be in the state machine modules.
Get us there.

claude: [runs tests, reads report, identifies gaps, iterates]

The Loop is most effective when the gap between current and target is 5-15 percentage points. Larger gaps work but produce more tests to review in a single session.
5. Always Review
The Loop is not "fire and forget." The human reviews every batch of tests Claude writes. At low thresholds, the review is quick — "yes, these cover the basic paths." At high thresholds, the review is more careful — "this assertion checks the return type but not the value; rewrite it."
The review is also where you catch structural improvements that Claude won't suggest: "these 6 tests should be a property-based test instead" or "this mock should be a fake."
The Philosophy: Quality as a System Property
This article is really about one idea from Don't Put the Burden on Developers:
Every recurring failure is a structural gap, not a discipline problem.
Low test coverage is not a discipline problem. Developers don't lack the skill to write tests. They lack the time, the motivation, and — honestly — the patience to grind through the last 30% of coverage on a Thursday afternoon when there are features to ship.
The Loop is a structural solution to this. The quality gate defines the floor. Claude does the grinding. The human controls the system — setting thresholds, reviewing output, ratcheting upward.
The AI is not the author. The quality gate is not the judge. Together, they are a feedback loop. And the human — the architect — decides when the loop runs, what it targets, and when to raise the bar.
The future of test writing is not "AI writes all tests." It's this:
The human decides what quality means. The gate measures it. The AI grinds toward it. The human reviews, approves, and ratchets upward. Repeat.
That's The Loop.
Further Reading
- QualityGate: Roslyn-Powered Static Analysis and Quality Metrics for .NET — the .NET quality gate tool that powers the left side of The Loop
- Quality to Its Finest: Testing a Terminal-Styled CV Website — the 8-layer testing architecture for this website, including coverage gates and compliance scanning
- Hardening the Test Pipeline — how smoke filtering, auto-baselines, and pre-push hooks make the test suite sustainable
- Don't Put the Burden on Developers — the philosophy that quality is a structural problem, not a discipline problem
- Building FrenchExDev with Claude AI — how I use Claude Code as a pair programmer across 57 projects
- The Journey: Building This Site with Claude — the complete story of human-AI collaboration that built this website
- Onboarding Typed Specifications — how typed feature specifications create the third quality dimension: compliance
- Handoff — Closing the Requirements → Code → Tests → Proof Loop — the AST scanner, transitive walker, and compliance report that prove every requirement is implemented and tested
- Requirements ARE Types — the .NET side: a Requirements DSL with Roslyn source generators, typed references, and build-time compliance validation