Validation: Quality Gates vs Roslyn Analyzers
Validation is the moment of truth. Everything before — requirements, specifications, implementations, tests — leads to this question: is the system correct? Both approaches answer this question, but they answer it at different times, in different ways, with different consequences.
When You Find Out
The single most important difference in validation is timing:
Spec-Driven Timeline:
Developer     AI Agent     Code        Tests       CI/CD      Quality
writes PRD →  reads PRD →  generates → run tests → pipeline → gate
                                                                ↑
                                                               HERE
                                                           you find out
Typed Specification Timeline:
Developer    Compiler       Developer   Compiler       Tests   CI/CD
adds AC   →  fires REQ101 → adds spec → fires CS0535 → pass →  pipeline
                 ↑                          ↑
                HERE                       HERE
            you find out               you find out
With spec-driven, validation happens at the end of the pipeline — after code is generated, after tests are written, after CI runs. The feedback loop is: write → generate → test → deploy → check → fix → repeat.
With typed specifications, validation happens continuously — every time the compiler runs. The feedback loop is: change type → compiler fires → fix → compiler happy. There is no "check at the end." The compiler checks at every step.
Why Timing Matters
A bug found at compile time costs seconds to fix. A bug found at the CI quality gate costs minutes. A bug found in production costs hours or days. The same defect, found at different times, has radically different costs.
The spec-driven quality gate catches defects after code generation and testing. This is late — the AI has already generated code, the developer has already reviewed it, the tests have already been written. If the quality gate fails, everything downstream must be redone.
The typed specification analyzer catches defects the moment the type changes. The developer hasn't written any code yet. The AI agent hasn't generated anything. The compiler simply says "this AC has no spec" — and the developer creates the spec before writing a single line of implementation.
This is the difference between prevention and detection. Typed specifications prevent defects by making them impossible to introduce. Spec-driven quality gates detect defects after they've been introduced.
The Spec-Driven Quality Gate System
The spec-driven framework defines quality gates at multiple pipeline stages:
Pre-Commit Gates
Quality Gate: Pre-Commit
Checks:
- Fast unit tests pass
- Linting rules pass
- No secrets in code
Failure action: Block commit
Commit Gates
Quality Gate: Commit
Checks:
- Comprehensive unit tests pass
- Integration tests pass
- Code coverage > 80% (line), > 75% (branch), > 90% (function)
- Coding practices validated (SOLID, DRY)
Failure action: Block merge
Pre-Deployment Gates
Quality Gate: Pre-Deployment
Checks:
- E2E tests pass
- Security scan clean
- Performance tests within budget (p95 < 2s)
- Mutation score > 80%
Failure action: Block deployment
Post-Deployment Gates
Quality Gate: Post-Deployment
Checks:
- Smoke tests pass
- Error rate < 1%
- Response time within SLA
Failure action: Rollback
Strengths
Progressive validation. Each stage catches different defect types. Fast feedback for simple issues (lint), deeper feedback for complex issues (mutation testing), production validation for operational issues (error rate).
Multi-dimensional. Quality is measured across multiple dimensions: correctness (tests), completeness (coverage), robustness (mutation), security (vulnerability scan), performance (load test), reliability (error rate).
Language-agnostic. The quality gate definitions work for any language. The tools change (pytest vs jest vs cargo test), but the gate structure is universal.
Industry-standard. This is how most CI/CD pipelines work. Teams adopting the spec-driven approach can use existing tooling (GitHub Actions, GitLab CI, Jenkins) without custom infrastructure.
Weaknesses
Post-hoc. Every gate fires after the defect is introduced. The pre-commit gate fires after the developer writes code. The commit gate fires after the developer pushes. The pre-deployment gate fires after the branch is ready for merge. At every stage, the fix requires going back.
Coarse granularity. "Code coverage > 80%" doesn't tell you which acceptance criteria are untested. "Tests pass" doesn't tell you which requirements are verified. The gates measure proxies, not the actual question ("is every requirement implemented and tested?").
Gameable. A developer can reach 80% line coverage by testing easy methods thoroughly while leaving hard methods untested. The gate passes, but the important code is unverified.
No structural link. The gate checks that tests exist and pass, but doesn't know which tests cover which requirements. A passing quality gate is a necessary condition for correctness, not a sufficient one.
The Roslyn Analyzer System
The typed approach defines four analyzer families that fire during compilation (for a concrete example of how these analyzer families extend to operational concerns, see Auto-Documentation from a Typed System, Part VII):
REQ1xx: Requirement Coverage
Scans the Requirements project for feature types with abstract AC methods. Scans the Specifications project for [ForRequirement] attributes. Reports features and ACs that have no specification.
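As a concrete sketch of the shapes this analyzer scans — assuming hypothetical base types (`Feature<T>`, `AuthEpic`) and domain types (`UserId`, `Role`) not defined in this section — the Requirements and Specifications projects might look like:

```csharp
// Requirements project: acceptance criteria as abstract methods on a feature type.
public abstract record UserRolesFeature : Feature<AuthEpic>
{
    public abstract void RoleChangeTakesEffectImmediately();
    public abstract void ViewerHasReadOnlyAccess();
    public abstract void AdminCanRevokeRoles();
}

// Specifications project: the interface REQ100 looks for, with the
// method-level attribute REQ101 looks for on each AC.
[ForRequirement(typeof(UserRolesFeature))]
public interface IUserRolesSpec
{
    [ForRequirement(typeof(UserRolesFeature),
        nameof(UserRolesFeature.RoleChangeTakesEffectImmediately))]
    void VerifyImmediateRoleEffect(UserId user, Role newRole);
}
```

With these shapes in place, the analyzer can report diagnostics like the following: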
error REQ100: UserRolesFeature has 3 acceptance criteria but no ISpec interface
references it via [ForRequirement(typeof(UserRolesFeature))]
error REQ101: UserRolesFeature.RoleChangeTakesEffectImmediately has no matching
spec method with [ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.RoleChangeTakesEffectImmediately))]
warning REQ102: AssignRoleStory has no specification (acceptable for small stories)
info REQ103: UserRolesFeature — all ACs fully specified ✓
REQ2xx: Specification Implementation
Scans the Specifications project for interfaces with [ForRequirement]. Scans the Domain project for implementing classes. Reports unimplemented specifications.
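A minimal implementation that would clear REQ200 through REQ202 might look like this (illustrative; `IUserRolesSpec`, `UserId`, and `Role` are assumed from the surrounding examples):

```csharp
// Domain project: class-level and method-level [ForRequirement] attributes
// satisfy the REQ2xx analyzer's cross-reference checks.
[ForRequirement(typeof(UserRolesFeature))]
public sealed class AuthorizationService : IUserRolesSpec
{
    [ForRequirement(typeof(UserRolesFeature),
        nameof(UserRolesFeature.RoleChangeTakesEffectImmediately))]
    public void VerifyImmediateRoleEffect(UserId user, Role newRole)
    {
        // Real logic would assign the role and assert it takes effect
        // with no caching delay.
    }
}
```

Without that class, the analyzer reports: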
error REQ200: IUserRolesSpec is not implemented by any class in MyApp.Domain
warning REQ201: AuthorizationService implements IUserRolesSpec but is missing
[ForRequirement(typeof(UserRolesFeature))] on the class
warning REQ202: AuthorizationService.VerifyImmediateRoleEffect implements
IUserRolesSpec but is missing method-level [ForRequirement]
info REQ203: IUserRolesSpec — fully implemented ✓
REQ3xx: Test Coverage
Scans the Tests project for [TestsFor] and [Verifies] attributes. Cross-references with feature types. Reports untested features and ACs.
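A test linked back to its feature might be sketched like this (assuming xUnit's `[Fact]`; any test framework works the same way):

```csharp
// Tests project: [TestsFor] links the class to the feature,
// [Verifies] links each test method to a specific AC by nameof.
[TestsFor(typeof(UserRolesFeature))]
public class UserRolesTests
{
    [Fact]
    [Verifies(typeof(UserRolesFeature),
        nameof(UserRolesFeature.ViewerHasReadOnlyAccess))]
    public void Viewer_has_read_only_access()
    {
        // Arrange, act, and assert against the implementation...
    }
}
```

When a feature or AC has no such linkage, the analyzer reports: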
error REQ300: JwtRefreshStory has 2 ACs but no test class with [TestsFor]
warning REQ301: UserRolesFeature.ViewerHasReadOnlyAccess has no test with [Verifies]
warning REQ302: OldTests.StaleTest references nameof(UserRolesFeature.DeletedAC)
which no longer exists
info REQ303: UserRolesFeature — all ACs fully tested ✓
REQ4xx: Quality Gates (Post-Test)
Integrates with MSBuild to validate test execution results:
error REQ400: UserRolesFeature test pass rate is 87% (minimum: 100%)
warning REQ401: PasswordResetFeature AC coverage is 75% (threshold: 80%)
warning REQ402: OrderProcessingTests.LargeOrderTest took 12.3s (budget: 5s)
info REQ403: UserRolesFeature — all quality gates passed ✓
Strengths
Pre-hoc. REQ1xx, REQ2xx, and REQ3xx fire during compilation — before any code runs. The developer sees the diagnostic in the IDE, in real time, as they type. The fix is immediate: add the missing spec, implement the method, write the test.
Specific. Each diagnostic names the exact feature, the exact AC, and the exact action needed. Not "coverage is low" but "this specific method on this specific feature has no test."
Ungameable. You can't satisfy REQ301 by testing other ACs. The diagnostic is per-AC. Either ViewerHasReadOnlyAccess has a [Verifies] test, or it doesn't. No amount of testing other methods makes this diagnostic go away.
Configurable severity. Each diagnostic can be configured via .editorconfig:
# Strict mode: all diagnostics are errors
dotnet_diagnostic.REQ100.severity = error
dotnet_diagnostic.REQ101.severity = error
dotnet_diagnostic.REQ200.severity = error
dotnet_diagnostic.REQ300.severity = error
dotnet_diagnostic.REQ301.severity = error
# Relaxed mode: some diagnostics are warnings
dotnet_diagnostic.REQ102.severity = suggestion
dotnet_diagnostic.REQ202.severity = suggestion
IDE integration. Diagnostics appear as squiggly underlines in the IDE. Hover shows the message. Ctrl+. offers code fixes. The experience is identical to built-in C# diagnostics — no separate tool to run.
Weaknesses
C# only. Roslyn analyzers are a .NET technology. If your codebase includes Python microservices, Go workers, or React frontends, those components can't benefit from REQ1xx-REQ4xx.
Requirement coverage only. The analyzers track requirement-to-test linkage, but they don't measure line coverage, branch coverage, mutation scores, or performance. You still need traditional coverage tools for those dimensions.
Setup cost. Writing Roslyn analyzers is non-trivial. Each analyzer family requires understanding the Roslyn API, syntax trees, semantic models, and diagnostic reporting. This is a significant investment.
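To give a sense of the shape of that investment, here is a minimal skeleton of a REQ-style analyzer using the real Roslyn APIs; the cross-referencing logic itself is elided, and the descriptor wording is illustrative:

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class RequirementCoverageAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Req101 = new(
        id: "REQ101",
        title: "Acceptance criterion has no specification",
        messageFormat: "{0}.{1} has no matching spec method",
        category: "Requirements",
        defaultSeverity: DiagnosticSeverity.Error,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Req101);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSymbolAction(ctx =>
        {
            var type = (INamedTypeSymbol)ctx.Symbol;
            // Real logic: find abstract AC methods on feature types,
            // look up [ForRequirement] spec methods, and for each gap call
            // ctx.ReportDiagnostic(Diagnostic.Create(Req101, location, args));
        }, SymbolKind.NamedType);
    }
}
```

Even this skeleton shows why the setup cost is real: symbol actions, descriptors, and cross-project lookups all have to be written and maintained.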
False confidence — solvable. A [Verifies] test that passes but tests the wrong thing satisfies the basic REQ3xx analyzer. But this weakness has a three-layer solution: (1) make ACs executable static methods that tests must call directly, (2) add a REQ305 analyzer that checks the test body invokes the referenced AC method, (3) use mutation testing via [MutationTarget] to verify the test actually kills mutants in the AC. See Part VIII: The AI Agent Experience for the full treatment. The key insight: the AC is not just a name — it's an executable method. The test must call it. The analyzer verifies the call. Mutation testing verifies the assertion.
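Layer (1) can be sketched as follows, assuming hypothetical types (`IAuthorizationService`, `UserId`, `TestFixtures`) and xUnit-style `Assert` calls:

```csharp
// The AC is an executable static method, not just a name. A REQ305-style
// analyzer (layer 2) would verify the test body actually calls it.
public static class UserRolesAcceptance
{
    [MutationTarget] // layer 3: mutation testing targets this method's asserts
    public static void ViewerHasReadOnlyAccess(
        IAuthorizationService auth, UserId viewer)
    {
        Assert.True(auth.CanRead(viewer));
        Assert.False(auth.CanWrite(viewer));
    }
}

// The test satisfies REQ301 only by invoking the AC method directly.
[Verifies(typeof(UserRolesFeature),
    nameof(UserRolesFeature.ViewerHasReadOnlyAccess))]
public void Viewer_has_read_only_access()
{
    var (auth, viewer) = TestFixtures.ViewerSetup(); // hypothetical fixture
    UserRolesAcceptance.ViewerHasReadOnlyAccess(auth, viewer);
}
```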
The Validation Timeline Compared
Let's trace a defect through both systems:
Defect: Missing implementation for an acceptance criterion
Spec-driven:
Day 1: PRD updated with new AC
Day 2: AI agent reads PRD, generates code — misses the new AC
Day 3: Developer reviews generated code — doesn't notice the gap
Day 4: Tests written for existing ACs — new AC not tested
Day 5: CI pipeline runs — all tests pass, coverage at 82%
Day 5: Quality gate passes — nothing flags the missing AC
Day 15: QA manually tests the feature — discovers the gap
Day 16: Developer implements the missing AC
Day 17: New tests written and merged
Total time from defect introduction to fix: 15 days.
Typed specifications:
Minute 0: Developer adds new AC to feature record
Minute 0: Compiler fires REQ101 — "no spec method for this AC"
Minute 5: Developer adds spec method to interface
Minute 5: Compiler fires CS0535 — "class doesn't implement interface method"
Minute 15: Developer implements the method
Minute 15: Compiler fires REQ301 — "no test for this AC"
Minute 30: Developer writes test
Minute 30: Build succeeds — all diagnostics clear
Total time from defect introduction to fix: 30 minutes.
The difference is not incremental. It's categorical. The defect never existed in the typed approach — the compiler prevented it from being introduced.
Can They Coexist?
Yes. And arguably they should.
The spec-driven quality gates cover dimensions that Roslyn analyzers don't: line coverage, mutation testing, load testing, security scanning, performance budgets. The Roslyn analyzers cover the dimension that quality gates don't: per-AC requirement coverage.
A combined validation pipeline:
Stage 1: Compile-Time (Roslyn Analyzers)
├── REQ1xx: Every requirement has a specification
├── REQ2xx: Every specification has an implementation
└── REQ3xx: Every AC has a test
Stage 2: Test Execution
├── All tests pass
└── REQ4xx: Pass rate and duration budgets
Stage 3: Post-Test Quality Gates (Spec-Driven)
├── Line coverage > 80%
├── Branch coverage > 75%
├── Mutation score > 80%
├── Security scan clean
└── Performance within SLA
Stage 4: Pre-Deployment
├── E2E tests pass
└── Load test within budget
This gives you the best of both: compile-time requirement enforcement AND post-test multi-dimensional quality measurement. The analyzers prevent structural defects (missing implementations, missing tests). The quality gates catch qualitative defects (weak tests, poor coverage, slow performance).
The Severity Configuration Question
Both approaches allow configuring validation strictness. The spec-driven approach does it through threshold values:
coverage_threshold: 80 # Adjustable per project
mutation_score_minimum: 80 # Adjustable per team
flakiness_rate_maximum: 0.01
The typed approach does it through .editorconfig severity levels:
# Per-project severity configuration
[*.cs]
dotnet_diagnostic.REQ100.severity = error # Missing spec: fail build
dotnet_diagnostic.REQ301.severity = warning # Missing test: warn only
dotnet_diagnostic.REQ102.severity = suggestion # Small story no spec: hint
The key difference: spec-driven configures thresholds (continuous values), typed specifications configure severities (discrete levels). A threshold says "80% is enough." A severity says "this diagnostic is an error / warning / suggestion / none." There's no "80% of ACs need specs" in the typed approach — either an AC has a spec (diagnostic clear) or it doesn't (diagnostic fires).
This reflects a deeper philosophical difference. Spec-driven says: "some imperfection is acceptable — 80% coverage is fine." Typed specifications say: "every individual gap is reported — you decide which ones to fix by configuring severity."
The Auto-Fix Difference
The spec-driven Testing-as-Code specification defines "Auto Fix Options" for each practice:
DEFINE_PRACTICE(test_organization)
Violation Patterns:
- unclear_test_names
- mixed_responsibilities
- poor_grouping
Auto Fix Options:
- RenameTests
- GroupByFunction
- ClarifyPurpose
These are aspirational: the spec describes what an auto-fix would do, but the framework doesn't implement them. They're suggestions for tooling that could be built.
The typed approach provides Roslyn code fixes — actual IDE integration that modifies code:
// Analyzer detects: REQ301 on UserRolesFeature.AdminCanRevokeRoles
// Code fix offers: "Generate test stub for AdminCanRevokeRoles"
// Clicking the code fix inserts:
[Verifies(typeof(UserRolesFeature), nameof(UserRolesFeature.AdminCanRevokeRoles))]
public void Admin_can_revoke_roles()
{
// TODO: implement test
throw new NotImplementedException();
}
The code fix is real, executable, and integrated into the IDE. It's not a suggestion in a document — it's a button the developer clicks. This reduces the friction between "diagnostic fires" and "fix applied" to a single keystroke.
Summary
| Dimension | Spec-Driven Quality Gates | Roslyn Analyzers |
|---|---|---|
| Timing | Post-hoc (after code generation, after tests) | Pre-hoc (during compilation) |
| Granularity | Threshold-based (80% coverage) | Per-diagnostic (specific AC, specific feature) |
| Gameability | Gameable (test easy methods to hit threshold) | Not gameable (per-AC enforcement) |
| Dimensions | Multi (line, branch, mutation, security, perf) | Single (requirement coverage) |
| Language | Any (language-agnostic gates) | C# only (Roslyn) |
| Tooling | Standard CI/CD | Custom analyzers + code fixes |
| Auto-fix | Aspirational (described, not implemented) | Real (IDE code fix actions) |
| Configuration | Thresholds (continuous) | Severities (discrete) |
| Best for | Multi-dimensional quality measurement | Requirement-to-test enforcement |
| Cost | Low (standard tooling) | High (custom analyzer development) |
Part VII examines how each approach handles documentation — another domain where the philosophical gap produces visible consequences.
Compilation as Continuous Integration
Here's a provocative framing: in a typed specification system, every keystroke is a CI run.
The traditional CI model works like this: a developer writes code locally, pushes to a remote branch, CI picks up the change, runs builds, runs tests, runs quality gates, and reports back in 5-30 minutes. The developer context-switches to another task. When CI fails, they must reload the mental context of the original change.
The Roslyn analyzer model works differently. The compiler runs continuously in the IDE. Every time you type a character, the analyzer re-evaluates. Diagnostics appear in real time — red squiggles, warning icons, info messages. There is no push, no wait, no context switch.
The Timeline Comparison
Consider a developer adding a new acceptance criterion to OrderProcessingFeature:
Spec-Driven Timeline (with CI quality gates):
0:00 Developer adds AC text to PRD document
0:02 Developer saves file
0:03 Developer starts writing implementation code
0:25 Developer writes tests
0:35 Developer pushes to branch
0:36 CI pipeline starts
0:37 ├── Checkout + restore packages (30s)
0:38 ├── Build (45s)
0:39 ├── Unit tests (60s)
0:40 ├── Integration tests (120s)
0:42 ├── Coverage analysis (30s)
0:43 ├── Quality gate evaluation (15s)
0:43 CI reports: "Coverage dropped to 74% (threshold: 80%)"
└── But WHICH AC is untested? Unknown. The gate says "coverage is low."
0:43 Developer must figure out what's missing
0:50 Developer adds missing test
0:55 Developer pushes again
1:00 CI passes
───────────────────────────────────────────
Total feedback cycle: ~60 minutes
Number of context switches: 2 (push → wait → return)
Specificity of feedback: Low ("coverage is 74%")
Typed Specification Timeline (with Roslyn analyzers):
0:00 Developer adds abstract AC method to feature record
0:00 IDE shows: REQ101 — "no spec method for OrderCanBeCancelledByCustomer"
↑ instant feedback, in the same editor, at the cursor position
0:03 Developer adds spec method to interface
0:03 IDE shows: CS0535 — "OrderService does not implement ICancelSpec.CancelOrder"
0:08 Developer implements the method
0:08 IDE shows: REQ301 — "no test with [Verifies] for OrderCanBeCancelledByCustomer"
0:15 Developer writes test
0:15 IDE shows: all diagnostics clear ✓
0:16 Developer pushes (CI runs for additional validation: mutation, load, security)
───────────────────────────────────────────
Total feedback cycle: ~16 minutes
Number of context switches: 0 (never left the IDE)
Specificity of feedback: Maximum (exact AC, exact action needed)
The time difference (60 minutes vs 16 minutes) is significant, but the real difference is cognitive load. The spec-driven developer must maintain a mental model of what they've done, push, wait, interpret a coarse report, map the report back to specific changes, and fix. The typed specification developer never leaves the editor — the compiler guides them through the chain step by step.
Validation Frequency
Validation events per hour:
Spec-Driven (CI quality gates):
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░█
0 1
(one validation per push, ~1-2 pushes per hour)
Typed Specifications (Roslyn analyzers):
█████████████████████████████████████████████████████
0 100 200 300 ~360+
(one validation per keystroke, ~6 per second while typing)
This isn't an incremental improvement in validation frequency. It's a categorical difference. The spec-driven approach validates 1-2 times per hour. The typed approach validates hundreds of times per hour. Every single one of those validations catches the same classes of defects — missing specs, missing implementations, missing tests, stale references.
The "Shift Left" Taken to Its Logical Conclusion
The software industry talks about "shift left" — catch defects earlier. CI was a shift left from manual QA. Pre-commit hooks were a shift left from CI. The Roslyn analyzer approach shifts all the way left: to the moment the developer types the code.
There is no earlier point at which validation can happen. The code doesn't exist before the developer types it. The instant it exists — even as an incomplete line in the editor — the analyzer evaluates it. This is the theoretical limit of shift-left.
Shift-Left Progression:
Manual QA ──→ CI Pipeline ──→ Pre-Commit Hook ──→ IDE Analyzer
(days) (minutes) (seconds) (milliseconds)
↑
You are here
(typed specs)
When CI Still Matters
This doesn't make CI obsolete. Roslyn analyzers check structural correctness — the requirement chain. CI quality gates check dimensions that analyzers cannot:
- Mutation testing — does the test actually verify the behavior, or does it pass trivially?
- Load testing — does the endpoint handle 1,000 concurrent requests?
- Security scanning — are there known vulnerabilities in dependencies?
- Integration testing — do the services actually communicate correctly?
- Performance budgets — does the p95 response time stay under 200ms?
The typed approach handles Stage 1 (structural correctness) in the IDE. CI handles Stages 2-4 (behavioral, performance, security). Both are necessary. But the structural defects — the ones that account for the majority of "why didn't we catch this sooner?" moments — are eliminated before the developer even saves the file.
Validation Depth: What Each Approach Can and Cannot Catch
Not all defects are created equal. Some are structural (wrong types, missing methods). Some are semantic (wrong logic, incorrect behavior). Some are operational (performance degradation, resource leaks). Each approach catches different defect types at different stages.
The Comprehensive Defect Matrix
| Defect Category | Specific Defect | Spec-Driven (When?) | Typed Specs (When?) |
|---|---|---|---|
| Structural | |||
| Missing implementation for AC | CI quality gate (post-test) | Compile time (REQ101) | |
| Missing test for AC | CI coverage check (post-test) | Compile time (REQ301) | |
| Stale test referencing deleted AC | Never (test still passes) | Compile time (nameof fails) | |
| Wrong method signature on spec | Never (no contract) | Compile time (CS0535) | |
| Feature with no parent epic | Never (no hierarchy check) | Compile time (generic constraint) | |
| Orphan implementation (no requirement) | Never (no link) | Traceability report (build time) | |
| Inconsistent cross-service types | Never (string-based IDs) | Compile time (shared value types) | |
| Semantic | |||
| Implementation logic is incorrect | Test failure (CI) | Test failure (CI) | |
| AC implemented but behavior is wrong | Test failure or QA | Test failure or QA | |
| Edge case not covered by tests | Mutation testing (CI) | Mutation testing (CI) | |
| Race condition in concurrent code | Load test or production | Load test or production | |
| Business rule interpreted incorrectly | QA or production | QA or production | |
| Performance | |||
| Endpoint exceeds latency budget | Load test (CI) | Load test (CI) | |
| Memory leak under sustained load | Load test (CI) | Load test (CI) | |
| N+1 query pattern | Code review or APM | Code review or APM | |
| Missing database index | Load test or production | Load test or production | |
| Connection pool exhaustion | Load test or production | Load test or production | |
| Security | |||
| SQL injection vulnerability | SAST scan (CI) | SAST scan (CI) | |
| Missing authorization check | Security test (CI) | Security test (CI) + analyzer* | |
| Exposed sensitive data in logs | Code review or SAST | Code review or SAST | |
| Dependency with known CVE | Dependency scan (CI) | Dependency scan (CI) | |
| Missing input validation | Test or penetration test | Test or penetration test | |
| Operational | |||
| Missing health check endpoint | Deployment test | Deployment test | |
| Incorrect connection string | Integration test (CI) | Integration test (CI) | |
| Missing retry policy on HTTP calls | Code review | Analyzer* (if custom rule exists) | |
| Log level too verbose for production | Code review | Analyzer* (if custom rule exists) | |
| Missing circuit breaker | Code review | Code review |
*Asterisked items indicate that custom Roslyn analyzers CAN catch these defects, but they require additional investment beyond the Requirements DSL. The spec-driven approach describes these checks in documents; the typed approach can implement them as analyzers — but the implementation is additional work.
Reading the Matrix
Three patterns emerge:
Pattern 1: Structural defects are the typed approach's stronghold. Every structural defect in the table is caught at compile time by the typed approach and caught late (or never) by the spec-driven approach. This is the core value proposition of typed specifications: the defect category that causes the most rework is eliminated at the earliest possible moment.
Pattern 2: Semantic, performance, and security defects are caught identically. Both approaches rely on the same tools (tests, load tests, SAST scanners, code review) for non-structural defects. The typed approach doesn't help you find N+1 queries or SQL injection — those are runtime concerns that no type system catches.
Pattern 3: The spec-driven approach describes all defect categories; the typed approach catches one category. The Testing-as-Code specification has sections on mutation testing, chaos engineering, fuzz testing, and security scanning. The typed approach enforces requirement coverage. For a team that needs guidance on WHAT to test, the spec-driven approach is more helpful. For a team that needs enforcement on WHETHER they tested, the typed approach is more helpful.
The "Never" Column
The most striking entries in the matrix are the ones where the spec-driven approach says "Never": stale tests, orphan implementations, inconsistent cross-service types. These are defects that the spec-driven approach cannot catch because the connection between documents and code is not structural. No CI quality gate can detect that a test references a deleted AC — because the quality gate doesn't know which tests correspond to which ACs. It knows coverage percentages, not coverage targets.
The typed approach catches these at compile time — not because of some clever trick, but because nameof(DeletedAC) is a compile error when DeletedAC no longer exists. The defect literally cannot be introduced. It's not caught — it's prevented.
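To make that concrete, here is a minimal sketch of the mechanism. The `[Verifies]` attribute and the feature/AC shapes are illustrative reconstructions of the article's DSL, not a published API:

```csharp
using System;

// Illustrative reconstruction of the Requirements DSL's [Verifies] attribute.
[AttributeUsage(AttributeTargets.Method)]
public sealed class VerifiesAttribute : Attribute
{
    public VerifiesAttribute(string acceptanceCriterion) { }
}

// Acceptance criteria modeled as members of the feature type.
public static class PasswordResetFeature
{
    public const string ResetLinkExpiresAfterOneHour =
        nameof(ResetLinkExpiresAfterOneHour);
    // Suppose ResetLinkCanOnlyBeUsedOnce was deleted in a requirements change.
}

public class PasswordResetTests
{
    // Compiles: the AC member exists, so nameof resolves.
    [Verifies(nameof(PasswordResetFeature.ResetLinkExpiresAfterOneHour))]
    public void Reset_link_expires_after_one_hour() { /* ... */ }

    // Would NOT compile against the deleted AC — the compiler reports CS0117
    // ('PasswordResetFeature' does not contain a definition for
    // 'ResetLinkCanOnlyBeUsedOnce'), so a stale test cannot exist:
    // [Verifies(nameof(PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce))]
    // public void Reset_link_single_use() { }
}
```

The prevention is structural: the reference to the AC is a symbol, not a string, so deleting the AC invalidates every reference to it in the same compilation.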
The Pragmatic View
A team adopting the typed approach should not abandon CI quality gates. The correct strategy is layered:
Layer 1: IDE (Roslyn Analyzers) → Structural defects → Milliseconds
Layer 2: Pre-commit (fast tests) → Basic semantic defects → Seconds
Layer 3: CI (full test suite) → Semantic + edge cases → Minutes
Layer 4: CI (mutation testing) → Test quality → Minutes
Layer 5: CI (security scanning) → Security defects → Minutes
Layer 6: CI (load testing) → Performance defects → Minutes
Layer 7: Staging (E2E + smoke) → Integration defects → Hours
Layer 8: Production (monitoring) → Operational defects → Ongoing
The typed approach replaces Layer 1 with something radically better than what existed before (nothing, or linting). It doesn't replace Layers 2-8. The spec-driven approach describes all eight layers in documents — valuable guidance. But describing a layer and implementing a layer are different things. The typed approach implements Layer 1; the spec-driven approach describes all layers but implements none.
But here's the trajectory: every layer can become a typed DSL. Layer 4 (mutation testing) can be a [MutationTarget] attribute. Layer 5 (security scanning) can be a [SecurityPolicy] attribute. Layer 6 (load testing) can be a [LoadTest] attribute. Layer 7 (E2E) can be a [UserJourney] attribute. Layer 8 (monitoring) can be an [Alert] attribute.
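Declaring such a DSL entry point is cheap — an attribute class is a few lines. A sketch of what two of these hypothetical attributes could look like (none of them exist today; the names follow the article's examples, and the knobs and defaults are assumptions):

```csharp
using System;

// Hypothetical Layer 4 DSL: mark a test class's mutation target.
[AttributeUsage(AttributeTargets.Class)]
public sealed class MutationTargetAttribute : Attribute
{
    public Type Feature { get; }
    public int MinimumMutationScore { get; set; } = 80;  // assumed knob
    public MutationTargetAttribute(Type feature) => Feature = feature;
}

// Hypothetical Layer 6 DSL: declare a load-test scenario.
[AttributeUsage(AttributeTargets.Method)]
public sealed class LoadTestAttribute : Attribute
{
    public int VirtualUsers { get; set; } = 100;    // assumed knob
    public int DurationSeconds { get; set; } = 60;  // assumed knob
}
```

The attribute alone enforces nothing; the teeth come from the matching source generator and analyzer family built on the same M3 framework.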
When fully realized, the eight layers look like this:
Layer 1: IDE (Roslyn Analyzers) → [ForRequirement] + REQ1xx-REQ4xx → Typed ✓
Layer 2: Pre-commit (fast tests) → [Verifies] + [TestsFor] → Typed ✓
Layer 3: CI (full test suite) → [Verifies] + test runner → Typed ✓
Layer 4: CI (mutation testing) → [MutationTarget(typeof(Feature))] → Typed ✓
Layer 5: CI (security scanning) → [SecurityPolicy(typeof(Endpoint))] → Typed ✓
Layer 6: CI (load testing) → [LoadTest] + [PerformanceBudget] → Typed ✓
Layer 7: Staging (E2E + smoke) → [UserJourney(typeof(Feature))] → Typed ✓
Layer 8: Production (monitoring) → [Alert] + [HealthCheck] → Typed ✓
The spec-driven approach describes all eight layers in text. The typed approach can implement all eight layers in the compiler. Not today — not all these DSLs exist yet. But the architecture supports it. Each DSL is a [MetaConcept] that self-registers in the M3 metamodel. The framework is extensible by design.
This is the deeper argument: the typed approach is not limited to the requirement chain. The requirement chain is where it started, but the M3 meta-metamodel is a general-purpose framework for turning any domain concern into a compiler-enforced DSL. Requirements, operations, testing, security, performance — they're all domains. They're all candidates for DSLs. They all benefit from the same properties: compile-time validation, source generation, IDE integration, drift prevention.
The spec-driven approach says "here are eight important concerns, described in documents." The typed approach says "here are eight important concerns, each expressible as a compiler-enforced DSL." One is a description. The other is a design principle.
The Enforcement Asymmetry
There's a fundamental asymmetry between the two approaches that becomes clear when you look at enforcement holistically:
Spec-driven enforcement is additive. Every new concern requires adding a new enforcement mechanism: a new CI check, a new quality gate, a new coverage tool, a new linting rule. The enforcement surface area grows linearly with the number of concerns. Each mechanism is independent — the coverage tool doesn't know about the security scanner, which doesn't know about the load test, which doesn't know about the mutation test.
Spec-driven enforcement:
Concern 1: Feature coverage → CI tool A (coverage.py/coverlet)
Concern 2: Code quality → CI tool B (sonarqube/codeclimate)
Concern 3: Security → CI tool C (snyk/semgrep)
Concern 4: Performance → CI tool D (k6/locust)
Concern 5: Mutation testing → CI tool E (stryker/mutmut)
Concern 6: Documentation → CI tool F (custom script)
Concern 7: Architecture → CI tool G (netarchtest/archunit)
Concern 8: Dependency health → CI tool H (dependabot/renovate)
Total: 8 independent tools, 8 configurations, 8 maintenance burdens
Cross-concern validation: None (tool A doesn't know about tool C)
Typed specification enforcement is structural. Every concern is a DSL — an attribute set processed by a source generator and validated by an analyzer. All DSLs share the M3 metamodel, the same five-stage pipeline, and the same IDE integration. Adding a new concern means adding one attribute library and one generator — not a new CI tool with its own configuration format, its own output format, and its own failure modes.
Typed specification enforcement:
Concern 1: Feature coverage → [ForRequirement] + REQ1xx analyzer
Concern 2: Code quality → [Invariant] + [Validated] + analyzers
Concern 3: Security → [SecurityPolicy] + SEC1xx analyzer
Concern 4: Performance → [PerformanceBudget] + PERF1xx analyzer
Concern 5: Mutation testing → [MutationTarget] + MUT1xx analyzer
Concern 6: Documentation → Type system IS documentation
Concern 7: Architecture → [Layer] + ARCH1xx analyzer
Concern 8: Dependency health → [Dependency] + DEP1xx analyzer
Total: 1 framework (M3 + Roslyn), N attribute sets, N analyzers
Cross-concern validation: Built-in (all concerns share the type system)
The cross-concern validation is the key. In the typed approach, a [PerformanceBudget] can reference a Feature via typeof(). A [SecurityPolicy] can reference an endpoint that's annotated with [ForRequirement]. A [LoadTest] can reference the same Feature that the [Verifies] tests reference. All concerns are connected through the type system.
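A sketch of that linking. Only the attribute names come from the article — the signatures, the feature type, and the budget values are illustrative:

```csharp
using System;

// Illustrative attribute shapes — names from the article, signatures assumed.
[AttributeUsage(AttributeTargets.Class)]
public sealed class ForRequirementAttribute : Attribute
{
    public ForRequirementAttribute(Type feature) { }
}

[AttributeUsage(AttributeTargets.Class)]
public sealed class PerformanceBudgetAttribute : Attribute
{
    public PerformanceBudgetAttribute(Type feature, string p95, string p99) { }
}

public sealed class OrderProcessingFeature { /* ACs live here */ }

// Implementation and performance budget reference the SAME type. Rename or
// delete OrderProcessingFeature and every typeof() below is a compile error,
// so the cross-references cannot silently drift apart.
[ForRequirement(typeof(OrderProcessingFeature))]
public class OrdersController { /* ... */ }

[PerformanceBudget(typeof(OrderProcessingFeature), p95: "200ms", p99: "500ms")]
public class OrderProcessingLoadTests { /* ... */ }
```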
In the spec-driven approach, the performance-targets document and the testing document and the security document are separate text files. The connection between "this endpoint needs < 200ms response time" and "this endpoint implements Feature X" and "this endpoint must pass security scan Y" exists only in the reader's head. No tool validates the cross-references.
The Cost Implication
This asymmetry has a cost implication at scale:
Enforcement cost as concerns grow:
Cost │ Spec-Driven (linear)
│ /
│ /
│ / ← Each new concern = new CI tool + config + maintenance
│ /
│ /
│ / Typed Specifications (sublinear)
│ / ──────────────────────
│ / ──/─── ← Each new concern = attribute set + generator
│ /──/ (reuses existing M3 framework)
│/─/
├──────────────────────────────────
2    4    6    8   10   12  concerns
The spec-driven approach pays a roughly linear cost per concern: each new CI tool has its own learning curve, configuration format, and maintenance burden. The typed approach pays a sublinear cost: the first DSL is expensive (build the M3 framework), but each subsequent DSL reuses the same infrastructure. By the 5th or 6th DSL, the marginal cost of adding a new typed concern is a fraction of adding a new CI tool.
This is why the typed approach's initial investment — which looks expensive on Day 1 — pays off over time. It's not just about the requirement chain. It's about building a platform for compiler-enforced concerns that scales sublinearly.
The Validation Unification
The spec-driven approach validates eight concerns with eight separate tools. Each tool has its own configuration format, its own output format, its own failure modes, and its own integration requirements. Count the moving parts in a typical CI pipeline:
Spec-Driven Validation Pipeline (8 tools, 8 configs, 8 outputs):
Tool 1: coverlet → coverage.cobertura.xml → config: coverlet.runsettings
Tool 2: sonarqube → sonar-report.json → config: sonar-project.properties
Tool 3: snyk → snyk-report.json → config: .snyk
Tool 4: k6 → k6-results.json → config: load-test.js
Tool 5: stryker → mutation-report.html → config: stryker-config.json
Tool 6: custom script → doc-coverage.txt → config: doc-check.yaml
Tool 7: netarchtest → arch-test-results.xml → config: ArchTests.cs
Tool 8: dependabot → dependabot-alerts.json → config: .github/dependabot.yml
Total configuration files: 8
Total output formats: 5 (XML, JSON, HTML, TXT, YAML)
Total dashboards needed: 8 (or 1 aggregator like SonarQube that partially unifies)
Total failure modes: 8 independent (tool 3 can fail while tools 1-2, 4-8 succeed)
Cross-tool validation: None
Each tool is an island. The coverage tool doesn't know which features are security-critical. The security scanner doesn't know which endpoints have performance budgets. The architecture enforcement tool doesn't know which layers contain features with compliance requirements. The mutation testing tool doesn't know which features have acceptance criteria.
The typed approach unifies all validation concerns into a single analyzer framework. Every concern is a Roslyn analyzer family with a consistent diagnostic ID scheme. (The Auto-Documentation from a Typed System series demonstrates this unification across five operational sub-DSLs, all sharing the same analyzer infrastructure.)
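For a sense of scale, one rule in such a family is a single Roslyn `DiagnosticAnalyzer`. This is a deliberately stripped-down sketch of SEC100 — a production rule would resolve attribute types by metadata name rather than by simple name, and the endpoint heuristic here is an assumption:

```csharp
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class SecurityPolicyAnalyzer : DiagnosticAnalyzer
{
    // SEC100: endpoint missing [Authorize] attribute.
    private static readonly DiagnosticDescriptor Sec100 = new(
        id: "SEC100",
        title: "Endpoint missing [Authorize] attribute",
        messageFormat: "Endpoint '{0}' is missing an [Authorize] attribute",
        category: "Security",
        defaultSeverity: DiagnosticSeverity.Error,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Sec100);

    public override void Initialize(AnalysisContext context)
    {
        context.EnableConcurrentExecution();
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.RegisterSymbolAction(AnalyzeMethod, SymbolKind.Method);
    }

    private static void AnalyzeMethod(SymbolAnalysisContext ctx)
    {
        var method = (IMethodSymbol)ctx.Symbol;

        // Simplified endpoint heuristic: public method on a *Controller type.
        if (method.MethodKind != MethodKind.Ordinary) return;
        if (method.DeclaredAccessibility != Accessibility.Public) return;
        if (!method.ContainingType.Name.EndsWith("Controller")) return;

        // [Authorize] may sit on the action or on the controller.
        bool authorized = method.GetAttributes()
            .Concat(method.ContainingType.GetAttributes())
            .Any(a => a.AttributeClass?.Name == "AuthorizeAttribute");

        if (!authorized)
            ctx.ReportDiagnostic(
                Diagnostic.Create(Sec100, method.Locations[0], method.Name));
    }
}
```

Every family in the taxonomy follows this same shape: descriptors, a registration, a callback — which is why adding a concern reuses the framework instead of adding a tool.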
The Unified Analyzer Taxonomy
REQ1xx: Requirement Coverage
REQ100 Feature has no specification interface Error
REQ101 AC has no matching spec method Error
REQ102 Story has no specification (small story) Warning
REQ103 Feature fully specified Info
REQ2xx: Specification Implementation
REQ200 Spec interface has no implementing class Error
REQ201 Implementation missing [ForRequirement] Warning
REQ202 Implementation missing method-level attribute Warning
REQ203 Spec fully implemented Info
REQ3xx: Test Coverage
REQ300 Feature has no [TestsFor] test class Error
REQ301 AC has no [Verifies] test Warning
REQ302 [Verifies] references nonexistent AC (stale) Warning
REQ303 Feature fully tested Info
REQ4xx: Quality Gates
REQ400 Feature test pass rate below minimum Error
REQ401 Feature AC coverage below threshold Warning
REQ402 Test duration exceeds budget Warning
REQ403 Feature passes all quality gates Info
PERF1xx: Performance Budgets
PERF100 Feature has [PerformanceTest] — configured Info
PERF101 Feature has no [PerformanceTest] Warning
PERF102 Endpoint has no matching performance budget Warning
PERF103 P95/P99 threshold parse error Error
PERF104 Performance budget references deleted feature Error
PERF105 Performance test without functional tests Warning
SEC1xx: Security Policies
SEC100 Endpoint missing [Authorize] attribute Error
SEC101 Controller missing CORS policy Warning
SEC102 Action accepts user input without validation Warning
SEC103 Sensitive data type used in API response Error
SEC104 Security policy references deleted endpoint Error
SEC105 All security policies satisfied Info
ARCH1xx: Architecture Constraints
ARCH100 Domain layer references infrastructure Error
ARCH101 API layer references domain directly Warning
ARCH102 Circular dependency between assemblies Error
ARCH103 Feature implementation in wrong layer Warning
ARCH104 Generated code manually modified Error
ARCH105 Architecture constraints satisfied Info
OPS1xx: Operational Readiness
OPS100 Service missing health check endpoint Error
OPS101 Service missing readiness probe Warning
OPS102 Alert definition references unknown metric Error
OPS103 Deployment strategy missing rollback condition Warning
OPS104 Chaos experiment targets deleted service Error
OPS105 All operational readiness checks passed Info
The Unified Configuration
All eight concern families are configured in a single .editorconfig file. No separate configuration files. No separate tools. No separate output formats:
# .editorconfig — ALL validation concerns, one file
[*.cs]
# ── Requirement Coverage ──────────────────────────────────
dotnet_diagnostic.REQ100.severity = error
dotnet_diagnostic.REQ101.severity = error
dotnet_diagnostic.REQ102.severity = suggestion
dotnet_diagnostic.REQ103.severity = none # suppress info for clean output
# ── Specification Implementation ──────────────────────────
dotnet_diagnostic.REQ200.severity = error
dotnet_diagnostic.REQ201.severity = warning
dotnet_diagnostic.REQ202.severity = suggestion
dotnet_diagnostic.REQ203.severity = none
# ── Test Coverage ─────────────────────────────────────────
dotnet_diagnostic.REQ300.severity = error
dotnet_diagnostic.REQ301.severity = warning # allow missing tests during dev
dotnet_diagnostic.REQ302.severity = error # stale tests are always errors
# ── Quality Gates ─────────────────────────────────────────
dotnet_diagnostic.REQ400.severity = error
dotnet_diagnostic.REQ401.severity = warning
dotnet_diagnostic.REQ402.severity = warning
# ── Performance Budgets ───────────────────────────────────
dotnet_diagnostic.PERF100.severity = none
dotnet_diagnostic.PERF101.severity = suggestion # not all features need perf tests
dotnet_diagnostic.PERF102.severity = warning
dotnet_diagnostic.PERF103.severity = error
dotnet_diagnostic.PERF104.severity = error
dotnet_diagnostic.PERF105.severity = warning # perf without functional = risky
# ── Security Policies ─────────────────────────────────────
dotnet_diagnostic.SEC100.severity = error # missing auth is always an error
dotnet_diagnostic.SEC101.severity = warning
dotnet_diagnostic.SEC102.severity = warning
dotnet_diagnostic.SEC103.severity = error # leaking PII is always an error
dotnet_diagnostic.SEC104.severity = error
# ── Architecture Constraints ──────────────────────────────
dotnet_diagnostic.ARCH100.severity = error # layer violations break architecture
dotnet_diagnostic.ARCH101.severity = warning
dotnet_diagnostic.ARCH102.severity = error # circular deps are always errors
dotnet_diagnostic.ARCH103.severity = warning
dotnet_diagnostic.ARCH104.severity = error # never hand-edit generated code
# ── Operational Readiness ─────────────────────────────────
dotnet_diagnostic.OPS100.severity = error # no health check = no deploy
dotnet_diagnostic.OPS101.severity = warning
dotnet_diagnostic.OPS102.severity = error
dotnet_diagnostic.OPS103.severity = warning
dotnet_diagnostic.OPS104.severity = error
# ── Per-project overrides ─────────────────────────────────
# In test projects, relax architecture rules
[*Tests/**/*.cs]
dotnet_diagnostic.ARCH100.severity = suggestion
dotnet_diagnostic.ARCH101.severity = none
# In prototype projects, relax everything to warnings
[*Prototype/**/*.cs]
dotnet_diagnostic.REQ100.severity = warning
dotnet_diagnostic.REQ300.severity = warning
dotnet_diagnostic.SEC100.severity = warning
A Single Build Output
With the unified analyzer framework, a single dotnet build reports ALL validation results:
$ dotnet build MyApp.sln
Build started...
# ── Requirement Coverage ──────────────────────────────────
error REQ100: JwtRefreshStory has 2 ACs but no specification interface
warning REQ102: AssignRoleStory has no specification (small story, acceptable)
info REQ103: OrderProcessingFeature — all ACs fully specified ✓
info REQ103: PasswordResetFeature — all ACs fully specified ✓
# ── Specification Implementation ──────────────────────────
error REQ200: IJwtRefreshSpec is not implemented by any class
info REQ203: IOrderProcessingSpec — fully implemented ✓
info REQ203: IPasswordResetSpec — fully implemented ✓
# ── Test Coverage ─────────────────────────────────────────
warning REQ301: UserRolesFeature.AdminCanRevokeRoles has no [Verifies] test
info REQ303: OrderProcessingFeature — all ACs fully tested ✓
info REQ303: PasswordResetFeature — all ACs fully tested ✓
# ── Performance Budgets ───────────────────────────────────
info PERF100: OrderProcessingFeature — P95=200ms, P99=500ms configured ✓
warning PERF105: PasswordResetFeature has [PerformanceTest] but AC
'ResetLinkCanOnlyBeUsedOnce' has no [Verifies] test
# ── Security Policies ─────────────────────────────────────
error SEC100: OrdersController.CancelOrder missing [Authorize] attribute
warning SEC102: OrdersController.CreateOrder accepts OrderDto without
[ValidateInput] attribute
info SEC105: PaymentController — all security policies satisfied ✓
# ── Architecture Constraints ──────────────────────────────
error ARCH100: MyApp.Domain.OrderService references
MyApp.Infrastructure.SqlOrderRepository directly
info ARCH105: MyApp.Api — architecture constraints satisfied ✓
# ── Operational Readiness ─────────────────────────────────
error OPS100: PaymentService has no /health endpoint
warning OPS103: CommerceDeployment has [DeploymentStrategy] but no
[RollbackCondition] defined
Build failed.
5 error(s): REQ100, REQ200, SEC100, ARCH100, OPS100
5 warning(s): REQ102, REQ301, PERF105, SEC102, OPS103
9 info(s): REQ103(x2), REQ203(x2), REQ303(x2), PERF100, SEC105, ARCH105
The Spec-Driven Equivalent
To get the same validation coverage with the spec-driven approach, you need:
# .github/workflows/quality.yml — 8 separate tools
name: Quality Gates
on: [push]
jobs:
coverage:
runs-on: ubuntu-latest
steps:
- run: dotnet test --collect:"XPlat Code Coverage"
- run: reportgenerator -reports:**/coverage.cobertura.xml
# Output: coverage.cobertura.xml → parse for threshold check
# Does NOT know which ACs are covered
security:
runs-on: ubuntu-latest
steps:
- run: snyk test --severity-threshold=high
# Output: snyk-report.json → separate format from coverage
# Does NOT know which features are security-critical
architecture:
runs-on: ubuntu-latest
steps:
- run: dotnet test --filter "Category=Architecture"
# Output: test-results.xml → yet another format
# Does NOT know which features violate constraints
performance:
runs-on: ubuntu-latest
steps:
- run: k6 run load-test.js
# Output: k6-results.json → yet another format
# Does NOT know which features have performance budgets
mutation:
runs-on: ubuntu-latest
steps:
- run: dotnet stryker
# Output: mutation-report.html → yet another format
# Does NOT know which features need mutation testing
# ... 3 more jobs for docs, dependencies, operational readiness
aggregate:
needs: [coverage, security, architecture, performance, mutation]
steps:
- run: python check_all_gates.py # Custom aggregation script
# Must parse 5+ different output formats
# Must correlate results across tools (which tool found which problem?)
# Must decide: does the combination of results pass or fail?
The comparison:
| Dimension | 8 CI Tools (Spec-Driven) | Unified Analyzers (Typed) |
|---|---|---|
| Configuration files | 8 (each tool's format) | 1 (.editorconfig) |
| Output formats | 5+ (XML, JSON, HTML, TXT, YAML) | 1 (MSBuild diagnostic format) |
| Cross-concern linking | None (tools are independent) | Built-in (all share type system) |
| Feature-level reporting | None (tool-level reporting) | Per-feature, per-AC |
| Aggregation | Custom script needed | Built into dotnet build |
| IDE integration | External (run CI, read report) | Native (squiggly underlines) |
| Timing | After push (minutes) | During typing (milliseconds) |
| New concern cost | New tool + config + parser | New analyzer family (same framework) |
| Dashboard | 8 separate or 1 aggregator | Build output + IDE |
| Failure correlation | Manual ("which tool found this?") | Automatic (diagnostic ID) |
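To make the aggregation cost concrete, here is a minimal sketch of what a script like check_all_gates.py ends up doing. Two of the five-plus formats are shown; the payloads, field names, and thresholds below are simplified assumptions, not the real tool outputs:

```python
import json
import xml.etree.ElementTree as ET

# Each tool emits its own format; the aggregator must speak all of them.
# These sample payloads stand in for coverage.cobertura.xml and
# snyk-report.json -- the field names are simplified assumptions.
COVERAGE_XML = '<coverage line-rate="0.83" branch-rate="0.71"></coverage>'
SECURITY_JSON = '{"vulnerabilities": [{"id": "SNYK-1", "severity": "high"}]}'

def check_coverage(xml_text, threshold=0.80):
    # Parse Cobertura-style XML and compare against a hardcoded threshold.
    rate = float(ET.fromstring(xml_text).get("line-rate"))
    return [] if rate >= threshold else [f"coverage {rate:.0%} below {threshold:.0%}"]

def check_security(json_text):
    # Parse a JSON vulnerability report -- a completely different shape.
    vulns = json.loads(json_text)["vulnerabilities"]
    return [f"{v['id']} ({v['severity']})" for v in vulns if v["severity"] == "high"]

def aggregate():
    # Two formats shown; the real pipeline has five or more, and nothing
    # here links a failure back to the feature or AC it belongs to.
    return check_coverage(COVERAGE_XML) + check_security(SECURITY_JSON)

failures = aggregate()
print("FAIL" if failures else "PASS", failures)
```

Note what is absent: no parser in this script can say which requirement a failing check belongs to, because no tool's output carries that information.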
The typed approach doesn't just unify validation output. It unifies the validation model. Every concern references the same type system. A PERF105 warning ("performance test without functional tests") can exist because the performance analyzer can see the [Verifies] attributes from the test analyzer. A SEC100 error ("missing [Authorize]") can reference the feature it affects because the security analyzer can see the [ForRequirement] attribute on the controller. Cross-concern validation isn't a feature — it's a consequence of sharing a type system.
Eight separate CI tools cannot cross-reference because they don't share a data model. They share a pipeline, but not a language.
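The shared-model idea can be illustrated outside C#. In this Python analogy (the class, field, and diagnostic names are hypothetical stand-ins for the attribute model), the performance check can reference the test concern's data precisely because both concerns read the same model:

```python
from dataclasses import dataclass
from typing import Optional

# One shared model, standing in for the C# type system that markers like
# [Verifies], [ForRequirement], and [PerformanceBudget] all attach to.
# All names and diagnostic IDs here are hypothetical.
@dataclass
class Feature:
    requirement: str                         # e.g. "REQ-101"
    verified_by_tests: bool = False          # set by a [Verifies]-style marker
    performance_budget_ms: Optional[int] = None

def perf_check(features):
    # PERF105-style rule: a performance budget on a feature with no
    # functional tests is flagged. This cross-concern rule is possible
    # only because the perf check can see the test concern's data.
    return [
        f"PERF105: {f.requirement} has a performance budget but no functional tests"
        for f in features
        if f.performance_budget_ms is not None and not f.verified_by_tests
    ]

features = [
    Feature("REQ-101", verified_by_tests=True, performance_budget_ms=200),
    Feature("REQ-102", verified_by_tests=False, performance_budget_ms=150),
]
for diag in perf_check(features):
    print(diag)
```

Eight independent CI tools cannot express `perf_check` at all: the coverage data and the performance data never meet in one data structure.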
The False Dichotomy: Breadth vs Depth
The standard framing of the comparison goes like this: "Spec-driven has breadth (covers everything). Typed specifications have depth (enforces the requirement chain). Choose based on whether you need breadth or depth."
This framing is wrong. It's a false dichotomy. And accepting it sells the typed approach short.
The Dichotomy as Presented
The "Standard" Comparison (FALSE):
Breadth
↑
│
Spec-Driven ● │
(15 concerns, │
0 enforced) │
│
│
│ ● Typed Specs
│ (1 concern,
│ 1 enforced)
│
  └──────────────────→ Depth
This diagram suggests a tradeoff: you can have broad coverage with no enforcement, or deep enforcement with narrow coverage. The implication is that both are valid choices depending on your needs.
But this is not a real tradeoff. It's a snapshot of a transitional state.
The Real Picture
The typed approach is not limited to one concern. It started with one concern (the requirement chain) because that's the most valuable concern to enforce first. But the architecture — attributes, source generators, Roslyn analyzers, the M3 metamodel — supports unlimited concerns. Every domain that the spec-driven approach covers with text, the typed approach can cover with a DSL.
The REAL Comparison:
Breadth
↑
│
Spec-Driven ● │ ● Typed Specs (future)
(15 concerns, │ (15 concerns,
0 enforced) │ 15 enforced)
│ /
│ / ← Each new DSL adds
│ / a concern WITH enforcement
│ /
│ /
│ ● Typed Specs (today)
│ (3-5 concerns,
│ 3-5 enforced)
│ /
│ /
│ ● Typed Specs (Day 1)
│ (1 concern,
│ 1 enforced)
│
└──────────────────→ Depth
Key insight: The typed approach MOVES. The spec-driven approach STAYS.
The spec-driven approach starts broad and stays broad. Adding a new concern means writing another text document. That document has zero enforcement on Day 1 and zero enforcement on Day 1,000. The breadth never converts to depth.
The typed approach starts narrow and grows. Adding a new concern means building a DSL. That DSL has full enforcement from the moment it's built. Over time, the typed approach accumulates breadth AND depth. It converges toward the spec-driven approach's breadth while maintaining the enforcement the spec-driven approach cannot provide.
The Real Dichotomy
The actual choice is not breadth vs depth. It's enforced vs described:
The REAL Dichotomy:
Concerns
↑
│
│ ┌─────────────────────────────┐
│ │ DESCRIBED (spec-driven) │
│ │ │
│ │ - Performance targets │
│ │ - Security policies │
│ │ - Architecture constraints │
│ │ - Testing strategies │
│ │ - Monitoring alerts │
│ │ - Deployment strategies │
│ │ - Chaos experiments │
│ │ - Compliance requirements │
│ │ │
│ │ Status: Text. No compiler. │
│ │ Drift: Inevitable. │
│ │ Verification: Manual. │
│ └─────────────────────────────┘
│
│ ┌─────────────────────────────┐
│ │ ENFORCED (typed DSLs) │
│ │ │
│ │ - [ForRequirement] chain │
│ │ - [PerformanceBudget] │
│ │ - [SecurityPolicy] │
│ │ - [Layer] constraints │
│ │ - [MutationTarget] │
│ │ - [Alert] definitions │
│ │ - [DeploymentStrategy] │
│ │ - [ChaosExperiment] │
│ │ │
│ │ Status: Compiled. Type-safe. │
│ │ Drift: Impossible. │
│ │ Verification: Automatic. │
│ └─────────────────────────────┘
│
└──────────────────────────────→ Enforcement
Every concern in the DESCRIBED box can move to the ENFORCED box.
No concern in the ENFORCED box ever moves back.
The direction of travel is one-way: described → enforced.
This is the fundamental insight: describing a concern is the first step toward enforcing it. The spec-driven approach describes concerns and stops. The typed approach takes the additional step of building the DSL that enforces them.
"But building DSLs takes effort," the objection goes. Yes. But maintaining documents also takes effort — it's just invisible effort. The document maintenance doesn't produce error messages when it fails. It produces stale documents. Nobody knows they're stale until the wrong decision gets made based on outdated information.
The DSL maintenance is visible: a generator bug produces incorrect generated code, which produces a test failure, which gets fixed. The document maintenance is invisible: a stale performance target sits in a markdown file for six months until someone relies on it and deploys an endpoint that can't handle the load.
Visible maintenance is better than invisible maintenance. The same effort, spent on DSLs instead of documents, produces artifacts that cannot drift. That's not a tradeoff. That's a strictly better outcome.
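The visibility difference can be sketched in a few lines. In this hedged Python analogy (endpoint and registry names are invented), the "described" budget is inert prose, while the "enforced" budget turns drift into an immediate, loud error:

```python
# "Described": a performance target as prose in a data structure.
# Nothing checks it; it can go stale for six months unnoticed.
PERF_DOC = {"GET /orders": "200ms"}

# "Enforced": declaring a budget registers it where a checker can see it.
BUDGETS = {}

def performance_budget(ms):
    def register(fn):
        BUDGETS[fn.__name__] = ms
        return fn
    return register

@performance_budget(ms=200)
def get_orders():
    pass

def get_invoices():  # new endpoint; its budget was forgotten
    pass

def enforce(endpoints):
    # Runs at build/startup: a missing budget is an error today,
    # not a stale sentence discovered after a bad deploy.
    missing = [fn.__name__ for fn in endpoints if fn.__name__ not in BUDGETS]
    if missing:
        raise RuntimeError(f"no performance budget declared for: {missing}")

enforce([get_orders])  # passes silently
try:
    enforce([get_orders, get_invoices])
except RuntimeError as err:
    print(err)  # the drift is visible the moment it happens
```

The markdown file and the registry cost roughly the same effort to write; only the registry complains when the world changes out from under it.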
What "Breadth" Really Means
When people say the spec-driven approach has "breadth," they mean it covers many concerns. But coverage without enforcement is inventory, not capability. A warehouse full of unplugged machines has "breadth" — it covers many manufacturing processes. But it doesn't manufacture anything.
The spec-driven Testing-as-Code specification describes 15+ testing strategies. How many does it enforce? Zero. The description is valuable as education — but education is a one-time transfer. Once the team knows about mutation testing, the document's ongoing value is zero. The ongoing value comes from enforcement: does the team ACTUALLY mutation-test? The document can't answer that. A [MutationTarget] analyzer can.
This is not an argument against documents. Documents are excellent for teaching. The spec-driven Testing-as-Code specification is a genuine contribution to the testing knowledge base. The argument is against stopping at documents. Teaching is the first step. Enforcement is the second step. The spec-driven approach takes the first step and declares victory. The typed approach takes both steps.
The Transitional Illusion
The spec-driven approach's "breadth advantage" is real on Day 1. The typed approach has one DSL; the spec-driven approach has 15 documented concerns. The gap is visible, and it feels like a fundamental difference.
But it's a transitional state, not a permanent one. Every month, the typed approach can add another DSL. Every month, a "described" concern becomes an "enforced" concern. The spec-driven approach's 15 concerns stay at zero enforcement month after month. The gap closes from one direction only: the typed approach gains breadth while keeping depth.
Breadth over time:
Month │ Spec-Driven │ Typed (enforced) │ Typed (total concerns)
───────┼────────────────┼──────────────────┼───────────────────────
1 │ 15 described │ 1 enforced │ 1 of 15
3 │ 15 described │ 3 enforced │ 3 of 15
6 │ 15 described │ 5 enforced │ 5 of 15
12 │ 15 described* │ 8 enforced │ 8 of 15
18 │ 14 described** │ 11 enforced │ 11 of 15
24 │ 12 described** │ 15 enforced │ 15 of 15
* Some spec-driven documents are now stale (never updated)
** Some spec-driven documents have been abandoned (nobody reads them)
The spec-driven "breadth" is also not stable. Documents require maintenance. Without it, they drift, become stale, and eventually get abandoned. The 15 concerns described on Day 1 might be 12 actively maintained concerns by Month 24. Meanwhile, the typed approach has grown from 1 to 15 enforced concerns, each one guaranteed to be current because it's compiled, not maintained.
The question isn't "which approach has more breadth?" It's "which approach's breadth is still accurate in two years?" Documents decay. Types don't. The transitional advantage of document breadth is exactly that — transitional. The permanent advantage of type enforcement is exactly that — permanent.