
Drift, Scale, and the Long Game

Both approaches work on Day 1. The question is: what happens on Day 180? Day 730? Day 1,825? Every specification system must survive three challenges: drift (does it stay accurate?), scale (does it work when the project grows?), and maintenance (does it pay for itself over time?).


The Drift Problem

Drift is the gradual divergence between what the specification says and what the code does. All software projects experience drift. The question is not whether it happens, but how quickly, how detectably, and how painfully.

Spec-Driven Drift

Document drift is the natural state of all documentation. Documents and code are maintained by different processes, at different times, by different people. They diverge by default and converge only through deliberate effort.

Month 1: The PRD accurately describes the system. All features, ACs, and priorities are correct. The specification files match the CI pipeline configuration.

Month 3: Two new features were added directly in code (a hotfix for a security vulnerability, a performance optimization). The PRD wasn't updated because the features were urgent and the PRD update felt like paperwork. The Testing spec still references the old test framework; the team switched from xUnit to NUnit last sprint but nobody updated the spec.

Month 6: A developer reads the PRD and finds five features listed as "Critical" that were deprioritized to "Low" three months ago. The Coding Practices spec recommends a pattern the team abandoned after discovering it caused performance issues. The Context Engineering spec references a context assembly strategy that was never implemented.

Month 12: The PRD is treated as a historical document. New team members don't read it because senior developers warn them "it's outdated." The Testing spec is a relic — the team's actual testing practices have evolved beyond what it describes. The only specification that gets updated is the one a tech lead personally owns because they care about it.

Month 24: The team considers deleting the specification files. They're more misleading than helpful. A new junior developer follows the Coding Practices spec and writes code that conflicts with the team's actual practices. A code review catches it, but the reviewer says "ignore the spec, here's how we actually do things." The spec has negative value — it actively misinforms.

Drift curve (spec-driven):

  Accuracy │
    100% ───┐
            │\
            │ \
            │  \
     50% ───│───\──────── ← Most teams reach 50% accuracy at ~6 months
            │    \
            │     \_______ ← Plateau of "core stuff is still right"
      0% ───┼────────────────────────────
            0    6    12    18    24 months

Remediation: The framework's testing and documentation pillars include CI checks for specification freshness. But these checks are themselves documents — they describe what should be checked, not what IS checked. A team that doesn't implement the CI checks described in the specs is a team whose specs are unvalidated. And implementing those CI checks is additional work that competes with feature development.

Typed Specification Drift

Type drift is structurally impossible for the requirement→spec→impl→test chain. The types ARE the code. They cannot diverge from themselves.

Month 1: The feature types accurately describe the system because they ARE the system.

Month 3: Two new features were added. Each required adding new feature records with abstract AC methods. The compiler forced the developer to create specification interfaces, implement them, and write tests. The features exist in the type system because the code would not have compiled without them.
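
As a hedged sketch of that mechanism — the base-type names (Feature<>, AcceptanceCriterionResult, the epic and feature names) are assumptions standing in for the article's framework, with minimal stubs so the snippet compiles on its own — a Month-3 feature might look like:

```csharp
using System;

// Minimal stand-ins for the framework's base types so the sketch is
// self-contained; in the real system these live in MyCompany.Requirements.
public abstract record Epic;
public sealed record PlatformReliabilityEpic : Epic;
public abstract record Feature<TEpic> where TEpic : Epic;
public sealed record AcceptanceCriterionResult(bool Satisfied, string Evidence);
public readonly record struct ClientId(Guid Value);

// The new feature: each acceptance criterion is an abstract method, so the
// feature cannot exist without a specification that implements every AC.
public abstract record RateLimitingFeature : Feature<PlatformReliabilityEpic>
{
    public abstract AcceptanceCriterionResult RequestsAreThrottledPerClient(ClientId client);
    public abstract AcceptanceCriterionResult Returns429WhenLimitExceeded(ClientId client);
}

// Omitting either override below is a compile error (CS0534) — this is the
// "compiler forced the developer" step described above.
public sealed record DefaultRateLimitingSpec : RateLimitingFeature
{
    public override AcceptanceCriterionResult RequestsAreThrottledPerClient(ClientId client)
        => new(true, $"client {client.Value} throttled at the configured rate");

    public override AcceptanceCriterionResult Returns429WhenLimitExceeded(ClientId client)
        => new(true, "HTTP 429 returned once the window is exhausted");
}
```

Deleting an override, or adding a new abstract AC without updating the specification, is a build break rather than a silently stale document.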

Month 6: A feature was deprioritized. The RequirementPriority was changed from Critical to Low on the feature record. This is a one-line change that's tracked in git. Every reference to this feature still works because the type hasn't been renamed or deleted.

Month 12: The team's requirements are exactly as current as their codebase — because the requirements ARE the codebase. There's no separate document to maintain, no CI check to implement, no freshness validation to configure.

Month 24: A new developer joins. They see UserRolesFeature in the codebase. They Ctrl+Click through the chain: feature → spec → implementation → tests. They understand the system without reading any documentation because the types are self-navigating.

Drift curve (typed specifications):

  Accuracy │
    100% ───┬──────────────────────────── ← Types cannot drift from types
            │
            │
            │
     50% ───│
            │
            │
      0% ───┼────────────────────────────
            0    6    12    18    24 months

But Not All Drift Is Prevented

Typed specifications prevent drift in the structural chain (requirement → spec → impl → test). They do NOT prevent:

  1. Semantic drift. A feature type says AdminCanAssignRoles, but the implementation assigns roles incorrectly (wrong logic, correct types). The compiler verifies structure, not behavior.
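
To make item 1 concrete, here is a sketch (all names invented for illustration) where every type lines up and the chain compiles, yet the behavior is wrong:

```csharp
using System;
using System.Collections.Generic;

public enum Role { Viewer, Editor, Admin }

// Structurally satisfies a typed AC like AdminCanAssignRoles: right signature,
// right types, full traceability — and a logic bug the compiler cannot see.
public sealed class RoleAssignments
{
    private readonly Dictionary<Guid, Role> _assigned = new();

    public Role AssignRole(Guid userId, Role requested)
    {
        _assigned[userId] = Role.Viewer; // BUG: should store `requested`
        return _assigned[userId];
    }
}
```

Only a behavioral test bound to the AC catches this, which is why the typed approach still demands tests and does not replace them.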

  2. External documentation drift. If the team maintains a wiki, a Confluence page, or a customer-facing FAQ that describes features, those documents can drift from the types. The types don't know about external documentation.

  3. Non-functional specification drift. Performance targets, SLA thresholds, and operational constraints are not part of the type system. If the team's performance target changes from "p95 < 200ms" to "p95 < 100ms," that change lives in a configuration file or CI pipeline — not in the type system.

  4. Process drift. The team's workflow (how features are proposed, reviewed, and approved) isn't encoded in types. If the team changes their process from "PO approves all features" to "tech lead approves small features," that's process drift that types can't prevent.


The Scale Problem

What happens when the project grows from 10 features to 100? From 5 developers to 50? From 1 service to 12?

Spec-Driven at Scale

10 features, 1 service: The PRD is one document with ~3,000 words. Everyone reads it. Context assembly is straightforward.

50 features, 3 services: The PRD is now ~15,000 words across multiple sections. Context assembly must select the right section — a non-trivial problem when features span services. The Testing spec references patterns that apply to some services but not others (the background worker doesn't have API endpoints, but the Testing spec's API integration section is still part of the context).

100 features, 12 services: The PRD is unwieldy. Some features span four services. Context assembly requires understanding the feature-to-service mapping — which is itself a specification that can drift. The team splits the PRD into per-service PRDs, but cross-service features don't fit neatly into any single per-service PRD.

The scaling pattern: document-based specifications scale poorly with feature count because documents are linear (one section after another), while systems are graphs (features span services, services share components, components have cross-cutting concerns). A linear document cannot efficiently represent a graph structure.

Typed Specifications at Scale

10 features, 1 service: 10 feature records in the Requirements project. Clean, navigable, well-understood.

50 features, 3 services: 50 feature records, organized by epics. Each service has its own Specifications and Domain project, but all reference the shared Requirements project. Cross-service features are types that multiple services reference:

SharedCompany.sln
├── MyCompany.Requirements/               ← All 50 features
├── src/
│   ├── MyCompany.OrderService/           ← [ForRequirement] order features
│   ├── MyCompany.PaymentGateway/         ← [ForRequirement] payment features
│   └── MyCompany.NotificationWorker/     ← [ForRequirement] notification features
└── test/
    ├── MyCompany.OrderService.Tests/     ← [Verifies] order ACs
    ├── MyCompany.PaymentGateway.Tests/   ← [Verifies] payment ACs
    └── MyCompany.NotificationWorker.Tests/

The traceability matrix shows which features are implemented where:

OrderProcessingFeature:
  AC: OrderCreated        → OrderService (impl) + OrderTests (verified)
  AC: PaymentValidated    → PaymentGateway (impl) + PaymentTests (verified)
  AC: CustomerNotified    → NotificationWorker (impl) + NotificationTests (verified)
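
The [ForRequirement] linkage behind this matrix can be sketched as follows — the attribute's shape is an assumption (the article's framework defines its own, validated by analyzers at compile time), but the key point is that nameof makes the AC reference a compile-checked symbol:

```csharp
using System;
using System.Reflection;

// Assumed shape of the linking attribute.
[AttributeUsage(AttributeTargets.Class)]
public sealed class ForRequirementAttribute : Attribute
{
    public Type Feature { get; }
    public string AcceptanceCriterion { get; }
    public ForRequirementAttribute(Type feature, string acceptanceCriterion)
        => (Feature, AcceptanceCriterion) = (feature, acceptanceCriterion);
}

// Lives in the shared MyCompany.Requirements project.
public abstract record OrderProcessingFeature
{
    public abstract bool PaymentValidated(decimal amount);
}

// Lives in MyCompany.PaymentGateway. Rename the AC method and this line
// breaks the build in every service that references it.
[ForRequirement(typeof(OrderProcessingFeature), nameof(OrderProcessingFeature.PaymentValidated))]
public sealed class PaymentValidator { /* ... */ }
```

Because both projects compile against the same feature type, the link can never dangle the way a feature ID in a YAML file can.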

100 features, 12 services: 100 feature records. The Requirements project is a library — small in code but rich in types. Each service references it via <ProjectReference>. The compiler enforces cross-service traceability because all services share the same feature types.

Build time increases because more types mean more analyzer work. But the analyzer is incremental — it only re-checks features whose types changed.

The scaling pattern: typed specifications scale well with feature count because types are a graph structure (references between types, generic constraints, interface implementations). A graph structure naturally represents a graph system.

The Scaling Tradeoffs

Scale Factor        Spec-Driven                                  Typed Specifications
──────────────────  ───────────────────────────────────────────  ─────────────────────────────────────────────────
Features (10→100)   PRD grows linearly, becomes unwieldy         Feature types grow, remain navigable
Services (1→12)     Context assembly must handle cross-service   Shared Requirements project handles naturally
Developers (5→50)   More people forget to read/update specs      More people → compiler teaches each one
ACs (30→300)        Manual tracking impossible                   Source-generated matrix handles automatically
Build time          No impact (text files)                       Analyzer build time grows (incremental mitigates)
Onboarding          Read the spec (if it's current)              Navigate the types (always current)
Cross-team          Which team owns which spec section?          Which team owns which feature type? (clearer)

The Maintenance Problem

Every system has maintenance costs. What does each approach cost to maintain?

Spec-Driven Maintenance

Document maintenance: Every specification file must be updated when the system changes. If you add a new testing strategy, update the Testing spec. If you change the deployment strategy, update the Specification-as-Code. If you adopt a new coding pattern, update the Coding Practices spec.

This is ongoing work that produces no code, no features, no user value. It's pure overhead — necessary to keep the specifications accurate, but competing with feature development for time and attention.

Context assembly maintenance: The context engineering infrastructure — how documents are selected, assembled, and validated — must be maintained separately. If a new document type is added, the assembly rules must be updated. If the AI model changes (GPT-4 → Claude → Gemini), the context window constraints change, and the assembly strategy must adapt.

CI pipeline maintenance: The quality gates described in the specifications must be implemented and maintained in CI. If the Testing spec changes thresholds, the CI pipeline must be updated. If a new quality gate is added, the pipeline configuration grows.

Estimated ongoing cost: 10-15% of development time goes to maintaining specifications, context assembly infrastructure, and CI pipeline — assuming the team actually keeps them current.

Typed Specification Maintenance

Generator maintenance: The Roslyn source generators must be updated when the target framework changes (new .NET version, new C# features, new Roslyn API). This is periodic, not continuous — perhaps 2-3 updates per year.

Analyzer maintenance: The REQ1xx-REQ4xx analyzers must be updated if the convention changes (new attribute types, new diagnostic rules). This is infrequent — the conventions are stable by design.

Type evolution: When the base types change (adding a new field to RequirementMetadata, changing the AcceptanceCriterionResult shape), all feature records must adapt. But the compiler tells you exactly which records need updating — it's a guided migration, not a manual hunt.
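
A one-line illustration of that guided migration (RequirementMetadata's real shape is unknown; this assumes a positional record):

```csharp
// Assumed shape: the record gains an Owner parameter.
// Before: public sealed record RequirementMetadata(string Id, string Title);
public sealed record RequirementMetadata(string Id, string Title, string Owner);

// Every pre-existing call site such as
//   new RequirementMetadata("F-042", "Password reset")
// now fails with CS7036 ("There is no argument given that corresponds to the
// required parameter 'Owner'") — the compiler's error list IS the migration
// checklist, one entry per feature record that needs updating.
```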

Estimated ongoing cost: 3-5% of development time goes to maintaining generators and analyzers. Zero time goes to maintaining "specification documents" because there are no specification documents.


The Cost-Benefit Curves

Cumulative cost over time:

Cost │                           Spec-Driven
     │                          /
     │                         /
     │                        / ← Document maintenance accumulates
     │                       /
     │                      /
     │                     /
     │           ─────────────────── Typed Specifications
     │          /                    (flat after initial investment)
     │         / ← Initial tooling investment
     │        /
     │       /
     │      /
     │     /
     │────/─────────────────────────────────────────
     0    3    6    9    12   15   18   21   24 months

The typed approach has a higher initial cost (building generators and analyzers) but a flatter maintenance curve (no documents to maintain). The spec-driven approach has a lower initial cost (fill in templates) but a steeper maintenance curve (documents must be kept current).

The crossover point — where typed specifications become cheaper total — depends on:

  1. Team size. Larger teams drift faster (more people → more opportunities for divergence).
  2. Change frequency. Rapidly changing projects drift faster.
  3. Compliance requirements. Regulated projects pay a higher cost for drift (audit findings, remediation).
  4. Tooling investment. A team that already has Roslyn expertise pays less for the initial investment.

For a small team (3-5 developers) on a stable project, the crossover might never happen — the spec-driven approach stays cheaper. For a large team (20+ developers) on a rapidly changing project with compliance requirements, the crossover happens within 6-9 months.


The Tooling Paradox at Scale

This section addresses a counterintuitive truth about the spec-driven approach: the more seriously you adopt it, the more you converge toward building your own typed system.

Phase 1: Text Files (Day 1)

You fill in the PRD template. You write acceptance criteria as English sentences. You define quality thresholds as numbers in a .txt file. Everything is text. Everything is simple. No tooling needed.

Phase 2: Parsing (Month 2)

You want to automatically check that every feature in the PRD has at least one test. To do this, you need to:

  1. Parse the PRD to extract feature names and AC lists
  2. Parse the test directory to find test classes
  3. Match features to tests by some convention (name matching? comments? tags?)

You've built a parser. It's custom. It breaks when someone adds a space in the wrong place.

Phase 3: Structured Format (Month 4)

The parser is fragile, so you switch from free-text to YAML or JSON:

features:
  - id: password_reset
    acceptance_criteria:
      - id: ac_request_email
        description: "User can request a password reset email"
      - id: ac_link_expires
        description: "Reset link expires after 24 hours"

Now you have a structured format with IDs. Your parser is more reliable. But you've also introduced a mini-DSL: features[].acceptance_criteria[].id is a schema that your tooling depends on. You need to validate that the YAML conforms to this schema. You write a JSON Schema or a custom validator.

Phase 4: Cross-Referencing (Month 6)

You want tests to reference features by ID. You define a convention:

# In tests metadata
tests:
  - file: PasswordResetTests.cs
    feature: password_reset
    acceptance_criteria: [ac_request_email, ac_link_expires]

Now you have cross-references between YAML files. You need to validate that password_reset exists in the PRD, that ac_request_email exists on that feature, and that the test file actually exists. You build a cross-reference validator.
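
The validator you end up hand-writing looks something like this sketch (in C#, with the parsed YAML stubbed as in-memory data; all names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stubbed result of parsing the PRD YAML: feature id -> its AC ids.
var prdAcs = new Dictionary<string, HashSet<string>>
{
    ["password_reset"] = new() { "ac_request_email", "ac_link_expires" },
};

// Stubbed result of parsing the tests metadata YAML.
var testEntries = new[]
{
    (File: "PasswordResetTests.cs",
     Feature: "password_reset",
     Acs: new[] { "ac_request_email", "ac_link_expires" }),
};

// Hand-rolled reference checking: every branch below is something a compiler
// does natively for typed references.
var errors = new List<string>();
foreach (var entry in testEntries)
{
    if (!prdAcs.TryGetValue(entry.Feature, out var knownAcs))
    {
        errors.Add($"{entry.File}: unknown feature '{entry.Feature}'");
        continue;
    }
    errors.AddRange(entry.Acs.Where(ac => !knownAcs.Contains(ac))
                             .Select(ac => $"{entry.File}: unknown AC '{ac}'"));
}

Console.WriteLine(errors.Count == 0 ? "cross-references OK" : string.Join('\n', errors));
```

Note what is missing compared to the typed approach: no rename support, no go-to-definition, no incremental checking — each of those is more custom tooling on top.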

Phase 5: Code Generation (Month 9)

You want to auto-generate test stubs from feature definitions. You write a template that reads the YAML and produces C# test files with the right attributes. You've built a code generator.

Phase 6: Realization (Month 12)

You now have:

  • A YAML-based DSL with a defined schema
  • A parser that extracts features, ACs, and tests
  • A validator that checks schema conformance and cross-references
  • A code generator that produces test stubs
  • A CI check that verifies coverage by AC

You've built a typed specification system. In YAML. Without a compiler, without IDE integration, without refactoring support, without type checking, without Ctrl+Click navigation.

You've spent 12 months building a worse version of what Roslyn source generators provide out of the box. (For an example of what the typed approach produces instead — one feature declaration generating 17 files across Grafana, Prometheus, Kubernetes, and Helm — see Auto-Documentation from a Typed System, Part VIII.)

The convergence paradox:

Spec-driven (text)
  │
  ├─ "We need to validate specs" → build a parser
  ├─ "We need to check coverage" → build cross-reference checker
  ├─ "We need to generate code" → build code generator
  ├─ "We need IDE support" → build a language server?
  └─ Congratulations: you've built a DSL compiler
     (12 months, no type safety, no IDE integration)

Typed specifications (C#)
  │
  └─ Use the existing C# compiler, IDE, and source generators
     (existing tooling, full type safety, full IDE integration)

This isn't hypothetical. This is the trajectory every team follows when they take specification-as-text seriously. The text format fights back at every step — because text was never designed to be executable. The moment you need the specifications to DO something (validate, cross-reference, generate), you're building compiler infrastructure. And compiler infrastructure is what typed languages already have.


The Language Lock-In Question

The typed approach's most significant long-term risk is language lock-in.

The spec-driven approach is language-agnostic. The same PRD template, Testing spec, and Coding Practices spec work for a C# backend, a Python ML pipeline, a React frontend, and a Go microservice. The specifications are text files — they don't care what compiles them.

The typed approach is C# (or at least .NET). The feature records are C# code. The Roslyn analyzers are C# technology. The source generators are Roslyn extensions. If the team adopts a Python service, the typed requirements don't extend to it.

Mitigations

  1. The .Requirements project is just a DLL. A Python service can reference it via gRPC, REST, or shared proto definitions. The types define the contract; the Python service implements it without Roslyn analyzers. The traceability is weaker (no compile-time enforcement in Python), but the requirements are still the source of truth.

  2. The approach can be ported to other typed languages. TypeScript has its own version of typed specifications (see Requirements as Code (TypeScript)). Rust could implement it with proc macros. Go could implement it with code generation tools. The principle transfers even if the tooling doesn't.

  3. Most enterprise backends are one primary language. A C# shop using C# for 80% of their code gets 80% of the benefit. The remaining 20% (scripts, frontends, ML) can use the spec-driven approach for its share.

The Honest Assessment

Language lock-in is a real cost. A team that standardizes on typed specifications in C# and then needs to add a significant Rust component faces a choice: port the typed specification tooling to Rust (expensive), accept weaker enforcement in Rust (pragmatic), or use spec-driven documents for the Rust component (hybrid).

The spec-driven approach wins here on flexibility. The typed approach wins on depth. This is a genuine tradeoff, not a false dichotomy.


The Learning Curve

There's a common misconception that the spec-driven approach has a lower learning curve because "anyone can fill in a text template." This deserves scrutiny.

The Uncomfortable Truth: Both Approaches Require Learning a Language

The spec-driven PRD template has a syntax:

DEFINE_FEATURE(feature_name)
  description: "..."
  user_story: "As a [user], I want [goal] so that [benefit]"
  acceptance_criteria:
    - "AC text"
  priority: Critical | High | Medium | Low
  complexity: Simple | Medium | Complex
  dependencies: ["dep1", "dep2"]

This is a grammar. It has keywords (DEFINE_FEATURE, acceptance_criteria), data types (priority is an enum, dependencies is an array), nesting rules (ACs are inside features), and conventions (user story format). A new team member must learn this grammar before they can write correct specifications.

The Testing-as-Code specification has its own grammar:

DEFINE_PRINCIPLE(principle_name)
  Scope: scope1, scope2
  Enforcement: mandatory | recommended
  Validation strategy: strategy_name
  Rule: "..."

DEFINE_PRACTICE(practice_name)
  Violation Patterns: [...]
  Auto Fix Options: [...]

More keywords. More structure. More rules. A new team member must learn what DEFINE_PRINCIPLE means vs DEFINE_PRACTICE, what Enforcement: mandatory implies, and how Violation Patterns relate to Auto Fix Options.

The Context Engineering specification has:

CONTEXT_RULE(rule_name)
  sources:
    - prd: feature_section(...)
    - coding_practices: language_section(...)
  assembly_strategy: task_driven | progressive_disclosure | adaptive
  max_context_tokens: 8000

Yet another grammar. Context rules, source selectors, assembly strategies, token budgets.

Total languages to learn for the spec-driven approach: 6 (one per pillar, each with its own syntax).

Now consider the typed approach. The language is C#. The grammar is:

public abstract record FeatureName : Feature<EpicName>
{
    public abstract AcceptanceCriterionResult AcName(Param1 p1, Param2 p2);
}

Total languages to learn: 1 (C#, which the team already knows).

The "low learning curve" of spec-driven is an illusion. What's low is the perceived learning curve — because the text files look like English, people assume they're intuitive. But DEFINE_FEATURE(feature_name) is no more intuitive than public abstract record FeatureName. Both are formal syntax that must be learned. The difference is:

  • The spec-driven syntax has no compiler. You learn it by reading docs, making mistakes, and having someone review your PRD. Errors are caught by human review — or not at all.
  • The C# syntax has a compiler. You learn it by writing code and reading compiler errors. Errors are caught instantly, with specific diagnostics and suggested fixes.

Which learning experience is better? The one with instant, specific feedback (the compiler), or the one with delayed, vague feedback (human review of a text file)?

The Real Learning Curve Comparison

Learning difficulty over time:

Competence │
    100% ───│                    ╭─── Typed Specifications
            │                   ╱     (compiler teaches continuously)
            │                  ╱
            │                 ╱
     50% ───│────╱────────────────── Spec-Driven
            │   ╱  ← fast start     (plateau: never fully learned because
            │  ╱                      no feedback on errors)
            │ ╱
      0% ───┼────────────────────────────────────
            0    1    2    3    4 weeks

Spec-driven starts faster (the template is readable on Day 1) but plateaus (without compiler feedback, subtle errors persist indefinitely). Typed specifications start slower (the "aha moment" takes a few hours) but reach higher competence (the compiler continuously corrects and teaches).

What Each Approach Teaches

Spec-driven teaches you: the structure of the template, the available fields, the quality thresholds, the testing strategies. It's a reading exercise — you learn by consuming the documentation.

Typed specifications teach you: the domain model, the requirement hierarchy, the specification chain, the compiler feedback loop. It's a doing exercise — you learn by writing code and responding to compiler diagnostics. And what you learn is the DOMAIN, not the tool. The tool (C#) you already know. The domain (your features, your ACs, your specifications) is what the compiler teaches you by demanding precision.

This is the deepest difference: spec-driven teaches you a template format. Typed specifications teach you your own domain. One is process knowledge that has no value outside the template. The other is domain knowledge that makes you a better engineer.



Summary

Long-Term Factor       Spec-Driven                                Typed Specifications
─────────────────────  ─────────────────────────────────────────  ─────────────────────────────────────────────────
Drift risk             High (documents diverge from code)         Zero for structural chain; possible for semantics
Scaling                Poor (linear documents for graph systems)  Good (types are naturally a graph)
Maintenance cost       Ongoing (10-15% overhead)                  Front-loaded (high initial, 3-5% ongoing)
Language flexibility   Excellent (language-agnostic)              Limited (C#/.NET ecosystem)
Learning curve         Low initial, medium advanced               Medium initial, high advanced
Team size sensitivity  High (more people = faster drift)          Low (compiler enforces regardless of team size)
Crossover point        N/A                                        6-9 months for large teams; possibly never for small

Part X brings it all together: the verdict, the decision framework, and the hybrid possibility.


The Mono-Repo Advantage

Typed specifications reach their full potential in a mono-repo architecture. When all services share a single Requirements project and a single solution, the compiler validates the entire system's requirement chain in one pass.

The 12-Service System

Consider a mid-size e-commerce platform with 12 services:

MyCompany.sln
│
├── shared/
│   ├── MyCompany.Requirements/            ← 45 features, 180 ACs
│   │   ├── Epics/
│   │   │   ├── CustomerExperienceEpic.cs
│   │   │   ├── CommerceOperationsEpic.cs
│   │   │   ├── PlatformReliabilityEpic.cs
│   │   │   └── ComplianceEpic.cs
│   │   ├── Features/
│   │   │   ├── OrderProcessingFeature.cs       → 3 services
│   │   │   ├── PaymentProcessingFeature.cs     → 2 services
│   │   │   ├── UserAuthenticationFeature.cs    → 2 services
│   │   │   ├── InventoryManagementFeature.cs   → 2 services
│   │   │   ├── ShippingFeature.cs              → 3 services
│   │   │   ├── NotificationFeature.cs          → 1 service
│   │   │   ├── SearchFeature.cs                → 1 service
│   │   │   ├── ReportingFeature.cs             → 2 services
│   │   │   ├── AuditFeature.cs                 → all services
│   │   │   └── ... (36 more features)
│   │   └── Stories/
│   │       └── ... (90 stories)
│   │
│   ├── MyCompany.SharedKernel/             ← Value types, Result, exceptions
│   └── MyCompany.Requirements.Analyzers/   ← REQ1xx-REQ4xx
│
├── services/
│   ├── MyCompany.Api.Gateway/              ← BFF, routing, rate limiting
│   ├── MyCompany.Identity/                 ← Auth, users, roles
│   ├── MyCompany.Catalog/                  ← Products, categories, search
│   ├── MyCompany.Orders/                   ← Order lifecycle
│   ├── MyCompany.Payments/                 ← Payment processing, refunds
│   ├── MyCompany.Inventory/                ← Stock, reservations, warehouses
│   ├── MyCompany.Shipping/                 ← Carriers, tracking, labels
│   ├── MyCompany.Notifications/            ← Email, SMS, push
│   ├── MyCompany.Reporting/                ← Analytics, dashboards, exports
│   ├── MyCompany.Admin/                    ← Internal admin UI
│   ├── MyCompany.Webhooks/                 ← External integrations
│   └── MyCompany.BackgroundJobs/           ← Scheduled tasks, cleanup
│
└── test/
    ├── MyCompany.Orders.Tests/
    ├── MyCompany.Payments.Tests/
    ├── MyCompany.Identity.Tests/
    ├── MyCompany.Inventory.Tests/
    ├── MyCompany.Shipping.Tests/
    ├── MyCompany.Notifications.Tests/
    ├── MyCompany.Catalog.Tests/
    ├── MyCompany.Reporting.Tests/
    ├── MyCompany.Admin.Tests/
    ├── MyCompany.Webhooks.Tests/
    ├── MyCompany.BackgroundJobs.Tests/
    └── MyCompany.Integration.Tests/        ← Cross-service E2E

The Project Reference Graph

Every service references the shared Requirements project. This creates a star topology:

                          MyCompany.Requirements
                                    │
      ┌────┬────┬────┬────┬────┬────┼────┬────┬────┬────┬────┐
      ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼
      GW   IDN  CAT  ORD  PAY  INV  SHP  NTF  RPT  ADM  WHK  JOB

  (GW=Api.Gateway, IDN=Identity, CAT=Catalog, ORD=Orders, PAY=Payments,
   INV=Inventory, SHP=Shipping, NTF=Notifications, RPT=Reporting,
   ADM=Admin, WHK=Webhooks, JOB=BackgroundJobs)

  ← All 12 services reference Requirements →
  ← All 12 services reference SharedKernel  →
  ← No service references another service   →

The critical constraint: no service references another service directly. Services communicate through messages, events, or APIs — not project references. But they all share the same feature types and value types. This means:

  1. OrderId is the same type in every service. No serialization mismatch possible.
  2. OrderProcessingFeature.PaymentIsCharged is the same AC in every service. No naming inconsistency.
  3. The traceability matrix covers all 12 services. No feature falls through the cracks.

The Generated Traceability Matrix at Scale

The solution-level traceability matrix for 45 features across 12 services:

┌─────────────────────────────────────────────────────────────────────────────┐
│ SOLUTION TRACEABILITY MATRIX — MyCompany (45 features, 180 ACs)            │
├────────────────────────┬──────────────┬──────────────┬──────┬──────────────┤
│ Feature                │ ACs          │ Services     │ Tests│ Status       │
├────────────────────────┼──────────────┼──────────────┼──────┼──────────────┤
│ OrderProcessingFeature │ 6/6 covered  │ ORD,PAY,SHP  │ 18   │ ✓ Complete   │
│ PaymentProcessingFeat. │ 4/4 covered  │ PAY,ORD      │ 12   │ ✓ Complete   │
│ UserAuthenticationFeat.│ 5/5 covered  │ IDN,GW       │ 15   │ ✓ Complete   │
│ InventoryMgmtFeature   │ 3/3 covered  │ INV,ORD      │ 9    │ ✓ Complete   │
│ ShippingFeature        │ 5/5 covered  │ SHP,ORD,NTF  │ 14   │ ✓ Complete   │
│ NotificationFeature    │ 3/3 covered  │ NTF          │ 8    │ ✓ Complete   │
│ SearchFeature          │ 4/4 covered  │ CAT          │ 10   │ ✓ Complete   │
│ ReportingFeature       │ 3/4 covered  │ RPT,ORD      │ 7    │ ⚠ 1 AC gap  │
│ AuditFeature           │ 2/2 covered  │ ALL          │ 24   │ ✓ Complete   │
│ ... (36 more)          │              │              │      │              │
├────────────────────────┼──────────────┼──────────────┼──────┼──────────────┤
│ TOTALS                 │ 175/180 (97%)│ 12 services  │ 487  │ 5 AC gaps   │
└────────────────────────┴──────────────┴──────────────┴──────┴──────────────┘

Gap Details:
  - ReportingFeature.ExportToPdfWithCharts → No [ForRequirement] in any service
  - InventoryManagementFeature.LowStockAlert → Implemented but 0 tests [REQ301]
  - ... (3 more)

This matrix is generated automatically from the type annotations across all 12 services. It shows, at a glance, which features are fully covered and which have gaps. The 5 ACs with gaps are named, not just counted.
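
A runtime sketch of that aggregation — the article's real implementation is a compile-time Roslyn source generator; reflection stands in here only to keep the sketch self-contained, and the attribute is redefined locally with string keys for the same reason:

```csharp
using System;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Class)]
public sealed class ForRequirementAttribute : Attribute
{
    public string Feature { get; }
    public string Ac { get; }
    public ForRequirementAttribute(string feature, string ac) => (Feature, Ac) = (feature, ac);
}

// Two annotated implementations, as if from two different services.
[ForRequirement("OrderProcessingFeature", "OrderCreated")]
public sealed class OrderCreator { }

[ForRequirement("OrderProcessingFeature", "PaymentValidated")]
public sealed class PaymentChecker { }

public static class MatrixDemo
{
    public static void Main()
    {
        // Group every annotated class by feature — one matrix row per group.
        var rows = Assembly.GetExecutingAssembly().GetTypes()
            .Select(t => (Type: t, Attr: t.GetCustomAttribute<ForRequirementAttribute>()))
            .Where(x => x.Attr is not null)
            .GroupBy(x => x.Attr!.Feature)
            .Select(g => $"{g.Key}: {g.Count()} AC implementation(s)");

        foreach (var row in rows)
            Console.WriteLine(row);
    }
}
```

The generator version does the same grouping over syntax trees, so the matrix is rebuilt on every dotnet build with no runtime scanning.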

Why the Mono-Repo Matters

In a poly-repo (each service in its own repository), the typed approach still works — each repo references the Requirements NuGet package instead of a project reference. But the mono-repo has three advantages:

  1. Atomic changes. When you add an AC to OrderProcessingFeature, the compiler shows gaps in ALL 12 services at once. In a poly-repo, you'd need to publish the updated NuGet package and rebuild each service separately.

  2. Solution-level traceability. The matrix above is generated from one dotnet build. In a poly-repo, you'd need to aggregate results from 12 separate builds.

  3. Refactoring spans services. Rename OrderProcessingFeature.PaymentIsCharged to PaymentIsAuthorizedAndCaptured? In a mono-repo, one rename propagates across all 12 services instantly. In a poly-repo, each service must be updated separately after the NuGet package is published.

The Spec-Driven Equivalent at 12 Services

In the spec-driven approach, the same 12-service system needs:

  • One PRD with 45 features (or 12 per-service PRDs that must stay consistent)
  • One Testing spec (or 12 per-service testing specs)
  • One traceability document mapping features to services (manually maintained)
  • 12 CI pipeline configurations referencing the right spec sections
  • 12 context assembly configurations for AI agents

If you use one PRD, it becomes a 20,000+ word document that's hard to navigate and harder to keep current. If you use 12 per-service PRDs, cross-service features are duplicated and consistency is a manual audit.

The spec-driven approach was designed for simplicity — a few text files that describe the system. At 12 services, that simplicity inverts. The text files become the bottleneck: too large to navigate, too coupled to update, too scattered to audit.

The typed approach scales linearly: one Requirements project, one set of analyzers, one traceability matrix. Adding a 13th service means adding one <ProjectReference> — the compiler handles the rest.
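The wiring for that 13th service is a single MSBuild item. A sketch (project paths are illustrative):

```xml
<!-- NewService/NewService.csproj -->
<ItemGroup>
  <ProjectReference Include="..\MyCompany.Requirements\MyCompany.Requirements.csproj" />
</ItemGroup>
```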


The Refactoring Story

Refactoring is where specification systems are truly tested. Not adding new features — that's the easy case. The hard case is renaming, splitting, decomposing, and reorganizing existing features.

Let's trace a realistic refactoring scenario through both approaches.

The Scenario

The team needs to make three changes:

  1. Rename: UserRolesFeature is renamed to AccessControlFeature (the team realized "roles" is too narrow — they also manage permissions, scopes, and policies).

  2. Split an AC: The AC AdminCanAssignRoles is split into two ACs: AdminCanAssignRoleToUser (assigning a role to a single user) and AdminCanBulkAssignRoleToGroup (assigning a role to all users in a group).

  3. Decompose a service: The AuthorizationService is decomposed into RoleService (handles role assignments) and PolicyService (handles permission policies). Both still implement the same feature.

Spec-Driven Refactoring

Step 1: Rename the feature

Open the PRD. Find DEFINE_FEATURE(user_roles_management). Change it to DEFINE_FEATURE(access_control).

Now find every reference to user_roles_management across:

  • The PRD itself (internal cross-references)
  • The Testing spec (test strategy references)
  • The Context Engineering spec (context assembly references)
  • The traceability document (feature-to-service mapping)
  • Test files (comments, naming conventions, tags)
  • Code files (comments, naming conventions)
  • CI pipeline (if feature names are in pipeline config)
  • Documentation (wiki, ADRs, onboarding guides)

How many references exist? Unknown. You search for user_roles across the entire repository. You find 23 references. You update them. You miss 3 that use a slightly different form (userRoles, UserRoles, user-roles). These stale references become latent bugs — they reference a feature that no longer exists by that name.
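The variant-spelling gap is easy to reproduce. A toy sketch (file names and contents are hypothetical stand-ins for the real repository):

```shell
mkdir -p /tmp/drift_demo && cd /tmp/drift_demo
printf 'feature: user_roles_management\n' > prd.md         # snake_case, as in the PRD
printf 'var userRolesCache = LoadRoles();\n' > service.cs  # camelCase variant
printf 'tags: [user-roles]\n' > pipeline.yml               # kebab-case variant
grep -rl 'user_roles' .
# → ./prd.md   (the camelCase and kebab-case spellings slip through)
```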

Estimated effort: 1-2 hours. With a risk of missed references.

Step 2: Split the AC

Open the PRD. Find:

acceptance_criteria:
  - "Admin can assign roles to users"

Replace with:

acceptance_criteria:
  - "Admin can assign a role to a single user"
  - "Admin can bulk-assign a role to all users in a group"

Now update:

  • The traceability document (one AC becomes two)
  • The test files (one test becomes two test groups)
  • The implementation code (one method becomes two methods)
  • The context assembly configuration (if AC-level context exists)

The PRD change is trivial. Propagating it to tests and code is manual work. The spec-driven approach has no mechanism to tell you which tests and code files need updating. You search. You hope. You review.

Estimated effort: 2-3 hours. With a risk of incomplete propagation.

Step 3: Decompose the service

The AuthorizationService is split into RoleService and PolicyService. In the spec-driven approach:

  • Update the traceability document to map the feature to two services instead of one
  • Update the Testing spec if it references specific service architectures
  • Update the Context Engineering spec if context assembly is service-aware
  • Update the CI pipeline if it has per-service stages
  • Rename and reorganize test files

The spec-driven approach has no structural awareness of service decomposition. It's pure documentation and code reorganization — manual work with no validation that the result is consistent.

Estimated effort: 3-4 hours. With significant risk of inconsistency.

Total estimated effort: 6-9 hours. With residual risk of stale references, incomplete propagation, and inconsistent documentation.

Typed Specification Refactoring

Step 1: Rename the feature

In the IDE, right-click on UserRolesFeature and select "Rename" (or press F2). Type AccessControlFeature. Press Enter.

The IDE updates every reference across all projects:

Updated 47 references in 23 files:
  - MyCompany.Requirements/Features/AccessControlFeature.cs (definition)
  - MyCompany.Requirements/Stories/AssignRoleStory.cs (Feature<AccessControlFeature>)
  - MyCompany.Requirements/Stories/RevokeRoleStory.cs (Feature<AccessControlFeature>)
  - MyCompany.Identity/Specifications/IAccessControlSpec.cs ([ForRequirement(typeof(...))])
  - MyCompany.Identity/Domain/AuthorizationService.cs ([ForRequirement(typeof(...))])
  - MyCompany.Identity.Tests/AccessControlFeatureTests.cs ([TestsFor(typeof(...))])
  - MyCompany.Identity.Tests/AccessControlFeatureTests.cs ([Verifies(typeof(...))])
  - ... (16 more files)

Every typeof(UserRolesFeature) is now typeof(AccessControlFeature). Every nameof(UserRolesFeature.AdminCanAssignRoles) is now nameof(AccessControlFeature.AdminCanAssignRoles). The rename is atomic, complete, and verified by the compiler.

Estimated effort: 30 seconds. Zero risk of missed references.

Step 2: Split the AC

Replace the single abstract method with two:

// Before:
public abstract AcceptanceCriterionResult
    AdminCanAssignRoles(UserId actingUser, UserId targetUser, RoleId role);

// After:
public abstract AcceptanceCriterionResult
    AdminCanAssignRoleToUser(UserId actingUser, UserId targetUser, RoleId role);

public abstract AcceptanceCriterionResult
    AdminCanBulkAssignRoleToGroup(UserId actingUser, GroupId targetGroup, RoleId role);

The moment you rebuild, the compiler fires:

error CS0535: 'AuthorizationService' does not implement interface member
              'IAccessControlSpec.AssignRoleToUser(...)'

error CS0535: 'AuthorizationService' does not implement interface member
              'IAccessControlSpec.BulkAssignRoleToGroup(...)'

error REQ101: AccessControlFeature.AdminCanBulkAssignRoleToGroup has no
              matching spec method

warning REQ301: AccessControlFeature.AdminCanAssignRoleToUser has 0 tests
                with [Verifies]

warning REQ301: AccessControlFeature.AdminCanBulkAssignRoleToGroup has 0 tests
                with [Verifies]

error CS0117: 'AccessControlFeature' does not contain a definition for
              'AdminCanAssignRoles' (in test file, nameof reference)

The compiler tells you exactly what to update:

  1. Add AssignRoleToUser and BulkAssignRoleToGroup to the spec interface
  2. Implement both methods in AuthorizationService
  3. Update the test's nameof reference (the old AC name no longer exists)
  4. Write tests for both new ACs

Every step is guided. Nothing can be forgotten. The build doesn't succeed until every reference is updated.
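For step 4, one of the two new tests might look like this sketch (the [Verifies] signature, the [Fact] test framework, and the result type's IsSatisfied member are assumptions based on the attribute usage shown earlier):

```csharp
// Hypothetical test binding the new bulk-assign AC to a verifying test,
// so the REQ301 analyzer counts it. Names follow the document's pattern.
[Verifies(typeof(AccessControlFeature),
          nameof(AccessControlFeature.AdminCanBulkAssignRoleToGroup))]
[Fact]
public void BulkAssign_GrantsRoleToEveryUserInGroup()
{
    var result = service.AdminCanBulkAssignRoleToGroup(adminId, groupId, roleId);
    Assert.True(result.IsSatisfied);  // IsSatisfied is an assumed member
}
```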

Estimated effort: 20-30 minutes. Zero risk of incomplete propagation.

Step 3: Decompose the service

Create RoleService and PolicyService. Move the relevant interface implementations:

// Before: one service implements everything
[ForRequirement(typeof(AccessControlFeature))]
public class AuthorizationService : IAccessControlSpec { ... }

// After: two services, each implementing their share
[ForRequirement(typeof(AccessControlFeature))]
public class RoleService : IRoleSpec
{
    // Implements AdminCanAssignRoleToUser, AdminCanBulkAssignRoleToGroup
}

[ForRequirement(typeof(AccessControlFeature))]
public class PolicyService : IPolicySpec
{
    // Implements ViewerHasReadOnlyAccess, RoleChangeTakesEffectImmediately
}

The compiler verifies that every AC from AccessControlFeature is covered by at least one service's spec interface. The traceability matrix updates automatically:

AccessControlFeature:
  AC: AdminCanAssignRoleToUser       → RoleService (impl) + Tests ✓
  AC: AdminCanBulkAssignRoleToGroup  → RoleService (impl) + Tests ✓
  AC: ViewerHasReadOnlyAccess        → PolicyService (impl) + Tests ✓
  AC: RoleChangeTakesEffectImmediately → PolicyService (impl) + Tests ✓

No manual traceability update. No documentation changes. The types self-document.

Estimated effort: 30-45 minutes. Mostly spent on the actual code reorganization, not on updating references.

Total estimated effort: ~1 hour. Zero residual risk.

The Refactoring Comparison

┌───────────────────┬────────────────────────────────────────────┬───────────────────────────────────────────┐
│ Refactoring Step  │ Spec-Driven                                │ Typed Specifications                      │
├───────────────────┼────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Rename feature    │ 1-2 hours, risk of missed refs             │ 30 seconds, zero risk                     │
│ Split AC          │ 2-3 hours, risk of incomplete propagation  │ 20-30 minutes, compiler-guided            │
│ Decompose service │ 3-4 hours, significant inconsistency risk  │ 30-45 minutes, auto-validated             │
│ Total effort      │ 6-9 hours                                  │ ~1 hour                                   │
│ Residual risk     │ High (stale refs, missed updates, drift)   │ Zero (compiler validates all references)  │
│ Confidence        │ Low (manual verification needed)           │ Total (if it compiles, it's consistent)   │
└───────────────────┴────────────────────────────────────────────┴───────────────────────────────────────────┘

The effort ratio is roughly 7:1. For a single refactoring event, this saves a day of work. Over the lifetime of a project with monthly refactoring, the cumulative savings are measured in person-weeks.

But the more important metric is confidence. After the spec-driven refactoring, someone should audit the result — read every file, check every reference, verify every test. Nobody does this. In the typed approach, the compiler IS the audit. If the build passes, the refactoring is complete.
