
The Verdict: When to Use Which

Ten parts of analysis. Hundreds of comparisons. The honest answer is: both approaches are better than no specification system at all. But they're not equivalent, and choosing the wrong one for your context wastes time, money, and developer goodwill.

This part provides a decision framework, an honest verdict, and a hybrid strategy for teams that want the best of both.


The Grand Comparison Table

Dimension               | Spec-Driven                      | Typed Specifications               | Winner
------------------------|----------------------------------|------------------------------------|-------------------------------------
Initial cost            | Low (templates)                  | High (generators, analyzers)       | Spec-driven
Ongoing cost            | High (document maintenance)      | Low (compiler does the work)       | Typed
Requirement precision   | English sentences                | Typed method signatures            | Typed
Drift prevention        | Weak (documents diverge)         | Strong (types are code)            | Typed
Enforcement timing      | Post-hoc (CI quality gates)      | Pre-hoc (compile-time)             | Typed
Enforcement granularity | Threshold-based (80% coverage)   | Per-AC (specific diagnostic)       | Typed
Testing guidance        | Excellent (15+ strategies)       | Minimal (chain enforcement only)   | Spec-driven
Language support        | Any language                     | C# (primary), TypeScript (port)    | Spec-driven
Learning curve          | Illusory low (6 custom grammars) | Medium-high (one language: C#)     | Contested (see Part IX)
Team size scaling       | Poor (drift accelerates)         | Good (compiler enforces)           | Typed
AI agent feedback       | Coarse (quality gate pass/fail)  | Fine (per-AC compiler diagnostic)  | Typed
Documentation           | Separate artifact (can drift)    | Types ARE documentation            | Typed
Non-functional specs    | Strong (perf, security, ops)     | Extensible (DSLs can be added)     | Spec-driven today; typed long-term
Compliance/audit        | Manual traceability              | Source-generated matrix            | Typed
Cross-language projects | Natural fit                      | Requires hybrid approach           | Spec-driven
IDE integration         | External (read docs)             | Native (Ctrl+Click, diagnostics)   | Typed
Refactoring safety      | Manual find-replace              | IDE propagation via typeof/nameof  | Typed
Breadth of coverage     | Wide (all development concerns)  | Deep (requirement chain only)      | Spec-driven for breadth; typed for depth

Score: typed specifications win 11 dimensions; spec-driven wins 6 (counting the two split verdicts, where it leads today); the learning curve is contested. But not all dimensions are equally important, and the weighting depends on your context.


Choose Spec-Driven When:

1. Your team is multi-language. If your system includes Python ML pipelines, Go microservices, React frontends, and a C# backend, the typed approach only covers the C# portion. The spec-driven approach covers everything uniformly.

2. Your team is new to specification systems — but acknowledge the hidden cost. The spec-driven templates feel immediately accessible. But as Part IX explains, learning DEFINE_FEATURE(...) with acceptance_criteria: [...] IS learning a grammar — just one without a compiler to teach you when you get it wrong. The perceived simplicity is real, but the learning curve advantage is smaller than it appears. Both approaches require learning a language; the question is whether that language has a compiler.

3. You need testing strategy guidance. If your team doesn't know about mutation testing, chaos engineering, or property-based testing, the Testing-as-Code specification is an excellent teaching document. The typed approach doesn't teach strategies — it enforces coverage. But note: over time, typed DSLs can be built for any testing domain (chaos, load, fuzz — see Part V). The spec-driven advantage here is transitional, not permanent.

4. Your project is short-lived. For a three-month project, the typed approach's upfront investment doesn't pay off. The spec-driven templates provide immediate structure at minimal cost.

5. Your product owners need to read specifications directly — but reconsider. The conventional wisdom says "POs can't read C#." Part VII challenges this: the Requirements DSL uses method names that read as English sentences (CustomerCanCancelPlacedOrConfirmedOrder), XML comments that ARE the plain-English spec, and a hierarchy that's visible in the type names. A PO learning to read DEFINE_FEATURE(order_cancellation) is learning a grammar. A PO learning to read record OrderCancellationFeature is also learning a grammar. The C# grammar happens to come with precision guarantees the text grammar lacks. Still — if your POs refuse to look at code, spec-driven documents are the pragmatic choice.
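To make the readability claim concrete, here is a minimal sketch of what a PO-facing feature record might look like. Only AcceptanceCriterionResult and the AC method names come from this series; the [Feature] and [PartOf] attributes and the parent epic are illustrative assumptions, not the DSL's actual API.

// Hedged sketch — attribute names and the parent epic are assumptions.

/// <summary>A customer can cancel an order that is Placed or Confirmed.</summary>
[Feature("order-cancellation")]                 // hypothetical attribute
[PartOf(typeof(OrderManagementEpic))]           // hypothetical parent epic
public abstract partial record OrderCancellationFeature
{
    /// <summary>Given a Placed or Confirmed order, cancellation succeeds
    /// and the order transitions to Cancelled.</summary>
    public abstract AcceptanceCriterionResult CustomerCanCancelPlacedOrConfirmedOrder();

    /// <summary>A cancelled order triggers a full refund.</summary>
    public abstract AcceptanceCriterionResult CancellationTriggersFullRefund();
}

Read top to bottom, the XML comments are the plain-English spec and the method names are the sentences the PO already knows; the "grammar" to learn is the bracketed annotations around them.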

6. You need operational specifications — for now. Performance targets, SLA thresholds, deployment strategies, monitoring configurations — these are non-functional specifications that the typed approach doesn't cover today. The spec-driven approach has explicit sections for all of these. But as Part V explains, every one of these domains is a DSL waiting to be written. [PerformanceBudget(P95 = "200ms")] is not science fiction — it's the same pattern as [AggregateRoot].

Choose Typed Specifications When:

1. Your team writes C# (or TypeScript, or another strongly-typed language). The typed approach requires a language with generics, attributes/decorators, and a compilation step. C# is the primary implementation. TypeScript is a viable alternative. Rust could work with proc macros. Python and JavaScript cannot provide the compile-time enforcement.

2. Correctness is more important than breadth. If your team needs to guarantee that every acceptance criterion is implemented and tested — for compliance, for safety-critical systems, for contractual obligations — the typed approach provides structural guarantees that documents cannot.

3. Your project is long-lived. Over 12+ months, the typed approach's zero-drift guarantee and low maintenance cost compound. The spec-driven approach's documents will drift unless actively maintained — and "actively maintaining specifications" is a commitment few teams sustain.

4. Your team is 10+ developers. Large teams experience document drift faster because more people means more opportunities for divergence. The typed approach scales with team size because the compiler enforces requirements regardless of who writes the code.

5. You use AI agents extensively. AI agents produce better output with typed specifications because the feedback loop is tighter (seconds vs minutes), more specific (per-AC vs threshold), and more reliable (compiler vs quality gate). If AI-assisted development is central to your workflow, the typed approach amplifies the AI's effectiveness.

6. Traceability is a hard requirement. If your domain requires a traceability matrix — medical devices (IEC 62304), automotive (ISO 26262), aviation (DO-178C), financial services (SOX) — the source-generated matrix is always correct. A manually maintained traceability document is an audit risk.

7. You're tired of maintaining documents. This is the emotional argument, but it's real. If your team has a graveyard of outdated wikis, abandoned ADRs, and "we'll update the docs later" tickets in the backlog, the typed approach eliminates the document maintenance burden. The types maintain themselves.


The Hybrid Strategy

The most pragmatic path for many teams starts as a hybrid: spec-driven documents for what types don't yet cover, typed specifications for what documents can't enforce. As the layers below show, though, the document half of that hybrid can shrink until a single bootstrap file remains.

Layer 1: Typed Specifications for the Requirement Chain

MyApp.Requirements/
├── Epics/
│   └── PlatformScalabilityEpic.cs
├── Features/
│   ├── UserRolesFeature.cs
│   ├── PasswordResetFeature.cs
│   └── OrderProcessingFeature.cs
├── Stories/
│   └── ...
└── Base/
    ├── RequirementMetadata.cs
    └── AcceptanceCriterionResult.cs

→ Roslyn analyzers enforce: every AC has a spec, impl, and test
→ Source generator produces: traceability matrix, coverage report
→ Compiler prevents: missing implementations, stale tests, orphan code

The requirement→specification→implementation→test chain is typed. The compiler enforces it. This is the core of the system — where correctness matters most and drift is most dangerous.
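As a sketch of what that chain looks like in code: the [ForRequirement] and [Verifies] attributes and the REQ/CS0535 diagnostics are from this series; the interface, service, and test names are illustrative assumptions.

// Hedged sketch — member shapes and names are illustrative.

[ForRequirement(typeof(OrderCancellationFeature))]
public interface IOrderCancellationSpec
{
    // The analyzer flags this interface if an AC on the feature
    // has no corresponding member here (REQ1xx family).
    CancellationResult Cancel(OrderId orderId);
}

public sealed class OrderCancellationService : IOrderCancellationSpec
{
    // CS0535 fires until this member exists.
    public CancellationResult Cancel(OrderId orderId)
    {
        // domain logic elided
        throw new NotImplementedException();
    }
}

public sealed class OrderCancellationTests
{
    [Fact]
    [Verifies(typeof(OrderCancellationFeature),
        nameof(OrderCancellationFeature.CustomerCanCancelPlacedOrConfirmedOrder))]
    public void Cancelling_a_placed_order_succeeds()
    {
        // arrange/act/assert elided; without this [Verifies] link,
        // the analyzer reports the AC as untested (REQ3xx family).
    }
}

Because every link is typeof/nameof rather than a string, renaming the feature or an AC method propagates through the whole chain in one IDE refactoring.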

Layer 2: Operational DSLs for Everything Else

The conventional hybrid advice would say "use spec-driven documents for operational concerns." But that's selling typed specifications short. Everything can be a DSL. C# is DSL-friendly by design — attributes, generics, source generators, and analyzers make any domain expressible as typed code. (For a working implementation of operational DSLs — Deployment, Migration, Observability, Configuration, and Resilience — see Auto-Documentation from a Typed System, Parts IV-V.)

Consider what "operational concerns" actually are:

  • Performance targets = typed constraints on endpoints
  • Monitoring = typed alert definitions with thresholds and escalation paths
  • Deployment = typed rollout strategies with rollback conditions
  • Load testing = typed scenarios with user counts, ramp-up curves, and budgets
  • Chaos engineering = typed failure injection experiments with expected recovery
  • Security = typed scanning rules with severity levels and remediation owners

Every one of these is a "when X, then Y" rule — which is exactly what a typed DSL expresses. Here's what the operational layer looks like when fully typed:

// Performance DSL
[PerformanceBudget("OrderCancellation")]
[Endpoint("POST /api/orders/{id}/cancel")]
[ResponseTime(P95 = "200ms", P99 = "500ms")]
[Throughput(MinRPS = 100)]
[ForRequirement(typeof(OrderCancellationFeature))]
public partial class OrderCancellationPerformance { }

// Monitoring DSL
[Alert("OrderCancellationFailureRate")]
[Metric("order.cancellation.failure_rate")]
[Threshold(Warning = 0.05, Critical = 0.10)]
[Escalation(Warning = "slack:#orders", Critical = "pagerduty:commerce-oncall")]
[Runbook("When cancellation failure rate exceeds threshold, check PaymentGateway health")]
public partial class CancellationFailureAlert { }

// Chaos DSL
[ChaosExperiment("PaymentGateway_Timeout")]
[TargetService("PaymentGateway")]
[FailureMode(FailureType.Timeout, Duration = "30s")]
[ExpectedBehavior(Degradation.GracefulFallback)]
[RequiresResilience(typeof(OrderCancellationFeature),
    nameof(OrderCancellationFeature.CancellationTriggersFullRefund))]
public partial class PaymentTimeoutExperiment { }

// Load Testing DSL
[LoadTest("CancellationPeakTraffic")]
[Endpoint("POST /api/orders/{id}/cancel")]
[LoadProfile(ConcurrentUsers = 500, RampUp = "60s", Duration = "10m")]
[PerformanceBudget(P95 = "300ms", ErrorRate = 0.01)]
[VerifiesNonFunctional(typeof(OrderCancellationFeature))]
public partial class CancellationLoadTest { }

// Deployment DSL
[DeploymentStrategy("Commerce")]
[Rollout(Strategy.BlueGreen, CanaryPercentage = 10)]
[HealthCheck(Endpoint = "/health", Timeout = "5s", Retries = 3)]
[RollbackCondition(ErrorRate = 0.05, ResponseTime_P95 = "1s")]
public partial class CommerceDeployment { }

These DSLs generate:

  • Performance budgets → load test configurations + CI gate thresholds
  • Alert definitions → Prometheus/Grafana alert rules + PagerDuty integrations
  • Chaos experiments → Litmus/Gremlin experiment definitions + recovery assertions
  • Load tests → k6/Locust test scripts + pass/fail criteria
  • Deployment strategies → Kubernetes manifests + Argo Rollout configurations

The source generator ensures that:

  • Performance budgets reference real endpoints (compile error if endpoint doesn't exist)
  • Alerts reference valid metrics (compile error if metric name is wrong)
  • Chaos experiments reference real features (compile error if feature is deleted)
  • Load tests reference features with matching performance budgets
  • Deployment health checks reference real endpoints

Operational knowledge is not special. It follows the same pattern as every other DSL: attribute → source generator → generated artifact → analyzer. "When the error rate exceeds 5%, page the oncall engineer" is a typed rule, not a paragraph in a markdown document. A markdown document can be wrong and nobody will know until 3 AM. A typed alert definition is validated at compile time and generates the correct Prometheus rules.

The honest answer to "what about operational concerns?" is not "use documents" — it's "build the DSL." Yes, building DSLs takes effort. But that effort produces compiler-checked, drift-proof, generated operational artifacts. A document produces... a document. That someone might read. Someday.
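Building such a DSL is less exotic than it sounds, because a DSL attribute is ordinary C#. Here is a sketch of what the [Threshold] attribute used above might look like; the property names mirror that usage but are assumptions, not a published API.

// Hedged sketch — a DSL "keyword" is just an attribute class.

[AttributeUsage(AttributeTargets.Class)]
public sealed class ThresholdAttribute : Attribute
{
    public double Warning  { get; set; }   // e.g. 0.05
    public double Critical { get; set; }   // e.g. 0.10
}

// A source generator reads these values from the Roslyn semantic model and
// emits the Prometheus rule file at build time; a companion analyzer can
// reject nonsense such as Warning >= Critical with a compile error.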

Project structure (fully typed):

MyApp.sln
├── src/
│   ├── MyApp.Requirements/           ← Features, ACs
│   ├── MyApp.SharedKernel/           ← Domain types
│   ├── MyApp.Specifications/         ← Spec interfaces
│   ├── MyApp.Domain/                 ← Implementations
│   ├── MyApp.Api/                    ← Controllers
│   ├── MyApp.Operations/             ← Operational DSLs
│   │   ├── Performance/              ← [PerformanceBudget] definitions
│   │   ├── Monitoring/               ← [Alert] definitions
│   │   ├── Chaos/                    ← [ChaosExperiment] definitions
│   │   ├── LoadTests/                ← [LoadTest] definitions
│   │   └── Deployment/               ← [DeploymentStrategy] definitions
│   └── MyApp.Operations.Generators/  ← SGs for operational artifacts
├── test/
│   └── MyApp.Tests/                  ← [Verifies] tests
└── tools/
    └── MyApp.Requirements.Analyzers/ ← REQ1xx-REQ4xx

Zero markdown files. Zero drift. Everything compiled, generated, validated.

Layer 3: Bootstrap Context for AI Agents

.claude/
└── CLAUDE.md                         ← Bootstrap context (50 lines)
    Includes:
    ├── "Features are types, ACs are methods"
    ├── "Read compiler diagnostics, they tell you what to do"
    ├── "Operational DSLs: Performance, Monitoring, Chaos, LoadTest, Deployment"
    └── Links to DSL base type definitions

The CLAUDE.md is a bootstrap document — a 50-line explainer that teaches the AI the convention system. It's the only document in the project. Everything else is typed.

How the Fully Typed Approach Works in Practice

Developer adds new feature:

 1. Creates OrderCancellationFeature.cs (typed)     → Compiler enforces chain
 2. Compiler fires REQ101                           → Developer adds spec
 3. Compiler fires CS0535                           → Developer implements
 4. Compiler fires REQ301                           → Developer writes tests
 5. Build succeeds — functional chain complete

Developer adds operational concerns:

 6. Creates CancellationPerformance.cs              → SG generates load test config
 7. Creates CancellationFailureAlert.cs             → SG generates Prometheus rules
 8. Creates PaymentTimeoutExperiment.cs             → SG generates chaos experiment
 9. Build succeeds — operational chain complete

Generated artifacts:

10. k6 load test script (from [LoadTest] attributes)
11. Prometheus alert rules (from [Alert] attributes)
12. Grafana dashboard panels (from [Alert] + [Metric])
13. Litmus chaos experiment YAML (from [ChaosExperiment])
14. Kubernetes rollout config (from [DeploymentStrategy])

All validated at compile time. All drift-proof.

No documents updated. No markdown drift. No "someone forgot to update the monitoring page." The operational knowledge lives in the type system, is validated by the compiler, and generates the actual operational artifacts.

The typed system handles the structural chain (where drift is dangerous). The operational DSLs handle the non-functional layer (where breadth matters). The bootstrap context bridges both for AI agents.


What the Spec-Driven Approach Gets Right

Credit where credit is due. The cogeet-io framework makes several genuinely good contributions:

  1. The diagnosis is correct. "Most AI agent failures aren't model failures — they're context failures." This is the most important insight in AI-assisted development. Every team should internalize it.

  2. The Context Engineering pillar is novel. Systematizing context assembly for AI agents is real engineering. Most teams do this ad-hoc. Having a framework for it — even a document-based one — is progress.

  3. The Testing-as-Code breadth is valuable. 15+ testing strategies in one place, with metrics, practices, and language-specific patterns. This is an excellent reference document, regardless of whether you adopt the full framework.

  4. The template approach is accessible. Any team, any language, any project can start using these templates today. No tooling investment, no language lock-in, no learning curve beyond reading comprehension.

  5. The explicit quality metrics are helpful. "Mutation score > 80%", "flakiness rate < 0.01", "p95 < 2s" — concrete numbers that teams can adopt and measure against. This is better than vague goals like "high quality."


What Typed Specifications Get Right

And the typed approach's contributions:

  1. The compiler is the best enforcer. No CI pipeline, no quality gate, no code review process is as reliable as the compiler. If it compiles, the structure is correct. If it doesn't, you know exactly why.

  2. Types eliminate an entire category of drift. The requirement→spec→impl→test chain cannot diverge because it's a single structure. This is not an incremental improvement over documents — it's a categorical elimination of a failure mode.

  3. IDE navigation replaces documentation. Ctrl+Click from test to AC to feature to epic is faster, more reliable, and always current. It's documentation that cannot lie.

  4. Per-AC granularity is transformative. "This specific acceptance criterion has no test" is infinitely more actionable than "your coverage is 78%." The specificity changes developer behavior — instead of gaming thresholds, they address specific gaps.

  5. AI agents benefit disproportionately. The tight feedback loop (seconds, not minutes), specific diagnostics (per-AC, not threshold), and structural constraints (must compile, not just pass gates) make AI agents significantly more effective.

  6. C# IS the DSL. The biggest misconception is that typed specifications require "custom tooling." But C# IS the tooling. The Requirements DSL is C# — the same language, the same IDE, the same compiler, the same debugger. Product owners who learn to read abstract record UserRolesFeature are learning a domain language that happens to be compilable. That's not a cost — it's a superpower.


The Elephant in the Room: "As Code" Must Mean Compilable

Both approaches use the phrase "as Code." Specification as Code. Testing as Code. Context Engineering as Code. Requirements as Code. But they mean radically different things by "Code."

In the spec-driven framework, "as Code" means structured text that is version-controlled. The specifications live in a git repository, are diffable, are reviewable, and follow a defined structure. This is "as Code" in the same sense that Infrastructure as Code uses YAML files — the text is treated with code-like discipline, but it isn't executable.

In the typed specification approach, "as Code" means compiled, type-checked, executable artifacts that produce compiler errors when incorrect. The requirements are C# records. The ACs are abstract methods. The traceability matrix is generated at compile time. This is "as Code" in the literal sense — it is code.

The distinction matters because "as Code" carries a promise: that the specification has the same properties as code — verifiable, executable, testable, refactorable. Text files in a git repo fulfill some of these properties (verifiable via review, refactorable via find-replace) but not others (not executable, not testable, not compiler-checked).

A text file called Testing-as-Code.txt is not code. It's a text file about code. Calling it "as Code" is aspirational — it describes what the file wishes it were, not what it is. The honest name would be Testing-as-Specification-Document.txt. But that doesn't sound as compelling.

This isn't a cheap shot. It's a fundamental architectural observation: the word "Code" implies executability, and executability implies a compiler. If your specifications don't have a compiler, they're documents, not code. And documents drift. That's the thesis restated one final time.

The typed approach can use the phrase "as Code" without qualification. Requirements as Code? Yes — C# code that compiles, generates, and is enforced by Roslyn analyzers. Testing as Code? Yes — test attributes that the compiler validates against feature types. The phrase means what it says.


What We Actually See in the Wild

After analyzing both approaches theoretically, it's worth looking at what actually happens in practice.

Spec-Driven in the Wild

The cogeet-io repository has 28 stars on GitHub. It's a v1.0 release with six .txt files and a README. There are no implementation examples, no reference projects, no community-built tooling on top of it.

This doesn't mean the ideas are bad — it means the framework is a starting point, not a proven system. The specifications describe what should be built, but the building hasn't happened yet. The Context Engineering pillar has a 78-week roadmap that hasn't been executed. The auto-fix options in the Testing spec are described but not implemented.

This is the fundamental challenge of document-based specification frameworks: the document describes the system; someone still has to build the system. The gap between the spec and the implementation is the exact gap the spec was supposed to close.

Typed Specifications in the Wild

The typed specification approach exists as working code. The Requirements as Code series describes a running system with:

  • Feature records with abstract AC methods
  • Specification interfaces with [ForRequirement] attributes
  • Domain implementations that satisfy the interfaces
  • Tests with [Verifies] attributes
  • Roslyn analyzers that enforce the chain
  • Source-generated traceability matrices

The system is self-hosting: the Content Management Framework uses its own Requirements DSL to track its own requirements. That's the strongest validation possible — the tool works well enough that its author uses it to build itself.

This doesn't mean the typed approach is mature or widely adopted — it's still a single-author project. But it's working code, not working documents. The gap between "described" and "built" is closed.


The Final Verdict

For teams at Level 0-1 (see the Specification Maturity Model below): the spec-driven approach gets you to Level 2 quickly. It's accessible and provides immediate structure. Use the PRD template, adopt the testing strategies, implement the quality gates. This alone is a massive improvement.

For teams at Level 2 that write C# or TypeScript: stop maintaining documents and start typing your specifications. Adopt typed requirements for the core chain. The upfront investment is real, but the payoff is transformative — and it compounds as you add more DSLs.

For teams at Level 3: extend the type system to operational concerns. Performance budgets, monitoring alerts, chaos experiments, deployment strategies — everything is a DSL. Don't settle for "documents for non-functional specs." Build the DSLs. C# is DSL-friendly. The M3 meta-metamodel is extensible. Every new DSL reuses the same infrastructure and brings another domain under compiler enforcement.

For everyone: internalize the core insight. AI agents need structured context. Requirements must be explicit. Testing must be systematic. The gap between "what we decided" and "what we built" is the root cause of most failures. Documents close this gap partially. Types close it completely. Choose types.


The Progression

There's a natural progression that many teams follow:

Stage 1: No specifications
  → Everything is in Jira tickets and developer memory
  → AI agents get ad-hoc context
  → Requirements drift immediately

Stage 2: Spec-driven documents
  → PRD templates, testing strategies, coding practices
  → AI agents get structured context
  → Requirements drift slowly (months)

Stage 3: Typed specifications
  → Features as types, ACs as methods, compiler enforcement
  → AI agents get type-constrained context
  → Requirements cannot drift (structural guarantee)

Stage 4: Full DSL Chain
  → Operational DSLs: Performance, Chaos, Monitoring, Deployment
  → Everything compiled, generated, validated
  → Zero documents. Zero drift. Maximum enforcement.

Most teams are at Stage 1. The spec-driven framework helps them reach Stage 2. The typed specification approach helps them reach Stage 3 and 4.

The important thing is to move forward — from wherever you are. But know the destination: everything is a DSL. Everything can be compiled. Everything can be enforced by the type system. Documents are a way station, not the destination.


One Last Thing

The spec-driven framework claims "10x improvement in AI task success rates." That's a marketing number. But here's a number that isn't marketing:

In a typed specification system, the compiler catches 100% of structural defects in the requirement chain. Not 80%. Not 95%. 100%. Every missing spec. Every missing implementation. Every missing test. Every stale reference. Every orphan test. Every hierarchy violation.

No document, no quality gate, no CI pipeline, no AI agent can make that claim. Only the compiler can. And that's not a metric — it's a mathematical guarantee.

Documents describe what should be true. Types enforce what must be true. In the long run, "must be" always wins.


The Specification Maturity Model

After analyzing both approaches across all of the dimensions above, a pattern emerges: they're not just different tools — they're different levels of maturity. Most teams progress through a predictable sequence of specification practices, each level building on the previous.

The Five Levels

SPECIFICATION MATURITY MODEL

  Level 4 ─── Full Typed DSL Chain ─────────────────────────────── ●
              Req → Spec → Impl → Test → Generated Code
              All DSLs (DDD, Admin, Workflow, Content, Pages)
              M3 meta-metamodel, self-describing system
              Generated: traceability, UI, API, EF Core, CQRS
              │
  Level 3 ─── Typed Requirements ───────────────────────────── ●
              Features as types, ACs as methods
              [ForRequirement] + [Verifies] chain
              Roslyn analyzers (REQ1xx-REQ4xx)
              Source-generated traceability matrix
              │
  Level 2 ─── Structured Documents ─────────────────────── ●
              PRD templates, testing strategies
              Quality gates in CI/CD
              Context engineering for AI agents
              (Spec-driven / cogeet-io sits here)
              │
  Level 1 ─── Ad-Hoc Documents ────────────────────── ●
              Jira tickets with AC bullet points
              Wiki pages (updated sporadically)
              README files (often stale)
              ADRs (written once, rarely revisited)
              │
  Level 0 ─── No Specifications ─────────────────── ●
              Requirements in developer memory
              "We'll figure it out as we go"
              AI agents get raw codebase as context

Level 0: No Specifications

Characteristics: Requirements live in conversations, Slack threads, and developer memory. There is no artifact that says "this is what we're building." An AI agent given this codebase gets raw source files with no context about intent.

Team profile: Early-stage startups, hackathon projects, solo developers who hold the entire system in their head. Also: teams that once had documentation but abandoned it.

What works: Speed. No overhead, no ceremony, no maintenance cost. The developer writes code and ships it.

What breaks: Everything, eventually. New team members have no onramp. Features are duplicated because nobody knows what already exists. Bug investigations start with "ask the person who wrote it" — and when that person leaves, the knowledge evaporates.

AI agent effectiveness: Low. The AI gets source code and nothing else. It can read the code, but it doesn't know why it exists, what it should do, or what the acceptance criteria are. The AI generates code that compiles but may not match the team's intent.

Level 1: Ad-Hoc Documents

Characteristics: Some documentation exists. Jira tickets have acceptance criteria (as bullet points). There's a wiki with architecture diagrams (from 6 months ago). ADRs exist for the 5 biggest decisions. README files in some repositories.

Team profile: Most professional software teams. Documentation is valued in principle but deprioritized in practice. "We should write docs" is a sentiment; "we wrote docs" is rare.

What works: The big-picture information is available somewhere. A new developer can read the wiki to understand the architecture (if it hasn't drifted). Jira tickets provide some AC traceability (if you trace manually).

What breaks: Drift. The wiki's architecture diagram shows the system from 6 months ago — it doesn't include the two services added since. The ADR recommends a pattern that was abandoned. The Jira AC says "user can log in" but the implementation uses SSO, which isn't mentioned in the ticket.

AI agent effectiveness: Moderate. If the AI is given the relevant Jira ticket, it has context. If it's given the wrong ticket, or a stale wiki page, it generates code based on outdated assumptions. The context assembly is manual — someone pastes the right information into the prompt.

Level 2: Structured Documents (Spec-Driven)

Characteristics: Comprehensive specification documents with defined structure. PRD templates with sections for features, ACs, non-functional requirements. Testing strategies with metrics and thresholds. Quality gates in CI/CD pipelines. Context engineering for AI agents.

Team profile: Teams that have adopted the spec-driven framework or equivalent structured approaches. Teams with compliance requirements. Teams that take process seriously.

What works: Context assembly is systematic, not ad-hoc. AI agents get structured specifications. Quality gates catch coarse-grained issues (coverage below threshold, tests failing). The PRD serves as a comprehensive reference for what the system should do.

What breaks: Document-code drift. The specs are comprehensive on Day 1 and progressively stale thereafter. The quality gates check proxies (coverage percentage) not targets (specific AC coverage). The context engineering infrastructure requires its own maintenance.

AI agent effectiveness: Good. Structured context means the AI gets relevant, organized information. But the quality of the output depends on the freshness of the documents. Stale context produces stale code.

Upgrade path to Level 3: The team realizes they're spending 10-15% of their time maintaining specification documents. They want the specifications to self-maintain. They adopt typed requirements for the core chain (features → specs → impls → tests). The documents become redundant — not because someone deletes them, but because nobody reads them anymore. The types are the source of truth.

Level 3: Typed Requirements

Characteristics: Features are C# types. ACs are abstract methods. Specifications are interfaces linked via [ForRequirement]. Tests are linked via [Verifies]. Roslyn analyzers enforce the chain at compile time. Source-generated traceability matrix.

Team profile: C# or TypeScript teams. 10+ developers. Long-lived products. Compliance-sensitive domains. Teams that adopted spec-driven documents and graduated to typed enforcement.

What works: The requirement chain cannot drift. Missing specs, missing tests, and stale references are compile errors. Refactoring is safe (IDE rename propagates everywhere). Traceability is always current. AI agents get type-constrained context with per-AC compiler feedback.

What breaks: Non-functional requirements are not yet DSLs. Operational specifications (deployment, monitoring, performance targets) could be typed but the DSLs haven't been built yet. The team recognizes the gap and starts planning Level 4.

AI agent effectiveness: Excellent. The compiler guides the AI step by step: add AC → compiler says "add spec" → compiler says "implement" → compiler says "test." The feedback is specific, immediate, and ungameable.
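A minimal sketch of that loop, with illustrative diagnostics in the series' REQ3xx range (the exact IDs and wording here are assumptions):

```
error REQ301: AC 'OrderCancellationFeature.CancellationTriggersFullRefund'
              has no [ForRequirement] specification.
error REQ302: Specification 'ICancelOrderSpec.TriggerFullRefund' has no
              implementing class.
error REQ303: AC 'OrderCancellationFeature.CancellationTriggersFullRefund'
              has no [Verifies] test.
```

Each diagnostic disappears as the corresponding artifact is added, so the agent (or the developer) always knows the single next step.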

Upgrade path to Level 4: The team realizes two things: (1) they're writing repetitive boilerplate — CQRS handlers, EF Core configurations, admin UI, workflow state machines; and (2) their operational concerns (performance, monitoring, deployment) are still in markdown files that nobody updates. They want the types to generate more code AND to absorb operational domains. They adopt additional DSLs — functional (DDD, Admin, Workflow, Content, Pages) and operational (Performance, Monitoring, Chaos, Deployment) — that produce complete implementations from attribute annotations. The last markdown file gets deleted.

Level 4: Full Typed DSL Chain

Characteristics: Multiple DSLs — each as a set of attributes processed by source generators. The DDD DSL generates entities, builders, CQRS handlers, EF Core configs, repositories. The Admin DSL generates Blazor UI. The Workflow DSL generates state machines. The Content DSL generates CMS components. The Pages DSL generates widget infrastructure. The Requirements DSL ties it all together. The M3 meta-metamodel makes the system self-describing.

Team profile: Teams building frameworks, platforms, or products where the domain model IS the system. Teams with Roslyn expertise. Teams where the initial investment (building generators) pays off across dozens of entities and hundreds of features.

What works: From ~20 lines of attributed C#, the generators produce ~200+ lines of correct, consistent, type-safe code. The entire system — from requirement to generated UI — is a single compiled artifact. Adding a new entity means adding attributes; the generators handle the rest. Adding a new DSL concept means adding a [MetaConcept] attribute; the M3 registry auto-discovers it.

What breaks: Less than you'd think. Generators are complex software — but they're also AI-assisted software. Claude Code, Copilot, and similar tools excel at writing Roslyn source generators because the pattern is repetitive and well-documented. Generator correctness is unit-tested: feed a test attribute set to the generator, assert the output matches expected code. A generator bug still affects the whole system, but the blast radius is caught by generator unit tests before it reaches production code. The learning curve exists — but new team members consume the DSL (write [AggregateRoot]), they don't need to understand the generator internals. Only the framework team touches generators, and they have AI to help.
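A generator unit test follows a standard Roslyn pattern: compile a small input, run the generator, assert on the emitted source. A sketch using xUnit, where HasBuilderGenerator is the hypothetical generator under test (not part of the shipped framework described above):

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Xunit;

public class HasBuilderGeneratorTests
{
    [Fact]
    public void EmitsBuilder_ForAggregateWithHasBuilder()
    {
        // The attribute set fed to the generator.
        var input = CSharpSyntaxTree.ParseText("""
            [AggregateRoot]
            [HasBuilder]
            public partial class Order { }
            """);
        var compilation = CSharpCompilation.Create("tests", new[] { input });

        // Run the generator exactly as the compiler would.
        var driver = CSharpGeneratorDriver.Create(new HasBuilderGenerator());
        driver.RunGeneratorsAndUpdateCompilation(
            compilation, out var output, out var diagnostics);

        Assert.Empty(diagnostics);
        var generated = output.SyntaxTrees.Last().ToString();
        Assert.Contains("class OrderBuilder", generated);
    }
}
```

The same pattern scales to snapshot tests: assert the full generated file against a checked-in expected output, so any change to the generator's emission is a reviewable diff.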

AI agent effectiveness: Maximum. The AI writes attributed C# — 20 lines. The generators produce 200+ lines. The compiler validates the result. The traceability matrix confirms coverage. The AI's task is reduced from "write everything" to "write the specification" — and the specification IS the input to the generator. This is the theoretical optimum: the AI writes the minimum necessary input, and the toolchain produces the maximum correct output.

The Progression as an Investment Curve

Output         Level 4: Full DSL chain
quality        ┌──────────────────────────────●
and            │                            / |
enforcement    │                           /  |
               │            Level 3: Typed /   |
               │            requirements  ●    |
               │                        / |    |
               │                       /  |    |
               │        Level 2:      /   |    |
               │        Spec-driven  ●    |    |
               │                   / |    |    |
               │                  /  |    |    |
               │   Level 1:      /   |    |    |
               │   Ad-hoc docs  ●    |    |    |
               │              / |    |    |    |
               │  Level 0:   /  |    |    |    |
               │  Nothing   ●   |    |    |    |
               │            |   |    |    |    |
               └────────────┼───┼────┼────┼────┼──────→ Investment
                          Zero Low  Med  High  Very High

Each level requires more investment and delivers more enforcement. The key insight: there is no shortcut from Level 0 to Level 4. Teams that try to jump directly to typed DSL chains without understanding structured requirements (Level 2) and typed requirements (Level 3) fail — they build tooling they don't know how to use.

The spec-driven framework is an excellent vehicle from Level 0/1 to Level 2. The typed specification approach is the vehicle from Level 2 to Level 3 and Level 4. They're not competitors — they're stages in a maturation journey.

Mapping Levels to Team Decisions

Decision Factor    | Level 0-1              | Level 2                          | Level 3                           | Level 4
When to adopt      | Default starting point | When AI agents join the workflow | When correctness becomes critical | When code generation ROI is clear
Team size          | 1-3 developers         | 3-10 developers                  | 10-30 developers                  | 10+ with framework expertise
Project lifespan   | Weeks-months           | Months-years                     | Years                             | Years-decades
Primary language   | Any                    | Any                              | C# / TypeScript                   | C# (with Roslyn)
Compliance needs   | None                   | Low-medium                       | Medium-high                       | High
AI agent usage     | Minimal                | Moderate                         | Heavy                             | Heavy
Maintenance budget | Zero                   | 10-15%                           | 3-5%                              | 5-8% (generators)
Biggest risk       | Knowledge loss         | Document drift                   | Language lock-in                  | Generator complexity
Biggest benefit    | Speed                  | Structure                        | Enforcement                       | Automation

The One-Level-Up Rule

A practical recommendation: move one level up from where you are; don't try to jump further. (Levels 0 and 1 count as a single starting point: ad-hoc docs build none of the skills that structured docs require.) Each level builds the skills and understanding that the next level needs:

  • At Level 0: Adopt structured documents (Level 2). Skip Level 1 — there's no point in ad-hoc docs if you can have structured docs.
  • At Level 1: Adopt structured documents (Level 2). The spec-driven framework is a direct path.
  • At Level 2: Adopt typed requirements (Level 3) for the core chain. Start planning operational DSLs.
  • At Level 3: Build operational DSLs (Performance, Monitoring, Chaos, Deployment) and code generation DSLs (DDD, Admin, Workflow). Every concern that's currently a document becomes a DSL candidate.

The important thing — the thing both approaches agree on — is to move forward. Any level above 0 is better than Level 0. Any level above 1 is better than Level 1. The specific level matters less than the direction of travel.


The DSL Ecosystem: What Gets Built Next

The typed specification approach is not a single tool. It's an architecture — a framework for turning any domain concern into a compiler-enforced DSL. The requirement chain was the first DSL because it delivers the highest value: the link between "what we decided" and "what we built." But it's not the last.

Here's the roadmap of DSLs that extend the typed specification approach into every domain the spec-driven framework covers with text.

1. Requirements DSL (Exists)

Features, ACs, Specs, Tests. The chain from Part I.

[Feature("Order Cancellation")]
public abstract record OrderCancellationFeature : FeatureBase<OrderProcessingEpic>
{
    public abstract AcceptanceCriterionResult CustomerCanCancelPlacedOrConfirmedOrder();
    public abstract AcceptanceCriterionResult CancellationTriggersFullRefund();
}

Status: built, running, self-hosting. The foundation for everything that follows.
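Downstream of the feature, the chain continues with a specification and a test. A sketch of those links (the attribute shapes follow the series' conventions; the interface, result type, and test names are assumptions):

```csharp
// Specification: an interface tied to one AC via [ForRequirement].
[ForRequirement(typeof(OrderCancellationFeature),
    nameof(OrderCancellationFeature.CustomerCanCancelPlacedOrConfirmedOrder))]
public interface ICancelOrderSpec
{
    CancellationResult Cancel(OrderId orderId, CustomerId requestedBy);
}

// Test: tied back to the same AC via [Verifies]. Deleting the AC, the spec,
// or this test breaks the chain and produces a compiler diagnostic.
public class CancelOrderTests
{
    [Fact]
    [Verifies(typeof(OrderCancellationFeature),
        nameof(OrderCancellationFeature.CustomerCanCancelPlacedOrConfirmedOrder))]
    public void PlacedOrder_CanBeCancelled_ByItsCustomer()
    {
        // arrange, act, assert against the ICancelOrderSpec implementation
    }
}
```

Because both links use typeof() and nameof(), an IDE rename of the AC propagates through spec and test in one operation.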

2. DDD DSL (Exists)

Entities, Value Objects, Aggregates, CQRS. Generates builders, EF Core configurations, repository interfaces, command/query handlers.

[AggregateRoot]
[HasBuilder]
[GenerateEfCoreConfig]
public partial class Order : AggregateRoot<OrderId>
{
    [Required] public CustomerId CustomerId { get; init; }
    [HasMany] public IReadOnlyList<OrderLine> Lines { get; init; }
    [Invariant] public Money Total => Lines.Aggregate(Money.Zero, (sum, l) => sum + l.SubTotal);
}

Status: built, generating ~200 lines per 20 lines of input. Proven across multiple domains.
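To make the 20-to-200 ratio concrete, here is the rough shape of what [HasBuilder] might emit for Order. This is an illustrative sketch of generated output, not the framework's actual emission; all names beyond the Order members above are assumptions:

```csharp
// Illustrative generator output for [HasBuilder] on Order (OrderBuilder name assumed).
public sealed class OrderBuilder
{
    private CustomerId _customerId;
    private readonly List<OrderLine> _lines = new();

    public OrderBuilder WithCustomer(CustomerId id) { _customerId = id; return this; }
    public OrderBuilder AddLine(OrderLine line)     { _lines.Add(line); return this; }

    public Order Build() => new()
    {
        CustomerId = _customerId,      // [Required]: generator also emits a presence check
        Lines = _lines.AsReadOnly()    // [HasMany]: materialized as a read-only list
    };
}
```

Multiply this by the EF Core configuration, repository interface, and CQRS handlers, and the 10x output ratio per attributed class follows.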

3. Operations DSL (Next)

Performance budgets, monitoring, alerts. The operational layer that most teams leave in markdown files. (See Auto-Documentation from a Typed System, Part IV for the five Ops sub-DSLs that generate Prometheus YAML, Grafana JSON, and Kubernetes manifests from typed attributes.)

[PerformanceBudget("OrderCancellation")]
[Endpoint("POST /api/orders/{id}/cancel")]
[ResponseTime(P95 = "200ms", P99 = "500ms")]
[Throughput(MinRPS = 100)]
[ForRequirement(typeof(OrderCancellationFeature))]
public partial class OrderCancellationPerformance { }

[Alert("HighCancellationRate")]
[Metric("order.cancellation.rate")]
[Threshold(Warning = 0.15, Critical = 0.30)]
[Escalation(Warning = "slack:#orders", Critical = "pagerduty:commerce")]
[Window(Duration = "5m", Evaluation = "1m")]
public partial class CancellationRateAlert { }

[HealthCheck("OrderService")]
[Endpoint("/health")]
[Dependencies(typeof(IPaymentGateway), typeof(IInventoryService))]
[Timeout("5s")]
[FailureThreshold(3)]
public partial class OrderServiceHealthCheck { }

Generates: Prometheus alert rules, Grafana dashboard panels, k6 load test scripts, Kubernetes health check probes. All validated at compile time — an alert referencing a deleted metric is a compile error, not a 3 AM surprise.

Status: designed, implementation planned. The attribute shapes are stable; the generators are next.

4. Chaos DSL (Next)

Failure injection, resilience verification. What Netflix Chaos Monkey does with configuration files, expressed as typed experiments.

[ChaosExperiment("PaymentGateway_Timeout")]
[TargetService(typeof(IPaymentGateway))]
[FailureMode(FailureType.Timeout, Duration = "30s")]
[SteadyStateHypothesis(
    Metric = "order.cancellation.success_rate",
    Operator = ComparisonOperator.GreaterThan,
    Value = 0.95)]
[ExpectedBehavior(Degradation.CircuitBreakerOpen, RecoveryTime = "60s")]
[Rollback(Automatic = true, Condition = "error_rate > 0.5")]
[RequiresResilience(typeof(OrderCancellationFeature),
    nameof(OrderCancellationFeature.CancellationTriggersFullRefund))]
public partial class PaymentTimeoutExperiment { }

[ChaosExperiment("Database_PartialFailure")]
[TargetService(typeof(IOrderRepository))]
[FailureMode(FailureType.PartialFailure, AffectedPercentage = 30)]
[SteadyStateHypothesis(
    Metric = "order.read.success_rate",
    Operator = ComparisonOperator.GreaterThan,
    Value = 0.70)]
[ExpectedBehavior(Degradation.CacheServedStaleData)]
public partial class DatabasePartialFailureExperiment { }

Generates: Litmus ChaosEngine YAML, Gremlin attack definitions, steady-state probe configurations. The generator validates that every [TargetService] references a real service interface and every [RequiresResilience] references a real feature AC.
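The kind of violation this validation catches, as an illustrative diagnostic (the ID and wording are assumptions):

```
error CHAOS100: DatabasePartialFailureExperiment: steady-state metric
                'order.read.success_rate' is not declared by any [Metric]
                in the Operations DSL. Chaos hypotheses must reference
                declared metrics.
```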

Status: designed. Blocked on the Operations DSL (chaos experiments need the alert definitions from the Operations DSL to define steady-state hypotheses).

5. Compliance DSL (Future)

This is the strongest argument for typed specifications: even compliance — the most document-heavy, audit-driven domain in software engineering — can be typed.

Consider IEC 62304, the international standard for medical device software lifecycle processes. Section 5.5.3 requires "Software unit verification." Auditors demand a traceability matrix showing that every requirement is implemented, tested, and verified. Today, this matrix is a manually maintained spreadsheet. It's always wrong. Audit preparation takes weeks of updating it.

With a Compliance DSL:

[RegulatoryRequirement(Standard.IEC62304, Section = "5.5.3")]
[RiskClass(RiskClass.ClassC)]  // Highest risk — full traceability required
[RegulatoryTitle("Software Unit Verification")]
[RegulatoryText("Each software unit of the software item shall be verified " +
    "and the results of the verification shall be recorded.")]
public abstract record SoftwareUnitVerification
    : RegulatoryRequirementBase<IEC62304Standard>
{
    [RegulatoryAC("5.5.3.a")]
    public abstract ComplianceResult EachUnitHasVerificationRecord();

    [RegulatoryAC("5.5.3.b")]
    public abstract ComplianceResult VerificationUsesAcceptanceCriteria();

    [RegulatoryAC("5.5.3.c")]
    public abstract ComplianceResult AnomaliesAreDocumented();
}

// Link a feature to a regulatory requirement
[Feature("Insulin Dose Calculation")]
[Subject(typeof(SoftwareUnitVerification))]  // This feature must satisfy 5.5.3
[RiskLevel(SoftwareRiskLevel.Critical)]
public abstract record InsulinDoseCalculationFeature
    : FeatureBase<PatientSafetyEpic>
{
    [AcceptanceCriterion]
    [HazardMitigation(Hazard.Overdose, MitigationType.InputValidation)]
    public abstract AcceptanceCriterionResult DoseCannotExceedMaximumForPatientWeight();

    [AcceptanceCriterion]
    [HazardMitigation(Hazard.Underdose, MitigationType.AlgorithmVerification)]
    public abstract AcceptanceCriterionResult DoseAccountsForInsulinSensitivityFactor();

    [AcceptanceCriterion]
    [HazardMitigation(Hazard.Overdose, MitigationType.DoubleCheck)]
    public abstract AcceptanceCriterionResult CalculationRequiresNurseConfirmation();
}

The source generator produces — at compile time — the compliance matrix that auditors need:

// Generated: IEC62304_Traceability_Matrix.csv
Standard,Section,Requirement,Feature,AC,Spec,Implementation,Tests,Risk,Hazards,Status
IEC62304,5.5.3.a,EachUnitHasVerificationRecord,InsulinDoseCalculation,DoseCannotExceedMaximumForPatientWeight,IDoseValidationSpec.ValidateMaxDose,DoseCalculator.ValidateMaxDose,3 tests (all pass),Critical,Overdose,COMPLIANT
IEC62304,5.5.3.a,EachUnitHasVerificationRecord,InsulinDoseCalculation,DoseAccountsForInsulinSensitivityFactor,IDoseCalculationSpec.CalculateWithSensitivity,DoseCalculator.CalculateWithSensitivity,2 tests (all pass),Critical,Underdose,COMPLIANT
IEC62304,5.5.3.b,VerificationUsesAcceptanceCriteria,InsulinDoseCalculation,CalculationRequiresNurseConfirmation,IConfirmationSpec.RequireConfirmation,ConfirmationService.RequireConfirmation,4 tests (all pass),Critical,Overdose,COMPLIANT
IEC62304,5.5.3.c,AnomaliesAreDocumented,-,-,-,-,-,-,-,MANUAL_REVIEW_REQUIRED

The analyzer validates:

error COMPLY100: InsulinDoseCalculationFeature is [Subject(typeof(SoftwareUnitVerification))]
                 (IEC 62304, Section 5.5.3) but AC 'DoseCannotExceedMaximumForPatientWeight'
                 has no [Verifies] test. Regulatory compliance requires 100% test coverage
                 for Class C software.

warning COMPLY101: SoftwareUnitVerification section 5.5.3.c
                   'AnomaliesAreDocumented' has no automated verification.
                   Manual review required — add to audit checklist.

info COMPLY102: InsulinDoseCalculationFeature regulatory traceability:
                3/3 ACs implemented, 3/3 ACs tested, 2/3 hazards mitigated.
                Compliance matrix generated: IEC62304_Traceability_Matrix.csv

This works for any regulatory standard:

// Automotive: ISO 26262 (functional safety)
[RegulatoryRequirement(Standard.ISO26262, Section = "Part6.9")]
[ASIL(ASIL.D)]  // Highest automotive safety integrity level
public abstract record SoftwareUnitTestingAutomotive
    : RegulatoryRequirementBase<ISO26262Standard> { }

// Aviation: DO-178C (software considerations in airborne systems)
[RegulatoryRequirement(Standard.DO178C, Section = "6.3.4")]
[DesignAssuranceLevel(DAL.A)]  // Catastrophic failure condition
public abstract record LowLevelTestingAviation
    : RegulatoryRequirementBase<DO178CStandard> { }

// Financial: SOX (Sarbanes-Oxley)
[RegulatoryRequirement(Standard.SOX, Section = "Section 404")]
[ControlObjective("Financial reporting accuracy")]
public abstract record InternalControlsFinancial
    : RegulatoryRequirementBase<SOXStandard> { }

The Compliance DSL doesn't replace auditors. It replaces the weeks of manual traceability matrix preparation. The auditor still reviews — but they review a matrix that is generated from the type system, not manually assembled from Jira tickets and test reports. The matrix is always current because it's generated from the same codebase the auditor is evaluating.

This is the ultimate argument for typed specifications: even compliance — the domain that most people assume MUST be documents — can be typed. And when it's typed, the compliance matrix cannot drift from the code. The audit preparation drops from weeks to minutes. The risk of non-compliance drops from "human forgot to update the spreadsheet" to "compile error."

6. Documentation DSL (Future)

API docs, architecture diagrams, generated from types. The type system IS the documentation — the DSL makes it publishable. (This DSL is fully designed in Auto-Documentation from a Typed System, including the Document<T> generic pattern and its recursive self-documentation proof via Document<Document<>>.)

[ApiDocumentation(typeof(OrdersController))]
[Audience(Audience.ExternalDevelopers)]
[IncludeExamples(Source = typeof(OrderApiExamples))]
[Architecture(DiagramType.ComponentDiagram, Scope = "Commerce")]
public partial class OrderApiDocs { }

Generates: OpenAPI specs (from controller attributes + feature links), architecture diagrams (from [Layer] and dependency analysis), example requests/responses (from typed example classes). The documentation cannot drift from the API because it IS the API.
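The "typed example classes" referenced by [IncludeExamples] might look like this (hypothetical; the generator would serialize these objects into the OpenAPI examples section):

```csharp
// Hypothetical source for [IncludeExamples(Source = typeof(OrderApiExamples))].
public static class OrderApiExamples
{
    public static readonly object CancelOrderRequest =
        new { reason = "customer_request" };

    public static readonly object CancelOrderResponse =
        new { status = "cancelled", refundId = "rf_0000" };
}
```

Because the examples are compiled code, an example referencing a renamed property fails the build instead of silently drifting from the API.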

7. Onboarding DSL (Future)

Team capabilities, skill requirements per feature, learning paths. The most speculative DSL — but even team topology can be typed.

[TeamCapability("Commerce")]
[RequiresSkill(Skill.DomainDrivenDesign, Level.Intermediate)]
[RequiresSkill(Skill.PaymentIntegration, Level.Advanced)]
[OwnsFeatures(typeof(OrderProcessingFeature), typeof(OrderCancellationFeature))]
[OnboardingPath("commerce-team-onboarding")]
public partial class CommerceTeamDefinition { }

Generates: skill gap analysis when a new feature is assigned to a team, onboarding checklists per team, knowledge dependency graphs showing which skills are single-threaded (one person knows it).

The Ecosystem Trajectory

DSL Ecosystem Growth:

Today (3 DSLs):
  Requirements ──→ DDD ──→ Content
  Coverage: Functional requirements + domain model + CMS

Next (5 DSLs):
  Requirements ──→ DDD ──→ Content
       │
       ├──→ Operations (perf, monitoring, alerts)
       └──→ Chaos (resilience, failure injection)
  Coverage: + operational concerns + resilience

Future (7+ DSLs):
  Requirements ──→ DDD ──→ Content
       │
       ├──→ Operations ──→ Chaos
       ├──→ Compliance (IEC 62304, ISO 26262, SOX)
       ├──→ Documentation (API docs, architecture)
       └──→ Onboarding (team topology, skills)
  Coverage: Everything. All compiled. All enforced.

Each DSL:
  - Is an attribute library + source generator + analyzer
  - Self-registers in the M3 meta-metamodel
  - Cross-references other DSLs via typeof()
  - Produces compiler diagnostics when constraints are violated
  - Generates artifacts (configs, reports, docs, test scripts)
  - Shares the same IDE integration (squiggles, code fixes, navigation)

This is not a roadmap of features to build. It's a roadmap of domains to bring under compiler enforcement. Each domain that today lives in markdown files, wiki pages, Confluence spaces, and Jira tickets becomes a DSL. Each DSL makes one more category of drift structurally impossible.

The spec-driven approach covers all these domains today — with text. The typed approach covers them progressively — with types. The spec-driven approach's breadth is static: 15 concerns described in documents on Day 1, still 15 concerns described in documents on Day 1,000. The typed approach's breadth is growing: 3 DSLs on Day 1, 5 DSLs on Day 100, 7+ DSLs on Day 1,000. And every DSL that gets built converts a "described" concern into an "enforced" concern.

The question is not "which approach has more breadth today?" The question is "which approach has more enforcement tomorrow?" Documents don't compound. DSLs do.

The Cost of NOT Building DSLs

A common objection: "Building DSLs is expensive. We can't afford it." But this objection ignores the cost of the alternative. Not building DSLs means paying the document maintenance tax forever.

Consider a team with 8 operational concerns documented in markdown files. Each document requires:

  • Initial writing: 4-8 hours
  • Quarterly review: 2-4 hours
  • Update when implementation changes: 1-2 hours per change (happens ~monthly)
  • Audit preparation (if regulated): 8-16 hours per audit cycle

Annual cost per document: ~40-80 hours of engineer time. Annual cost for 8 documents: ~320-640 hours = 8-16 engineer-weeks per year.

Now consider a DSL for the same concern:

  • Initial build (attribute library + generator + analyzer): 40-80 hours
  • Ongoing maintenance: ~8-16 hours per year (fix generator bugs, add new attributes)
  • Audit preparation: 0 hours (matrix is generated)

The DSL pays for itself in the first year. By Year 2, the DSL approach costs a fraction of the document approach. By Year 5, the cumulative savings are measured in engineer-months.
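That arithmetic as a sketch, using midpoints of the ranges above (the exact break-even shifts within the stated ranges):

```csharp
// Cumulative hours for one concern, midpoints of the ranges above:
// documents ~6h to write + ~60h/yr upkeep; DSL ~60h to build + ~12h/yr upkeep.
using System;

public static class CostModel
{
    public static double Documents(double years) => 6 + 60 * years;
    public static double Dsl(double years)       => 60 + 12 * years;

    public static void Main()
    {
        for (int y = 0; y <= 5; y++)
            Console.WriteLine($"Year {y}: documents {Documents(y):0}h, DSL {Dsl(y):0}h");
        // With these midpoints the curves cross shortly after year 1,
        // consistent with the ~12-month break-even figure.
    }
}
```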

Cumulative cost over 5 years (per concern):

Hours │
      │                          Documents (linear)
600   │                        /
      │                      /
500   │                    /
      │                  /
400   │                /
      │              /
300   │            /  DSL (front-loaded, then flat)
      │          / ─────────────────────────────
200   │        //
      │      //
100   │    //
      │  //
  0   │─/─────────────────────────────────────→ Years
      0    1    2    3    4    5

Break-even: ~Month 12
5-year savings: ~300 hours per concern
8 concerns: ~2,400 hours saved = 60 engineer-weeks

The DSL approach is more expensive on Day 1 and cheaper on every subsequent day. For long-lived projects — which is the primary use case for typed specifications — the economics are not even close. The "we can't afford DSLs" objection is backwards: you can't afford NOT to build DSLs. Every quarter spent maintaining documents instead of building the DSL that replaces them is a quarter of engineer time that produces zero compound value.

Documents are a recurring cost. DSLs are an investment. The distinction matters.

Why C# Is the Right Language for DSLs

A natural question: why C#? Why not build DSLs in a purpose-built meta-language?

The answer is pragmatic: C# already has every feature a DSL framework needs.

  • Attributes = DSL annotations. [Feature], [AggregateRoot], [PerformanceBudget] — every DSL concept is an attribute.
  • Generics = typed relationships. FeatureBase<TParent> — DSL hierarchies are generic constraints.
  • Abstract members = required contracts. abstract AcceptanceCriterionResult MyAC() — the compiler enforces implementation.
  • Source generators = code production. Roslyn source generators produce arbitrary C# from attribute metadata.
  • Analyzers = validation rules. Roslyn analyzers produce diagnostics with severity, code fixes, and IDE integration.
  • Records = immutable data. record OrderCancellationFeature — DSL definitions are naturally immutable.
  • Nameof = refactor-safe references. nameof(Feature.AC) — references survive renames.
  • Typeof = type-safe links. typeof(Feature) — links are checked by the compiler.
  • Partial classes = generated + handwritten. partial class OrderService — the generator adds code without touching the developer's code.
  • XML comments = embedded documentation. /// <summary>Customer can cancel...</summary> — the spec IS the doc comment.
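The partial-class split in miniature (a self-contained sketch; in real use the second half lives in a generator-emitted OrderService.g.cs):

```csharp
using System;

// Handwritten half: declares the member the generator will supply.
public partial class OrderService
{
    public partial string Describe();
}

// Stand-in for the generated half (normally emitted as OrderService.g.cs).
public partial class OrderService
{
    public partial string Describe() => "OrderService (generated)";
}

public static class Demo
{
    public static void Main() =>
        Console.WriteLine(new OrderService().Describe());  // prints "OrderService (generated)"
}
```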

No purpose-built DSL language offers this combination. A custom language requires:

  • A custom parser (C# has Roslyn)
  • A custom IDE integration (C# has VS/Rider)
  • A custom debugging experience (C# has the .NET debugger)
  • A custom package manager (C# has NuGet)
  • A custom build system (C# has MSBuild)
  • A custom CI integration (C# has dotnet build)

Every custom language reinvents what C# already provides. And here's the key insight from Part IX: learning a PRD template syntax IS learning a grammar. The cognitive cost of learning DEFINE_FEATURE(...) is comparable to learning [Feature("...")]. But the C# grammar comes with a compiler, an IDE, a debugger, and a 20-year ecosystem. The PRD grammar comes with a text file.

The spec-driven approach says "learn our custom grammar." The typed approach says "use the grammar you already know." For C# teams, this isn't a tradeoff — it's a clear win. The DSL is the language. The language is the DSL. Everything is C#.


Final Words: The Compiler Always Wins

Documents describe possibility. Types enforce necessity.

This distinction runs deeper than software. In every domain where humans have moved from description to enforcement, the enforced version wins — not immediately, not easily, but inevitably.

Accounting had narrative ledgers for centuries. Merchants described transactions in prose. Then double-entry bookkeeping arrived: every debit must have a matching credit. The books must balance. This is a type system for money — a structural constraint that makes certain errors impossible. The narrative ledger described what happened. The double-entry system enforces consistency. No serious organization uses narrative ledgers today.

Manufacturing had quality descriptions: "make the part to specification." Workers read the spec and exercised judgment. Then statistical process control arrived: measure every output, track variance, halt the line when measurements exceed control limits. The spec described what the part should be. SPC enforces that every part IS what it should be. Manufacturing quality improved by orders of magnitude.

Aviation had pilot judgment: experienced pilots flew by feel. Then checklists arrived — not as suggestions but as mandatory procedures. The checklist is a type system for flight operations: every item must be checked, in order, before the next phase. Skipping an item is a violation, not a choice. Aviation became the safest mode of transportation not because pilots became more skilled, but because checklists made certain omissions structurally impossible.

Medicine had clinical judgment: doctors remembered drug interactions, dosage ranges, and contraindications. Then clinical decision support systems arrived: the system flags dangerous prescriptions before they're administered. The doctor's knowledge described what should happen. The system enforces what must not happen. Medical errors dropped wherever these systems were adopted.

Law had customary practices: communities understood norms through oral tradition and shared memory. Then codified law arrived: written statutes that define exactly what is permitted, prohibited, and required. The customary practice described how things worked. The statute enforces how things must work. Modern legal systems are built on codification, not custom — because enforcement scales while memory doesn't.

Engineering had rules of thumb: builders estimated loads based on experience. Then structural analysis arrived: mathematical models that prove a structure can bear its intended load before construction begins. The rule of thumb described what usually worked. The mathematical proof enforces what must work. No bridge is built today without structural analysis — because "it usually works" is not an acceptable answer when lives depend on the structure.

The pattern is universal: description precedes enforcement, and enforcement succeeds description. Not because enforcement is easier — it's harder. Not because enforcement is cheaper — it's more expensive upfront. But because enforcement compounds. Every structural constraint that prevents an error prevents that error forever, for every person, under every circumstance. A described constraint depends on humans reading, remembering, and following it. An enforced constraint depends on nothing but the system itself.

Software specifications are no different. The spec-driven approach describes what should be true: "every feature must have tests," "coverage must exceed 80%," "APIs must follow RESTful conventions." These descriptions are valuable. They represent knowledge, experience, and best practices.

But descriptions drift. They age. They're ignored under deadline pressure. They're misinterpreted by new team members. They're contradicted by the code they describe. And when description and code disagree, the code wins — always — because the code is what runs.

The typed specification approach enforces what must be true. Not as a policy. Not as a guideline. Not as a quality gate that fires after the fact. As a structural property of the system itself. The compiler is the enforcement mechanism. The type system is the specification language. Everything else — the documents, the wikis, the PRD templates, the quality gates — is commentary.

The Spec-Driven Approach's Gift

Let's end with generosity. The spec-driven approach gave us something important: the framing. Before frameworks like cogeet-io, most teams didn't think about AI agent context as an engineering problem. They didn't think about specification structure. They didn't think about testing strategies as a unified framework. They just... coded.

The spec-driven approach said: "Stop. Think about what you're building before you build it. Write it down. Structure it. Give it to your AI agents. Measure the output."

This is correct. This is valuable. This is the insight that matters.

The typed specification approach takes that insight and asks the next question: "Now that we've written it down, can we make the compiler check it?" The answer is yes. And when the answer is yes, you should — because the compiler is more reliable than any document, any process, any quality gate, any code reviewer.

The spec-driven approach is the teacher. The typed approach is the student who built the exam.

Both are needed. The teacher defines what matters. The exam ensures it's learned. A school with only teachers produces graduates who might know the material. A school with teachers AND exams produces graduates who demonstrably know the material. The typed approach doesn't replace the teaching — it adds the verification.

The Arc of Software Engineering

Zoom out far enough, and the history of software engineering is a story of moving from description to enforcement:

1960s: "Write good code" (description)
1970s: Structured programming — if/while/for replace goto (enforcement)

1980s: "Manage memory carefully" (description)
1990s: Garbage collection — runtime enforces memory safety (enforcement)

1990s: "Check for null" (description)
2010s: Nullable reference types — compiler enforces null safety (enforcement)

2000s: "Follow the API contract" (description)
2010s: Interface contracts — compiler enforces method signatures (enforcement)

2010s: "Write tests for requirements" (description)
2020s: [Verifies] + REQ3xx — compiler enforces test coverage (enforcement)

2020s: "Write specifications before coding" (description)
2025+: Typed DSLs — compiler enforces specification completeness (enforcement)

Each transition follows the same arc: a practice starts as advice ("check for null"), becomes a convention ("always null-check parameters"), becomes a tool (static analyzers that warn on possible null), and finally becomes a language feature (nullable reference types that make null-safety the default). Each stage adds enforcement. Each stage is controversial when introduced and obvious in hindsight.

Typed specifications are at the "controversial when introduced" stage. In ten years, they'll be at the "obvious in hindsight" stage. Not because I say so — but because the arc of software engineering bends toward enforcement, and it has never bent back.

The One-Sentence Summary

If you take nothing else from this ten-part series, take this:

Everything can be a DSL. C# is DSL-friendly. Every domain gap between "what we decided" and "what we built" is a DSL waiting to be written — and when it's written, the compiler closes that gap permanently.

Documents are where DSLs begin. Types are where they end. The compiler is the bridge.

The compiler always wins.
