The Problem Both Solve
"Most AI agent failures aren't model failures — they're context failures."
This sentence opens the README of the cogeet-io/ai-development-specifications repository. It is, quite possibly, the most important sentence in the entire AI-assisted development discourse. And it happens to be exactly right.
The Context Problem
When an AI agent generates incorrect code, the instinctive reaction is to blame the model. "GPT-4 doesn't understand my codebase." "Claude hallucinated an API that doesn't exist." "Copilot suggested a function that violates our architecture."
But the model isn't the problem. The problem is that the model doesn't know:
- What your domain entities are and how they relate to each other
- What your team's architectural boundaries look like
- Which acceptance criteria a given feature must satisfy
- What testing patterns your team has agreed on
- Which conventions are enforced and which are aspirational
- What the current state of implementation is — what's done, what's in progress, what's missing
- What the operational behavior should be — deployment ordering, resilience policies, monitoring thresholds, who gets paged at 3 AM when the payment gateway goes down
Without this context, even the most capable model is guessing. And guessing at scale — across a 200-file codebase with 47 domain entities and 12 microservices — produces inconsistent, incorrect, and architecturally unsound code.
And the context problem doesn't end at `dotnet test`. The code leaves your IDE and enters production — and the AI that helped build it knows nothing about how to deploy it, monitor it, scale it, or recover when it breaks. The context gap spans the full lifecycle, from requirement to retirement.
Both the spec-driven and typed specification approaches agree on this diagnosis. The disagreement is about the treatment.
Treatment A: Assemble Context from Documents
The spec-driven approach says: write comprehensive documents that describe everything the AI needs to know, then feed those documents to the AI as context.
The cogeet-io/ai-development-specifications framework defines six document types (they call them "pillars") that together form a complete context package:
```
┌─────────────────────────────────────────────────────────────────┐
│                 Product Requirements Document                   │
│            (features, ACs, priorities, constraints)             │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Specification as Code                        │
│           (build pipeline, stages, task definitions)            │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Context Engineering as Code                     │
│        (context sources, assembly strategies, validation)       │
└──────────┬───────────────┼───────────────┬──────────────────────┘
           │               │               │
           ▼               ▼               ▼
  ┌──────────────┐  ┌─────────────────┐  ┌─────────────────────────┐
  │   Testing    │  │  Documentation  │  │  Coding Best Practices  │
  │   as Code    │  │     as Code     │  │         as Code         │
  └──────────────┘  └─────────────────┘  └─────────────────────────┘
```

When an AI agent needs to implement a feature, it receives a composite context assembled from these documents:
- The PRD section describing the feature, its acceptance criteria, and its priority
- The specification defining the build stage where this feature fits
- The context engineering rules for how much background the agent needs
- The testing specification defining which test strategies to apply
- The documentation rules for what to document
- The coding practices for the target language
The AI reads all of this, generates code, and the quality gates validate the output.
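To make the assembly step concrete, here is a minimal sketch of what document-to-context selection might look like. None of these types or method names come from the cogeet-io framework; `ContextAssembler`, `ContextPackage`, and the naive paragraph-matching strategy are hypothetical, chosen only to illustrate the mechanism.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of document-based context assembly.
// Not a real cogeet-io API; it illustrates the mechanism only.
public record ContextPackage(string FeatureId, IReadOnlyList<string> Sections);

public static class ContextAssembler
{
    // Pull the relevant sections out of each pillar document for one feature.
    public static ContextPackage Assemble(
        string featureId,
        IReadOnlyDictionary<string, string> pillarDocuments)
    {
        var sections = new List<string>();
        foreach (var (pillar, text) in pillarDocuments)
        {
            // Naive selection: keep any paragraph that mentions the feature.
            var relevant = text
                .Split("\n\n")
                .Where(p => p.Contains(featureId, StringComparison.OrdinalIgnoreCase));
            sections.AddRange(relevant.Select(p => $"[{pillar}] {p}"));
        }
        return new ContextPackage(featureId, sections);
    }
}
```

The selected sections become the AI's prompt. Note what is absent: nothing in this pipeline checks the documents against the code they describe.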
The Claimed Outcomes
The framework claims:
- 10x improvement in AI task success rates
- 50% reduction in debugging and rework time
- 85% consistency in code quality across teams
These are marketing numbers, not benchmarks. There's no published methodology, no control group, no reproducible measurement. But the direction is plausible: more context does produce better AI output. That much is uncontroversial.
Where It Breaks
Consider a concrete failure. The PRD says PasswordResetFeature has three acceptance criteria. The AI implements two of them. The quality gate runs coverage checks — 80% line coverage, passing. The gate doesn't know that AC #3 (NewPasswordMeetsComplexityRequirements) was never implemented, because the gate measures code coverage percentages, not per-AC completeness. The PRD document says three ACs exist. The code has two. Nobody notices until a user complains that they can set "a" as their new password.
The document was correct. The code was wrong. The quality gate didn't bridge the gap. This is the fundamental fragility: document truth and code truth are separate things, connected only by the hope that someone reads both.
The Mechanism
The mechanism is document-to-prompt engineering. You write structured documents. A context assembly system selects the relevant sections. The AI receives those sections as its prompt context. The AI generates code. Quality gates check the output against the specifications.
```
Documents ──→ Context Assembly ──→ AI Prompt ──→ Code Generation ──→ Quality Gates
    ↑                                                                      │
    └──────────────────── Feedback Loop ───────────────────────────────────┘
              (coarse: "80% coverage" not "AC #3 is untested")
```

The feedback loop is real but coarse. Quality gates report aggregate metrics — "coverage is 80%," "complexity is acceptable," "tests pass." They don't report "acceptance criterion #3 of PasswordResetFeature has no implementation." The AI agent receives a pass/fail signal, not a per-requirement diagnostic. When the gate passes, the agent moves on — even if specific requirements slipped through the cracks.
This is a legitimate engineering approach. It's essentially what every team does when they write a good .cursorrules file, a comprehensive CLAUDE.md, or a detailed system prompt. The spec-driven framework just systematizes it into six reusable templates.
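The coarseness of the gate is easy to see in code. A sketch of an aggregate coverage check, the kind of signal a quality gate actually emits (the `QualityGate` type and its shape are hypothetical, not from any specific CI tool):

```csharp
// Hypothetical quality-gate check: aggregate metrics only.
// It can say "coverage is at least 80%"; it cannot say "AC #3 is untested",
// because documents and code share no machine-readable link.
public record GateResult(bool Passed, string Message);

public static class QualityGate
{
    public static GateResult CheckCoverage(
        int coveredLines, int totalLines, double threshold = 0.80)
    {
        double coverage = totalLines == 0 ? 0 : (double)coveredLines / totalLines;
        return coverage >= threshold
            ? new GateResult(true, $"coverage {coverage:P0}: pass")
            : new GateResult(false, $"coverage {coverage:P0}: fail");
    }
}
```

A feature with three acceptance criteria, two of them implemented and well tested, can clear this gate comfortably. The missing third criterion is invisible to it.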
Treatment B: Encode Context in the Type System
The typed specification approach says: don't describe what the AI should do in documents. Encode it in the type system so the compiler enforces it.
Instead of a PRD template that says "Feature X has acceptance criteria Y and Z," you write:
```csharp
public abstract record UserRolesFeature : Feature<PlatformScalabilityEpic>
{
    public override string Title => "User roles and permissions management";
    public override RequirementPriority Priority => RequirementPriority.Critical;
    public override string Owner => "Platform Team";

    public abstract AcceptanceCriterionResult
        AdminCanAssignRoles(UserId actingUser, UserId targetUser, RoleId role);

    public abstract AcceptanceCriterionResult
        ViewerHasReadOnlyAccess(UserId viewer, ResourceId resource);

    public abstract AcceptanceCriterionResult
        RoleChangeTakesEffectImmediately(UserId user, RoleId previousRole, RoleId newRole);
}
```

This is not a document. It's compiled code. The acceptance criteria are abstract methods with typed parameters. The hierarchy (`Feature<PlatformScalabilityEpic>`) is enforced by generic constraints. The compiler knows that this feature exists, what its ACs are, and what inputs they expect.
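For readers wondering what sits underneath `Feature<TEpic>`, here is a hedged sketch of the base types such a record could build on. The real Requirements-as-Code library may define these differently; the point is how a generic constraint encodes the epic-to-feature hierarchy at compile time.

```csharp
// Hedged sketch of possible base types; the real library may differ.
public abstract record Epic;

public enum RequirementPriority { Low, Medium, High, Critical }

public abstract record Requirement
{
    public abstract string Title { get; }
    public abstract RequirementPriority Priority { get; }
    public abstract string Owner { get; }
}

// The constraint ties every feature to exactly one epic at compile time:
// Feature<PlatformScalabilityEpic> will not compile unless
// PlatformScalabilityEpic derives from Epic.
public abstract record Feature<TEpic> : Requirement where TEpic : Epic;

// Result type returned by acceptance-criterion methods.
public readonly record struct AcceptanceCriterionResult(bool Satisfied, string? Detail = null);
```

With this shape, "which epic does this feature belong to?" is not a lookup in a document; it is a type argument the compiler verifies.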
When the AI agent implements this feature, it doesn't read a document — it writes code that must satisfy the type system:
```csharp
// The specification interface — derived from the feature
[ForRequirement(typeof(UserRolesFeature))]
public interface IUserRolesSpec
{
    [ForRequirement(typeof(UserRolesFeature), nameof(UserRolesFeature.AdminCanAssignRoles))]
    Result AssignRole(User actingUser, User targetUser, Role role);

    [ForRequirement(typeof(UserRolesFeature), nameof(UserRolesFeature.ViewerHasReadOnlyAccess))]
    Result<Permission, AuthorizationException> VerifyReadAccess(User viewer, Resource resource);

    [ForRequirement(typeof(UserRolesFeature), nameof(UserRolesFeature.RoleChangeTakesEffectImmediately))]
    Result ApplyRoleChange(User user, Role previousRole, Role newRole);
}

// The implementation — must implement the interface
[ForRequirement(typeof(UserRolesFeature))]
public class AuthorizationService : IUserRolesSpec
{
    public Result AssignRole(User actingUser, User targetUser, Role role)
    {
        // The compiler enforces that this method exists.
        // Remove it → compile error.
        // Change the signature → compile error.
        if (!actingUser.HasPermission(Permission.ManageRoles))
            return Result.Failure(new UnauthorizedException("User lacks ManageRoles permission"));

        targetUser.AssignRole(role);
        return Result.Success();
    }

    // ... other AC implementations
}

// The tests — linked by type, not by string
[TestsFor(typeof(UserRolesFeature))]
public class UserRolesFeatureTests
{
    [Verifies(typeof(UserRolesFeature), nameof(UserRolesFeature.AdminCanAssignRoles))]
    public void Admin_with_ManageRoles_permission_can_assign_roles()
    {
        var admin = TestUsers.AdminWithPermission(Permission.ManageRoles);
        var target = TestUsers.RegularUser();
        var role = Roles.Editor;

        var result = _service.AssignRole(admin, target, role);

        Assert.That(result.IsSuccess, Is.True);
        Assert.That(target.CurrentRole, Is.EqualTo(role));
    }
}
```

If the AI forgets to implement an AC, the compiler says:
```
error REQ101: UserRolesFeature.RoleChangeTakesEffectImmediately has no matching
              spec method. Create a method on IUserRolesSpec with
              [ForRequirement(typeof(UserRolesFeature),
                  nameof(UserRolesFeature.RoleChangeTakesEffectImmediately))]
```

If the AI forgets to test an AC, the compiler says:
```
warning REQ301: UserRolesFeature.ViewerHasReadOnlyAccess has no test with
                [Verifies(typeof(UserRolesFeature),
                    nameof(UserRolesFeature.ViewerHasReadOnlyAccess))]
```

The context IS the type system. The validation IS the compiler. There is no separate document to read, no quality gate to configure, no context assembly strategy to choose. The code describes itself, and the compiler enforces it.
The Core Tension
Here is the fundamental difference, stated as plainly as possible:
| Dimension | Spec-Driven | Typed Specifications |
|---|---|---|
| Where truth lives | In documents (.txt, .md, templates) | In types (C# records, interfaces, attributes) |
| Who enforces correctness | CI/CD quality gates (post-hoc) | The compiler (pre-hoc) |
| When you find out | After code is generated, when gates run | During compilation, before anything runs |
| How the AI gets context | Context assembled from documents into prompt | Type system constrains what the AI can write |
| What happens when specs change | Documents are updated; code may or may not follow | Types change; compiler errors force code to follow |
| Refactoring | Find-replace across documents and code | IDE renames propagate through types automatically |
| Navigation | Read the document, then find the code | Ctrl+Click from test → AC → feature → epic |
Neither is universally better. But they make different tradeoffs that matter enormously depending on your context.
The Shared Insight
Before we dive into the differences, it's worth pausing on the shared insight. Both approaches agree that:
- AI agents need structured context, not ad-hoc prompts
- Requirements must be explicit, not implicit in tribal knowledge
- Testing must be systematic, not random
- Quality gates are necessary, not optional
- The gap between "what we decided" and "what we built" is the root cause of most failures
This shared insight is more important than the disagreement about implementation. If your team has neither — no structured specs and no typed requirements — either approach is a massive improvement over the status quo of "Jira tickets and hope."
The rest of this series explores where the approaches diverge, why they diverge, and what the consequences are. Both are better than nothing — but they are not equivalent. One approach trusts documents to tell the truth. The other makes lying impossible. The difference matters more than you think, and it compounds over time.
The History: How We Got Here
To understand why two radically different approaches emerged for the same problem, it helps to trace the history.
The Convention Era (2005-2020)
Ruby on Rails popularized "Convention over Configuration" in 2005. The idea was revolutionary: instead of writing XML configuration files, follow naming conventions and the framework does the wiring. Name your model User, put it in app/models/, and Rails auto-creates the database table users, the route /users, and the CRUD controller.
ASP.NET Core adopted this. Spring Boot embraced it. The entire web development industry converged on convention-based frameworks.
But conventions have a hidden cost: the double tax. Every convention must be (1) documented in a wiki, ADR, or onboarding guide, and (2) enforced by test code, linting rules, or architecture tests. The convention itself is invisible — it exists in the gap between documentation and enforcement code. Both artifacts drift independently from the codebase.
This is the Convention Tax described in detail in the Contention over Convention series.
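The enforcement half of the double tax is worth seeing in code. A sketch of the hand-written architecture test a team might maintain for a convention like "every command has a corresponding validator" (the `ConventionCheck` helper and the name-suffix rule are illustrative; a real test would scan `Assembly.GetTypes()` rather than a list of strings):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative enforcement code for a naming convention.
// In a real architecture test, the names would come from
// assembly.GetTypes(); strings keep the sketch self-contained.
public static class ConventionCheck
{
    public static string[] CommandsWithoutValidators(IEnumerable<string> typeNames)
    {
        var names = typeNames.ToHashSet();
        return names
            .Where(n => n.EndsWith("Command"))
            .Where(n => !names.Contains(n + "Validator"))
            .OrderBy(n => n)
            .ToArray();
    }
}
```

This test is the second tax payment. The wiki page describing the convention is the first. Neither artifact knows about the other, and both can silently drift from the codebase they describe.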
The AI Era (2022-present)
GitHub Copilot launched in 2022. Claude Code, Cursor, Aider, and dozens of other AI coding tools followed. Suddenly, the person writing code wasn't always a human who had read the wiki and internalized the conventions. It was an AI that knew nothing about your team's specific conventions.
The Convention Tax exploded. A human developer might remember (or discover via code review) that "all commands must have a corresponding validator." An AI agent has no such memory. It follows the patterns it sees in the code — and if the code is inconsistent (because conventions drifted), the AI amplifies the inconsistency.
Two responses emerged:
Response A: Better context for the AI. If the AI doesn't know the conventions, give it the conventions. Write comprehensive documents. Assemble the right context. Make the AI read the wiki before writing code. This is the spec-driven approach.
Response B: Eliminate the need for conventions. If conventions are the problem — because they're invisible, undocumented, and unenforced — replace them with something the compiler can see. Make the AI write code within a type system that rejects incorrect implementations. This is the typed specification approach.
Both are legitimate responses. But they rest on different beliefs about the nature of the problem.
The Belief Difference
Spec-driven belief: The AI fails because it lacks information. Give it more information (better context), and it will succeed. The problem is a logistics problem: how do you get the right information to the right agent at the right time?
Typed specification belief: The AI fails because the system tolerates incorrect implementations. Make incorrect implementations impossible (via the type system), and the AI will succeed by construction. The problem is a structural problem: how do you make incorrect code uncompilable?
Both beliefs are partially correct. AI agents DO fail from lack of context. AND systems DO fail because they tolerate incorrect implementations. The question is which problem is more fundamental — and which solution is more durable.
The Four Eras Lens
The Contention over Convention series describes four eras of software architecture:
| Era | Mechanism | Example | Feedback Time |
|---|---|---|---|
| Code | Write everything manually | Explicit DI wiring, hand-built SQL | Runtime |
| Configuration | Externalize to files | Spring XML, NHibernate mappings | Deploy/startup |
| Convention | Follow naming rules | Rails naming, ASP.NET Core conventions | Test time |
| Contention | Compiler-enforced generation | [AggregateRoot] → source-generated code | Compile time |
The spec-driven approach operates at the Convention layer. It documents conventions (in structured text files rather than wiki pages), enforces them (via quality gates rather than architecture tests), and delivers them (via context assembly rather than onboarding documents). It's a much better version of Convention — systematic, structured, AI-ready. But it's still Convention: rules exist in documents, not in the compiler.
The typed specification approach operates at the Contention layer. It replaces conventions with types. The rules don't exist in documents because they exist in the compiler. There's nothing to document because the type IS the documentation. There's nothing to enforce because the compiler IS the enforcement.
This doesn't mean Contention is always better. Convention (even spec-driven convention) is more accessible, more flexible, and covers more ground. Contention is deeper but narrower. The question is what your project needs: breadth of guidance or depth of enforcement.
A Real-World Scenario
To make this concrete, let's trace a complete feature through both approaches — from the moment a product owner says "we need password reset" to the moment the feature ships.
Product Owner Says: "We Need Password Reset"
In the spec-driven world:
The product owner (or a developer acting as proxy) opens the PRD template and fills in:
```
DEFINE_FEATURE(password_reset)
  description: "Allow users to reset their password via email"
  user_story: "As a user, I want to reset my password so that
               I can regain access to my account"
  acceptance_criteria:
    - "User can request a password reset email"
    - "Reset link expires after 24 hours"
    - "New password must meet complexity requirements"
  priority: High
  complexity: Medium
```

The AI agent reads this, generates code, and the quality gate validates. The entire flow is document-mediated.
In the typed specification world:
The developer (in conversation with the product owner) creates a C# record:
```csharp
public abstract record PasswordResetFeature : Feature<UserManagementEpic>
{
    public override string Title => "Password reset via email";
    public override RequirementPriority Priority => RequirementPriority.High;
    public override string Owner => "Identity Team";

    /// <summary>
    /// A user can submit their email address and receive a password reset link.
    /// If the email doesn't match an existing account, no email is sent (to prevent
    /// account enumeration), but the user sees a success message regardless.
    /// </summary>
    public abstract AcceptanceCriterionResult
        UserCanRequestPasswordResetEmail(Email userEmail);

    /// <summary>
    /// The reset link contains a token that expires 24 hours after creation.
    /// Attempts to use an expired token display a clear error and offer to
    /// send a new reset link.
    /// </summary>
    public abstract AcceptanceCriterionResult
        ResetLinkExpiresAfter24Hours(TokenId resetToken, DateTime requestedAt);

    /// <summary>
    /// The new password must be at least 12 characters with at least one uppercase,
    /// one lowercase, one digit, and one special character. Previous passwords
    /// within the last 5 changes are rejected.
    /// </summary>
    public abstract AcceptanceCriterionResult
        NewPasswordMeetsComplexityRequirements(Password newPassword);
}
```

Notice the difference in precision. The spec-driven AC says "reset link expires after 24 hours." The typed AC says `ResetLinkExpiresAfter24Hours(TokenId resetToken, DateTime requestedAt)` — with a typed token ID and a creation timestamp, plus XML documentation explaining the UX behavior (clear error, offer to resend).
The spec-driven approach captures intent in English. The typed approach captures intent in both English (XML docs) and the type system (method signatures). When the AI reads the typed version, it knows not just WHAT to implement, but the exact SHAPE of the implementation: it takes a TokenId and a DateTime, and returns a pass/fail result.
A Note on Terminology
Throughout this series:
- "Spec-driven" refers specifically to the approach in cogeet-io/ai-development-specifications — document-based, template-driven, text-file specifications designed for AI consumption.
- "Typed specifications" refers to the approach described in Requirements as Code and Contention over Convention — C# types, Roslyn source generators, and compiler-enforced requirements chains.
- "AI agent" means any AI-assisted development tool: Copilot, Claude Code, Cursor, Aider, custom agent pipelines, or future tools we haven't seen yet.
When we say "the spec-driven approach does X," we mean the cogeet-io framework specifically. Other document-based approaches may differ. When we say "typed specifications do Y," we mean the C#/.NET implementation specifically. The principles could apply to other typed languages, but the examples are C#.
The Specification Lifecycle: A Complete Walk-Through
To make the comparison visceral, let's trace a single feature — Order Cancellation — through its entire lifecycle in both approaches. Not a toy example. A real feature with real complexity: the customer can cancel an order before shipment, the system processes a refund, and a confirmation email is sent.
We'll follow every step: define, implement, test, change, and ship.
Step 1: The Product Owner Defines the Requirement
Spec-driven approach:
The product owner (or a developer acting as proxy) opens the PRD template and writes:
```
DEFINE_FEATURE(order_cancellation)
  description: "Allow customers to cancel orders before shipment"
  user_story: "As a customer, I want to cancel my order before it ships
               so that I receive a full refund"
  acceptance_criteria:
    - "Customer can cancel an order that has not yet shipped"
    - "Cancellation triggers a full refund to the original payment method"
    - "A confirmation email is sent after successful cancellation"
  priority: High
  complexity: Medium
  stage: 2
```

Stop here and notice something. The product owner just learned a grammar. `DEFINE_FEATURE`, `acceptance_criteria`, `user_story`, `priority`, `complexity`, `stage` — these are keywords in a template language. They have specific positions, specific meanings, specific formatting rules. Misplace a colon, forget the indentation on the acceptance criteria list, misspell `acceptance_criteria` as `acceptance_criterias` — and the quality gate either rejects the document or, worse, silently ignores the malformed section.
This is a DSL. It has syntax. It has semantics. It has rules you must learn. The spec-driven community describes this as "low learning curve" because the syntax resembles English. But resembling English is not the same as being English. English doesn't care about indentation. English doesn't have reserved keywords. English doesn't require you to wrap acceptance criteria in a bulleted list inside a specific field name.
The cognitive cost of learning `DEFINE_FEATURE(order_cancellation)` is comparable to learning:
```csharp
public abstract record OrderCancellationFeature : Feature<OrderManagementEpic>
```

Both are unfamiliar to a newcomer. Both require understanding what the keywords mean. Both have specific rules about structure. One compiles; one doesn't.
Typed specification approach:
The developer, in conversation with the product owner, creates a C# record:
```csharp
public abstract record OrderCancellationFeature : Feature<OrderManagementEpic>
{
    public override string Title => "Order cancellation before shipment";
    public override RequirementPriority Priority => RequirementPriority.High;
    public override string Owner => "Commerce Team";

    /// <summary>
    /// A customer can cancel any order whose status is Created, Confirmed,
    /// or Processing. Orders with status Shipped, Delivered, or already
    /// Cancelled cannot be cancelled. Returns the updated order with status
    /// Cancelled and the cancellation timestamp.
    /// </summary>
    public abstract AcceptanceCriterionResult
        CustomerCanCancelUnshippedOrder(OrderId orderId, CustomerId customerId);

    /// <summary>
    /// Upon successful cancellation, the system initiates a full refund to the
    /// original payment method. The refund amount equals the order total including
    /// tax. The refund is processed asynchronously but must complete within
    /// 5 business days.
    /// </summary>
    public abstract AcceptanceCriterionResult
        CancellationTriggersFullRefund(OrderId orderId, PaymentMethodId originalPayment);

    /// <summary>
    /// After successful cancellation and refund initiation, the system sends
    /// a confirmation email to the customer's registered email address. The email
    /// includes the order number, cancellation timestamp, and expected refund
    /// timeline.
    /// </summary>
    public abstract AcceptanceCriterionResult
        ConfirmationEmailSentAfterCancellation(OrderId orderId, Email customerEmail);
}
```

The types encode information the PRD template cannot: `OrderId`, `CustomerId`, `PaymentMethodId`, `Email`. These are not strings — they are domain types that the compiler validates. The AI reading this feature record knows the exact shape of the implementation before writing a single line.
Step 2: The Developer Implements
Spec-driven approach:
The context engineering system assembles the PRD section, coding practices, and existing code into a prompt. The AI generates:
public class OrderService
{
public async Task<bool> CancelOrder(int orderId)
{
var order = await _repository.GetOrder(orderId);
if (order.Status == "Shipped")
return false;
order.Status = "Cancelled";
await _repository.Save(order);
await _paymentService.Refund(orderId);
await _emailService.SendCancellationEmail(order.CustomerEmail);
return true;
}
}
The quality gate passes. But look carefully:
- int orderId instead of a strongly-typed OrderId — was that intentional?
- order.Status == "Shipped" — a string comparison. What about "Processing"? "Confirmed"?
- return false when the order is shipped — but no information about WHY it failed.
- No customer ID validation — any user can cancel any order.
- Synchronous email sending inside the cancellation method — if the email fails, does the cancellation roll back?
The AI interpreted "customer can cancel an order that has not yet shipped" as "check if shipped, set status to cancelled." That's A valid interpretation. It's not THE correct interpretation. The quality gate cannot tell the difference because it checks syntax, not semantics.
Typed specification approach:
The compiler fires the moment the feature record exists:
error REQ101: OrderCancellationFeature.CustomerCanCancelUnshippedOrder has no
matching spec method.
error REQ101: OrderCancellationFeature.CancellationTriggersFullRefund has no
matching spec method.
error REQ101: OrderCancellationFeature.ConfirmationEmailSentAfterCancellation
has no matching spec method.error REQ101: OrderCancellationFeature.CustomerCanCancelUnshippedOrder has no
matching spec method.
error REQ101: OrderCancellationFeature.CancellationTriggersFullRefund has no
matching spec method.
error REQ101: OrderCancellationFeature.ConfirmationEmailSentAfterCancellation
has no matching spec method.The AI follows the compiler's guidance. First, the specification interface:
[ForRequirement(typeof(OrderCancellationFeature))]
public interface IOrderCancellationSpec
{
[ForRequirement(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.CustomerCanCancelUnshippedOrder))]
Result<CancelledOrder, OrderCancellationException> CancelOrder(
OrderId orderId, CustomerId customerId);
[ForRequirement(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.CancellationTriggersFullRefund))]
Result<RefundInitiation, RefundException> InitiateRefund(
OrderId orderId, PaymentMethodId originalPayment);
[ForRequirement(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.ConfirmationEmailSentAfterCancellation))]
Result<EmailConfirmation, EmailException> SendCancellationConfirmation(
OrderId orderId, Email customerEmail);
}
The return types tell the story: Result<CancelledOrder, OrderCancellationException>. Not bool. Not Task<bool>. A typed result that carries either the cancelled order or a typed exception. The AI cannot return false — it must return a CancelledOrder or explain WHY with an OrderCancellationException.
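For readers unfamiliar with the pattern, a Result type of this shape takes only a few lines. This is a hypothetical minimal sketch; the article assumes such a type but never defines it:

```csharp
// Hypothetical sketch of Result<TSuccess, TError>: a value that is either a
// success carrying TSuccess or a failure carrying TError, never both.
public sealed class Result<TSuccess, TError>
{
    public bool IsSuccess { get; }
    public TSuccess Value { get; }   // meaningful only when IsSuccess is true
    public TError Error { get; }     // meaningful only when IsSuccess is false

    private Result(bool isSuccess, TSuccess value, TError error)
    {
        IsSuccess = isSuccess;
        Value = value;
        Error = error;
    }

    public static Result<TSuccess, TError> Success(TSuccess value) =>
        new Result<TSuccess, TError>(true, value, default!);

    public static Result<TSuccess, TError> Failure(TError error) =>
        new Result<TSuccess, TError>(false, default!, error);
}
```

The article's snippets call non-generic helpers (Result.Success(...), Result.Failure(...)) whose type arguments are inferred; that presumably relies on a helper class and implicit conversions, which this sketch omits for brevity.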
Then the implementation:
[ForRequirement(typeof(OrderCancellationFeature))]
public class OrderCancellationService : IOrderCancellationSpec
{
private static readonly OrderStatus[] CancellableStatuses =
{
OrderStatus.Created, OrderStatus.Confirmed, OrderStatus.Processing
};
public Result<CancelledOrder, OrderCancellationException> CancelOrder(
OrderId orderId, CustomerId customerId)
{
var order = _orderRepository.FindById(orderId);
if (order is null)
return Result.Failure(new OrderNotFoundException(orderId));
if (order.CustomerId != customerId)
return Result.Failure(new UnauthorizedCancellationException(
orderId, customerId));
if (!CancellableStatuses.Contains(order.Status))
return Result.Failure(new OrderNotCancellableException(
orderId, order.Status));
var cancelled = order.Cancel(DateTime.UtcNow);
_orderRepository.Save(cancelled);
return Result.Success(cancelled);
}
// ... other method implementations
}
Notice what the type system forced: customer ID validation (because CustomerId is a parameter), explicit status checking (because the return type demands either success or a typed exception), and no string comparisons anywhere. The method signature CancelOrder(OrderId orderId, CustomerId customerId) made it impossible to forget customer authorization.
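Concretely, the shortcuts the spec-driven version took would now each be a standard C# compiler error. A hypothetical illustration (the variable names are illustrative, not from the article):

```csharp
// Hypothetical illustration: under the typed signature, none of these compile.
// The codes are the standard C# compiler diagnostics for each mistake.
//
//   return true;                      // CS0029: cannot convert bool to Result<...>
//   if (order.Status == "Shipped")    // CS0019: '==' cannot compare OrderStatus and string
//   service.CancelOrder(42, custId);  // CS1503: cannot convert int to OrderId
```

Each of the quiet semantic gaps in the spec-driven version becomes a loud syntactic failure here.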
Step 3: The Tester Verifies
Spec-driven approach:
The testing specification says "80% coverage, unit tests follow AAA pattern." The AI generates tests:
[Test]
public async Task CancelOrder_WhenOrderExists_ReturnsTrue()
{
    var service = new OrderService(mockRepo, mockPayment, mockEmail);
    mockRepo.Setup(r => r.GetOrder(1))
        .ReturnsAsync(new Order { Status = "Created" });
    var result = await service.CancelOrder(1);
    Assert.That(result, Is.True);
}
[Test]
public async Task CancelOrder_WhenShipped_ReturnsFalse()
{
    var service = new OrderService(mockRepo, mockPayment, mockEmail);
    mockRepo.Setup(r => r.GetOrder(1))
        .ReturnsAsync(new Order { Status = "Shipped" });
    var result = await service.CancelOrder(1);
    Assert.That(result, Is.False);
}
}
Coverage: maybe 70%. The quality gate asks for 80%. The AI generates more tests. Coverage: 82%. Gate passes.
But which acceptance criteria do these tests verify? There's no link. The test CancelOrder_WhenOrderExists_ReturnsTrue does not declare which AC it covers. The traceability from test to requirement is invisible. If the product owner asks "is the refund AC tested?" — someone must read every test method and mentally map it to an AC.
Typed specification approach:
The compiler fires warning REQ301 for every untested AC. The AI writes tests that explicitly declare what they verify:
[TestsFor(typeof(OrderCancellationFeature))]
public class OrderCancellationTests
{
[Verifies(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.CustomerCanCancelUnshippedOrder))]
public void Created_order_can_be_cancelled_by_its_customer()
{
var order = TestOrders.Created(customer: TestCustomers.Alice);
var result = _service.CancelOrder(order.Id, TestCustomers.Alice.Id);
Assert.That(result.IsSuccess, Is.True);
Assert.That(result.Value.Status, Is.EqualTo(OrderStatus.Cancelled));
Assert.That(result.Value.CancelledAt, Is.Not.Null);
}
[Verifies(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.CustomerCanCancelUnshippedOrder))]
public void Shipped_order_cannot_be_cancelled()
{
var order = TestOrders.Shipped(customer: TestCustomers.Alice);
var result = _service.CancelOrder(order.Id, TestCustomers.Alice.Id);
Assert.That(result.IsSuccess, Is.False);
Assert.That(result.Error, Is.InstanceOf<OrderNotCancellableException>());
}
[Verifies(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.CustomerCanCancelUnshippedOrder))]
public void Different_customer_cannot_cancel_order()
{
var order = TestOrders.Created(customer: TestCustomers.Alice);
var result = _service.CancelOrder(order.Id, TestCustomers.Bob.Id);
Assert.That(result.IsSuccess, Is.False);
Assert.That(result.Error, Is.InstanceOf<UnauthorizedCancellationException>());
}
[Verifies(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.CancellationTriggersFullRefund))]
public void Cancellation_initiates_refund_for_full_order_amount()
{
var order = TestOrders.Created(total: 149.99m);
_service.CancelOrder(order.Id, order.CustomerId);
var result = _service.InitiateRefund(order.Id, order.PaymentMethodId);
Assert.That(result.IsSuccess, Is.True);
Assert.That(result.Value.Amount, Is.EqualTo(149.99m));
Assert.That(result.Value.PaymentMethodId, Is.EqualTo(order.PaymentMethodId));
}
[Verifies(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.ConfirmationEmailSentAfterCancellation))]
public void Confirmation_email_sent_with_order_details()
{
var order = TestOrders.Created(customer: TestCustomers.Alice);
_service.CancelOrder(order.Id, TestCustomers.Alice.Id);
var result = _service.SendCancellationConfirmation(
order.Id, TestCustomers.Alice.Email);
Assert.That(result.IsSuccess, Is.True);
Assert.That(result.Value.RecipientEmail, Is.EqualTo(TestCustomers.Alice.Email));
Assert.That(result.Value.OrderNumber, Is.EqualTo(order.OrderNumber));
}
}
Every test declares which AC it verifies via [Verifies]. The compiler knows which ACs have tests and which don't. The traceability is structural, not inferred.
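The attribute itself needs no framework magic. A hypothetical minimal definition, consistent with the usage [Verifies(typeof(OrderCancellationFeature), nameof(...))] shown above, could be:

```csharp
using System;

// Hypothetical sketch: [Verifies] only has to record which feature type and
// which acceptance-criterion method a test covers, so that an analyzer (or a
// reflection-based audit) can read the link back out of the assembly.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = true)]
public sealed class VerifiesAttribute : Attribute
{
    public Type Feature { get; }
    public string CriterionName { get; }

    public VerifiesAttribute(Type feature, string criterionName)
    {
        Feature = feature;
        CriterionName = criterionName;
    }
}
```

Because nameof is resolved at compile time, renaming an AC method immediately breaks every test that claims to verify it — the link cannot silently rot the way a comment or a test name would.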
Step 4: The Feature Changes
Two weeks later, the product owner says: "We need a cancellation reason. Make it required."
This is where the approaches diverge dramatically.
Spec-driven approach:
Someone updates the PRD:
DEFINE_FEATURE(order_cancellation)
acceptance_criteria:
- "Customer can cancel an order that has not yet shipped"
- "Cancellation triggers a full refund to the original payment method"
- "A confirmation email is sent after successful cancellation"
- "Cancellation reason is required (selected from predefined list)" ← NEW
A new line in a text file. The existing code continues to work. The existing tests continue to pass. The quality gate continues to be green. Nothing in the system knows that a fourth AC was added. The AI, if asked to implement the feature again, would see the new AC — but only if someone triggers a reimplementation. If the AI is not re-invoked, the gap persists silently.
The gap between the PRD and the code is invisible. It requires a human to notice that the code doesn't ask for a cancellation reason.
Typed specification approach:
The developer adds one abstract method:
/// <summary>
/// The customer must select a cancellation reason from a predefined list:
/// ChangedMind, FoundBetterPrice, OrderedByMistake, TakingTooLong, Other.
/// If "Other" is selected, a free-text explanation (max 500 chars) is required.
/// </summary>
public abstract AcceptanceCriterionResult
CancellationReasonRequired(OrderId orderId, CancellationReason reason);
The moment this is saved, the compiler fires:
error REQ101: OrderCancellationFeature.CancellationReasonRequired has no
matching spec method.
warning REQ301: OrderCancellationFeature.CancellationReasonRequired has no test.
The build is now broken. It stays broken until someone (human or AI) adds the spec method, implements it, and writes tests. There is no silent gap. The type system made the gap visible, immediate, and unignorable.
But it goes further. The developer also updates the CancelOrder spec method signature:
Result<CancelledOrder, OrderCancellationException> CancelOrder(
OrderId orderId, CustomerId customerId, CancellationReason reason);
Now the compiler fires a different error:
error CS0535: 'OrderCancellationService' does not implement interface member
'IOrderCancellationSpec.CancelOrder(OrderId, CustomerId, CancellationReason)'
Every call site that invokes CancelOrder must now provide a CancellationReason. Every test that calls CancelOrder must be updated. The compiler finds every location. Nothing is missed.
Step 5: The Feature Ships — What Exists for Auditing?
Spec-driven approach:
After shipping, the audit trail consists of:
- The PRD text file (which may or may not match the code)
- The generated code (which may or may not match the PRD)
- The test results (which measure coverage, not requirement traceability)
- The quality gate logs (which show pass/fail, not requirement mapping)
To answer "is every acceptance criterion implemented and tested?" — someone must manually read the PRD, find each AC, search the codebase for the corresponding implementation, and search the test suite for relevant tests. This is a human process. It scales linearly with the number of ACs and is error-prone.
Typed specification approach:
After shipping, the audit trail is generated code:
// TraceabilityMatrix.g.cs — generated at compile time
new RequirementTrace
{
Feature = typeof(OrderCancellationFeature),
AcceptanceCriteria = new[]
{
new AcceptanceCriterionTrace
{
Name = "CustomerCanCancelUnshippedOrder",
Specification = typeof(IOrderCancellationSpec),
SpecMethod = "CancelOrder",
Implementation = typeof(OrderCancellationService),
ImplMethod = "CancelOrder",
Tests = new[] { "Created_order_can_be_cancelled_by_its_customer",
"Shipped_order_cannot_be_cancelled",
"Different_customer_cannot_cancel_order" },
IsCovered = true,
},
new AcceptanceCriterionTrace
{
Name = "CancellationTriggersFullRefund",
Specification = typeof(IOrderCancellationSpec),
SpecMethod = "InitiateRefund",
Implementation = typeof(OrderCancellationService),
ImplMethod = "InitiateRefund",
Tests = new[] { "Cancellation_initiates_refund_for_full_order_amount" },
IsCovered = true,
},
new AcceptanceCriterionTrace
{
Name = "ConfirmationEmailSentAfterCancellation",
Specification = typeof(IOrderCancellationSpec),
SpecMethod = "SendCancellationConfirmation",
Implementation = typeof(OrderCancellationService),
ImplMethod = "SendCancellationConfirmation",
Tests = new[] { "Confirmation_email_sent_with_order_details" },
IsCovered = true,
},
new AcceptanceCriterionTrace
{
Name = "CancellationReasonRequired",
Specification = typeof(IOrderCancellationSpec),
SpecMethod = "ValidateCancellationReason",
Implementation = typeof(OrderCancellationService),
ImplMethod = "ValidateCancellationReason",
Tests = new[] { "Valid_reason_is_accepted",
"Other_reason_requires_explanation" },
IsCovered = true,
},
},
OverallCoverage = 1.0,
}
To answer "is every AC implemented and tested?" — read the generated traceability matrix. It's produced by the compiler from the same types it validated. It cannot be wrong. The audit is automatic, structural, and complete.
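And because the matrix is ordinary code, the audit question becomes a query rather than a document review. A hypothetical usage sketch, assuming the generated traces are exposed as an IEnumerable<RequirementTrace> named traceabilityMatrix (a name invented here for illustration):

```csharp
using System.Linq;

// Hypothetical audit query over the generated traceability matrix:
// collect the names of every acceptance criterion that lacks coverage.
var uncoveredCriteria = traceabilityMatrix
    .SelectMany(trace => trace.AcceptanceCriteria)
    .Where(ac => !ac.IsCovered)
    .Select(ac => ac.Name)
    .ToList();

// An empty list means every acceptance criterion is implemented and tested;
// anything else names the exact gaps.
```

The same query can run in CI, turning "is the refund AC tested?" from a meeting into a one-line check.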
The Lifecycle Summary
| Lifecycle Phase | Spec-Driven | Typed Specifications |
|---|---|---|
| Define | Fill in PRD template (learn the template syntax) | Write C# record (learn the type system) |
| Implement | AI interprets text, generates freely | Compiler guides AI through chain |
| Test | Coverage threshold, no AC linkage | [Verifies] per AC, compiler-enforced |
| Change | Update text, code may not follow | Update type, compiler forces code to follow |
| Audit | Manual: read PRD, find code, find tests | Automatic: read generated traceability matrix |
| Learning curve | Template keywords and structure | C# records and attributes |
The last row is the key insight. Both approaches require learning a grammar. The spec-driven grammar (DEFINE_FEATURE, acceptance_criteria, priority) feels approachable because it resembles English keywords. The typed grammar (abstract record, Feature<T>, AcceptanceCriterionResult) feels intimidating because it's explicitly code.
But "feels approachable" and "is simpler" are different things. The spec-driven grammar has implicit rules (indentation matters, field names are reserved, the bulleted list format is required) that you discover when the quality gate rejects your document. The typed grammar has explicit rules (the compiler tells you exactly what's wrong) that you discover immediately.
One grammar has a compiler. The other doesn't. Both are grammars.
What Comes Next
Part II compares the architecture of both approaches: six pillars vs six DSLs. Same problem decomposition, radically different organizational units. The architecture tells you everything about the philosophy.