Context Engineering: Assembly vs Generation
Context Engineering is the spec-driven framework's most innovative pillar — and arguably where the philosophical gap between the two approaches is widest. Both solve the same problem: how do you get the right information to the right AI agent at the right time? The answers are incompatible.
The Spec-Driven Approach: Context Assembly
The Context-Engineering-as-Code specification defines a sophisticated system for assembling context from multiple sources and delivering it to AI agents. The core model:
Four Context Sources
Codebase — The actual source code, file structure, dependencies, and build configuration. This is the "ground truth" that the AI needs to understand.
Documentation — Existing documentation, ADRs, wiki pages, README files, API docs. This is supplementary information that explains intent, decisions, and constraints.
Specifications — The other five pillars (PRD, Testing, Coding Practices, etc.). This is the "what should be" information that guides the AI's output.
Runtime Data — Logs, metrics, error reports, deployment status. This is the "what actually happened" information that helps the AI understand current state.
Three Assembly Strategies
Strategy 1: Task-Driven Assembly
The context assembled depends on the task type. An AI agent implementing a new feature gets the PRD section for that feature, the relevant coding practices, and the testing specification. An AI agent fixing a bug gets the error logs, the relevant code, and the documentation for the affected module.
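As a minimal sketch, this selection rule can be modeled as a plain lookup from task type to context sources. The task names and source identifiers here are invented for illustration — the specification does not prescribe these exact keys:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative task-driven assembly: each task type maps to the context
// sources it should receive. All names here are hypothetical.
var rules = new Dictionary<string, string[]>
{
    ["implement_feature"] = new[] { "prd:feature_section", "coding_practices", "testing_spec" },
    ["fix_bug"]           = new[] { "runtime:error_logs", "codebase:relevant_files", "docs:module" },
};

// Select sources for a task; unknown task types get every source (a crude fallback).
string[] Assemble(string taskType) =>
    rules.TryGetValue(taskType, out var sources)
        ? sources
        : rules.Values.SelectMany(s => s).Distinct().ToArray();

Console.WriteLine(string.Join(", ", Assemble("fix_bug")));
```

The point of the sketch: the mapping itself is configuration, and choosing the right mapping is where the heuristic risk lives.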
Task: "Implement password reset"
│
├── PRD → Feature: password_reset section
├── Coding Practices → Language: C#, patterns: SOLID
├── Testing → Strategy: unit + integration
├── Codebase → Files: AuthController.cs, UserService.cs
└── Context assembled → Prompt sent to AI
Strategy 2: Progressive Disclosure
Start with minimal context. If the AI produces inadequate output, progressively add more context. This avoids overwhelming the AI with irrelevant information while ensuring it gets what it needs.
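That loop can be sketched in a few lines, assuming a three-layer ordering and a stubbed quality gate (the AI generation step itself is elided; layer names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Progressive disclosure: start with the smallest context and add one
// layer per failed round. Layer names and ordering are illustrative.
var layers = new[] { "prd_feature_section", "coding_practices", "related_code_files" };

int RoundsNeeded(Func<IReadOnlyList<string>, bool> qualityGate)
{
    var context = new List<string>();
    for (var i = 0; i < layers.Length; i++)
    {
        context.Add(layers[i]);                 // enrich the context
        // AI generation step elided; we only model the gate.
        if (qualityGate(context)) return i + 1; // gate passed on this round
    }
    return -1; // all layers exhausted without passing
}

// A gate that only passes once related code files are included → 3 rounds.
Console.WriteLine(RoundsNeeded(ctx => ctx.Contains("related_code_files")));
```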
Round 1: PRD feature section only → AI generates code → Quality gate fails
Round 2: + Coding practices → AI revises → Quality gate fails
Round 3: + Related code files → AI revises → Quality gate passes
Strategy 3: Adaptive Optimization
Learn from past interactions which context combinations produce the best results. Over time, the system optimizes context assembly for each task type and AI model.
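One plausible minimal mechanism is a pass-rate table keyed by task type and context combination. This is entirely illustrative — the specification leaves the learning mechanism open:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Adaptive optimization sketch: track quality-gate pass rates per
// (task type, context combination) and prefer the best historical combo.
var history = new Dictionary<(string Task, string Combo), (int Passes, int Tries)>();

void Record(string task, string combo, bool passed)
{
    var key = (task, combo);
    var (p, t) = history.TryGetValue(key, out var v) ? v : (0, 0);
    history[key] = (p + (passed ? 1 : 0), t + 1);
}

string BestCombo(string task) =>
    history.Where(kv => kv.Key.Task == task)
           .OrderByDescending(kv => (double)kv.Value.Passes / kv.Value.Tries)
           .First().Key.Combo;

Record("implement_feature", "prd_only", false);
Record("implement_feature", "prd_only", true);
Record("implement_feature", "prd+practices", true);
Record("implement_feature", "prd+practices", true);

Console.WriteLine(BestCombo("implement_feature")); // "prd+practices" (100% vs 50%)
```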
Validation Framework
The assembled context is validated through quality gates before being sent to the AI:
- Completeness: Does the context include all relevant specification sections?
- Consistency: Do the specification sections agree with each other?
- Relevance: Is the context focused on the current task?
- Size: Does the context fit within the AI's context window?
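Three of the four gates are easy to sketch as predicates (consistency is harder, since it requires comparing section contents, so it is elided here). All names and thresholds below are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Quality-gate sketches. Note how heuristic each check is — "relevance"
// here is just a tag-prefix convention, and "size" a crude token estimate.
bool Completeness(IReadOnlyList<string> ctx, string[] requiredSections) =>
    requiredSections.All(ctx.Contains);

bool Relevance(IReadOnlyList<string> ctx, string taskTag) =>
    ctx.All(s => s.StartsWith(taskTag + ":") || s.StartsWith("shared:"));

bool Size(IReadOnlyList<string> ctx, int tokenBudget) =>
    ctx.Sum(s => s.Length / 4) <= tokenBudget; // ~4 characters per token

var ctx = new[] { "auth:prd_section", "shared:coding_practices", "auth:test_strategy" };
Console.WriteLine(Completeness(ctx, new[] { "auth:prd_section" })); // True
Console.WriteLine(Relevance(ctx, "auth"));                          // True
Console.WriteLine(Size(ctx, 8000));                                 // True
```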
What This Means in Practice
The spec-driven context engineering approach is essentially a sophisticated prompt engineering system. It takes structured documents, selects relevant sections, assembles them into a prompt, sends it to the AI, validates the output, and iterates. The innovation is systematizing this process rather than doing it ad-hoc.
Documents ──→ Context Selector ──→ Assembler ──→ Validator ──→ AI Prompt
│
▼
AI Output
│
▼
Quality Gate
│ │
Pass ✓ Fail ✗
│ │
Done Iterate
(add context)
The Strength
This is genuinely novel. Most AI-assisted development tools do ad-hoc context assembly — the developer manually selects files, writes a prompt, and hopes the AI understands. The spec-driven approach replaces hope with system. That's real progress.
The 78-week implementation roadmap across five phases shows the ambition: this isn't a template — it's a framework for building AI orchestration pipelines.
The Weakness
The entire system operates on text. Context is assembled from text documents, validated by text-matching quality gates, and delivered as text prompts. There is no structural guarantee that the assembled context is correct — only that it passes heuristic quality checks.
What "correctness" means here is fuzzy. "Does the context include the relevant PRD section?" Well, which section is relevant? The system must decide, and that decision is itself a heuristic. The quality gate can check that the context includes something from the PRD, but it can't verify that it includes the right thing.
The Typed Specification Approach: Context Generation
The typed specification approach doesn't assemble context. It generates it.
The M3 Meta-Metamodel as Context Source
At compile time, Stage 0 of the generation pipeline scans all assemblies for [MetaConcept] attributes and produces MetamodelRegistry.g.cs — a complete map of every concept, every property, every reference, and every constraint in the system:
// Generated at compile time — MetamodelRegistry.g.cs
public static class MetamodelRegistry
{
public static readonly IReadOnlyDictionary<string, MetaConceptDefinition> Concepts = new Dictionary<string, MetaConceptDefinition>
{
["AggregateRoot"] = new MetaConceptDefinition
{
Name = "AggregateRoot",
InheritsFrom = "Entity",
Properties =
{
new("BoundedContext", typeof(string), Required: true),
new("HasWorkflow", typeof(bool), Default: false),
},
Constraints =
{
new("MustHaveId", "Properties.Any(p => p.IsAnnotatedWith('EntityId'))"),
new("NoDirectComposition",
"References.Where(r => r.IsComposition).All(r => r.Target.IsEntity)"),
},
},
["Feature"] = new MetaConceptDefinition
{
Name = "Feature",
InheritsFrom = "RequirementMetadata",
Properties =
{
new("Title", typeof(string), Required: true),
new("Priority", typeof(RequirementPriority), Required: true),
new("Owner", typeof(string), Required: true),
},
Constraints =
{
new("MustHaveACs",
"Methods.Any(m => m.ReturnType == typeof(AcceptanceCriterionResult))"),
},
},
// ... every concept from every DSL
};
}
This registry is the context. An AI agent working within this codebase can read the metamodel registry and understand:
- Every domain concept and its shape
- Every constraint and what it enforces
- Every relationship between concepts
- Every requirement and its acceptance criteria
- Every specification and what it requires
The difference from spec-driven context: this context cannot be wrong. It's generated from the same types the compiler checks. If the code changes, the registry changes. If a concept is added, it appears in the registry. If a constraint is removed, it disappears.
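What "reading the registry" looks like for an agent or tool can be shown with a standalone, cut-down model — the MetaConcept type here is a simplified stand-in for the generated MetaConceptDefinition:

```csharp
using System;
using System.Collections.Generic;

// Querying a (simplified) metamodel registry: "which constraints govern
// AggregateRoot?" is answered structurally, not by searching documents.
var concepts = new Dictionary<string, MetaConcept>
{
    ["AggregateRoot"] = new("AggregateRoot", "Entity", new[] { "MustHaveId", "NoDirectComposition" }),
    ["Feature"]       = new("Feature", "RequirementMetadata", new[] { "MustHaveACs" }),
};

string[] ConstraintsFor(string concept) =>
    concepts.TryGetValue(concept, out var c) ? c.Constraints : Array.Empty<string>();

Console.WriteLine(string.Join(", ", ConstraintsFor("AggregateRoot")));
// MustHaveId, NoDirectComposition

// Cut-down stand-in for the generated MetaConceptDefinition.
record MetaConcept(string Name, string InheritsFrom, string[] Constraints);
```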
The Traceability Matrix as Context
Stage 4 of the generation pipeline produces a traceability matrix — a complete map from requirements to specifications to implementations to tests:
// Generated at compile time — TraceabilityMatrix.g.cs
public static class TraceabilityMatrix
{
public static readonly IReadOnlyList<RequirementTrace> Traces = new[]
{
new RequirementTrace
{
Feature = typeof(UserRolesFeature),
AcceptanceCriteria = new[]
{
new AcceptanceCriterionTrace
{
Name = nameof(UserRolesFeature.AdminCanAssignRoles),
Specification = typeof(IUserRolesSpec),
SpecMethod = nameof(IUserRolesSpec.AssignRole),
Implementation = typeof(AuthorizationService),
ImplMethod = nameof(AuthorizationService.AssignRole),
Tests = new[]
{
typeof(UserRolesFeatureTests)
.GetMethod(nameof(
UserRolesFeatureTests
.Admin_with_ManageRoles_permission_can_assign_roles)),
},
IsCovered = true,
},
// ... other ACs
},
OverallCoverage = 1.0, // 100% — all ACs have specs, impls, and tests
},
new RequirementTrace
{
Feature = typeof(PasswordResetFeature),
AcceptanceCriteria = new[]
{
// ...
new AcceptanceCriterionTrace
{
Name = nameof(PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce),
Specification = null, // ⚠ No spec yet
Implementation = null, // ⚠ No impl yet
Tests = Array.Empty<MethodInfo>(),
IsCovered = false,
},
},
OverallCoverage = 0.75, // 3 of 4 ACs covered
},
};
}
An AI agent reading this matrix knows exactly:
- Which features exist and what their ACs are
- Which ACs have specifications, implementations, and tests
- Which ACs are missing coverage (and what kind of coverage is missing)
- What the overall completion state of the project is
This is not a document that might be outdated. It's generated code that reflects the current state of the type system. It's always correct because it's produced by the same compiler pass that validates the code.
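In the same spirit, an agent's "what's missing?" question reduces to a structural filter over the matrix — AcTrace below is a simplified stand-in for the generated trace types:

```csharp
using System;
using System.Linq;

// Finding uncovered ACs in a (simplified) traceability matrix: coverage
// gaps are a structural query, not a document audit.
var traces = new[]
{
    new AcTrace("UserRolesFeature.AdminCanAssignRoles", HasSpec: true, HasImpl: true, HasTest: true),
    new AcTrace("PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce", HasSpec: false, HasImpl: false, HasTest: false),
};

var uncovered = traces.Where(t => !t.HasSpec || !t.HasImpl || !t.HasTest)
                      .Select(t => t.Ac)
                      .ToArray();

foreach (var ac in uncovered)
    Console.WriteLine($"Not covered: {ac}");

// Simplified stand-in for the generated AcceptanceCriterionTrace.
record AcTrace(string Ac, bool HasSpec, bool HasImpl, bool HasTest);
```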
Context Through the IDE
In the typed approach, context delivery to AI agents happens naturally through the IDE:
- The AI agent sees the current file. If it's a test file, it sees the [Verifies(typeof(UserRolesFeature), nameof(...))] attributes — which tell it exactly which feature and AC this test covers.
- The AI agent can follow type references. typeof(UserRolesFeature) leads to the feature definition. typeof(IUserRolesSpec) leads to the specification. The type system IS the navigation graph.
- The AI agent can read compiler diagnostics. error REQ101: ... tells it what's missing. The diagnostic IS the task description.
AI Agent sees:
├── Current file: PasswordResetFeatureTests.cs
│ └── [TestsFor(typeof(PasswordResetFeature))]
│ ├── [Verifies(...RequestEmail)] → Ctrl+Click → Feature definition
│ ├── [Verifies(...LinkExpires24h)] → Ctrl+Click → Feature definition
│ └── Missing: Verifies for ResetLinkCanOnlyBeUsedOnce
│
├── Compiler diagnostics:
│ ├── REQ301: PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce has no test
│ └── REQ101: PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce has no spec
│
└── Context is COMPLETE — no assembly needed
The Core Difference
| Dimension | Spec-Driven Context | Typed Specification Context |
|---|---|---|
| Source | Documents (text files) | Types (compiled code) |
| Assembly | Runtime selection from documents | Compile-time generation from types |
| Correctness | Heuristic validation (quality gates) | Structural guarantee (compiler) |
| Staleness | Possible (documents can drift) | Impossible (generated from current types) |
| Completeness | Checked by heuristic | Checked by analyzer (REQ1xx-REQ4xx) |
| AI delivery | Prompt engineering (text in → text out) | Type system constraints (code in → compiled code out) |
| Feedback loop | Quality gate → iterate prompt | Compiler error → fix code |
| Granularity | Document section level | Method/property/type level |
The Assembly Problem
The spec-driven approach faces the context assembly problem: given a task, which document sections should the AI see? Too much context overwhelms. Too little context produces wrong output. The system must select — and selection can be wrong.
The typed approach eliminates the assembly problem. There's nothing to select. The type system is the context. The compiler diagnostics are the task. The AI writes code within the type system, and the compiler validates it. If the code is wrong, the compiler says why. If the code is right, the compiler says nothing.
This is not a minor difference. The entire Context Engineering pillar — the four context sources, the three assembly strategies, the validation framework, the 78-week roadmap — exists to solve a problem that typed specifications don't have. Not because typed specifications are smarter, but because they encode context in a medium (the type system) that doesn't need assembly.
The Discovery Problem
But typed specifications face the discovery problem: an AI agent working in a typed codebase needs to know that typed requirements exist, how the [ForRequirement] system works, and what the analyzer diagnostics mean. This is meta-context — context about the context system.
The spec-driven approach doesn't have this problem. A document is self-describing: "This is the Testing Specification. It defines 15 testing strategies." An AI can read the document and understand what it's looking at.
A typed specification system is not self-describing to an AI that hasn't seen it before. abstract record UserRolesFeature : Feature<PlatformScalabilityEpic> means nothing to an AI that doesn't know what Feature<T> is or why acceptance criteria are abstract methods. The AI needs onboarding — and that onboarding is itself a document (a CLAUDE.md, a README, a system prompt).
This is an irony worth noting: the typed specification approach still needs a document to explain itself. The types eliminate the need for specification documents, but they create the need for a meta-specification document that explains the type system.
A Practical Comparison
Let's trace both approaches through a concrete scenario: an AI agent needs to implement a new AC for the UserRolesFeature.
Spec-Driven Flow
1. PRD updated: "Add AC: Admin can revoke roles"
2. Context Engineering assembles:
- PRD section for user_roles_management
- Coding Practices for C#
- Testing spec for unit + integration
- Existing code: AuthorizationService.cs
3. Prompt sent to AI:
"Implement the new acceptance criterion 'Admin can revoke roles' for the
user roles management feature. Follow C# coding practices. Write unit
and integration tests. Here is the existing code: [AuthorizationService.cs]"
4. AI generates:
- RevokeRole method in AuthorizationService
- Unit test: AdminCanRevokeRole
- Integration test: RevokeRolePersists
5. Quality gate checks:
- Code compiles? ✓
- Tests pass? ✓
- Coverage > 80%? ✓
- Coding practices followed? ✓ (linter)
6. Done.
Questions the quality gate cannot answer:
- Does the implementation actually match the AC? (It checks syntax, not semantics.)
- Is the test testing the right thing? (It checks existence, not correctness.)
- Is the AC linked to the right feature? (There's no structural link.)
Typed Specification Flow
1. Developer adds abstract method to UserRolesFeature:
public abstract AcceptanceCriterionResult
AdminCanRevokeRoles(UserId actingUser, UserId targetUser, RoleId role);
2. Compiler fires:
error REQ101: UserRolesFeature.AdminCanRevokeRoles has no matching spec method
warning REQ301: UserRolesFeature.AdminCanRevokeRoles has no test
3. AI agent (or developer) creates spec method:
[ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.AdminCanRevokeRoles))]
Result RevokeRole(User actingUser, User targetUser, Role role);
4. Compiler fires:
error CS0535: AuthorizationService does not implement
IUserRolesSpec.RevokeRole(User, User, Role)
5. AI agent implements the method in AuthorizationService.
6. Compiler fires:
warning REQ301: UserRolesFeature.AdminCanRevokeRoles has no test
7. AI agent writes test:
[Verifies(typeof(UserRolesFeature),
nameof(UserRolesFeature.AdminCanRevokeRoles))]
public void Admin_can_revoke_assigned_role() { ... }
8. Build succeeds. All diagnostics clear.
At every step, the compiler tells the AI exactly what's missing. There's no context assembly, no prompt engineering, no quality gate to configure. The type system guides the AI through the implementation.
The 78-Week Roadmap Question
The spec-driven Context Engineering specification includes a 78-week implementation roadmap across five phases. This is a serious commitment. It says: "Building a proper context engineering system is an 18-month project."
The typed specification approach doesn't have an equivalent roadmap because the "context engineering" is the source generator + analyzer — which, admittedly, also takes significant effort to build. But once built, there's no ongoing context assembly infrastructure to maintain. The type system does the work automatically.
The honest comparison:
| Investment | Spec-Driven | Typed Specifications |
|---|---|---|
| Initial setup | Low (fill in templates) | High (build generators + analyzers) |
| Context assembly infrastructure | 78 weeks (per the roadmap) | 0 (type system does it) |
| Ongoing maintenance | Document updates + assembly rules | Generator maintenance + analyzer updates |
| Total cost at 2 years | Templates + infra + maintenance | Generators + maintenance |
| Total cost at 5 years | Templates + infra + maintenance + drift remediation | Generators + maintenance (no drift) |
The spec-driven approach is cheaper to start. The typed approach is cheaper to maintain. The crossover point depends on team size, project complexity, and how often requirements change.
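To make the crossover concrete, here is a back-of-envelope model. Every coefficient is invented purely for illustration (the article gives no cost figures): spec-driven cost grows superlinearly as drift remediation accumulates, while typed cost is a large initial build-out followed by flat upkeep.

```csharp
using System;
using System.Linq;

// Hypothetical cumulative costs by month (arbitrary units). All numbers invented.
// Spec-driven: small setup, steady upkeep, drift remediation growing with age.
double SpecDriven(int months) => 2 + 1.5 * months + 0.2 * months * months / 12.0;
// Typed: expensive generator/analyzer build-out, then flat upkeep.
double Typed(int months) => 20 + 0.75 * months;

// First month at which the typed approach is no more expensive.
int crossover = Enumerable.Range(1, 120).First(m => Typed(m) <= SpecDriven(m));
Console.WriteLine($"Crossover under these assumptions: month {crossover}");
```

Changing any coefficient moves the crossover — which is exactly the point: the break-even month is a function of team size, project complexity, and churn, not a universal constant.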
Deep Dive: Context Assembly vs Context Generation in Practice
Let's trace a complete context engineering cycle for both approaches. The task: an AI agent needs to add a new endpoint to an existing API.
Spec-Driven Context Assembly: Step by Step
Step 1: Task classification
The orchestrator receives: "Add a GET /api/users/{id}/roles endpoint that returns the roles assigned to a user."
It classifies this as a CodeGeneration task in Stage 2 (Core Implementation).
Step 2: Source selection
The Context Engineering rules specify four sources for a CodeGeneration task:
CONTEXT_RULE(code_generation_task):
sources:
- prd: feature_section(user_roles_management)
- coding_practices: language_section(csharp)
- testing: strategy_section(unit_tests, integration_tests)
- codebase: related_files(controllers/*, services/UserRolesService.cs)
assembly_strategy: task_driven
max_context_tokens: 8000
Step 3: Content extraction
From the PRD:
Feature: user_roles_management
AC: "User can view their assigned roles"
Priority: High
From Coding Practices:
Language: C#
Patterns: SOLID, DI, Result pattern
Naming: PascalCase methods, camelCase params
Max method length: 30 lines
Error handling: Return Result<T>, don't throw
From Testing:
Unit tests: Arrange-Act-Assert, mock dependencies
Integration tests: Real database, test containers
Naming: MethodName_Scenario_ExpectedResult
From Codebase:
// Existing controller pattern (from AuthController.cs)
[ApiController]
[Route("api/[controller]")]
public class AuthController : ControllerBase
{
private readonly IAuthService _authService;
[HttpGet("{userId}/profile")]
public async Task<IActionResult> GetProfile(Guid userId)
{
var result = await _authService.GetProfile(userId);
return result.Match(
success: Ok,
failure: err => err switch
{
UserNotFoundException => NotFound(),
_ => StatusCode(500)
});
}
}
Step 4: Assembly
The assembled context is ~4,000 tokens — well within the 8,000 budget. The prompt looks like:
You are implementing a new API endpoint.
TASK: Add GET /api/users/{id}/roles
FEATURE: user_roles_management (AC: "User can view their assigned roles")
PRIORITY: High
CODING PRACTICES:
- Use SOLID principles with dependency injection
- Return Result<T> for error handling (don't throw)
- PascalCase for methods, camelCase for parameters
- Max 30 lines per method
EXISTING PATTERN (follow this controller structure):
[existing AuthController.cs code]
TESTING REQUIREMENTS:
- Unit test with mocked service
- Integration test with real database
- Naming: MethodName_Scenario_ExpectedResult
Generate the endpoint implementation, unit test, and integration test.
Step 5: Validation
The quality gate checks the AI's output:
- Does it compile? (Build)
- Do tests pass? (Test runner)
- Coverage > 80%? (Coverage tool)
- Follows coding practices? (Linter/Roslyn analyzers)
If any gate fails, the context is enriched (progressive disclosure) and the AI regenerates.
Total cycle time: 5-15 minutes depending on CI pipeline speed.
Typed Specification Context Generation: Step by Step
Step 1: Developer adds AC to feature
// In UserRolesFeature.cs — adding a new AC
public abstract record UserRolesFeature : Feature<PlatformScalabilityEpic>
{
// ... existing ACs ...
/// <summary>
/// Any authenticated user can view the roles assigned to a specific user.
/// Returns the user's current roles with their assignment dates.
/// </summary>
public abstract AcceptanceCriterionResult
UserCanViewAssignedRoles(UserId targetUser);
}
Step 2: Compiler fires immediately
error REQ101: UserRolesFeature.UserCanViewAssignedRoles has no matching
spec method with [ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
warning REQ301: UserRolesFeature.UserCanViewAssignedRoles has no test
The AI agent (or developer) sees these diagnostics in the IDE. No context assembly needed — the diagnostics ARE the task.
Step 3: AI creates spec method
// In IUserRolesSpec.cs
[ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
Result<IReadOnlyList<RoleAssignment>, DomainException> GetUserRoles(UserId userId);
Note the return type: Result<IReadOnlyList<RoleAssignment>, DomainException>. This is not free-form — the AI must choose types that exist in the SharedKernel project. If RoleAssignment doesn't exist, the AI must create it — and the compiler will verify its shape.
Step 4: Compiler fires — interface not implemented
error CS0535: 'AuthorizationService' does not implement interface member
'IUserRolesSpec.GetUserRoles(UserId)'
Step 5: AI implements the method
// In AuthorizationService.cs
[ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
public Result<IReadOnlyList<RoleAssignment>, DomainException> GetUserRoles(UserId userId)
{
var user = _userRepository.FindById(userId);
if (user is null)
return Result<IReadOnlyList<RoleAssignment>, DomainException>
.Failure(new UserNotFoundException(userId));
var assignments = _roleRepository.GetAssignmentsForUser(userId);
return Result<IReadOnlyList<RoleAssignment>, DomainException>
.Success(assignments);
}
Step 6: Compiler fires — no test
warning REQ301: UserRolesFeature.UserCanViewAssignedRoles has no test
Step 7: AI writes test
[Verifies(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
public void Existing_user_returns_their_roles()
{
var userId = TestUsers.Alice.Id;
_roleRepository.Assign(userId, Roles.Editor, DateTime.UtcNow.AddDays(-30));
_roleRepository.Assign(userId, Roles.Reviewer, DateTime.UtcNow.AddDays(-7));
var result = _service.GetUserRoles(userId);
Assert.That(result.IsSuccess, Is.True);
Assert.That(result.Value, Has.Count.EqualTo(2));
Assert.That(result.Value.Select(r => r.Role),
Is.EquivalentTo(new[] { Roles.Editor, Roles.Reviewer }));
}
[Verifies(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
public void Nonexistent_user_returns_not_found()
{
var result = _service.GetUserRoles(new UserId(Guid.NewGuid()));
Assert.That(result.IsSuccess, Is.False);
Assert.That(result.Error, Is.InstanceOf<UserNotFoundException>());
}
Step 8: Build succeeds
All diagnostics clear. The chain is complete.
Total cycle time: 2-5 minutes (compile-time feedback, no CI dependency).
The Difference in Numbers
| Metric | Spec-Driven | Typed Specifications |
|---|---|---|
| Steps to complete | 5 (classify → select → extract → assemble → validate) | 4 (add AC → spec → impl → test) |
| External dependencies | CI pipeline, coverage tool, linter | Compiler only |
| Feedback latency | 5-15 minutes (pipeline) | 5-30 seconds (compile) |
| Context accuracy | Depends on assembly rules | Always correct (types are code) |
| Iteration cost | High (re-run pipeline) | Low (re-compile) |
| Context size | ~4,000 tokens (assembled) | Variable (type definitions + diagnostics) |
| Risk of wrong context | Possible (wrong PRD section selected) | None (types are unambiguous) |
The CLAUDE.md Bridge
In practice, many typed specification projects use a CLAUDE.md file (or equivalent) that explains the typed specification system to AI agents. This is the "meta-context" that teaches the AI how the type system works:
# CLAUDE.md — Project Context
## Requirements System
This project uses typed specifications. Features are C# abstract records.
Acceptance criteria are abstract methods on feature records.
When you see a compiler diagnostic like REQ101, it means an AC has no specification.
When you see REQ301, it means an AC has no test.
## The Chain
1. Add AC to feature record (MyApp.Requirements)
2. Add spec method to interface (MyApp.Specifications) with [ForRequirement]
3. Implement in domain (MyApp.Domain)
4. Test with [Verifies] (MyApp.Tests)
## Conventions
- Feature types: abstract record XxxFeature : Feature<Epic>
- Spec interfaces: IXxxSpec with [ForRequirement(typeof(XxxFeature))]
- Tests: [TestsFor(typeof(XxxFeature))] class, [Verifies(...)] methods
- Error handling: Result<T, TError> pattern, never throw
This is a document — a spec-driven document — that explains the typed system. The irony is inescapable: even the typed approach needs a document to bootstrap. But this document is small (~50 lines), stable (the conventions rarely change), and supplementary (the types do the real work). It's the difference between a 50-line bootstrap document and a 15,000-line specification framework.
Summary
Context engineering is the domain where the philosophical gap is widest:
- Spec-driven treats context as a logistics problem: assemble the right documents, deliver them to the AI, validate the output.
- Typed specifications treat context as a structural problem: encode context in types, let the compiler enforce it, and the context is always correct.
Both solve real problems. But they solve them at different layers of the stack. Spec-driven solves context at the prompt layer. Typed specifications solve context at the compiler layer. The prompt layer is more flexible and language-agnostic. The compiler layer is more rigorous and self-correcting.
The Context Window Economics
Every AI model has a finite context window. GPT-4o: 128K tokens. Claude: 200K tokens. Sounds like a lot until you actually budget it for a real implementation task. The way each approach uses that window determines how much room the AI has to think.
Token Budget: A Real Comparison
Let's budget the context window for the same task: implementing Order Cancellation (3 ACs) with Claude (200K context window).
Spec-Driven Token Budget:
┌──────────────────────────────────────────────────────────┐
│ SPEC-DRIVEN CONTEXT WINDOW (200K) │
├──────────────────────────────────────────────────────────┤
│ │
│ System prompt + instructions ~ 800 tokens │
│ PRD: order_cancellation section ~ 1,200 tokens │
│ Coding Practices: C# rules ~ 900 tokens │
│ Testing Specification: strategies ~ 1,100 tokens │
│ Documentation rules ~ 400 tokens │
│ Existing code: OrderService.cs ~ 1,800 tokens │
│ Existing code: PaymentService.cs ~ 1,200 tokens │
│ Existing code: EmailService.cs ~ 800 tokens │
│ Existing code: Order.cs ~ 600 tokens │
│ Existing code: related interfaces ~ 1,400 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT CONSUMED ~10,200 tokens │
│ │
│ Round 2 additions (gate feedback): │
│ Previous generation (AI's own code) ~ 3,500 tokens │
│ Gate feedback + enriched context ~ 1,200 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT BY ROUND 2 ~14,900 tokens │
│ │
│ Round 3 additions: │
│ Additional code context ~ 2,000 tokens │
│ Previous gate feedback ~ 800 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT BY ROUND 3 ~17,700 tokens │
│ │
│ Available for AI reasoning: │
│ Round 1: 200,000 - 10,200 = 189,800 tokens │
│ Round 2: 200,000 - 14,900 = 185,100 tokens │
│ Round 3: 200,000 - 17,700 = 182,300 tokens │
│ │
└──────────────────────────────────────────────────────────┘
For this small feature, the context overhead seems manageable — about 5-9% of the window. But this is one feature with three ACs in a small codebase. Scale it to a real project:
Spec-Driven at Scale (50 features, shared context):
┌──────────────────────────────────────────────────────────┐
│ SPEC-DRIVEN AT SCALE — IMPLEMENTING FEATURE #37 │
├──────────────────────────────────────────────────────────┤
│ │
│ System prompt + instructions ~ 800 tokens │
│ PRD: feature #37 section ~ 1,500 tokens │
│ PRD: related features (cross-refs) ~ 3,200 tokens │
│ Coding Practices: full C# rules ~ 2,800 tokens │
│ Testing Specification: strategies ~ 2,400 tokens │
│ Architecture rules / boundaries ~ 1,600 tokens │
│ Documentation rules ~ 800 tokens │
│ Existing code: 12 relevant files ~18,000 tokens │
│ Existing tests: related patterns ~ 6,000 tokens │
│ Previous agent context (history) ~ 4,000 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT CONSUMED ~41,100 tokens │
│ │
│ Available for AI reasoning: 158,900 tokens (79.5%) │
│ Context overhead: 20.5% of window │
│ │
└──────────────────────────────────────────────────────────┘
Twenty percent of the context window consumed by specification documents before the AI writes a single line of code. On iterative rounds, that grows. And this is with a well-tuned context engineering system that selects only relevant sections. A poorly tuned system might include the full PRD (15,000+ tokens), the full testing specification (8,000+ tokens), and the full coding practices (5,000+ tokens) — consuming 30-40% of the window.
Typed Specification Token Budget:
┌──────────────────────────────────────────────────────────┐
│ TYPED SPECIFICATION CONTEXT WINDOW (200K) │
├──────────────────────────────────────────────────────────┤
│ │
│ CLAUDE.md (bootstrap document) ~ 200 tokens │
│ Feature record (3 ACs + XML docs) ~ 350 tokens │
│ Compiler diagnostics (3 REQ101s) ~ 150 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT CONSUMED (Round 1) ~ 700 tokens │
│ │
│ Round 2 (spec created, need impl): │
│ Feature record ~ 350 tokens │
│ Spec interface (3 methods) ~ 250 tokens │
│ Compiler diagnostics (3 CS0535s) ~ 150 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT (Round 2) ~ 750 tokens │
│ │
│ Round 3 (impl done, need tests): │
│ Feature record ~ 350 tokens │
│ Spec interface ~ 250 tokens │
│ Implementation (3 methods) ~ 600 tokens │
│ Compiler diagnostics (3 REQ301s) ~ 150 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT (Round 3) ~ 1,350 tokens │
│ │
│ Available for AI reasoning: │
│ Round 1: 200,000 - 700 = 199,300 tokens (99.7%) │
│ Round 2: 200,000 - 750 = 199,250 tokens (99.6%) │
│ Round 3: 200,000 - 1,350 = 198,650 tokens (99.3%) │
│ │
│ Context overhead: 0.3% - 0.7% of window │
│ │
└──────────────────────────────────────────────────────────┘
The difference is not incremental. It's an order of magnitude. The typed approach consumes 700-1,350 tokens of context; the spec-driven approach consumes 10,200-41,100. That cuts context overhead by more than 90%, leaving essentially the entire window free for reasoning.
Why Context Headroom Matters
"But 200K is huge — 20% overhead still leaves 160K for reasoning. Isn't that enough?"
For simple features, yes. But AI reasoning quality degrades as the context fills:
Attention dilution. Language models attend to all tokens in the context. The more tokens, the more the model's attention is spread. Critical information (the AC signature, the return type) competes with boilerplate (coding practices, documentation rules, existing code that's "for reference"). The typed approach puts the critical information — and only the critical information — in the context.
Instruction following. Research consistently shows that models follow instructions less precisely as context length increases. A model with 700 tokens of context follows the spec interface signature exactly. A model with 41,000 tokens of context might drift from the coding practices on page 3 because it's focused on the existing code on page 8.
The needle-in-haystack problem. The spec-driven approach puts the critical information (the AC text) inside a large document alongside non-critical information (other features, general practices, documentation rules). The AI must find the needle. The typed approach IS the needle — there's nothing else.
The Token Budget Table
| Budget Item | Spec-Driven | Typed Specifications |
|---|---|---|
| Specification documents | 5,000 - 15,000 tokens | 0 tokens (no documents) |
| Coding practices | 900 - 2,800 tokens | 0 tokens (enforced by compiler) |
| Testing specification | 1,100 - 2,400 tokens | 0 tokens (enforced by analyzer) |
| Feature definition | Included in PRD (~300 tokens) | Feature record (~350 tokens) |
| Compiler diagnostics | N/A | 50 - 200 tokens |
| Bootstrap (CLAUDE.md) | N/A | ~200 tokens (one-time) |
| Existing code context | 3,000 - 18,000 tokens | As needed (IDE provides) |
| Total overhead | 10,000 - 40,000 tokens | 500 - 1,500 tokens |
| % of 200K window | 5% - 20% | 0.3% - 0.8% |
| Reasoning headroom | 80% - 95% | 99%+ |
The spec-driven approach must tell the AI what coding practices to follow (consuming tokens). The typed approach has the compiler enforce coding practices (consuming zero tokens). The spec-driven approach must tell the AI what testing strategies to apply (consuming tokens). The typed approach has the analyzer emit REQ301 diagnostics (consuming zero tokens).
Every rule encoded in a document is a token cost. Every rule encoded in the type system is free.
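The budget tables above reduce to simple arithmetic. The following sketch (item names and token counts are the illustrative estimates from the tables, not measurements from a real tool; Python is used here as a language-agnostic notation) computes overhead and reasoning headroom for each approach:

```python
# Illustrative token-budget arithmetic for a 200K-token context window.
# The line items mirror the example budgets above; all counts are rough
# estimates, not measurements.

WINDOW = 200_000

spec_driven = {
    "system prompt": 800,
    "PRD section": 1_200,
    "coding practices": 900,
    "testing specification": 1_100,
    "documentation rules": 400,
    "existing code": 5_800,
}

typed_spec = {
    "CLAUDE.md bootstrap": 200,
    "feature record": 350,
    "compiler diagnostics": 150,
}

def headroom(budget: dict, window: int = WINDOW) -> tuple:
    """Return (tokens left for reasoning, overhead as % of window)."""
    used = sum(budget.values())
    return window - used, 100 * used / window

for name, budget in [("spec-driven", spec_driven), ("typed", typed_spec)]:
    left, pct = headroom(budget)
    print(f"{name:12s} overhead {pct:5.2f}%  reasoning headroom {left:,} tokens")
```

Running this reproduces the Round 1 figures from the diagrams: roughly 5% overhead for spec-driven versus 0.35% for the typed approach.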
Context Correctness vs Context Freshness
The spec-driven approach faces two distinct problems that the typed approach eliminates entirely. Understanding why requires examining what "correct context" actually means.
Problem 1: Context Correctness — Getting the Right Context
Context correctness means: the information the AI receives accurately describes the current state of the system. In spec-driven, this requires that:
- The PRD section for the feature is accurate and complete
- The coding practices match what the team actually uses
- The testing specification reflects current testing expectations
- The existing code files selected are the relevant ones
- No contradictions exist between documents
Each of these is a potential failure point. Let's trace a realistic scenario.
The Scenario: Stale Context Produces Wrong Code
Three weeks ago, the team wrote the Order Cancellation PRD section:
DEFINE_FEATURE(order_cancellation)
acceptance_criteria:
- "Customer can cancel an order that has not yet shipped"
- "Cancellation triggers a full refund to original payment method"
- "Confirmation email sent after successful cancellation"
Since then, two things happened that the PRD doesn't reflect:
1. The team decided that partial refunds are needed for orders with some items already shipped. They updated the PaymentService to accept a refundAmount parameter instead of always refunding the full total. This change lives in the code but not in the PRD.
2. The team added a fourth AC — "cancellation reason is required" — directly in the code. A developer added the field to the Order entity, the API accepted a reason parameter, and tests were written. But nobody updated the PRD.
Now an AI agent is asked to implement a new endpoint that uses order cancellation. The context engineering system assembles the PRD (3 ACs, full refund) and the existing code (4 ACs, partial refund). The AI receives contradictory context:
CONTEXT CONFLICT:
PRD says: "Cancellation triggers a full refund to original payment method"
PaymentService.cs shows:
public async Task<RefundResult> ProcessRefund(
OrderId orderId, decimal refundAmount, PaymentMethodId paymentMethod)
// ^^^^^^^^^^^^^ partial refund support
The AI must choose: follow the PRD (full refund) or follow the code (partial refund)? Different AI models make different choices. Different runs of the same model may make different choices. The context is "correct" by the system's standards — the PRD section was selected, the code was selected — but the information is contradictory.
The quality gate cannot catch this. The code compiles. The tests pass (because the tests reflect the code, not the PRD). The coverage is above threshold. The contradiction survives until a human reviews the output and notices the discrepancy.
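Catching this kind of drift requires an out-of-band audit. A minimal sketch of such a check — assuming, hypothetically, that AC titles can be extracted from both the PRD and the code, which real parsing would make far messier — compares the two sets and flags every mismatch:

```python
# Hypothetical drift audit: compare ACs declared in a PRD document with
# ACs discovered in the code. The extraction step is assumed; only the
# set comparison is shown.

def audit_drift(prd_acs: set, code_acs: set) -> list:
    """Report ACs present in one source but not the other."""
    findings = []
    for ac in sorted(code_acs - prd_acs):
        findings.append(f"in code but missing from PRD: {ac}")
    for ac in sorted(prd_acs - code_acs):
        findings.append(f"in PRD but missing from code: {ac}")
    return findings

prd = {"cancel unshipped order", "full refund", "confirmation email"}
code = {"cancel unshipped order", "partial refund", "confirmation email",
        "cancellation reason required"}

for finding in audit_drift(prd, code):
    print(finding)
# Flags the refund change and the fourth AC that exists only in code.
```

An audit like this is exactly the manual remediation step the typed approach makes unnecessary: when the types are the specification, there is no second representation to diff against.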
How the Typed Approach Eliminates This
In the typed approach, this scenario is structurally impossible. Here's why:
When the team decided to support partial refunds, they changed the spec method signature:
[ForRequirement(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.CancellationTriggersRefund))]
Result<RefundInitiation, RefundException> InitiateRefund(
OrderId orderId, decimal refundAmount, PaymentMethodId originalPayment);The moment this signature changed, every call site still passing the old arguments broke. The compiler forced all callers to provide the new refundAmount parameter. The code and the specification are the same thing — there's no PRD to drift from.
When the developer added the fourth AC, they added an abstract method:
public abstract AcceptanceCriterionResult
CancellationReasonRequired(OrderId orderId, CancellationReason reason);
The compiler immediately fired REQ101 (no spec method) and REQ301 (no test). The chain was completed (spec, implementation, test) before the AC was considered "done." There's no state where the AC exists in code but not in the specification — because the specification IS the code.
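The REQ101/REQ301 chain can be illustrated with a toy checker. This sketch follows the article's diagnostic codes, but the data model is invented for illustration — the real mechanism would be a compiler analyzer (e.g. Roslyn in C#) inspecting the syntax tree:

```python
# Toy analyzer mimicking the REQ101/REQ301 diagnostics described above.
# The record structure is a stand-in for what a real analyzer would
# discover from attributes like [ForRequirement] and [Verifies].
from dataclasses import dataclass

@dataclass
class AcceptanceCriterion:
    name: str
    has_spec: bool = False   # a [ForRequirement] spec method exists
    has_test: bool = False   # a [Verifies] test method exists

def analyze(acs: list) -> list:
    """Emit a diagnostic for every missing link in the chain."""
    diagnostics = []
    for ac in acs:
        if not ac.has_spec:
            diagnostics.append(f"REQ101: AC '{ac.name}' has no specification")
        if not ac.has_test:
            diagnostics.append(f"REQ301: AC '{ac.name}' has no test")
    return diagnostics

feature = [
    AcceptanceCriterion("CancelUnshippedOrder", has_spec=True, has_test=True),
    AcceptanceCriterion("CancellationReasonRequired"),  # freshly added AC
]
for d in analyze(feature):
    print(d)
# The new AC fires both diagnostics until its chain is completed.
```

The point of the sketch: the check runs on every build, so an AC can never sit in a half-done state silently.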
Problem 2: Context Freshness — Getting Current Context
Context freshness means: the information the AI receives reflects the current state of the project, not a past state. In spec-driven, freshness requires that:
- Someone updates the PRD when requirements change
- Someone updates the coding practices when conventions evolve
- Someone updates the testing specification when strategies change
- The context assembly system selects the latest versions
This is a human process. Documents drift because updating documents is a separate action from updating code. There's no compiler that says "you changed the code but not the PRD." The drift is silent.
The Freshness Timeline
Week 1: PRD written ────────────────── PRD says: 3 ACs, full refund
Week 2: Code updated (partial refund) ─ PRD says: 3 ACs, full refund ← STALE
Week 3: AC added in code ──────────── PRD says: 3 ACs, full refund ← STALER
Week 4: AI reads PRD ─────────────── AI sees: 3 ACs, full refund ← WRONG
By week 4, the PRD is three weeks stale. The AI generates code based on outdated information. The quality gate passes because the gate checks the code's internal consistency (compiles, tests pass, coverage met), not its consistency with the PRD.
In the typed approach, freshness is structural:
Week 1: Feature record created ─── Types say: 3 ACs, full refund
Week 2: Spec signature changed ─── Types say: 3 ACs, partial refund ← INSTANT
Week 3: AC added to record ──────── Types say: 4 ACs, partial refund ← INSTANT
Week 4: AI reads types ──────────── AI sees: 4 ACs, partial refund ← CORRECT
There's no lag. There's no "someone needs to update the document." The types change when the code changes because the types ARE the code. The AI always reads the current state because there is no separate representation that could be outdated.
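The "types are always current" property can be demonstrated with introspection. In this sketch, Python's abc module stands in for the article's C# abstract records: the AC list an agent reads is derived from the class itself, so it cannot lag behind the code:

```python
# The spec IS the code: an agent "reads the PRD" by introspecting the
# feature class, so the AC list is always the current one. Python's abc
# module stands in for C# abstract records here.
import abc

class OrderCancellationFeature(abc.ABC):
    @abc.abstractmethod
    def cancel_unshipped_order(self): ...
    @abc.abstractmethod
    def refund_to_original_payment(self, refund_amount: float): ...
    @abc.abstractmethod
    def send_confirmation_email(self): ...
    @abc.abstractmethod
    def cancellation_reason_required(self, reason: str): ...  # added week 3

def current_acs(feature: type) -> list:
    """Enumerate ACs straight from the type — no document to go stale."""
    return sorted(feature.__abstractmethods__)

print(current_acs(OrderCancellationFeature))
# Adding or removing an abstract method changes this list instantly.
```

Deleting the week-3 method from the class removes it from the list in the same keystroke — there is no second artifact to forget to update.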
The Two-Problem Table
| Problem | Spec-Driven | Typed Specifications |
|---|---|---|
| Correctness | Depends on document accuracy | Guaranteed (types are code) |
| Freshness | Depends on document updates | Guaranteed (types are always current) |
| Contradiction detection | None (documents and code are independent) | Compiler error (code must match types) |
| Drift remediation | Manual audit: compare PRD to code | Automatic: compiler prevents drift |
| Cost of staleness | Wrong AI output, passes quality gate | Build failure — immediate, loud, blocking |
| Who maintains freshness? | Humans (remember to update docs) | Compiler (refuses to build if inconsistent) |
The spec-driven approach assumes that documents will be maintained. The typed approach doesn't assume anything — it enforces consistency through the type system. When documents and code can drift apart, they will. When types and code are the same thing, they can't.
This is not a theoretical concern. Every team that has maintained a wiki, an ADR repository, or a specifications folder knows the reality: documents drift. The question is not whether your PRD will become stale — it's when, and whether anyone will notice before the AI reads it.
Part V examines how each approach handles the next critical concern: testing.