Context Engineering: Assembly vs Generation
Context Engineering is the spec-driven framework's most innovative pillar — and arguably where the philosophical gap between the two approaches is widest. Both solve the same problem: how do you get the right information to the right AI agent at the right time? The answers are incompatible.
The Spec-Driven Approach: Context Assembly
The Context-Engineering-as-Code specification defines a sophisticated system for assembling context from multiple sources and delivering it to AI agents. The core model:
Four Context Sources
Codebase — The actual source code, file structure, dependencies, and build configuration. This is the "ground truth" that the AI needs to understand.
Documentation — Existing documentation, ADRs, wiki pages, README files, API docs. This is supplementary information that explains intent, decisions, and constraints.
Specifications — The other five pillars (PRD, Testing, Coding Practices, etc.). This is the "what should be" information that guides the AI's output.
Runtime Data — Logs, metrics, error reports, deployment status. This is the "what actually happened" information that helps the AI understand current state.
Three Assembly Strategies
Strategy 1: Task-Driven Assembly
The context assembled depends on the task type. An AI agent implementing a new feature gets the PRD section for that feature, the relevant coding practices, and the testing specification. An AI agent fixing a bug gets the error logs, the relevant code, and the documentation for the affected module.
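As a minimal sketch, this selection rule can be modeled as a plain lookup from task type to context sources. The task names and source identifiers here are invented for illustration — the specification does not prescribe these exact keys:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative task-driven assembly: each task type maps to the context
// sources it should receive. All names here are hypothetical.
var rules = new Dictionary<string, string[]>
{
    ["implement_feature"] = new[] { "prd:feature_section", "coding_practices", "testing_spec" },
    ["fix_bug"]           = new[] { "runtime:error_logs", "codebase:relevant_files", "docs:module" },
};

// Select sources for a task; unknown task types get every source (a crude fallback).
string[] Assemble(string taskType) =>
    rules.TryGetValue(taskType, out var sources)
        ? sources
        : rules.Values.SelectMany(s => s).Distinct().ToArray();

Console.WriteLine(string.Join(", ", Assemble("fix_bug")));
```

The point of the sketch: the mapping itself is configuration, and choosing the right mapping is where the heuristic risk lives.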
Task: "Implement password reset"
│
├── PRD → Feature: password_reset section
├── Coding Practices → Language: C#, patterns: SOLID
├── Testing → Strategy: unit + integration
├── Codebase → Files: AuthController.cs, UserService.cs
└── Context assembled → Prompt sent to AI
Strategy 2: Progressive Disclosure
Start with minimal context. If the AI produces inadequate output, progressively add more context. This avoids overwhelming the AI with irrelevant information while ensuring it gets what it needs.
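That loop can be sketched in a few lines, assuming a three-layer ordering and a stubbed quality gate (the AI generation step itself is elided; layer names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Progressive disclosure: start with the smallest context and add one
// layer per failed round. Layer names and ordering are illustrative.
var layers = new[] { "prd_feature_section", "coding_practices", "related_code_files" };

int RoundsNeeded(Func<IReadOnlyList<string>, bool> qualityGate)
{
    var context = new List<string>();
    for (var i = 0; i < layers.Length; i++)
    {
        context.Add(layers[i]);                 // enrich the context
        // AI generation step elided; we only model the gate.
        if (qualityGate(context)) return i + 1; // gate passed on this round
    }
    return -1; // all layers exhausted without passing
}

// A gate that only passes once related code files are included → 3 rounds.
Console.WriteLine(RoundsNeeded(ctx => ctx.Contains("related_code_files")));
```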
Round 1: PRD feature section only → AI generates code → Quality gate fails
Round 2: + Coding practices → AI revises → Quality gate fails
Round 3: + Related code files → AI revises → Quality gate passes
Strategy 3: Adaptive Optimization
Learn from past interactions which context combinations produce the best results. Over time, the system optimizes context assembly for each task type and AI model.
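One plausible minimal mechanism is a pass-rate table keyed by task type and context combination. This is entirely illustrative — the specification leaves the learning mechanism open:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Adaptive optimization sketch: track quality-gate pass rates per
// (task type, context combination) and prefer the best historical combo.
var history = new Dictionary<(string Task, string Combo), (int Passes, int Tries)>();

void Record(string task, string combo, bool passed)
{
    var key = (task, combo);
    var (p, t) = history.TryGetValue(key, out var v) ? v : (0, 0);
    history[key] = (p + (passed ? 1 : 0), t + 1);
}

string BestCombo(string task) =>
    history.Where(kv => kv.Key.Task == task)
           .OrderByDescending(kv => (double)kv.Value.Passes / kv.Value.Tries)
           .First().Key.Combo;

Record("implement_feature", "prd_only", false);
Record("implement_feature", "prd_only", true);
Record("implement_feature", "prd+practices", true);
Record("implement_feature", "prd+practices", true);

Console.WriteLine(BestCombo("implement_feature")); // "prd+practices" (100% vs 50%)
```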
Validation Framework
The assembled context is validated through quality gates before being sent to the AI:
- Completeness: Does the context include all relevant specification sections?
- Consistency: Do the specification sections agree with each other?
- Relevance: Is the context focused on the current task?
- Size: Does the context fit within the AI's context window?
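Three of the four gates are easy to sketch as predicates (consistency is harder, since it requires comparing section contents, so it is elided here). All names and thresholds below are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Quality-gate sketches. Note how heuristic each check is — "relevance"
// here is just a tag-prefix convention, and "size" a crude token estimate.
bool Completeness(IReadOnlyList<string> ctx, string[] requiredSections) =>
    requiredSections.All(ctx.Contains);

bool Relevance(IReadOnlyList<string> ctx, string taskTag) =>
    ctx.All(s => s.StartsWith(taskTag + ":") || s.StartsWith("shared:"));

bool Size(IReadOnlyList<string> ctx, int tokenBudget) =>
    ctx.Sum(s => s.Length / 4) <= tokenBudget; // ~4 characters per token

var ctx = new[] { "auth:prd_section", "shared:coding_practices", "auth:test_strategy" };
Console.WriteLine(Completeness(ctx, new[] { "auth:prd_section" })); // True
Console.WriteLine(Relevance(ctx, "auth"));                          // True
Console.WriteLine(Size(ctx, 8000));                                 // True
```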
What This Means in Practice
The spec-driven context engineering approach is essentially a sophisticated prompt engineering system. It takes structured documents, selects relevant sections, assembles them into a prompt, sends it to the AI, validates the output, and iterates. The innovation is systematizing this process rather than doing it ad-hoc.
Documents ──→ Context Selector ──→ Assembler ──→ Validator ──→ AI Prompt
│
▼
AI Output
│
▼
Quality Gate
│ │
Pass ✓ Fail ✗
│ │
Done Iterate
(add context)
The Strength
This is genuinely novel. Most AI-assisted development tools do ad-hoc context assembly — the developer manually selects files, writes a prompt, and hopes the AI understands. The spec-driven approach replaces hope with system. That's real progress.
The 78-week implementation roadmap across five phases shows the ambition: this isn't a template — it's a framework for building AI orchestration pipelines.
The Weakness
The entire system operates on text. Context is assembled from text documents, validated by text-matching quality gates, and delivered as text prompts. There is no structural guarantee that the assembled context is correct — only that it passes heuristic quality checks.
What "correctness" means here is fuzzy. "Does the context include the relevant PRD section?" Well, which section is relevant? The system must decide, and that decision is itself a heuristic. The quality gate can check that the context includes something from the PRD, but it can't verify that it includes the right thing.
The Typed Specification Approach: Context Generation
The typed specification approach doesn't assemble context. It generates it.
The M3 Meta-Metamodel as Context Source
At compile time, Stage 0 of the generation pipeline scans all assemblies for [MetaConcept] attributes and produces MetamodelRegistry.g.cs — a complete map of every concept, every property, every reference, and every constraint in the system:
// Generated at compile time — MetamodelRegistry.g.cs
public static class MetamodelRegistry
{
public static readonly IReadOnlyDictionary<string, MetaConceptDefinition> Concepts = new Dictionary<string, MetaConceptDefinition>
{
["AggregateRoot"] = new MetaConceptDefinition
{
Name = "AggregateRoot",
InheritsFrom = "Entity",
Properties =
{
new("BoundedContext", typeof(string), Required: true),
new("HasWorkflow", typeof(bool), Default: false),
},
Constraints =
{
new("MustHaveId", "Properties.Any(p => p.IsAnnotatedWith('EntityId'))"),
new("NoDirectComposition",
"References.Where(r => r.IsComposition).All(r => r.Target.IsEntity)"),
},
},
["Feature"] = new MetaConceptDefinition
{
Name = "Feature",
InheritsFrom = "RequirementMetadata",
Properties =
{
new("Title", typeof(string), Required: true),
new("Priority", typeof(RequirementPriority), Required: true),
new("Owner", typeof(string), Required: true),
},
Constraints =
{
new("MustHaveACs",
"Methods.Any(m => m.ReturnType == typeof(AcceptanceCriterionResult))"),
},
},
// ... every concept from every DSL
};
}
This registry is the context. An AI agent working within this codebase can read the metamodel registry and understand:
- Every domain concept and its shape
- Every constraint and what it enforces
- Every relationship between concepts
- Every requirement and its acceptance criteria
- Every specification and what it requires
The difference from spec-driven context: this context cannot be wrong. It's generated from the same types the compiler checks. If the code changes, the registry changes. If a concept is added, it appears in the registry. If a constraint is removed, it disappears.
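What "reading the registry" looks like for an agent or tool can be shown with a standalone, cut-down model — the MetaConcept type here is a simplified stand-in for the generated MetaConceptDefinition:

```csharp
using System;
using System.Collections.Generic;

// Querying a (simplified) metamodel registry: "which constraints govern
// AggregateRoot?" is answered structurally, not by searching documents.
var concepts = new Dictionary<string, MetaConcept>
{
    ["AggregateRoot"] = new("AggregateRoot", "Entity", new[] { "MustHaveId", "NoDirectComposition" }),
    ["Feature"]       = new("Feature", "RequirementMetadata", new[] { "MustHaveACs" }),
};

string[] ConstraintsFor(string concept) =>
    concepts.TryGetValue(concept, out var c) ? c.Constraints : Array.Empty<string>();

Console.WriteLine(string.Join(", ", ConstraintsFor("AggregateRoot")));
// MustHaveId, NoDirectComposition

// Cut-down stand-in for the generated MetaConceptDefinition.
record MetaConcept(string Name, string InheritsFrom, string[] Constraints);
```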
The Traceability Matrix as Context
Stage 4 of the generation pipeline produces a traceability matrix — a complete map from requirements to specifications to implementations to tests:
// Generated at compile time — TraceabilityMatrix.g.cs
public static class TraceabilityMatrix
{
public static readonly IReadOnlyList<RequirementTrace> Traces = new[]
{
new RequirementTrace
{
Feature = typeof(UserRolesFeature),
AcceptanceCriteria = new[]
{
new AcceptanceCriterionTrace
{
Name = nameof(UserRolesFeature.AdminCanAssignRoles),
Specification = typeof(IUserRolesSpec),
SpecMethod = nameof(IUserRolesSpec.AssignRole),
Implementation = typeof(AuthorizationService),
ImplMethod = nameof(AuthorizationService.AssignRole),
Tests = new[]
{
typeof(UserRolesFeatureTests)
.GetMethod(nameof(
UserRolesFeatureTests
.Admin_with_ManageRoles_permission_can_assign_roles)),
},
IsCovered = true,
},
// ... other ACs
},
OverallCoverage = 1.0, // 100% — all ACs have specs, impls, and tests
},
new RequirementTrace
{
Feature = typeof(PasswordResetFeature),
AcceptanceCriteria = new[]
{
// ...
new AcceptanceCriterionTrace
{
Name = nameof(PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce),
Specification = null, // ⚠ No spec yet
Implementation = null, // ⚠ No impl yet
Tests = Array.Empty<MethodInfo>(),
IsCovered = false,
},
},
OverallCoverage = 0.75, // 3 of 4 ACs covered
},
};
}
An AI agent reading this matrix knows exactly:
- Which features exist and what their ACs are
- Which ACs have specifications, implementations, and tests
- Which ACs are missing coverage (and what kind of coverage is missing)
- What the overall completion state of the project is
This is not a document that might be outdated. It's generated code that reflects the current state of the type system. It's always correct because it's produced by the same compiler pass that validates the code.
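In the same spirit, an agent's "what's missing?" question reduces to a structural filter over the matrix — AcTrace below is a simplified stand-in for the generated trace types:

```csharp
using System;
using System.Linq;

// Finding uncovered ACs in a (simplified) traceability matrix: coverage
// gaps are a structural query, not a document audit.
var traces = new[]
{
    new AcTrace("UserRolesFeature.AdminCanAssignRoles", HasSpec: true, HasImpl: true, HasTest: true),
    new AcTrace("PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce", HasSpec: false, HasImpl: false, HasTest: false),
};

var uncovered = traces.Where(t => !t.HasSpec || !t.HasImpl || !t.HasTest)
                      .Select(t => t.Ac)
                      .ToArray();

foreach (var ac in uncovered)
    Console.WriteLine($"Not covered: {ac}");

// Simplified stand-in for the generated AcceptanceCriterionTrace.
record AcTrace(string Ac, bool HasSpec, bool HasImpl, bool HasTest);
```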
Context Through the IDE
In the typed approach, context delivery to AI agents happens naturally through the IDE:
- The AI agent sees the current file. If it's a test file, it sees the [Verifies(typeof(UserRolesFeature), nameof(...))] attributes — which tell it exactly which feature and AC this test covers.
- The AI agent can follow type references. typeof(UserRolesFeature) leads to the feature definition. typeof(IUserRolesSpec) leads to the specification. The type system IS the navigation graph.
- The AI agent can read compiler diagnostics. error REQ101: ... tells it what's missing. The diagnostic IS the task description.
AI Agent sees:
├── Current file: PasswordResetFeatureTests.cs
│ └── [TestsFor(typeof(PasswordResetFeature))]
│ ├── [Verifies(...RequestEmail)] → Ctrl+Click → Feature definition
│ ├── [Verifies(...LinkExpires24h)] → Ctrl+Click → Feature definition
│ └── Missing: Verifies for ResetLinkCanOnlyBeUsedOnce
│
├── Compiler diagnostics:
│ ├── REQ301: PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce has no test
│ └── REQ101: PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce has no spec
│
└── Context is COMPLETE — no assembly needed
The Core Difference
| Dimension | Spec-Driven Context | Typed Specification Context |
|---|---|---|
| Source | Documents (text files) | Types (compiled code) |
| Assembly | Runtime selection from documents | Compile-time generation from types |
| Correctness | Heuristic validation (quality gates) | Structural guarantee (compiler) |
| Staleness | Possible (documents can drift) | Impossible (generated from current types) |
| Completeness | Checked by heuristic | Checked by analyzer (REQ1xx-REQ4xx) |
| AI delivery | Prompt engineering (text in → text out) | Type system constraints (code in → compiled code out) |
| Feedback loop | Quality gate → iterate prompt | Compiler error → fix code |
| Granularity | Document section level | Method/property/type level |
The Assembly Problem
The spec-driven approach faces the context assembly problem: given a task, which document sections should the AI see? Too much context overwhelms. Too little context produces wrong output. The system must select — and selection can be wrong.
The typed approach eliminates the assembly problem. There's nothing to select. The type system is the context. The compiler diagnostics are the task. The AI writes code within the type system, and the compiler validates it. If the code is wrong, the compiler says why. If the code is right, the compiler says nothing.
This is not a minor difference. The entire Context Engineering pillar — the four context sources, the three assembly strategies, the validation framework, the 78-week roadmap — exists to solve a problem that typed specifications don't have. Not because typed specifications are smarter, but because they encode context in a medium (the type system) that doesn't need assembly.
The Discovery Problem
But typed specifications face the discovery problem: an AI agent working in a typed codebase needs to know that typed requirements exist, how the [ForRequirement] system works, and what the analyzer diagnostics mean. This is meta-context — context about the context system.
The spec-driven approach doesn't have this problem. A document is self-describing: "This is the Testing Specification. It defines 15 testing strategies." An AI can read the document and understand what it's looking at.
A typed specification system is not self-describing to an AI that hasn't seen it before. abstract record UserRolesFeature : Feature<PlatformScalabilityEpic> means nothing to an AI that doesn't know what Feature<T> is or why acceptance criteria are abstract methods. The AI needs onboarding — and that onboarding is itself a document (a CLAUDE.md, a README, a system prompt).
This is an irony worth noting: the typed specification approach still needs a document to explain itself. The types eliminate the need for specification documents, but they create the need for a meta-specification document that explains the type system.
A Practical Comparison
Let's trace both approaches through a concrete scenario: an AI agent needs to implement a new AC for the UserRolesFeature.
Spec-Driven Flow
1. PRD updated: "Add AC: Admin can revoke roles"
2. Context Engineering assembles:
- PRD section for user_roles_management
- Coding Practices for C#
- Testing spec for unit + integration
- Existing code: AuthorizationService.cs
3. Prompt sent to AI:
"Implement the new acceptance criterion 'Admin can revoke roles' for the
user roles management feature. Follow C# coding practices. Write unit
and integration tests. Here is the existing code: [AuthorizationService.cs]"
4. AI generates:
- RevokeRole method in AuthorizationService
- Unit test: AdminCanRevokeRole
- Integration test: RevokeRolePersists
5. Quality gate checks:
- Code compiles? ✓
- Tests pass? ✓
- Coverage > 80%? ✓
- Coding practices followed? ✓ (linter)
6. Done.
Questions the quality gate cannot answer:
- Does the implementation actually match the AC? (It checks syntax, not semantics.)
- Is the test testing the right thing? (It checks existence, not correctness.)
- Is the AC linked to the right feature? (There's no structural link.)
Typed Specification Flow
1. Developer adds abstract method to UserRolesFeature:
public abstract AcceptanceCriterionResult
AdminCanRevokeRoles(UserId actingUser, UserId targetUser, RoleId role);
2. Compiler fires:
error REQ101: UserRolesFeature.AdminCanRevokeRoles has no matching spec method
warning REQ301: UserRolesFeature.AdminCanRevokeRoles has no test
3. AI agent (or developer) creates spec method:
[ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.AdminCanRevokeRoles))]
Result RevokeRole(User actingUser, User targetUser, Role role);
4. Compiler fires:
error CS0535: AuthorizationService does not implement
IUserRolesSpec.RevokeRole(User, User, Role)
5. AI agent implements the method in AuthorizationService.
6. Compiler fires:
warning REQ301: UserRolesFeature.AdminCanRevokeRoles has no test
7. AI agent writes test:
[Verifies(typeof(UserRolesFeature),
nameof(UserRolesFeature.AdminCanRevokeRoles))]
public void Admin_can_revoke_assigned_role() { ... }
8. Build succeeds. All diagnostics clear.
At every step, the compiler tells the AI exactly what's missing. There's no context assembly, no prompt engineering, no quality gate to configure. The type system guides the AI through the implementation.
The 78-Week Roadmap Question
The spec-driven Context Engineering specification includes a 78-week implementation roadmap across five phases. This is a serious commitment. It says: "Building a proper context engineering system is an 18-month project."
The typed specification approach doesn't have an equivalent roadmap because the "context engineering" is the source generator + analyzer — which, admittedly, also takes significant effort to build. But once built, there's no ongoing context assembly infrastructure to maintain. The type system does the work automatically.
The honest comparison:
| Investment | Spec-Driven | Typed Specifications |
|---|---|---|
| Initial setup | Low (fill in templates) | High (build generators + analyzers) |
| Context assembly infrastructure | 78 weeks (per the roadmap) | 0 (type system does it) |
| Ongoing maintenance | Document updates + assembly rules | Generator maintenance + analyzer updates |
| Total cost at 2 years | Templates + infra + maintenance | Generators + maintenance |
| Total cost at 5 years | Templates + infra + maintenance + drift remediation | Generators + maintenance (no drift) |
The spec-driven approach is cheaper to start. The typed approach is cheaper to maintain. The crossover point depends on team size, project complexity, and how often requirements change.
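To make the crossover concrete, here is a back-of-envelope model. Every coefficient is invented purely for illustration (the article gives no cost figures): spec-driven cost grows superlinearly as drift remediation accumulates, while typed cost is a large initial build-out followed by flat upkeep.

```csharp
using System;
using System.Linq;

// Hypothetical cumulative costs by month (arbitrary units). All numbers invented.
// Spec-driven: small setup, steady upkeep, drift remediation growing with age.
double SpecDriven(int months) => 2 + 1.5 * months + 0.2 * months * months / 12.0;
// Typed: expensive generator/analyzer build-out, then flat upkeep.
double Typed(int months) => 20 + 0.75 * months;

// First month at which the typed approach is no more expensive.
int crossover = Enumerable.Range(1, 120).First(m => Typed(m) <= SpecDriven(m));
Console.WriteLine($"Crossover under these assumptions: month {crossover}");
```

Changing any coefficient moves the crossover — which is exactly the point: the break-even month is a function of team size, project complexity, and churn, not a universal constant.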
Deep Dive: Context Assembly vs Context Generation in Practice
Let's trace a complete context engineering cycle for both approaches. The task: an AI agent needs to add a new endpoint to an existing API.
Spec-Driven Context Assembly: Step by Step
Step 1: Task classification
The orchestrator receives: "Add a GET /api/users/{id}/roles endpoint that returns the roles assigned to a user."
It classifies this as a CodeGeneration task in Stage 2 (Core Implementation).
Step 2: Source selection
The Context Engineering rules specify four sources for a CodeGeneration task:
CONTEXT_RULE(code_generation_task):
sources:
- prd: feature_section(user_roles_management)
- coding_practices: language_section(csharp)
- testing: strategy_section(unit_tests, integration_tests)
- codebase: related_files(controllers/*, services/UserRolesService.cs)
assembly_strategy: task_driven
max_context_tokens: 8000
Step 3: Content extraction
From the PRD:
Feature: user_roles_management
AC: "User can view their assigned roles"
Priority: High
From Coding Practices:
Language: C#
Patterns: SOLID, DI, Result pattern
Naming: PascalCase methods, camelCase params
Max method length: 30 lines
Error handling: Return Result<T>, don't throw
From Testing:
Unit tests: Arrange-Act-Assert, mock dependencies
Integration tests: Real database, test containers
Naming: MethodName_Scenario_ExpectedResult
From Codebase:
// Existing controller pattern (from AuthController.cs)
[ApiController]
[Route("api/[controller]")]
public class AuthController : ControllerBase
{
private readonly IAuthService _authService;
[HttpGet("{userId}/profile")]
public async Task<IActionResult> GetProfile(Guid userId)
{
var result = await _authService.GetProfile(userId);
return result.Match(
success: Ok,
failure: err => err switch
{
UserNotFoundException => NotFound(),
_ => StatusCode(500)
});
}
}
Step 4: Assembly
The assembled context is ~4,000 tokens — well within the 8,000 budget. The prompt looks like:
You are implementing a new API endpoint.
TASK: Add GET /api/users/{id}/roles
FEATURE: user_roles_management (AC: "User can view their assigned roles")
PRIORITY: High
CODING PRACTICES:
- Use SOLID principles with dependency injection
- Return Result<T> for error handling (don't throw)
- PascalCase for methods, camelCase for parameters
- Max 30 lines per method
EXISTING PATTERN (follow this controller structure):
[existing AuthController.cs code]
TESTING REQUIREMENTS:
- Unit test with mocked service
- Integration test with real database
- Naming: MethodName_Scenario_ExpectedResult
Generate the endpoint implementation, unit test, and integration test.
Step 5: Validation
The quality gate checks the AI's output:
- Does it compile? (Build)
- Do tests pass? (Test runner)
- Coverage > 80%? (Coverage tool)
- Follows coding practices? (Linter/Roslyn analyzers)
If any gate fails, the context is enriched (progressive disclosure) and the AI regenerates.
Total cycle time: 5-15 minutes depending on CI pipeline speed.
Typed Specification Context Generation: Step by Step
Step 1: Developer adds AC to feature
// In UserRolesFeature.cs — adding a new AC
public abstract record UserRolesFeature : Feature<PlatformScalabilityEpic>
{
// ... existing ACs ...
/// <summary>
/// Any authenticated user can view the roles assigned to a specific user.
/// Returns the user's current roles with their assignment dates.
/// </summary>
public abstract AcceptanceCriterionResult
UserCanViewAssignedRoles(UserId targetUser);
}
Step 2: Compiler fires immediately
error REQ101: UserRolesFeature.UserCanViewAssignedRoles has no matching
spec method with [ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
warning REQ301: UserRolesFeature.UserCanViewAssignedRoles has no test
The AI agent (or developer) sees these diagnostics in the IDE. No context assembly needed — the diagnostics ARE the task.
Step 3: AI creates spec method
// In IUserRolesSpec.cs
[ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
Result<IReadOnlyList<RoleAssignment>, DomainException> GetUserRoles(UserId userId);
Note the return type: Result<IReadOnlyList<RoleAssignment>, DomainException>. This is not free-form — the AI must choose types that exist in the SharedKernel project. If RoleAssignment doesn't exist, the AI must create it — and the compiler will verify its shape.
Step 4: Compiler fires — interface not implemented
error CS0535: 'AuthorizationService' does not implement interface member
'IUserRolesSpec.GetUserRoles(UserId)'
Step 5: AI implements the method
// In AuthorizationService.cs
[ForRequirement(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
public Result<IReadOnlyList<RoleAssignment>, DomainException> GetUserRoles(UserId userId)
{
var user = _userRepository.FindById(userId);
if (user is null)
return Result<IReadOnlyList<RoleAssignment>, DomainException>
.Failure(new UserNotFoundException(userId));
var assignments = _roleRepository.GetAssignmentsForUser(userId);
return Result<IReadOnlyList<RoleAssignment>, DomainException>
.Success(assignments);
}
Step 6: Compiler fires — no test
warning REQ301: UserRolesFeature.UserCanViewAssignedRoles has no test
Step 7: AI writes test
[Verifies(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
public void Existing_user_returns_their_roles()
{
var userId = TestUsers.Alice.Id;
_roleRepository.Assign(userId, Roles.Editor, DateTime.UtcNow.AddDays(-30));
_roleRepository.Assign(userId, Roles.Reviewer, DateTime.UtcNow.AddDays(-7));
var result = _service.GetUserRoles(userId);
Assert.That(result.IsSuccess, Is.True);
Assert.That(result.Value, Has.Count.EqualTo(2));
Assert.That(result.Value.Select(r => r.Role),
Is.EquivalentTo(new[] { Roles.Editor, Roles.Reviewer }));
}
[Verifies(typeof(UserRolesFeature),
nameof(UserRolesFeature.UserCanViewAssignedRoles))]
public void Nonexistent_user_returns_not_found()
{
var result = _service.GetUserRoles(new UserId(Guid.NewGuid()));
Assert.That(result.IsSuccess, Is.False);
Assert.That(result.Error, Is.InstanceOf<UserNotFoundException>());
}
Step 8: Build succeeds
All diagnostics clear. The chain is complete.
Total cycle time: 2-5 minutes (compile-time feedback, no CI dependency).
The Difference in Numbers
| Metric | Spec-Driven | Typed Specifications |
|---|---|---|
| Steps to complete | 5 (classify → select → extract → assemble → validate) | 4 (add AC → spec → impl → test) |
| External dependencies | CI pipeline, coverage tool, linter | Compiler only |
| Feedback latency | 5-15 minutes (pipeline) | 5-30 seconds (compile) |
| Context accuracy | Depends on assembly rules | Always correct (types are code) |
| Iteration cost | High (re-run pipeline) | Low (re-compile) |
| Context size | ~4,000 tokens (assembled) | Variable (type definitions + diagnostics) |
| Risk of wrong context | Possible (wrong PRD section selected) | None (types are unambiguous) |
The CLAUDE.md Bridge
In practice, many typed specification projects use a CLAUDE.md file (or equivalent) that explains the typed specification system to AI agents. This is the "meta-context" that teaches the AI how the type system works:
# CLAUDE.md — Project Context
## Requirements System
This project uses typed specifications. Features are C# abstract records.
Acceptance criteria are abstract methods on feature records.
When you see a compiler diagnostic like REQ101, it means an AC has no specification.
When you see REQ301, it means an AC has no test.
## The Chain
1. Add AC to feature record (MyApp.Requirements)
2. Add spec method to interface (MyApp.Specifications) with [ForRequirement]
3. Implement in domain (MyApp.Domain)
4. Test with [Verifies] (MyApp.Tests)
## Conventions
- Feature types: abstract record XxxFeature : Feature<Epic>
- Spec interfaces: IXxxSpec with [ForRequirement(typeof(XxxFeature))]
- Tests: [TestsFor(typeof(XxxFeature))] class, [Verifies(...)] methods
- Error handling: Result<T, TError> pattern, never throw
This is a document — a spec-driven document — that explains the typed system. The irony is inescapable: even the typed approach needs a document to bootstrap. But this document is small (~50 lines), stable (the conventions rarely change), and supplementary (the types do the real work). It's the difference between a 50-line bootstrap document and a 15,000-line specification framework.
Summary
Context engineering is the domain where the philosophical gap is widest:
- Spec-driven treats context as a logistics problem: assemble the right documents, deliver them to the AI, validate the output.
- Typed specifications treat context as a structural problem: encode context in types, let the compiler enforce it, and the context is always correct.
Both solve real problems. But they solve them at different layers of the stack. Spec-driven solves context at the prompt layer. Typed specifications solve context at the compiler layer. The prompt layer is more flexible and language-agnostic. The compiler layer is more rigorous and self-correcting.
The Context Window Economics
Every AI model has a finite context window. GPT-4o: 128K tokens. Claude: 200K tokens. Sounds like a lot until you actually budget it for a real implementation task. The way each approach uses that window determines how much room the AI has to think.
Token Budget: A Real Comparison
Let's budget the context window for the same task: implementing Order Cancellation (3 ACs) with Claude (200K context window).
Spec-Driven Token Budget:
┌──────────────────────────────────────────────────────────┐
│ SPEC-DRIVEN CONTEXT WINDOW (200K) │
├──────────────────────────────────────────────────────────┤
│ │
│ System prompt + instructions ~ 800 tokens │
│ PRD: order_cancellation section ~ 1,200 tokens │
│ Coding Practices: C# rules ~ 900 tokens │
│ Testing Specification: strategies ~ 1,100 tokens │
│ Documentation rules ~ 400 tokens │
│ Existing code: OrderService.cs ~ 1,800 tokens │
│ Existing code: PaymentService.cs ~ 1,200 tokens │
│ Existing code: EmailService.cs ~ 800 tokens │
│ Existing code: Order.cs ~ 600 tokens │
│ Existing code: related interfaces ~ 1,400 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT CONSUMED ~10,200 tokens │
│ │
│ Round 2 additions (gate feedback): │
│ Previous generation (AI's own code) ~ 3,500 tokens │
│ Gate feedback + enriched context ~ 1,200 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT BY ROUND 2 ~14,900 tokens │
│ │
│ Round 3 additions: │
│ Additional code context ~ 2,000 tokens │
│ Previous gate feedback ~ 800 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT BY ROUND 3 ~17,700 tokens │
│ │
│ Available for AI reasoning: │
│ Round 1: 200,000 - 10,200 = 189,800 tokens │
│ Round 2: 200,000 - 14,900 = 185,100 tokens │
│ Round 3: 200,000 - 17,700 = 182,300 tokens │
│ │
└──────────────────────────────────────────────────────────┘
For this small feature, the context overhead seems manageable — about 5-9% of the window. But this is one feature with three ACs in a small codebase. Scale it to a real project:
Spec-Driven at Scale (50 features, shared context):
┌──────────────────────────────────────────────────────────┐
│ SPEC-DRIVEN AT SCALE — IMPLEMENTING FEATURE #37 │
├──────────────────────────────────────────────────────────┤
│ │
│ System prompt + instructions ~ 800 tokens │
│ PRD: feature #37 section ~ 1,500 tokens │
│ PRD: related features (cross-refs) ~ 3,200 tokens │
│ Coding Practices: full C# rules ~ 2,800 tokens │
│ Testing Specification: strategies ~ 2,400 tokens │
│ Architecture rules / boundaries ~ 1,600 tokens │
│ Documentation rules ~ 800 tokens │
│ Existing code: 12 relevant files ~18,000 tokens │
│ Existing tests: related patterns ~ 6,000 tokens │
│ Previous agent context (history) ~ 4,000 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT CONSUMED ~41,100 tokens │
│ │
│ Available for AI reasoning: 158,900 tokens (79.5%) │
│ Context overhead: 20.5% of window │
│ │
└──────────────────────────────────────────────────────────┘
Twenty percent of the context window consumed by specification documents before the AI writes a single line of code. On iterative rounds, that grows. And this is with a well-tuned context engineering system that selects only relevant sections. A poorly tuned system might include the full PRD (15,000+ tokens), the full testing specification (8,000+ tokens), and the full coding practices (5,000+ tokens) — consuming 30-40% of the window.
Typed Specification Token Budget:
┌──────────────────────────────────────────────────────────┐
│ TYPED SPECIFICATION CONTEXT WINDOW (200K) │
├──────────────────────────────────────────────────────────┤
│ │
│ CLAUDE.md (bootstrap document) ~ 200 tokens │
│ Feature record (3 ACs + XML docs) ~ 350 tokens │
│ Compiler diagnostics (3 REQ101s) ~ 150 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT CONSUMED (Round 1) ~ 700 tokens │
│ │
│ Round 2 (spec created, need impl): │
│ Feature record ~ 350 tokens │
│ Spec interface (3 methods) ~ 250 tokens │
│ Compiler diagnostics (3 CS0535s) ~ 150 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT (Round 2) ~ 750 tokens │
│ │
│ Round 3 (impl done, need tests): │
│ Feature record ~ 350 tokens │
│ Spec interface ~ 250 tokens │
│ Implementation (3 methods) ~ 600 tokens │
│ Compiler diagnostics (3 REQ301s) ~ 150 tokens │
│ ───────────────────────────────────────────────────── │
│ TOTAL CONTEXT (Round 3) ~ 1,350 tokens │
│ │
│ Available for AI reasoning: │
│ Round 1: 200,000 - 700 = 199,300 tokens (99.7%) │
│ Round 2: 200,000 - 750 = 199,250 tokens (99.6%) │
│ Round 3: 200,000 - 1,350 = 198,650 tokens (99.3%) │
│ │
│ Context overhead: 0.3% - 0.7% of window │
│ │
└──────────────────────────────────────────────────────────┘
The difference is not incremental. It's an order of magnitude. The typed approach consumes 700-1,350 tokens of context; the spec-driven approach consumes 10,200-41,100. That cuts context overhead by more than 90%, leaving essentially the entire window free for reasoning.
Why Context Headroom Matters
"But 200K is huge — 20% overhead still leaves 160K for reasoning. Isn't that enough?"
For simple features, yes. But AI reasoning quality degrades as the context fills:
Attention dilution. Language models attend to all tokens in the context. The more tokens, the more the model's attention is spread. Critical information (the AC signature, the return type) competes with boilerplate (coding practices, documentation rules, existing code that's "for reference"). The typed approach puts the critical information — and only the critical information — in the context.
Instruction following. Research consistently shows that models follow instructions less precisely as context length increases. A model with 700 tokens of context follows the spec interface signature exactly. A model with 41,000 tokens of context might drift from the coding practices on page 3 because it's focused on the existing code on page 8.
The needle-in-haystack problem. The spec-driven approach puts the critical information (the AC text) inside a large document alongside non-critical information (other features, general practices, documentation rules). The AI must find the needle. The typed approach IS the needle — there's nothing else.
The Token Budget Table
| Budget Item | Spec-Driven | Typed Specifications |
|---|---|---|
| Specification documents | 5,000 - 15,000 tokens | 0 tokens (no documents) |
| Coding practices | 900 - 2,800 tokens | 0 tokens (enforced by compiler) |
| Testing specification | 1,100 - 2,400 tokens | 0 tokens (enforced by analyzer) |
| Feature definition | Included in PRD (~300 tokens) | Feature record (~350 tokens) |
| Compiler diagnostics | N/A | 50 - 200 tokens |
| Bootstrap (CLAUDE.md) | N/A | ~200 tokens (one-time) |
| Existing code context | 3,000 - 18,000 tokens | As needed (IDE provides) |
| Total overhead | 10,000 - 40,000 tokens | 500 - 1,500 tokens |
| % of 200K window | 5% - 20% | 0.3% - 0.8% |
| Reasoning headroom | 80% - 95% | 99%+ |
The spec-driven approach must tell the AI what coding practices to follow (consuming tokens). The typed approach has the compiler enforce coding practices (consuming zero tokens). The spec-driven approach must tell the AI what testing strategies to apply (consuming tokens). The typed approach has the analyzer emit REQ301 diagnostics (consuming zero tokens).
Every rule encoded in a document is a token cost. Every rule encoded in the type system is free.
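The budget tables above reduce to simple arithmetic. The following sketch (item names and token counts are the illustrative estimates from the tables, not measurements from a real tool; Python is used here as a language-agnostic notation) computes overhead and reasoning headroom for each approach:

```python
# Illustrative token-budget arithmetic for a 200K-token context window.
# The line items mirror the example budgets above; all counts are rough
# estimates, not measurements.

WINDOW = 200_000

spec_driven = {
    "system prompt": 800,
    "PRD section": 1_200,
    "coding practices": 900,
    "testing specification": 1_100,
    "documentation rules": 400,
    "existing code": 5_800,
}

typed_spec = {
    "CLAUDE.md bootstrap": 200,
    "feature record": 350,
    "compiler diagnostics": 150,
}

def headroom(budget: dict, window: int = WINDOW) -> tuple:
    """Return (tokens left for reasoning, overhead as % of window)."""
    used = sum(budget.values())
    return window - used, 100 * used / window

for name, budget in [("spec-driven", spec_driven), ("typed", typed_spec)]:
    left, pct = headroom(budget)
    print(f"{name:12s} overhead {pct:5.2f}%  reasoning headroom {left:,} tokens")
```

Running this reproduces the Round 1 figures from the diagrams: roughly 5% overhead for spec-driven versus 0.35% for the typed approach.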
Context Correctness vs Context Freshness
The spec-driven approach faces two distinct problems that the typed approach eliminates entirely. Understanding why requires examining what "correct context" actually means.
Problem 1: Context Correctness — Getting the Right Context
Context correctness means: the information the AI receives accurately describes the current state of the system. In spec-driven, this requires that:
- The PRD section for the feature is accurate and complete
- The coding practices match what the team actually uses
- The testing specification reflects current testing expectations
- The existing code files selected are the relevant ones
- No contradictions exist between documents
Each of these is a potential failure point. Let's trace a realistic scenario.
The Scenario: Stale Context Produces Wrong Code
Three weeks ago, the team wrote the Order Cancellation PRD section:
DEFINE_FEATURE(order_cancellation)
acceptance_criteria:
- "Customer can cancel an order that has not yet shipped"
- "Cancellation triggers a full refund to original payment method"
- "Confirmation email sent after successful cancellation"
Since then, two things happened that the PRD doesn't reflect:
1. The team decided that partial refunds are needed for orders with some items already shipped. They updated the PaymentService to accept a refundAmount parameter instead of always refunding the full total. This change lives in the code but not in the PRD.
2. The team added a fourth AC — "cancellation reason is required" — directly in the code. A developer added the field to the Order entity, the API accepted a reason parameter, and tests were written. But nobody updated the PRD.
Now an AI agent is asked to implement a new endpoint that uses order cancellation. The context engineering system assembles the PRD (3 ACs, full refund) and the existing code (4 ACs, partial refund). The AI receives contradictory context:
CONTEXT CONFLICT:
PRD says: "Cancellation triggers a full refund to original payment method"
PaymentService.cs shows:
public async Task<RefundResult> ProcessRefund(
OrderId orderId, decimal refundAmount, PaymentMethodId paymentMethod)
// ^^^^^^^^^^^^^ partial refund support
The AI must choose: follow the PRD (full refund) or follow the code (partial refund)? Different AI models make different choices. Different runs of the same model may make different choices. The context is "correct" by the system's standards — the PRD section was selected, the code was selected — but the information is contradictory.
The quality gate cannot catch this. The code compiles. The tests pass (because the tests reflect the code, not the PRD). The coverage is above threshold. The contradiction survives until a human reviews the output and notices the discrepancy.
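Catching this kind of drift requires an out-of-band audit. A minimal sketch of such a check — assuming, hypothetically, that AC titles can be extracted from both the PRD and the code, which real parsing would make far messier — compares the two sets and flags every mismatch:

```python
# Hypothetical drift audit: compare ACs declared in a PRD document with
# ACs discovered in the code. The extraction step is assumed; only the
# set comparison is shown.

def audit_drift(prd_acs: set, code_acs: set) -> list:
    """Report ACs present in one source but not the other."""
    findings = []
    for ac in sorted(code_acs - prd_acs):
        findings.append(f"in code but missing from PRD: {ac}")
    for ac in sorted(prd_acs - code_acs):
        findings.append(f"in PRD but missing from code: {ac}")
    return findings

prd = {"cancel unshipped order", "full refund", "confirmation email"}
code = {"cancel unshipped order", "partial refund", "confirmation email",
        "cancellation reason required"}

for finding in audit_drift(prd, code):
    print(finding)
# Flags the refund change and the fourth AC that exists only in code.
```

An audit like this is exactly the manual remediation step the typed approach makes unnecessary: when the types are the specification, there is no second representation to diff against.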
How the Typed Approach Eliminates This
In the typed approach, this scenario is structurally impossible. Here's why:
When the team decided to support partial refunds, they changed the spec method signature:
[ForRequirement(typeof(OrderCancellationFeature),
nameof(OrderCancellationFeature.CancellationTriggersRefund))]
Result<RefundInitiation, RefundException> InitiateRefund(
OrderId orderId, decimal refundAmount, PaymentMethodId originalPayment);The moment this signature changed, every call site still passing the old arguments broke. The compiler forced all callers to provide the new refundAmount parameter. The code and the specification are the same thing — there's no PRD to drift from.
When the developer added the fourth AC, they added an abstract method:
public abstract AcceptanceCriterionResult
CancellationReasonRequired(OrderId orderId, CancellationReason reason);
The compiler immediately fired REQ101 (no spec method) and REQ301 (no test). The chain was completed (spec, implementation, test) before the AC was considered "done." There's no state where the AC exists in code but not in the specification — because the specification IS the code.
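The REQ101/REQ301 chain can be illustrated with a toy checker. This sketch follows the article's diagnostic codes, but the data model is invented for illustration — the real mechanism would be a compiler analyzer (e.g. Roslyn in C#) inspecting the syntax tree:

```python
# Toy analyzer mimicking the REQ101/REQ301 diagnostics described above.
# The record structure is a stand-in for what a real analyzer would
# discover from attributes like [ForRequirement] and [Verifies].
from dataclasses import dataclass

@dataclass
class AcceptanceCriterion:
    name: str
    has_spec: bool = False   # a [ForRequirement] spec method exists
    has_test: bool = False   # a [Verifies] test method exists

def analyze(acs: list) -> list:
    """Emit a diagnostic for every missing link in the chain."""
    diagnostics = []
    for ac in acs:
        if not ac.has_spec:
            diagnostics.append(f"REQ101: AC '{ac.name}' has no specification")
        if not ac.has_test:
            diagnostics.append(f"REQ301: AC '{ac.name}' has no test")
    return diagnostics

feature = [
    AcceptanceCriterion("CancelUnshippedOrder", has_spec=True, has_test=True),
    AcceptanceCriterion("CancellationReasonRequired"),  # freshly added AC
]
for d in analyze(feature):
    print(d)
# The new AC fires both diagnostics until its chain is completed.
```

The point of the sketch: the check runs on every build, so an AC can never sit in a half-done state silently.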
Problem 2: Context Freshness — Getting Current Context
Context freshness means: the information the AI receives reflects the current state of the project, not a past state. In spec-driven, freshness requires that:
- Someone updates the PRD when requirements change
- Someone updates the coding practices when conventions evolve
- Someone updates the testing specification when strategies change
- The context assembly system selects the latest versions
This is a human process. Documents drift because updating documents is a separate action from updating code. There's no compiler that says "you changed the code but not the PRD." The drift is silent.
The Freshness Timeline
Week 1: PRD written ────────────────── PRD says: 3 ACs, full refund
Week 2: Code updated (partial refund) ─ PRD says: 3 ACs, full refund ← STALE
Week 3: AC added in code ──────────── PRD says: 3 ACs, full refund ← STALER
Week 4: AI reads PRD ─────────────── AI sees: 3 ACs, full refund ← WRONG
By week 4, the PRD is three weeks stale. The AI generates code based on outdated information. The quality gate passes because the gate checks the code's internal consistency (compiles, tests pass, coverage met), not its consistency with the PRD.
In the typed approach, freshness is structural:
Week 1: Feature record created ─── Types say: 3 ACs, full refund
Week 2: Spec signature changed ─── Types say: 3 ACs, partial refund ← INSTANT
Week 3: AC added to record ──────── Types say: 4 ACs, partial refund ← INSTANT
Week 4: AI reads types ──────────── AI sees: 4 ACs, partial refund ← CORRECT
There's no lag. There's no "someone needs to update the document." The types change when the code changes because the types ARE the code. The AI always reads the current state because there is no separate representation that could be outdated.
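The "types are always current" property can be demonstrated with introspection. In this sketch, Python's abc module stands in for the article's C# abstract records: the AC list an agent reads is derived from the class itself, so it cannot lag behind the code:

```python
# The spec IS the code: an agent "reads the PRD" by introspecting the
# feature class, so the AC list is always the current one. Python's abc
# module stands in for C# abstract records here.
import abc

class OrderCancellationFeature(abc.ABC):
    @abc.abstractmethod
    def cancel_unshipped_order(self): ...
    @abc.abstractmethod
    def refund_to_original_payment(self, refund_amount: float): ...
    @abc.abstractmethod
    def send_confirmation_email(self): ...
    @abc.abstractmethod
    def cancellation_reason_required(self, reason: str): ...  # added week 3

def current_acs(feature: type) -> list:
    """Enumerate ACs straight from the type — no document to go stale."""
    return sorted(feature.__abstractmethods__)

print(current_acs(OrderCancellationFeature))
# Adding or removing an abstract method changes this list instantly.
```

Deleting the week-3 method from the class removes it from the list in the same keystroke — there is no second artifact to forget to update.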
The Two-Problem Table
| Problem | Spec-Driven | Typed Specifications |
|---|---|---|
| Correctness | Depends on document accuracy | Guaranteed (types are code) |
| Freshness | Depends on document updates | Guaranteed (types are always current) |
| Contradiction detection | None (documents and code are independent) | Compiler error (code must match types) |
| Drift remediation | Manual audit: compare PRD to code | Automatic: compiler prevents drift |
| Cost of staleness | Wrong AI output, passes quality gate | Build failure — immediate, loud, blocking |
| Who maintains freshness? | Humans (remember to update docs) | Compiler (refuses to build if inconsistent) |
The spec-driven approach assumes that documents will be maintained. The typed approach doesn't assume anything — it enforces consistency through the type system. When documents and code can drift apart, they will. When types and code are the same thing, they can't.
This is not a theoretical concern. Every team that has maintained a wiki, an ADR repository, or a specifications folder knows the reality: documents drift. The question is not whether your PRD will become stale — it's when, and whether anyone will notice before the AI reads it.
Part V examines how each approach handles the next critical concern: testing.