Testing: Strategy Documents vs Compiler-Enforced Coverage
Testing is where both approaches invest the most effort — and where the difference in philosophy produces the most visible consequences.
The Spec-Driven Testing Specification
The Testing-as-Code specification is the cogeet-io framework's most detailed document. It defines:
Principles (3)
- Layered Testing Strategy — Tests follow the testing pyramid: 70% unit, 20% integration, 10% E2E.
- Test Reliability — Flakiness rate < 0.01, stability > 0.99, execution time variance coefficient < 0.1.
- Comprehensive Coverage — Line coverage > 80%, branch > 75%, function > 90%, requirement coverage = 100%.
Practices (18+)
Unit Testing:
- Test Organization — consistent naming per language (Rust: test_function_behavior_expected, Python: test_when_given_then, JS: should_behavior_when_condition)
- Test Isolation — no shared mutable state, independent test data, proper cleanup
- Dependency Mocking — mock repositories, HTTP clients, file systems, clocks
- Test Doubles Strategy — when to use dummies, fakes, stubs, spies, mocks
Integration Testing:
- API Integration — request/response validation, error handling, auth, rate limiting
- Database Integration — real database instances, transaction isolation, migration testing
- Microservice Integration — sync, async, event-driven, circuit breaker testing
- Third-Party Integration — contract testing, sandbox environments, feature toggles
E2E Testing:
- User Workflow Testing — registration, core flows, error recovery, business processes
- Cross-Browser Testing — desktop (Chrome, Firefox, Safari, Edge), mobile, legacy
- System Reliability — network failures, service degradation, data corruption, resource exhaustion
Specialized Testing:
- Property-Based Testing — round-trip, invariants, commutativity, idempotency
- Mutation Testing — arithmetic, boolean, conditional, statement mutations; score > 80%
- Vulnerability Scanning — dependency audits, code patterns, configuration, secrets
- Penetration Testing — injection, authentication, authorization, data exposure
- Load Testing — baseline, stress, spike, volume, endurance; p95 < 2s
- Chaos Engineering — infrastructure, application, dependency, resource failures
- Fuzz Testing — structured fuzzing, protocol fuzzing
Test Data:
- Synthetic Data Generation — factories, fixtures, generators, anonymization
- Test Data Lifecycle — setup, isolation, cleanup, reset
- Test Environment Strategy — containers for integration, production-like for perf
CI/CD Integration:
- Pipeline integration — pre-commit, commit, pre-deploy, post-deploy stages
- Parallelization — test-level, suite-level, pipeline-level, distributed
- Reporting — execution results, coverage analysis, performance metrics, quality trends
Metrics (3)
- Test Coverage Comprehensive — five dimensions (line, branch, function, requirement, risk)
- Test Quality Score — composite of reliability (0.4), maintainability (0.3), effectiveness (0.3); target 0.85
- Defect Detection Effectiveness — pre-production detection > 85%, regression prevention > 90%, critical detection > 95%
Language-Specific Patterns
- Rust: #[cfg(test)] modules, tests/ directory, doc tests, criterion benchmarks
- Python: pytest, hypothesis, coverage.py, mutmut
- JavaScript: jest, fast-check, nyc, stryker
- Java: JUnit, junit-quickcheck, jacoco, pitest
- C#: xUnit/NUnit, coverlet, stryker.NET
What This Gives You
The Testing-as-Code specification is a comprehensive encyclopedia of testing knowledge. If a team doesn't know about mutation testing, this document introduces it. If a team hasn't considered chaos engineering, this document explains why they should. If a team's naming conventions are inconsistent, this document provides language-specific patterns.
It's a teaching document, a reference document, and a compliance checklist rolled into one.
What It Cannot Give You
Which tests are missing. The document says "requirement coverage = 100%" but doesn't know which requirements exist, which have tests, and which don't. That knowledge lives in the code — and the document has no structural link to the code.
Whether test X covers AC Y. The document says "tests must cover acceptance criteria" but has no mechanism to verify the link. A test named TestPasswordReset might or might not test the "reset link expires after 24 hours" AC. The document can't tell.
Stale test detection. If a feature is deleted, the tests for that feature become stale. The document has no mechanism to detect this. The tests still exist, still pass, still count toward coverage — but they test dead code.
Real-time coverage state. The document defines thresholds (80% line coverage) but doesn't know the current state. You need a separate tool (coverage.py, coverlet, nyc) to measure — and that tool measures lines, not acceptance criteria.
Typed Specification Testing
The typed approach to testing is narrower in scope but deeper in enforcement. It doesn't catalog 18+ testing practices. It does three things:
1. Test-to-Requirement Linking
Every test class is annotated with [TestsFor] to declare which feature it covers. Every test method is annotated with [Verifies] to declare which AC it proves:
[TestsFor(typeof(UserRolesFeature))]
public class UserRolesFeatureTests
{
[Verifies(typeof(UserRolesFeature), nameof(UserRolesFeature.AdminCanAssignRoles))]
public void Admin_with_ManageRoles_can_assign_editor_role()
{
// Arrange
var admin = TestUsers.AdminWithPermission(Permission.ManageRoles);
var target = TestUsers.RegularUser();
var role = Roles.Editor;
// Act
var result = _service.AssignRole(admin, target, role);
// Assert
Assert.That(result.IsSuccess, Is.True);
Assert.That(target.CurrentRole, Is.EqualTo(role));
}
[Verifies(typeof(UserRolesFeature), nameof(UserRolesFeature.AdminCanAssignRoles))]
public void Non_admin_cannot_assign_roles()
{
var user = TestUsers.RegularUser();
var target = TestUsers.AnotherUser();
var role = Roles.Editor;
var result = _service.AssignRole(user, target, role);
Assert.That(result.IsSuccess, Is.False);
Assert.That(result.Error, Is.InstanceOf<UnauthorizedException>());
}
[Verifies(typeof(UserRolesFeature), nameof(UserRolesFeature.ViewerHasReadOnlyAccess))]
public void Viewer_can_read_resources()
{
var viewer = TestUsers.ViewerUser();
var resource = TestResources.Document("doc-123");
var result = _service.VerifyReadAccess(viewer, resource);
Assert.That(result.IsSuccess, Is.True);
}
[Verifies(typeof(UserRolesFeature), nameof(UserRolesFeature.ViewerHasReadOnlyAccess))]
public void Viewer_cannot_modify_resources()
{
var viewer = TestUsers.ViewerUser();
var resource = TestResources.Document("doc-123");
var result = _service.VerifyWriteAccess(viewer, resource);
Assert.That(result.IsSuccess, Is.False);
}
}

Note: multiple tests can verify the same AC. AdminCanAssignRoles has both a happy-path test and an authorization-failure test. The system tracks all of them.
2. Compiler-Enforced Coverage
The REQ3xx analyzer family detects coverage gaps at compile time:
| Diagnostic | Severity | Trigger |
|---|---|---|
| REQ300 | Error | Feature has zero [TestsFor] test classes |
| REQ301 | Warning | AC method has no [Verifies] test |
| REQ302 | Warning | [Verifies] references an AC method that doesn't exist (stale) |
| REQ303 | Info | Feature fully tested — all ACs have [Verifies] tests |
Build output:
error REQ300: JwtRefreshStory has 2 acceptance criteria but no test class
with [TestsFor(typeof(JwtRefreshStory))]
warning REQ301: PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce has no test
with [Verifies(typeof(PasswordResetFeature),
nameof(PasswordResetFeature.ResetLinkCanOnlyBeUsedOnce))]
warning REQ302: UserRolesTests.OldTest references
nameof(UserRolesFeature.DeletedAC) which no longer exists

This is the critical difference: the compiler knows which acceptance criteria have tests and which don't. It doesn't measure line coverage — it measures requirement coverage. It doesn't count tests — it maps tests to ACs.
3. Quality Gates Integration
After tests execute, the REQ4xx analyzer family validates that tests don't just exist — they pass, meet coverage thresholds, and satisfy performance budgets:
<!-- MyApp.Tests.csproj -->
<PropertyGroup>
<RequirementMinCoverage>80</RequirementMinCoverage>
<RequirementMinPassRate>100</RequirementMinPassRate>
<RequirementMaxTestDuration>5000</RequirementMaxTestDuration>
</PropertyGroup>

| Diagnostic | Severity | Trigger |
|---|---|---|
| REQ400 | Error | Feature's test pass rate below minimum |
| REQ401 | Warning | Feature's AC coverage below threshold |
| REQ402 | Warning | Test duration exceeds budget |
| REQ403 | Info | Feature passes all quality gates |
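The gate logic the REQ4xx family applies can be sketched as a simple threshold check. This is a hypothetical illustration of the evaluation, not the real analyzer; the threshold values mirror the csproj properties shown above:

```python
# Illustrative sketch of REQ4xx-style gate evaluation (not the actual analyzer).
# Thresholds mirror the csproj properties: RequirementMinCoverage,
# RequirementMinPassRate, RequirementMaxTestDuration.

thresholds = {"min_coverage": 80, "min_pass_rate": 100, "max_test_duration_ms": 5000}

def evaluate_gates(pass_rate: float, ac_coverage: float, duration_ms: int,
                   t: dict = thresholds) -> list[str]:
    """Return the diagnostics a feature's test results would trigger."""
    diagnostics = []
    if pass_rate < t["min_pass_rate"]:
        diagnostics.append("REQ400")  # error: pass rate below minimum
    if ac_coverage < t["min_coverage"]:
        diagnostics.append("REQ401")  # warning: AC coverage below threshold
    if duration_ms > t["max_test_duration_ms"]:
        diagnostics.append("REQ402")  # warning: test duration exceeds budget
    return diagnostics or ["REQ403"]  # info: all quality gates pass

print(evaluate_gates(pass_rate=100, ac_coverage=85, duration_ms=1200))  # ['REQ403']
print(evaluate_gates(pass_rate=98, ac_coverage=70, duration_ms=6000))
```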
Scenario: Team adds property-based testing
Spec-driven: The Testing-as-Code spec already documents property-based testing. The team reads the section, learns about round-trip, invariant, commutativity, and idempotency properties, and implements tests using the recommended framework (proptest for Rust, hypothesis for Python, fast-check for JS).
The spec tells them:
- What property categories exist
- Which frameworks to use per language
- What violation patterns to watch for (untested invariants, missing round-trip tests)
- What auto-fix options exist
This is valuable. It's a teaching moment that produces better tests.
Typed specifications:
The typed approach has nothing to say about property-based testing. It doesn't know what property testing is, doesn't recommend frameworks, doesn't define categories. A developer writes a property test, annotates it with [Verifies], and the system tracks it like any other test.
[Verifies(typeof(OrderProcessingFeature),
nameof(OrderProcessingFeature.OrderTotalMustBePositive))]
[Property] // FsCheck or similar
public Property Order_total_is_always_positive_when_all_lines_have_positive_quantity()
{
return Prop.ForAll(
Arb.From<PositiveInt>().Select(i => i.Get),
Arb.From<PositiveDecimal>(),
(quantity, price) =>
{
var line = new OrderLine(quantity, price);
var order = new Order(new[] { line });
return order.Total > 0;
});
}

The system knows this test covers the OrderTotalMustBePositive AC. But it doesn't know it's a property test, doesn't validate the property category, and doesn't check the mutation score.
Winner: Spec-driven for guidance; typed for enforcement.
Scenario: Developer forgets to test an AC
Spec-driven:
The coverage tool reports 78% line coverage on AuthorizationService.cs. The quality gate fails because the threshold is 80%. But line coverage doesn't tell you WHICH acceptance criteria are untested. The developer adds a few more test cases for well-covered methods until coverage reaches 81%. The quality gate passes. The untested AC remains untested.
Coverage report:
AuthorizationService.cs: 81% ← passes threshold
- AssignRole(): 95% covered
- VerifyReadAccess(): 90% covered
- RevokeRole(): 45% covered ← this is the problem, but not flagged

The spec-driven approach catches low coverage but not missing AC coverage. A developer can game the threshold by testing easy methods more thoroughly while leaving hard methods untested.
Typed specifications: The analyzer fires:
warning REQ301: UserRolesFeature.AdminCanRevokeRoles has no test with
[Verifies(typeof(UserRolesFeature),
nameof(UserRolesFeature.AdminCanRevokeRoles))]

The diagnostic is specific: not "low coverage" but "this specific acceptance criterion has no test." The developer cannot game it by testing other ACs more thoroughly. The build won't pass until this specific AC has a [Verifies] test.
Winner: Typed specifications, clearly. The per-AC granularity is decisive.
Scenario: Feature is deleted, tests become stale
Spec-driven: A feature is removed from the PRD. The implementation code is deleted. But the tests remain. They still pass (the code they test is gone, so they test... nothing? They might test a helper that still exists, or they might test an interface that was repurposed). They still contribute to coverage. Nobody notices they're stale.
Six months later, someone renames a method and 14 "passing" tests break. The team spends two days figuring out that these tests have been testing dead code since the feature was deleted.
Typed specifications: The feature type is deleted. Instantly:
error CS0246: The type or namespace name 'UserRolesFeature' could not be found
→ in UserRolesFeatureTests.cs, line 1: [TestsFor(typeof(UserRolesFeature))]
→ in UserRolesFeatureTests.cs, line 5: [Verifies(typeof(UserRolesFeature), ...)]
→ in UserRolesFeatureTests.cs, line 15: [Verifies(typeof(UserRolesFeature), ...)]

Every test referencing the deleted feature fails to compile. The developer must delete or repurpose the tests. Stale tests are structurally impossible.
Winner: Typed specifications. Dead code elimination is a compiler problem, not a discipline problem.
Scenario: New team member needs to understand testing expectations
Spec-driven: The new team member reads the Testing-as-Code specification. It's comprehensive — 18+ practices, 3 principles, 3 metrics, language-specific patterns. They learn about mutation testing, chaos engineering, property-based testing. They understand the testing pyramid and the coverage thresholds.
They feel informed. They know what to test and how to test.
Typed specifications:
The new team member writes their first test. They forget the [Verifies] attribute. The compiler says:
info REQ303: Test method 'MyTest' in class 'UserRolesTests' is not annotated with
[Verifies]. Consider adding [Verifies(typeof(Feature), nameof(AC))]
to link this test to a specific acceptance criterion.

They add the attribute. They try to reference a nonexistent AC:

error CS0117: 'UserRolesFeature' does not contain a definition for 'NonexistentAC'

They pick the right AC. The compiler is happy. They've learned the system by using it, not by reading a 15,000-word document.
Winner: Typed specifications for enforcement, spec-driven for education. The typed approach teaches by doing. The spec-driven approach teaches by explaining. Both are valuable; they serve different learning styles.
The Coverage Granularity Gap
This is the single most important difference in the testing domain:
| Coverage Type | Spec-Driven | Typed Specifications |
|---|---|---|
| Line coverage | ✓ (via coverage tools) | ✓ (via coverage tools) |
| Branch coverage | ✓ (via coverage tools) | ✓ (via coverage tools) |
| Function coverage | ✓ (via coverage tools) | ✓ (via coverage tools) |
| Requirement coverage | ✗ (stated as goal, not measured) | ✓ (REQ3xx analyzers, per-AC) |
| AC-level coverage | ✗ (no structural link) | ✓ (each test declares which AC it verifies) |
| Stale test detection | ✗ (no mechanism) | ✓ (REQ302 diagnostic) |
| Missing test detection | Only via line coverage thresholds | Specific diagnostic per uncovered AC |
The spec-driven approach measures code coverage — how many lines, branches, and functions are exercised. This is a proxy for test quality, but it's a weak proxy. 100% line coverage doesn't mean all acceptance criteria are tested. 50% line coverage doesn't mean important ACs are untested.
The typed approach measures requirement coverage — how many acceptance criteria have [Verifies] tests. This is a direct measure of test completeness against the specification. It can coexist with line coverage (you can run both), but it adds the dimension that line coverage cannot provide.
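The requirement-coverage computation the REQ3xx analyzers perform reduces to a set comparison between declared ACs and [Verifies] references. A hypothetical sketch (names and data are illustrative, not the real analyzer API):

```python
# Illustrative sketch of requirement coverage: compare a feature's declared ACs
# against the ACs its tests claim to verify via [Verifies]. Hypothetical data.

feature_acs = {"AdminCanAssignRoles", "ViewerHasReadOnlyAccess", "AdminCanRevokeRoles"}

# (test name, AC it claims to verify) pairs, as declared by [Verifies]
verifies = [
    ("Admin_with_ManageRoles_can_assign_editor_role", "AdminCanAssignRoles"),
    ("Viewer_can_read_resources", "ViewerHasReadOnlyAccess"),
    ("OldTest", "DeletedAC"),  # stale reference: the AC no longer exists
]

covered = {ac for _, ac in verifies if ac in feature_acs}
uncovered = feature_acs - covered                                  # -> REQ301 per AC
stale = [(t, ac) for t, ac in verifies if ac not in feature_acs]   # -> REQ302

print(sorted(uncovered))  # ['AdminCanRevokeRoles']
print(stale)              # [('OldTest', 'DeletedAC')]
print(f"requirement coverage: {len(covered)}/{len(feature_acs)}")
```

Note what line coverage never sees: the uncovered AC and the stale reference both fall out of the set arithmetic directly.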
What Typed Specifications Miss
The spec-driven Testing-as-Code specification covers domains that the typed approach ignores entirely:
Chaos Engineering — Testing system resilience under failure injection. The typed approach has no concept of chaos experiments. This is genuinely valuable for distributed systems.
Load Testing — Performance under expected and peak load. The typed approach tracks test duration (REQ402) but doesn't define load testing strategies, thresholds, or scenarios.
Cross-Browser Testing — Browser compatibility matrices. The typed approach is server-side C# and has no opinion on frontend testing.
Fuzz Testing — Input robustness testing with random/malicious inputs. The typed approach doesn't define fuzzing strategies or targets.
Penetration Testing — Security assessment workflows. The typed approach handles authorization testing through ACs but doesn't define pen-test methodologies.
Test Data Management — Synthetic data generation, privacy compliance, data lifecycle. The typed approach uses test factories and fixtures but doesn't define a data management strategy.
CI/CD Pipeline Design — Pre-commit, commit, pre-deploy, post-deploy test stages. The typed approach integrates with MSBuild but doesn't prescribe pipeline architecture.
These are current gaps, not fundamental limits. And this is the crucial distinction: every one of these gaps is a DSL waiting to be written.
Consider chaos engineering. Today, the typed approach has no opinion on it. But there's nothing stopping a Chaos DSL:
[ChaosExperiment("OrderService_NetworkPartition")]
[TargetService(typeof(OrderService))]
[FailureMode(FailureType.NetworkPartition, Duration = "30s")]
[ExpectedBehavior(Degradation.CircuitBreakerOpen, RecoveryTime = "60s")]
[RequiresResilience(typeof(OrderProcessingFeature),
nameof(OrderProcessingFeature.OrderCanBeProcessed))]
public partial class OrderServicePartitionExperiment { }

Or load testing:
[LoadTest("OrderEndpoint_PeakTraffic")]
[TargetEndpoint("POST /api/orders")]
[LoadProfile(Users = 1000, RampUp = "30s", Duration = "5m")]
[PerformanceBudget(P95 = "200ms", P99 = "500ms", ErrorRate = 0.01)]
[VerifiesNonFunctional(typeof(OrderProcessingFeature))]
public partial class OrderEndpointLoadTest { }

The source generator validates that the target service exists, the failure mode is valid, the performance budget is reasonable, and the feature reference is correct. The same compile-time enforcement that works for functional requirements works for non-functional testing strategies.
The spec-driven Testing-as-Code describes these strategies in English. A typed Chaos DSL would enforce them in the compiler. The spec-driven approach documents what should be tested. A typed DSL would make untested scenarios produce compiler warnings.
This is the trajectory: typed specifications start narrow (requirement chain only) and expand by adding DSLs. Each DSL brings another domain under compiler enforcement. The spec-driven approach starts broad (all testing strategies) but stays shallow (descriptions, not enforcement). Over time, the typed approach's coverage grows toward the spec-driven approach's breadth — but with enforcement the spec-driven approach cannot match.
The honest recommendation: for testing strategies you haven't yet built DSLs for, use spec-driven documents as a strategy guide (which testing techniques should we use?). But recognize that this is a transitional state, not the end state. The end state is typed DSLs for every testing domain that matters to your team.
The Testing Inertness Problem
The spec-driven Testing-as-Code specification defines beautiful testing strategies. But there's a problem that echoes Part II's discussion of pillar inertness: the spec describes testing practices; it doesn't create tests.
Consider this entry from the Testing spec:
DEFINE_PRACTICE(property_testing_implementation)
Scope: algorithmic_correctness, invariant_validation
Enforcement: recommended
Validation strategy: property_verification
Rule: "Complex algorithms and business logic must be tested
with property-based testing"
Property Categories:
- Round trip: encode_decode_identity
- Invariants: system_state_consistency
- Commutativity: operation_order_independence
- Idempotency: repeated_operation_same_result
Implementation Tools by Language:
- Rust: proptest
- Python: hypothesis
- JavaScript: fast_check
- Java: junit_quickcheck

This is an excellent description of property-based testing. A developer who reads it will understand what properties to test and which tools to use.
But the description is text. It doesn't know which algorithms in your codebase are "complex." It doesn't know which functions should have round-trip properties. It doesn't generate property tests. It doesn't even know if your project uses Rust or Python.
To make this spec actionable, you need:
- A human (or AI) to read the spec
- That reader to identify which code is "complex"
- That reader to decide which property category applies
- That reader to write the property test
- A quality gate to verify the test exists and passes
Steps 1-4 are interpretation. Step 5 is validation. The spec contributes to step 1 (guidance) but cannot participate in steps 2-5.
The typed approach is narrower but active. It doesn't tell you to write property tests — but if you DO write a property test and annotate it with [Verifies], the system knows exactly which AC it covers, and it can verify that every AC has at least one test. The spec is passive guidance. The type system is active enforcement.
Can the Spec-Driven Testing Spec Become Active?
Only by building tooling. To make "Complex algorithms must be property-tested" enforceable, you'd need:
- A code analyzer that identifies "complex algorithms" (by cyclomatic complexity? by annotation?)
- A test scanner that identifies property tests (by framework? by naming convention?)
- A cross-reference checker that matches algorithms to property tests
- A CI gate that fails if complex algorithms lack property tests
This is Roslyn analyzer territory. You're building the same thing the typed approach provides — just without the type system's help.
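The four pieces of tooling above can be sketched in a few lines, under assumed conventions: "complex" means cyclomatic complexity above a threshold, and a property test is recognized by a naming convention. All names and data here are hypothetical:

```python
# Minimal sketch of the cross-reference checker described above, under assumed
# conventions. The function names, complexity values, and test names are invented.

COMPLEXITY_THRESHOLD = 10

# Steps 1-2: analyzer output — function name -> measured cyclomatic complexity
complexity = {"parse_order": 14, "format_label": 3, "reconcile_ledger": 22}

# Step 3: test scanner output — property tests found by naming convention
property_tests = {"test_property_parse_order_roundtrip"}

def missing_property_tests(complexity: dict, property_tests: set,
                           threshold: int) -> list[str]:
    """Cross-reference: complex functions with no matching property test."""
    complex_fns = [f for f, c in complexity.items() if c > threshold]
    return [f for f in complex_fns
            if not any(f in test for test in property_tests)]

gaps = missing_property_tests(complexity, property_tests, COMPLEXITY_THRESHOLD)
print(gaps)  # ['reconcile_ledger'] -> step 4: fail the CI gate if non-empty
```

Even this toy version shows the cost: every convention (threshold, naming scheme) is a decision the spec leaves to the tool builder, whereas the typed approach gets the cross-reference from typeof()/nameof() for free.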
A Full Test File: Side by Side
Let's see what a complete test file looks like in each approach for the same feature.
Spec-Driven Test File
Following the Testing-as-Code conventions (Arrange-Act-Assert, naming convention MethodName_Scenario_ExpectedResult):
// Following spec-driven conventions
// Feature: password_reset (from PRD)
// No structural link to the PRD — this is a naming convention
public class PasswordResetServiceTests
{
private readonly PasswordResetService _service;
private readonly Mock<IUserRepository> _userRepo;
private readonly Mock<ITokenStore> _tokenStore;
private readonly Mock<IEmailService> _emailService;
private readonly FakeClock _clock;
public PasswordResetServiceTests()
{
_userRepo = new Mock<IUserRepository>();
_tokenStore = new Mock<ITokenStore>();
_emailService = new Mock<IEmailService>();
_clock = new FakeClock(DateTime.UtcNow);
_service = new PasswordResetService(
_userRepo.Object, _tokenStore.Object,
_emailService.Object, _clock);
}
// AC: "User can request a password reset email"
// This comment is the only link to the AC. It can be wrong. It can be stale.
[Fact]
public void RequestReset_ValidEmail_SendsEmail()
{
// Arrange
var email = "user@example.com";
_userRepo.Setup(r => r.FindByEmail(email))
.Returns(new User { Id = Guid.NewGuid(), Email = email });
// Act
var result = _service.RequestPasswordReset(email);
// Assert
Assert.True(result.IsSuccess);
_emailService.Verify(e => e.SendResetLink(email, It.IsAny<string>()), Times.Once);
}
[Fact]
public void RequestReset_UnknownEmail_NoEmailSentButReturnsSuccess()
{
_userRepo.Setup(r => r.FindByEmail(It.IsAny<string>()))
.Returns((User?)null);
var result = _service.RequestPasswordReset("unknown@example.com");
// Success returned (prevent account enumeration) but no email sent
Assert.True(result.IsSuccess);
_emailService.Verify(e => e.SendResetLink(
It.IsAny<string>(), It.IsAny<string>()), Times.Never);
}
// AC: "Reset link expires after 24 hours"
[Fact]
public void ValidateToken_Expired_ReturnsFailure()
{
var token = new ResetToken(Guid.NewGuid(), DateTime.UtcNow.AddHours(-25));
_tokenStore.Setup(t => t.Find(token.Id)).Returns(token);
var result = _service.ValidateResetToken(token.Id);
Assert.False(result.IsSuccess);
Assert.Contains("expired", result.Error);
}
// ... more tests following the same pattern
}

Characteristics:
- Feature link: comment only (// Feature: password_reset)
- AC link: comment only (can be wrong, can be stale)
- No compiler check that these tests cover the right ACs
- No way to generate a coverage matrix from these tests
- If the feature is deleted, these tests still compile and pass
Typed Specification Test File
// Structural link to the feature via typeof() and nameof()
// The compiler verifies every reference
[TestsFor(typeof(PasswordResetFeature))]
public class PasswordResetFeatureTests
{
private readonly PasswordResetService _service;
private readonly InMemoryUserRepository _userRepo;
private readonly InMemoryTokenStore _tokenStore;
private readonly SpyEmailService _emailService;
private readonly FakeClock _clock;
public PasswordResetFeatureTests()
{
_userRepo = new InMemoryUserRepository();
_tokenStore = new InMemoryTokenStore();
_emailService = new SpyEmailService();
_clock = new FakeClock(DateTime.UtcNow);
_service = new PasswordResetService(
_userRepo, _tokenStore, _emailService, _clock);
// Seed test data
_userRepo.Add(TestUsers.Alice);
}
[Verifies(typeof(PasswordResetFeature),
nameof(PasswordResetFeature.UserCanRequestPasswordResetEmail))]
public void Valid_email_sends_reset_link()
{
var result = _service.RequestPasswordReset(TestUsers.Alice.Email);
Assert.That(result.IsSuccess, Is.True);
Assert.That(_emailService.SentEmails, Has.Count.EqualTo(1));
Assert.That(_emailService.SentEmails[0].To, Is.EqualTo(TestUsers.Alice.Email));
Assert.That(_emailService.SentEmails[0].Body, Does.Contain("reset"));
}
[Verifies(typeof(PasswordResetFeature),
nameof(PasswordResetFeature.UserCanRequestPasswordResetEmail))]
public void Unknown_email_returns_success_but_sends_no_email()
{
var result = _service.RequestPasswordReset(new Email("unknown@example.com"));
// Success returned to prevent account enumeration
Assert.That(result.IsSuccess, Is.True);
Assert.That(_emailService.SentEmails, Is.Empty);
}
[Verifies(typeof(PasswordResetFeature),
nameof(PasswordResetFeature.ResetLinkExpiresAfter24Hours))]
public void Token_used_after_24_hours_is_rejected()
{
var token = CreateValidToken();
_clock.AdvanceBy(TimeSpan.FromHours(25));
var result = _service.ValidateResetToken(token.Id);
Assert.That(result.IsSuccess, Is.False);
Assert.That(result.Error, Does.Contain("expired"));
}
[Verifies(typeof(PasswordResetFeature),
nameof(PasswordResetFeature.ResetLinkExpiresAfter24Hours))]
public void Token_used_within_24_hours_is_accepted()
{
var token = CreateValidToken();
_clock.AdvanceBy(TimeSpan.FromHours(23));
var result = _service.ValidateResetToken(token.Id);
Assert.That(result.IsSuccess, Is.True);
}
[Verifies(typeof(PasswordResetFeature),
nameof(PasswordResetFeature.NewPasswordMeetsComplexityRequirements))]
public void Weak_password_is_rejected()
{
var token = CreateValidToken();
var weakPassword = new Password("123");
var result = _service.ResetPassword(token.Id, weakPassword);
Assert.That(result.IsSuccess, Is.False);
Assert.That(result.Error, Does.Contain("complexity"));
}
[Verifies(typeof(PasswordResetFeature),
nameof(PasswordResetFeature.NewPasswordMeetsComplexityRequirements))]
public void Strong_password_resets_successfully()
{
var token = CreateValidToken();
var strongPassword = new Password("C0mpl3x!Pass#2026");
var result = _service.ResetPassword(token.Id, strongPassword);
Assert.That(result.IsSuccess, Is.True);
Assert.That(
_userRepo.FindById(TestUsers.Alice.Id).PasswordHash,
Is.Not.EqualTo(TestUsers.Alice.PasswordHash));
}
private ResetToken CreateValidToken()
{
_service.RequestPasswordReset(TestUsers.Alice.Email);
return _tokenStore.GetLatest();
}
}

Characteristics:
- Feature link: `typeof(PasswordResetFeature)` — compiler-checked, Ctrl+Click navigable
- AC link: `nameof(PasswordResetFeature.UserCanRequestPasswordResetEmail)` — refactor-safe
- Compiler verifies every reference (rename AC → all tests update automatically)
- Source-generated traceability matrix includes these tests
- If the feature is deleted, these tests produce compile errors
The difference is not in the test logic — both test the same behavior. The difference is in the metadata: the typed approach's test metadata is compiler-checked, navigable, and participates in the traceability system. The spec-driven approach's test metadata is comments that can lie.
Summary
| Dimension | Spec-Driven Testing | Typed Specification Testing |
|---|---|---|
| Scope | 15+ strategies, comprehensive | Requirement-to-test chain only |
| Guidance | Excellent (principles, practices, patterns) | Minimal (compiler diagnostics) |
| Enforcement | Coverage thresholds (line, branch, function) | Per-AC requirement coverage (REQ3xx) |
| Stale detection | None | REQ302 (stale [Verifies] reference) |
| Missing detection | Low coverage flag (line-level) | Specific AC diagnostic (REQ301) |
| Granularity | File/class/method level | Acceptance criterion level |
| Learning | Read the document | Use the system |
| Language support | Rust, Python, JS, Java, C#, C++, Go | C# (with .NET ecosystem) |
The Testing DSL Vision
The typed approach starts with [Verifies] — a single attribute that links a test to an acceptance criterion. But [Verifies] is just the beginning. Every testing concern that the spec-driven approach describes in English can be expressed as a typed DSL attribute with compiler enforcement. (The Auto-Documentation from a Typed System series applies this same pattern to operational concerns — each DSL follows the attribute-to-generator-to-artifact pipeline described below.)
Here's what a fully typed testing ecosystem looks like.
Property-Based Testing DSL
The spec-driven approach describes property-based testing: round-trip, invariants, commutativity, idempotency. The typed approach enforces it:
[PropertyTest(typeof(OrderProcessingFeature),
nameof(OrderProcessingFeature.OrderTotalMustBePositive))]
[PropertyCategory(PropertyCategory.Invariant)]
[Shrinkable(typeof(OrderLineArbitrary))]
public partial class OrderTotalInvariantProperty
{
/// <summary>
/// The generator produces arbitrary OrderLine collections.
/// The shrinking strategy reduces failing inputs to minimal cases.
/// </summary>
public static Arbitrary<OrderLine[]> Generator => Arb.From(
Gen.ArrayOf(
from qty in Gen.Choose(1, 10000)
from price in Gen.Choose(1, 100000).Select(p => p / 100m)
select new OrderLine(qty, price)));
public bool Property(OrderLine[] lines)
{
var order = new Order(lines);
return order.Total > 0;
}
}

The source generator produces:
// Generated: OrderTotalInvariantProperty.g.cs
public partial class OrderTotalInvariantProperty
{
[Fact]
[Trait("Category", "Property")]
[Trait("Feature", "OrderProcessingFeature")]
[Trait("AC", "OrderTotalMustBePositive")]
public void OrderTotalInvariantProperty_Executes()
{
Prop.ForAll(Generator, Property)
.WithShrink(OrderLineArbitrary.Shrink)
.WithMaxTest(1000)
.QuickCheckThrowOnFailure();
}
}

The analyzer validates:
- `OrderProcessingFeature` exists (compile error if not)
- `OrderTotalMustBePositive` is a valid AC on that feature (compile error if not)
- The `Generator` property returns `Arbitrary<T>` where T matches the `Property` method parameter (compile error if mismatched)
- The `PropertyCategory` matches the actual property shape (warning if `Invariant` is claimed but the property has side effects)
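These property categories are not specific to the C# DSL. As a stdlib-only illustration (the `for_all` runner and generator helpers below are hypothetical, not part of any framework), the same round-trip, invariant, and idempotency checks can be sketched in Python:

```python
import random

def for_all(gen, prop, n=500, seed=42):
    """Minimal property runner: try n random inputs, return the first
    counterexample, or None if the property held for every input."""
    rng = random.Random(seed)
    for _ in range(n):
        x = gen(rng)
        if not prop(x):
            return x
    return None

# Round-trip: parsing the printed form recovers the original value
round_trip = for_all(lambda r: r.randint(-10**9, 10**9),
                     lambda n: int(str(n)) == n)

# Invariant: a total over positive-quantity, positive-price lines is positive
def gen_lines(rng):
    return [(rng.randint(1, 10_000), rng.randint(1, 100_000) / 100)
            for _ in range(rng.randint(1, 20))]

invariant = for_all(gen_lines,
                    lambda lines: sum(q * p for q, p in lines) > 0)

# Idempotency: applying the operation twice equals applying it once
idempotent = for_all(lambda r: [r.randint(0, 99) for _ in range(r.randint(0, 30))],
                     lambda xs: sorted(sorted(xs)) == sorted(xs))

assert round_trip is None and invariant is None and idempotent is None
print("round-trip, invariant, and idempotency properties all held")
```

The typed DSL adds what this sketch lacks: the compiler-checked link from each property back to a specific acceptance criterion.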
Mutation Testing DSL
The spec-driven approach says "mutation score > 80%." The typed approach makes mutation testing a first-class concern:
[MutationTarget(typeof(OrderProcessingFeature))]
[MutationOperators(
MutationOperator.ArithmeticReplacement,
MutationOperator.ConditionalBoundary,
MutationOperator.NegateConditional,
MutationOperator.ReturnValueMutation)]
[MinimumMutationScore(85)]
[ExcludeFromMutation(nameof(Order.ToString), Reason = "Display-only method")]
public partial class OrderProcessingMutationConfig { }

The source generator produces a Stryker.NET configuration:
// Generated: stryker-OrderProcessing.json
{
"stryker-config": {
"project-info": {
"name": "OrderProcessing",
"feature": "OrderProcessingFeature"
},
"mutate": [
"src/MyApp.Domain/Orders/**/*.cs"
],
"mutation-level": "Standard",
"thresholds": {
"high": 85,
"low": 70,
"break": 60
},
"excluded-mutations": [],
"ignore-methods": ["ToString"]
}
}

The analyzer validates:
- The target feature exists and has implementations (compile error if feature is deleted)
- The mutation operators are valid for the implementation language (warning if operator doesn't apply)
- The `MinimumMutationScore` is achievable given the test coverage (info diagnostic with recommendation)
info MUT100: OrderProcessingFeature has 12 [Verifies] tests covering 4/4 ACs.
Mutation testing configured with score threshold 85%.
warning MUT101: OrderProcessingFeature.OrderCanBeSplitAcrossWarehouses has only
1 [Verifies] test. Consider adding edge-case tests to improve
mutation kill rate.

This is also the final piece of the semantic correctness puzzle. The biggest criticism of typed specifications is that a [Verifies] test can lie — it can reference an AC but test something unrelated. Mutation testing closes this gap: a lying test kills zero mutants, and [MutationTarget] catches it. Combined with executable ACs (where the test must call the AC method directly) and the REQ305 analyzer (which verifies the invocation), mutation testing is the third layer that guarantees semantic correctness. See Part VIII for the full three-layer defense.
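The lying-test failure mode is easy to demonstrate outside the DSL. In this illustrative Python sketch (the `shipping_fee` example and helper names are hypothetical, not the C# toolchain), a conditional-boundary mutant survives a test that never exercises the behavior it claims to verify, and dies against one that does:

```python
# Original implementation: free shipping for orders of 100.00 or more
def shipping_fee(total):
    return 0.0 if total >= 100.0 else 5.0

# Conditional-boundary mutant: '>=' replaced with '>'
def shipping_fee_mutant(total):
    return 0.0 if total > 100.0 else 5.0

# A "lying" test: claims to verify the free-shipping AC but never
# touches the boundary, so it passes against both versions.
def lying_test(impl):
    assert impl(10.0) == 5.0

# A genuine test: pins the boundary case, so the mutant fails it.
def boundary_test(impl):
    assert impl(100.0) == 0.0

def kills(test, mutant):
    """A test 'kills' a mutant if it fails when run against the mutant."""
    try:
        test(mutant)
        return False
    except AssertionError:
        return True

lying_test(shipping_fee)                           # passes on the original...
assert not kills(lying_test, shipping_fee_mutant)  # ...and lets the mutant survive
boundary_test(shipping_fee)                        # passes on the original...
assert kills(boundary_test, shipping_fee_mutant)   # ...and kills the mutant
print("mutant survived the lying test and was killed by the boundary test")
```

A surviving mutant is exactly the signal [MutationTarget] surfaces: the referenced AC has a test, but not one that constrains the behavior.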
Fuzz Testing DSL
The spec-driven approach mentions "structured fuzzing" and "protocol fuzzing." The typed approach makes fuzz targets declarative:
[FuzzTarget(typeof(OrderProcessingFeature),
nameof(OrderProcessingFeature.OrderTotalMustBePositive))]
[InputGenerator(typeof(MalformedOrderInputGenerator))]
[FuzzDuration("5m")]
[MaxInputSize(4096)]
[CrashPolicy(CrashPolicy.CollectAndContinue)]
public partial class OrderInputFuzzTest
{
/// <summary>
/// The fuzz engine calls this method with generated byte arrays.
/// The InputGenerator structures the bytes into domain-meaningful inputs.
/// </summary>
public FuzzResult Execute(byte[] input)
{
var order = MalformedOrderInputGenerator.FromBytes(input);
try
{
var result = _service.ProcessOrder(order);
// If we get here, the input was handled gracefully — good
return FuzzResult.Handled;
}
catch (DomainException)
{
// Expected: domain rejects malformed input — good
return FuzzResult.Handled;
}
// Unhandled exceptions = fuzz finding
}
}

The source generator produces:
// Generated: OrderInputFuzzTest.g.cs
public partial class OrderInputFuzzTest
{
[Fact]
[Trait("Category", "Fuzz")]
[Trait("Feature", "OrderProcessingFeature")]
[Trait("AC", "OrderTotalMustBePositive")]
public void OrderInputFuzzTest_Executes()
{
var engine = new FuzzEngine(
target: Execute,
generator: new MalformedOrderInputGenerator(),
maxDuration: TimeSpan.FromMinutes(5),
maxInputSize: 4096,
crashPolicy: CrashPolicy.CollectAndContinue);
var report = engine.Run();
Assert.Empty(report.Crashes);
Assert.Empty(report.UnhandledExceptions);
Assert.True(report.InputsTested > 0,
"Fuzz engine must test at least one input");
}
}

The analyzer validates:
- The target feature and AC exist (compile error if deleted)
- The `InputGenerator` type implements `IFuzzInputGenerator` (compile error if not)
- The `Execute` method has the correct signature (compile error if wrong)
- The `FuzzDuration` is reasonable (warning if > 30 minutes in a CI context)
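Stripped of the DSL, the generated harness runs a conventional fuzz loop. This Python sketch (the toy `parse_order` target and helper names are hypothetical) shows the collect-and-continue policy: expected domain rejections count as handled, anything else is recorded as a finding:

```python
import random

def parse_order(data: bytes):
    """Toy target: expects 'qty:price' as ASCII; raises ValueError on bad input."""
    qty_s, _, price_s = data.decode("ascii").partition(":")
    qty, price = int(qty_s), float(price_s)
    if qty <= 0 or price <= 0:
        raise ValueError("non-positive order line")
    return qty * price

def fuzz(target, iterations=2000, max_len=32, seed=1234):
    """Collect-and-continue: keep fuzzing past failures, report at the end."""
    rng = random.Random(seed)
    handled, findings = 0, []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len)))
        try:
            target(data)
            handled += 1      # input accepted gracefully
        except ValueError:    # covers UnicodeDecodeError too (a subclass):
            handled += 1      # the domain rejected malformed input, as expected
        except Exception as exc:
            findings.append((data, exc))  # unhandled exception = fuzz finding
    return handled, findings

handled, findings = fuzz(parse_order)
# A real harness would fail the build if findings is non-empty.
print(f"{handled} inputs handled, {len(findings)} findings")
```

The generated xUnit wrapper adds what the loop alone cannot: the compile-checked link to the feature and AC, plus the assertion that the engine actually exercised inputs.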
Contract Testing DSL
For integration testing against external services, the spec-driven approach describes "contract testing" and "sandbox environments." The typed approach declares contracts as typed interfaces:
[ContractTest(typeof(IPaymentGateway))]
[Provider("Stripe")]
[ConsumerName("OrderService")]
[ProviderState("customer_with_valid_card")]
[VerifiesIntegration(typeof(OrderProcessingFeature),
nameof(OrderProcessingFeature.CancellationTriggersFullRefund))]
public partial class PaymentGatewayRefundContract
{
[ContractInteraction("refund_full_amount")]
public async Task<ContractResult> RefundInteraction()
{
// Arrange: the payment gateway is in state "customer_with_valid_card"
var payment = new PaymentId("pay_test_123");
var amount = new Money(99.99m, Currency.USD);
// Act
var result = await _gateway.RefundAsync(payment, amount);
// Assert: the contract specifies the response shape
return ContractResult.Verify(result)
.HasStatus(RefundStatus.Succeeded)
.HasAmount(amount)
.HasCurrency(Currency.USD);
}
}

The source generator produces a Pact-compatible contract file:
// Generated: OrderService-Stripe-contract.json
{
"consumer": { "name": "OrderService" },
"provider": { "name": "Stripe" },
"interactions": [
{
"description": "refund_full_amount",
"providerState": "customer_with_valid_card",
"request": {
"method": "POST",
"path": "/v1/refunds",
"body": { "payment_intent": "pay_test_123", "amount": 9999 }
},
"response": {
"status": 200,
"body": { "status": "succeeded", "amount": 9999, "currency": "usd" }
}
}
],
"metadata": {
"feature": "OrderProcessingFeature",
"ac": "CancellationTriggersFullRefund"
}
}

The analyzer validates:
- `IPaymentGateway` exists and is a service interface (compile error if not)
- The provider name matches a known configuration (warning if unknown)
- The feature and AC references are valid (compile error if deleted)
- Every public method on `IPaymentGateway` has at least one `[ContractInteraction]` (warning if uncovered)
warning CONTRACT100: IPaymentGateway.ChargeAsync has no [ContractInteraction]
in any contract test class. Consider adding a contract
for this interaction.

Performance Testing DSL
The spec-driven approach defines "p95 < 2s" as a threshold. The typed approach links performance budgets to features:
[PerformanceTest(typeof(OrderProcessingFeature),
P95 = "200ms", P99 = "500ms")]
[Endpoint("POST /api/orders")]
[LoadProfile(ConcurrentUsers = 100, RampUp = "30s", Duration = "5m")]
[DataProfile(OrdersPerUser = 5, AverageLineItems = 3)]
[ResourceBudget(MaxMemoryMB = 512, MaxCpuPercent = 80)]
public partial class OrderProcessingPerformanceTest
{
[PerformanceScenario("happy_path")]
public async Task<PerformanceResult> HappyPath(HttpClient client)
{
var order = TestOrders.Typical();
var response = await client.PostAsJsonAsync("/api/orders", order);
return PerformanceResult.FromResponse(response);
}
[PerformanceScenario("large_order")]
public async Task<PerformanceResult> LargeOrder(HttpClient client)
{
var order = TestOrders.WithLineItems(100);
var response = await client.PostAsJsonAsync("/api/orders", order);
return PerformanceResult.FromResponse(response);
}
}

The source generator produces a k6 load test script:
// Generated: order-processing-perf.k6.js
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
stages: [
{ duration: '30s', target: 100 }, // ramp up
{ duration: '5m', target: 100 }, // sustained
{ duration: '30s', target: 0 }, // ramp down
],
thresholds: {
'http_req_duration{scenario:happy_path}': ['p(95)<200', 'p(99)<500'],
'http_req_duration{scenario:large_order}': ['p(95)<200', 'p(99)<500'],
},
};
export default function () {
// Scenario: happy_path (80% weight)
// Scenario: large_order (20% weight)
const scenario = Math.random() < 0.8 ? 'happy_path' : 'large_order';
// ... generated test logic
}

The analyzer validates:
- The target feature exists (compile error if deleted)
- The endpoint matches a real API route (warning if no matching controller action)
- The P95/P99 values parse as valid durations (compile error if "200xs")
- The load profile is reasonable (warning if > 10,000 concurrent users in a test environment)
- The feature has functional tests via `[Verifies]` (warning if performance-tested but not functionally tested)
warning PERF100: OrderProcessingFeature has a [PerformanceTest] but no
[Verifies] test for AC 'OrderCanBeSplitAcrossWarehouses'.
Performance testing without functional coverage is unreliable.

The Complete Testing DSL Analyzer Suite
When all testing DSLs are in place, the analyzer output at build time covers every testing concern:
# Requirement Coverage (REQ3xx)
info REQ303: OrderProcessingFeature — all 4 ACs have [Verifies] tests ✓
info REQ303: PasswordResetFeature — all 3 ACs have [Verifies] tests ✓
warning REQ301: UserRolesFeature.AdminCanRevokeRoles has no [Verifies] test
# Property Testing (PROP1xx)
info PROP100: OrderProcessingFeature.OrderTotalMustBePositive has
[PropertyTest] with Invariant category ✓
warning PROP101: PasswordResetFeature has no [PropertyTest] for any AC.
Consider property-testing token generation and expiry logic.
# Mutation Testing (MUT1xx)
info MUT100: OrderProcessingFeature mutation config: score threshold 85%,
4 operators, 12 covering tests ✓
warning MUT101: PasswordResetFeature has no [MutationTarget] configuration.
# Fuzz Testing (FUZZ1xx)
info FUZZ100: OrderProcessingFeature.OrderTotalMustBePositive has
[FuzzTarget] with 5m duration ✓
warning FUZZ101: PasswordResetFeature has no [FuzzTarget]. Consider fuzzing
password complexity validation and token parsing.
# Contract Testing (CONTRACT1xx)
info CONTRACT100: IPaymentGateway — 3/3 methods have contract interactions ✓
warning CONTRACT101: IEmailService has no [ContractTest]. Consider adding
contracts for email delivery verification.
# Performance Testing (PERF1xx)
info PERF100: OrderProcessingFeature performance budget: P95=200ms,
P99=500ms, 100 concurrent users ✓
warning PERF101: PasswordResetFeature has no [PerformanceTest]. Consider
testing token validation under load.
# Summary
Build succeeded with 5 warnings.
Features fully covered (all DSLs): 1/3
Features with [Verifies] coverage: 2/3
Features with property tests: 1/3
Features with mutation configs: 1/3
Features with fuzz targets: 1/3
Features with performance tests: 1/3
External services with contracts: 1/2

This is the key realization: the spec-driven Testing-as-Code specification describes 15+ testing strategies in English paragraphs. The typed approach can enforce all 15 as compiler-checked DSLs. Each strategy becomes an attribute family, a source generator, and an analyzer. Each produces specific, actionable diagnostics. Each links back to the feature it tests via typeof().
The spec-driven approach tells you "consider property-based testing for complex algorithms." The typed approach tells you "OrderProcessingFeature.OrderTotalMustBePositive has no [PropertyTest] — and here's the analyzer ID so you can configure it as error, warning, or suggestion per project."
The difference is not just enforcement vs description. It's specificity. The spec-driven document says "test complex algorithms." The analyzer says "test THIS algorithm, for THIS feature, covering THIS acceptance criterion." One is a strategy. The other is a work item.
And because every DSL follows the same pattern — attribute → source generator → analyzer — the cost of adding a new testing concern is sublinear. The first DSL (property testing) requires building the test DSL infrastructure. The second DSL (mutation testing) reuses it. By the fifth DSL (performance testing), adding a new testing concern is an afternoon's work, not a week's project.
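The shared pipeline can be modeled abstractly. In this illustrative Python sketch (all names hypothetical), the attribute → generator → analyzer infrastructure is built once, and each new testing concern is a single registration against it:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestingDsl:
    """One testing concern = an attribute family, a validator (analyzer),
    and an artifact emitter (source generator)."""
    attribute: str
    validate: Callable[[dict], list]  # returns diagnostics
    emit: Callable[[dict], str]       # returns a generated artifact

def run_pipeline(dsls, annotations):
    """Shared infrastructure: route each annotation to its DSL,
    collecting diagnostics and generated artifacts."""
    diagnostics, artifacts = [], []
    for ann in annotations:
        for dsl in dsls:
            if ann["attribute"] == dsl.attribute:
                diagnostics += dsl.validate(ann)
                artifacts.append(dsl.emit(ann))
    return diagnostics, artifacts

# Adding a new concern is one registration, not new infrastructure:
property_dsl = TestingDsl(
    "PropertyTest",
    validate=lambda a: [] if a.get("feature") else ["PROP001: missing feature"],
    emit=lambda a: f"generated property runner for {a['feature']}")
mutation_dsl = TestingDsl(
    "MutationTarget",
    validate=lambda a: [] if a.get("score", 0) >= 60 else ["MUT001: threshold too low"],
    emit=lambda a: f"stryker config for {a['feature']}")

diags, artifacts = run_pipeline(
    [property_dsl, mutation_dsl],
    [{"attribute": "PropertyTest", "feature": "OrderProcessingFeature"},
     {"attribute": "MutationTarget", "feature": "OrderProcessingFeature", "score": 85}])
print(diags, artifacts)
```

In the real system the "annotations" come from Roslyn symbol data rather than dictionaries, but the sublinear-cost argument is the same: only the two lambdas per DSL are new work.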
The Coverage Dashboard: Generated, Not Assembled
The spec-driven approach requires assembling coverage information from multiple tools. Each tool has its own report format. Building a unified dashboard requires parsing coverlet XML, Stryker HTML, k6 JSON, and Snyk reports, then correlating them manually or through a custom aggregation layer.
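To make that integration burden concrete, here is a small Python sketch of the correlation layer (the Cobertura `line-rate` attribute is real; the k6 summary shape is approximated, and the feature mapping is hand-written because nothing in either report links a number to a feature):

```python
import json
import xml.etree.ElementTree as ET

def line_rate_from_cobertura(xml_text):
    """Coverlet can emit Cobertura XML; overall line coverage is the
    'line-rate' attribute on the root <coverage> element."""
    return float(ET.fromstring(xml_text).get("line-rate"))

def p95_from_k6_summary(summary_text):
    """Approximated k6 summary-export shape; the real key layout
    varies by k6 version."""
    metrics = json.loads(summary_text)["metrics"]
    return metrics["http_req_duration"]["p(95)"]

# Stand-in report fragments (in reality: files produced by separate tool runs)
cobertura = '<coverage line-rate="0.83" branch-rate="0.71"></coverage>'
k6_summary = '{"metrics": {"http_req_duration": {"p(95)": 187.2}}}'

# The correlation step is manual: a hand-maintained table maps features
# to tool outputs, and nothing verifies it stays correct.
feature_map = {"OrderProcessingFeature": {"coverage": cobertura,
                                          "perf": k6_summary}}
dashboard = {
    feature: {"line_rate": line_rate_from_cobertura(src["coverage"]),
              "p95_ms": p95_from_k6_summary(src["perf"])}
    for feature, src in feature_map.items()
}
print(dashboard)
```

Every tool added to the pipeline adds another parser and another entry in the hand-maintained map; this is the glue code the typed approach generates away.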
The typed approach generates the dashboard from build output. Every analyzer family contributes diagnostics. The source generator aggregates them into a single report:
// Generated: TestCoverageDashboard.g.cs
public static class TestCoverageDashboard
{
public static readonly FeatureCoverage[] Features = new[]
{
new FeatureCoverage(
Feature: "OrderProcessingFeature",
AcCount: 4,
VerifiesTests: 12,
AcsCovered: 4,
PropertyTests: 1,
MutationConfigured: true,
MutationScoreThreshold: 85,
FuzzTargets: 1,
ContractTests: 3,
PerformanceTests: 1,
PerformanceBudget: "P95=200ms",
FullyCovered: true),
new FeatureCoverage(
Feature: "PasswordResetFeature",
AcCount: 3,
VerifiesTests: 6,
AcsCovered: 3,
PropertyTests: 0, // ← PROP101 warning
MutationConfigured: false, // ← MUT101 warning
MutationScoreThreshold: 0,
FuzzTargets: 0, // ← FUZZ101 warning
ContractTests: 0,
PerformanceTests: 1,
PerformanceBudget: "P95=300ms",
FullyCovered: false),
// ... one entry per feature
};
}// Generated: TestCoverageDashboard.g.cs
public static class TestCoverageDashboard
{
public static readonly FeatureCoverage[] Features = new[]
{
new FeatureCoverage(
Feature: "OrderProcessingFeature",
AcCount: 4,
VerifiesTests: 12,
AcsCovered: 4,
PropertyTests: 1,
MutationConfigured: true,
MutationScoreThreshold: 85,
FuzzTargets: 1,
ContractTests: 3,
PerformanceTests: 1,
PerformanceBudget: "P95=200ms",
FullyCovered: true),
new FeatureCoverage(
Feature: "PasswordResetFeature",
AcCount: 3,
VerifiesTests: 6,
AcsCovered: 3,
PropertyTests: 0, // ← PROP101 warning
MutationConfigured: false, // ← MUT101 warning
MutationScoreThreshold: 0,
FuzzTargets: 0, // ← FUZZ101 warning
ContractTests: 0,
PerformanceTests: 1,
PerformanceBudget: "P95=300ms",
FullyCovered: false),
// ... one entry per feature
};
}This generated class is available at compile time. A dashboard UI can read it. A CI gate can query it. An AI agent can inspect it. No parsing. No aggregation. No "which report format does this tool use?" The data model is a C# class — queryable, type-safe, and always current.
The spec-driven approach builds dashboards from tool outputs. The typed approach generates dashboards from the type system. One is integration work that breaks when a tool changes its output format. The other is generated code that's always consistent with the build.
The Test Naming Convention Trap
The spec-driven approach defines naming conventions for tests, carefully tailored per language:
- Rust: `test_function_behavior_expected`
- Python: `test_when_given_then`
- JavaScript: `should_behavior_when_condition`
- Java: `methodName_scenario_expectedResult`
- C#: `MethodName_Scenario_ExpectedResult`
This seems helpful. Consistent naming makes tests scannable, grep-able, and self-documenting. The spec-driven Testing-as-Code specification treats naming conventions as a core practice with explicit violation patterns and auto-fix suggestions.
But naming conventions are a trap. They create a brittle, human-maintained link between tests and the things they test. The typed approach makes naming irrelevant — because the [Verifies] attribute IS the link.
How Convention-Based Naming Breaks
Scenario 1: Renamed acceptance criterion.
The AC was originally "User can reset password." A product owner renames it to "User can request password recovery." In the spec-driven approach:
```csharp
// The test name references the OLD AC wording
[Fact]
public void RequestReset_ValidEmail_SendsEmail() // "Reset" not "Recovery"
{
    // ...
}
```

The test still passes. The name is now wrong — it says "Reset" but the AC says "Recovery." Nobody notices. Over months, half the tests reference old AC names and half reference new ones. The naming convention that was supposed to provide traceability now provides misinformation.
In the typed approach:
```csharp
// The AC method is renamed via IDE refactoring
// Before: nameof(PasswordResetFeature.UserCanResetPassword)
// After:  nameof(PasswordRecoveryFeature.UserCanRequestPasswordRecovery)
[Verifies(typeof(PasswordRecoveryFeature),
          nameof(PasswordRecoveryFeature.UserCanRequestPasswordRecovery))]
public void Valid_email_sends_recovery_link()
{
    // Test name doesn't matter — the attribute is the link
}
```

The IDE rename propagated the change to every [Verifies] attribute automatically. The test name can be anything — it's the attribute that provides traceability.
Scenario 2: Stale naming convention.
Six months ago, the team agreed on MethodName_Scenario_ExpectedResult. Three new developers joined. They write tests with different patterns:
```csharp
// Developer A (original convention)
public void RequestReset_ValidEmail_SendsEmail() { }

// Developer B (BDD-style)
public void Should_send_email_when_valid_email_provided() { }

// Developer C (Given-When-Then)
public void GivenValidEmail_WhenResetRequested_ThenEmailSent() { }

// Developer D (descriptive)
public void A_registered_user_requesting_password_reset_receives_an_email() { }
```

All four tests do the same thing. All four follow a "convention" — just not the same one. The naming convention document says `MethodName_Scenario_ExpectedResult`, but humans are inconsistent. No one enforces it because the convention is text, not code.
In the typed approach, all four tests can have any name they want. The traceability comes from the attribute:
```csharp
[Verifies(typeof(PasswordResetFeature),
          nameof(PasswordResetFeature.UserCanRequestPasswordResetEmail))]
public void Whatever_name_the_developer_prefers() { }
```

The name is irrelevant. The [Verifies] attribute is the single source of truth. It's not a convention to follow — it's a structural link the compiler checks.
Scenario 3: Cross-language inconsistency.
The spec-driven approach defines different naming conventions per language. A polyglot team has:
- C# tests: `MethodName_Scenario_ExpectedResult`
- Python tests: `test_when_given_then`
- JavaScript tests: `should_behavior_when_condition`
Now try to generate a traceability matrix. You need a parser for each naming convention, each language's test discovery mechanism, and a mapper that connects differently-named tests to the same AC. This is fragile, language-specific, and breaks whenever someone doesn't follow the convention perfectly.
The typed approach solves this for C# with attributes. For other languages, the same principle applies with their native mechanisms — Python decorators, JavaScript/TypeScript decorators, Rust proc macros. The point isn't "use C# attributes" — it's "use structural metadata, not naming conventions."
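The "structural metadata, not naming conventions" point can be sketched in a few lines of reflection. The `VerifiesAttribute` below is a hypothetical reconstruction matching the (feature type, AC name) shape used in the examples above, the feature and test classes are illustrative stand-ins:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute matching the (feature type, AC name) shape shown above.
[AttributeUsage(AttributeTargets.Method)]
public class VerifiesAttribute : Attribute
{
    public Type Feature { get; }
    public string AcName { get; }
    public VerifiesAttribute(Type feature, string acName) { Feature = feature; AcName = acName; }
}

public class PasswordResetFeature
{
    public void ResetLinkExpiresAfter24Hours() { }
}

public class PasswordResetTests
{
    [Verifies(typeof(PasswordResetFeature), nameof(PasswordResetFeature.ResetLinkExpiresAfter24Hours))]
    public void Any_name_at_all() { }
}

public static class TraceabilityMatrix
{
    public static void Main()
    {
        // No name parsing, no per-language regex: the matrix is a reflection query
        // over the attributes themselves.
        var links =
            from m in typeof(PasswordResetTests).GetMethods()
            from v in m.GetCustomAttributes<VerifiesAttribute>()
            select $"{m.Name} -> {v.Feature.Name}.{v.AcName}";
        foreach (var link in links) Console.WriteLine(link);
    }
}
```

The same query works whatever the test methods are named, which is exactly the property naming-convention parsers lack.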
The Comparison
| Dimension | Naming Convention (Spec-Driven) | [Verifies] Attribute (Typed) |
|---|---|---|
| Link type | Implicit (name encodes meaning) | Explicit (attribute declares link) |
| Refactoring | Manual rename across tests | IDE propagation via nameof() |
| Enforcement | Code review (human) | Compiler (automated) |
| Consistency | Depends on team discipline | Structural — always consistent |
| Cross-language | Different convention per language | Same pattern per language's metadata |
| Stale detection | None (stale names compile fine) | Compile error (nameof fails) |
| Traceability matrix | Requires name-parsing heuristics | Exact: attribute → feature → AC |
| New team members | Must read and memorize convention | Must add attribute (compiler reminds) |
| Grep-ability | Good (names are searchable) | Better (attribute is searchable AND precise) |
| Wrong link detection | Impossible (name can lie) | Compile error (wrong AC = CS0117) |
The spec-driven naming convention is a social contract: "we all agree to name tests this way." Social contracts are valuable but fragile. They work when teams are small, stable, and disciplined. They break when teams grow, rotate, and face deadline pressure.
The [Verifies] attribute is a structural contract: "the compiler verifies this test covers this AC." Structural contracts don't depend on discipline. They work regardless of team size, turnover, or deadline pressure. The compiler doesn't get tired. The compiler doesn't forget the convention. The compiler doesn't join the team six months late and use a different pattern.
This is the general principle applied to a specific domain: conventions describe expectations; types enforce them. Naming conventions are conventions. Attributes are types. In a system where correctness matters, types win.
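The "stale detection" mechanism behind this is simply `nameof`. A short sketch, using the illustrative feature class from the earlier examples; the commented-out line shows what stops compiling after a rename:

```csharp
using System;

public class PasswordRecoveryFeature
{
    public void UserCanRequestPasswordRecovery() { }
}

public static class StaleLinkDemo
{
    public static void Main()
    {
        // nameof is resolved by the compiler: either the member exists,
        // or the build fails.
        Console.WriteLine(nameof(PasswordRecoveryFeature.UserCanRequestPasswordRecovery));

        // After the AC method was renamed, a stale reference like the line
        // below would no longer compile (error CS0117: no such member):
        // Console.WriteLine(nameof(PasswordRecoveryFeature.UserCanResetPassword));
    }
}
```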
The Convention Graveyard
Every team that has existed for more than two years has a convention graveyard — a collection of abandoned, contradictory, or partially-followed naming conventions. The graveyard grows because conventions have no deprecation mechanism. When the team switches from MethodName_Scenario_ExpectedResult to BDD-style Should_behavior_when_condition, the old tests keep the old convention. Nobody renames 400 existing tests. The convention document gets updated; the codebase doesn't.
```text
The Convention Graveyard:

Year 1 (3 developers):
  Convention: MethodName_Scenario_ExpectedResult
  Tests following convention: 100% (200 tests)
  Enforcement: Code review (3 people, consistent)

Year 2 (6 developers, 2 new):
  Convention: MethodName_Scenario_ExpectedResult
  Tests following convention: 85% (340 of 400 tests)
  New developers sometimes use: Should_behavior_when, GivenWhenThen
  Enforcement: Code review (inconsistent, reviewers disagree)

Year 3 (10 developers, 4 new):
  Convention: "We use MethodName_Scenario_ExpectedResult"
  Reality: 60% old convention, 25% BDD, 10% Given-When-Then, 5% random
  New convention document: "Use BDD style: Should_behavior_when_condition"
  Old tests: Not renamed (too many, too risky)
  Enforcement: Aspirational

Year 4 (15 developers, 6 new):
  Convention: "Check the wiki" (wiki has 3 conflicting entries)
  Reality: Every developer uses their own style
  Test-to-AC traceability: Impossible (no consistent pattern to parse)
  Enforcement: None
```

The typed approach has no convention graveyard. There's nothing to rename, nothing to migrate, nothing to deprecate. The [Verifies] attribute is the link. It was the link on Day 1 and it's the link on Day 1,000. The test method can be named anything — Test1, ShouldWork, A_very_descriptive_name_that_explains_the_scenario_in_detail — and the traceability is identical.
This isn't a trivial advantage. Test naming conventions are one of the most common sources of technical debt in test suites. Teams spend hours in code reviews debating names. They write linting rules to enforce naming patterns. They build custom tools to extract traceability from test names. All of this effort is eliminated by a single attribute. The convention that requires no convention is the best convention.
When Naming Conventions Actively Mislead
The worst case isn't inconsistent naming — it's consistently wrong naming. A test named RequestReset_ValidEmail_SendsEmail that was later refactored to test token validation, but never renamed:
```csharp
// The name says: tests that a valid email sends an email
// The test actually: validates that expired tokens are rejected
// The disconnect: invisible to conventions, caught by nobody
[Fact]
public void RequestReset_ValidEmail_SendsEmail()
{
    var token = new ResetToken(Guid.NewGuid(), DateTime.UtcNow.AddHours(-25));
    _tokenStore.Setup(t => t.Find(token.Id)).Returns(token);

    var result = _service.ValidateResetToken(token.Id);

    Assert.False(result.IsSuccess);
    Assert.Contains("expired", result.Error);
}
```

The naming convention says this test covers "valid email sends email." The test actually covers "expired token is rejected." A traceability tool that parses names will map this test to the wrong AC. The coverage report will show "valid email sending" as tested and "token expiry" as untested — the exact opposite of reality.
With [Verifies]:
```csharp
[Verifies(typeof(PasswordResetFeature),
          nameof(PasswordResetFeature.ResetLinkExpiresAfter24Hours))]
public void RequestReset_ValidEmail_SendsEmail() // Name is wrong, but irrelevant
{
    // ... same test body
}
```

The name is wrong, but the attribute is right. The traceability system maps this test to ResetLinkExpiresAfter24Hours — the correct AC. The coverage report is accurate. The misleading name is a cosmetic issue, not a structural one.
The typed approach separates the concern of "what does this test verify?" from "what is this test called?" Naming is for humans (readability). Linking is for the system (traceability). Conflating the two — using the name as the link — guarantees that one will be wrong when the other changes.
Summary
| Dimension | Spec-Driven Testing | Typed Specification Testing |
|---|---|---|
| Scope | 15+ strategies, comprehensive | Requirement-to-test chain only |
| Guidance | Excellent (principles, practices, patterns) | Minimal (compiler diagnostics) |
| Enforcement | Coverage thresholds (line, branch, function) | Per-AC requirement coverage (REQ3xx) |
| Stale detection | None | REQ302 (stale [Verifies] reference) |
| Missing detection | Low coverage flag (line-level) | Specific AC diagnostic (REQ301) |
| Granularity | File/class/method level | Acceptance criterion level |
| Learning | Read the document | Use the system |
| Language support | Rust, Python, JS, Java, C#, C++, Go | C# (with .NET ecosystem) |
| Testing DSLs | Described (text) | Enforced (compiler-checked attributes) |
| Naming | Convention-based (social contract) | Attribute-based (structural contract) |
Part VI examines the broader validation question: quality gates vs Roslyn analyzers.