Part V: Tests First

"Legacy code is code without tests." -- Michael Feathers, Working Effectively with Legacy Code

SubscriptionHub has 12 tests. Six of them are decorated with [Ignore]. The last time all six remaining tests were green was four months ago. The test that verifies proration creates a real HTTP connection to the Stripe sandbox, which times out on the CI runner. The test that verifies invoice generation seeds 14 tables through raw SQL because there is no other way to get the system into a testable state. Two tests assert the same behavior with different setups and nobody knows which one is canonical.

This is the safety net you are about to jump into.

If you start refactoring now -- moving files, extracting classes, renaming namespaces -- how do you know you did not break anything? You do not. You cannot. The existing tests cover approximately 0.01% of the behavior. The remaining 99.99% is verified by customers filing support tickets.

This is non-negotiable: write the tests before you move a single line of code.

Not after. Not "we'll add tests in Phase 6." Not "the domain tests will cover it." Before. Now. Phase 1.

Tests Are Not a Phase

Most migration plans put testing at the end. "Phase 7: Write tests." This is backwards. It is like saying "Phase 7: Install the scaffolding" after the building is already up and the workers have fallen off the third floor.

Tests are not a phase. Tests are the foundation. They wrap around every phase; they are not appended at the end. Every phase in this migration produces two things: code and tests. The tests come first.

Diagram: tests wrap every phase rather than trailing at the end. Characterization tests come first, then structural, contract, and TDD unit-and-integration tests, each pair delivering working code behind a green bar.

The green comes first. Always.

Phase 1 is special: it produces only tests and zero code changes. That is the entire point. You are capturing the system's current behavior -- warts, bugs, undocumented edge cases, and all -- so that when you start cutting in Phase 2, every change is verified against the golden master.

The kind of test changes with each phase:

Phase 1: characterization tests -- what does the system do today?
Phase 2: structural tests -- does it still compile? Do characterization tests still pass after files move?
Phase 3: contract tests -- does the ACL adapter translate correctly?
Phase 4: TDD -- write Money.Add() test first, then implement Money
Phase 5: TDD -- write "ChangePlan on cancelled fails" test first, then enforce in aggregate
Phase 6: TDD -- write "PlanChangedEvent triggers prorated invoice" test first, then wire the handler

Every phase's test suite builds on the previous one. By the end, the characterization tests from Phase 1 have been gradually replaced by proper domain tests -- but they were never absent. There was never a moment where the code changed without a test verifying the change.

Characterization Tests

A characterization test captures what the system actually does, not what it should do. Michael Feathers introduced the term in Working Effectively with Legacy Code. The process is straightforward:

1. Set up the system in a known state
2. Call the code under test
3. Record what happens
4. Assert against the recording

If the system has a bug where cancelled subscriptions silently accept plan changes, your characterization test captures that bug. You do not fix it. Not yet. You document it as a test. Later, when you build the Subscription aggregate in Phase 5 and add the invariant if (Status == Cancelled) return Result.Failure(...), you will explicitly change the characterization test to the new expected behavior. That is a conscious, tracked decision -- not an accidental side effect of refactoring.

The Golden Master Pattern

The golden master is the output snapshot from a characterization test. Run the legacy code, record every observable side effect -- database writes, return values, notifications queued, events published -- and save them. During migration, run the same test against the new code. If the output matches, you have not broken anything. If it does not match, you either introduced a regression or intentionally changed behavior. Either way, you know.
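One way to store a golden master is as a snapshot file. The sketch below is a minimal version of that idea, not the approach the tests in this part take (they assert against hard-coded values). It serializes whatever the test observed to JSON, records it on the first run, and compares on every run after. GoldenMaster and Matches are hypothetical names, and the helper assumes the recorded object contains no volatile values such as generated ids or timestamps.

```csharp
#nullable enable
using System;
using System.IO;
using System.Text.Json;

// Hypothetical helper: file-based golden master for characterization tests.
public static class GoldenMaster
{
    private static readonly JsonSerializerOptions Options = new() { WriteIndented = true };

    // Returns true when `actual` serializes to the recorded snapshot.
    // On the first run there is no snapshot yet, so the current output
    // is recorded and accepted as the master.
    public static bool Matches(string snapshotPath, object actual)
    {
        var json = JsonSerializer.Serialize(actual, Options);

        if (!File.Exists(snapshotPath))
        {
            var directory = Path.GetDirectoryName(Path.GetFullPath(snapshotPath));
            Directory.CreateDirectory(directory!);
            File.WriteAllText(snapshotPath, json);
            return true;
        }

        // Any difference is either a regression or an intentional change.
        return File.ReadAllText(snapshotPath) == json;
    }
}
```

A test would gather return values and side effects into one object and assert that GoldenMaster.Matches(path, observed) is true. If the snapshot file is committed, a change to it shows up in review, which keeps behavior changes a tracked decision.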

Here is a characterization test for BillingService.ProcessMonthlyBilling() -- the 200-line God Method from Part II:

[Collection("Database")]
public class BillingCharacterizationTests : IClassFixture<DatabaseFixture>
{
    private readonly DatabaseFixture _db;

    public BillingCharacterizationTests(DatabaseFixture db)
    {
        _db = db;
    }

    [Fact]
    public async Task ProcessMonthlyBilling_ActiveSubscription_GoldenMaster()
    {
        // Arrange: seed a known state
        await using var context = _db.CreateContext();

        var plan = new Plan
        {
            Name = "Professional",
            MonthlyPrice = 49.99m,
            AnnualPrice = 499.99m,
            MaxUsers = 10
        };
        context.Plans.Add(plan);
        await context.SaveChangesAsync();

        var customer = new Customer
        {
            FullName = "Alice Martin",
            Email = "alice@example.com",
            CompanyName = "Acme Corp",
            BillingAddress = new BillingAddress
            {
                Country = "FR",
                Currency = "EUR"
            }
        };
        context.Customers.Add(customer);
        await context.SaveChangesAsync();

        var subscription = new Subscription
        {
            CustomerId = customer.Id,
            PlanId = plan.Id,
            Status = "Active",
            BillingInterval = "Monthly",
            StartDate = new DateTime(2025, 1, 1),
            CurrentPeriodStart = new DateTime(2025, 6, 1),
            CurrentPeriodEnd = new DateTime(2025, 7, 1)
        };
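        context.Subscriptions.Add(subscription);
        // Save now so the database assigns subscription.Id before the usage records reference it
        await context.SaveChangesAsync();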

        var usageRecords = new[]
        {
            new UsageRecord
            {
                SubscriptionId = subscription.Id,
                MetricName = "api_calls",
                Quantity = 15000,
                RecordedAt = new DateTime(2025, 6, 15)
            }
        };
        context.UsageRecords.AddRange(usageRecords);
        await context.SaveChangesAsync();

        // Act: call the legacy method
        var billingService = CreateLegacyBillingService(context);
        var result = await billingService.ProcessMonthlyBilling(
            customer.Id, new DateTime(2025, 7, 1));

        // Assert: golden master -- capture what the system actually does
        result.InvoiceId.Should().BeGreaterThan(0);
        result.TotalAmount.Should().Be(49.99m);
        result.Currency.Should().Be("EUR");
        result.LineItemCount.Should().Be(1);
        result.TaxAmount.Should().Be(10.00m); // 20% FR VAT on 49.99 = 9.998, rounded to 10.00
        result.Status.Should().Be("Draft");

        // Verify side effects in the database
        var invoice = await context.Invoices
            .Include(i => i.LineItems)
            .FirstAsync(i => i.Id == result.InvoiceId);
        invoice.LineItems.Should().HaveCount(1);
        invoice.LineItems[0].Description.Should().Be("Plan: Professional");
        invoice.LineItems[0].Amount.Should().Be(49.99m);

        // Verify notification was queued
        var notifications = await context.NotificationQueue
            .Where(n => n.CustomerId == customer.Id)
            .ToListAsync();
        notifications.Should().HaveCount(1);
        notifications[0].Type.Should().Be("InvoiceGenerated");
    }
}

Eighty lines. Ugly. It creates raw EF Core entities with magic strings. It uses a CreateLegacyBillingService helper that wires up the God Service with all its dependencies. It asserts against hard-coded expected values. It is not a well-designed test. It is a photograph of the system, and photographs do not need to be pretty -- they need to be accurate.

Here are two more characterization tests. The first records the normal path; the second captures a bug:

[Fact]
public async Task ChangePlan_ActiveSubscription_GoldenMaster()
{
    await using var context = _db.CreateContext();
    var (customer, subscription) = await SeedActiveSubscription(context);

    var subscriptionService = CreateLegacySubscriptionService(context);
    var result = await subscriptionService.ChangePlan(
        subscription.Id, newPlanId: 2);

    result.Success.Should().BeTrue();
    result.NewPlanName.Should().Be("Enterprise");
    result.ProratedAmount.Should().Be(25.00m);

    var updated = await context.Subscriptions.FindAsync(subscription.Id);
    updated!.PlanId.Should().Be(2);
    updated.PlanChangedDate.Should().NotBeNull();
    updated.Status.Should().Be("Active");
}

[Fact]
public async Task ChangePlan_CancelledSubscription_GoldenMaster_BUG()
{
    // BUG: cancelled subscriptions silently accept plan changes.
    // The legacy system does not check subscription status before
    // applying the change. This test captures the buggy behavior
    // so we can consciously fix it in Phase 5 when we build the
    // Subscription aggregate with proper invariants.

    await using var context = _db.CreateContext();
    var (customer, subscription) = await SeedCancelledSubscription(context);

    var subscriptionService = CreateLegacySubscriptionService(context);
    var result = await subscriptionService.ChangePlan(
        subscription.Id, newPlanId: 2);

    // The bug: this succeeds when it should fail
    result.Success.Should().BeTrue();
    result.NewPlanName.Should().Be("Enterprise");

    var updated = await context.Subscriptions.FindAsync(subscription.Id);
    updated!.PlanId.Should().Be(2);
    updated.Status.Should().Be("Cancelled"); // Still cancelled, but plan changed
}

Notice the test name: _BUG. This is intentional. When you find a characterization test that captures buggy behavior, name it accordingly. In Phase 5, when the Subscription aggregate rejects plan changes on cancelled subscriptions, you will update this test:

// Phase 5 update -- replace the BUG test with the correct behavior
[Fact]
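public void ChangePlan_CancelledSubscription_RejectsChange()
{
    var subscription = new SubscriptionBuilder()
        .WithStatus(SubscriptionStatus.Cancelled)
        .Build();

    // Pure domain call: nothing to await, so the test is synchronous.
    // The clock argument matches the ChangePlan signature used by the Phase 5 aggregate tests.
    var result = subscription.ChangePlan(
        PlanId.Enterprise,
        clock: new FakeClock(new DateTime(2025, 7, 15)));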

    result.IsFailure.Should().BeTrue();
    result.Error.Should().Be(SubscriptionErrors.CannotChangePlanWhenCancelled);
}

That is the lifecycle: characterization test captures the bug in Phase 1, domain test replaces it in Phase 5. The bug goes from "undocumented" to "documented" to "fixed and tested."

Setting Up Test Infrastructure for Legacy Code

The legacy system was not designed for testing. There are no interfaces to mock. BillingService takes 8 constructor parameters, including HttpClient, SmtpClient, and IConfiguration. You have two options:

1. Mock everything: 47 lines of Mock<T>() setup, brittle, breaks when implementation changes
2. Use real infrastructure, mock only externals: Testcontainers for the database, real DI container, fake adapters for Stripe and SMTP

Option 2. Always option 2.

The characterization tests need to exercise the real code path -- the same SQL, the same EF Core queries, the same transaction behavior. If you mock the database, your characterization test is a photograph of a painting of the system. It is a copy of what you think the system does, not what the system actually does. The whole point is to record reality.

Mock only what you cannot run locally: Stripe's API, the tax calculation service, the SMTP server. Everything else -- the database, the DI container, the middleware pipeline -- runs for real.
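Since BillingService takes an HttpClient, one way to keep the Stripe call inside the process is a stub HttpMessageHandler. This is a sketch under assumptions: it presumes the legacy code reaches Stripe through the injected client, StubHttpMessageHandler is a hypothetical name, and the canned body is illustrative rather than Stripe's real response format.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical test double: answers every request with one canned response
// and records the requested URIs, so no call leaves the process.
public sealed class StubHttpMessageHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;
    private readonly string _jsonBody;

    public StubHttpMessageHandler(HttpStatusCode status, string jsonBody)
    {
        _status = status;
        _jsonBody = jsonBody;
    }

    public List<Uri?> RequestedUris { get; } = new();

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        RequestedUris.Add(request.RequestUri);

        var response = new HttpResponseMessage(_status)
        {
            Content = new StringContent(_jsonBody, Encoding.UTF8, "application/json")
        };
        return Task.FromResult(response);
    }
}
```

A helper such as CreateLegacyBillingService could then pass new HttpClient(new StubHttpMessageHandler(HttpStatusCode.OK, body)) to the service. Everything else, including the database, still runs for real.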

These tests are temporary scaffolding. They are ugly. They test implementation details. They have brittle assertions against exact amounts and string values. They will be replaced. But for the rest of the migration, they are the only thing standing between you and a production outage caused by a refactoring gone wrong.

Essential.

Tests at Every Phase

Each phase of the migration produces both code and tests. The tests come first -- write the test, then make it pass. Here is what each phase adds.

Phase 2: Structural Tests

Phase 2 moves files into bounded context libraries. The code does not change -- only the project structure. The tests verify:

Does the solution still compile?
Do all characterization tests still pass?
Are there no unintended dependency changes?

You do not write new behavioral tests in Phase 2. The characterization tests from Phase 1 are the safety net. If they pass after you move InvoiceService.cs from SubscriptionHub.Services to Billing.Application, the move was safe.
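The third check, no unintended dependency changes, can be automated with plain reflection. The sketch below assumes one assembly per bounded context. AssemblyDependencies is a hypothetical helper, and the check works at assembly granularity only, not per namespace or per type.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper for structural tests: inspects the assembly references
// recorded in the compiled metadata.
public static class AssemblyDependencies
{
    // True when `assembly` references an assembly with the given simple name.
    public static bool References(Assembly assembly, string assemblyName) =>
        assembly.GetReferencedAssemblies()
            .Any(reference => string.Equals(
                reference.Name, assemblyName, StringComparison.OrdinalIgnoreCase));
}
```

A structural test could then assert that AssemblyDependencies.References(typeof(InvoiceService).Assembly, "Notifications.Application") is false, where Notifications.Application stands in for whichever context Billing must not depend on. The compiler generally records references only for assemblies the code actually uses, so a reference in the project file that nothing touches will not trip the check.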

Phase 3: Contract Tests

Phase 3 introduces Anti-Corruption Layers -- adapters that translate between external systems and your domain. Each adapter gets a contract test:

[Fact]
public async Task StripeAdapter_CreateCharge_TranslatesToPaymentResult()
{
    var fakeStripe = new FakeStripeAdapter();
    IPaymentGateway gateway = fakeStripe;

    var result = await gateway.CreateCharge(
        PaymentRequest.Create(
            customerId: CustomerId.From("cus_test_123"),
            amount: Money.EUR(49.99m),
            description: "Invoice #INV-2025-001"
        ));

    result.IsSuccess.Should().BeTrue();
    result.Value.TransactionId.Should().NotBeEmpty();
    result.Value.Amount.Should().Be(Money.EUR(49.99m));
    result.Value.Status.Should().Be(PaymentStatus.Succeeded);
}

[Fact]
public async Task StripeAdapter_PaymentFailure_TranslatesToResult()
{
    var fakeStripe = new FakeStripeAdapter();
    fakeStripe.SimulateDecline(DeclineReason.InsufficientFunds);
    IPaymentGateway gateway = fakeStripe;

    var result = await gateway.CreateCharge(
        PaymentRequest.Create(
            customerId: CustomerId.From("cus_test_123"),
            amount: Money.EUR(49.99m),
            description: "Invoice #INV-2025-001"
        ));

    result.IsFailure.Should().BeTrue();
    result.Error.Should().Be(PaymentErrors.InsufficientFunds);
}

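Notice: the contract test is written against IPaymentGateway, the same interface that the real Stripe adapter implements. As shown, it exercises only the fake; the contract holds once the same assertions also pass against the real adapter. If the fake and the real adapter both satisfy this contract, the domain code works with either. That is the evidence that the port design is sound.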

Phase 4: TDD Value Object Tests

Phase 4 is pure TDD. Write the test, watch it fail, implement the Value Object, watch it pass. Value Objects are pure functions -- no database, no DI, no infrastructure. These are the fastest tests in the suite.

[Fact]
public void Money_Add_SameCurrency_ReturnsSum()
{
    var a = Money.EUR(30.00m);
    var b = Money.EUR(19.99m);

    var result = a.Add(b);

    result.IsSuccess.Should().BeTrue();
    result.Value.Should().Be(Money.EUR(49.99m));
}

[Fact]
public void Money_Add_DifferentCurrency_ReturnsFailure()
{
    var eur = Money.EUR(30.00m);
    var usd = Money.USD(30.00m);

    var result = eur.Add(usd);

    result.IsFailure.Should().BeTrue();
    result.Error.Should().Be(MoneyErrors.CurrencyMismatch);
}

[Fact]
public void Money_Negative_ReturnsFailure()
{
    var result = Money.Create(-10.00m, Currency.EUR);

    result.IsFailure.Should().BeTrue();
    result.Error.Should().Be(MoneyErrors.NegativeAmount);
}

Three tests. No mocks. No database. No DI. Pure domain logic. These run in microseconds. The three proration implementations from Part II -- the one in BillingService, the one in SubscriptionController, and the one in sp_CalculateProration -- collapse into a single SubscriptionPeriod.CalculateProration() method with 15 tests covering every edge case.
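For orientation, here is a minimal Money that would turn the three tests above green. It is a sketch under assumptions, not the implementation from this series: the Error record, the shape of Result<T>, and the Currency type are guesses reconstructed from how the tests use them.

```csharp
#nullable enable
using System;

public sealed record Error(string Code);

public static class MoneyErrors
{
    public static readonly Error NegativeAmount = new("Money.NegativeAmount");
    public static readonly Error CurrencyMismatch = new("Money.CurrencyMismatch");
}

// Assumed shape of the Result type: either a value or an error, never both.
public sealed class Result<T>
{
    private readonly T? _value;
    private readonly Error? _error;

    private Result(T? value, Error? error)
    {
        _value = value;
        _error = error;
    }

    public bool IsSuccess => _error is null;
    public bool IsFailure => !IsSuccess;
    public T Value => IsSuccess
        ? _value!
        : throw new InvalidOperationException("A failed result has no value.");
    public Error Error => _error
        ?? throw new InvalidOperationException("A successful result has no error.");

    public static Result<T> Ok(T value) => new(value, null);
    public static Result<T> Fail(Error error) => new(default, error);
}

public static class Result
{
    public static Result<T> Success<T>(T value) => Result<T>.Ok(value);
    public static Result<T> Failure<T>(Error error) => Result<T>.Fail(error);
}

public sealed record Currency(string Code)
{
    public static readonly Currency EUR = new("EUR");
    public static readonly Currency USD = new("USD");
}

// A record gives value equality, which Should().Be(Money.EUR(49.99m)) relies on.
public sealed record Money
{
    public decimal Amount { get; }
    public Currency Currency { get; }

    private Money(decimal amount, Currency currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public static Result<Money> Create(decimal amount, Currency currency) =>
        amount < 0
            ? Result.Failure<Money>(MoneyErrors.NegativeAmount)
            : Result.Success(new Money(amount, currency));

    // Convenience factories for known-valid literals; they throw on negative input.
    public static Money EUR(decimal amount) => Create(amount, Currency.EUR).Value;
    public static Money USD(decimal amount) => Create(amount, Currency.USD).Value;

    public Result<Money> Add(Money other) =>
        Currency != other.Currency
            ? Result.Failure<Money>(MoneyErrors.CurrencyMismatch)
            : Create(Amount + other.Amount, Currency);
}
```

The private constructor is the design choice that matters: the only ways to obtain a Money go through Create, so a negative amount cannot exist anywhere in the domain.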

Cross-reference: the Result.IsFailure pattern used throughout these tests is detailed in The Result Pattern.

Phase 5: TDD Aggregate Tests

Phase 5 builds domain aggregates with invariants. The test specifies the invariant; the aggregate enforces it.

[Fact]
public void Subscription_ChangePlan_WhenCancelled_Fails()
{
    var subscription = new SubscriptionBuilder()
        .WithStatus(SubscriptionStatus.Cancelled)
        .Build();

    var result = subscription.ChangePlan(
        PlanId.Enterprise,
        clock: new FakeClock(new DateTime(2025, 7, 15)));

    result.IsFailure.Should().BeTrue();
    result.Error.Should().Be(SubscriptionErrors.CannotChangePlanWhenCancelled);
}

[Fact]
public void Subscription_ChangePlan_WhenActive_SucceedsAndRaisesDomainEvent()
{
    var subscription = new SubscriptionBuilder()
        .WithStatus(SubscriptionStatus.Active)
        .WithPlan(PlanId.Professional)
        .Build();

    var result = subscription.ChangePlan(
        PlanId.Enterprise,
        clock: new FakeClock(new DateTime(2025, 7, 15)));

    result.IsSuccess.Should().BeTrue();
    subscription.PlanId.Should().Be(PlanId.Enterprise);
    subscription.DomainEvents.OfType<PlanChangedEvent>().Should().ContainSingle(e =>
        e.OldPlanId == PlanId.Professional &&
        e.NewPlanId == PlanId.Enterprise);
}

This is where the characterization test ChangePlan_CancelledSubscription_GoldenMaster_BUG gets replaced. The bug is now an invariant. The test documents the correct behavior. The old characterization test gets deleted with a commit message: "Replace characterization test with domain invariant -- cancelled subscriptions now reject plan changes."
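An aggregate that would satisfy the two tests above might look like the sketch below. The names come from the tests, but the shapes are assumptions: the IClock member, the non-generic Result, and a PlanChangedEvent trimmed of its SubscriptionId are all simplified for illustration, and the builder is replaced by a plain constructor.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum SubscriptionStatus { Active, Cancelled }

public sealed record PlanId(string Value)
{
    public static readonly PlanId Professional = new("professional");
    public static readonly PlanId Enterprise = new("enterprise");
}

public sealed record Error(string Code);

public static class SubscriptionErrors
{
    public static readonly Error CannotChangePlanWhenCancelled =
        new("Subscription.CannotChangePlanWhenCancelled");
}

// Assumed minimal Result: success carries nothing, failure carries an Error.
public sealed class Result
{
    private readonly Error? _error;
    private Result(Error? error) => _error = error;

    public bool IsSuccess => _error is null;
    public bool IsFailure => !IsSuccess;
    public Error Error => _error
        ?? throw new InvalidOperationException("A successful result has no error.");

    public static Result Success() => new(null);
    public static Result Failure(Error error) => new(error);
}

public interface IClock { DateTime UtcNow { get; } }

public sealed class FakeClock : IClock
{
    public FakeClock(DateTime now) => UtcNow = now;
    public DateTime UtcNow { get; }
}

public interface IDomainEvent { }

public sealed record PlanChangedEvent(
    PlanId OldPlanId, PlanId NewPlanId, DateTime ChangedAt) : IDomainEvent;

public sealed class Subscription
{
    private readonly List<IDomainEvent> _domainEvents = new();

    public Subscription(PlanId planId, SubscriptionStatus status)
    {
        PlanId = planId;
        Status = status;
    }

    public PlanId PlanId { get; private set; }
    public SubscriptionStatus Status { get; }
    public IReadOnlyList<IDomainEvent> DomainEvents => _domainEvents;

    public Result ChangePlan(PlanId newPlanId, IClock clock)
    {
        // Invariant: a cancelled subscription never changes plan.
        if (Status == SubscriptionStatus.Cancelled)
            return Result.Failure(SubscriptionErrors.CannotChangePlanWhenCancelled);

        var oldPlanId = PlanId;
        PlanId = newPlanId;
        _domainEvents.Add(new PlanChangedEvent(oldPlanId, newPlanId, clock.UtcNow));
        return Result.Success();
    }
}
```

The guard runs before any state changes, so a rejected call leaves the aggregate untouched and raises no event. That is exactly the behavior the legacy service lacked.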

Phase 6: Domain Event Tests

Phase 6 wires domain events to handlers. Each handler gets a focused test:

[Fact]
public async Task PlanChangedHandler_GeneratesProratedInvoice()
{
    var fakeRepository = new FakeSubscriptionRepository();
    var fakeInvoiceService = new FakeInvoiceService();
    var handler = new GenerateProratedInvoiceHandler(
        fakeRepository, fakeInvoiceService);

    var @event = new PlanChangedEvent(
        SubscriptionId: SubscriptionId.From(42),
        OldPlanId: PlanId.Professional,
        NewPlanId: PlanId.Enterprise,
        ChangedAt: new DateTime(2025, 7, 15));

    await handler.HandleAsync(@event);

    fakeInvoiceService.GeneratedInvoices.Should().ContainSingle(inv =>
        inv.SubscriptionId == SubscriptionId.From(42) &&
        inv.Type == InvoiceType.ProratedCredit);
}

No database. No HTTP. The handler receives an event, calls the repository, generates an invoice. The fakes record everything. The test reads like a specification.

Test Matrix

Here is the complete picture -- what tests exist at each phase, and what they verify:

Characterization tests peak at Phase 1 and decline as domain tests replace them. Unit tests appear in Phase 4 and grow rapidly. Architecture tests appear in Phase 2 and stay forever. By Phase 6, the characterization tests are gone -- replaced by a proper test suite that verifies intent, not just behavior.

Testcontainers -- Real Databases, Not SQLite

Do not use SQLite in-memory for integration tests. I know it is tempting. I know it starts instantly. I know it does not require Docker. But it lies.

SQLite behaves differently from PostgreSQL in ways that will burn you:

JSON columns: PostgreSQL has native jsonb. SQLite stores JSON as text. EF Core generates different SQL for JsonProperty queries. A test passing on SQLite can fail on PostgreSQL because the generated WHERE clause is different.
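Collation and case sensitivity: both databases compare strings case-sensitively with = by default, but SQLite's LIKE is case-insensitive for ASCII while PostgreSQL's LIKE is case-sensitive (ILIKE is the case-insensitive form). A LIKE-based lookup can match "alice@example.com" against "Alice@Example.com" in SQLite, so your test passes. In production, the same query returns zero rows.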
Concurrency tokens: PostgreSQL's xmin system column does not exist in SQLite. If you use xmin for optimistic concurrency (and you should), your tests cannot verify concurrency conflicts.
Transactions: SQLite serializes all writes. PostgreSQL allows concurrent transactions with different isolation levels. A test that verifies read-committed behavior against SQLite is verifying nothing.
Date/time functions: EXTRACT(MONTH FROM ...) in PostgreSQL is strftime('%m', ...) in SQLite. EF Core translates some LINQ expressions differently depending on the provider.

You need the real database. Testcontainers gives you the real database with two seconds of startup cost.

public sealed class DatabaseFixture : IAsyncLifetime
{
    private readonly PostgreSqlContainer _container = new PostgreSqlBuilder()
        .WithImage("postgres:16")
        .WithDatabase("subscriptionhub_test")
        .WithUsername("test")
        .WithPassword("test")
        .WithCleanUp(true)
        .Build();

    public string ConnectionString => _container.GetConnectionString();

    public async Task InitializeAsync()
    {
        await _container.StartAsync();

        // Apply EF Core migrations to the test database
        await using var context = CreateContext();
        await context.Database.MigrateAsync();
    }

    public async Task DisposeAsync()
    {
        await _container.DisposeAsync();
    }

    public AppDbContext CreateContext()
    {
        var options = new DbContextOptionsBuilder<AppDbContext>()
            .UseNpgsql(ConnectionString)
            .Options;

        return new AppDbContext(options);
    }
}

That is it. A PostgreSQL 16 container starts in ~2 seconds on a modern machine once the image has been pulled, runs the same migrations your production database uses, and provides a connection string that EF Core uses transparently. No SQLite. No in-memory provider. The real thing.

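Use xUnit's IClassFixture<DatabaseFixture> to share the container across all tests in a class. Note that [Collection("Database")] on its own only groups classes so that they do not run in parallel with each other; sharing one fixture instance across classes additionally requires a [CollectionDefinition("Database")] class that implements ICollectionFixture<DatabaseFixture>. The class below uses the per-class form: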

[Collection("Database")]
public class InvoiceCharacterizationTests : IClassFixture<DatabaseFixture>
{
    private readonly DatabaseFixture _db;

    public InvoiceCharacterizationTests(DatabaseFixture db)
    {
        _db = db;
    }

    [Fact]
    public async Task GenerateInvoice_WithUsageMetering_CalculatesOverageCharges()
    {
        await using var context = _db.CreateContext();

        // Seed data and run legacy code...
        var (customer, subscription) = await SeedSubscriptionWithUsage(
            context, apiCalls: 15_000, includedCalls: 10_000);

        var billingService = CreateLegacyBillingService(context);
        var result = await billingService.ProcessMonthlyBilling(
            customer.Id, new DateTime(2025, 7, 1));

        // Golden master: 5,000 overage calls * $0.001 = $5.00 extra
        result.LineItemCount.Should().Be(2);
        result.TotalAmount.Should().Be(54.99m); // 49.99 base + 5.00 overage
    }
}

The container lifecycle is automatic. xUnit creates it before the first test, reuses it across the class, and destroys it after the last test. Docker handles cleanup. You never see it.
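If several test classes should share one container rather than start one each, xUnit needs a collection definition. The fragment below shows that wiring for the DatabaseFixture defined earlier. DatabaseCollection and ProrationCharacterizationTests are hypothetical names, and the definition class has to live in the same assembly as the tests.

```csharp
using Xunit;

// Declares the "Database" collection and ties a single DatabaseFixture to it.
// The class has no members; xUnit only reads its attribute and interface.
[CollectionDefinition("Database")]
public sealed class DatabaseCollection : ICollectionFixture<DatabaseFixture>
{
}

// Classes in the collection receive the shared fixture through the constructor
// and do not implement IClassFixture<DatabaseFixture> themselves.
[Collection("Database")]
public class ProrationCharacterizationTests // hypothetical class name
{
    private readonly DatabaseFixture _db;

    public ProrationCharacterizationTests(DatabaseFixture db) => _db = db;
}
```

The trade-off is shared state: every class in the collection sees the same database, so each test has to seed data it can tell apart from what the other tests wrote.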

Performance: a typical characterization test run with 30 tests and a shared PostgreSQL container completes in ~8 seconds. The container starts in ~2 seconds; the remaining 6 seconds are actual test execution against a real database. Compare this to the legacy test suite's 45-second runtime with 12 tests (6 ignored), half of which time out waiting for external APIs.

Real SQL. Real transactions. Real connection pooling. Real PostgreSQL behavior. When your test passes, it means something.

WebApplicationFactory -- Real HTTP Pipeline

Testcontainers gives you a real database. WebApplicationFactory<Program> gives you a real HTTP pipeline. Together, they give you integration tests that exercise the entire stack: routing, middleware, authentication, model binding, serialization, DI resolution, database access, and response formatting.

This is how you test the full request/response cycle without deploying anything.

public class ApiFixture : WebApplicationFactory<Program>, IAsyncLifetime
{
    private readonly PostgreSqlContainer _container = new PostgreSqlBuilder()
        .WithImage("postgres:16")
        .WithDatabase("subscriptionhub_test")
        .WithUsername("test")
        .WithPassword("test")
        .WithCleanUp(true)
        .Build();

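    public async Task InitializeAsync()
    {
        await _container.StartAsync();

        // Apply EF Core migrations so the schema exists before the first request.
        // Touching Services builds the host, which runs ConfigureWebHost below.
        using var scope = Services.CreateScope();
        var context = scope.ServiceProvider.GetRequiredService<AppDbContext>();
        await context.Database.MigrateAsync();
    }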

    public new async Task DisposeAsync()
    {
        await _container.DisposeAsync();
        await base.DisposeAsync();
    }

    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        builder.ConfigureServices(services =>
        {
            // Remove the real DbContext registration
            var descriptor = services.SingleOrDefault(
                d => d.ServiceType == typeof(DbContextOptions<AppDbContext>));
            if (descriptor != null) services.Remove(descriptor);

            // Register DbContext with Testcontainers connection string
            services.AddDbContext<AppDbContext>(options =>
                options.UseNpgsql(_container.GetConnectionString()));

            // Replace real external adapters with fakes
            services.RemoveAll<IPaymentGateway>();
            services.AddSingleton<IPaymentGateway, FakeStripeAdapter>();

            services.RemoveAll<ITaxCalculator>();
            services.AddSingleton<ITaxCalculator, FakeTaxCalculator>();

            services.RemoveAll<IEmailSender>();
            services.AddSingleton<IEmailSender, FakeEmailSender>();
        });
    }
}

The key principle: mock what you do not control; run what you do. You do not control Stripe's API, so you swap in FakeStripeAdapter. You do not control the tax calculation service, so you swap in FakeTaxCalculator. But you do control the database, the middleware, the routing, the serialization. Those run for real.

Here is a full end-to-end test:

[Collection("Api")]
public class SubscriptionApiTests : IClassFixture<ApiFixture>
{
    private readonly HttpClient _client;
    private readonly ApiFixture _fixture;

    public SubscriptionApiTests(ApiFixture fixture)
    {
        _fixture = fixture;
        _client = fixture.CreateClient();
    }

    [Fact]
    public async Task ChangePlan_ActiveSubscription_Returns200AndUpdatesState()
    {
        // Arrange: seed via API (tests the POST endpoint too)
        var createResponse = await _client.PostAsJsonAsync(
            "/api/subscriptions",
            new
            {
                CustomerEmail = "alice@example.com",
                PlanId = "professional",
                BillingInterval = "monthly"
            });
        createResponse.EnsureSuccessStatusCode();
        var created = await createResponse.Content
            .ReadFromJsonAsync<SubscriptionDto>();

        // Act: change plan
        var changeResponse = await _client.PutAsJsonAsync(
            $"/api/subscriptions/{created!.Id}/change-plan",
            new { NewPlanId = "enterprise" });

        // Assert: HTTP response
        changeResponse.StatusCode.Should().Be(HttpStatusCode.OK);
        var result = await changeResponse.Content
            .ReadFromJsonAsync<PlanChangeResultDto>();
        result!.OldPlan.Should().Be("professional");
        result.NewPlan.Should().Be("enterprise");
        result.ProratedAmount.Should().BeGreaterThan(0);

        // Assert: database state
        using var scope = _fixture.Services.CreateScope();
        var context = scope.ServiceProvider.GetRequiredService<AppDbContext>();
        var subscription = await context.Subscriptions
            .FirstAsync(s => s.Id == created.Id);
        subscription.PlanId.Should().Be(
            context.Plans.First(p => p.Name == "Enterprise").Id);
        subscription.PlanChangedDate.Should().NotBeNull();

        // Assert: notification was queued
        var emailSender = _fixture.Services
            .GetRequiredService<IEmailSender>() as FakeEmailSender;
        emailSender!.SentEmails.Should().Contain(e =>
            e.To == "alice@example.com" &&
            e.Subject.Contains("plan change"));
    }
}

This test exercises the entire stack. It sends real HTTP requests through the ASP.NET pipeline. It hits a real PostgreSQL database. It verifies the response body, the database state, and the side effects. The only things that are fake are the external services you do not control.

During the migration, these tests are invaluable for ACL verification. When you introduce the IPaymentGateway interface in Phase 3, you can verify that the old endpoint still returns the same response shape:

[Fact]
public async Task LegacyEndpoint_StillReturnsOriginalShape_DuringMigration()
{
    // Arrange
    var (client, subscriptionId) = await CreateActiveSubscription();

    // Act: call the legacy endpoint that now delegates to the new aggregate
    var response = await client.PostAsJsonAsync(
        $"/api/billing/process/{subscriptionId}",
        new { BillingDate = "2025-07-01" });

    // Assert: response shape unchanged -- golden master
    response.StatusCode.Should().Be(HttpStatusCode.OK);
    var result = await response.Content.ReadFromJsonAsync<JsonElement>();
    result.GetProperty("invoiceId").GetInt32().Should().BeGreaterThan(0);
    result.GetProperty("totalAmount").GetDecimal().Should().BeGreaterThan(0);
    result.GetProperty("currency").GetString().Should().NotBeNullOrEmpty();
    result.GetProperty("lineItemCount").GetInt32().Should().BeGreaterThan(0);
    result.GetProperty("status").GetString().Should().Be("Draft");
}

This test does not care about the internals. It cares about the HTTP contract. As long as the response shape is unchanged, the migration is invisible to the API consumers. That is the Strangler Fig in action, verified by a test.
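The assertions above check the shape one property at a time. A more general form, sketched here under assumptions, reduces a JSON payload to its shape (property paths and value kinds, no values) so that a response recorded before the migration can be compared with one recorded after. JsonShape and Describe are hypothetical names, and arrays are assumed to be homogeneous, so only the first element is inspected.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: describes the structure of a JSON document, ignoring values.
public static class JsonShape
{
    // Returns sorted "path:kind" lines, e.g. "$.invoiceId:Number".
    public static IReadOnlyList<string> Describe(string json)
    {
        using var document = JsonDocument.Parse(json);
        var lines = new List<string>();
        Walk(document.RootElement, "$", lines);
        lines.Sort(StringComparer.Ordinal); // property order is not part of the shape
        return lines;
    }

    private static void Walk(JsonElement element, string path, List<string> lines)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                foreach (var property in element.EnumerateObject())
                    Walk(property.Value, $"{path}.{property.Name}", lines);
                break;

            case JsonValueKind.Array:
                if (element.GetArrayLength() > 0)
                    Walk(element[0], $"{path}[]", lines);
                else
                    lines.Add($"{path}[]:empty");
                break;

            case JsonValueKind.True:
            case JsonValueKind.False:
                // true and false are the same kind as far as the contract goes
                lines.Add($"{path}:Boolean");
                break;

            default:
                lines.Add($"{path}:{element.ValueKind}");
                break;
        }
    }
}
```

Two responses with different values but the same property names and kinds produce identical descriptions, so comparing the two lists fails only when a field is added, removed, renamed, or changes type.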

Fakes Over Mocks

Moq. NSubstitute. FakeItEasy. They are all fine libraries. Stop using them for domain tests.

Here is the problem with mocks:

// Mock-based test -- fragile, coupled to implementation
[Fact]
public async Task ProcessBilling_CallsStripe_WithCorrectAmount()
{
    var mockGateway = new Mock<IPaymentGateway>();
    mockGateway
        .Setup(g => g.CreateCharge(It.IsAny<PaymentRequest>()))
        .ReturnsAsync(Result.Success(new PaymentResult("ch_123",
            Money.EUR(49.99m), PaymentStatus.Succeeded)));

    var service = new BillingService(mockGateway.Object, ...);

    await service.ProcessMonthlyBilling(customerId, billingDate);

    mockGateway.Verify(g => g.CreateCharge(
        It.Is<PaymentRequest>(r => r.Amount == Money.EUR(49.99m))),
        Times.Once);
}

This test verifies that BillingService calls CreateCharge with the right amount. It does not verify that CreateCharge works. It does not verify what happens when the charge fails. It does not verify what happens when the charge succeeds but returns a different status. Every new scenario requires a new Setup() line. The mock grows until it is more complex than the real adapter.

Fakes are different. A fake implements the same interface as the real adapter, with a simplified in-memory implementation. It is reusable. It is deterministic. It documents the expected behavior of the port.

public class FakeStripeAdapter : IPaymentGateway
{
    private readonly Dictionary<string, PaymentResult> _charges = new();
    private readonly List<PaymentRequest> _receivedRequests = new();
    private DeclineReason? _forcedDecline;

    // Read-only view for assertions; the backing list stays private
    public IReadOnlyList<PaymentRequest> ReceivedRequests => _receivedRequests;

    public Task<Result<PaymentResult>> CreateCharge(PaymentRequest request)
    {
        _receivedRequests.Add(request);

        if (_forcedDecline.HasValue)
        {
            return Task.FromResult(Result.Failure<PaymentResult>(
                PaymentErrors.Declined(_forcedDecline.Value)));
        }

        var transactionId = $"ch_fake_{Guid.NewGuid():N}";
        var result = new PaymentResult(
            transactionId, request.Amount, PaymentStatus.Succeeded);
        _charges[transactionId] = result;

        return Task.FromResult(Result.Success(result));
    }

    public Task<Result<RefundResult>> RefundCharge(
        string transactionId, Money amount)
    {
        if (!_charges.ContainsKey(transactionId))
        {
            return Task.FromResult(Result.Failure<RefundResult>(
                PaymentErrors.TransactionNotFound));
        }

        var result = new RefundResult(
            $"re_fake_{Guid.NewGuid():N}", amount, RefundStatus.Succeeded);
        return Task.FromResult(Result.Success(result));
    }

    // Test control methods
    public void SimulateDecline(DeclineReason reason) => _forcedDecline = reason;
    public void Reset() { _charges.Clear(); _forcedDecline = null; }
}

The fake implements IPaymentGateway with an in-memory dictionary. Charges succeed by default and can be made to fail with SimulateDecline(). Every received request is recorded in ReceivedRequests for assertions. This is not a mock -- it is a miniature, deterministic implementation of the payment port.

Here is the tax calculator fake:

public class FakeTaxCalculator : ITaxCalculator
{
    private decimal _taxRate = 0.20m; // Default: 20% (FR VAT)

    public Task<Result<TaxCalculation>> CalculateTax(
        Money amount, Address billingAddress)
    {
        var taxAmount = Money.Create(
            Math.Round(amount.Amount * _taxRate, 2),
            amount.Currency);

        if (taxAmount.IsFailure)
            return Task.FromResult(Result.Failure<TaxCalculation>(
                taxAmount.Error));

        var calculation = new TaxCalculation(
            amount, taxAmount.Value, _taxRate, billingAddress.Country);

        return Task.FromResult(Result.Success(calculation));
    }

    // Test control
    public void SetTaxRate(decimal rate) => _taxRate = rate;
    public void SetZeroRate() => _taxRate = 0m;
}

Twenty-five lines. Configurable per test. Deterministic. Uses the same Result<T> return type as the real adapter. When a test needs a different tax rate, it calls SetTaxRate(0.10m) instead of adding another Setup() line to a mock.
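One detail of the fake deserves a second look: Math.Round(value, 2) on a decimal rounds midpoints to the nearest even digit (banker's rounding). Whether the real tax adapter rounds the same way is not stated here, so the sketch below only shows where the two modes diverge. TaxRounding and its method names are hypothetical, introduced for illustration.

```csharp
using System;

// Hypothetical helper, only to compare the two midpoint rounding modes.
public static class TaxRounding
{
    // Same expression the fake uses: Math.Round(decimal, int)
    // defaults to MidpointRounding.ToEven.
    public static decimal BankersTax(decimal amount, decimal rate) =>
        Math.Round(amount * rate, 2);

    // Alternative that rounds exact midpoints away from zero.
    public static decimal AwayFromZeroTax(decimal amount, decimal rate) =>
        Math.Round(amount * rate, 2, MidpointRounding.AwayFromZero);
}
```

For 49.99 at 20% both methods return 10.00, because 9.998 is not a midpoint. For 0.25 at 10% the product is exactly 0.025, so BankersTax returns 0.02 and AwayFromZeroTax returns 0.03. If the real adapter uses a different mode than the fake, a characterization test on a midpoint amount is where the difference would surface.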

And the event bus fake -- the most useful of the three:

public class FakeEventBus : IEventPublisher
{
    private readonly List<IDomainEvent> _publishedEvents = new();

    public IReadOnlyList<IDomainEvent> PublishedEvents => _publishedEvents;

    public Task PublishAsync(IDomainEvent @event,
        CancellationToken cancellationToken = default)
    {
        _publishedEvents.Add(@event);
        return Task.CompletedTask;
    }

    public Task PublishAsync(IEnumerable<IDomainEvent> events,
        CancellationToken cancellationToken = default)
    {
        _publishedEvents.AddRange(events);
        return Task.CompletedTask;
    }

    // Assertion helpers
    public void ShouldHavePublished<TEvent>() where TEvent : IDomainEvent
    {
        _publishedEvents.Should().ContainSingle(e => e is TEvent,
            $"Expected a {typeof(TEvent).Name} event to be published");
    }

    public TEvent Single<TEvent>() where TEvent : IDomainEvent
    {
        return _publishedEvents.OfType<TEvent>().Should().ContainSingle()
            .Subject;
    }

    public void ShouldBeEmpty()
    {
        _publishedEvents.Should().BeEmpty(
            "No events should have been published");
    }

    public void Reset() => _publishedEvents.Clear();
}

Usage in a test:

[Fact]
public void Subscription_ChangePlan_PublishesPlanChangedEvent()
{
    var subscription = new SubscriptionBuilder()
        .WithStatus(SubscriptionStatus.Active)
        .WithPlan(PlanId.Professional)
        .Build();

    subscription.ChangePlan(PlanId.Enterprise,
        clock: new FakeClock(new DateTime(2025, 7, 15)));

    var @event = subscription.DomainEvents.OfType<PlanChangedEvent>().Single();
    @event.OldPlanId.Should().Be(PlanId.Professional);
    @event.NewPlanId.Should().Be(PlanId.Enterprise);
    @event.ChangedAt.Should().Be(new DateTime(2025, 7, 15));
}

No mocking framework. No Setup(). No Verify(). The domain object raises events, and you inspect them directly. The FakeEventBus is there for integration tests where events cross aggregate boundaries.

Fakes are production code for tests. They implement the same interfaces your real adapters implement. They are proof that your ports are well-designed -- if you cannot write a simple fake for a port, the port is doing too much. They live in SubscriptionHub.Testing.Shared and are used across all test projects.

Test Builders -- Readable Setup

Look at the characterization test from earlier. The setup is 40 lines of new Plan { ... }, new Customer { ... }, new Subscription { ... }. For one test. When you have 30 characterization tests, you have 1,200 lines of setup code. Most of it is identical. The only thing that varies between tests is the one property being tested.

Test builders solve this. They provide sensible defaults and let you override only what matters.

public class SubscriptionBuilder
{
    private SubscriptionStatus _status = SubscriptionStatus.Active;
    private PlanId _planId = PlanId.Professional;
    private Money _price = Money.EUR(49.99m);
    private DateTime _startDate = new(2025, 1, 1);
    private DateTime? _planChangedDate;
    private string _billingInterval = "Monthly";
    private CustomerId _customerId = CustomerId.From(1);

    public SubscriptionBuilder WithStatus(SubscriptionStatus status)
    {
        _status = status;
        return this;
    }

    public SubscriptionBuilder WithPlan(PlanId planId)
    {
        _planId = planId;
        return this;
    }

    public SubscriptionBuilder WithPrice(Money price)
    {
        _price = price;
        return this;
    }

    public SubscriptionBuilder WithPlanChangedDate(DateTime date)
    {
        _planChangedDate = date;
        return this;
    }

    public SubscriptionBuilder WithBillingInterval(string interval)
    {
        _billingInterval = interval;
        return this;
    }

    public SubscriptionBuilder ForCustomer(CustomerId customerId)
    {
        _customerId = customerId;
        return this;
    }

    public Subscription Build()
    {
        return Subscription.Reconstitute(
            id: SubscriptionId.New(),
            customerId: _customerId,
            planId: _planId,
            status: _status,
            price: _price,
            billingInterval: _billingInterval,
            startDate: _startDate,
            planChangedDate: _planChangedDate);
    }
}

Now compare a test with and without builders:

// Without builder -- 13 lines, most of it noise
var subscription = new Subscription
{
    Id = 42,
    CustomerId = 1,
    PlanId = 2,
    Status = "Active",
    BillingInterval = "Monthly",
    StartDate = new DateTime(2025, 1, 1),
    CurrentPeriodStart = new DateTime(2025, 6, 1),
    CurrentPeriodEnd = new DateTime(2025, 7, 1),
    MonthlyPrice = 49.99m,
    Currency = "EUR"
};

// With builder -- 3 lines, signal only
var subscription = new SubscriptionBuilder()
    .WithStatus(SubscriptionStatus.Active)
    .Build();

The builder version communicates intent: "I need an active subscription." The raw version communicates noise: "I need an object with these ten properties set to these ten values, nine of which are irrelevant to this test."

The same pattern applies to every domain object:

public class InvoiceBuilder
{
    private Money _amount = Money.EUR(49.99m);
    private InvoiceStatus _status = InvoiceStatus.Draft;
    private SubscriptionId _subscriptionId = SubscriptionId.From(1);
    private DateTime _invoiceDate = new(2025, 7, 1);
    private readonly List<InvoiceLineItem> _lineItems = new();

    public InvoiceBuilder WithAmount(Money amount)
    { _amount = amount; return this; }

    public InvoiceBuilder WithStatus(InvoiceStatus status)
    { _status = status; return this; }

    public InvoiceBuilder WithLineItem(string description, Money amount)
    {
        _lineItems.Add(new InvoiceLineItem(description, amount));
        return this;
    }

    public Invoice Build()
    {
        var invoice = Invoice.Create(_subscriptionId, _invoiceDate, _amount);
        foreach (var item in _lineItems)
            invoice.AddLineItem(item.Description, item.Amount);
        return invoice;
    }
}

public class PlanBuilder
{
    private string _name = "Professional";
    private Money _monthlyPrice = Money.EUR(49.99m);
    private Money _annualPrice = Money.EUR(499.99m);
    private int _maxUsers = 10;

    public PlanBuilder WithName(string name)
    { _name = name; return this; }

    public PlanBuilder WithMonthlyPrice(Money price)
    { _monthlyPrice = price; return this; }

    public Plan Build()
    {
        return Plan.Create(PlanId.New(), _name, _monthlyPrice,
            _annualPrice, _maxUsers);
    }
}

Builders live in SubscriptionHub.Testing.Shared -- a shared project referenced by all test projects. They evolve alongside the domain. When Subscription gains a new required property in Phase 5, you add a default to the builder. All existing tests continue to work because the builder provides sensible defaults. Only tests that care about the new property override it.

Tests read like specifications:

var subscription = new SubscriptionBuilder()
    .WithPlan(PlanId.Annual)
    .WithPrice(Money.EUR(499.99m))
    .WithStatus(SubscriptionStatus.Active)
    .Build();

"Given an active annual subscription at 499.99 EUR..." The test tells a story. The builder handles the plumbing.

Cross-reference: for a comprehensive treatment of the Builder pattern beyond testing, see Builder Pattern.

Architecture Tests -- Automated Rule Enforcement

Unit tests verify behavior. Integration tests verify interaction. Architecture tests verify structure. They enforce rules that humans forget.

"Domain projects must not reference Infrastructure." Everyone agrees. The team writes it on the wiki. Three months later, someone adds using Microsoft.EntityFrameworkCore; to a Domain project because they need DbContext for a "quick fix." Nobody catches it in code review because the reviewer is looking at the business logic, not the using statements.

Architecture tests catch it at build time.

Using NetArchTest.Rules:

[Fact]
public void Domain_ShouldNotReference_Infrastructure()
{
    var result = Types
        .InAssembly(typeof(Subscription).Assembly)
        .ShouldNot()
        .HaveDependencyOn("Microsoft.EntityFrameworkCore")
        .GetResult();

    result.IsSuccessful.Should().BeTrue(
        string.Join(", ", result.FailingTypeNames ?? Array.Empty<string>()));
}

[Fact]
public void Domain_ShouldNotReference_AspNetCore()
{
    var result = Types
        .InAssembly(typeof(Subscription).Assembly)
        .ShouldNot()
        .HaveDependencyOn("Microsoft.AspNetCore")
        .GetResult();

    result.IsSuccessful.Should().BeTrue();
}

[Fact]
public void DomainEvents_ShouldBeSealed()
{
    var result = Types
        .InAssembly(typeof(Subscription).Assembly)
        .That()
        .ImplementInterface(typeof(IDomainEvent))
        .Should()
        .BeSealed()
        .GetResult();

    result.IsSuccessful.Should().BeTrue(
        $"Domain events must be sealed. Violations: " +
        string.Join(", ", result.FailingTypeNames ?? Array.Empty<string>()));
}

[Fact]
public void Aggregates_ShouldNotHave_PublicSetters()
{
    var aggregateTypes = Types
        .InAssembly(typeof(Subscription).Assembly)
        .That()
        .Inherit(typeof(AggregateRoot<>))
        .GetTypes();

    foreach (var type in aggregateTypes)
    {
        var publicSetters = type.GetProperties()
            .Where(p => p.SetMethod?.IsPublic == true)
            .Select(p => p.Name)
            .ToList();

        publicSetters.Should().BeEmpty(
            $"Aggregate {type.Name} has public setters: " +
            string.Join(", ", publicSetters));
    }
}

[Fact]
public void BillingContext_ShouldNotReference_SubscriptionDomain()
{
    var result = Types
        .InAssembly(typeof(Invoice).Assembly)
        .ShouldNot()
        .HaveDependencyOn("Subscriptions.Domain")
        .GetResult();

    result.IsSuccessful.Should().BeTrue(
        "Billing context must not directly reference Subscription domain. " +
        "Use events or shared kernel types.");
}

These tests run in milliseconds. They require no database, no Docker, no HTTP. They inspect the compiled assembly and verify structural rules. When someone breaks a rule, the test fails with a message that names the offender: "Aggregate Subscription has public setters: Status, PlanId."
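The public-setter rule is plain reflection, so it is worth seeing exactly which accessors it flags. The sketch below uses a hypothetical LeakyAggregate type and a hypothetical SetterInspector helper (neither is part of SubscriptionHub) and applies the same SetMethod?.IsPublic check as the test above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical sample type, only to show which accessors the check flags.
public class LeakyAggregate
{
    public string Status { get; set; } = "Active";        // flagged: public setter
    public string PlanId { get; init; } = "professional"; // flagged: init is a public set accessor
    public decimal Price { get; private set; }            // not flagged: setter is private
    public int Version { get; }                           // not flagged: no setter at all
}

public static class SetterInspector
{
    // Names of all public instance properties that expose a public set or init accessor.
    public static IReadOnlyList<string> PublicSetters(Type type) =>
        type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.SetMethod?.IsPublic == true)
            .Select(p => p.Name)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
}
```

SetterInspector.PublicSetters(typeof(LeakyAggregate)) returns PlanId and Status. Note that init-only properties are reported too, because reflection exposes init as a public set method. If your aggregates use init for construction, the rule needs an explicit exception for it.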

The compiler is the first line of defense. Architecture tests are the second. Between the two, the architecture enforces itself. You do not need to write it on a wiki. You do not need to catch it in code review. The build fails. That is the best documentation there is.

Architecture tests appear in Phase 2 (when you create the bounded context libraries) and grow with every subsequent phase. They are permanent -- they never get deleted. They are the immune system that prevents the Big Ball of Mud from regrowing.

CI Pipeline

Not all tests should run at the same time. Domain tests take microseconds and need no infrastructure. Integration tests need Docker and take seconds. E2E tests need the full pipeline and take a minute. Running all of them on every commit wastes time and blocks developers.

The test pyramid maps to the CI pipeline:

Every commit runs domain tests and architecture tests. These are pure C# -- no database, no Docker, no external dependencies. They run in under a second. If they fail, the developer knows immediately. The feedback loop is tight enough to keep them in flow.

Every pull request runs integration tests. These need Docker for Testcontainers and take ~30 seconds. They verify database access, ACL contracts, and event handling. They run in the CI runner, not on the developer's machine (though developers can run them locally if they have Docker).

Every merge to main runs E2E tests. These boot WebApplicationFactory, start a PostgreSQL container, and send real HTTP requests through the full pipeline. They verify the entire request/response cycle. They take ~60 seconds. They are the final gate before deployment.

The test categories are enforced by xUnit traits:

// Domain tests -- no trait needed, they are the default
[Fact]
public void Money_Add_SameCurrency_ReturnsSum() { ... }

// Integration tests
[Fact]
[Trait("Category", "Integration")]
public async Task InvoiceRepository_Save_PersistsToDatabase() { ... }

// E2E tests
[Fact]
[Trait("Category", "E2E")]
public async Task ChangePlan_FullPipeline_Returns200() { ... }

The CI pipeline filters by trait:

# Every commit (everything that carries no slow-suite trait)
dotnet test --filter "Category!=Integration&Category!=E2E&Category!=Acceptance&Category!=Smoke"

# Pull request
dotnet test --filter "Category=Integration"

# Merge to main
dotnet test --filter "Category=E2E"
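Because the per-commit filter works by exclusion, a misspelled trait value does not fail anything: the slow test simply joins the fast suite. One way to reduce that risk is to keep the category names in a single place. The TestCategories class below is a hypothetical helper, not something the chapter's codebase is known to contain.

```csharp
using System.Linq;

// Hypothetical helper: central names for the xUnit trait values used by the CI filters.
public static class TestCategories
{
    public const string TraitName = "Category";
    public const string Integration = "Integration";
    public const string E2E = "E2E";
    public const string Acceptance = "Acceptance";
    public const string Smoke = "Smoke";

    // Builds the exclusion filter for the per-commit run from the same constants.
    public static string FastSuiteFilter() =>
        string.Join("&", new[] { Integration, E2E, Acceptance, Smoke }
            .Select(category => $"{TraitName}!={category}"));
}
```

Tests would then be tagged with [Trait(TestCategories.TraitName, TestCategories.Integration)], and a typo becomes a compile error instead of a silently misfiled test. FastSuiteFilter() produces the string to pass to dotnet test --filter for the per-commit run.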

This gives you fast feedback on every commit, thorough verification on every PR, and full confidence on every merge. The 45-second legacy test suite with its 6 ignored tests is replaced by an 800ms domain suite, a 30-second integration suite, and a 60-second E2E suite -- all green, all the time.

Quality Gates

Tests tell you whether code works. Quality gates tell you whether the migration is progressing. They are automated thresholds that fail the build when quality drops below an acceptable level. Without them, quality erodes invisibly -- one PR at a time.

Code Coverage Thresholds

Domain libraries are pure logic. They should have near-total coverage. Infrastructure libraries integrate with external systems -- coverage expectations are lower but still meaningful.

// In your CI script or .runsettings
// Domain libraries: 90% minimum. They're pure logic -- there's no excuse.
// Application libraries: 80% minimum. Handlers are tested via integration tests too.
// Infrastructure libraries: 60% minimum. Some code is adapter boilerplate.

In the CI pipeline, use dotnet test with Coverlet and a threshold check:

# Collect coverage
dotnet test --collect:"XPlat Code Coverage" --results-directory ./coverage

# Generate a text summary (one line per assembly, coverage in the last column)
dotnet reportgenerator "-reports:./coverage/**/coverage.cobertura.xml" \
  "-targetdir:./coverage/report" "-reporttypes:TextSummary"

# Fail if domain coverage drops below 90%
# (simplified -- use a script or quality gate tool)
# -m1 keeps only the first matching line; tr strips the trailing "%" so bc can compare numbers
DOMAIN_COV=$(grep -m1 "Subscriptions.Domain" ./coverage/report/Summary.txt | awk '{print $NF}' | tr -d '%')
if [ -z "$DOMAIN_COV" ] || (( $(echo "$DOMAIN_COV < 90" | bc -l) )); then
  echo "❌ Domain coverage ${DOMAIN_COV:-unknown}% < 90% threshold"
  exit 1
fi

The key insight: coverage thresholds are per-project, not global. A global 80% hides the fact that your domain lib is at 95% (good) while your infrastructure lib is at 40% (bad). Per-project thresholds surface problems where they actually live.
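As an alternative to parsing the text summary, the per-project check can read the Cobertura XML directly. The sketch below assumes the report lists one package element per assembly with a line-rate attribute between 0 and 1, which is how Coverlet's Cobertura output is usually laid out. CoverageGate and FindViolations are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical per-project coverage gate over a Cobertura report.
public static class CoverageGate
{
    // Returns one message per package that is missing or below its minimum line coverage (in percent).
    public static IReadOnlyList<string> FindViolations(
        XDocument coberturaReport,
        IReadOnlyDictionary<string, decimal> minimumPercentByPackage)
    {
        // Cobertura stores line coverage per package as a 0..1 fraction in "line-rate".
        var actualPercent = new Dictionary<string, decimal>(StringComparer.Ordinal);
        foreach (var package in coberturaReport.Descendants("package"))
        {
            var name = (string?)package.Attribute("name");
            var lineRate = (string?)package.Attribute("line-rate");
            if (name is null || lineRate is null) continue;
            actualPercent[name] = 100m * decimal.Parse(
                lineRate, NumberStyles.Float, CultureInfo.InvariantCulture);
        }

        var violations = new List<string>();
        foreach (var (package, minimum) in minimumPercentByPackage)
        {
            if (!actualPercent.TryGetValue(package, out var actual))
                violations.Add($"{package}: no coverage data found");
            else if (actual < minimum)
                violations.Add(string.Format(CultureInfo.InvariantCulture,
                    "{0}: {1:0.0}% is below the {2:0.0}% threshold", package, actual, minimum));
        }
        return violations;
    }
}
```

In CI, a small console step would load the file with XDocument.Load, pass thresholds such as 90 for Subscriptions.Domain and 60 for Subscriptions.Infrastructure, print the violations, and exit non-zero when the list is not empty. A package that is absent from the report counts as a violation, so a project cannot escape the gate by dropping out of the coverage run.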

Test Pass Rate

Zero tolerance for ignored tests. The Big Ball of Mud had 6 [Ignore]d tests -- and nobody noticed for 4 months. Quality gates prevent that:

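# Fail if any test is skipped.
# The summary line always contains the word "Skipped", so extract the count
# instead of counting matching lines. Assumes the English console summary
# ("Skipped: N" or "skipped: N"); counts are summed across test projects.
SKIPPED=$(dotnet test --logger "trx" 2>&1 | grep -oiE "skipped: +[0-9]+" | awk '{sum += $2} END {print sum + 0}')
if [ "$SKIPPED" -gt 0 ]; then
  echo "❌ $SKIPPED skipped tests. Fix or remove them."
  exit 1
fi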

Build-Breaking Policies

Combine architecture tests, coverage thresholds, and pass rates into a single quality gate that runs on every PR:

Architecture tests pass -- domain doesn't reference infrastructure (NetArchTest)
Domain coverage ≥ 90% -- pure logic has no excuse for low coverage
Zero skipped tests -- ignored tests are tech debt with interest
Zero compiler warnings in Domain -- <TreatWarningsAsErrors>true</TreatWarningsAsErrors> in Domain .csproj

Cross-ref: for a full treatment of automated quality gates in .NET, see QualityGate: Roslyn-Powered Static Analysis.

Performance Regression Testing

The Big Ball of Mud had N+1 queries (Part II, Pathology 6). The migration should not introduce new ones -- and it should fix the existing ones. Performance regression tests verify this.

EF Core Query Counting

During integration tests, count the SQL queries EF Core executes. If a single operation generates more queries than expected, the test fails:

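public class QueryCountingInterceptor : DbCommandInterceptor
{
    private int _queryCount;
    public int QueryCount => _queryCount;

    public void Reset() => Interlocked.Exchange(ref _queryCount, 0);

    // Synchronous reads (ToList, First, ...)
    public override InterceptionResult<DbDataReader> ReaderExecuting(
        DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result)
    {
        Interlocked.Increment(ref _queryCount);
        return result;
    }

    // Async reads (ToListAsync, FirstAsync, ...) take this path, so it must count too
    public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
        DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result,
        CancellationToken cancellationToken = default)
    {
        Interlocked.Increment(ref _queryCount);
        return new ValueTask<InterceptionResult<DbDataReader>>(result);
    }
}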

// In your integration test
[Fact]
[Trait("Category", "Integration")]
public async Task LoadSubscription_ExecutesExactlyOneQuery()
{
    var counter = _fixture.GetService<QueryCountingInterceptor>();
    counter.Reset();

    var subscription = await _repository.GetByIdAsync(subscriptionId);

    Assert.Equal(1, counter.QueryCount); // Not 1 + N child queries
}

Response Time Baselines

For critical API endpoints, measure response time and fail if it regresses beyond a threshold. This is not a load test -- it is a single-request sanity check:

[Fact]
[Trait("Category", "E2E")]
public async Task ChangePlan_RespondsWithin200ms()
{
    var stopwatch = Stopwatch.StartNew();

    var response = await _client.PostAsJsonAsync(
        $"/api/subscriptions/{_subscriptionId}/change-plan",
        new { PlanId = "annual" });

    stopwatch.Stop();
    Assert.True(response.IsSuccessStatusCode);
    Assert.True(stopwatch.ElapsedMilliseconds < 200,
        $"ChangePlan took {stopwatch.ElapsedMilliseconds}ms, expected < 200ms");
}

This catches the most common migration regression: accidentally loading the entire object graph through navigation properties that should have been removed. The N+1 query from Pathology 6 -- loading a Subscription that accidentally loads Customer, PaymentMethod, Invoice[], UsageRecord[] -- would show up as a 2-second response time instead of 50ms.
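A single timed request also measures JIT compilation and connection warm-up, which can make a 200ms assertion flaky on a cold CI runner. One common mitigation, sketched below, is to discard a warm-up call and assert on the median of a few samples. ResponseTimeProbe is a hypothetical helper. For a state-changing endpoint such as change-plan, each sample would need its own fresh subscription.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: median latency of a request after discarding warm-up calls.
public static class ResponseTimeProbe
{
    public static async Task<long> MedianMillisecondsAsync(
        Func<Task> request, int warmups = 1, int samples = 5)
    {
        if (samples < 1) throw new ArgumentOutOfRangeException(nameof(samples));

        // Warm-up calls absorb JIT and connection setup; their timings are ignored.
        for (var i = 0; i < warmups; i++)
            await request();

        var timings = new List<long>(samples);
        for (var i = 0; i < samples; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            await request();
            stopwatch.Stop();
            timings.Add(stopwatch.ElapsedMilliseconds);
        }

        // Middle element of the sorted timings (upper median for even sample counts).
        timings.Sort();
        return timings[timings.Count / 2];
    }
}
```

A test would pass a lambda that performs the HTTP call and then assert that the returned median is below the baseline. The median tolerates one slow outlier, which a single-shot measurement cannot.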

Before/After Performance Comparison

Track these metrics across the migration phases:

Metric	Before (Mud)	After (DDD)
`GET /subscriptions/{id}`	340ms (N+1, 14 queries)	12ms (1 query, aggregate only)
`POST /change-plan`	890ms (7 service calls, 2 HTTP)	45ms (1 aggregate + 1 event)
`GET /invoices` (list)	2.1s (Cartesian product)	85ms (read model, single query)
EF Core queries per request (avg)	14.3	1.8

Acceptance Tests — Business Requirement Validation

Unit tests verify code behavior. Integration tests verify wiring. Acceptance tests verify that the system does what the business asked for. They are written in Given/When/Then style, readable by non-developers, and tied to specific business requirements.

Given/When/Then Structure

public class PlanChangeAcceptanceTests : IClassFixture<ApiFixture>
{
    [Fact]
    [Trait("Category", "Acceptance")]
    public async Task Given_ActiveSubscription_When_PlanChanged_Then_ProratedInvoiceGenerated()
    {
        // Given: an active monthly subscription at €49/month, mid-cycle
        var subscriptionId = await _fixture.CreateSubscription(
            plan: "monthly", price: Money.EUR(49m),
            period: new SubscriptionPeriod(
                new DateOnly(2026, 3, 1), new DateOnly(2026, 3, 31)));

        // When: the customer upgrades to annual at €399/year on March 16
        var response = await _client.PostAsJsonAsync(
            $"/api/subscriptions/{subscriptionId}/change-plan",
            new { PlanId = "annual" });

        // Then: a prorated invoice is generated
        response.EnsureSuccessStatusCode();

        var invoice = await _fixture.GetLatestInvoice(subscriptionId);
        Assert.NotNull(invoice);

        // Credit for remaining 15 days of monthly: -€23.71
        // Charge for annual plan prorated to remaining period: proportional
        Assert.True(invoice.LineItems.Count >= 2, "Expected credit + charge line items");
        Assert.Contains(invoice.LineItems, li => li.Amount.Amount < 0); // credit
        Assert.Contains(invoice.LineItems, li => li.Amount.Amount > 0); // charge
    }

    [Fact]
    [Trait("Category", "Acceptance")]
    public async Task Given_CancelledSubscription_When_PlanChanged_Then_Rejected()
    {
        // Given: a cancelled subscription
        var subscriptionId = await _fixture.CreateSubscription(plan: "monthly");
        await _client.PostAsync($"/api/subscriptions/{subscriptionId}/cancel", null);

        // When: someone tries to change the plan
        var response = await _client.PostAsJsonAsync(
            $"/api/subscriptions/{subscriptionId}/change-plan",
            new { PlanId = "annual" });

        // Then: the request is rejected with a clear error
        Assert.Equal(HttpStatusCode.UnprocessableEntity, response.StatusCode);
        var error = await response.Content.ReadFromJsonAsync<ProblemDetails>();
        Assert.Contains("cancelled", error!.Detail, StringComparison.OrdinalIgnoreCase);
    }

    [Fact]
    [Trait("Category", "Acceptance")]
    public async Task Given_FailedPayment_When_ThreeRetries_Then_SubscriptionSuspended()
    {
        // Given: an active subscription with a failing payment method
        var subscriptionId = await _fixture.CreateSubscription(plan: "monthly");
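        // FakeStripeAdapter exposes SimulateDecline(DeclineReason); the enum member here is illustrative
        _fixture.GetService<FakeStripeAdapter>().SimulateDecline(DeclineReason.InsufficientFunds);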

        // When: three payment attempts fail
        for (int i = 0; i < 3; i++)
        {
            await _fixture.TriggerBillingCycle(subscriptionId);
        }

        // Then: the subscription is suspended
        var subscription = await _client.GetFromJsonAsync<SubscriptionDto>(
            $"/api/subscriptions/{subscriptionId}");
        Assert.Equal("Suspended", subscription!.Status);
    }
}
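The -€23.71 credit quoted in the first test's comment can be reproduced with a few lines of arithmetic. The sketch assumes linear, day-based proration where the change date itself is still billed on the old plan. The chapter does not spell out SubscriptionHub's actual rule, and ProrationSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper: linear, day-based credit for the unused part of a billing period.
public static class ProrationSketch
{
    public static decimal UnusedCredit(
        decimal periodPrice, DateOnly periodStart, DateOnly periodEnd, DateOnly changeDate)
    {
        // Inclusive period length: March 1 to March 31 is 31 days.
        var daysInPeriod = periodEnd.DayNumber - periodStart.DayNumber + 1;

        // Days after the change date: March 17 to March 31 is 15 days.
        var remainingDays = periodEnd.DayNumber - changeDate.DayNumber;

        return Math.Round(periodPrice * remainingDays / daysInPeriod, 2);
    }
}
```

For €49 over March 2026 with a change on March 16, this gives 49 × 15 / 31 = 23.7097, which rounds to 23.71. Once the proration rule is confirmed, the acceptance test could assert that exact credit amount instead of only checking that a negative line item exists.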

Why Acceptance Tests Matter for Migration

During the migration, acceptance tests verify that business behavior is preserved -- not just code behavior. Characterization tests capture what the code does (including bugs). Acceptance tests capture what the business expects. When they diverge, you have found a bug in the legacy system that the migration should fix, not preserve.

The CI pipeline runs acceptance tests alongside E2E tests on merge to main:

# Merge to main
dotnet test --filter "Category=E2E|Category=Acceptance"

Smoke Tests — Post-Deployment Verification

The migration uses the Strangler Fig pattern with feature flags. When you flip a flag to route traffic through the new aggregate instead of the legacy service, you need to verify that production is healthy. Smoke tests run immediately after deployment and verify critical paths.

Health Check Endpoints

Each bounded context exposes a health check:

// In Subscriptions.Infrastructure
public class SubscriptionsHealthCheck : IHealthCheck
{
    private readonly SubscriptionsDbContext _db;

    public SubscriptionsHealthCheck(SubscriptionsDbContext db) => _db = db;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken ct = default)
    {
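        try
        {
            // CanConnectAsync reports most connection failures by returning false
            // rather than throwing, so the result must be checked.
            return await _db.Database.CanConnectAsync(ct)
                ? HealthCheckResult.Healthy("Subscriptions DB reachable")
                : HealthCheckResult.Unhealthy("Subscriptions DB unreachable");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Subscriptions DB unreachable", ex);
        }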
    }
}

// In Program.cs
builder.Services
    .AddHealthChecks()
    .AddCheck<SubscriptionsHealthCheck>("subscriptions-db")
    .AddCheck<BillingHealthCheck>("billing-db")
    .AddCheck<StripeHealthCheck>("stripe-api");

Post-Deployment Smoke Script

After each deployment, a smoke script hits critical endpoints and verifies basic functionality:

#!/bin/bash
# smoke-test.sh — run after deployment

BASE_URL="${1:-https://subscriptionhub.example.com}"
FAILED=0

check() {
  local name="$1" url="$2" expected_status="${3:-200}"
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  if [ "$STATUS" -eq "$expected_status" ]; then
    echo "  ✓ $name ($STATUS)"
  else
    echo "  ✗ $name (expected $expected_status, got $STATUS)"
    FAILED=1
  fi
}

echo "Smoke testing $BASE_URL..."
check "Health"          "$BASE_URL/health"
check "Subscriptions"   "$BASE_URL/api/subscriptions?limit=1"
check "Invoices"        "$BASE_URL/api/invoices?limit=1"
check "Plans"           "$BASE_URL/api/plans"

exit $FAILED

Strangler Fig Verification

When you flip a feature flag to route through the new Subscription aggregate:

Deploy with the flag ON for 5% of traffic (canary)
Run smoke tests against the canary
Compare response shapes: old path vs new path must return identical JSON structure
Monitor error rates and response times for 15 minutes
If healthy: ramp to 100%. If not: flip flag OFF (instant rollback)

[Fact]
[Trait("Category", "Smoke")]
public async Task StranglerFigParity_OldAndNewPaths_ReturnSameShape()
{
    // Force old path
    _fixture.SetFeatureFlag("UseNewSubscriptionAggregate", false);
    var oldResponse = await _client.GetAsync($"/api/subscriptions/{_id}");
    var oldJson = await oldResponse.Content.ReadAsStringAsync();

    // Force new path
    _fixture.SetFeatureFlag("UseNewSubscriptionAggregate", true);
    var newResponse = await _client.GetAsync($"/api/subscriptions/{_id}");
    var newJson = await newResponse.Content.ReadAsStringAsync();

    // Same HTTP status
    Assert.Equal(oldResponse.StatusCode, newResponse.StatusCode);

    // Same JSON structure (ignoring values that may differ like timestamps)
    var oldKeys = JsonDocument.Parse(oldJson).RootElement.EnumerateObject()
        .Select(p => p.Name).OrderBy(n => n);
    var newKeys = JsonDocument.Parse(newJson).RootElement.EnumerateObject()
        .Select(p => p.Name).OrderBy(n => n);
    Assert.Equal(oldKeys, newKeys);
}
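The parity test above compares top-level property names only, so a renamed field inside a nested object would pass unnoticed. A recursive comparison closes that gap. The sketch below uses System.Text.Json; JsonShape is a hypothetical helper, and it treats a null value as a different shape from a string or number, which may be too strict for nullable fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: describes a JSON document by its property paths and value kinds.
public static class JsonShape
{
    // Returns entries such as "$.plan.price:Number"; values themselves are ignored.
    public static SortedSet<string> Of(string json)
    {
        using var document = JsonDocument.Parse(json);
        var paths = new SortedSet<string>(StringComparer.Ordinal);
        Collect(document.RootElement, "$", paths);
        return paths;
    }

    private static void Collect(JsonElement element, string path, ISet<string> paths)
    {
        // Record the container or leaf itself, so empty objects and arrays still count.
        paths.Add($"{path}:{Kind(element.ValueKind)}");

        if (element.ValueKind == JsonValueKind.Object)
        {
            foreach (var property in element.EnumerateObject())
                Collect(property.Value, $"{path}.{property.Name}", paths);
        }
        else if (element.ValueKind == JsonValueKind.Array)
        {
            // All items share one path, so array length does not affect the shape.
            foreach (var item in element.EnumerateArray())
                Collect(item, $"{path}[]", paths);
        }
    }

    // true and false are separate value kinds; fold them so values do not change the shape.
    private static string Kind(JsonValueKind kind) =>
        kind is JsonValueKind.True or JsonValueKind.False ? "Boolean" : kind.ToString();
}
```

In the parity test, Assert.Equal(JsonShape.Of(oldJson), JsonShape.Of(newJson)) would replace the top-level key comparison. A failure then lists the exact paths that differ between the old and new code paths.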

"The Strangler Fig gives you a rollback mechanism. Smoke tests give you the confidence to use it. Without both, you are deploying blind."

Before and After

Here is the transformation in numbers.

Before: Phase 0

Metric	Value
Total tests	12
Ignored tests	6
Passing tests	6 (when the Stripe sandbox is up)
Test runtime	45 seconds
Last green run	4 months ago
Code coverage	~0.01%
Test infrastructure	Raw SQL seeding, live HTTP to Stripe sandbox
Developer confidence	"I don't touch BillingService on Fridays"

After: Phase 6 Complete

Metric	Value
Total tests	500+
Ignored tests	0
Passing tests	500+
Domain test runtime	~800ms
Integration test runtime	~30 seconds
E2E test runtime	~60 seconds
Last green run	Last commit
Code coverage	90%+ on domain, 70%+ overall
Test infrastructure	Testcontainers, WebApplicationFactory, fakes, builders
Developer confidence	"The tests will tell me if I broke something"

Growth Over Phases

The test count does not grow linearly. It grows in waves, with each phase adding a different type of test:

Phase	New Tests	Cumulative	Primary Test Type
Phase 1	+30	30	Characterization (golden master)
Phase 2	+5	35	Structural + architecture
Phase 3	+25	60	Contract (ACL adapters)
Phase 4	+120	180	Unit (Value Objects -- TDD)
Phase 5	+180	360	Aggregate (invariants -- TDD)
Phase 6	+140	500	Event handlers + integration

Characterization tests peak at Phase 1 (30 tests) and decline as domain tests replace them. By Phase 6, all 30 characterization tests have been replaced by proper domain tests that verify intent, not just behavior.

The color shift tells the story. Phase 1 is red -- characterization tests are emergency scaffolding, not healthy architecture. By Phase 6, everything is green -- proper domain tests, integration tests, and architecture tests that verify intent and enforce structure, with a domain suite that runs in under a second.

The 30 characterization tests do not disappear instantly. They decline gradually:

Phase 4: 10 characterization tests replaced by Value Object tests (proration, currency conversion, tax calculation now have proper unit tests)
Phase 5: 15 characterization tests replaced by aggregate tests (the _BUG tests become invariant tests, the happy-path tests become aggregate behavior tests)
Phase 6: 5 remaining characterization tests replaced by event handler + integration tests

Each replacement is a conscious decision. You do not delete a characterization test until its domain replacement covers the same behavior. You do not delete the _BUG test until the aggregate enforces the invariant and the new test verifies it. The transition is gradual, tracked, and reversible.

Summary

What We Covered	Key Takeaway
Tests-first principle	Tests wrap every phase -- they are the foundation, not the finish
Characterization tests	Capture what the system does today, bugs included
Golden master pattern	Record outputs, assert against them during migration
Tests at every phase	Phase 1 characterization, Phase 4-5 TDD, Phase 6 event tests
Testcontainers	Real PostgreSQL, not SQLite -- 2-second startup, real SQL behavior
WebApplicationFactory	Full HTTP pipeline in-process -- routing, DI, middleware, serialization
Fakes over mocks	Implement the same interfaces, reusable, deterministic, self-documenting
Test builders	Sensible defaults, override only what matters, tests read like specs
Architecture tests	Enforce dependency rules, sealed events, no public setters -- build fails
Quality gates	Per-project coverage thresholds, zero skipped tests, build-breaking policies
Performance regression	Query counting, response time baselines, N+1 detection
Acceptance tests	Given/When/Then for business requirements -- what the business expects, not just what the code does
Smoke tests	Post-deployment health checks, Strangler Fig parity verification, canary rollback
CI pipeline	Domain (<1s) on commit, integration (30s) on PR, E2E + acceptance (60s) on merge, smoke on deploy
Before/after	12 tests (6 ignored) to 500+ tests (0 ignored), 45s to 800ms (domain)

The safety net is in place. The test infrastructure is ready. The characterization tests capture the system's current behavior -- every quirk, every bug, every undocumented edge case. The fakes implement the same ports the real adapters will use. The builders keep the setup code readable. The architecture tests will prevent the mud from regrowing.

Now we can start cutting.

Next: Part VI: Create Bounded Context Libraries -- reify the boundaries discovered in Event Storming into project structure. Separate DbContexts, same database. Kill the Common project. The compiler enforces the architecture.
