Part IV: The Migration Strategy
"The only way to go fast is to go well." -- Robert C. Martin
We have the diagnosis (Part II). We have the blueprint (Part III). Four bounded contexts, five aggregates, eight Value Objects, nine policies, five boundary crossings needing ACLs, and an entity migration map that tells us where every one of the 31 flat entities belongs.
Now the question that kills most modernization efforts: how do we get there?
The answer is not "rewrite it." The answer is never "rewrite it." The answer is a strategy that lets you migrate one piece at a time, ship each piece to production independently, and roll back any piece that fails -- all while the existing system continues to serve customers without interruption.
This part lays out that strategy. We start with why the alternative -- the big-bang rewrite -- fails. Then we introduce the Strangler Fig pattern. Then we inventory what SubscriptionHub already has that we can leverage. Then we define six concrete phases, each with deliverables, risk levels, and enabling conditions. Then we decide which context to migrate first. And throughout, we talk about the one pattern that makes incremental migration possible at all: the Anti-Corruption Layer.
Why Big-Bang Rewrites Fail
Every architect has fantasized about it. You open the codebase Monday morning, look at the 47-property SubscriptionDto, the 2,400-line BillingService, the 31-entity flat DbContext, and you think: let us start over.
New solution. Clean architecture from day one. Proper aggregates. Proper events. Proper tests. Two sprints to get the foundation right, then migrate features one by one into the new system. Six months, tops. Maybe nine.
This never works. It has failed so consistently, across so many organizations, for so many decades, that the failure mode has a name. Several, actually.
The Netscape 6 Lesson
In 1998, Netscape decided to rewrite their browser from scratch. Navigator 4 was a mess -- tangled rendering engine, platform-specific hacks, years of accretion. The team decided to throw it away and build on a new rendering engine (Gecko) written from the ground up.
Netscape 6 did not ship until late 2000, almost three years after the last major release. During that gap, Internet Explorer went from challenger to the dominant browser. By the time Netscape 6 shipped, nobody cared. The company never recovered. Joel Spolsky called it "the single worst strategic mistake that any software company can make," and he was right. Not because the new code was bad -- it was good. Because nearly three years of silence is nearly three years of market share loss, and market share does not wait for your rewrite to finish.
Fred Brooks' Second System Effect
Fred Brooks described this in The Mythical Man-Month in 1975. The second system -- the rewrite -- accumulates every feature the first system should have had, every design decision the team wished they had made, every abstraction they learned about since the first system shipped. The scope of the second system is always larger than the first, because the team conflates "migration" with "improvement." They are not just rewriting the billing service -- they are also adding multi-currency support, a better proration algorithm, event sourcing, CQRS, and a microservice boundary. The rewrite becomes a feature development project disguised as a migration.
The Real Risks
Here is what actually happens when a team attempts a big-bang rewrite of SubscriptionHub:
Feature parity takes 2x longer than estimated. The old system has six years of edge cases. The upgrade path for annual subscribers who changed plans mid-cycle and received a prorated credit that was applied to their next invoice which happened to span a tax rate change -- that edge case is handled by 47 lines of BillingService that nobody remembers writing. The rewrite team discovers it in month four when a customer complains. They discover fourteen more like it over the next six months.
The business keeps moving. While the rewrite team builds the new system, the product team ships new features in the old system. The rewrite team now has a moving target. Every new feature in the old system is a feature the rewrite team must also implement. The distance between "old" and "new" grows even as the rewrite team makes progress.
The team burns out. Rewrites are not glamorous. The first month is exciting -- clean code, new architecture, fresh decisions. By month six, the team is implementing the same CRUD operations they implemented in the old system, but with more ceremony. By month nine, morale is low. By month twelve, people start leaving. The institutional knowledge of the old system leaves with them.
Two systems run in parallel. The rewrite is never a clean cutover. There is always a transition period where the old system and the new system coexist. During this period, bug fixes must be applied to both. Configuration changes must be coordinated. The on-call team must know both systems. The operational burden doubles.
The old system survives. This is the cruelest outcome. After 18 months of rewrite effort, the new system handles 60% of the use cases. The remaining 40% are the hard ones -- the edge cases, the integrations, the compliance requirements. The team lacks the energy, the budget, or the knowledge to finish. So the old system stays. Now you have two systems, two sets of bugs, two deploys, and a team that resents both.
Why It Still Tempts Us
If big-bang rewrites fail so reliably, why does every team consider them? Because the Big Ball of Mud is exhausting. The emotional weight of opening BillingService.cs every Monday, scrolling past 2,400 lines to find the method you need, debugging through three layers of accidental coupling, and manually testing because the test suite is broken -- that weight accumulates. The rewrite fantasy is a coping mechanism: "If we could just start over, we would do it right this time." And you would. For the first three months. Then the same pressures that created the first Big Ball of Mud -- ship dates, feature requests, staff changes, scope creep -- would start creating the second one.
The answer is not to suppress the frustration. The frustration is valid. The answer is to channel it into incremental improvements that deliver the same psychological relief as a rewrite -- clean code, clear boundaries, working tests -- without the existential risk.
The only successful rewrites are the ones nobody calls rewrites. They are incremental migrations where the old system and the new code coexist, the boundary between them shifts gradually, and one day the old code is simply deleted because nothing calls it anymore.
That is the Strangler Fig.
The Strangler Fig
Martin Fowler named the pattern in 2004, borrowing from biology. In Australian rainforests, strangler fig seeds germinate in the canopy of a host tree. The fig sends roots down the trunk, wrapping around the host. It draws nutrients from the soil, grows its own canopy, and gradually replaces the host tree's structure. The host tree dies -- slowly, from the inside out -- and the fig stands in its place, using the host's shape as scaffolding.
Software migration works the same way. The new code does not replace the old code all at once. It wraps the old code, intercepts calls at specific points, and handles them with new logic. The old code is still there. It still runs. But fewer and fewer requests reach it. Over time, the new code handles everything, and the old code can be deleted.
How It Works in Practice
The key insight is that the API surface stays the same. The controllers, the endpoints, the request/response contracts -- none of these change during the migration. What changes is what happens behind the controller.
Before the migration, a controller action looks like this:
[HttpPost("subscriptions/{id}/change-plan")]
public async Task<IActionResult> ChangePlan(int id, [FromBody] ChangePlanRequest request)
{
    // Old path: calls SubscriptionService directly
    var result = await _subscriptionService.ChangePlan(id, request.NewPlanId);
    return Ok(result);
}
During the migration, the controller delegates to either the old service or the new aggregate, controlled by a feature flag:
[HttpPost("subscriptions/{id}/change-plan")]
public async Task<IActionResult> ChangePlan(int id, [FromBody] ChangePlanRequest request)
{
    if (_featureFlags.IsEnabled("use-subscription-aggregate"))
    {
        // New path: loads aggregate, issues command, returns Result
        var subscription = await _subscriptionRepository.GetById(new SubscriptionId(id));
        var changePlanResult = subscription.ChangePlan(
            new PlanTier(request.NewPlanId), _planChangePolicy);
        if (changePlanResult.IsFailure)
            return BadRequest(changePlanResult.Error);
        await _subscriptionRepository.Save(subscription);
        return Ok(MapToResponse(changePlanResult.Value));
    }
    else
    {
        // Old path: unchanged
        var result = await _subscriptionService.ChangePlan(id, request.NewPlanId);
        return Ok(result);
    }
}
The feature flag is the strangler fig's root. When the flag is off, the old code runs. When the flag is on, the new code runs. The switch is instant. The rollback is a flag flip. No redeployment. No downtime. No prayer.
You enable the flag for 1% of traffic first. Monitor. Check logs. Compare outputs. When you are confident, 10%. Then 50%. Then 100%. The old SubscriptionService.ChangePlan() method still exists. It still compiles. Nobody calls it. One day, you delete it. Nobody notices.
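A plain on/off flag cannot express "1% of traffic." One way to get there is to bucket each subscription deterministically, so the same subscription always takes the same path while the percentage grows. The sketch below assumes that approach. PercentageRollout, its method names, and the choice of an FNV-1a hash are hypothetical illustrations, not part of SubscriptionHub or of any particular feature-flag library.

```csharp
using System;
using System.Text;

// Hypothetical helper: decides whether a subscription falls inside the
// rollout percentage for a flag. Not a real feature-flag library API.
public sealed class PercentageRollout
{
    private readonly int _percentage; // 0..100

    public PercentageRollout(int percentage)
    {
        if (percentage < 0 || percentage > 100)
            throw new ArgumentOutOfRangeException(nameof(percentage));
        _percentage = percentage;
    }

    // Same flag + same id always yields the same answer, so a subscription
    // does not bounce between the old and new code paths across requests.
    public bool IsEnabledFor(string flagName, int subscriptionId)
        => Bucket(flagName, subscriptionId) < _percentage;

    // Maps (flag, id) to a stable bucket in 0..99 using FNV-1a.
    // string.GetHashCode() is avoided because it is randomized per process.
    public static int Bucket(string flagName, int subscriptionId)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes($"{flagName}:{subscriptionId}"))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return (int)(hash % 100);
        }
    }
}
```

Because the bucket is fixed per subscription, raising the percentage from 10 to 50 only adds subscriptions to the new path. Nobody who was already on it gets moved back.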
The beauty is psychological as much as technical. The business never notices. The customers never notice. The CEO does not get an email saying "we're pausing features for six months to rewrite the billing engine." There is no rewrite. There is a continuous improvement. Each sprint delivers value and migrates a slice of the architecture. The migration is invisible to everyone except the engineering team.
And if the migration fails -- if the new aggregate has a bug, if the new event handler drops messages, if the new ACL miscalculates tax -- you flip a flag and the old code takes over again. You fix the bug. You flip the flag back. The damage is limited to the slice of traffic that was on the new path while the flag was on.
"The business never notices. The customers never notice. One day the old code is simply gone."
What Already Exists
Before writing a single line of new code, take stock of what SubscriptionHub already has. Not everything in the mud is waste. Some of it is scaffolding -- implicit structures that the migration will formalize rather than replace.
Implicit Boundaries = Context Candidates
In Part I, we saw that the four teams already think in contexts. The Billing team knows their domain. The Subscriptions team knows their domain. The Platform team (Notifications) knows their domain. The Data team (Analytics) knows their domain. Event Storming in Part III validated these implicit boundaries and added precision: the events that belong to each context, the aggregates inside each context, the policies that connect contexts.
The implicit boundaries are not accidents. They grew from the natural division of labor over six years. They reflect real domain separations. Conway's Law says that organizations produce designs that mirror their communication structures -- and here, the communication structure is healthy. Four teams, four domains, four natural contexts. The code just never caught up.
The migration does not invent boundaries -- it encodes the boundaries that already exist in people's heads into the project structure. That is a fundamentally different operation than designing boundaries from scratch. Design requires discovery, debate, and iteration. Encoding requires discipline and tooling. We have already done the discovery (Event Storming). Now we encode.
Ad-Hoc ACLs = Translation Points
SubscriptionHub.PaymentGateway is a leaky ACL. It returns Stripe.Charge instead of a domain type. But it exists. The Billing team already decided that Stripe access should go through a wrapper. They just did not finish the job. The migration formalizes the wrapper: replace Stripe.Charge return types with PaymentResult, remove StripeCustomerId and StripeChargeId from domain entities, and push the Stripe vocabulary behind the ACL boundary where it belongs.
Similarly, the TaxCalculationService in SubscriptionHub.Services is a proto-ACL for the Tax API. It wraps the HTTP call. It has retry logic. It has a fallback rate. It just leaks -- the fallback is a hardcoded 0.10m in the middle of a billing method instead of a configurable default in the ACL.
These are not failures to replace. They are starting points. The migration takes what exists and strengthens it.
Proto-Contexts = Contexts That Started Forming Naturally
SubscriptionHub.Notifications is its own project. It has its own service classes, its own templates, its own retry logic. If you squint hard enough, it is a bounded context. It just has one fatal flaw: it queries the main DbContext instead of receiving event payloads.
SubscriptionHub.Analytics.ETL is similar. It has its own pipeline, its own warehouse loader, its own SQL queries. It is functionally independent -- the only coupling is the raw SQL that reads from the main database.
These proto-contexts are further along than you think. The migration's job is not to build them from scratch. It is to cut the last wire -- the database dependency -- and give them their own data sources (events, projections, read models).
The Existing Test Infrastructure
SubscriptionHub has 12 tests and a DatabaseFixture that requires a running SQL Server instance. That is not nothing. The DatabaseFixture knows how to seed data, reset state, and connect to a test database. The tests themselves -- the 6 that are not [Ignore]d -- exercise real code paths. They are characterization tests already, even if nobody called them that.
Phase 1 does not start from zero. It starts from 12 tests and extends to 200. The DatabaseFixture evolves into a proper WebApplicationFactory setup. The [Ignore]d tests get their infrastructure dependencies replaced with Testcontainers. The existing investment is not wasted -- it is the seed crystal for the full test suite.
The Team Knowledge
The most valuable existing asset is invisible. The Billing team's senior developer can explain the dunning process in two minutes. The Subscriptions team's tech lead knows every edge case in plan change proration. The Platform team's engineer wrote the notification template system and knows exactly which events trigger which emails.
This knowledge is the domain model. It lives in people's heads, not in the code. Event Storming extracted it onto sticky notes. The migration encodes it into aggregates, Value Objects, and policies. None of this is new knowledge -- it is existing knowledge made explicit.
"These aren't waste. They're scaffolding. The migration formalizes them."
Six Phases
The migration is not one project. It is six phases, each independently deployable, each delivering measurable value, each enabling the next. If you stop after Phase 2, you still have cleaner boundaries. If you stop after Phase 4, you still have type-safe Value Objects and formalized ACLs. Every phase is a checkpoint where the team can pause, assess, and decide whether to continue.
Here is the overview. The rest of this section expands each phase.
Phase 1: Write Characterization Tests
What it delivers: A safety net. Golden master tests that capture the current behavior of the system -- warts and all. Not tests that verify the correct behavior, but tests that verify the current behavior. The distinction matters. If the current system charges $49.99 for a monthly plan with a 10% tax fallback, the characterization test asserts $54.99, even if the real tax rate should be 8.25%. Correctness comes later. Safety comes first.
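The $49.99 example can be pinned down in a few lines. This is a minimal sketch, assuming a hypothetical LegacyInvoiceCalculator that stands in for the real billing path; the class and method names are invented for illustration, and a real suite would use the project's test framework rather than a bare exception.

```csharp
using System;

// Hypothetical stand-in for the legacy billing code path. The hardcoded
// 0.10m fallback mirrors the behavior described in the text.
public static class LegacyInvoiceCalculator
{
    public static decimal CalculateTotal(decimal planPrice, decimal? taxRate)
    {
        decimal rate = taxRate ?? 0.10m; // legacy fallback when no rate is available
        return Math.Round(planPrice * (1 + rate), 2);
    }
}

public static class CharacterizationTests
{
    // Pins what the system does today, not what it should do.
    public static void MonthlyPlan_WithTaxFallback_Charges54_99()
    {
        decimal actual = LegacyInvoiceCalculator.CalculateTotal(49.99m, taxRate: null);
        if (actual != 54.99m)
            throw new Exception($"Behavior changed: expected 54.99, got {actual}");
    }
}
```

The assertion says nothing about whether 10% is the right rate. It only guarantees that a later refactoring cannot change the number silently.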
Risk level: LOW. Zero production code changes. You are adding test files only. The worst that can happen is that a test fails, which tells you something you did not know about the system.
Why it is first: Because every subsequent phase changes production code. Phase 2 moves files between projects. Phase 3 changes method signatures. Phase 4 replaces decimal fields with Value Objects. Phase 5 moves behavior from services to aggregates. Phase 6 introduces asynchronous event handlers. Every one of these changes can break something. Without tests, you will not know what broke until a customer reports it. With tests, you know in seconds.
Non-negotiable: This is the one phase that cannot be skipped, compressed, or deferred. If time is short, you can skip Phase 6 (Domain Events) and still have a viable architecture. You can defer Phase 5 (Aggregates) and still have clean boundaries. You cannot skip Phase 1. Without it, every other phase is a gamble.
What it enables: Confidence to refactor. The golden master tests tell you, "this method returned this JSON for this input." If you move the method to a different project (Phase 2), change its ACL interface (Phase 3), replace its decimal amount with Money (Phase 4), or move its logic into an aggregate (Phase 5), the test still runs. If the output changes, you know immediately.
Part V: Tests First covers the full testing strategy: golden master, characterization tests, contract tests for ACLs, TDD for Value Objects and Aggregates, integration tests for events, Testcontainers, WebApplicationFactory, architecture tests, and CI pipeline integration.
Phase 2: Create Bounded Context Libraries
What it delivers: Project structure that matches the domain. The four contexts discovered in Event Storming become four sets of projects:
Subscriptions/
├── Subscriptions.Domain/ ← Aggregates, VOs, events, interfaces
├── Subscriptions.Application/ ← Commands, queries, handlers
├── Subscriptions.Infrastructure/ ← EF Core, Stripe, HTTP
└── Subscriptions.Tests/ ← Unit + integration
Billing/
├── Billing.Domain/
├── Billing.Application/
├── Billing.Infrastructure/
└── Billing.Tests/
Notifications/
├── Notifications.Domain/
├── Notifications.Application/
├── Notifications.Infrastructure/
└── Notifications.Tests/
Analytics/
├── Analytics.Application/ ← No domain layer (read-only context)
├── Analytics.Infrastructure/
└── Analytics.Tests/
SharedKernel/
├── SharedKernel/ ← Money, EmailAddress, base types
└── SharedKernel.Tests/
Risk level: LOW. This phase is file moves and namespace renames. No behavior changes. The code inside the files stays identical. The compiler catches broken references. The characterization tests from Phase 1 verify nothing changed.
What it changes: The solution structure. Code that was in SubscriptionHub.Services moves into the appropriate context's Application or Domain layer. Entities from SubscriptionHub.Data move into the appropriate context's Domain or Infrastructure layer. The Common project starts shrinking -- each class goes to its owning context or to SharedKernel.
What it does NOT change: The database. The AppDbContext stays. The tables stay. The columns stay. We are not splitting the database here -- we are splitting the code that accesses the database. Separate DbContext classes per context can map to the same physical database, the same schema, the same tables. EF Core does not care that SubscriptionsDbContext maps to the Subscriptions table in the same database as BillingDbContext maps to the Invoices table.
What it enables: The compiler now enforces context boundaries. If Notifications code tries to reference a Billing entity directly, the project reference is missing and the build fails. You cannot accidentally couple contexts when they live in different projects. The architecture is encoded in the project graph.
Part VI: Create Bounded Context Libraries covers the full process: project structure, the Common project decomposition, separate DbContext classes on the same database, SharedKernel design rules, and the build verification.
Phase 3: Fix & Formalize ACLs
What it delivers: Domain interfaces for every boundary crossing. Adapter implementations that translate between external vocabulary and domain vocabulary. Contract tests that verify the translation.
The five boundary crossings from the ACL inventory:
- PaymentGateway (Billing --> Stripe): IPaymentGateway returns PaymentResult, not Stripe.Charge. The adapter wraps StripeClient and translates.
- Tax API (Billing --> Tax Provider): ITaxCalculator returns TaxCalculation, not raw JSON. The adapter wraps the HTTP call, handles retries, caches.
- Billing <--> Subscriptions (circular calls): Replaced by events in Phase 6. For now, the ACL formalizes the interface -- BillingService calls ISubscriptionQueries (read-only) instead of SubscriptionService directly.
- Notifications --> Main DB: NotificationService receives a NotificationPayload DTO instead of querying AppDbContext. The ACL translates the internal query into a dedicated read model.
- Analytics --> Main DB: The ETL reads from a dedicated analytics view instead of raw SQL against core tables. The view is the ACL.
Risk level: MEDIUM. This phase changes method signatures at boundaries. Callers must be updated. But the behavior stays the same -- the ACL translates, it does not add logic. The characterization tests from Phase 1 catch regressions.
What it enables: Each context can now evolve independently. When Stripe releases a new SDK, only the StripePaymentGatewayAdapter changes. When the Tax API adds a new field, only the TaxApiAdapter changes. The domain code never sees external types. The blast radius of external changes shrinks from "the entire codebase" to "one adapter class."
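To make the "one adapter class" claim concrete, here is a rough sketch of the shape such an adapter could take. IPaymentGateway, PaymentResult, and StripePaymentGatewayAdapter are names from the text, but their members are assumptions. ExternalCharge and IExternalChargeClient are hypothetical stand-ins for the vendor SDK, used so the example stays self-contained, and the conversion assumes a two-decimal currency.

```csharp
using System;

// Domain side: vocabulary the Billing context owns.
public sealed record PaymentResult(bool Succeeded, string PaymentReference, decimal Amount, string Currency);

public interface IPaymentGateway
{
    PaymentResult Charge(string customerReference, decimal amount, string currency);
}

// External side: hypothetical stand-ins for the vendor SDK's types.
// A real adapter would wrap the actual SDK client here instead.
public sealed record ExternalCharge(string Id, string Status, long AmountInMinorUnits, string Currency);

public interface IExternalChargeClient
{
    ExternalCharge CreateCharge(string customerId, long amountInMinorUnits, string currency);
}

// The adapter is the only class that sees both vocabularies.
public sealed class StripePaymentGatewayAdapter : IPaymentGateway
{
    private readonly IExternalChargeClient _client;

    public StripePaymentGatewayAdapter(IExternalChargeClient client) => _client = client;

    public PaymentResult Charge(string customerReference, decimal amount, string currency)
    {
        // Domain -> external: decimal major units become integer minor units
        // (assumes a currency with two decimal places).
        long minorUnits = (long)Math.Round(amount * 100m, MidpointRounding.AwayFromZero);
        ExternalCharge charge = _client.CreateCharge(
            customerReference, minorUnits, currency.ToLowerInvariant());

        // External -> domain: no vendor type crosses this line.
        return new PaymentResult(
            Succeeded: charge.Status == "succeeded",
            PaymentReference: charge.Id,
            Amount: charge.AmountInMinorUnits / 100m,
            Currency: charge.Currency.ToUpperInvariant());
    }
}
```

Callers depend on IPaymentGateway only. If the vendor changes how it represents amounts or statuses, the two translation steps inside Charge are the only code that has to move.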
Part VII: Fix & Formalize ACLs covers the full treatment: ACL anatomy, the three shapes of Anti-Corruption Layers (adapter, translator, facade), contract testing, and the Notifications context's transition from database queries to event payloads.
Phase 4: Extract Value Objects
What it delivers: Type-safe domain concepts. Money instead of decimal amount + string currency. EmailAddress instead of string email. SubscriptionPeriod instead of DateTime startDate + DateTime endDate. TaxRate instead of decimal taxRate. Every Value Object has tests, equality semantics, and validation on construction.
Risk level: LOW. Value Objects are the safest refactoring in the DDD toolkit. They do not change behavior -- they make implicit behavior explicit. The proration calculation already uses endDate - startDate; wrapping that in SubscriptionPeriod.RemainingDays() does not change the arithmetic. It names it.
What it changes: Property types on entities and DTOs. Invoice.TotalAmount changes from decimal to Money. Customer.Email changes from string to EmailAddress. EF Core maps these as owned entities -- the database schema does not change. Same columns, same tables, same data. The mapping layer translates.
What it enables: The three proration implementations from Part II can now be collapsed. When Money handles currency arithmetic and SubscriptionPeriod handles date arithmetic, the proration calculation is four lines instead of forty-seven. Type safety catches a class of bugs at compile time: you cannot pass a TaxRate where a Money is expected, or add a bare decimal to a Money. Currency mismatches are caught the moment they happen at runtime: adding Money("USD", 49.99) to Money("EUR", 39.99) fails because the + operator checks currency equality.
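A minimal Money sketch shows where each kind of check lives. The constructor signature follows the Money("USD", 49.99) notation used above; the validation rule and the exception types are assumptions made for illustration, not the final design covered in Part VIII.

```csharp
using System;

// Minimal Money Value Object sketch. Records provide value equality.
public sealed record Money
{
    public string Currency { get; }
    public decimal Amount { get; }

    public Money(string currency, decimal amount)
    {
        // Validation on construction: an invalid Money can never exist.
        if (string.IsNullOrWhiteSpace(currency) || currency.Length != 3)
            throw new ArgumentException("Currency must be a 3-letter code.", nameof(currency));
        Currency = currency.ToUpperInvariant();
        Amount = amount;
    }

    // The compiler only allows Money + Money. The currency comparison
    // itself runs when the addition executes.
    public static Money operator +(Money left, Money right)
    {
        if (left.Currency != right.Currency)
            throw new InvalidOperationException(
                $"Cannot add {left.Currency} to {right.Currency}.");
        return new Money(left.Currency, left.Amount + right.Amount);
    }
}
```

With this in place, new Money("USD", 49.99m) + 10m does not compile, and new Money("USD", 49.99m) + new Money("EUR", 39.99m) throws instead of quietly producing 89.98 of nothing in particular.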
Part VIII: Extract Value Objects covers the full TDD workflow: write the test, then extract the Value Object. Money, SubscriptionPeriod, EmailAddress, TaxRate, PlanTier. EF Core owned entity mapping with zero schema changes.
Phase 5: Build Aggregates
What it delivers: Rich domain models that encapsulate behavior and enforce invariants. The Subscription aggregate with typed IDs, private setters, Result<T> returns, and domain events. The Invoice aggregate that builds itself from commands and refuses invalid state transitions. The Payment aggregate with its own lifecycle.
Risk level: MEDIUM. This is where behavior moves from services to aggregates. The SubscriptionService.ChangePlan() method's logic moves into Subscription.ChangePlan(). The Strangler Fig pattern controls the transition: a feature flag in the controller decides whether the old service or the new aggregate handles the request. Both coexist. The flag enables gradual rollout.
What it changes: The location of business logic. Before: BillingService.ProcessMonthlyBilling() handles everything. After: Invoice.GenerateFromSubscription() creates the invoice, Invoice.AddLineItem() adds charges, Invoice.CalculateTax() applies tax, Invoice.Finalize() locks the invoice for payment. Each method enforces invariants. Each method returns Result<T> instead of throwing exceptions.
What it enables: The God Services start shrinking. BillingService delegates to Invoice methods. SubscriptionService delegates to Subscription methods. The services become thin orchestrators -- load aggregate, call method, save aggregate -- instead of 2,400-line logic containers. When a service is thin enough, it becomes a command handler in the Application layer and the old service class is deleted.
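The "load aggregate, call method, save aggregate" shape only works if the aggregate guards its own state. The following is a deliberately small sketch of that idea. The Result<T> type and the two invariants shown are assumptions chosen for illustration; the real Subscription aggregate also takes a typed ID and a plan change policy, which are left out here.

```csharp
#nullable enable
using System;

// Minimal Result type: success carries a value, failure carries an error message.
public sealed class Result<T>
{
    public bool IsSuccess { get; }
    public bool IsFailure => !IsSuccess;
    public T? Value { get; }
    public string? Error { get; }

    private Result(bool isSuccess, T? value, string? error)
    {
        IsSuccess = isSuccess;
        Value = value;
        Error = error;
    }

    public static Result<T> Success(T value) => new(true, value, null);
    public static Result<T> Failure(string error) => new(false, default, error);
}

public enum SubscriptionStatus { Active, Cancelled }

public sealed record PlanTier(string Name);

public sealed class Subscription
{
    public int Id { get; }
    public PlanTier Plan { get; private set; }
    public SubscriptionStatus Status { get; private set; }

    public Subscription(int id, PlanTier plan)
    {
        Id = id;
        Plan = plan;
        Status = SubscriptionStatus.Active;
    }

    public void Cancel() => Status = SubscriptionStatus.Cancelled;

    // Invariants live next to the state they protect. Expected business-rule
    // failures come back as a Result instead of an exception.
    public Result<PlanTier> ChangePlan(PlanTier newPlan)
    {
        if (Status != SubscriptionStatus.Active)
            return Result<PlanTier>.Failure("Only active subscriptions can change plan.");
        if (newPlan == Plan)
            return Result<PlanTier>.Failure("Subscription is already on this plan.");

        Plan = newPlan;
        return Result<PlanTier>.Success(newPlan);
    }
}
```

Because the setters are private, the only way to change the plan is through ChangePlan, so no service can skip the checks by assigning the property directly.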
Part IX: Build Aggregates covers the full TDD process: invariant tests first, then enforce in the aggregate. Transform anemic entities into rich domain models. The Strangler Fig ACL with feature flags.
Phase 6: Introduce Domain Events
What it delivers: Decoupled communication between contexts. PlanChangedEvent, PaymentFailedEvent, InvoiceGeneratedEvent. Event handlers that live in their own context. Policy automation: when PaymentFailed fires three times, the RetryPayment policy escalates to dunning. When DunningExhausted fires, the Subscriptions context suspends the subscription -- without Billing calling SubscriptionService directly.
Risk level: MEDIUM. Introducing events changes the communication model from synchronous method calls to asynchronous message passing. This has implications for error handling, ordering, idempotency, and eventual consistency. The Transactional Outbox pattern ensures events are not lost. Idempotent handlers ensure events can be safely retried.
What it changes: The dependency graph. Before: BillingService calls SubscriptionService.Suspend() directly (circular dependency). After: BillingService publishes DunningExhausted. A handler in the Subscriptions context reacts by issuing a SuspendSubscription command. No circular dependency. No runtime coupling.
Before: NotificationService queries AppDbContext to build email templates. After: NotificationService receives InvoiceFinalized with the customer name, email, plan name, and amount in the event payload. No database query. No cross-context data access.
Before: AnalyticsEtlJob runs raw SQL against the database replica. After: Analytics event handlers consume SubscriptionCreated, PlanChanged, InvoiceFinalized, and PaymentSucceeded, updating denormalized views. No raw SQL. No schema coupling.
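The DunningExhausted flow, together with the idempotency requirement, can be sketched in-process. Everything beyond the event name is an assumption here: the EventId property, the handler class, the in-memory set standing in for a processed-events table, and the delegate standing in for a SuspendSubscription command are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Event published by Billing. EventId lets consumers detect redelivery.
public sealed record DunningExhausted(Guid EventId, int SubscriptionId);

// Hypothetical handler living in the Subscriptions context.
public sealed class SuspendOnDunningExhaustedHandler
{
    // In-memory stand-in for a durable processed-events table.
    private readonly HashSet<Guid> _processed = new();

    // Stand-in for issuing a SuspendSubscription command.
    private readonly Action<int> _suspendSubscription;

    public SuspendOnDunningExhaustedHandler(Action<int> suspendSubscription)
        => _suspendSubscription = suspendSubscription;

    // Returns true if the event was handled, false if it was a duplicate.
    public bool Handle(DunningExhausted evt)
    {
        if (_processed.Contains(evt.EventId))
            return false; // redelivery: safe no-op

        _suspendSubscription(evt.SubscriptionId);

        // Recorded only after the command succeeds, so a failure above
        // leaves the event eligible for retry.
        _processed.Add(evt.EventId);
        return true;
    }
}
```

Billing never references the handler, and the handler never references BillingService. In a real system the suspension and the processed-event record would be written in the same transaction, which this single-threaded sketch does not attempt to show.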
What it enables: True context autonomy. Each context can be deployed, tested, and evolved independently. The Subscriptions context does not know the Billing context exists -- it publishes events, and whoever subscribes receives them. The event contract is the only coupling between contexts.
Part X: Introduce Domain Events covers the full implementation: event types, handlers, Transactional Outbox, idempotency, and the subscription lifecycle as an event-driven state machine.
The Constant: Tests Accompany Every Phase
Notice that tests are not listed as a final phase. They are not listed as "Phase 7: Add Tests." Tests are Phase 1, and they continue through every subsequent phase.
| Phase | Test Deliverable |
|---|---|
| Phase 1 | Golden master tests, characterization tests, baseline CI |
| Phase 2 | Architecture tests (verify project references match context boundaries) |
| Phase 3 | Contract tests (verify ACL adapters translate correctly) |
| Phase 4 | Unit tests for each Value Object (TDD: test first, then extract) |
| Phase 5 | Invariant tests for each Aggregate (TDD: test first, then enforce) |
| Phase 6 | Event handler tests, integration tests for cross-context flows |
Tests are not a phase. Tests are the method. Every phase starts with a test that describes the desired behavior, then implements the behavior to make the test pass. The golden master tests from Phase 1 run continuously -- they are the regression safety net that proves the migration did not break existing behavior.
"Tests are NOT a final phase -- they are the FIRST thing and they accompany every subsequent phase."
Decision Tree: Where Do I Start?
If you are reading this and wondering which phase applies to your situation, the answer is almost always Phase 1. But here is the decision tree.
For SubscriptionHub, the answer is Phase 1. Twelve tests, six [Ignore]d, last meaningful commit four months ago. We start with characterization tests.
Which Context First
The blueprint from Part III gives us four bounded contexts to migrate. We cannot migrate all four simultaneously -- the team does not have the bandwidth, and the risk of coordinated changes across all four is too high. We pick one. We migrate it through all six phases. We learn. We adjust. Then we pick the next.
Prioritization Criteria
Four dimensions determine which context to migrate first:
Change Frequency -- How often does the team modify this area of the code? High change frequency means high migration ROI, because every change after the migration is easier.
Bug Cost -- How expensive are bugs in this area? Bugs in billing cost money (literally -- incorrect invoices, missed charges, wrong tax). Bugs in analytics cost dashboards. The higher the cost, the higher the urgency.
Team Willingness -- Is the owning team motivated? A reluctant team will produce a reluctant migration. A motivated team will champion it, write the tests, learn the patterns, and teach the next team.
Surface Area -- How large is the context's codebase? Smaller surface areas are faster to migrate, producing a visible success that builds momentum for the larger contexts.
The Prioritization Matrix
| Context | Change Frequency | Bug Cost | Team Willingness | Surface Area | Rank |
|---|---|---|---|---|---|
| Subscriptions | High (weekly) | High ($$$) | High | Medium | 1st |
| Billing | High (weekly) | Very High ($$$$) | Medium | Large | 2nd |
| Notifications | Medium (bi-weekly) | Low ($) | High | Small | 3rd |
| Analytics | Low (monthly) | Medium ($$) | Low | Small | 4th |
Subscriptions wins as the pilot context. Here is why:
Highest change frequency: The Subscriptions team modifies SubscriptionService every sprint. Plan changes, trial management, pause/resume logic -- every feature request touches this context. Every sprint, the team fights the God Service.
High bug cost: Subscription status bugs cause incorrect billing. A subscription marked "Active" that should be "Cancelled" generates invoices. A subscription marked "Cancelled" that should be "Active" locks the customer out. These are revenue-impacting bugs.
Willing team: The Subscriptions team's tech lead attended the Event Storming workshop and said, "We should have done this two years ago." Motivation is not a technical factor, but it is the most important human factor.
Medium surface area: The Subscriptions context has ~8 entities (after migration), ~2 aggregates, and ~3 Value Objects. It is large enough to prove the pattern but small enough to complete in 4-6 sprints.
Upstream position: Subscriptions is the upstream context. Billing, Notifications, and Analytics all consume subscription events. Migrating Subscriptions first means the downstream contexts can start consuming clean, well-typed events instead of querying the shared database. The migration cascades.
Why not Billing first? Billing has the highest bug cost, but it also has the largest surface area and the most complex domain logic (invoicing, payments, tax, dunning, proration, refunds). Starting with the most complex context risks a slow first migration that demoralizes the team. Start with a moderate context, learn the patterns, build momentum, then tackle Billing with experience.
Why not Notifications first? Notifications has the smallest surface area and a willing team, but it is a downstream context. Migrating it first would mean building event handlers for events that do not exist yet (because Subscriptions and Billing still publish nothing). You would have to build the event infrastructure before the upstream contexts produce events. That is backwards. Migrate the upstream first.
ACLs: The Constant Companion
The six phases are sequential -- you cannot extract Value Objects (Phase 4) before you have bounded context libraries to put them in (Phase 2). But one pattern appears in every phase, adapts to every situation, and solves a different problem each time it appears.
The Anti-Corruption Layer is not just Phase 3. It is the connective tissue of the entire migration.
Phase 1: ACLs in Tests
When writing characterization tests, you need to isolate the system under test from external dependencies. The test should not call Stripe. The test should not hit the Tax API. The test should not send emails. You build test doubles that stand in for external systems -- fakes, stubs, mocks. These test doubles are ACLs. They translate between the external system's behavior (Stripe charges, HTTP responses, SMTP delivery) and the test's expectations (a PaymentResult, a TaxCalculation, a logged email).
The test doubles you build in Phase 1 often become the design template for the production ACLs in Phase 3. If your test fake returns PaymentResult instead of Stripe.Charge, you have already designed the IPaymentGateway interface.
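A minimal sketch of such a test double, assuming a shape for IPaymentGateway and PaymentResult that the text does not specify. The Charge signature, the record members, and the DeclineNext switch are all hypothetical. The point is that nothing provider-specific leaks through the interface.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Domain-facing result type; the members shown here are assumptions for illustration.
public sealed record PaymentResult(bool Succeeded, string? FailureReason = null);

// The port the domain depends on. Method name and parameters are hypothetical.
public interface IPaymentGateway
{
    PaymentResult Charge(string customerId, decimal amount, string currency);
}

// Test double: makes no network call and records successful charges for assertions.
public sealed class FakePaymentGateway : IPaymentGateway
{
    private readonly List<(string CustomerId, decimal Amount, string Currency)> _charges = new();

    public IReadOnlyList<(string CustomerId, decimal Amount, string Currency)> Charges => _charges;

    // Lets a test simulate one declined payment without involving the real provider.
    public bool DeclineNext { get; set; }

    public PaymentResult Charge(string customerId, decimal amount, string currency)
    {
        if (DeclineNext)
        {
            DeclineNext = false;
            return new PaymentResult(false, "card_declined");
        }

        _charges.Add((customerId, amount, currency));
        return new PaymentResult(true);
    }
}
```

A characterization test can then assert on `Charges` instead of inspecting a provider dashboard. The production adapter built later only has to satisfy the same interface.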
Phase 2: ACLs Between Contexts
When you create bounded context libraries, the contexts need to communicate. But the contexts do not share types -- that is the whole point of separate libraries. The ACL translates between the upstream context's vocabulary and the downstream context's vocabulary.
In Phase 2, the ACL is simple: a shared contract in the SharedKernel (for Value Objects like Money and EmailAddress) or a thin DTO that the downstream context defines and the upstream context maps to. The complexity increases in later phases, but the principle is the same: contexts do not reach into each other. They translate at the boundary.
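As a rough sketch of the thin-DTO variant: the downstream context declares the few fields it needs, and the upstream context maps its own model onto that contract. Every name below (BillableSubscription, the Subscription fields, BillingContractMapper) is hypothetical and chosen only to show the direction of the translation.

```csharp
using System;

namespace Billing.Contracts
{
    // Defined by the downstream context: only what Billing needs to know.
    public sealed record BillableSubscription(
        Guid SubscriptionId, string PlanCode, DateTime CurrentPeriodEnd);
}

namespace Subscriptions
{
    // Hypothetical slice of the upstream model, including details Billing should never see.
    public sealed class Subscription
    {
        public Guid Id { get; init; }
        public string PlanCode { get; init; } = "";
        public DateTime CurrentPeriodEnd { get; init; }
        public string InternalNotes { get; init; } = "";
        public int PauseCount { get; init; }
    }

    // The translation lives at the boundary; neither context reaches into the other.
    public static class BillingContractMapper
    {
        public static Billing.Contracts.BillableSubscription ToBillable(Subscription s) =>
            new(s.Id, s.PlanCode, s.CurrentPeriodEnd);
    }
}
```

The mapper is deliberately boring. If a new field appears on Subscription, Billing does not see it until someone decides to add it to the contract.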
Phase 3: ACLs at External Boundaries
This is the classic ACL use case. IPaymentGateway wraps Stripe. ITaxCalculator wraps the Tax API. IEmailSender wraps SMTP. Each adapter translates between external types and domain types. Each adapter has contract tests that verify the translation.
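To make the translation concrete, here is a sketch of an adapter behind ITaxCalculator. The external response type and the delegate standing in for the vendor client are invented for illustration; they do not describe any real tax API. The assumption is that the vendor speaks in minor units and percentage strings while the domain wants decimals.

```csharp
using System;
using System.Globalization;

// Domain types; the member shapes are assumptions.
public sealed record TaxCalculation(decimal TaxAmount, decimal Rate);

public interface ITaxCalculator
{
    TaxCalculation Calculate(decimal netAmount, string countryCode);
}

// Hypothetical vendor response: a percentage as text, amounts in minor units (cents).
public sealed record ExternalTaxResponse(string RatePercent, long TaxMinorUnits);

public sealed class TaxApiAdapter : ITaxCalculator
{
    // Stand-in for the vendor client call: (net amount in minor units, country) -> response.
    private readonly Func<long, string, ExternalTaxResponse> _lookup;

    public TaxApiAdapter(Func<long, string, ExternalTaxResponse> lookup) => _lookup = lookup;

    public TaxCalculation Calculate(decimal netAmount, string countryCode)
    {
        // Domain -> external: decimal amount to minor units.
        var netMinorUnits = (long)Math.Round(netAmount * 100m);
        var response = _lookup(netMinorUnits, countryCode);

        // External -> domain: parse culture-independently and convert back to decimals.
        var rate = decimal.Parse(response.RatePercent, CultureInfo.InvariantCulture) / 100m;
        return new TaxCalculation(response.TaxMinorUnits / 100m, rate);
    }
}
```

A contract test for this adapter would feed it recorded vendor responses and assert on the resulting TaxCalculation, so a vendor format change fails in one place.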
Phase 5: ACLs as the Strangler Fig Mechanism
When the aggregate coexists with the old service, the controller needs to route between them. The feature flag is the routing mechanism. But the old service and the new aggregate have different signatures, different return types, different error handling. The ACL bridges them. It translates the aggregate's Result<PlanChangedEvent> into the same IActionResult shape the controller returned when it called the old service. The API contract stays the same. The implementation underneath switches.
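A simplified sketch of that bridge, kept free of ASP.NET types so it stands alone. LegacyPlanChangeResponse stands in for whatever the old service returns today, Result<T> is a minimal hand-rolled type, and the delegates stand in for the old service, the new aggregate, and the feature flag lookup. All of these are assumptions. In a real controller the final step would wrap the response in an IActionResult.

```csharp
#nullable enable
using System;

public sealed record PlanChangedEvent(Guid SubscriptionId, string NewPlanCode);

// Minimal Result type for the sketch; the series may define a richer one.
public sealed record Result<T>(T? Value, string? Error)
{
    public bool IsSuccess => Error is null;
    public static Result<T> Ok(T value) => new(value, null);
    public static Result<T> Fail(string error) => new(default, error);
}

// The shape callers already receive from the old path (hypothetical).
public sealed record LegacyPlanChangeResponse(int StatusCode, string Message);

public sealed class PlanChangeRouter
{
    private readonly Func<Guid, string, LegacyPlanChangeResponse> _legacy;
    private readonly Func<Guid, string, Result<PlanChangedEvent>> _aggregate;
    private readonly Func<bool> _useAggregate; // feature flag lookup

    public PlanChangeRouter(
        Func<Guid, string, LegacyPlanChangeResponse> legacy,
        Func<Guid, string, Result<PlanChangedEvent>> aggregate,
        Func<bool> useAggregate)
    {
        _legacy = legacy;
        _aggregate = aggregate;
        _useAggregate = useAggregate;
    }

    public LegacyPlanChangeResponse ChangePlan(Guid subscriptionId, string planCode)
    {
        // Flag off: the old path runs untouched.
        if (!_useAggregate()) return _legacy(subscriptionId, planCode);

        // Flag on: run the new path, then translate back to the old response shape.
        var result = _aggregate(subscriptionId, planCode);
        return result.IsSuccess
            ? new LegacyPlanChangeResponse(200, $"Plan changed to {result.Value!.NewPlanCode}")
            : new LegacyPlanChangeResponse(400, result.Error!);
    }
}
```

Rollback is then a flag flip: the router stops calling the aggregate and callers see the same response shape either way.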
Phase 6: ACLs as Event Translators
When an upstream context publishes InvoiceFinalized and a downstream context such as Notifications consumes it, the event is an ACL. It carries exactly the data that the downstream context needs -- customer name, email, amount, plan name -- and nothing more. It does not carry the full Invoice entity. It does not carry EF Core navigation properties. It is a deliberate, minimal translation of the upstream context's internal state into a format the downstream context can consume without coupling.
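Sketched in code, the event is a flat record with the four fields listed above and a mapper that drops everything else. The Invoice and Customer classes are hypothetical stand-ins for the publishing context's internal model, and the property names are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical slice of the publishing context's internal model.
public sealed class Customer
{
    public string Name { get; init; } = "";
    public string Email { get; init; } = "";
}

public sealed class Invoice
{
    public Guid Id { get; init; }
    public Customer Customer { get; init; } = new();
    public string PlanName { get; init; } = "";
    public decimal Total { get; init; }
    public List<string> InternalAuditNotes { get; init; } = new(); // never leaves the context
}

// The published contract: flat, minimal, no references back into the entity graph.
public sealed record InvoiceFinalized(
    string CustomerName, string CustomerEmail, decimal Amount, string PlanName);

public static class InvoiceEvents
{
    public static InvoiceFinalized ToEvent(Invoice invoice) =>
        new(invoice.Customer.Name, invoice.Customer.Email, invoice.Total, invoice.PlanName);
}
```

Because the record holds only strings and a decimal, a consumer cannot accidentally navigate from the event into the publisher's entities.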
The Evolution
The ACL evolves across phases:
| Phase | ACL Role | Example |
|---|---|---|
| Phase 1 | Test isolation | Fake IPaymentGateway in tests |
| Phase 2 | Context separation | Shared contract DTOs |
| Phase 3 | External boundary | StripePaymentGatewayAdapter |
| Phase 4 | Type translation | Money ↔ decimal mapping in EF Core |
| Phase 5 | Strangler bridge | Feature flag routing old/new code |
| Phase 6 | Event contract | InvoiceFinalized event payload |
"If you only learn one pattern from this series, learn ACL. It is the only thing that lets you migrate incrementally without breaking production."
The Complete Timeline
Let us put concrete numbers on this. SubscriptionHub has four teams. The Subscriptions team (3 developers + 1 tech lead) takes the pilot. The other teams continue feature work.
| Phase | Scope | Duration Estimate | Team Impact |
|---|---|---|---|
| Phase 1 | Characterization tests for Subscriptions | 2-3 sprints | Subscriptions team |
| Phase 2 | Create Subscriptions.* libraries, move code | 1-2 sprints | Subscriptions team + build engineer |
| Phase 3 | Formalize ISubscriptionQueries ACL | 1 sprint | Subscriptions + Billing (contract) |
| Phase 4 | Extract SubscriptionPeriod, PlanTier, UsageQuota | 2-3 sprints | Subscriptions team |
| Phase 5 | Build Subscription + Plan aggregates, Strangler Fig | 3-4 sprints | Subscriptions team |
| Phase 6 | Publish SubscriptionCreated, PlanChanged, etc. | 2-3 sprints | Subscriptions + downstream teams |
| Total | Subscriptions context fully migrated | ~12-16 sprints | |
Twelve to sixteen sprints for one context. Is that a lot? Compare it to a big-bang rewrite: 12-18 months (24-36 sprints) for the entire system, with no incremental value delivery, no rollback capability, and a 50/50 chance of abandonment.
The Strangler Fig approach delivers value at every phase boundary. After Phase 1, you have tests. After Phase 2, you have clean project structure. After Phase 3, you have formalized boundaries. After Phase 4, you have type-safe domain concepts. Each phase is a shippable improvement. If the project loses funding at Phase 4, you still have a better codebase than you started with.
After the Subscriptions context is migrated, the Billing context comes next. But the team has learned the patterns. The test infrastructure is established. The build pipeline supports bounded context libraries. The second migration is faster. The third is faster still.
What About Feature Work?
This is the question that management always asks: "Can we still ship features while migrating?"
Yes. That is the entire point of the Strangler Fig. Feature work continues on the old code. When a feature touches an area that has already been migrated, it uses the new aggregate instead of the old service. When a feature touches an area that has not been migrated yet, it uses the old code. The feature flag boundary handles the coexistence.
In practice, a sprint during the migration looks like this: 60-70% feature work, 30-40% migration work. The migration work is not a separate initiative with its own budget line -- it is an engineering investment embedded in the team's normal sprint capacity. Some sprints are heavier on migration (when Phase 1 tests are being written). Some sprints are heavier on features (during a product launch). The ratio flexes.
The one rule: never mix a feature change with a migration change in the same pull request. A PR either moves code to a new project (migration) or adds behavior (feature). Mixing them makes rollback impossible and code review meaningless. Separate concerns, separate PRs, separate reviews.
Risks and Mitigations
No strategy is risk-free. Here are the risks specific to the Strangler Fig approach, and how to handle them.
Risk: The old code and new code drift apart. When the aggregate handles 80% of use cases and the old service handles 20%, the two implementations can diverge. A bug fix in the aggregate might not be applied to the old service, or vice versa. Mitigation: The characterization tests from Phase 1 run against both paths. The test verifies that the old path and the new path produce the same output for the same input. Any divergence fails the test.
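The parity check behind that mitigation can be sketched as a small generic helper: run every recorded input through both paths and collect the inputs where the outputs differ. The helper name and signature are hypothetical. It assumes outputs have value equality (records, primitives) and it is not tied to any particular test framework.

```csharp
using System;
using System.Collections.Generic;

public static class ParityCheck
{
    // Returns the inputs for which the old path and the new path disagree.
    public static IReadOnlyList<TInput> FindDivergences<TInput, TOutput>(
        IEnumerable<TInput> inputs,
        Func<TInput, TOutput> oldPath,
        Func<TInput, TOutput> newPath)
    {
        var comparer = EqualityComparer<TOutput>.Default;
        var divergent = new List<TInput>();

        foreach (var input in inputs)
        {
            if (!comparer.Equals(oldPath(input), newPath(input)))
            {
                divergent.Add(input);
            }
        }

        return divergent;
    }
}
```

A characterization test would assert that the returned list is empty and print the offending inputs when it is not, which turns "the two paths drifted" into a concrete failing case.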
Risk: Feature flags accumulate. After six months, the codebase has twelve feature flags for different migration states. Nobody remembers which flags are safe to remove. Mitigation: Each feature flag has an expiry date in the flag configuration. After the flag has been at 100% for two sprints with no incidents, the old code path and the flag check are removed in a cleanup PR. Flag cleanup is a recurring sprint task, not a "someday" item.
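A sketch of how the cleanup rule could be made mechanical, under stated assumptions: the MigrationFlag record and its fields are hypothetical, sprints are taken to be two weeks long (so two sprints is 28 days), and the "no incidents" condition is left to human judgment rather than modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flag metadata; a real flag system would store this in its own configuration.
public sealed record MigrationFlag(
    string Name, int RolloutPercent, DateTime? FullRolloutSince, DateTime ExpiresOn);

public static class FlagCleanup
{
    // Two sprints, assuming two-week sprints.
    private static readonly TimeSpan SoakPeriod = TimeSpan.FromDays(28);

    // Flags at 100% for at least the soak period: candidates for the cleanup PR.
    public static IReadOnlyList<MigrationFlag> ReadyForRemoval(
        IEnumerable<MigrationFlag> flags, DateTime today) =>
        flags.Where(f => f.RolloutPercent == 100
                         && f.FullRolloutSince is DateTime since
                         && today - since >= SoakPeriod)
             .ToList();

    // Flags past their expiry date: someone has to decide to finish or roll back.
    public static IReadOnlyList<MigrationFlag> Expired(
        IEnumerable<MigrationFlag> flags, DateTime today) =>
        flags.Where(f => today > f.ExpiresOn).ToList();
}
```

Running a check like this as part of sprint planning gives the recurring cleanup task a concrete list to work from.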
Risk: The team loses momentum. Phase 1 (tests) is not glamorous. Writing characterization tests for a legacy system is tedious work. The team gets bored. The migration stalls. Mitigation: Celebrate Phase 1 completion. Show the team the CI dashboard going green. Show them the test count going from 12 to 200. Show them that the tests catch a real bug in the first week. Early wins build momentum. And start Phase 2 immediately -- moving files to new projects is visible, structural change that feels like progress because it is progress.
Summary
| What We Covered | Key Takeaway |
|---|---|
| Big-bang rewrites | They fail because of feature parity drift, moving targets, burnout, and dual maintenance |
| Second system effect | Rewrites accumulate scope; migration should be distinct from improvement |
| The Strangler Fig | New code wraps old code; feature flags control the switch; rollback is instant |
| Existing scaffolding | Implicit boundaries, ad-hoc ACLs, and proto-contexts are starting points, not waste |
| Six phases | Tests, BC libraries, ACLs, Value Objects, Aggregates, Domain Events -- each independently deployable |
| Phase sequencing | Tests first; every subsequent phase includes its own test deliverable |
| Context prioritization | Highest change frequency + highest bug cost + willing team + smallest surface area |
| Subscriptions as pilot | Upstream position, weekly changes, motivated team, medium surface area |
| ACLs throughout | ACLs appear in every phase, from test isolation to event contracts |
| Timeline | ~12-16 sprints for one context; each phase delivers value independently |
We have the diagnosis (Part II), the blueprint (Part III), and the strategy (this part). Now we execute.
Phase 1: write the characterization tests. Before we move a single file, rename a single namespace, or extract a single Value Object, we capture the current behavior in an automated test suite that proves the system works as it works today -- not as we wish it worked, but as it actually works. That test suite is the safety net for everything that follows.
Next: Part V: Tests First -- before touching ANY code, write characterization tests. The golden master pattern. TDD for Value Objects and Aggregates. Contract tests for ACLs. Test infrastructure: Testcontainers, WebApplicationFactory, fakes over mocks.