The Vision -- Everything Is a DSL
Twenty-five parts. Twenty-two DSLs. Three tiers. Eight shared meta-primitives. One thesis: no domain of operational knowledge exists that cannot be expressed as a typed DSL.
This part draws the conclusion.
The Thesis Proven
Consider what the twenty-two DSLs cover:
- How services are deployed (Deployment)
- How databases evolve (Migration)
- How systems are monitored (Observability)
- How configuration is managed and how failures are handled (Configuration + Resilience)
- How fast things must be (Performance)
- How much load they must sustain (Load Testing)
- What happens when things break (Chaos)
- Who can access what (Security)
- How quality is verified (Testing, Quality)
- What infrastructure exists (Infrastructure)
- How networks are configured (Networking)
- How data is governed (Data Governance)
- What regulations apply (Compliance)
- Where dependencies come from (Supply Chain)
- How much it costs (Cost)
- How it scales (Capacity)
- Who responds when it fails (Incident)
- How APIs evolve (ApiContract)
- How environments stay consistent (EnvironmentParity)
- How components age and die (Lifecycle)
Every one of these was, before this series, a wiki page. Or a spreadsheet. Or tribal knowledge. Or a YAML file copy-pasted from Stack Overflow. Or nothing at all -- just an assumption that someone would figure it out when the time came.
Every one of these is now a set of C# attributes with IntelliSense, a source generator that produces artifacts, and an analyzer that validates correctness. The pattern is the same for all twenty-two: declare, compile, generate, validate. The specifics differ. The architecture does not.
If the pattern works for deployment, observability, chaos engineering, compliance, cost management, and incident response -- domains that seem to have nothing in common -- then the pattern works for any operational concern. The thesis holds: every domain of operational knowledge can be expressed as a typed DSL.
Every Wiki Page Is a DSL That Has Not Been Written Yet
Pick any wiki page in your organization:
- "On-call rotation schedule" -- that is
[OnCallRotation]. - "API versioning policy" -- that is
[ApiVersionPolicy]+[BreakingChangeGuard]. - "Production deployment runbook" -- that is
[DeploymentApp]+[CanaryStrategy]+[RollbackPlan]. - "Data retention policy" -- that is
[RetentionPolicy]+[GdprDataMap]. - "Incident severity definitions" -- that is
[IncidentSeverity]with response times and notification channels. - "Feature flag inventory" -- that is
[FeatureFlag]with sunset dates and owner features. - "Tech debt backlog" -- that is
[TechDebtItem]with priorities, deadlines, and categories. - "SLO targets" -- that is
[ServiceLevelObjective]with targets and windows.
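For a feel of the shape, a wiki row like "checkout availability: 99.9% over 30 days" might become something like the following. This is an illustrative sketch: the property names (Target, Window, Indicator) and the indicator string are assumptions, not definitions from the series.

```csharp
// Hypothetical sketch: the "SLO Targets" wiki page as a compiled declaration.
// Property names and the indicator string are illustrative assumptions.
[ServiceLevelObjective("checkout-availability",
    Target = 99.9,                      // percent, over the rolling window
    Window = "30d",                     // rolling 30-day window
    Indicator = "http_success_ratio")]  // the SLI this objective measures
public partial class CheckoutServiceOps { }
```

The point is not the exact property set; it is that the same facts the wiki row held now sit where the compiler, the generator, and the analyzer can all see them.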
The wiki page and the attribute contain the same information. The difference is what happens after the information is written.
The wiki page sits. It is read (maybe) by whoever remembers it exists. It is updated (maybe) by whoever feels responsible. It is validated (never) against the actual state of the system. When the system changes and the wiki does not, nobody knows. When the wiki changes and the system does not, nobody knows either.
The attribute is compiled. The source generator produces artifacts. The analyzer validates consistency. When the attribute changes, the generated artifacts change. When the system changes without the attribute changing, the analyzer flags the inconsistency. The feedback loop is measured in milliseconds, not in "whenever someone notices."
Every spreadsheet that tracks operational knowledge -- on-call schedules, feature flag inventories, dependency lists, cost budgets, tech debt backlogs -- is a source generator waiting to happen. The spreadsheet is the manual version of what the generator automates. The difference is that the spreadsheet can be wrong silently, and the generator cannot.
Start Today: The Three-Tier On-Ramp
The three-tier model is not an all-or-nothing commitment. It is an on-ramp.
Day 1: InProcess tier. Add a [ChaosExperiment(Tier = InProcess)] attribute to your test project. The source generator emits a DI decorator that injects timeouts. Run dotnet test. You are doing chaos engineering. No Docker. No Terraform. No Kubernetes. No cloud account. No budget approval. No infrastructure team buy-in.
```csharp
// This is all it takes to start.
[ChaosExperiment("payment-timeout",
    Tier = Tier.InProcess,
    FaultKind = FaultKind.Timeout,
    TargetService = "IPaymentGateway",
    Duration = "30s",
    Hypothesis = "Circuit breaker trips, order returns 503")]
public partial class OrderServiceOps { }
```

Add [HealthCheck] attributes. The generator emits IHealthCheck implementations. Add [CircuitBreaker] attributes. The generator emits Polly decorators. Add [PerformanceBudget] attributes. The generator emits benchmark assertions. All InProcess. All compiled. All running in dotnet test with no external dependencies.
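To make "the generator emits IHealthCheck implementations" concrete, here is a sketch of what a generator might emit for a hypothetical [HealthCheck("payments-db")] attribute. The IPaymentsDb interface, class name, and probe shape are all illustrative assumptions; only the IHealthCheck contract comes from Microsoft.Extensions.Diagnostics.HealthChecks.

```csharp
// Hypothetical generator output for a [HealthCheck("payments-db")] attribute.
// IPaymentsDb and the class name are illustrative; real output may differ.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public interface IPaymentsDb
{
    Task PingAsync(CancellationToken ct);
}

public sealed class PaymentsDbHealthCheck : IHealthCheck
{
    private readonly IPaymentsDb _db;

    public PaymentsDbHealthCheck(IPaymentsDb db) => _db = db;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken ct = default)
    {
        // Probe the dependency; any exception is reported as Unhealthy.
        try
        {
            await _db.PingAsync(ct);
            return HealthCheckResult.Healthy("payments-db reachable");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("payments-db unreachable", ex);
        }
    }
}
```

Because it is generated, this class never drifts from the attribute that declared it; delete the attribute and the health check disappears from the next build.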
Month 1: Container tier. When the InProcess experiments prove their value, add Container-tier attributes. The generator emits docker-compose.yaml, prometheus.yaml, toxiproxy-config.json, and k6-load-test.js. Run docker compose up. You have a local observability stack, chaos injection at the network level, and load testing. Still no cloud. Still no Terraform. Still no budget discussion.
Quarter 1: Cloud tier. When the Container experiments prove their value to the team, add Cloud-tier attributes. The generator emits Terraform modules, Kubernetes manifests, LitmusChaos CRDs, and HPA configurations. The deployment pipeline applies them. Now you have production-grade operational specifications generated from the same attributes that started as InProcess unit tests.
The progression is gradual. Each tier adds value independently. A team that only uses InProcess tier is still better off than a team with no chaos engineering at all. A team that uses InProcess and Container tiers but deploys to a simple PaaS without Kubernetes still gets local load testing, chaos injection, and observability.
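To make the promotion concrete: in this sketch, moving the Day 1 experiment up a tier is a one-property change. The property names follow the earlier example; what the Container tier actually targets (Toxiproxy here) is the series' stated design, but the exact attribute shape remains an assumption.

```csharp
// Same experiment, promoted from InProcess to Container tier.
// The generator now targets network-level injection (Toxiproxy)
// instead of an in-process DI decorator. Shape is illustrative.
[ChaosExperiment("payment-timeout",
    Tier = Tier.Container,              // was Tier.InProcess on Day 1
    FaultKind = FaultKind.Timeout,
    TargetService = "IPaymentGateway",
    Duration = "30s",
    Hypothesis = "Circuit breaker trips, order returns 503")]
public partial class OrderServiceOps { }
```

The hypothesis, fault kind, and target carry over unchanged; only the execution substrate moves.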
The Cost Curve
DSLs have a cost structure that is the inverse of a document's.
Documents are cheap on Day 1. Open a wiki page. Write some markdown. Five minutes. Done. The on-call rotation is documented. The API versioning policy is written. The deployment runbook exists.
Documents are expensive on every subsequent day. The on-call rotation changes when someone leaves. The wiki page is not updated. The API adds an endpoint. The versioning policy wiki is not updated. The deployment procedure changes because of a new health check. The runbook is not updated. Each day that passes, the document drifts further from reality. The maintenance cost is paid in confusion, incidents, and "the wiki is wrong" messages in Slack.
DSLs are expensive on Day 1. Define the attributes. Write the source generators. Write the analyzers. Build the three-tier code generation pipeline. This is not a five-minute task. It is an engineering investment.
DSLs are cheap on every subsequent day. The on-call rotation changes: update the attribute, the PagerDuty config regenerates. The API adds an endpoint: the analyzer checks the breaking change guard, the consumer contracts are verified, the deprecation headers are updated. The deployment procedure changes: the attribute changes, the generated Kubernetes manifest changes, the generated runbook changes. The maintenance cost is zero because there is nothing to maintain -- the generated artifacts are always derived from the attributes.
For a service that lives six months, the document wins. The DSL investment does not pay off.
For a service that lives six years, the DSL wins by a wide margin. Six years of document drift versus six years of compiled specifications. Six years of "check the wiki" versus six years of "check the analyzer output." Six years of "I think the runbook is up to date" versus six years of "the runbook is generated from the deployment attributes, so by definition it matches."
Most production services live longer than six months. The economics are not close.
The Specification Maturity Model
Where is your organization?
Level 0: No Specs. Operational knowledge is tribal. The senior engineer knows how to deploy. The on-call engineer knows who to call. When the senior engineer leaves, the knowledge leaves. When the on-call engineer is sick, the knowledge is unavailable. Jira tickets describe features. Hope describes operations.
Level 1: Documents. Operational knowledge is written down. Wiki pages, ADRs, runbooks, Google Docs. Better than nothing. Worse than it appears because the documents are snapshots that diverge from reality at the speed of development.
Level 2: Structured Documents. Operational knowledge uses consistent formats. OpenAPI specs for APIs. Markdown templates for runbooks. YAML for configuration. The structure enables tooling -- linters, validators, diff tools. But the documents are still separate from the code. They still drift.
Level 3: Typed Domain Specs. The domain model is typed. DDD attributes declare aggregate roots, entities, and domain events. The Requirements DSL declares features with acceptance criteria. The Content DSL declares content types with typed fields. The domain is compiled. The operations are still documents.
Level 4: Typed Operational Specs. The twenty-two Ops DSLs. Deployment, observability, chaos, security, compliance, cost, capacity, incident management -- all typed, all compiled, all validated. The operations are as typed as the domain. The gap between "what the code does" and "how the code operates" is closed.
Level 5: Everything Typed. Every operational concern is a compiled, validated, generated DSL. No wiki pages for operational knowledge. No spreadsheets for tracking. No tribal knowledge for procedures. The type system is the single source of truth for both domain logic and operational behavior. The compiler validates everything. The generators produce everything. The organization operates at the speed of dotnet build.
Most organizations are at Level 1 or 2. The CMF with its six dev-side DSLs is at Level 3. The Ops DSL ecosystem moves to Level 4. Level 5 is the vision: a codebase where opening any service's ops class tells you everything about how it is deployed, monitored, tested, secured, scaled, and recovered -- and all of it is validated by the compiler.
The Blog Ecosystem Map
This series does not stand alone. It connects to every other series:
Convention Over Convention (the meta-pattern). The Ops DSLs follow the same meta-pattern as the dev-side DSLs: attributes declare intent, source generators produce implementations, analyzers validate constraints. The architectural decision to use C# attributes + source generators + Roslyn analyzers is not specific to operations. It is the universal pattern. The convention-over-convention series established why this pattern works. The Ops DSL series proves it works for operational concerns.
Typed System Auto-Documentation. The first five Ops DSLs (Deployment, Migration, Observability, Configuration, Resilience) were introduced as examples of typed specifications that generate their own documentation. The auto-documentation series showed that the generated Grafana dashboard IS the observability documentation. The generated Kubernetes manifest IS the deployment documentation. The Ops DSL series extends this to all twenty-two operational concerns.
Spec-Driven vs. Typed Specs. The spec-driven series established the layer hierarchy: Layer 0 (no specs), Layer 1 (documents), Layer 2 (structured specs), Layer 3 (typed specs). The Ops DSL series is Layer 3 for operations. The promise of the spec-driven article -- "what if your specs were compiled?" -- is fulfilled here for every operational domain.
Feature Tracking (Requirements DSL). The Requirements DSL declares features with acceptance criteria. The Ops DSLs link to those criteria through [OpsRequirementLink]. The chaos experiment validates an acceptance criterion. The performance budget implements an acceptance criterion. The compliance control documents an acceptance criterion. The traceability is end-to-end: from feature request to acceptance criterion to operational specification to generated artifact.
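A traceability link of this kind might look like the following sketch. The acceptance-criterion identifier "ORD-042/AC-3" is invented for illustration, and the exact [OpsRequirementLink] signature is an assumption.

```csharp
// Hypothetical: a chaos experiment linked to the acceptance
// criterion it validates. The criterion ID is illustrative.
[OpsRequirementLink("ORD-042/AC-3")]
[ChaosExperiment("payment-timeout",
    Tier = Tier.InProcess,
    FaultKind = FaultKind.Timeout,
    TargetService = "IPaymentGateway",
    Hypothesis = "Circuit breaker trips, order returns 503")]
public partial class OrderServiceOps { }
```

With the link compiled in, an analyzer can verify that the criterion exists in the Requirements DSL and flag experiments that validate nothing.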
CMF (Domain DSLs). The six domain DSLs are the other half of the type system. DDD defines the service boundary. Content defines the data. Admin defines the interface. Pages define the presentation. Workflow defines the process. Requirements define the success criteria. The Ops DSLs define how all of it operates in production. The M3 metamodel registry unifies both sides.
What This Is Not
This is not a framework you download from NuGet. It is an architecture. The twenty-two DSLs are designs -- attribute definitions, generator patterns, analyzer rules -- that demonstrate what the approach looks like when applied to every operational concern.
Building all twenty-two DSLs is a large engineering investment. No team should build all twenty-two at once. The recommendation:
Start with the DSLs that address your most painful operational gap. If incidents are your pain, start with Incident + Observability. If deployment failures are your pain, start with Deployment + Migration + Resilience. If compliance audits are your pain, start with Compliance + Security + Data Governance.
Build InProcess tier first. Every DSL has an InProcess tier that requires zero infrastructure. The decorators, health checks, policies, and validators work in dotnet test. The ROI is immediate.

Add Container tier when InProcess proves value. Docker Compose, Prometheus, Toxiproxy, k6. The jump from "unit test with chaos injection" to "local environment with network-level chaos" is meaningful but not expensive.
Add Cloud tier when Container proves value. Terraform modules, Kubernetes manifests, production monitoring configs. This is where the deployment pipeline consumes the generated artifacts.
Add cross-DSL analyzers when multiple DSLs coexist. The real power is not in any single DSL. It is in the cross-validation: circuit breaker without chaos experiment (build failure), autoscale without cost budget (build warning), P1 severity without escalation policy (build failure). Each cross-DSL analyzer catches a class of operational mistakes that no single tool catches.
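In Roslyn terms, each cross-DSL rule is just a diagnostic. A sketch of how the three rules above might be declared follows; the rule IDs, messages, and category are invented for illustration, while DiagnosticDescriptor itself is the standard Roslyn type.

```csharp
using Microsoft.CodeAnalysis;

// Hypothetical diagnostic descriptors for the three cross-DSL rules.
// IDs, titles, and messages are illustrative assumptions.
public static class CrossDslRules
{
    public static readonly DiagnosticDescriptor CircuitBreakerWithoutChaos =
        new(id: "OPS001",
            title: "Circuit breaker has no chaos experiment",
            messageFormat: "'{0}' declares [CircuitBreaker] but no [ChaosExperiment] verifies it",
            category: "OpsDsl",
            defaultSeverity: DiagnosticSeverity.Error,    // build failure
            isEnabledByDefault: true);

    public static readonly DiagnosticDescriptor AutoscaleWithoutBudget =
        new(id: "OPS002",
            title: "Autoscale has no cost budget",
            messageFormat: "'{0}' declares autoscaling but no [ResourceBudget]",
            category: "OpsDsl",
            defaultSeverity: DiagnosticSeverity.Warning,  // build warning
            isEnabledByDefault: true);

    public static readonly DiagnosticDescriptor P1WithoutEscalation =
        new(id: "OPS003",
            title: "P1 severity has no escalation policy",
            messageFormat: "'{0}' defines a P1 severity but no [EscalationPolicy]",
            category: "OpsDsl",
            defaultSeverity: DiagnosticSeverity.Error,    // build failure
            isEnabledByDefault: true);
}
```

The severity choice encodes the policy in the table of rules itself: a missing chaos experiment fails the build, a missing cost budget only warns.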
The Final Accounting
What the twenty-two DSLs replace:
| Before | After |
|---|---|
| Wiki page: "Deployment Runbook" | [DeploymentApp] + [CanaryStrategy] + [RollbackPlan] |
| Wiki page: "On-Call Schedule" | [OnCallRotation] + [EscalationPolicy] |
| Wiki page: "SLO Targets" | [ServiceLevelObjective] + [PerformanceBudget] |
| Wiki page: "Incident Response" | [IncidentSeverity] + generated response guide |
| Spreadsheet: "Feature Flags" | [FeatureFlag] with sunset dates and typed owners |
| Spreadsheet: "Tech Debt" | [TechDebtItem] with deadlines enforced by analyzer |
| Spreadsheet: "Cost Budget" | [ResourceBudget] + [RightSizing] |
| Google Doc: "Post-Mortem Template" | [PostMortemTemplate] + generated markdown |
| Google Doc: "API Versioning Policy" | [ApiVersionPolicy] + [BreakingChangeGuard] |
| Confluence: "Compliance Controls" | [ComplianceFramework] + generated evidence |
| Confluence: "Data Retention" | [RetentionPolicy] + [GdprDataMap] |
| PagerDuty console: "Escalation" | [EscalationPolicy] + generated PagerDuty config |
| Grafana UI: "Dashboard" | [Dashboard] + generated JSON |
| Prometheus UI: "Alert Rules" | [AlertRule] + generated Prometheus rules |
| Helm values.yaml | [DeploymentApp] + [ContainerSpec] + generated values |
| Terraform by hand | [Infrastructure] + [Networking] + generated .tf files |
| Copy-pasted k6 scripts | [LoadProfile] + generated k6 scripts |
| Undocumented chaos tests | [ChaosExperiment] + generated LitmusChaos CRDs |
Every row in that table is the same transformation: from a document that drifts to an attribute that compiles.
Closing
The compiler does not go on vacation. It does not forget to update the wiki. It does not lose track of which spreadsheet has the current on-call rotation. It does not accept a post-mortem without action items. It does not allow a P1 severity definition without a response time target. It does not let a feature flag outlive its sunset date. It does not let a circuit breaker exist without a chaos experiment to verify it works.
Documents sleep. Types do not.
Every operational concern that can be described in a wiki page can be described in a C# attribute. The attribute is harder to write the first time. The attribute is easier to maintain every time after that. For systems that live for years -- which is most systems that matter -- the maintenance cost dominates the creation cost.
The twenty-two Ops DSLs are twenty-two demonstrations of this principle. Twenty-two domains that seemed too "soft" for the type system -- on-call rotations, post-mortem templates, cost budgets, compliance evidence -- and twenty-two proofs that the type system handles them cleanly.
The vision is not "build all twenty-two DSLs." The vision is "recognize that every piece of operational knowledge is a specification, and every specification is better when it is typed, compiled, and validated." Start with one. The compiler will show you where to go next.
The compiler does not sleep. The wiki does.