
Composing the Ops DSLs — A Real Deployment

"One class. Five NuGet references. Twenty-plus typed attributes. Zero wiki pages."


The Scenario

OrderService v2.4 introduces a new payment flow. The deployment is not trivial:

  • 3 applications: OrderApi (3 replicas), OrderWorker (2 replicas), AuditWorker (1 replica)
  • 2 databases: OrdersDb (schema change + index + data backfill), AuditDb (new table)
  • Canary strategy: 5% traffic for 15 minutes, then full rollout
  • Parallel DB migrations: the schema changes to OrdersDb and AuditDb run concurrently (step 1), then the index creation depends on the OrdersDb schema change (step 2), then a standalone exe migrator backfills ~2.4M rows (step 3)
  • Config transforms per environment: Staging uses a sandbox payment gateway, Production uses the live one and requires a restart
  • Health checks: payment endpoint, worker queue, database connectivity
  • Observability: latency histogram, error rate alert with runbook, Grafana dashboard
  • Rollback: automatic for schema changes, manual (with approval) for the data backfill
  • Circuit breaker: payment gateway with fallback to legacy

In a wiki-based world, this is a 3-page runbook maintained by the person who deployed last. By the time someone else reads it, half the steps are stale.

In a typed world, this is one C# class.


The Project References

The deployment project references all 5 Ops sub-DSLs plus the Document DSL:

<!-- OrderService.Deployment.csproj -->
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>net9.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
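    <!-- The 5 Ops sub-DSLs. A ProjectReference must point at a project file;
         the relative paths below assume sibling project folders and are illustrative. -->
    <ProjectReference Include="..\Ops.Deployment.Lib\Ops.Deployment.Lib.csproj" />
    <ProjectReference Include="..\Ops.Migration.Lib\Ops.Migration.Lib.csproj" />
    <ProjectReference Include="..\Ops.Observability.Lib\Ops.Observability.Lib.csproj" />
    <ProjectReference Include="..\Ops.Configuration.Lib\Ops.Configuration.Lib.csproj" />
    <ProjectReference Include="..\Ops.Resilience.Lib\Ops.Resilience.Lib.csproj" />

    <!-- The Document DSL: generates docs from the Ops attributes -->
    <ProjectReference Include="..\Cmf.Document.Lib\Cmf.Document.Lib.csproj" />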
  </ItemGroup>

</Project>

Six <ProjectReference> lines. Each brings its own attributes, source generators, and analyzers. The compiler sees all of them in a single compilation, so the analyzers can validate references that cross DSL boundaries.


The Complete Deployment Class

This is the entire OrderServiceV24Deployment class. Every attribute is a typed declaration. Every method body is an implementation detail — the metadata on the method is the source of truth for documentation, validation, and orchestration.

// ═══════════════════════════════════════════════════════════════
// OrderServiceV24Deployment.cs — One class, five sub-DSLs
// ═══════════════════════════════════════════════════════════════

using Ops.Deployment;
using Ops.Migration;
using Ops.Observability;
using Ops.Configuration;
using Ops.Resilience;
using Cmf.Document;

// ── Document DSL: generate docs from this class ──────────────
[assembly: DocumentSuite<OrderServiceV24Deployment>(
    Title = "Order Service v2.4 — Deployment Documentation",
    Formats = [DocumentFormat.Markdown, DocumentFormat.Yaml, DocumentFormat.Json],
    CrossReference = true)]

// ═══════════════════════════════════════════════════════════════
// Class-level: orchestration, apps, dependencies, environment
// ═══════════════════════════════════════════════════════════════

[DeploymentOrchestrator("v2.4.0",
    Description = "Order service v2.4 — new payment flow",
    Strategy = DeploymentStrategy.Canary)]

[DeploymentDependency(typeof(PaymentGatewayV3Deployment),
    Kind = DependencyKind.MustCompleteBefore)]

[DeploymentApp("OrderApi",
    Replicas = 3,
    RequiredConfigs = ["PaymentGateway", "OrderDb"])]

[DeploymentApp("OrderWorker", Replicas = 2)]

[DeploymentApp("AuditWorker", Replicas = 1)]

[EnvironmentMatrix("Staging", "Production",
    RequiredConfigs = ["PaymentGateway", "OrderDb", "AuditDb"],
    RequiredSecrets = ["orders/payment-gateway-api-key"])]

[Dashboard("order-v24-deployment",
    Panels = ["migration-progress", "error-rate", "latency-p99", "canary-traffic"])]

public sealed partial class OrderServiceV24Deployment
{
    // ═══════════════════════════════════════════════════════════
    // Ops.Migration — Database & data migration steps
    // ═══════════════════════════════════════════════════════════

    [MigrationStep(1,
        Database = "OrdersDb",
        Kind = MigrationKind.Schema,
        Parallel = true,
        EstimatedDuration = "30s")]
    [MigrationValidation(
        PreCondition = "Column 'PaymentMethod' does not exist",
        PostCondition = "Column 'PaymentMethod' exists with type nvarchar(50)",
        RollbackValidation = "Column 'PaymentMethod' does not exist")]
    public void AddPaymentMethodColumn()
    {
        // ALTER TABLE Orders ADD PaymentMethod NVARCHAR(50) NULL;
    }

    [MigrationStep(1,
        Database = "AuditDb",
        Kind = MigrationKind.Schema,
        Parallel = true,
        EstimatedDuration = "15s")]
    [MigrationValidation(
        PreCondition = "Table 'PaymentAudit' does not exist",
        PostCondition = "Table 'PaymentAudit' exists with columns Id, OrderId, Method, Timestamp")]
    public void AddPaymentAuditTable()
    {
        // CREATE TABLE PaymentAudit (...);
    }

    [MigrationStep(2,
        Database = "OrdersDb",
        Kind = MigrationKind.Index,
        EstimatedDuration = "2m",
        DependsOn = [nameof(AddPaymentMethodColumn)])]
    public void CreatePaymentMethodIndex()
    {
        // CREATE NONCLUSTERED INDEX IX_Orders_PaymentMethod
        //     ON Orders (PaymentMethod) INCLUDE (Id, Status);
    }

    [ExeMigration(3,
        App = "OrderMigrator",
        Description = "Backfill payment methods from legacy column",
        ExpectedRowCount = "~2.4M",
        Timeout = "00:45:00")]
    [MigrationValidation(
        PreCondition = "Column 'LegacyPaymentType' has non-null values",
        PostCondition = "Column 'PaymentMethod' has no nulls where LegacyPaymentType was non-null",
        RollbackValidation = "Column 'PaymentMethod' is all NULL")]
    public void BackfillPaymentMethods()
    {
        // OrderMigrator.exe --connection $ORDERS_DB --batch-size 10000
    }

    // ═══════════════════════════════════════════════════════════
    // Ops.Configuration — Config transforms & secrets
    // ═══════════════════════════════════════════════════════════

    [ConfigTransform("Staging", Section = "PaymentGateway")]
    public void UpdatePaymentGatewayStaging()
    {
        // PaymentGateway:BaseUrl = "https://sandbox.payments.example.com/v3"
        // PaymentGateway:TimeoutMs = 5000
        // PaymentGateway:RetryCount = 3
    }

    [ConfigTransform("Production",
        Section = "PaymentGateway",
        RequiresRestart = true)]
    public void UpdatePaymentGatewayProduction()
    {
        // PaymentGateway:BaseUrl = "https://payments.example.com/v3"
        // PaymentGateway:TimeoutMs = 3000
        // PaymentGateway:RetryCount = 5
    }

    [Secret(
        VaultPath = "orders/payment-gateway-api-key",
        Provider = SecretProvider.AzureKeyVault,
        RotationPolicy = "30d")]
    public string PaymentGatewayApiKey { get; }

    // ═══════════════════════════════════════════════════════════
    // Ops.Observability — Health checks, metrics, alerts, dashboard
    // ═══════════════════════════════════════════════════════════

    [HealthCheck("OrderApi",
        Endpoint = "/health/payment",
        Timeout = "10s",
        RetryCount = 5,
        Severity = HealthCheckSeverity.Critical)]
    public void VerifyPaymentEndpoint()
    {
        // GET /health/payment → 200 OK with { "status": "healthy" }
    }

    [HealthCheck("OrderWorker",
        Endpoint = "/health/queue",
        Kind = HealthCheckKind.Queue,
        Severity = HealthCheckSeverity.Critical)]
    public void VerifyWorkerQueue()
    {
        // Queue depth < 1000, consumer lag < 30s
    }

    [HealthCheck("OrdersDb",
        Kind = HealthCheckKind.Database,
        Timeout = "5s")]
    public void VerifyDatabaseConnectivity()
    {
        // SELECT TOP (1) 1 FROM Orders WITH (NOLOCK); validates connectivity and that the table exists
    }

    [Metric("order_payment_latency_ms",
        Kind = MetricKind.Histogram,
        Unit = "milliseconds",
        Labels = ["method", "status"],
        AlertThreshold = "p99 > 500ms")]
    public void TrackPaymentLatency()
    {
        // Histogram buckets: 10, 25, 50, 100, 250, 500, 1000, 2500
    }

    [AlertRule("OrderPaymentErrorSpike",
        Condition = "rate(order_payment_errors_total[5m]) > 0.01",
        Duration = "3m",
        Severity = AlertSeverity.Critical,
        NotifyChannels = ["oncall-slack", "oncall-pager"],
        Runbook = "runbooks/order-payment-errors.md")]
    public void MonitorPaymentErrors()
    {
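        // Read as PromQL, rate(...[5m]) is the per-second increase of the error counter
        // averaged over 5 minutes, so the alert fires above 0.01 errors/s (an absolute
        // rate, not 1% of requests), sustained for 3 minutes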
    }

    // ═══════════════════════════════════════════════════════════
    // Ops.Resilience — Rollback, canary, circuit breaker
    // ═══════════════════════════════════════════════════════════

    [RollbackProcedure(nameof(AddPaymentMethodColumn),
        Kind = RollbackKind.Automatic,
        EstimatedDuration = "10s")]
    public void RollbackPaymentColumn()
    {
        // ALTER TABLE Orders DROP COLUMN PaymentMethod;
    }

    [RollbackProcedure(nameof(AddPaymentAuditTable),
        Kind = RollbackKind.Automatic)]
    public void RollbackAuditTable()
    {
        // DROP TABLE PaymentAudit;
    }

    [RollbackProcedure(nameof(BackfillPaymentMethods),
        Kind = RollbackKind.Manual,
        RequiresApproval = true,
        PreConditions = ["No orders with new payment methods"])]
    public void RollbackPaymentBackfill()
    {
        // UPDATE Orders SET PaymentMethod = NULL WHERE PaymentMethod IS NOT NULL;
        // Requires manual approval — data backfill may have been consumed by downstream
    }

    [CanaryRule(
        TrafficPercentage = 5,
        Duration = "15m",
        SuccessMetric = "order_payment_latency_ms.p99 < 300ms",
        RollbackThreshold = "error_rate > 0.5%")]
    public void CanaryPaymentFlow()
    {
        // Route 5% of traffic to canary instances for 15 minutes
        // Auto-rollback if p99 latency exceeds 300ms or error rate > 0.5%
    }

    [CircuitBreaker("PaymentGateway",
        FailureThreshold = 3,
        OpenDuration = "60s",
        FallbackMethod = nameof(FallbackToLegacyPayment))]
    public void PaymentGatewayCircuitBreaker()
    {
        // After 3 consecutive failures, open circuit for 60s
        // During open state, route to FallbackToLegacyPayment
    }

    public void FallbackToLegacyPayment()
    {
        // Use legacy payment endpoint /v2/payments as fallback
    }
}

That is 20+ typed attributes across 5 sub-DSLs, composed on one class. Every attribute property is a documentation field. Every method name is a cross-reference target. The compiler sees all of it.


What the Compiler Catches

The 5 Ops sub-DSLs ship 12 analyzers that validate cross-DSL consistency. These fire as build errors or warnings — not at deployment time, not in a review, at compile time.

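Here is every diagnostic, illustrated with the OrderServiceV24Deployment example. Some fire on the class exactly as written (for example OPS002 for AuditWorker and OPS009 for CreatePaymentMethodIndex); the others show what the build would report if the class were changed as described.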

OPS001: Circular Deployment Dependency

error OPS001: Circular deployment dependency detected:
              OrderServiceV24Deployment → PaymentGatewayV3Deployment
              → OrderServiceV24Deployment
              [Ops.Deployment/DeploymentDependencyAnalyzer]

If PaymentGatewayV3Deployment also declared [DeploymentDependency(typeof(OrderServiceV24Deployment))], the analyzer walks the dependency graph and reports the cycle. Deployment DAGs must be acyclic — the compiler enforces it.
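
The graph walk described above can be sketched with a depth-first search. This is a minimal illustration, not the analyzer's actual implementation: a real analyzer would read [DeploymentDependency] attributes from Roslyn symbols, whereas this sketch assumes the edges have already been collected into a dictionary. The names DeploymentCycleSketch and FindCycle are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: not part of Ops.Deployment.
public static class DeploymentCycleSketch
{
    // Returns the first cycle found as a closed path (A, B, ..., A),
    // or null when the dependency graph is acyclic.
    public static IReadOnlyList<string>? FindCycle(
        IReadOnlyDictionary<string, string[]> dependsOn)
    {
        var done = new HashSet<string>(); // nodes whose subtree is fully explored
        var path = new List<string>();    // nodes on the current DFS path

        foreach (var start in dependsOn.Keys)
        {
            var cycle = Visit(start, dependsOn, done, path);
            if (cycle is not null) return cycle;
        }
        return null;
    }

    private static IReadOnlyList<string>? Visit(
        string node,
        IReadOnlyDictionary<string, string[]> dependsOn,
        HashSet<string> done,
        List<string> path)
    {
        if (done.Contains(node)) return null;

        int seenAt = path.IndexOf(node);
        if (seenAt >= 0)
        {
            // The node is already on the current path: slice out the loop and close it.
            var cycle = path.Skip(seenAt).ToList();
            cycle.Add(node);
            return cycle;
        }

        path.Add(node);
        if (dependsOn.TryGetValue(node, out var targets))
        {
            foreach (var target in targets)
            {
                var cycle = Visit(target, dependsOn, done, path);
                if (cycle is not null) return cycle;
            }
        }
        path.RemoveAt(path.Count - 1);
        done.Add(node);
        return null;
    }
}
```

Joining the returned path with " → " gives the kind of chain shown in the OPS001 message. Returning the path rather than a boolean is a deliberate choice here, since the diagnostic is only useful if it names the deployments involved.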

OPS002: App Without Health Check

error OPS002: [DeploymentApp("AuditWorker")] has no [HealthCheck] method
              targeting "AuditWorker" — every deployed app must have
              at least one health check
              in 'OrderServiceV24Deployment'
              [Ops.Observability/HealthCheckCoverageAnalyzer]

The analyzer cross-references [DeploymentApp] names with [HealthCheck] first arguments. If an app has no health check, the build fails. You cannot deploy something you cannot verify.
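
At its core, the cross-reference above is a set difference between declared app names and health check targets. The sketch below assumes both lists have already been extracted from the attributes; HealthCheckCoverageSketch and AppsWithoutHealthCheck are hypothetical names, and a real analyzer would work on Roslyn attribute data rather than plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: not part of Ops.Observability.
public static class HealthCheckCoverageSketch
{
    // Returns every [DeploymentApp] name that no [HealthCheck] targets.
    public static IReadOnlyList<string> AppsWithoutHealthCheck(
        IEnumerable<string> deploymentApps,
        IEnumerable<string> healthCheckTargets)
    {
        // Ordinal comparison is an assumption: names must match exactly, including case.
        var covered = new HashSet<string>(healthCheckTargets, StringComparer.Ordinal);
        return deploymentApps.Where(app => !covered.Contains(app)).ToList();
    }
}
```

Fed the three apps and the three health check targets from the class above (OrderApi, OrderWorker, OrdersDb), the result contains only AuditWorker, which is the app the OPS002 message names.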

OPS003: Migration Without Validation

warning OPS003: [MigrationStep] 'CreatePaymentMethodIndex' has no
                [MigrationValidation] attribute — consider adding
                pre/post conditions for safe rollback verification
                in 'OrderServiceV24Deployment'
                [Ops.Migration/MigrationValidationAnalyzer]

The index creation step has no [MigrationValidation]. The analyzer warns — it does not error, because index creation is often idempotent. But schema and data migrations without validation are errors.

OPS004: Parallel Migration Conflict

error OPS004: Parallel migration conflict: steps 'AddPaymentMethodColumn'
              and 'AddPaymentAuditTable' are both [Parallel = true]
              at step 1, but target the same database 'OrdersDb'
              — parallel migrations must target different databases
              in 'OrderServiceV24Deployment'
              [Ops.Migration/ParallelMigrationAnalyzer]

Two step-1 migrations that both target OrdersDb with Parallel = true would contend for the same schema locks and could block or deadlock each other. The analyzer validates that parallel steps target distinct databases. In our class they target OrdersDb and AuditDb respectively, so this diagnostic would fire only if someone mistakenly set both to the same database.
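
One way to express this check is to group the parallel steps by step number and database, then flag any group with more than one member. The sketch below is an assumption about the shape of the logic, not the shipped analyzer; MigrationStepInfo and ParallelMigrationSketch are hypothetical types standing in for data read from [MigrationStep].

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projection of a [MigrationStep] attribute and the method it decorates.
public sealed record MigrationStepInfo(string Method, int Order, string Database, bool Parallel);

// Hypothetical helper: not part of Ops.Migration.
public static class ParallelMigrationSketch
{
    // Returns one entry per (step number, database) pair claimed by
    // more than one parallel migration.
    public static List<(int Order, string Database, string[] Methods)> FindConflicts(
        IEnumerable<MigrationStepInfo> steps) =>
        steps.Where(s => s.Parallel)
             .GroupBy(s => (s.Order, s.Database))
             .Where(g => g.Count() > 1)
             .Select(g => (g.Key.Order, g.Key.Database, g.Select(s => s.Method).ToArray()))
             .ToList();
}
```

For the class as written, the two step-1 migrations land in different groups and the list is empty. Pointing AddPaymentAuditTable at OrdersDb puts both methods into one group, which corresponds to the OPS004 message shown above.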

OPS005: Metric Without Alert Rule

warning OPS005: [Metric("order_payment_latency_ms")] has
                [AlertThreshold = "p99 > 500ms"] but no [AlertRule]
                references this metric — threshold will not trigger
                any notification
                in 'OrderServiceV24Deployment'
                [Ops.Observability/MetricAlertAnalyzer]

A metric with a threshold but no alert rule is a dead threshold. The analyzer checks that every [Metric] with an AlertThreshold has at least one [AlertRule] whose Condition references the metric name.

OPS006: Alert Without Runbook

error OPS006: [AlertRule("OrderPaymentErrorSpike")] has
              Runbook = "runbooks/order-payment-errors.md" but
              file does not exist at that path
              in 'OrderServiceV24Deployment'
              [Ops.Observability/RunbookExistenceAnalyzer]

The analyzer resolves the Runbook path relative to the project and checks that the file exists. An alert without a runbook means an on-call engineer gets paged with no instructions. The compiler catches it.

OPS007: Missing Secret for Config Section

error OPS007: [ConfigTransform("Production", Section = "PaymentGateway")]
              references config section 'PaymentGateway' which requires
              secret 'PaymentGateway:ApiKey', but no [Secret] attribute
              provides a vault path for this key
              in 'OrderServiceV24Deployment'
              [Ops.Configuration/SecretCoverageAnalyzer]

The analyzer cross-references [ConfigTransform] sections with [Secret] vault paths. If a config section references a key that needs a secret and no [Secret] attribute provides it, the build fails.

OPS008: Environment Matrix Gap

error OPS008: [EnvironmentMatrix] declares environment "Production"
              but no [ConfigTransform("Production", ...)] exists for
              required config section "AuditDb"
              in 'OrderServiceV24Deployment'
              [Ops.Configuration/EnvironmentMatrixAnalyzer]

The [EnvironmentMatrix] declares required configs per environment. The analyzer checks that every environment has a [ConfigTransform] for every required config section. A missing transform means the deployment will use the wrong config — caught at compile time.

OPS009: Migration Without Rollback

error OPS009: [MigrationStep] 'CreatePaymentMethodIndex' has no
              [RollbackProcedure] targeting it — every migration
              step must have a rollback procedure
              in 'OrderServiceV24Deployment'
              [Ops.Resilience/RollbackCoverageAnalyzer]

The analyzer matches [MigrationStep] method names with [RollbackProcedure] first arguments. If a migration has no rollback, the build fails. You cannot deploy a migration you cannot undo.

OPS010: Canary Without Success Metric

error OPS010: [CanaryRule] references SuccessMetric
              "order_payment_latency_ms.p99 < 300ms" but no [Metric]
              named "order_payment_latency_ms" exists in this class
              in 'OrderServiceV24Deployment'
              [Ops.Resilience/CanaryMetricAnalyzer]

The [CanaryRule] SuccessMetric property references a metric by name. The analyzer parses the metric name from the expression and verifies that a [Metric] attribute with that name exists. A canary rule referencing a non-existent metric would silently pass — the compiler prevents it.
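
The name extraction step could look like the sketch below. It assumes the expression always starts with the metric name, followed by an aggregation such as .p99; the article does not specify the expression grammar, so the regular expression is a guess. CanaryMetricSketch, ExtractMetricName, and IsDeclared are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: not part of Ops.Resilience.
public static class CanaryMetricSketch
{
    // Assumed grammar: the expression begins with an identifier made of
    // letters, digits, and underscores, e.g. "order_payment_latency_ms".
    private static readonly Regex LeadingName =
        new Regex(@"^\s*([A-Za-z_][A-Za-z0-9_]*)", RegexOptions.Compiled);

    // Returns the leading metric name, or null if the expression has none.
    public static string? ExtractMetricName(string expression)
    {
        var match = LeadingName.Match(expression);
        return match.Success ? match.Groups[1].Value : null;
    }

    // True when the metric named in the expression is among the declared [Metric] names.
    public static bool IsDeclared(string successMetric, IEnumerable<string> declaredMetrics)
    {
        var name = ExtractMetricName(successMetric);
        return name is not null && declaredMetrics.Contains(name);
    }
}
```

With the SuccessMetric from the class, ExtractMetricName yields order_payment_latency_ms. If the [Metric] were renamed, IsDeclared would return false, which is the condition OPS010 reports.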

OPS011: Circuit Breaker Without Fallback

error OPS011: [CircuitBreaker("PaymentGateway")] references
              FallbackMethod = "FallbackToLegacyPayment" but no
              method named 'FallbackToLegacyPayment' exists
              in 'OrderServiceV24Deployment'
              [Ops.Resilience/CircuitBreakerAnalyzer]

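The FallbackMethod value is written with nameof(), so it reaches the analyzer as a compile-time string constant; the analyzer then looks up a method with that name on the same class. If the method is renamed or removed, the build breaks immediately, not when the circuit opens in production at 3 AM.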

OPS012: Cross-DSL Reference Target Not Found

error OPS012: [RollbackProcedure(nameof(AddPaymentAuditTable))]
              references migration step 'AddPaymentAuditTable' but
              no [MigrationStep] method with that name exists
              in 'OrderServiceV24Deployment'
              [Ops.Resilience/CrossDslReferenceAnalyzer]

The most powerful diagnostic. Rollback references migration. Canary references metric. Alert references runbook. Config transform references secret. The analyzer validates every cross-DSL reference — if the target is renamed, moved, or deleted, the build fails.

Summary Table

| ID | Severity | DSL | Validates |
|---|---|---|---|
| OPS001 | Error | Deployment | Dependency DAG is acyclic |
| OPS002 | Error | Deployment + Observability | Every app has a health check |
| OPS003 | Warning | Migration | Migration steps have validation |
| OPS004 | Error | Migration | Parallel steps target different databases |
| OPS005 | Warning | Observability | Metrics with thresholds have alerts |
| OPS006 | Error | Observability | Alert runbook files exist |
| OPS007 | Error | Configuration | Config sections have required secrets |
| OPS008 | Error | Configuration | Every environment has all required transforms |
| OPS009 | Error | Migration + Resilience | Every migration has a rollback |
| OPS010 | Error | Resilience + Observability | Canary rules reference existing metrics |
| OPS011 | Error | Resilience | Circuit breakers have fallback methods |
| OPS012 | Error | Cross-DSL | All cross-DSL nameof() targets resolve |

Twelve diagnostics. Ten are errors — the build does not succeed. Two are warnings — they appear in IDE and build output. All fire at compile time.


Cross-DSL Validation in Action

The most valuable diagnostics are the ones that span multiple sub-DSLs. These are impossible to catch with single-DSL validation. They require the compiler to see all attributes on one class and verify that references between them are consistent.

Migration without Rollback

The AddPaymentAuditTable migration step needs a matching [RollbackProcedure]. Remove RollbackAuditTable() and the build breaks:

error OPS009: [MigrationStep] 'AddPaymentAuditTable' has no
              [RollbackProcedure] targeting it — every migration
              step must have a rollback procedure
              in 'OrderServiceV24Deployment'

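This crosses Ops.Migration and Ops.Resilience. No check that looks at only one sub-DSL can catch it: the missing link is visible only when [MigrationStep] and [RollbackProcedure] are seen together. The rollback coverage analyzer (OPS009) reports a migration with no rollback; the cross-DSL reference analyzer (OPS012) reports the reverse case, a rollback whose target is not a migration step.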

Canary References Metric

The [CanaryRule] declares SuccessMetric = "order_payment_latency_ms.p99 < 300ms". The analyzer parses the metric name (order_payment_latency_ms) and looks for a [Metric] attribute with that name:

// CanaryRule (Ops.Resilience) references:
[CanaryRule(SuccessMetric = "order_payment_latency_ms.p99 < 300ms")]

// Metric (Ops.Observability) provides:
[Metric("order_payment_latency_ms", Kind = MetricKind.Histogram)]

Rename the metric and the canary rule breaks at compile time — not during a canary deployment when the success metric silently evaluates to nothing and the canary passes with a broken payment flow.

Alert References Runbook

The [AlertRule] declares Runbook = "runbooks/order-payment-errors.md". The analyzer checks that this file exists relative to the project root:

error OPS006: [AlertRule("OrderPaymentErrorSpike")] has
              Runbook = "runbooks/order-payment-errors.md" but
              file does not exist at that path

Delete the runbook, the build fails. Move the runbook, the build fails. The on-call engineer always has instructions because the compiler ensures the runbook exists.

Environment Matrix Validates Secrets

The [EnvironmentMatrix] declares RequiredSecrets = ["orders/payment-gateway-api-key"]. The analyzer checks that a [Secret] attribute provides a vault path matching that key:

// EnvironmentMatrix (Ops.Configuration) requires:
[EnvironmentMatrix(RequiredSecrets = ["orders/payment-gateway-api-key"])]

// Secret (Ops.Configuration) provides:
[Secret(VaultPath = "orders/payment-gateway-api-key",
    Provider = SecretProvider.AzureKeyVault)]

Remove the [Secret] property and the build fails:

error OPS007: [EnvironmentMatrix] requires secret
              "orders/payment-gateway-api-key" but no [Secret]
              attribute provides this vault path
              in 'OrderServiceV24Deployment'

The compiler validates that the secret your config needs actually has a vault path, a provider, and a rotation policy — before the deployment runs.

The Cross-Reference Graph

All of these validations form a graph. The analyzer walks it:

AddPaymentMethodColumn ←── RollbackPaymentColumn         (OPS009/OPS012)
AddPaymentAuditTable   ←── RollbackAuditTable             (OPS009/OPS012)
BackfillPaymentMethods ←── RollbackPaymentBackfill        (OPS009/OPS012)
CreatePaymentMethodIndex ── (no rollback)                  (OPS009: error)

order_payment_latency_ms ←── CanaryPaymentFlow            (OPS010/OPS012)
order_payment_latency_ms ←── AlertThreshold                (OPS005)
OrderPaymentErrorSpike   ──→ runbooks/order-payment-errors.md  (OPS006)

PaymentGateway config    ←── PaymentGatewayApiKey secret   (OPS007)
EnvironmentMatrix        ──→ ConfigTransform per env       (OPS008)

OrderApi                 ←── VerifyPaymentEndpoint         (OPS002)
OrderWorker              ←── VerifyWorkerQueue             (OPS002)
AuditWorker              ←── (no health check)             (OPS002: error)

PaymentGateway breaker   ──→ FallbackToLegacyPayment      (OPS011/OPS012)

Every arrow is a reference the analyzers verify at compile time, whether it is a nameof() target, a metric name inside an expression, or a file path. Remove a node, and every arrow pointing to it becomes a build diagnostic.


What's Next

The class is written. The compiler validates it. But we have not yet seen what gets generated from it.

Part VI shows the real outputs: not just a Markdown runbook, but a deployment DAG in Mermaid, Grafana dashboard JSON with pre-configured panels, Prometheus alert YAML ready for kubectl apply, Kubernetes readiness probes, and Helm values — all generated from the same 20+ attributes we just wrote. Side-by-side with the wiki runbook it replaces.