
The Ops Meta-DSL -- Architecture of an Ecosystem

"One big namespace fails in DDD for the same reason one big Ops.Lib fails: the concepts are too different to share a vocabulary."


Why One Big Ops.Lib Fails

The Auto-Documentation series introduced 5 Ops sub-DSLs as a sketch: Deployment, Migration, Observability, Configuration, Resilience. The sketch worked because 5 DSLs fit in one mental model.

At 22 DSLs, a single Ops.Lib becomes a monolith. The problems compound:

Vocabulary collision. Threshold means "error rate boundary" in Observability, "cost limit" in Cost, "latency budget" in Performance, "scaling trigger" in Scaling. One ThresholdAttribute with 15 optional properties is a god-attribute.

Generator bloat. A single Source Generator that handles deployment DAGs, chaos decorators, Terraform modules, k6 scripts, compliance matrices, and backup schedules is unmaintainable. Each concern has different Roslyn analysis patterns, different output formats, different test strategies.

Dependency graph. A team that needs only chaos engineering should not transitively depend on compliance, cost management, and data pipeline libraries. NuGet dependency hygiene matters.

Analyzer noise. 100+ analyzer rules in a single package means every build runs every rule. A team that does not use Ops.Compliance should not see OPS080-OPS089 diagnostics.

The solution: 22 independently publishable NuGet packages, each with its own Lib, Generators, and Analyzers. One shared kernel for the 8 primitives. Two meta-packages for convenience.


The NuGet Architecture

Ops.Primitives.Lib                ← shared kernel (8 primitives, enums, base types)
│
├── Ops.Deployment.Lib            ← orchestration, ordering, dependencies, strategies
│   ├── Ops.Deployment.Generators
│   └── Ops.Deployment.Analyzers
│
├── Ops.Migration.Lib             ← DB schema/data/index/exe migrations, parallelism
│   ├── Ops.Migration.Generators
│   └── Ops.Migration.Analyzers
│
├── Ops.Observability.Lib         ← health checks, metrics, alerts, dashboards
│   ├── Ops.Observability.Generators
│   └── Ops.Observability.Analyzers
│
├── Ops.Configuration.Lib         ← env transforms, secrets, rotation, validation
│   ├── Ops.Configuration.Generators
│   └── Ops.Configuration.Analyzers
│
├── Ops.Resilience.Lib            ← rollback, canary, circuit breaker, blue-green
│   ├── Ops.Resilience.Generators
│   └── Ops.Resilience.Analyzers
│
├── Ops.Chaos.Lib                 ← experiments, fault injection, steady-state
│   ├── Ops.Chaos.Generators
│   └── Ops.Chaos.Analyzers
│
├── Ops.Performance.Lib           ← budgets, load profiles, baselines
│   ├── Ops.Performance.Generators
│   └── Ops.Performance.Analyzers
│
├── Ops.Security.Lib              ← policies, scans, compliance, headers
│   ├── Ops.Security.Generators
│   └── Ops.Security.Analyzers
│
├── Ops.Cost.Lib                  ← budgets, alerts, optimization
│   ├── Ops.Cost.Generators
│   └── Ops.Cost.Analyzers
│
├── Ops.Network.Lib               ← policies, DNS, TLS, certificates
│   ├── Ops.Network.Generators
│   └── Ops.Network.Analyzers
│
├── Ops.Storage.Lib               ← backup, retention, replication
│   ├── Ops.Storage.Generators
│   └── Ops.Storage.Analyzers
│
├── Ops.Scaling.Lib               ← rules, limits, predictions
│   ├── Ops.Scaling.Generators
│   └── Ops.Scaling.Analyzers
│
├── Ops.Sla.Lib                   ← SLI/SLO/SLA, error budgets, status pages
│   ├── Ops.Sla.Generators
│   └── Ops.Sla.Analyzers
│
├── Ops.Incident.Lib              ← runbooks, escalation, postmortem
│   ├── Ops.Incident.Generators
│   └── Ops.Incident.Analyzers
│
├── Ops.Compliance.Lib            ← matrices, audits, evidence
│   ├── Ops.Compliance.Generators
│   └── Ops.Compliance.Analyzers
│
├── Ops.FeatureFlag.Lib           ← flags, rollouts, A/B experiments
│   ├── Ops.FeatureFlag.Generators
│   └── Ops.FeatureFlag.Analyzers
│
├── Ops.DataPipeline.Lib          ← ETL, streaming, data quality
│   ├── Ops.DataPipeline.Generators
│   └── Ops.DataPipeline.Analyzers
│
├── Ops.Essentials.Lib            ← meta-package: curated 7 (Deployment, Migration,
│                                    Observability, Configuration, Resilience,
│                                    Chaos, Performance)
│
└── Ops.Lib                       ← meta-package: all 22 sub-DSLs

Each sub-DSL *.Lib project depends only on Ops.Primitives.Lib (the two meta-packages are the exception: they exist only to reference other packages). Each *.Generators project depends on its own *.Lib plus Roslyn APIs. Each *.Analyzers project depends on its own *.Lib plus Roslyn diagnostic APIs. No cross-dependencies between sub-DSLs at the Lib level.

Cross-DSL validation happens at the Analyzer level via symbol resolution: the analyzer scans the compilation for attributes from other DSLs and validates references without taking a NuGet dependency on those DSLs.


Ops.Essentials vs. Ops.Lib

<!-- Option A: Everything — research teams, platform teams -->
<PackageReference Include="Ops.Lib" Version="1.0.0" />

<!-- Option B: Curated 7 — most product teams -->
<PackageReference Include="Ops.Essentials.Lib" Version="1.0.0" />

<!-- Option C: Just what you need — minimalists -->
<PackageReference Include="Ops.Chaos.Lib" Version="1.0.0" />
<PackageReference Include="Ops.Observability.Lib" Version="1.0.0" />

Ops.Essentials includes the 7 DSLs that virtually every production service needs:

DSL             Why It's Essential
Deployment      Every service deploys
Migration       Every service with a database migrates
Observability   Every service needs health checks and metrics
Configuration   Every service has environment-specific config
Resilience      Every service needs rollback and circuit breaking
Chaos           Every service benefits from fault injection testing
Performance     Every service has latency requirements

The remaining sub-DSLs are situational: not every service needs compliance matrices, cost budgets, or data pipeline quality gates. Teams add them as needed.


The 8 Shared Primitives

The Ops.Primitives.Lib package defines 8 concepts registered in the M3 MetamodelRegistry. Every sub-DSL uses these primitives as building blocks. They are the kernel of the Ops ecosystem.

1. OpsTarget

What the operational concern applies to.

[MetaConcept("OpsTarget", Description = "The infrastructure target of an operational concern")]
public enum OpsTarget
{
    [MetaProperty(Description = "A deployed application or microservice")]
    Application,

    [MetaProperty(Description = "A database instance (SQL, NoSQL, graph)")]
    Database,

    [MetaProperty(Description = "A message broker (RabbitMQ, Kafka, SQS)")]
    Queue,

    [MetaProperty(Description = "A cache layer (Redis, Memcached)")]
    Cache,

    [MetaProperty(Description = "An API gateway or reverse proxy")]
    Gateway,

    [MetaProperty(Description = "Object/block/file storage (S3, Blob, NFS)")]
    Storage,

    [MetaProperty(Description = "Network infrastructure (VPC, subnet, firewall)")]
    Network,

    [MetaProperty(Description = "TLS/SSL certificate")]
    Certificate,

    [MetaProperty(Description = "DNS record or zone")]
    Dns,

    [MetaProperty(Description = "CDN or edge cache")]
    Cdn
}

2. OpsProbe

A check against a target -- the fundamental unit of observability.

[MetaConcept("OpsProbe", Description = "A health or readiness check against an OpsTarget")]
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, AllowMultiple = true)]
public sealed class OpsProbeAttribute : Attribute
{
    public string Name { get; }
    public OpsTarget Target { get; init; } = OpsTarget.Application;
    public int IntervalSeconds { get; init; } = 30;
    public int TimeoutSeconds { get; init; } = 5;
    public int FailureThreshold { get; init; } = 3;
    public string Endpoint { get; init; } = "/health";
    public ProbeKind Kind { get; init; } = ProbeKind.Http;

    public OpsProbeAttribute(string name) => Name = name;
}

public enum ProbeKind
{
    Http,       // GET endpoint, expect 200
    Tcp,        // TCP connect, expect open port
    Command,    // Execute command, expect exit code 0
    Grpc,       // gRPC health check protocol
    Sql         // Execute query, expect rows
}
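As a hedged illustration of how a declaration might look in practice, the sketch below puts two probes on a hypothetical OrderService and reads them back with reflection. The attribute and enums are trimmed copies of the ones above so the block compiles on its own; OrderService, ProbeReader, and the probe names are assumptions, not part of the Ops packages.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Trimmed copies of the types above so the sketch compiles on its own.
public enum OpsTarget { Application, Database, Queue, Cache }
public enum ProbeKind { Http, Tcp, Command, Grpc, Sql }

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, AllowMultiple = true)]
public sealed class OpsProbeAttribute : Attribute
{
    public string Name { get; }
    public OpsTarget Target { get; init; } = OpsTarget.Application;
    public int IntervalSeconds { get; init; } = 30;
    public int TimeoutSeconds { get; init; } = 5;
    public string Endpoint { get; init; } = "/health";
    public ProbeKind Kind { get; init; } = ProbeKind.Http;

    public OpsProbeAttribute(string name) => Name = name;
}

// Hypothetical service: one HTTP probe using the defaults, one SQL probe
// against its database on a slower interval.
[OpsProbe("orders-liveness")]
[OpsProbe("orders-db", Target = OpsTarget.Database, Kind = ProbeKind.Sql,
    IntervalSeconds = 60)]
public sealed class OrderService { }

public static class ProbeReader
{
    // Returns the probes declared on a type, ordered by name for stable output.
    public static OpsProbeAttribute[] ProbesOf(Type type) =>
        type.GetCustomAttributes<OpsProbeAttribute>()
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .ToArray();
}
```

A generator would read the same data from Roslyn symbols rather than reflection; reflection is used here only to keep the example runnable without the compiler APIs.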

3. OpsThreshold

A numeric boundary with severity escalation.

[MetaConcept("OpsThreshold", Description = "A numeric boundary that triggers escalation")]
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Property, AllowMultiple = true)]
public sealed class OpsThresholdAttribute : Attribute
{
    public string Metric { get; }
    public ThresholdCondition Condition { get; }
    public double Value { get; }
    public OpsSeverity Severity { get; init; } = OpsSeverity.Warning;
    public string Unit { get; init; } = "";
    public string Description { get; init; } = "";

    public OpsThresholdAttribute(
        string metric, ThresholdCondition condition, double value)
    {
        Metric = metric;
        Condition = condition;
        Value = value;
    }
}

public enum ThresholdCondition
{
    GreaterThan,
    GreaterThanOrEqual,
    LessThan,
    LessThanOrEqual,
    Equals,
    NotEquals
}

public enum OpsSeverity
{
    Info,       // Logged, no action
    Warning,    // Dashboard highlight, Slack notification
    Critical,   // PagerDuty alert, requires acknowledgement
    PageNow     // Immediate page, wake up the on-call
}
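The attribute only declares the boundary; something still has to compare an observed value against it. A minimal sketch of that comparison, assuming a hypothetical ThresholdEvaluator helper and a small tolerance for the equality cases (neither appears in the article):

```csharp
using System;

// Copy of the enum above so the sketch compiles on its own.
public enum ThresholdCondition
{
    GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, Equals, NotEquals
}

public static class ThresholdEvaluator
{
    // True when `observed` breaches the boundary `value` under `condition`.
    // Assumption: equality is tested with a tolerance because metrics are doubles.
    public static bool IsBreached(
        ThresholdCondition condition, double observed, double value,
        double tolerance = 1e-9) => condition switch
    {
        ThresholdCondition.GreaterThan        => observed > value,
        ThresholdCondition.GreaterThanOrEqual => observed >= value,
        ThresholdCondition.LessThan           => observed < value,
        ThresholdCondition.LessThanOrEqual    => observed <= value,
        ThresholdCondition.Equals             => Math.Abs(observed - value) <= tolerance,
        ThresholdCondition.NotEquals          => Math.Abs(observed - value) > tolerance,
        _ => throw new ArgumentOutOfRangeException(nameof(condition))
    };
}
```

With a declaration such as [OpsThreshold("error_rate", ThresholdCondition.GreaterThan, 0.05)], an observed value of 0.07 would breach and 0.03 would not.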

4. OpsPolicy

A governance rule with enforcement mode.

[MetaConcept("OpsPolicy", Description = "A governance rule with enforcement mode")]
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Assembly, AllowMultiple = true)]
public sealed class OpsPolicyAttribute : Attribute
{
    public string Name { get; }
    public PolicyEnforcement Enforcement { get; init; } = PolicyEnforcement.CompileTime;
    public string Description { get; init; } = "";
    public string Rationale { get; init; } = "";
    public string[] AppliesTo { get; init; } = [];

    public OpsPolicyAttribute(string name) => Name = name;
}

public enum PolicyEnforcement
{
    /// Roslyn analyzer emits a diagnostic error. Build fails.
    CompileTime,

    /// Runtime check logs ILogger.Warning. Build succeeds.
    RuntimeWarning,

    /// Runtime check throws. Build succeeds, but violation at runtime is fatal.
    RuntimeBlock
}
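The two runtime modes differ only in what happens once a violation is detected. The sketch below shows one way that dispatch could look; PolicyEnforcer and PolicyViolationException are hypothetical names, and the `warn` callback stands in for an ILogger warning call.

```csharp
using System;

// Copy of the enum above so the sketch compiles on its own.
public enum PolicyEnforcement { CompileTime, RuntimeWarning, RuntimeBlock }

// Hypothetical exception type for blocked violations.
public sealed class PolicyViolationException : Exception
{
    public PolicyViolationException(string message) : base(message) { }
}

public static class PolicyEnforcer
{
    // Handles a detected violation. Returns true when execution may continue.
    public static bool OnViolation(
        string policyName, PolicyEnforcement enforcement, Action<string> warn)
    {
        switch (enforcement)
        {
            case PolicyEnforcement.RuntimeWarning:
                warn($"Policy '{policyName}' violated");
                return true;

            case PolicyEnforcement.RuntimeBlock:
                throw new PolicyViolationException($"Policy '{policyName}' violated");

            default:
                // CompileTime policies are enforced by the analyzer, so a
                // violation should already have failed the build.
                return true;
        }
    }
}
```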

5. OpsEnvironment

Scoping to environment tiers.

[MetaConcept("OpsEnvironment", Description = "Scopes a declaration to specific environment tiers")]
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class OpsEnvironmentAttribute : Attribute
{
    public EnvironmentTier Tier { get; }
    public string[] Overrides { get; init; } = [];

    public OpsEnvironmentAttribute(EnvironmentTier tier) => Tier = tier;
}

public enum EnvironmentTier
{
    Development,
    Testing,
    Staging,
    Production,
    DisasterRecovery
}
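A short sketch of tier scoping, under two assumptions that the article does not state: a type without any [OpsEnvironment] applies to every tier, and Overrides entries are free-form strings (the "Replicas=3" value is purely illustrative). CheckoutDeployment and EnvironmentScope are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Copies of the types above so the sketch compiles on its own.
public enum EnvironmentTier { Development, Testing, Staging, Production, DisasterRecovery }

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class OpsEnvironmentAttribute : Attribute
{
    public EnvironmentTier Tier { get; }
    public string[] Overrides { get; init; } = Array.Empty<string>();

    public OpsEnvironmentAttribute(EnvironmentTier tier) => Tier = tier;
}

// Hypothetical declaration scoped to Staging and Production only.
[OpsEnvironment(EnvironmentTier.Staging)]
[OpsEnvironment(EnvironmentTier.Production, Overrides = new[] { "Replicas=3" })]
public sealed class CheckoutDeployment { }

public static class EnvironmentScope
{
    // Assumption: no [OpsEnvironment] at all means "applies everywhere".
    public static bool AppliesTo(Type type, EnvironmentTier tier)
    {
        var tiers = type.GetCustomAttributes<OpsEnvironmentAttribute>()
                        .Select(a => a.Tier)
                        .ToArray();
        return tiers.Length == 0 || tiers.Contains(tier);
    }
}
```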

6. OpsSchedule

Cron-based scheduling.

[MetaConcept("OpsSchedule", Description = "A cron-based schedule for recurring operations")]
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, AllowMultiple = true)]
public sealed class OpsScheduleAttribute : Attribute
{
    public string CronExpression { get; }
    public string Timezone { get; init; } = "UTC";
    public string Description { get; init; } = "";
    public bool EnabledInDevelopment { get; init; } = false;

    public OpsScheduleAttribute(string cronExpression)
        => CronExpression = cronExpression;
}
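Two checks follow naturally from this attribute: is the expression shaped like a cron expression, and should the schedule run in the current environment. The sketch below assumes the classic five-field cron format (minute, hour, day of month, month, day of week); the article does not say which cron dialect the DSL accepts, and ScheduleChecks is a hypothetical helper.

```csharp
using System;

// Trimmed copy of the attribute above so the sketch compiles on its own.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, AllowMultiple = true)]
public sealed class OpsScheduleAttribute : Attribute
{
    public string CronExpression { get; }
    public string Timezone { get; init; } = "UTC";
    public bool EnabledInDevelopment { get; init; } = false;

    public OpsScheduleAttribute(string cronExpression)
        => CronExpression = cronExpression;
}

public static class ScheduleChecks
{
    // Assumption: five whitespace-separated fields. This is a shape check only;
    // it does not validate the contents of each field.
    public static bool HasFiveFields(string cron) =>
        cron.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length == 5;

    // A schedule runs outside Development always, and inside it only on opt-in.
    public static bool IsActive(OpsScheduleAttribute schedule, bool isDevelopment) =>
        !isDevelopment || schedule.EnabledInDevelopment;
}
```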

7. OpsExecutionTier

The enum that selects the execution tier: InProcess, Container, or Cloud. Covered in detail in Part 2.

[MetaConcept("OpsExecutionTier",
    Description = "Execution tier determining infrastructure requirements")]
public enum OpsExecutionTier
{
    InProcess = 0,
    Container = 1,
    Cloud = 2
}
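The explicit numeric values suggest an ordering. Assuming a higher value means heavier infrastructure (an interpretation, since Part 2 is not reproduced here), the tier needed to run a set of declarations would be the maximum among them. TierPlanner is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Copy of the enum above so the sketch compiles on its own.
public enum OpsExecutionTier
{
    InProcess = 0,
    Container = 1,
    Cloud = 2
}

public static class TierPlanner
{
    // Assumption: tiers are ordered by infrastructure cost, so the required
    // tier is the highest one declared. An empty set needs only InProcess.
    public static OpsExecutionTier RequiredTier(IEnumerable<OpsExecutionTier> declared) =>
        declared.DefaultIfEmpty(OpsExecutionTier.InProcess).Max();
}
```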

8. OpsRequirementLink

Traceability back to the Requirements DSL.

[MetaConcept("OpsRequirementLink",
    Description = "Links an Ops declaration to a requirement, feature, or acceptance criterion")]
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class OpsRequirementLinkAttribute : Attribute
{
    public Type RequirementType { get; }
    public string? AcceptanceCriterion { get; init; }
    public string Rationale { get; init; } = "";

    public OpsRequirementLinkAttribute(Type requirementType)
        => RequirementType = requirementType;
}
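To make the traceability concrete, the sketch below links a hypothetical probe class to a hypothetical requirement type and then answers the reverse question: which Ops declarations trace back to this requirement? CheckoutAvailabilityRequirement, CheckoutHealthProbe, the "AC-1" label, and the Traceability helper are all assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;

// Trimmed copy of the attribute above so the sketch compiles on its own.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class OpsRequirementLinkAttribute : Attribute
{
    public Type RequirementType { get; }
    public string? AcceptanceCriterion { get; init; }

    public OpsRequirementLinkAttribute(Type requirementType)
        => RequirementType = requirementType;
}

// Hypothetical requirement; in the series these come from the Requirements DSL.
public sealed class CheckoutAvailabilityRequirement { }

// Hypothetical Ops declaration that exists because of that requirement.
[OpsRequirementLink(typeof(CheckoutAvailabilityRequirement),
    AcceptanceCriterion = "AC-1")]
public sealed class CheckoutHealthProbe { }

public static class Traceability
{
    // Names of all types in `assembly` that link to `requirement`.
    public static string[] LinkedTo(Assembly assembly, Type requirement) =>
        assembly.GetTypes()
            .Where(t => t.GetCustomAttributes<OpsRequirementLinkAttribute>()
                         .Any(link => link.RequirementType == requirement))
            .Select(t => t.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToArray();
}
```

Because RequirementType is a Type rather than a string, renaming or deleting the requirement class breaks the build at every link site.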

M3 Registration

Each primitive is registered in the MetamodelRegistry as a first-class concept:

// <auto-generated by Ops.Primitives.Generators />
namespace Ops.Primitives.Generated;

public static class OpsMetamodelRegistration
{
    public static void Register(IMetamodelRegistry registry)
    {
        // Layer: M2 (Ops DSL concepts sit at M2, on top of M3 infrastructure)
        var opsLayer = registry.GetOrCreateLayer("Ops", MetaLevel.M2);

        opsLayer.RegisterConcept<OpsTarget>(
            name: "OpsTarget",
            kind: ConceptKind.Enum,
            values: Enum.GetNames<OpsTarget>());

        opsLayer.RegisterConcept<OpsProbeAttribute>(
            name: "OpsProbe",
            kind: ConceptKind.Attribute,
            properties: ["Name", "Target", "IntervalSeconds",
                         "TimeoutSeconds", "FailureThreshold",
                         "Endpoint", "Kind"]);

        opsLayer.RegisterConcept<OpsThresholdAttribute>(
            name: "OpsThreshold",
            kind: ConceptKind.Attribute,
            properties: ["Metric", "Condition", "Value",
                         "Severity", "Unit", "Description"]);

        opsLayer.RegisterConcept<OpsPolicyAttribute>(
            name: "OpsPolicy",
            kind: ConceptKind.Attribute,
            properties: ["Name", "Enforcement", "Description",
                         "Rationale", "AppliesTo"]);

        opsLayer.RegisterConcept<OpsEnvironmentAttribute>(
            name: "OpsEnvironment",
            kind: ConceptKind.Attribute,
            properties: ["Tier", "Overrides"]);

        opsLayer.RegisterConcept<OpsScheduleAttribute>(
            name: "OpsSchedule",
            kind: ConceptKind.Attribute,
            properties: ["CronExpression", "Timezone",
                         "Description", "EnabledInDevelopment"]);

        opsLayer.RegisterConcept<OpsExecutionTier>(
            name: "OpsExecutionTier",
            kind: ConceptKind.Enum,
            values: ["InProcess", "Container", "Cloud"]);

        opsLayer.RegisterConcept<OpsRequirementLinkAttribute>(
            name: "OpsRequirementLink",
            kind: ConceptKind.Attribute,
            properties: ["RequirementType", "AcceptanceCriterion",
                         "Rationale"]);
    }
}

This registration allows the Document DSL to introspect every Ops concept: Document<OpsProbe> generates probe documentation, Document<OpsThreshold> generates threshold matrices, Document<OpsPolicy> generates governance reports.
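The registration names properties as strings, so a rename on an attribute could silently drift from the registry. Since the file is generated, the generator presumably keeps them aligned; a test along the following lines could still guard against drift. This is a sketch only: RegistrationCheck is a hypothetical helper, and the attribute is a copy of OpsScheduleAttribute from above.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Copy of the attribute above so the sketch compiles on its own.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, AllowMultiple = true)]
public sealed class OpsScheduleAttribute : Attribute
{
    public string CronExpression { get; }
    public string Timezone { get; init; } = "UTC";
    public string Description { get; init; } = "";
    public bool EnabledInDevelopment { get; init; } = false;

    public OpsScheduleAttribute(string cronExpression)
        => CronExpression = cronExpression;
}

public static class RegistrationCheck
{
    // Returns every name that appears in only one of the two lists: registered
    // names with no matching property, and properties that were not registered.
    // DeclaredOnly excludes members inherited from System.Attribute (TypeId).
    public static string[] Mismatches<T>(params string[] registered)
    {
        var declared = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance |
                           BindingFlags.DeclaredOnly)
            .Select(p => p.Name)
            .ToHashSet(StringComparer.Ordinal);

        declared.SymmetricExceptWith(registered);
        return declared.OrderBy(n => n, StringComparer.Ordinal).ToArray();
    }
}
```

An empty result means the registered list and the attribute agree; a typo such as "TimeZone" would surface as two entries, the misspelled name and the unregistered real one.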


The Cross-DSL Validation Graph

The 22 sub-DSLs are independent at the Lib level but interconnected at the Analyzer level. The analyzers resolve symbols across NuGet boundaries to validate cross-DSL references. Here are the 14 validated cross-references:

#   Source DSL    Target DSL     What's Validated
1   Deployment    Migration      Migration completes before app starts
2   Deployment    Configuration  All RequiredConfigs exist in config transforms
3   Deployment    Observability  HealthCheck endpoint exists on deployed app
4   Resilience    Observability  Canary metric references a declared metric
5   Resilience    Deployment     Rollback target version is a declared deployment
6   Chaos         Resilience     Experiment references a declared circuit breaker
7   Chaos         Observability  SteadyStateHypothesis metric exists
8   Performance   Observability  Budget metric matches a declared SLI
9   Performance   SLA            Budget thresholds are within SLA bounds
10  SLA           Observability  SLI metric exists and has correct aggregation
11  Incident      Observability  Runbook references a declared alert
12  Compliance    Security       Compliance requirement maps to security policy
13  FeatureFlag   Deployment     Flag declared before deployment references it
14  DataPipeline  Observability  Pipeline quality check emits declared metrics
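The table above is a directed graph, and encoding it as data makes a couple of properties easy to query. The sketch assumes that a rule can only produce findings when both its source and target DSLs are referenced by the project; CrossRef and CrossDslGraph are hypothetical names, not types from the Ops packages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record CrossRef(int Rule, string Source, string Target);

public static class CrossDslGraph
{
    // The 14 edges from the table, in rule order.
    public static readonly CrossRef[] Rules =
    {
        new(1, "Deployment", "Migration"),
        new(2, "Deployment", "Configuration"),
        new(3, "Deployment", "Observability"),
        new(4, "Resilience", "Observability"),
        new(5, "Resilience", "Deployment"),
        new(6, "Chaos", "Resilience"),
        new(7, "Chaos", "Observability"),
        new(8, "Performance", "Observability"),
        new(9, "Performance", "SLA"),
        new(10, "SLA", "Observability"),
        new(11, "Incident", "Observability"),
        new(12, "Compliance", "Security"),
        new(13, "FeatureFlag", "Deployment"),
        new(14, "DataPipeline", "Observability"),
    };

    // Assumption: a rule is relevant only when both ends are installed.
    public static int[] ActiveRules(params string[] installed)
    {
        var set = new HashSet<string>(installed, StringComparer.Ordinal);
        return Rules.Where(r => set.Contains(r.Source) && set.Contains(r.Target))
                    .Select(r => r.Rule)
                    .ToArray();
    }

    // Number of rules that validate against the given target DSL.
    public static int InboundCount(string target) =>
        Rules.Count(r => r.Target == target);
}
```

Half of the rules (7 of 14) point at Observability, which fits its role as the DSL that declares the metrics and alerts the others refer to.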

Each cross-reference is an analyzer rule. For example, rule #4 (Resilience references Observability):

// Ops.Resilience.Analyzers — Cross-DSL Rule #4
// Validates that canary metric references a declared Observability metric.

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class CanaryMetricExistsAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new(
        id: "OPS004",
        title: "Canary references undeclared metric",
        messageFormat: "Canary '{0}' references metric '{1}' " +
                       "which is not declared in any [MetricDefinition]",
        category: "Ops.CrossDsl",
        defaultSeverity: DiagnosticSeverity.Error,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor>
        SupportedDiagnostics => [Rule];

    public override void Initialize(AnalysisContext context)
    {
        context.EnableConcurrentExecution();
        context.ConfigureGeneratedCodeAnalysis(
            GeneratedCodeAnalysisFlags.None);

        context.RegisterCompilationStartAction(compilationStart =>
        {
            // Collect all [MetricDefinition] names in the compilation
            var declaredMetrics = new HashSet<string>();

            compilationStart.RegisterSymbolAction(symbolContext =>
            {
                var attrs = symbolContext.Symbol.GetAttributes();
                foreach (var attr in attrs)
                {
                    if (attr.AttributeClass?.Name == "MetricDefinitionAttribute"
                        && attr.ConstructorArguments.Length > 0)
                    {
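                        // Symbol actions can run concurrently (see
                        // EnableConcurrentExecution above), so guard the set.
                        lock (declaredMetrics)
                        {
                            declaredMetrics.Add(
                                attr.ConstructorArguments[0].Value?.ToString() ?? "");
                        }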
                    }
                }
            }, SymbolKind.NamedType);

            // Check all [CanaryMetric] references
            compilationStart.RegisterCompilationEndAction(endContext =>
            {
                // ... scan for CanaryMetric attributes, verify metric name
                // exists in declaredMetrics, emit OPS004 if not found
            });
        });
    }
}

The key insight: the Resilience analyzer does not take a NuGet dependency on Observability. It resolves MetricDefinitionAttribute by name from the compilation's symbol table. This means the cross-DSL validation works even if the team uses only Ops.Resilience.Lib and Ops.Observability.Lib -- the analyzer finds the symbols regardless of package boundaries.
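Setting the Roslyn plumbing aside, the check that the compilation-end action performs reduces to a set-membership test. The sketch below shows that core logic on plain strings, reusing the OPS004 message format; CanaryMetricCheck and the sample names are hypothetical, and the real analyzer would report diagnostics with source locations rather than return strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CanaryMetricCheck
{
    // `canaries` pairs a canary name with the metric it references.
    // Returns one OPS004-style message per reference to an undeclared metric.
    public static List<string> FindViolations(
        IEnumerable<string> declaredMetrics,
        IEnumerable<(string Canary, string Metric)> canaries)
    {
        var declared = new HashSet<string>(declaredMetrics, StringComparer.Ordinal);

        return canaries
            .Where(c => !declared.Contains(c.Metric))
            .Select(c => $"Canary '{c.Canary}' references metric '{c.Metric}' " +
                         "which is not declared in any [MetricDefinition]")
            .ToList();
    }
}
```

Matching is by name only, which is what lets the analyzer avoid a package reference to Observability. The trade-off is that an unrelated attribute that happens to be called MetricDefinitionAttribute would also be picked up.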


The 7-Stage Pipeline

The dev-side CMF uses a 5-stage source generation pipeline (Stages 0-4). The Ops ecosystem extends this to 7 stages:

Stage 0: Collection       ← Discover all attributes in the compilation
Stage 1: Validation       ← Verify constraints (single-DSL)
Stage 2: Registration     ← Register concepts in MetamodelRegistry
Stage 3: Code Generation  ← Generate C# code (decorators, extensions, DI)
Stage 4: Cross-Validation ← Cross-DSL analyzer checks (the 14 rules above)
Stage 5: Ops Code Gen     ← Generate ops-specific C# (validators, middleware)
Stage 6: Artifact Gen     ← Generate non-C# artifacts (YAML, JSON, HCL, k6)

Stages 0-4 are unchanged from the CMF pipeline. Stages 5-6 are new:

Stage 5 (Ops Code Gen) generates C# code that is specific to the operational domain: health check registration, metrics middleware, circuit breaker policies, compliance validators. This code is part of the compilation and benefits from type checking.

Stage 6 (Artifact Gen) generates non-C# artifacts: docker-compose.yaml, Prometheus alert rules, Terraform modules, k6 scripts, Grafana dashboard JSON, Kubernetes manifests. These are written to the build output directory and consumed by external tools.

The separation matters. Stage 5 output is compiled C# -- if it has errors, the build fails. Stage 6 output is external artifacts -- they are validated by their respective tools (terraform validate, k6 inspect, promtool check rules) in CI, not by the C# compiler.
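As a rough picture of what a Stage 6 emitter does, the sketch below renders a single Prometheus alerting rule as YAML text. The rule-file layout (groups, rules, alert, expr, labels) follows the Prometheus alerting rule format; the PrometheusRuleWriter class, its parameters, and the group name "ops-generated" are assumptions, since the article does not show the real generator.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class PrometheusRuleWriter
{
    // Renders one alerting rule. Inputs are illustrative: in the real pipeline
    // they would presumably come from threshold declarations.
    public static string Render(
        string alertName, string metric, string op, double value, string severity)
    {
        // Invariant culture keeps the decimal separator a dot on every machine.
        var number = value.ToString(CultureInfo.InvariantCulture);

        var sb = new StringBuilder();
        sb.Append("groups:\n");
        sb.Append("  - name: ops-generated\n");
        sb.Append("    rules:\n");
        sb.Append($"      - alert: {alertName}\n");
        sb.Append($"        expr: {metric} {op} {number}\n");
        sb.Append("        labels:\n");
        sb.Append($"          severity: {severity}\n");
        return sb.ToString();
    }
}
```

Nothing in the C# compiler can tell whether this string is valid YAML or a valid PromQL expression, which is why the artifact has to be checked by promtool in CI.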

dotnet build
  │
  ├── Stage 0-4: CMF pipeline (Requirements, DDD, API, Test, Document)
  │
  ├── Stage 5: Ops C# code gen
  │     ├── HealthCheckRegistration.g.cs
  │     ├── MetricsMiddleware.g.cs
  │     ├── CircuitBreakerPolicies.g.cs
  │     ├── ChaosDecorators.g.cs
  │     └── ComplianceValidator.g.cs
  │
  └── Stage 6: Ops artifact gen
        ├── ops-manifest.g.json
        ├── docker-compose.chaos.yaml
        ├── prometheus/alerts.yaml
        ├── grafana/dashboards/*.json
        ├── terraform/chaos-*/main.tf
        ├── k6/load-tests/*.js
        └── kubernetes/network-policies/*.yaml

One dotnet build. Every operational artifact. Typed, validated, generated.
