Shared Primitives -- The Ops Kernel
"Eight concepts. Every Ops DSL composes them. No sub-DSL invents its own threshold, its own severity, its own environment model."
Why a Shared Kernel
The 22 sub-DSLs must share a common vocabulary. Without it, each DSL invents its own:
- Chaos has ChaosSeverity.High, Observability has AlertSeverity.Critical, Incident has IncidentPriority.P1 -- three names for the same concept
- Performance uses EnvironmentName = "prod", Deployment uses TargetEnv.Production, Configuration uses Tier = "production" -- three representations of the same environment
- SLA defines thresholds as (metric, operator, value), Observability defines them as (name, condition, threshold), Cost defines them as (budget, limit, alert) -- three shapes for the same pattern
The shared kernel normalizes these into 8 primitives that every DSL reuses. A Warning in Chaos is the same OpsSeverity.Warning as in Observability. A Production in Deployment is the same EnvironmentTier.Production as in Configuration. A threshold is always (metric, condition, value, severity).
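The shared vocabulary can be sketched in a few declarations. This is a sketch inferred from the names used throughout this section; the member lists are illustrative, not the actual definitions:

```csharp
// Sketch of the kernel's shared vocabulary. The enum members shown are
// the ones this section mentions; the real definitions may carry more.
public enum OpsSeverity { Info, Warning, Critical, PageNow }

public enum EnvironmentTier
{
    Development, Testing, Staging, Production, DisasterRecovery
}

public enum ThresholdCondition { GreaterThan, LessThan, Equals }

// "A threshold is always (metric, condition, value, severity)."
public sealed record OpsThresholdShape(
    string Metric,
    ThresholdCondition Condition,
    double Value,
    OpsSeverity Severity);
```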
Primitive 1: OpsTarget -- What You Are Operating
Every operational concern applies to something. That something is an OpsTarget.
[OpsProbe("order-db-health", Target = OpsTarget.Database)]
[OpsProbe("order-api-health", Target = OpsTarget.Application)]
[OpsProbe("order-cache-health", Target = OpsTarget.Cache)]
public sealed class OrderServiceHealthChecks { }
The target determines what kind of probe, threshold, and artifact makes sense:
| Target | Valid Probe Kinds | Generated Artifact |
|---|---|---|
| Application | Http, Grpc | Kubernetes readiness/liveness probe |
| Database | Sql, Tcp | Connection pool check, migration status |
| Queue | Tcp, Http | Queue depth metric, consumer lag alert |
| Cache | Tcp, Command | Hit rate metric, eviction alert |
| Gateway | Http, Tcp | Upstream health aggregation |
| Storage | Http, Command | Bucket accessibility, quota check |
| Network | Tcp | Connectivity probe, latency measurement |
| Certificate | Command | Expiry check, chain validation |
| Dns | Command | Resolution check, TTL validation |
| Cdn | Http | Cache hit rate, purge verification |
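The table above implies a closed target vocabulary. As a sketch, the two enums might look like this (member sets are taken from the table and the probe-kind column; everything else is an assumption):

```csharp
// Every operational concern applies to exactly one of these targets.
public enum OpsTarget
{
    Application, Database, Queue, Cache, Gateway,
    Storage, Network, Certificate, Dns, Cdn
}

// The probe mechanisms referenced in the "Valid Probe Kinds" column.
public enum ProbeKind
{
    Http, Grpc, Sql, Tcp, Command
}
```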
The analyzer validates target-probe compatibility:
// OPS020: Sql probe on Application target
[OpsProbe("bad-probe", Target = OpsTarget.Application, Kind = ProbeKind.Sql)]
// ^^^^^^^^^^^^
// Error: ProbeKind.Sql is only valid for OpsTarget.Database
Target Hierarchies
Sub-DSLs extend targets with domain-specific detail. The Deployment DSL adds application-level properties:
[DeploymentOrchestrator("2.4")]
[DeploymentApp("order-service",
Target = OpsTarget.Application,
Runtime = "dotnet",
Replicas = 3)]
[DeploymentApp("order-db",
Target = OpsTarget.Database,
Engine = "postgres",
Version = "16")]
public sealed class OrderServiceV24Deployment { }
The Chaos DSL uses targets to determine fault injection scope:
[ChaosExperiment("CacheFailure", Tier = OpsExecutionTier.InProcess)]
[TargetService(typeof(ICacheService), Target = OpsTarget.Cache)]
[FaultInjection(FaultKind.Exception,
ExceptionType = typeof(RedisConnectionException))]
public sealed class CacheFailureExperiment { }
The OpsTarget flows through the generated artifacts. A health check with Target = OpsTarget.Database generates a Kubernetes liveness probe that checks the database connection. A health check with Target = OpsTarget.Application generates a readiness probe that calls the HTTP health endpoint.
Primitive 2: OpsProbe -- Checking the Target
A probe is the atomic unit of operational observation. Every monitoring system, every health check, every readiness gate reduces to: "call something, check the result, report the status."
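That observation suggests a minimal probe contract. The following is a sketch under assumed names (IOpsProbe and ProbeResult do not appear elsewhere in this section):

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical contract behind every probe: call, check, report.
public interface IOpsProbe
{
    string Name { get; }
    OpsTarget Target { get; }

    // Call something, check the result, report the status.
    Task<ProbeResult> CheckAsync(CancellationToken ct);
}

public sealed record ProbeResult(bool Healthy, string? Detail = null);
```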
Full Usage Example
[OpsProbe("order-db-connectivity",
Target = OpsTarget.Database,
Kind = ProbeKind.Sql,
IntervalSeconds = 15,
TimeoutSeconds = 3,
FailureThreshold = 3,
Endpoint = "SELECT 1")]
[OpsProbe("order-api-ready",
Target = OpsTarget.Application,
Kind = ProbeKind.Http,
IntervalSeconds = 10,
TimeoutSeconds = 5,
FailureThreshold = 2,
Endpoint = "/health/ready")]
[OpsProbe("order-cache-alive",
Target = OpsTarget.Cache,
Kind = ProbeKind.Tcp,
IntervalSeconds = 30,
TimeoutSeconds = 2,
FailureThreshold = 5)]
public sealed class OrderServiceProbes { }
Generated: Health Check Registration
// <auto-generated by Ops.Observability.Generators />
namespace Ops.Observability.Generated;
public static class OrderServiceProbesRegistration
{
public static IHealthChecksBuilder AddOrderServiceProbes(
this IHealthChecksBuilder builder)
{
builder.AddCheck("order-db-connectivity",
new SqlHealthCheck(
query: "SELECT 1",
timeout: TimeSpan.FromSeconds(3)),
failureStatus: HealthStatus.Unhealthy,
tags: ["db", "readiness"]);
builder.AddCheck("order-api-ready",
new HttpHealthCheck(
endpoint: "/health/ready",
timeout: TimeSpan.FromSeconds(5)),
failureStatus: HealthStatus.Unhealthy,
tags: ["api", "readiness"]);
builder.AddCheck("order-cache-alive",
new TcpHealthCheck(
timeout: TimeSpan.FromSeconds(2)),
failureStatus: HealthStatus.Degraded,
tags: ["cache", "liveness"]);
return builder;
}
}
Generated: Kubernetes Probes
# <auto-generated by Ops.Observability.Generators />
# Probes for: OrderServiceProbes
apiVersion: v1
kind: Pod
spec:
containers:
- name: order-service
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 2
livenessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 2
failureThreshold: 5
One set of attributes. Two generated artifacts (C# health check registration + Kubernetes YAML). Both always in sync because they are generated from the same source.
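Wiring the generated registration into a host is then a single call. A sketch, assuming standard ASP.NET Core minimal hosting; AddOrderServiceProbes is the generated extension method shown earlier:

```csharp
var builder = WebApplication.CreateBuilder(args);

// Generated extension from OrderServiceProbesRegistration
builder.Services.AddHealthChecks()
                .AddOrderServiceProbes();

var app = builder.Build();

// Expose the endpoint the generated Kubernetes readinessProbe calls
app.MapHealthChecks("/health/ready");
app.Run();
```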
Primitive 3: OpsThreshold -- Severity Escalation
Thresholds define when a metric value becomes a problem. The severity determines the response.
The Escalation Pattern
[OpsThreshold("order.error.rate",
ThresholdCondition.GreaterThan, 0.01,
Severity = OpsSeverity.Info,
Unit = "ratio",
Description = "Error rate above 1% — log for investigation")]
[OpsThreshold("order.error.rate",
ThresholdCondition.GreaterThan, 0.05,
Severity = OpsSeverity.Warning,
Description = "Error rate above 5% — Slack notification")]
[OpsThreshold("order.error.rate",
ThresholdCondition.GreaterThan, 0.10,
Severity = OpsSeverity.Critical,
Description = "Error rate above 10% — PagerDuty alert")]
[OpsThreshold("order.error.rate",
ThresholdCondition.GreaterThan, 0.25,
Severity = OpsSeverity.PageNow,
Description = "Error rate above 25% — immediate page, likely outage")]
public sealed class OrderServiceThresholds { }
Four thresholds on the same metric, escalating from Info to PageNow. The Source Generator produces a Prometheus alerting rule with severity labels:
# <auto-generated by Ops.Observability.Generators />
groups:
- name: OrderServiceThresholds
rules:
- alert: OrderErrorRateInfo
expr: rate(order_errors_total[5m]) / rate(order_requests_total[5m]) > 0.01
labels:
severity: info
annotations:
summary: "Error rate above 1% — log for investigation"
- alert: OrderErrorRateWarning
expr: rate(order_errors_total[5m]) / rate(order_requests_total[5m]) > 0.05
for: 2m
labels:
severity: warning
annotations:
summary: "Error rate above 5% — Slack notification"
- alert: OrderErrorRateCritical
expr: rate(order_errors_total[5m]) / rate(order_requests_total[5m]) > 0.10
for: 1m
labels:
severity: critical
annotations:
summary: "Error rate above 10% — PagerDuty alert"
- alert: OrderErrorRatePageNow
expr: rate(order_errors_total[5m]) / rate(order_requests_total[5m]) > 0.25
labels:
severity: pagenow
annotations:
summary: "Error rate above 25% — immediate page, likely outage"
Cross-DSL Usage
The same OpsThreshold primitive is used by multiple DSLs with different semantics:
| DSL | Metric | Meaning |
|---|---|---|
| Observability | order.error.rate | Alert when error rate exceeds boundary |
| Resilience | order.canary.error.rate | Rollback canary when metric exceeds threshold |
| Performance | order.p95.latency | Fail build when latency regresses |
| Cost | order.monthly.cost | Alert when cloud spend exceeds budget |
| SLA | order.availability | Burn error budget when availability drops |
| Chaos | order.completion.rate | Validate steady-state hypothesis during experiment |
Same attribute shape. Same severity enum. Same condition operators. Different DSL, different generated artifact.
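As a sketch, the shared attribute behind that table might be declared like this. The constructor and property set are inferred from the usage examples above, not the actual definition; the Severity default is an assumption:

```csharp
using System;

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class OpsThresholdAttribute : Attribute
{
    public OpsThresholdAttribute(
        string metric, ThresholdCondition condition, double value)
    {
        Metric = metric;
        Condition = condition;
        Value = value;
    }

    public string Metric { get; }
    public ThresholdCondition Condition { get; }
    public double Value { get; }

    // Optional named arguments seen in the examples.
    public OpsSeverity Severity { get; set; } = OpsSeverity.Warning; // assumed default
    public string? Unit { get; set; }
    public string? Description { get; set; }
}
```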
Primitive 4: OpsPolicy -- Governance Enforcement
Policies are rules about rules. They govern how the other primitives may be used.
Enforcement Modes
// CompileTime: Roslyn analyzer emits error. Build fails.
[OpsPolicy("AllHealthChecksRequired",
Enforcement = PolicyEnforcement.CompileTime,
Description = "Every deployment must declare at least one health check",
AppliesTo = ["DeploymentOrchestrator"])]
// RuntimeWarning: ILogger.Warning at startup. Build succeeds.
[OpsPolicy("RecommendChaosTests",
Enforcement = PolicyEnforcement.RuntimeWarning,
Description = "Services with > 3 external dependencies should have chaos tests",
Rationale = "High dependency count increases failure surface")]
// RuntimeBlock: Throws at startup. Build succeeds, runtime fails.
[OpsPolicy("RequireEncryptionAtRest",
Enforcement = PolicyEnforcement.RuntimeBlock,
Description = "All database connections must use TLS in production",
AppliesTo = ["Database"])]
public sealed class OrderServicePolicies { }
CompileTime enforcement is the strongest: the analyzer scans the compilation for policy violations and emits diagnostic errors. The project does not build until the policy is satisfied. This is appropriate for rules that must never be violated (every deployment has a health check, every canary has a metric).
RuntimeWarning enforcement is for recommendations: the generated IHostedService checks the policy at startup and logs a warning. This is appropriate for best practices that teams should follow but are not hard requirements (chaos tests for high-dependency services).
RuntimeBlock enforcement is for environment-specific rules: the generated startup check throws an InvalidOperationException if the policy is violated in the target environment. This is appropriate for security rules that must be enforced in production but can be relaxed in development (TLS requirement, secret rotation).
Generated Policy Validator
// <auto-generated by Ops.Primitives.Generators />
namespace Ops.Primitives.Generated;
public sealed class OpsPolicyValidator : IHostedService
{
private readonly ILogger<OpsPolicyValidator> _logger;
private readonly IHostEnvironment _env;
public OpsPolicyValidator(
ILogger<OpsPolicyValidator> logger,
IHostEnvironment env)
{
_logger = logger;
_env = env;
}
public Task StartAsync(CancellationToken ct)
{
// RuntimeWarning: RecommendChaosTests
if (ExternalDependencyCount > 3 && !HasChaosTests)
{
_logger.LogWarning(
"Policy 'RecommendChaosTests': Service has {Count} external " +
"dependencies but no chaos tests declared",
ExternalDependencyCount);
}
// RuntimeBlock: RequireEncryptionAtRest (production only)
if (_env.IsProduction() && !AllDatabaseConnectionsUseTls)
{
throw new InvalidOperationException(
"Policy 'RequireEncryptionAtRest' violated: " +
"Database connection 'OrderDb' does not use TLS. " +
"All database connections must use TLS in production.");
}
return Task.CompletedTask;
}
public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}
Primitive 5: OpsEnvironment -- Scoping to Tiers
Not every declaration applies to every environment. A chaos experiment that kills database connections should not run in production (unless explicitly configured). A backup schedule that runs every hour is overkill for development.
[ChaosExperiment("DbPartition", Tier = OpsExecutionTier.Container)]
[OpsEnvironment(EnvironmentTier.Testing)]
[OpsEnvironment(EnvironmentTier.Staging)]
// Intentionally NOT Production — this experiment is too destructive
public sealed class DbPartitionExperiment { }
[BackupSchedule("order-db-backup",
CronExpression = "0 */6 * * *")]
[OpsEnvironment(EnvironmentTier.Production)]
[OpsEnvironment(EnvironmentTier.DisasterRecovery)]
// No backup in dev/test — waste of resources
public sealed class OrderDbBackup { }
The Source Generator filters declarations by environment when generating artifacts:
# <auto-generated by Ops.Storage.Generators />
# Environment: Production
# Backup: order-db-backup — every 6 hours
apiVersion: batch/v1
kind: CronJob
metadata:
name: order-db-backup
spec:
schedule: "0 */6 * * *"
# ...
The analyzer validates environment constraints:
OPS022: ChaosExperiment 'DbPartition' has no OpsEnvironment declaration.
Add [OpsEnvironment] to scope this experiment to specific environments,
or add [OpsEnvironment(EnvironmentTier.Production)] to confirm
it should run everywhere.
Primitive 6: OpsSchedule -- Cron-Based Operations
Many operational concerns are time-based: backup schedules, secret rotation, compliance audits, cost reports, certificate renewal checks.
[OpsSchedule("0 2 * * *",
Timezone = "Europe/Paris",
Description = "Nightly backup at 2 AM Paris time")]
[OpsEnvironment(EnvironmentTier.Production)]
public sealed class OrderDbNightlyBackup { }
[OpsSchedule("0 0 1 * *",
Timezone = "UTC",
Description = "Monthly cost report on the 1st")]
public sealed class OrderServiceCostReport { }
[OpsSchedule("0 8 * * 1",
Timezone = "UTC",
Description = "Weekly certificate expiry check on Monday 8 AM")]
public sealed class CertificateExpiryCheck { }
The analyzer validates cron expressions at compile time:
OPS024: Invalid cron expression '0 25 * * *' on CertificateExpiryCheck.
Hour field '25' is out of range (0-23).
Primitive 7: OpsExecutionTier -- Tier Constraint Enforcement
Covered in depth in Part 2. Here we focus on the constraint enforcement implementation.
The analyzer maintains a classification of every Ops attribute by its minimum required tier:
internal static class TierClassification
{
// Attributes that are valid at InProcess (Tier 0+)
private static readonly HashSet<string> InProcessValid =
[
"ChaosExperimentAttribute", // with TargetService
"FaultInjectionAttribute",
"SteadyStateHypothesisAttribute",
"PerformanceBudgetAttribute",
"CircuitBreakerAttribute",
"RetryPolicyAttribute",
"FallbackAttribute",
"RateLimitAttribute",
];
// Attributes that require Container (Tier 1+)
private static readonly HashSet<string> ContainerRequired =
[
"ContainerAttribute",
"ToxiProxyAttribute",
"NetworkFaultAttribute",
"DockerVolumeAttribute",
];
// Attributes that require Cloud (Tier 2)
private static readonly HashSet<string> CloudRequired =
[
"CloudProviderAttribute",
"CloudRegionAttribute",
"AzFailureAttribute",
"DistributedFromAttribute",
"TerraformResourceAttribute",
];
}
When a class declares Tier = OpsExecutionTier.InProcess but uses a ContainerRequired attribute, the analyzer emits OPS014. The constraint matrix is clear:
| Declared Tier | InProcess attrs | Container attrs | Cloud attrs |
|---|---|---|---|
| InProcess | allowed | OPS014 error | OPS014 error |
| Container | allowed | allowed | OPS016 error |
| Cloud | allowed | allowed | allowed |
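The matrix reduces to a lookup against the classification sets. A sketch of the core decision, with the diagnostic-reporting plumbing omitted; the method name and the OPS014/OPS016 mapping are read off the matrix, not taken from the actual analyzer:

```csharp
internal static bool TryGetTierViolation(
    OpsExecutionTier declared, string attributeName, out string diagnosticId)
{
    diagnosticId = string.Empty;

    // Cloud-only attribute used below the Cloud tier.
    if (CloudRequired.Contains(attributeName) &&
        declared != OpsExecutionTier.Cloud)
    {
        // InProcess => OPS014, Container => OPS016 (per the matrix above).
        diagnosticId = declared == OpsExecutionTier.InProcess
            ? "OPS014" : "OPS016";
        return true;
    }

    // Container-only attribute used at the InProcess tier.
    if (ContainerRequired.Contains(attributeName) &&
        declared == OpsExecutionTier.InProcess)
    {
        diagnosticId = "OPS014";
        return true;
    }

    return false; // InProcess-valid attributes are allowed at every tier.
}
```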
Primitive 8: OpsRequirementLink -- Traceability to Requirements
The Requirements DSL declares features and acceptance criteria. The Ops DSLs implement operational aspects of those requirements. The OpsRequirementLink bridges them.
// In Requirements DSL:
[Feature("OrderCancellation",
Description = "Customer can cancel an order within 30 minutes")]
[AcceptanceCriterion("Cancellation triggers full refund")]
[AcceptanceCriterion("Cancelled order status updated within 5 seconds")]
public sealed class OrderCancellationFeature { }
// In Ops DSLs:
[PerformanceBudget("order.cancel.latency",
P95Ms = 5000,
Description = "Cancellation must complete within 5 seconds")]
[OpsRequirementLink(
typeof(OrderCancellationFeature),
AcceptanceCriterion = nameof(
OrderCancellationFeature.CancelledOrderStatusUpdatedWithin5Seconds),
Rationale = "AC requires status update within 5 seconds; " +
"performance budget enforces this at the ops level")]
public sealed class CancellationLatencyBudget { }
[ChaosExperiment("RefundGatewayTimeout",
Tier = OpsExecutionTier.InProcess)]
[TargetService(typeof(IRefundGateway))]
[FaultInjection(FaultKind.Timeout, Probability = 0.3)]
[OpsRequirementLink(
typeof(OrderCancellationFeature),
AcceptanceCriterion = nameof(
OrderCancellationFeature.CancellationTriggersFullRefund),
Rationale = "Verify refund completes even when payment gateway is slow")]
public sealed class RefundGatewayTimeoutExperiment { }
The typeof() is a compile-time reference. If OrderCancellationFeature is renamed or deleted, the compiler emits CS0246. The nameof() is a compile-time reference to the acceptance criterion. If the criterion is renamed, the compiler catches it.
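The attribute shape implied by that usage, as a sketch (constructor and properties are inferred from the examples above, not the actual definition):

```csharp
using System;

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class OpsRequirementLinkAttribute : Attribute
{
    // Taking a Type (via typeof) keeps the link compiler-checked:
    // renaming the feature class breaks the build, not the report.
    public OpsRequirementLinkAttribute(Type requirement)
        => Requirement = requirement;

    public Type Requirement { get; }
    public string? AcceptanceCriterion { get; set; }
    public string? Rationale { get; set; }
}
```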
Generated Traceability
The Source Generator produces a traceability report linking requirements to their operational coverage:
// <auto-generated> ops-requirement-traceability.g.json
{
"requirements": [
{
"type": "OrderCancellationFeature",
"feature": "OrderCancellation",
"acceptanceCriteria": [
{
"name": "CancellationTriggersFullRefund",
"opsLinks": [
{
"dsl": "Ops.Chaos",
"declaration": "RefundGatewayTimeoutExperiment",
"kind": "ChaosExperiment",
"tier": "InProcess",
"rationale": "Verify refund completes even when payment gateway is slow"
}
]
},
{
"name": "CancelledOrderStatusUpdatedWithin5Seconds",
"opsLinks": [
{
"dsl": "Ops.Performance",
"declaration": "CancellationLatencyBudget",
"kind": "PerformanceBudget",
"tier": "InProcess",
"rationale": "AC requires status update within 5 seconds"
}
]
}
]
}
]
}
The Generated Ops Manifest
Every Ops declaration from every sub-DSL is collected into a single manifest file: ops-manifest.g.json. This is the single source of truth for the operational posture of a service.
// <auto-generated> ops-manifest.g.json
{
"generatedAt": "2026-04-06T14:32:00Z",
"service": "OrderService",
"version": "2.4.0",
"summary": {
"totalDeclarations": 47,
"byDsl": {
"Deployment": 3,
"Migration": 5,
"Observability": 12,
"Configuration": 4,
"Resilience": 6,
"Chaos": 8,
"Performance": 4,
"Security": 2,
"SLA": 2,
"Compliance": 1
},
"byTier": {
"InProcess": 31,
"Container": 12,
"Cloud": 4
},
"byEnvironment": {
"Development": 15,
"Testing": 31,
"Staging": 40,
"Production": 47,
"DisasterRecovery": 8
}
},
"declarations": [
{
"dsl": "Ops.Observability",
"kind": "HealthCheck",
"name": "order-db-connectivity",
"target": "Database",
"tier": "InProcess",
"environments": ["Testing", "Staging", "Production"],
"probe": {
"kind": "Sql",
"endpoint": "SELECT 1",
"intervalSeconds": 15,
"timeoutSeconds": 3,
"failureThreshold": 3
}
},
{
"dsl": "Ops.Chaos",
"kind": "ChaosExperiment",
"name": "PaymentTimeout",
"target": "Application",
"tier": "InProcess",
"environments": ["Testing", "Staging"],
"fault": {
"kind": "Timeout",
"probability": 0.3,
"timeoutMs": 5000
},
"hypothesis": {
"metric": "order.completion.rate",
"condition": "GreaterThan",
"value": 0.95
},
"requirementLinks": [
{
"requirement": "OrderCancellationFeature",
"criterion": "CancellationTriggersFullRefund"
}
]
}
// ... 45 more declarations
]
}
The dotnet ops report Command
A CLI tool reads ops-manifest.g.json and produces a human-readable operational posture report:
$ dotnet ops report
OrderService v2.4.0 — Operational Posture
═════════════════════════════════════════════
Declarations: 47 total (31 InProcess, 12 Container, 4 Cloud)
Health Checks: 6 probes across 4 targets
Chaos Tests: 8 experiments (5 InProcess, 2 Container, 1 Cloud)
Perf Budgets: 4 budgets (p95 < 200ms, p99 < 500ms, error < 0.1%)
SLA: 99.9% availability, 43.8 min/month error budget
Compliance: SOC2 Type II — 12/14 controls evidenced
Coverage by Environment:
Development 15/47 (32%) ← expected: not all ops concerns apply to dev
Testing 31/47 (66%)
Staging 40/47 (85%)
Production 47/47 (100%) ← full coverage
DR 8/47 (17%) ← expected: only backup/failover relevant
Cross-DSL Validation: 14/14 rules passed ✓
Requirement Traceability: 4 features linked, 11/14 ACs covered (79%)
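The coverage table is not hand-maintained; it falls out of the manifest. A minimal sketch of the computation (hypothetical code, not the actual `dotnet ops report` implementation), run against a two-declaration excerpt of the manifest shape shown earlier:

```csharp
// Hypothetical sketch: derive coverage-by-environment from ops-manifest.g.json.
// The manifest shape matches the excerpt above; the CLI internals are assumed.
using System;
using System.Linq;
using System.Text.Json;

var json = """
{
  "declarations": [
    { "name": "order-db-connectivity", "environments": ["Testing", "Staging", "Production"] },
    { "name": "PaymentTimeout",        "environments": ["Testing", "Staging"] }
  ]
}
""";

using var manifest = JsonDocument.Parse(json);
var declarations = manifest.RootElement
    .GetProperty("declarations")
    .EnumerateArray()
    .ToList();
int total = declarations.Count;

foreach (var env in new[] { "Testing", "Staging", "Production" })
{
    // A declaration "covers" an environment when it lists that environment.
    int covered = declarations.Count(d =>
        d.GetProperty("environments").EnumerateArray()
         .Any(e => e.GetString() == env));
    Console.WriteLine($"{env,-12} {covered}/{total} ({100 * covered / total}%)");
}
// → Testing 2/2 (100%), Staging 2/2 (100%), Production 1/2 (50%)
```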
// <auto-generated by Ops.Primitives.Generators />
namespace Ops.Generated;
public static class OpsRegistryExtensions
{
    /// <summary>
    /// Registers all Ops infrastructure discovered in this compilation:
    /// health checks, metrics, middleware, policies, validators.
    /// </summary>
public static IServiceCollection AddOpsInfrastructure(
this IServiceCollection services)
{
// Observability: health checks
services.AddHealthChecks()
.AddOrderServiceProbes();
// Observability: metrics middleware
services.AddSingleton<OrderServiceMetricsMiddleware>();
// Resilience: circuit breaker policies
services.AddOrderServiceCircuitBreakers();
// Chaos: experiment decorators (when chaos is active)
services.AddChaosDecorators();
// Primitives: policy validator
services.AddHostedService<OpsPolicyValidator>();
return services;
}
}
One call: services.AddOpsInfrastructure(). Every operational concern registered. Generated from the same attributes that produce the Kubernetes YAML, the Prometheus alerts, and the Terraform modules.
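In the consuming service, that one call sits in an otherwise ordinary Program.cs. A sketch, assuming ASP.NET Core minimal hosting (AddOpsInfrastructure is the generated extension method; everything else is standard framework wiring):

```csharp
// Program.cs in the consuming service (sketch, assuming minimal hosting).
using Microsoft.AspNetCore.Builder;
using Ops.Generated;

var builder = WebApplication.CreateBuilder(args);

// One call registers probes, metrics middleware, circuit breakers,
// chaos decorators, and the policy validator.
builder.Services.AddOpsInfrastructure();

var app = builder.Build();
app.MapHealthChecks("/healthz");   // surfaces the generated probes
app.Run();
```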
The Primitive Composition Pattern
The 8 primitives compose. A single Ops declaration typically uses 3-5 primitives; the example below composes six:
// Composes: OpsTarget + OpsProbe + OpsThreshold + OpsEnvironment + OpsSchedule + OpsRequirementLink
[OpsProbe("cert-expiry-check",
Target = OpsTarget.Certificate, // Primitive 1: target
Kind = ProbeKind.Command,
IntervalSeconds = 86400)] // Primitive 2: probe
[OpsThreshold("cert.days.remaining",
ThresholdCondition.LessThan, 30,
Severity = OpsSeverity.Warning,
Description = "Certificate expires in < 30 days")]
[OpsThreshold("cert.days.remaining",
ThresholdCondition.LessThan, 7,
Severity = OpsSeverity.PageNow,
Description = "Certificate expires in < 7 days")] // Primitive 3: thresholds
[OpsEnvironment(EnvironmentTier.Staging)]
[OpsEnvironment(EnvironmentTier.Production)] // Primitive 5: environments
[OpsSchedule("0 8 * * 1",
Timezone = "UTC",
Description = "Weekly check on Monday 8 AM")] // Primitive 6: schedule
[OpsRequirementLink(typeof(SecurityComplianceFeature),
AcceptanceCriterion = nameof(
SecurityComplianceFeature.TlsCertificatesNeverExpire))] // Primitive 8: req link
public sealed class TlsCertificateExpiryCheck { }
Seven attributes. Six primitives composed. One declaration. The Source Generator reads all of them and produces:
- A Kubernetes CronJob that runs weekly
- A Prometheus alert rule with two severity levels
- An entry in ops-manifest.g.json
- A traceability link to the SecurityComplianceFeature requirement
- An environment filter (staging + production only)
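One detail worth calling out: stacking two [OpsThreshold] attributes on the same class only works if the attribute opts into repetition. A minimal sketch of the assumed attribute shape (illustrative; the real Ops.Primitives definition may differ), with a reflection round-trip showing both instances survive:

```csharp
using System;

// Reflecting the attributes back shows both stacked thresholds are preserved.
var thresholds = typeof(TlsCertificateExpiryCheck)
    .GetCustomAttributes(typeof(OpsThresholdAttribute), inherit: false);
Console.WriteLine(thresholds.Length); // → 2

[OpsThreshold("cert.days.remaining", ThresholdCondition.LessThan, 30, Severity = OpsSeverity.Warning)]
[OpsThreshold("cert.days.remaining", ThresholdCondition.LessThan, 7, Severity = OpsSeverity.PageNow)]
public sealed class TlsCertificateExpiryCheck { }

public enum ThresholdCondition { LessThan, GreaterThan }
public enum OpsSeverity { Info, Warning, PageNow }

// AllowMultiple = true is what lets one class carry a Warning threshold
// at 30 days and a PageNow threshold at 7 days.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class OpsThresholdAttribute : Attribute
{
    public OpsThresholdAttribute(string metric, ThresholdCondition condition, double value)
        => (Metric, Condition, Value) = (metric, condition, value);

    public string Metric { get; }
    public ThresholdCondition Condition { get; }
    public double Value { get; }
    public OpsSeverity Severity { get; set; }
    public string Description { get; set; } = "";
}
```

This is also why a generator can treat the pair as one declaration: it reads all attribute instances off the class symbol and groups them by metric name.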
All from one class with attributes. No YAML. No wiki. No manual process. The primitives are the kernel. The sub-DSLs are the vocabulary. The generators are the compilers. The analyzers are the type checkers. The manifest is the output.