The Problem
Every organization has performance goals. Very few enforce them.
The pattern repeats:
- The SRE team defines SLOs in a Google Doc. The document is titled "SLO Definitions Q3 2025." It has a table with four columns: service, SLI, target, window. Nobody updates it after Q3.
- A developer adds an endpoint. There is no performance budget. The endpoint returns 2 MB of JSON because it eagerly loads three navigation properties. Nobody notices until a mobile client on 3G reports a 12-second load time.
- A cache is added to fix the slow endpoint. The cache has no invalidation strategy. A customer updates their order and sees stale data for 5 minutes. The fix is to reduce the TTL to 30 seconds, which defeats the purpose of the cache.
- A performance regression ships because the benchmark suite was last run six months ago. The BenchmarkDotNet project still references the old API. It does not compile.
What is missing:
- SLIs defined in code. An SLI is a measurement. It should be declared next to the service it measures, not in a wiki.
- SLOs that reference SLIs. An SLO is a target for an SLI. If the SLI does not exist, the SLO is meaningless. The compiler should reject orphan SLOs.
- Per-endpoint performance budgets. Every endpoint should have a P50, P95, P99 latency budget and a maximum payload size. The build should fail if an endpoint has no budget.
- Cache policies with invalidation. A cache without an invalidation strategy is a bug waiting to happen. The analyzer should require an invalidation event for every cache policy.
- Benchmark targets that stay in sync. If a method signature changes, the benchmark should fail to compile — not silently skip the method.
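The orphan-SLO rule in the list above can be approximated at runtime with reflection. The sketch below is a stand-in for a real compile-time analyzer, not the library's implementation: `SliSketchAttribute`, `SloSketchAttribute`, `OrphanSloCheck`, and `SampleContract` are hypothetical names invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical, trimmed-down stand-ins for [Sli] and [Slo] (illustration only).
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class SliSketchAttribute : Attribute
{
    public string Name { get; }
    public SliSketchAttribute(string name) => Name = name;
}

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class SloSketchAttribute : Attribute
{
    public string Name { get; }
    public string SliName { get; }
    public double Target { get; }

    public SloSketchAttribute(string name, string sliName, double target)
    {
        Name = name;
        SliName = sliName;
        Target = target;
    }
}

public static class OrphanSloCheck
{
    // Returns the names of SLOs on the contract type whose SliName
    // matches no SLI declared on the same type.
    public static IReadOnlyList<string> FindOrphans(Type contract)
    {
        var sliNames = contract.GetCustomAttributes<SliSketchAttribute>()
            .Select(sli => sli.Name)
            .ToHashSet(StringComparer.Ordinal);

        return contract.GetCustomAttributes<SloSketchAttribute>()
            .Where(slo => !sliNames.Contains(slo.SliName))
            .Select(slo => slo.Name)
            .ToList();
    }
}

// Hypothetical contract: one valid SLO and one orphan.
[SliSketch("order-latency")]
[SloSketch("order-latency-slo", "order-latency", 0.999)]
[SloSketch("checkout-slo", "missing-sli", 0.99)]
public sealed class SampleContract { }
```

Calling `OrphanSloCheck.FindOrphans(typeof(SampleContract))` reports only `checkout-slo`, because `missing-sli` is never declared. A Roslyn analyzer could apply the same set-difference logic to attribute syntax so the failure shows up at build time rather than at startup.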
Attribute Definitions
// =================================================================
// Ops.Performance.Lib -- Performance DSL Attributes
// =================================================================

/// The kind of service-level indicator being measured.
public enum SliKind
{
    Availability, // percentage of successful requests
    Latency,      // response time distribution
    Throughput,   // requests per second
    ErrorRate,    // percentage of failed requests
    Saturation    // resource utilization (CPU, memory, connections)
}

/// The unit of measurement for an SLI.
public enum SliMeasurement
{
    Milliseconds,
    Seconds,
    Percentage,
    RequestsPerSecond,
    BytesPerSecond,
    Count
}

/// Declares a service-level indicator: a concrete measurement
/// attached to a service or endpoint.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, AllowMultiple = true)]
public sealed class SliAttribute : Attribute
{
    public string Name { get; }
    public SliKind Kind { get; }
    public SliMeasurement Measurement { get; }
    public string Description { get; init; } = "";
    public string[] Labels { get; init; } = [];

    public SliAttribute(string name, SliKind kind, SliMeasurement measurement)
    {
        Name = name;
        Kind = kind;
        Measurement = measurement;
    }
}

/// Declares a service-level objective: a target for an SLI
/// over a rolling window, with error budget and burn-rate alerting.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class SloAttribute : Attribute
{
    public string Name { get; }
    public string SliName { get; }
    public double Target { get; }
    public int WindowDays { get; init; } = 30;
    public double ErrorBudget { get; init; }
    public double BurnRateAlertThreshold { get; init; } = 14.4;
    public string EscalationChannel { get; init; } = "";

    public SloAttribute(string name, string sliName, double target)
    {
        Name = name;
        SliName = sliName;
        Target = target;
        ErrorBudget = 1.0 - target;
    }
}

/// Per-endpoint performance budget. Every public endpoint must have one.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public sealed class PerformanceBudgetAttribute : Attribute
{
    public string Endpoint { get; }
    public int P50Ms { get; init; }
    public int P95Ms { get; init; }
    public int P99Ms { get; init; }
    public int MaxPayloadBytes { get; init; }
    public string Owner { get; init; } = "";

    public PerformanceBudgetAttribute(string endpoint)
    {
        Endpoint = endpoint;
    }
}

/// Cache strategy for a data source or endpoint.
public enum CacheStrategy
{
    ReadThrough,  // read from cache; on miss, read from source and populate
    WriteThrough, // write to source and cache simultaneously
    WriteBehind,  // write to cache immediately, source asynchronously
    Aside         // application manages cache explicitly
}

/// Declares a cache policy with an explicit invalidation event.
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class, AllowMultiple = true)]
public sealed class CachePolicyAttribute : Attribute
{
    public string Key { get; }
    public int TtlSeconds { get; init; } = 300;
    public CacheStrategy Strategy { get; init; } = CacheStrategy.ReadThrough;
    public string InvalidationEvent { get; init; } = "";
    public bool SlidingExpiration { get; init; } = false;
    public string Region { get; init; } = "default";

    public CachePolicyAttribute(string key)
    {
        Key = key;
    }
}

/// Links a method to a BenchmarkDotNet target with regression thresholds.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public sealed class BenchmarkTargetAttribute : Attribute
{
    public string Method { get; }
    public int MaxDurationMs { get; init; }
    public long MaxAllocationsBytes { get; init; }
    public double RegressionThresholdPercent { get; init; } = 10.0;
    public int WarmupCount { get; init; } = 3;
    public int IterationCount { get; init; } = 10;

    public BenchmarkTargetAttribute(string method)
    {
        Method = method;
    }
}
Usage: Order API Performance Contract
An e-commerce order service with latency SLIs, SLOs with error budgets, per-endpoint budgets, and cache policies that invalidate on domain events.
// =================================================================
// OrderService Performance Contract
// =================================================================
[OpsTarget("order-service")]

// SLIs: what we measure
[Sli("order-latency", SliKind.Latency, SliMeasurement.Milliseconds,
    Description = "End-to-end latency for order operations",
    Labels = ["endpoint", "method", "status_code"])]
[Sli("order-availability", SliKind.Availability, SliMeasurement.Percentage,
    Description = "Percentage of non-5xx responses")]
[Sli("order-throughput", SliKind.Throughput, SliMeasurement.RequestsPerSecond,
    Description = "Orders processed per second")]

// SLOs: what we promise
[Slo("order-latency-slo", "order-latency", 0.999,
    WindowDays = 30,
    BurnRateAlertThreshold = 14.4,
    EscalationChannel = "#order-oncall")]
[Slo("order-availability-slo", "order-availability", 0.999,
    WindowDays = 30,
    BurnRateAlertThreshold = 10.0,
    EscalationChannel = "#order-oncall")]
public partial class OrderPerformanceContract
{
    // Per-endpoint budgets
    [PerformanceBudget("POST /api/orders",
        P50Ms = 50, P95Ms = 150, P99Ms = 300, MaxPayloadBytes = 8192)]
    public partial void CreateOrder();

    [PerformanceBudget("GET /api/orders/{id}",
        P50Ms = 20, P95Ms = 50, P99Ms = 100, MaxPayloadBytes = 4096)]
    [CachePolicy("order:{id}",
        TtlSeconds = 60,
        Strategy = CacheStrategy.ReadThrough,
        InvalidationEvent = nameof(OrderUpdatedEvent))]
    public partial void GetOrder();

    [PerformanceBudget("GET /api/orders?status={status}",
        P50Ms = 80, P95Ms = 200, P99Ms = 400, MaxPayloadBytes = 32768)]
    [CachePolicy("orders:list:{status}",
        TtlSeconds = 30,
        Strategy = CacheStrategy.Aside,
        InvalidationEvent = nameof(OrderStatusChangedEvent),
        SlidingExpiration = true)]
    public partial void ListOrders();

    // Benchmark targets for hot paths
    [BenchmarkTarget(nameof(OrderPriceCalculator.CalculateTotal),
        MaxDurationMs = 5, MaxAllocationsBytes = 1024,
        RegressionThresholdPercent = 15.0)]
    public partial void BenchmarkPriceCalculation();

    [BenchmarkTarget(nameof(OrderValidator.Validate),
        MaxDurationMs = 2, MaxAllocationsBytes = 512)]
    public partial void BenchmarkOrderValidation();
}
One class. Five SLI/SLO definitions. Three endpoint budgets. Two cache policies with domain-event invalidation. Two benchmark targets with allocation limits. Zero ambiguity about what "fast enough" means.
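To make the numbers in the contract concrete, here is a minimal sketch of the arithmetic behind a 0.999 target and a 14.4 burn-rate threshold. `ErrorBudgetMath` and its method names are hypothetical helpers, not part of the DSL; the formulas are the usual error-budget definitions (budget = 1 - target, burn rate = observed error ratio divided by budget).

```csharp
using System;

// Hypothetical helper (illustration only): error-budget arithmetic for an SLO.
public static class ErrorBudgetMath
{
    // Fraction of events allowed to miss the objective, e.g. 0.001 for a 0.999 target.
    public static double ErrorBudget(double target) => 1.0 - target;

    // The budget expressed as time, assuming every request fails for that long.
    public static TimeSpan BudgetAsFullOutage(double target, int windowDays) =>
        TimeSpan.FromDays(windowDays * ErrorBudget(target));

    // Time until the whole window's budget is gone at a constant burn rate.
    public static TimeSpan TimeToExhaustion(int windowDays, double burnRate) =>
        TimeSpan.FromDays(windowDays / burnRate);

    // Share of the window's budget consumed by burning at burnRate for duration.
    public static double BudgetConsumed(double burnRate, TimeSpan duration, int windowDays) =>
        burnRate * duration.TotalDays / windowDays;
}
```

For the contract above, `BudgetAsFullOutage(0.999, 30)` comes to roughly 43.2 minutes, `TimeToExhaustion(30, 14.4)` to 50 hours, and one hour at a 14.4x burn rate consumes about 2% of the 30-day budget.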
BenchmarkConfig.g.cs
The generator produces a BenchmarkDotNet harness that compiles against the actual method signatures and fails if a regression exceeds the declared threshold.
// <auto-generated by Ops.Performance.Generator />
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Validators;

[MemoryDiagnoser]
[SimpleJob(warmupCount: 3, iterationCount: 10)]
public class OrderPerformanceBenchmarks
{
    private OrderPriceCalculator _calculator = null!;
    private OrderValidator _validator = null!;
    private Order _testOrder = null!;

    [GlobalSetup]
    public void Setup()
    {
        _calculator = new OrderPriceCalculator();
        _validator = new OrderValidator();
        _testOrder = OrderTestData.CreateTypicalOrder();
    }

    // [MaxDuration] and [MaxAllocations] are not BenchmarkDotNet attributes;
    // they are assumed here to be supplied by Ops.Performance.Lib.
    [Benchmark]
    [MaxDuration(milliseconds: 5)]
    [MaxAllocations(bytes: 1024)]
    public decimal CalculateTotal() => _calculator.CalculateTotal(_testOrder);

    [Benchmark]
    [MaxDuration(milliseconds: 2)]
    [MaxAllocations(bytes: 512)]
    public ValidationResult Validate() => _validator.Validate(_testOrder);
}

/// Regression gate: fails CI if mean exceeds threshold.
public class OrderPerformanceRegressionValidator : IValidator
{
    private static readonly Dictionary<string, (int MaxMs, long MaxBytes, double Threshold)> _targets = new()
    {
        ["CalculateTotal"] = (5, 1024, 0.15),
        ["Validate"] = (2, 512, 0.10),
    };

    public bool TreatsWarningsAsErrors => true;

    public IEnumerable<ValidationError> Validate(ValidationParameters parameters)
    {
        // Compares current run against baseline stored in benchmarks/baseline.json
        // Returns ValidationError for each method exceeding RegressionThresholdPercent
        yield break; // placeholder so the excerpt compiles; the comparison logic is elided here
    }
}
CacheRegistration.g.cs
The generator wires up IDistributedCache or IMemoryCache with the declared policies and subscribes to domain events for invalidation.
// <auto-generated by Ops.Performance.Generator />
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.DependencyInjection;

public static class OrderCacheRegistration
{
    public static IServiceCollection AddOrderCachePolicies(this IServiceCollection services)
    {
        services.AddSingleton<ICachePolicy>(new CachePolicy
        {
            Key = "order:{id}",
            TtlSeconds = 60,
            Strategy = CacheStrategy.ReadThrough,
            SlidingExpiration = false,
            Region = "default",
        });

        services.AddSingleton<ICachePolicy>(new CachePolicy
        {
            Key = "orders:list:{status}",
            TtlSeconds = 30,
            Strategy = CacheStrategy.Aside,
            SlidingExpiration = true,
            Region = "default",
        });

        // Subscribe to invalidation events
        services.AddTransient<IEventHandler<OrderUpdatedEvent>, OrderCacheInvalidator>();
        services.AddTransient<IEventHandler<OrderStatusChangedEvent>, OrderListCacheInvalidator>();
        return services;
    }
}

/// Invalidates "order:{id}" when OrderUpdatedEvent is raised.
public sealed class OrderCacheInvalidator : IEventHandler<OrderUpdatedEvent>
{
    private readonly IDistributedCache _cache;

    public OrderCacheInvalidator(IDistributedCache cache) => _cache = cache;

    public async Task HandleAsync(OrderUpdatedEvent evt, CancellationToken ct)
    {
        await _cache.RemoveAsync($"order:{evt.OrderId}", ct);
    }
}

/// Invalidates "orders:list:{status}" when OrderStatusChangedEvent is raised.
public sealed class OrderListCacheInvalidator : IEventHandler<OrderStatusChangedEvent>
{
    private readonly IDistributedCache _cache;

    public OrderListCacheInvalidator(IDistributedCache cache) => _cache = cache;

    public async Task HandleAsync(OrderStatusChangedEvent evt, CancellationToken ct)
    {
        // Invalidate both old and new status lists
        await _cache.RemoveAsync($"orders:list:{evt.OldStatus}", ct);
        await _cache.RemoveAsync($"orders:list:{evt.NewStatus}", ct);
    }
}
InProcess SLO Tracker
A lightweight in-memory SLO tracker for development and test environments.
// <auto-generated by Ops.Performance.Generator />
public sealed class OrderSloTracker : ISloTracker
{
    private readonly ConcurrentDictionary<string, SliWindow> _windows = new();

    public void RecordLatency(string endpoint, double ms)
    {
        var window = _windows.GetOrAdd("order-latency", _ => new SliWindow(TimeSpan.FromDays(30)));
        window.Record(ms);
    }

    public SloStatus GetStatus(string sloName) => sloName switch
    {
        "order-latency-slo" => Evaluate("order-latency", target: 0.999, burnRateThreshold: 14.4),
        "order-availability-slo" => Evaluate("order-availability", target: 0.999, burnRateThreshold: 10.0),
        _ => SloStatus.Unknown,
    };

    public double GetErrorBudgetRemaining(string sloName)
    {
        var status = GetStatus(sloName);
        return status.ErrorBudgetRemainingPercent;
    }
}
prometheus-slo-rules.yaml
Multi-window, multi-burn-rate alert rules following the pattern described in the Google SRE Workbook.
# Auto-generated by Ops.Performance.Generator
# Source: OrderPerformanceContract
groups:
  - name: order-service-slo-rules
    rules:
      # ── SLI Recording Rules ──────────────────────────────────
      - record: order_service:latency:p99_5m
        expr: |
          histogram_quantile(0.99,
            rate(http_request_duration_seconds_bucket{service="order-service"}[5m]))
      - record: order_service:availability:ratio_5m
        expr: |
          1 - (
            sum(rate(http_requests_total{service="order-service", status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="order-service"}[5m]))
          )
      - record: order_service:throughput:rps_5m
        expr: |
          sum(rate(http_requests_total{service="order-service"}[5m]))

      # ── Error Budget ─────────────────────────────────────────
      - record: order_service:latency_slo:error_budget_remaining
        expr: |
          1 - (
            (1 - order_service:latency:slo_compliance_30d)
            /
            (1 - 0.999)
          )

      # ── Multi-window Burn Rate Alerts ────────────────────────
      # Fast burn: 14.4x in 1h (page)
      - alert: OrderLatencyBurnRateCritical
        expr: |
          (
            order_service:latency:error_ratio_1h > (14.4 * 0.001)
            and
            order_service:latency:error_ratio_5m > (14.4 * 0.001)
          )
        for: 2m
        labels:
          severity: critical
          slo: order-latency-slo
          escalation: "#order-oncall"
        annotations:
          summary: "Order latency SLO burn rate critical (14.4x)"
          description: |
            The 1h error ratio for order-latency-slo is {{ $value | humanizePercentage }},
            more than 14.4x the 0.1% error budget. At a sustained 14.4x burn rate, the
            30-day budget is exhausted in about 2.1 days.

      # Slow burn: 3x in 3d (ticket)
      - alert: OrderLatencyBurnRateSlow
        expr: |
          (
            order_service:latency:error_ratio_3d > (3.0 * 0.001)
            and
            order_service:latency:error_ratio_6h > (3.0 * 0.001)
          )
        for: 1h
        labels:
          severity: warning
          slo: order-latency-slo
        annotations:
          summary: "Order latency SLO burn rate elevated (3x)"

      # ── Availability SLO ─────────────────────────────────────
      - alert: OrderAvailabilityBurnRateCritical
        expr: |
          (
            order_service:availability:error_ratio_1h > (10.0 * 0.001)
            and
            order_service:availability:error_ratio_5m > (10.0 * 0.001)
          )
        for: 2m
        labels:
          severity: critical
          slo: order-availability-slo
          escalation: "#order-oncall"
grafana-slo-dashboard.json
A Grafana dashboard with panels for each SLI, error budget burn-down, and per-endpoint budget compliance.
{
  "dashboard": {
    "title": "Order Service SLO Dashboard",
    "uid": "order-service-slo",
    "panels": [
      {
        "title": "Latency SLO (99.9% target, 30d window)",
        "type": "gauge",
        "targets": [{
          "expr": "order_service:latency:slo_compliance_30d",
          "legendFormat": "Compliance"
        }],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                { "value": 0, "color": "red" },
                { "value": 0.995, "color": "yellow" },
                { "value": 0.999, "color": "green" }
              ]
            },
            "min": 0.99, "max": 1.0
          }
        }
      },
      {
        "title": "Error Budget Remaining",
        "type": "timeseries",
        "targets": [{
          "expr": "order_service:latency_slo:error_budget_remaining",
          "legendFormat": "Budget %"
        }]
      },
      {
        "title": "Per-Endpoint P95 vs Budget",
        "type": "table",
        "targets": [{
          "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{service=\"order-service\"}[5m]))",
          "legendFormat": "{{ endpoint }}"
        }],
        "transformations": [{
          "id": "addFieldFromCalculation",
          "options": {
            "mode": "binary",
            "fieldName": "budget_ms",
            "values": {
              "POST /api/orders": 150,
              "GET /api/orders/{id}": 50,
              "GET /api/orders?status={status}": 200
            }
          }
        }]
      }
    ]
  }
}
Tier 3: Cloud
The Cloud tier produces production SLO monitoring integrated with the cloud provider's native monitoring.
# Auto-generated by Ops.Performance.Generator
# terraform/performance/order-slo/main.tf
resource "azurerm_monitor_metric_alert" "order_latency_slo" {
  name = "order-latency-slo-burn-rate"
  resource_group_name = var.resource_group_name
  scopes = [var.app_service_id]
  description = "Order latency SLO burn rate exceeds 14.4x threshold"
  severity = 0
  frequency = "PT1M"
  window_size = "PT1H"

  criteria {
    metric_namespace = "Microsoft.Web/sites"
    metric_name = "HttpResponseTime"
    aggregation = "Average"
    operator = "GreaterThan"
    threshold = 0.300 # P99 budget = 300ms
  }

  action {
    action_group_id = var.oncall_action_group_id
  }
}

resource "azurerm_monitor_scheduled_query_rules_alert_v2" "order_error_budget" {
  name = "order-error-budget-alert"
  resource_group_name = var.resource_group_name
  location = var.location
  scopes = [var.log_analytics_workspace_id]
  description = "Order service error budget consumed past 50%"

  criteria {
    # ResultCode is a string column and the duration column is DurationMs
    # in the workspace-based AppRequests table.
    query = <<-QUERY
      AppRequests
      | where AppRoleName == "order-service"
      | where TimeGenerated > ago(30d)
      | summarize
          total = count(),
          failures = countif(toint(ResultCode) >= 500 or DurationMs > 300)
      | extend error_ratio = todouble(failures) / todouble(total)
      | extend budget_consumed = error_ratio / 0.001
      | where budget_consumed > 0.5
    QUERY
    time_aggregation_method = "Count"
    operator = "GreaterThan"
    threshold = 0
  }

  action {
    action_groups = [var.oncall_action_group_id]
  }
}

resource "grafana_dashboard" "order_slo" {
  config_json = file("${path.module}/grafana-slo-dashboard.json")
  folder = var.grafana_slo_folder_id
}
Analyzer Diagnostics
| ID | Severity | Rule | Example |
|---|---|---|---|
| PRF001 | Error | SLO references nonexistent SLI | [Slo("x", "missing-sli", 0.99)] -- no [Sli] with name "missing-sli" in scope |
| PRF002 | Warning | Public endpoint without performance budget | Controller action GetOrderHistory has [HttpGet] but no [PerformanceBudget] |
| PRF003 | Error | Cache policy without invalidation event | [CachePolicy("key", TtlSeconds = 300)] -- InvalidationEvent is empty string |
| PRF004 | Warning | Performance budget P99 exceeds SLO target | P99 of 500ms on an endpoint when the service SLO implies max 300ms |
| PRF005 | Info | Benchmark target method signature changed | [BenchmarkTarget("OldMethodName")] -- method was renamed or parameters changed |
PRF003 is the one that catches the most bugs. Every cache must declare how it gets invalidated. No exceptions. If the answer is "it just expires after the TTL," the developer must explicitly set InvalidationEvent = "TTL_ONLY" to acknowledge the decision.
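The PRF003 rule reduces to a three-way classification of the `InvalidationEvent` string. The sketch below shows that decision as a plain function; `InvalidationVerdict` and `CachePolicySketch` are hypothetical names, and a real analyzer would presumably run the same check against attribute arguments at compile time.

```csharp
using System;

// Hypothetical result type (illustration only).
public enum InvalidationVerdict { EventDriven, TtlOnlyAcknowledged, Missing }

public static class CachePolicySketch
{
    // Sentinel value from the text: an explicit "expires by TTL only" acknowledgement.
    public const string TtlOnly = "TTL_ONLY";

    // Classifies the InvalidationEvent value of a single cache policy.
    public static InvalidationVerdict Classify(string invalidationEvent)
    {
        if (string.IsNullOrWhiteSpace(invalidationEvent))
            return InvalidationVerdict.Missing; // would be reported as PRF003

        return string.Equals(invalidationEvent, TtlOnly, StringComparison.Ordinal)
            ? InvalidationVerdict.TtlOnlyAcknowledged // deliberate opt-out
            : InvalidationVerdict.EventDriven;        // names a domain event
    }
}
```

With this shape, the attribute's default of `""` lands in `Missing`, so forgetting the property is an error while choosing TTL-only expiry remains possible but visible in code review.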
Performance to Observability
Every [Sli] maps to an [OpsMetric] in the Observability DSL. The generator verifies that a Prometheus metric exists for each SLI:
// Ops.Performance declares the SLI
[Sli("order-latency", SliKind.Latency, SliMeasurement.Milliseconds)]
// Ops.Observability must have a matching metric — PRF006 fires if missing
[OpsMetric("http_request_duration_seconds", MetricKind.Histogram,
Labels = ["service", "endpoint", "method", "status_code"])]
Performance to Requirements
Performance budgets link to features via OpsRequirementLink:
[PerformanceBudget("POST /api/orders", P50Ms = 50, P95Ms = 150, P99Ms = 300)]
[OpsRequirementLink("FEATURE-789", "Order creation must complete within 300ms at P99")]
public partial void CreateOrder();
The compliance report shows which features have performance budgets and which do not. If a feature acceptance criterion mentions latency, the analyzer warns if no [PerformanceBudget] matches.
Performance to LoadTesting
Every [PerformanceBudget] defines the pass/fail criteria for the load test of that endpoint. The LoadTesting DSL reads the budget and generates k6 thresholds that match:
// PerformanceBudget says P95 <= 150ms for POST /api/orders
// LoadTesting generator produces:
// thresholds: { 'http_req_duration{endpoint:POST /api/orders}': ['p(95)<150'] }
No manual synchronization. Change the budget attribute, the load test threshold updates on next build.
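A minimal sketch of how such a threshold could be derived from a budget is shown below. `EndpointBudget` and `K6ThresholdSketch` are hypothetical names, the `endpoint` tag is taken from the comment above, and only P95 and P99 are emitted; how the actual LoadTesting generator renders its output is not specified here.

```csharp
using System.Collections.Generic;

// Hypothetical, trimmed-down view of a [PerformanceBudget]; zero means "not set".
public sealed record EndpointBudget(string Endpoint, int P95Ms, int P99Ms);

public static class K6ThresholdSketch
{
    // Metric selector in the tag-filter form used by the comment above.
    public static string MetricFor(EndpointBudget budget) =>
        $"http_req_duration{{endpoint:{budget.Endpoint}}}";

    // One threshold expression per percentile that has a budget.
    public static IReadOnlyList<string> ExpressionsFor(EndpointBudget budget)
    {
        var expressions = new List<string>();
        if (budget.P95Ms > 0) expressions.Add($"p(95)<{budget.P95Ms}");
        if (budget.P99Ms > 0) expressions.Add($"p(99)<{budget.P99Ms}");
        return expressions;
    }
}
```

For `new EndpointBudget("POST /api/orders", 150, 300)` this yields the selector `http_req_duration{endpoint:POST /api/orders}` with the expressions `p(95)<150` and `p(99)<300`, so the load test criteria change only when the attribute does.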
Performance to Resilience
When a [PerformanceBudget] P99 is declared, the generator verifies that the corresponding endpoint has a [CircuitBreaker] with a timeout that does not exceed the P99. If a circuit breaker timeout is 5 seconds but the P99 budget is 300ms, the analyzer flags the inconsistency: the circuit breaker will never trip before the budget is blown.
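The consistency check described above comes down to comparing two durations. The sketch below assumes the circuit breaker timeout is available as a `TimeSpan`; `ResilienceSketch` and its members are hypothetical names, and the shape of `[CircuitBreaker]` itself is not shown in this section.

```csharp
using System;

public static class ResilienceSketch
{
    // True when the breaker's timeout is no longer than the endpoint's P99 budget.
    public static bool TimeoutFitsBudget(TimeSpan breakerTimeout, int p99Ms) =>
        breakerTimeout <= TimeSpan.FromMilliseconds(p99Ms);

    // Human-readable finding, or an empty string when the pair is consistent.
    public static string Finding(string endpoint, TimeSpan breakerTimeout, int p99Ms) =>
        TimeoutFitsBudget(breakerTimeout, p99Ms)
            ? string.Empty
            : $"{endpoint}: circuit breaker timeout {(long)breakerTimeout.TotalMilliseconds}ms exceeds P99 budget {p99Ms}ms";
}
```

With the numbers from the text, a 5-second timeout against a 300 ms budget produces a finding, while a 250 ms timeout does not.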