Container Tier -- Docker Compose Generation
The InProcess walkthrough proved the circuit breaker works with simulated faults inside dotnet test. But simulated timeouts are not real network latency. A Task.Delay does not exercise TCP retransmission, connection pool exhaustion, or driver timeout handling. For that, we need a real database connection with real network degradation.
This walkthrough generates a complete Docker Compose environment from C# attributes: PostgreSQL, RabbitMQ, Toxiproxy sidecars, and a test runner. The generator produces every file. You write the experiment declaration and the test. Nothing else.
Step 1: The Scenario
OrderService depends on PostgreSQL for persistence and RabbitMQ for event publishing. The question: what happens when the database connection has 500ms added latency and 10% packet loss?
This cannot be answered with InProcess chaos. A DI decorator can delay a method call, but it cannot simulate TCP packet loss on a Npgsql connection. The driver has its own connection pool, its own timeout logic, its own retry behavior. The only way to test this is to put a network proxy between the application and the database and degrade the actual TCP stream.
Toxiproxy does exactly this. It sits between the test runner and the upstream service, and it introduces configurable network faults: latency, bandwidth limits, packet loss, connection resets, slow-close, and more. The Container tier generates the Toxiproxy configuration alongside the Docker Compose file.
Step 2: Declare the Experiment
// DatabaseLatencyChaos.cs
using Ops.Chaos;
[ChaosExperiment("DatabaseLatency", Tier = OpsExecutionTier.Container,
Hypothesis = "Order queries complete within 2s even with 500ms DB latency")]
[Container("postgres", Image = "postgres:16", Port = 5432,
Environment = new[] { "POSTGRES_PASSWORD=test", "POSTGRES_DB=orders" })]
[Container("rabbitmq", Image = "rabbitmq:3-management", Port = 5672)]
[ToxiProxy("postgres-proxy", Upstream = "postgres:5432", ListenPort = 15432)]
[ToxiProxy("rabbitmq-proxy", Upstream = "rabbitmq:5672", ListenPort = 15672)]
[FaultInjection(FaultKind.Latency, Duration = "500ms", Target = "postgres-proxy")]
[FaultInjection(FaultKind.PacketLoss, Probability = 0.1, Target = "postgres-proxy")]
[SteadyStateProbe(Metric = "order.query.p99", Expected = "< 2000ms")]
public partial class DatabaseLatencyChaos { }// DatabaseLatencyChaos.cs
using Ops.Chaos;
[ChaosExperiment("DatabaseLatency", Tier = OpsExecutionTier.Container,
Hypothesis = "Order queries complete within 2s even with 500ms DB latency")]
[Container("postgres", Image = "postgres:16", Port = 5432,
Environment = new[] { "POSTGRES_PASSWORD=test", "POSTGRES_DB=orders" })]
[Container("rabbitmq", Image = "rabbitmq:3-management", Port = 5672)]
[ToxiProxy("postgres-proxy", Upstream = "postgres:5432", ListenPort = 15432)]
[ToxiProxy("rabbitmq-proxy", Upstream = "rabbitmq:5672", ListenPort = 15672)]
[FaultInjection(FaultKind.Latency, Duration = "500ms", Target = "postgres-proxy")]
[FaultInjection(FaultKind.PacketLoss, Probability = 0.1, Target = "postgres-proxy")]
[SteadyStateProbe(Metric = "order.query.p99", Expected = "< 2000ms")]
public partial class DatabaseLatencyChaos { }The [Container] attributes declare the infrastructure. The [ToxiProxy] attributes declare the proxies. The [FaultInjection] attributes declare the network faults and which proxy they target. The generator reads all of this and produces the complete Docker Compose file, the Toxiproxy configuration, and the xUnit test fixture.
Note that the faults target postgres-proxy, not postgres directly. RabbitMQ is proxied too (via rabbitmq-proxy), but no faults are injected on it in this experiment. It is there to prove that the application correctly routes through the proxy and that event publishing still works under database degradation.
Step 3: Generated docker-compose.chaos.yaml
The source generator produces this file at generated/docker-compose.chaos.yaml:
# Auto-generated by Ops.Chaos.Generator from DatabaseLatencyChaos
# Do not edit. Regenerate by building the project.
version: "3.8"
networks:
chaos-net:
driver: bridge
internal: true
services:
postgres:
image: postgres:16
environment:
POSTGRES_PASSWORD: test
POSTGRES_DB: orders
ports:
- "5432"
networks:
- chaos-net
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 2s
timeout: 5s
retries: 10
volumes:
- postgres-data:/var/lib/postgresql/data
rabbitmq:
image: rabbitmq:3-management
ports:
- "5672"
- "15672"
networks:
- chaos-net
healthcheck:
test: ["CMD-SHELL", "rabbitmq-diagnostics -q ping"]
interval: 5s
timeout: 10s
retries: 10
toxiproxy:
image: ghcr.io/shopify/toxiproxy:2.9.0
ports:
- "8474:8474" # Toxiproxy API
- "15432:15432" # postgres-proxy listen port
- "15672:15672" # rabbitmq-proxy listen port
networks:
- chaos-net
volumes:
- ./toxiproxy-config.json:/config/toxiproxy.json
command: ["-config", "/config/toxiproxy.json"]
depends_on:
postgres:
condition: service_healthy
rabbitmq:
condition: service_healthy
test-runner:
build:
context: ../..
dockerfile: Dockerfile.test
environment:
CONNECTION_STRING: "Host=toxiproxy;Port=15432;Database=orders;Username=postgres;Password=test;Timeout=10"
RABBITMQ_CONNECTION: "amqp://guest:guest@toxiproxy:15672"
TOXIPROXY_API: "http://toxiproxy:8474"
networks:
- chaos-net
depends_on:
toxiproxy:
condition: service_started
volumes:
postgres-data:# Auto-generated by Ops.Chaos.Generator from DatabaseLatencyChaos
# Do not edit. Regenerate by building the project.
version: "3.8"
networks:
chaos-net:
driver: bridge
internal: true
services:
postgres:
image: postgres:16
environment:
POSTGRES_PASSWORD: test
POSTGRES_DB: orders
ports:
- "5432"
networks:
- chaos-net
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 2s
timeout: 5s
retries: 10
volumes:
- postgres-data:/var/lib/postgresql/data
rabbitmq:
image: rabbitmq:3-management
ports:
- "5672"
- "15672"
networks:
- chaos-net
healthcheck:
test: ["CMD-SHELL", "rabbitmq-diagnostics -q ping"]
interval: 5s
timeout: 10s
retries: 10
toxiproxy:
image: ghcr.io/shopify/toxiproxy:2.9.0
ports:
- "8474:8474" # Toxiproxy API
- "15432:15432" # postgres-proxy listen port
- "15672:15672" # rabbitmq-proxy listen port
networks:
- chaos-net
volumes:
- ./toxiproxy-config.json:/config/toxiproxy.json
command: ["-config", "/config/toxiproxy.json"]
depends_on:
postgres:
condition: service_healthy
rabbitmq:
condition: service_healthy
test-runner:
build:
context: ../..
dockerfile: Dockerfile.test
environment:
CONNECTION_STRING: "Host=toxiproxy;Port=15432;Database=orders;Username=postgres;Password=test;Timeout=10"
RABBITMQ_CONNECTION: "amqp://guest:guest@toxiproxy:15672"
TOXIPROXY_API: "http://toxiproxy:8474"
networks:
- chaos-net
depends_on:
toxiproxy:
condition: service_started
volumes:
postgres-data:Every connection string points through Toxiproxy, not directly to the upstream service. The test-runner container gets toxiproxy:15432 for PostgreSQL, not postgres:5432. The test code sees a normal Npgsql connection string -- it has no idea that a proxy is in the middle.
The network is marked internal: true. Nothing leaks to the host network. The chaos environment is fully isolated.
Step 4: Generated toxiproxy-config.json
[
{
"name": "postgres-proxy",
"listen": "0.0.0.0:15432",
"upstream": "postgres:5432",
"enabled": true
},
{
"name": "rabbitmq-proxy",
"listen": "0.0.0.0:15672",
"upstream": "rabbitmq:5672",
"enabled": true
}
][
{
"name": "postgres-proxy",
"listen": "0.0.0.0:15432",
"upstream": "postgres:5432",
"enabled": true
},
{
"name": "rabbitmq-proxy",
"listen": "0.0.0.0:15672",
"upstream": "rabbitmq:5672",
"enabled": true
}
]This is the initial Toxiproxy configuration. The proxies are created at startup with no toxics -- clean passthrough. The test fixture adds the faults dynamically via the Toxiproxy REST API. This allows the test to add, remove, and modify faults mid-test.
Step 5: Generated TestInfraFixture.g.cs
The generator produces an xUnit collection fixture that manages the Docker Compose lifecycle:
// <auto-generated by Ops.Chaos.Generator />
using System.Diagnostics;
namespace OrderService.Chaos.Tests;
[CollectionDefinition("DatabaseLatencyChaos")]
public class DatabaseLatencyChaosCollection : ICollectionFixture<DatabaseLatencyChaosFixture> { }
public class DatabaseLatencyChaosFixture : IAsyncLifetime
{
private readonly string _composeFile;
public string ConnectionString { get; private set; } = null!;
public string RabbitMqConnection { get; private set; } = null!;
public ToxiProxyClient ToxiProxy { get; private set; } = null!;
public DatabaseLatencyChaosFixture()
{
_composeFile = Path.Combine(
AppContext.BaseDirectory, "generated", "docker-compose.chaos.yaml");
}
public async Task InitializeAsync()
{
// Start the environment
await RunProcess("docker", $"compose -f {_composeFile} up -d --wait");
// Resolve the mapped ports
var toxiProxyPort = await GetMappedPort("toxiproxy", 8474);
var postgresProxyPort = await GetMappedPort("toxiproxy", 15432);
var rabbitProxyPort = await GetMappedPort("toxiproxy", 15672);
ConnectionString =
$"Host=localhost;Port={postgresProxyPort};Database=orders;" +
$"Username=postgres;Password=test;Timeout=10;Command Timeout=10";
RabbitMqConnection =
$"amqp://guest:guest@localhost:{rabbitProxyPort}";
ToxiProxy = new ToxiProxyClient($"http://localhost:{toxiProxyPort}");
// Wait for Toxiproxy proxies to be ready
await ToxiProxy.WaitForProxy("postgres-proxy", timeout: TimeSpan.FromSeconds(30));
await ToxiProxy.WaitForProxy("rabbitmq-proxy", timeout: TimeSpan.FromSeconds(30));
// Apply the declared faults
await ToxiProxy.Proxy("postgres-proxy")
.AddLatency(milliseconds: 500, jitter: 100)
.AddPacketLoss(probability: 0.1)
.Apply();
}
public async Task DisposeAsync()
{
// Collect logs before teardown
await RunProcess("docker",
$"compose -f {_composeFile} logs --no-color > chaos-logs.txt 2>&1");
// Tear down
await RunProcess("docker", $"compose -f {_composeFile} down -v");
}
private static async Task RunProcess(string fileName, string arguments)
{
var psi = new ProcessStartInfo(fileName, arguments)
{
RedirectStandardOutput = true,
RedirectStandardError = true,
UseShellExecute = false
};
using var process = Process.Start(psi)!;
await process.WaitForExitAsync();
if (process.ExitCode != 0)
{
var stderr = await process.StandardError.ReadToEndAsync();
throw new InvalidOperationException(
$"Process '{fileName} {arguments}' failed with exit code {process.ExitCode}: {stderr}");
}
}
private static async Task<int> GetMappedPort(string service, int containerPort)
{
var psi = new ProcessStartInfo("docker",
$"compose port {service} {containerPort}")
{
RedirectStandardOutput = true,
UseShellExecute = false
};
using var process = Process.Start(psi)!;
var output = await process.StandardOutput.ReadToEndAsync();
await process.WaitForExitAsync();
// Output format: 0.0.0.0:12345
var port = output.Trim().Split(':').Last();
return int.Parse(port);
}
}// <auto-generated by Ops.Chaos.Generator />
using System.Diagnostics;
namespace OrderService.Chaos.Tests;
[CollectionDefinition("DatabaseLatencyChaos")]
public class DatabaseLatencyChaosCollection : ICollectionFixture<DatabaseLatencyChaosFixture> { }
public class DatabaseLatencyChaosFixture : IAsyncLifetime
{
private readonly string _composeFile;
public string ConnectionString { get; private set; } = null!;
public string RabbitMqConnection { get; private set; } = null!;
public ToxiProxyClient ToxiProxy { get; private set; } = null!;
public DatabaseLatencyChaosFixture()
{
_composeFile = Path.Combine(
AppContext.BaseDirectory, "generated", "docker-compose.chaos.yaml");
}
public async Task InitializeAsync()
{
// Start the environment
await RunProcess("docker", $"compose -f {_composeFile} up -d --wait");
// Resolve the mapped ports
var toxiProxyPort = await GetMappedPort("toxiproxy", 8474);
var postgresProxyPort = await GetMappedPort("toxiproxy", 15432);
var rabbitProxyPort = await GetMappedPort("toxiproxy", 15672);
ConnectionString =
$"Host=localhost;Port={postgresProxyPort};Database=orders;" +
$"Username=postgres;Password=test;Timeout=10;Command Timeout=10";
RabbitMqConnection =
$"amqp://guest:guest@localhost:{rabbitProxyPort}";
ToxiProxy = new ToxiProxyClient($"http://localhost:{toxiProxyPort}");
// Wait for Toxiproxy proxies to be ready
await ToxiProxy.WaitForProxy("postgres-proxy", timeout: TimeSpan.FromSeconds(30));
await ToxiProxy.WaitForProxy("rabbitmq-proxy", timeout: TimeSpan.FromSeconds(30));
// Apply the declared faults
await ToxiProxy.Proxy("postgres-proxy")
.AddLatency(milliseconds: 500, jitter: 100)
.AddPacketLoss(probability: 0.1)
.Apply();
}
public async Task DisposeAsync()
{
// Collect logs before teardown
await RunProcess("docker",
$"compose -f {_composeFile} logs --no-color > chaos-logs.txt 2>&1");
// Tear down
await RunProcess("docker", $"compose -f {_composeFile} down -v");
}
private static async Task RunProcess(string fileName, string arguments)
{
var psi = new ProcessStartInfo(fileName, arguments)
{
RedirectStandardOutput = true,
RedirectStandardError = true,
UseShellExecute = false
};
using var process = Process.Start(psi)!;
await process.WaitForExitAsync();
if (process.ExitCode != 0)
{
var stderr = await process.StandardError.ReadToEndAsync();
throw new InvalidOperationException(
$"Process '{fileName} {arguments}' failed with exit code {process.ExitCode}: {stderr}");
}
}
private static async Task<int> GetMappedPort(string service, int containerPort)
{
var psi = new ProcessStartInfo("docker",
$"compose port {service} {containerPort}")
{
RedirectStandardOutput = true,
UseShellExecute = false
};
using var process = Process.Start(psi)!;
var output = await process.StandardOutput.ReadToEndAsync();
await process.WaitForExitAsync();
// Output format: 0.0.0.0:12345
var port = output.Trim().Split(':').Last();
return int.Parse(port);
}
}The fixture handles everything: starting Docker Compose, waiting for health checks, resolving mapped ports, configuring Toxiproxy faults, and tearing down with log collection. The test class gets a ready-to-use ConnectionString that routes through the latent proxy and a ToxiProxy client for mid-test fault manipulation.
Step 6: Generated ToxiProxyClient.g.cs
The generator also produces a typed Toxiproxy client with a fluent API:
// <auto-generated by Ops.Chaos.Generator />
using System.Net.Http.Json;
namespace OrderService.Chaos.Tests;
public class ToxiProxyClient
{
private readonly HttpClient _http;
public ToxiProxyClient(string baseUrl)
{
_http = new HttpClient { BaseAddress = new Uri(baseUrl) };
}
public ToxicBuilder Proxy(string name) => new(_http, name);
public async Task WaitForProxy(string name, TimeSpan timeout)
{
var deadline = DateTime.UtcNow + timeout;
while (DateTime.UtcNow < deadline)
{
try
{
var response = await _http.GetAsync($"/proxies/{name}");
if (response.IsSuccessStatusCode) return;
}
catch { /* retry */ }
await Task.Delay(500);
}
throw new TimeoutException($"Toxiproxy proxy '{name}' not ready after {timeout}");
}
public async Task RemoveAllToxics(string proxyName)
{
var response = await _http.GetFromJsonAsync<ToxiProxyInfo>($"/proxies/{proxyName}");
if (response?.Toxics != null)
{
foreach (var toxic in response.Toxics)
{
await _http.DeleteAsync($"/proxies/{proxyName}/toxics/{toxic.Name}");
}
}
}
private record ToxiProxyInfo(string Name, ToxicInfo[]? Toxics);
private record ToxicInfo(string Name, string Type);
}
public class ToxicBuilder
{
private readonly HttpClient _http;
private readonly string _proxyName;
private readonly List<object> _toxics = new();
public ToxicBuilder(HttpClient http, string proxyName)
{
_http = http;
_proxyName = proxyName;
}
public ToxicBuilder AddLatency(int milliseconds, int jitter = 0)
{
_toxics.Add(new
{
name = $"latency_{milliseconds}ms",
type = "latency",
stream = "downstream",
toxicity = 1.0,
attributes = new { latency = milliseconds, jitter }
});
return this;
}
public ToxicBuilder AddPacketLoss(double probability)
{
_toxics.Add(new
{
name = $"packet_loss_{(int)(probability * 100)}pct",
type = "timeout",
stream = "downstream",
toxicity = probability,
attributes = new { timeout = 0 }
});
return this;
}
public ToxicBuilder AddBandwidthLimit(long bytesPerSecond)
{
_toxics.Add(new
{
name = $"bandwidth_{bytesPerSecond}bps",
type = "bandwidth",
stream = "downstream",
toxicity = 1.0,
attributes = new { rate = bytesPerSecond }
});
return this;
}
public ToxicBuilder AddConnectionReset(int timeout = 0)
{
_toxics.Add(new
{
name = "reset_peer",
type = "reset_peer",
stream = "downstream",
toxicity = 1.0,
attributes = new { timeout }
});
return this;
}
public async Task Apply()
{
foreach (var toxic in _toxics)
{
await _http.PostAsJsonAsync($"/proxies/{_proxyName}/toxics", toxic);
}
}
}// <auto-generated by Ops.Chaos.Generator />
using System.Net.Http.Json;
namespace OrderService.Chaos.Tests;
public class ToxiProxyClient
{
private readonly HttpClient _http;
public ToxiProxyClient(string baseUrl)
{
_http = new HttpClient { BaseAddress = new Uri(baseUrl) };
}
public ToxicBuilder Proxy(string name) => new(_http, name);
public async Task WaitForProxy(string name, TimeSpan timeout)
{
var deadline = DateTime.UtcNow + timeout;
while (DateTime.UtcNow < deadline)
{
try
{
var response = await _http.GetAsync($"/proxies/{name}");
if (response.IsSuccessStatusCode) return;
}
catch { /* retry */ }
await Task.Delay(500);
}
throw new TimeoutException($"Toxiproxy proxy '{name}' not ready after {timeout}");
}
public async Task RemoveAllToxics(string proxyName)
{
var response = await _http.GetFromJsonAsync<ToxiProxyInfo>($"/proxies/{proxyName}");
if (response?.Toxics != null)
{
foreach (var toxic in response.Toxics)
{
await _http.DeleteAsync($"/proxies/{proxyName}/toxics/{toxic.Name}");
}
}
}
private record ToxiProxyInfo(string Name, ToxicInfo[]? Toxics);
private record ToxicInfo(string Name, string Type);
}
public class ToxicBuilder
{
private readonly HttpClient _http;
private readonly string _proxyName;
private readonly List<object> _toxics = new();
public ToxicBuilder(HttpClient http, string proxyName)
{
_http = http;
_proxyName = proxyName;
}
public ToxicBuilder AddLatency(int milliseconds, int jitter = 0)
{
_toxics.Add(new
{
name = $"latency_{milliseconds}ms",
type = "latency",
stream = "downstream",
toxicity = 1.0,
attributes = new { latency = milliseconds, jitter }
});
return this;
}
public ToxicBuilder AddPacketLoss(double probability)
{
_toxics.Add(new
{
name = $"packet_loss_{(int)(probability * 100)}pct",
type = "timeout",
stream = "downstream",
toxicity = probability,
attributes = new { timeout = 0 }
});
return this;
}
public ToxicBuilder AddBandwidthLimit(long bytesPerSecond)
{
_toxics.Add(new
{
name = $"bandwidth_{bytesPerSecond}bps",
type = "bandwidth",
stream = "downstream",
toxicity = 1.0,
attributes = new { rate = bytesPerSecond }
});
return this;
}
public ToxicBuilder AddConnectionReset(int timeout = 0)
{
_toxics.Add(new
{
name = "reset_peer",
type = "reset_peer",
stream = "downstream",
toxicity = 1.0,
attributes = new { timeout }
});
return this;
}
public async Task Apply()
{
foreach (var toxic in _toxics)
{
await _http.PostAsJsonAsync($"/proxies/{_proxyName}/toxics", toxic);
}
}
}The fluent API chains fault declarations and applies them in a single batch. The test can add faults, remove them, and add different faults mid-test. This is how you test recovery: inject latency, observe degradation, remove latency, observe recovery.
Step 7: Write the Test
// DatabaseLatencyChaosTests.cs
using Npgsql;
[Collection("DatabaseLatencyChaos")]
public class DatabaseLatencyChaosTests
{
private readonly DatabaseLatencyChaosFixture _fixture;
public DatabaseLatencyChaosTests(DatabaseLatencyChaosFixture fixture)
{
_fixture = fixture;
}
[Fact]
public async Task OrderQueries_CompleteWithin2s_Under500msLatency()
{
// Arrange: create the orders table and seed data
await using var conn = new NpgsqlConnection(_fixture.ConnectionString);
await conn.OpenAsync();
await using (var cmd = conn.CreateCommand())
{
cmd.CommandText = """
CREATE TABLE IF NOT EXISTS orders (
id SERIAL PRIMARY KEY,
customer_id TEXT NOT NULL,
total DECIMAL(10,2) NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
""";
await cmd.ExecuteNonQueryAsync();
}
// Seed 100 orders
for (int i = 0; i < 100; i++)
{
await using var cmd = conn.CreateCommand();
cmd.CommandText =
"INSERT INTO orders (customer_id, total, status) VALUES ($1, $2, $3)";
cmd.Parameters.AddWithValue($"customer-{i % 10}");
cmd.Parameters.AddWithValue((decimal)(i * 9.99));
cmd.Parameters.AddWithValue(i % 3 == 0 ? "completed" : "pending");
await cmd.ExecuteNonQueryAsync();
}
// Act: query orders with the 500ms latency active
var queryTimes = new List<double>();
for (int i = 0; i < 50; i++)
{
var sw = System.Diagnostics.Stopwatch.StartNew();
await using var cmd = conn.CreateCommand();
cmd.CommandText =
"SELECT id, customer_id, total, status, created_at FROM orders " +
"WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 10";
cmd.Parameters.AddWithValue($"customer-{i % 10}");
await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
_ = reader.GetInt32(0); // consume the results
}
sw.Stop();
queryTimes.Add(sw.Elapsed.TotalMilliseconds);
}
// Assert: p99 < 2000ms (the steady-state probe)
queryTimes.Sort();
var p99Index = (int)Math.Ceiling(queryTimes.Count * 0.99) - 1;
var p99 = queryTimes[p99Index];
Assert.True(p99 < 2000,
$"Steady-state probe failed: p99 query time was {p99:F0}ms, expected < 2000ms. " +
$"Median: {queryTimes[queryTimes.Count / 2]:F0}ms, " +
$"p50: {queryTimes[queryTimes.Count / 2]:F0}ms, " +
$"p95: {queryTimes[(int)(queryTimes.Count * 0.95)]:F0}ms");
}
[Fact]
public async Task OrderQueries_RecoverAfterLatencyRemoved()
{
await using var conn = new NpgsqlConnection(_fixture.ConnectionString);
await conn.OpenAsync();
// Measure with latency (already active from fixture)
var withLatency = await MeasureQueryTime(conn);
// Remove all toxics
await _fixture.ToxiProxy.RemoveAllToxics("postgres-proxy");
// Measure without latency
var withoutLatency = await MeasureQueryTime(conn);
// Assert recovery: queries without latency should be significantly faster
Assert.True(withoutLatency < withLatency,
$"Expected recovery after toxic removal. With latency: {withLatency:F0}ms, " +
$"Without: {withoutLatency:F0}ms");
// Re-apply toxics for other tests
await _fixture.ToxiProxy.Proxy("postgres-proxy")
.AddLatency(milliseconds: 500, jitter: 100)
.AddPacketLoss(probability: 0.1)
.Apply();
}
private static async Task<double> MeasureQueryTime(NpgsqlConnection conn)
{
var times = new List<double>();
for (int i = 0; i < 20; i++)
{
var sw = System.Diagnostics.Stopwatch.StartNew();
await using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM orders";
await cmd.ExecuteScalarAsync();
sw.Stop();
times.Add(sw.Elapsed.TotalMilliseconds);
}
times.Sort();
return times[times.Count / 2]; // median
}
}// DatabaseLatencyChaosTests.cs
using Npgsql;
[Collection("DatabaseLatencyChaos")]
public class DatabaseLatencyChaosTests
{
private readonly DatabaseLatencyChaosFixture _fixture;
public DatabaseLatencyChaosTests(DatabaseLatencyChaosFixture fixture)
{
_fixture = fixture;
}
[Fact]
public async Task OrderQueries_CompleteWithin2s_Under500msLatency()
{
// Arrange: create the orders table and seed data
await using var conn = new NpgsqlConnection(_fixture.ConnectionString);
await conn.OpenAsync();
await using (var cmd = conn.CreateCommand())
{
cmd.CommandText = """
CREATE TABLE IF NOT EXISTS orders (
id SERIAL PRIMARY KEY,
customer_id TEXT NOT NULL,
total DECIMAL(10,2) NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
""";
await cmd.ExecuteNonQueryAsync();
}
// Seed 100 orders
for (int i = 0; i < 100; i++)
{
await using var cmd = conn.CreateCommand();
cmd.CommandText =
"INSERT INTO orders (customer_id, total, status) VALUES ($1, $2, $3)";
cmd.Parameters.AddWithValue($"customer-{i % 10}");
cmd.Parameters.AddWithValue((decimal)(i * 9.99));
cmd.Parameters.AddWithValue(i % 3 == 0 ? "completed" : "pending");
await cmd.ExecuteNonQueryAsync();
}
// Act: query orders with the 500ms latency active
var queryTimes = new List<double>();
for (int i = 0; i < 50; i++)
{
var sw = System.Diagnostics.Stopwatch.StartNew();
await using var cmd = conn.CreateCommand();
cmd.CommandText =
"SELECT id, customer_id, total, status, created_at FROM orders " +
"WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 10";
cmd.Parameters.AddWithValue($"customer-{i % 10}");
await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
_ = reader.GetInt32(0); // consume the results
}
sw.Stop();
queryTimes.Add(sw.Elapsed.TotalMilliseconds);
}
// Assert: p99 < 2000ms (the steady-state probe)
queryTimes.Sort();
var p99Index = (int)Math.Ceiling(queryTimes.Count * 0.99) - 1;
var p99 = queryTimes[p99Index];
Assert.True(p99 < 2000,
$"Steady-state probe failed: p99 query time was {p99:F0}ms, expected < 2000ms. " +
$"Median: {queryTimes[queryTimes.Count / 2]:F0}ms, " +
$"p50: {queryTimes[queryTimes.Count / 2]:F0}ms, " +
$"p95: {queryTimes[(int)(queryTimes.Count * 0.95)]:F0}ms");
}
[Fact]
public async Task OrderQueries_RecoverAfterLatencyRemoved()
{
await using var conn = new NpgsqlConnection(_fixture.ConnectionString);
await conn.OpenAsync();
// Measure with latency (already active from fixture)
var withLatency = await MeasureQueryTime(conn);
// Remove all toxics
await _fixture.ToxiProxy.RemoveAllToxics("postgres-proxy");
// Measure without latency
var withoutLatency = await MeasureQueryTime(conn);
// Assert recovery: queries without latency should be significantly faster
Assert.True(withoutLatency < withLatency,
$"Expected recovery after toxic removal. With latency: {withLatency:F0}ms, " +
$"Without: {withoutLatency:F0}ms");
// Re-apply toxics for other tests
await _fixture.ToxiProxy.Proxy("postgres-proxy")
.AddLatency(milliseconds: 500, jitter: 100)
.AddPacketLoss(probability: 0.1)
.Apply();
}
private static async Task<double> MeasureQueryTime(NpgsqlConnection conn)
{
var times = new List<double>();
for (int i = 0; i < 20; i++)
{
var sw = System.Diagnostics.Stopwatch.StartNew();
await using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM orders";
await cmd.ExecuteScalarAsync();
sw.Stop();
times.Add(sw.Elapsed.TotalMilliseconds);
}
times.Sort();
return times[times.Count / 2]; // median
}
}Two tests. The first verifies the steady-state hypothesis: queries complete within 2 seconds even under 500ms latency and 10% packet loss. The second tests recovery: remove the faults, verify that query times drop back to baseline, then re-apply the faults for subsequent tests.
This is testing real Npgsql behavior over a real TCP connection with real network degradation. The connection pool, the timeout handling, the retry logic -- all exercised with actual network-level faults.
Step 8: Run in CI
Locally:
$ docker compose -f generated/docker-compose.chaos.yaml up -d --wait
✔ Container postgres Healthy
✔ Container rabbitmq Healthy
✔ Container toxiproxy Started
$ dotnet test --filter "Collection=DatabaseLatencyChaos" --logger "console;verbosity=detailed"
[xUnit] DatabaseLatencyChaosTests.OrderQueries_CompleteWithin2s_Under500msLatency
[info] Toxiproxy: Applied latency_500ms (500ms +/- 100ms jitter) to postgres-proxy
[info] Toxiproxy: Applied packet_loss_10pct (10% drop) to postgres-proxy
[info] Seeded 100 orders
[info] Ran 50 queries through latent proxy
[info] Results: p50=612ms, p95=743ms, p99=891ms
Passed! [4.2s]
[xUnit] DatabaseLatencyChaosTests.OrderQueries_RecoverAfterLatencyRemoved
[info] Median with latency: 608ms
[info] Removed all toxics from postgres-proxy
[info] Median without latency: 3ms
[info] Recovery confirmed: 608ms -> 3ms
Passed! [2.1s]
Test Run Successful.
Total tests: 2
Passed: 2
Duration: 6.8s
$ docker compose -f generated/docker-compose.chaos.yaml down -v
✔ Container test-runner Removed
✔ Container toxiproxy Removed
✔ Container postgres Removed
✔ Container rabbitmq Removed
✔ Volume postgres-data Removed$ docker compose -f generated/docker-compose.chaos.yaml up -d --wait
✔ Container postgres Healthy
✔ Container rabbitmq Healthy
✔ Container toxiproxy Started
$ dotnet test --filter "Collection=DatabaseLatencyChaos" --logger "console;verbosity=detailed"
[xUnit] DatabaseLatencyChaosTests.OrderQueries_CompleteWithin2s_Under500msLatency
[info] Toxiproxy: Applied latency_500ms (500ms +/- 100ms jitter) to postgres-proxy
[info] Toxiproxy: Applied packet_loss_10pct (10% drop) to postgres-proxy
[info] Seeded 100 orders
[info] Ran 50 queries through latent proxy
[info] Results: p50=612ms, p95=743ms, p99=891ms
Passed! [4.2s]
[xUnit] DatabaseLatencyChaosTests.OrderQueries_RecoverAfterLatencyRemoved
[info] Median with latency: 608ms
[info] Removed all toxics from postgres-proxy
[info] Median without latency: 3ms
[info] Recovery confirmed: 608ms -> 3ms
Passed! [2.1s]
Test Run Successful.
Total tests: 2
Passed: 2
Duration: 6.8s
$ docker compose -f generated/docker-compose.chaos.yaml down -v
✔ Container test-runner Removed
✔ Container toxiproxy Removed
✔ Container postgres Removed
✔ Container rabbitmq Removed
✔ Volume postgres-data RemovedFor CI, a GitHub Actions workflow:
# .github/workflows/chaos-container.yaml
name: Container Chaos Tests
on:
push:
paths:
- 'src/OrderService/**'
- 'tests/OrderService.Chaos.Tests/**'
jobs:
chaos-container:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: '9.0.x'
- name: Build and generate chaos artifacts
run: dotnet build tests/OrderService.Chaos.Tests
- name: Start chaos environment
run: |
docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml up -d --wait
sleep 5 # Extra buffer for Toxiproxy initialization
- name: Run container chaos tests
run: |
dotnet test tests/OrderService.Chaos.Tests \
--filter "Collection=DatabaseLatencyChaos" \
--logger "trx;LogFileName=chaos-results.trx"
- name: Collect chaos logs
if: always()
run: |
docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml logs > chaos-logs.txt 2>&1
- name: Tear down
if: always()
run: |
docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml down -v
- name: Upload results
if: always()
uses: actions/upload-artifact@v4
with:
name: chaos-results
path: |
tests/OrderService.Chaos.Tests/TestResults/chaos-results.trx
chaos-logs.txt# .github/workflows/chaos-container.yaml
name: Container Chaos Tests
on:
push:
paths:
- 'src/OrderService/**'
- 'tests/OrderService.Chaos.Tests/**'
jobs:
chaos-container:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: '9.0.x'
- name: Build and generate chaos artifacts
run: dotnet build tests/OrderService.Chaos.Tests
- name: Start chaos environment
run: |
docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml up -d --wait
sleep 5 # Extra buffer for Toxiproxy initialization
- name: Run container chaos tests
run: |
dotnet test tests/OrderService.Chaos.Tests \
--filter "Collection=DatabaseLatencyChaos" \
--logger "trx;LogFileName=chaos-results.trx"
- name: Collect chaos logs
if: always()
run: |
docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml logs > chaos-logs.txt 2>&1
- name: Tear down
if: always()
run: |
docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml down -v
- name: Upload results
if: always()
uses: actions/upload-artifact@v4
with:
name: chaos-results
path: |
tests/OrderService.Chaos.Tests/TestResults/chaos-results.trx
chaos-logs.txtThe chaos environment starts, the tests run, the results are collected, and everything is torn down. The if: always() ensures cleanup happens even when tests fail.
What This Proves Beyond InProcess
The InProcess tier proved the circuit breaker trips. The Container tier proves three additional things:
Real driver behavior under latency. Npgsql's connection pool handles 500ms latency without exhausting connections. The
Command Timeout=10in the connection string is sufficient. This is not something aTask.Delaycan test.Recovery after fault removal. When the toxic is removed, query times drop from 600ms to 3ms. The connection pool recovers. The driver does not hold stale connections. This validates the recovery path.
Multi-service isolation. PostgreSQL is degraded, but RabbitMQ is unaffected (no faults on
rabbitmq-proxy). The test can verify that event publishing continues even when the database is slow. This is a real integration concern that InProcess cannot simulate.
The generated artifacts -- docker-compose.chaos.yaml, toxiproxy-config.json, TestInfraFixture.g.cs, ToxiProxyClient.g.cs -- are all produced from eight attributes on one C# class. No YAML was written by hand. No Toxiproxy configuration was copy-pasted from a wiki. The single source of truth is the attribute declaration.