
Container Tier -- Docker Compose Generation

The InProcess walkthrough proved the circuit breaker works with simulated faults inside dotnet test. But simulated timeouts are not real network latency. A Task.Delay does not exercise TCP retransmission, connection pool exhaustion, or driver timeout handling. For that, we need a real database connection with real network degradation.

This walkthrough generates a complete Docker Compose environment from C# attributes: PostgreSQL, RabbitMQ, Toxiproxy sidecars, and a test runner. The generator produces every file. You write the experiment declaration and the test. Nothing else.


Step 1: The Scenario

OrderService depends on PostgreSQL for persistence and RabbitMQ for event publishing. The question: what happens when the database connection has 500ms added latency and 10% packet loss?

This cannot be answered with InProcess chaos. A DI decorator can delay a method call, but it cannot simulate TCP packet loss on an Npgsql connection. The driver has its own connection pool, its own timeout logic, its own retry behavior. The only way to test this is to put a network proxy between the application and the database and degrade the actual TCP stream.

Toxiproxy does exactly this. It sits between the test runner and the upstream service, and it introduces configurable network faults: latency, bandwidth limits, packet loss, connection resets, slow-close, and more. The Container tier generates the Toxiproxy configuration alongside the Docker Compose file.


Step 2: Declare the Experiment

// DatabaseLatencyChaos.cs
using Ops.Chaos;

[ChaosExperiment("DatabaseLatency", Tier = OpsExecutionTier.Container,
    Hypothesis = "Order queries complete within 2s even with 500ms DB latency")]
[Container("postgres", Image = "postgres:16", Port = 5432,
    Environment = new[] { "POSTGRES_PASSWORD=test", "POSTGRES_DB=orders" })]
[Container("rabbitmq", Image = "rabbitmq:3-management", Port = 5672)]
[ToxiProxy("postgres-proxy", Upstream = "postgres:5432", ListenPort = 15432)]
[ToxiProxy("rabbitmq-proxy", Upstream = "rabbitmq:5672", ListenPort = 15672)]
[FaultInjection(FaultKind.Latency, Duration = "500ms", Target = "postgres-proxy")]
[FaultInjection(FaultKind.PacketLoss, Probability = 0.1, Target = "postgres-proxy")]
[SteadyStateProbe(Metric = "order.query.p99", Expected = "< 2000ms")]
public partial class DatabaseLatencyChaos { }

The [Container] attributes declare the infrastructure. The [ToxiProxy] attributes declare the proxies. The [FaultInjection] attributes declare the network faults and which proxy they target. The generator reads all of this and produces the complete Docker Compose file, the Toxiproxy configuration, and the xUnit test fixture.

Note that the faults target postgres-proxy, not postgres directly. RabbitMQ is proxied too (via rabbitmq-proxy), but no faults are injected on it in this experiment. It is there to prove that the application correctly routes through the proxy and that event publishing still works under database degradation.
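Before looking at the generated files, a quick sanity check on the hypothesis helps: the latency toxic adds 500ms (plus up to 100ms jitter) per round trip, so a single-round-trip query has a worst-case floor around 600ms, leaving the 2s probe roughly a 3x budget for packet-loss stalls and pool contention. Back-of-envelope, with constants mirroring the attribute values above:

```csharp
using System;

// Experiment parameters taken from the attribute declarations
const int latencyMs = 500;
const int jitterMs = 100;
const int probeBudgetMs = 2000;

// Worst case for one database round trip under the latency toxic alone
int worstSingleTrip = latencyMs + jitterMs;
Console.WriteLine($"worst single round trip: {worstSingleTrip}ms");

// How many sequential round trips fit inside the steady-state budget
int tripsInBudget = probeBudgetMs / worstSingleTrip;
Console.WriteLine($"sequential round trips within the 2s probe: {tripsInBudget}");
```

If a "simple" query secretly needs four or more round trips (extra prepares, transactions, pool churn), the probe fails even without packet loss -- which is exactly the kind of behavior this tier exists to surface.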


Step 3: Generated docker-compose.chaos.yaml

The source generator produces this file at generated/docker-compose.chaos.yaml:

# Auto-generated by Ops.Chaos.Generator from DatabaseLatencyChaos
# Do not edit. Regenerate by building the project.

version: "3.8"

networks:
  chaos-net:
    # not internal: the test fixture reaches Toxiproxy via published host ports
    driver: bridge

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: test
      POSTGRES_DB: orders
    expose:
      - "5432"
    networks:
      - chaos-net
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      timeout: 5s
      retries: 10
    volumes:
      - postgres-data:/var/lib/postgresql/data

  rabbitmq:
    image: rabbitmq:3-management
    expose:
      - "5672"
      - "15672"
    networks:
      - chaos-net
    healthcheck:
      test: ["CMD-SHELL", "rabbitmq-diagnostics -q ping"]
      interval: 5s
      timeout: 10s
      retries: 10

  toxiproxy:
    image: ghcr.io/shopify/toxiproxy:2.9.0
    ports:
      - "8474:8474"     # Toxiproxy API
      - "15432:15432"   # postgres-proxy listen port
      - "15672:15672"   # rabbitmq-proxy listen port
    networks:
      - chaos-net
    volumes:
      - ./toxiproxy-config.json:/config/toxiproxy.json
    command: ["-host", "0.0.0.0", "-config", "/config/toxiproxy.json"]
    depends_on:
      postgres:
        condition: service_healthy
      rabbitmq:
        condition: service_healthy

  test-runner:
    build:
      context: ../..
      dockerfile: Dockerfile.test
    environment:
      CONNECTION_STRING: "Host=toxiproxy;Port=15432;Database=orders;Username=postgres;Password=test;Timeout=10;Command Timeout=10"
      RABBITMQ_CONNECTION: "amqp://guest:guest@toxiproxy:15672"
      TOXIPROXY_API: "http://toxiproxy:8474"
    networks:
      - chaos-net
    depends_on:
      toxiproxy:
        condition: service_started

volumes:
  postgres-data:

Every connection string points through Toxiproxy, not directly to the upstream service. The test-runner container gets toxiproxy:15432 for PostgreSQL, not postgres:5432. The test code sees a normal Npgsql connection string -- it has no idea that a proxy is in the middle.

Only the Toxiproxy container publishes host ports (the API on 8474 plus the two proxy listeners). PostgreSQL and RabbitMQ are reachable only on chaos-net, so every path from the host into the environment runs through a proxy. Note that the network cannot be marked internal: true -- the fixture connects from the host through Toxiproxy's published ports, and an internal network would make those mappings unreachable.


Step 4: Generated toxiproxy-config.json

[
  {
    "name": "postgres-proxy",
    "listen": "0.0.0.0:15432",
    "upstream": "postgres:5432",
    "enabled": true
  },
  {
    "name": "rabbitmq-proxy",
    "listen": "0.0.0.0:15672",
    "upstream": "rabbitmq:5672",
    "enabled": true
  }
]

This is the initial Toxiproxy configuration. The proxies are created at startup with no toxics -- clean passthrough. The test fixture adds the faults dynamically via the Toxiproxy REST API. This allows the test to add, remove, and modify faults mid-test.
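For reference, adding a toxic dynamically is a single POST to the Toxiproxy API. A minimal sketch of the payload for the declared latency fault (field names follow Toxiproxy's toxic schema; System.Text.Json is used here purely for illustration):

```csharp
using System;
using System.Text.Json;

// Request body for: POST /proxies/postgres-proxy/toxics
var latencyToxic = new
{
    name = "latency_500ms",
    type = "latency",        // registered toxic type
    stream = "downstream",   // degrade data flowing from postgres back to the client
    toxicity = 1.0,          // apply to every connection
    attributes = new { latency = 500, jitter = 100 }
};

var payload = JsonSerializer.Serialize(latencyToxic);
Console.WriteLine(payload);
```

Deleting the toxic again (DELETE /proxies/postgres-proxy/toxics/latency_500ms) restores clean passthrough, which is what the recovery test in Step 7 relies on.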


Step 5: Generated TestInfraFixture.g.cs

The generator produces an xUnit collection fixture that manages the Docker Compose lifecycle:

// <auto-generated by Ops.Chaos.Generator />
using System.Diagnostics;
using Xunit;

namespace OrderService.Chaos.Tests;

[CollectionDefinition("DatabaseLatencyChaos")]
public class DatabaseLatencyChaosCollection : ICollectionFixture<DatabaseLatencyChaosFixture> { }

public class DatabaseLatencyChaosFixture : IAsyncLifetime
{
    private readonly string _composeFile;
    public string ConnectionString { get; private set; } = null!;
    public string RabbitMqConnection { get; private set; } = null!;
    public ToxiProxyClient ToxiProxy { get; private set; } = null!;

    public DatabaseLatencyChaosFixture()
    {
        _composeFile = Path.Combine(
            AppContext.BaseDirectory, "generated", "docker-compose.chaos.yaml");
    }

    public async Task InitializeAsync()
    {
        // Start the environment
        await RunProcess("docker", $"compose -f {_composeFile} up -d --wait");

        // Resolve the mapped ports
        var toxiProxyPort = await GetMappedPort("toxiproxy", 8474);
        var postgresProxyPort = await GetMappedPort("toxiproxy", 15432);
        var rabbitProxyPort = await GetMappedPort("toxiproxy", 15672);

        ConnectionString =
            $"Host=localhost;Port={postgresProxyPort};Database=orders;" +
            $"Username=postgres;Password=test;Timeout=10;Command Timeout=10";

        RabbitMqConnection =
            $"amqp://guest:guest@localhost:{rabbitProxyPort}";

        ToxiProxy = new ToxiProxyClient($"http://localhost:{toxiProxyPort}");

        // Wait for Toxiproxy proxies to be ready
        await ToxiProxy.WaitForProxy("postgres-proxy", timeout: TimeSpan.FromSeconds(30));
        await ToxiProxy.WaitForProxy("rabbitmq-proxy", timeout: TimeSpan.FromSeconds(30));

        // Apply the declared faults
        await ToxiProxy.Proxy("postgres-proxy")
            .AddLatency(milliseconds: 500, jitter: 100)
            .AddPacketLoss(probability: 0.1)
            .Apply();
    }

    public async Task DisposeAsync()
    {
        // Collect logs before teardown. RunProcess does not go through a shell,
        // so '>' redirection would be passed to docker as a literal argument;
        // capture stdout and write the log file explicitly instead.
        var logs = await CaptureOutput("docker",
            $"compose -f {_composeFile} logs --no-color");
        await File.WriteAllTextAsync("chaos-logs.txt", logs);

        // Tear down
        await RunProcess("docker", $"compose -f {_composeFile} down -v");
    }

    private static async Task RunProcess(string fileName, string arguments)
    {
        var psi = new ProcessStartInfo(fileName, arguments)
        {
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false
        };

        using var process = Process.Start(psi)!;

        // Start draining both streams before waiting so a chatty process
        // cannot fill the pipe buffers and deadlock
        var stdoutTask = process.StandardOutput.ReadToEndAsync();
        var stderrTask = process.StandardError.ReadToEndAsync();
        await process.WaitForExitAsync();

        if (process.ExitCode != 0)
        {
            throw new InvalidOperationException(
                $"Process '{fileName} {arguments}' failed with exit code {process.ExitCode}: {await stderrTask}");
        }
        _ = await stdoutTask;
    }

    private static async Task<string> CaptureOutput(string fileName, string arguments)
    {
        var psi = new ProcessStartInfo(fileName, arguments)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };

        using var process = Process.Start(psi)!;
        var output = await process.StandardOutput.ReadToEndAsync();
        await process.WaitForExitAsync();
        return output;
    }

    private async Task<int> GetMappedPort(string service, int containerPort)
    {
        // Must pass -f so docker compose resolves the same project as `up`
        var psi = new ProcessStartInfo("docker",
            $"compose -f {_composeFile} port {service} {containerPort}")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };

        using var process = Process.Start(psi)!;
        var output = await process.StandardOutput.ReadToEndAsync();
        await process.WaitForExitAsync();

        // Output format: 0.0.0.0:12345
        var port = output.Trim().Split(':').Last();
        return int.Parse(port);
    }
}

The fixture handles everything: starting Docker Compose, waiting for health checks, resolving mapped ports, configuring Toxiproxy faults, and tearing down with log collection. The test class gets a ready-to-use ConnectionString that routes through the latent proxy and a ToxiProxy client for mid-test fault manipulation.


Step 6: Generated ToxiProxyClient.g.cs

The generator also produces a typed Toxiproxy client with a fluent API:

// <auto-generated by Ops.Chaos.Generator />
using System.Net.Http.Json;

namespace OrderService.Chaos.Tests;

public class ToxiProxyClient
{
    private readonly HttpClient _http;

    public ToxiProxyClient(string baseUrl)
    {
        _http = new HttpClient { BaseAddress = new Uri(baseUrl) };
    }

    public ToxicBuilder Proxy(string name) => new(_http, name);

    public async Task WaitForProxy(string name, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            try
            {
                var response = await _http.GetAsync($"/proxies/{name}");
                if (response.IsSuccessStatusCode) return;
            }
            catch { /* retry */ }
            await Task.Delay(500);
        }
        throw new TimeoutException($"Toxiproxy proxy '{name}' not ready after {timeout}");
    }

    public async Task RemoveAllToxics(string proxyName)
    {
        var proxy = await _http.GetFromJsonAsync<ToxiProxyInfo>($"/proxies/{proxyName}");
        if (proxy?.Toxics != null)
        {
            foreach (var toxic in proxy.Toxics)
            {
                await _http.DeleteAsync($"/proxies/{proxyName}/toxics/{toxic.Name}");
            }
        }
    }

    private record ToxiProxyInfo(string Name, ToxicInfo[]? Toxics);
    private record ToxicInfo(string Name, string Type);
}

public class ToxicBuilder
{
    private readonly HttpClient _http;
    private readonly string _proxyName;
    private readonly List<object> _toxics = new();

    public ToxicBuilder(HttpClient http, string proxyName)
    {
        _http = http;
        _proxyName = proxyName;
    }

    public ToxicBuilder AddLatency(int milliseconds, int jitter = 0)
    {
        _toxics.Add(new
        {
            name = $"latency_{milliseconds}ms",
            type = "latency",
            stream = "downstream",
            toxicity = 1.0,
            attributes = new { latency = milliseconds, jitter }
        });
        return this;
    }

    public ToxicBuilder AddPacketLoss(double probability)
    {
        // Toxiproxy has no per-packet loss toxic; approximate loss with a
        // 'timeout' toxic applied to a fraction of connections. Toxicity is
        // evaluated per connection, not per packet.
        _toxics.Add(new
        {
            name = $"packet_loss_{(int)(probability * 100)}pct",
            type = "timeout",
            stream = "downstream",
            toxicity = probability,
            attributes = new { timeout = 0 }  // 0 = stall the link without closing it
        });
        return this;
    }

    public ToxicBuilder AddBandwidthLimit(long kilobytesPerSecond)
    {
        _toxics.Add(new
        {
            name = $"bandwidth_{kilobytesPerSecond}kbps",
            type = "bandwidth",
            stream = "downstream",
            toxicity = 1.0,
            attributes = new { rate = kilobytesPerSecond }  // Toxiproxy's rate is in KB/s
        });
        return this;
    }

    public ToxicBuilder AddConnectionReset(int timeout = 0)
    {
        _toxics.Add(new
        {
            name = "reset_peer",
            type = "reset_peer",
            stream = "downstream",
            toxicity = 1.0,
            attributes = new { timeout }
        });
        return this;
    }

    public async Task Apply()
    {
        foreach (var toxic in _toxics)
        {
            var response = await _http.PostAsJsonAsync($"/proxies/{_proxyName}/toxics", toxic);
            response.EnsureSuccessStatusCode();  // fail fast if Toxiproxy rejects a toxic
        }
    }
}

The fluent API chains fault declarations and applies them with a single Apply() call. The test can add faults, remove them, and apply different faults mid-test. This is how you test recovery: inject latency, observe degradation, remove latency, observe recovery.
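One subtlety worth calling out: AddPacketLoss is an approximation. Toxiproxy has no per-packet loss toxic, so the builder encodes loss as a timeout toxic with fractional toxicity, which stalls a percentage of connections rather than dropping a percentage of packets. The wire payload it produces, sketched with System.Text.Json for illustration:

```csharp
using System;
using System.Text.Json;

// Request body for: POST /proxies/postgres-proxy/toxics
var packetLoss = new
{
    name = "packet_loss_10pct",
    type = "timeout",      // no packet-loss toxic exists; 'timeout' stalls the link
    stream = "downstream",
    toxicity = 0.1,        // evaluated per connection: ~10% of connections stall
    attributes = new { timeout = 0 }  // timeout 0 = stall without closing
};

Console.WriteLine(JsonSerializer.Serialize(packetLoss));
```

A stalled connection surfaces to Npgsql as a command timeout, so the Command Timeout=10 in the connection string bounds the damage to ten seconds per affected connection.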


Step 7: Write the Test

// DatabaseLatencyChaosTests.cs
using Npgsql;
using Xunit;

[Collection("DatabaseLatencyChaos")]
public class DatabaseLatencyChaosTests
{
    private readonly DatabaseLatencyChaosFixture _fixture;

    public DatabaseLatencyChaosTests(DatabaseLatencyChaosFixture fixture)
    {
        _fixture = fixture;
    }

    [Fact]
    public async Task OrderQueries_CompleteWithin2s_Under500msLatency()
    {
        // Arrange: create the orders table and seed data
        await using var conn = new NpgsqlConnection(_fixture.ConnectionString);
        await conn.OpenAsync();

        await using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = """
                CREATE TABLE IF NOT EXISTS orders (
                    id SERIAL PRIMARY KEY,
                    customer_id TEXT NOT NULL,
                    total DECIMAL(10,2) NOT NULL,
                    status TEXT NOT NULL DEFAULT 'pending',
                    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
                );
                """;
            await cmd.ExecuteNonQueryAsync();
        }

        // Seed 100 orders
        for (int i = 0; i < 100; i++)
        {
            await using var cmd = conn.CreateCommand();
            cmd.CommandText =
                "INSERT INTO orders (customer_id, total, status) VALUES ($1, $2, $3)";
            cmd.Parameters.AddWithValue($"customer-{i % 10}");
            cmd.Parameters.AddWithValue((decimal)(i * 9.99));
            cmd.Parameters.AddWithValue(i % 3 == 0 ? "completed" : "pending");
            await cmd.ExecuteNonQueryAsync();
        }

        // Act: query orders with the 500ms latency active
        var queryTimes = new List<double>();

        for (int i = 0; i < 50; i++)
        {
            var sw = System.Diagnostics.Stopwatch.StartNew();

            await using var cmd = conn.CreateCommand();
            cmd.CommandText =
                "SELECT id, customer_id, total, status, created_at FROM orders " +
                "WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 10";
            cmd.Parameters.AddWithValue($"customer-{i % 10}");

            await using var reader = await cmd.ExecuteReaderAsync();
            while (await reader.ReadAsync())
            {
                _ = reader.GetInt32(0);  // consume the results
            }

            sw.Stop();
            queryTimes.Add(sw.Elapsed.TotalMilliseconds);
        }

        // Assert: p99 < 2000ms (the steady-state probe)
        queryTimes.Sort();
        var p99Index = (int)Math.Ceiling(queryTimes.Count * 0.99) - 1;
        var p99 = queryTimes[p99Index];

        Assert.True(p99 < 2000,
            $"Steady-state probe failed: p99 query time was {p99:F0}ms, expected < 2000ms. " +
            $"p50: {queryTimes[queryTimes.Count / 2]:F0}ms, " +
            $"p95: {queryTimes[(int)(queryTimes.Count * 0.95)]:F0}ms");
    }

    [Fact]
    public async Task OrderQueries_RecoverAfterLatencyRemoved()
    {
        await using var conn = new NpgsqlConnection(_fixture.ConnectionString);
        await conn.OpenAsync();

        // Measure with latency (already active from fixture)
        var withLatency = await MeasureQueryTime(conn);

        // Remove all toxics
        await _fixture.ToxiProxy.RemoveAllToxics("postgres-proxy");

        try
        {
            // Measure without latency
            var withoutLatency = await MeasureQueryTime(conn);

            // Assert recovery: queries without latency should be significantly faster
            Assert.True(withoutLatency < withLatency,
                $"Expected recovery after toxic removal. With latency: {withLatency:F0}ms, " +
                $"Without: {withoutLatency:F0}ms");
        }
        finally
        {
            // Re-apply the toxics so other tests in the collection see the
            // declared faults, even if the assertion above fails
            await _fixture.ToxiProxy.Proxy("postgres-proxy")
                .AddLatency(milliseconds: 500, jitter: 100)
                .AddPacketLoss(probability: 0.1)
                .Apply();
        }
    }

    private static async Task<double> MeasureQueryTime(NpgsqlConnection conn)
    {
        var times = new List<double>();
        for (int i = 0; i < 20; i++)
        {
            var sw = System.Diagnostics.Stopwatch.StartNew();
            await using var cmd = conn.CreateCommand();
            cmd.CommandText = "SELECT COUNT(*) FROM orders";
            await cmd.ExecuteScalarAsync();
            sw.Stop();
            times.Add(sw.Elapsed.TotalMilliseconds);
        }
        times.Sort();
        return times[times.Count / 2]; // median
    }
}

Two tests. The first verifies the steady-state hypothesis: queries complete within 2 seconds even under 500ms latency and 10% packet loss. The second tests recovery: remove the faults, verify that query times drop back to baseline, then re-apply the faults for subsequent tests.

This is testing real Npgsql behavior over a real TCP connection with real network degradation. The connection pool, the timeout handling, the retry logic -- all exercised with actual network-level faults.
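The p99 in the first test uses nearest-rank indexing: sort the samples, then take element ceil(n * p) - 1. A standalone sketch of that arithmetic (the helper name is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Nearest-rank percentile, matching the test's ceil(count * 0.99) - 1 indexing
double Percentile(List<double> sorted, double p)
{
    int index = (int)Math.Ceiling(sorted.Count * p) - 1;
    return sorted[Math.Clamp(index, 0, sorted.Count - 1)];
}

// 50 samples: 1ms, 2ms, ..., 50ms
var samples = Enumerable.Range(1, 50).Select(i => (double)i).ToList();

Console.WriteLine(Percentile(samples, 0.99)); // with 50 samples, p99 is the maximum: 50
Console.WriteLine(Percentile(samples, 0.50)); // 25
```

With only 50 queries, p99 is simply the worst sample -- a single packet-loss stall that hits Command Timeout would fail the probe outright, which is arguably the behavior you want a steady-state check to catch.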


Step 8: Run in CI

Locally:

$ docker compose -f generated/docker-compose.chaos.yaml up -d --wait
  ✔ Container postgres        Healthy
  ✔ Container rabbitmq        Healthy
  ✔ Container toxiproxy       Started

$ dotnet test --filter "FullyQualifiedName~DatabaseLatencyChaos" --logger "console;verbosity=detailed"

  [xUnit] DatabaseLatencyChaosTests.OrderQueries_CompleteWithin2s_Under500msLatency
  [info] Toxiproxy: Applied latency_500ms (500ms +/- 100ms jitter) to postgres-proxy
  [info] Toxiproxy: Applied packet_loss_10pct (10% drop) to postgres-proxy
  [info] Seeded 100 orders
  [info] Ran 50 queries through latent proxy
  [info] Results: p50=612ms, p95=743ms, p99=891ms

  Passed! [4.2s]

  [xUnit] DatabaseLatencyChaosTests.OrderQueries_RecoverAfterLatencyRemoved
  [info] Median with latency: 608ms
  [info] Removed all toxics from postgres-proxy
  [info] Median without latency: 3ms
  [info] Recovery confirmed: 608ms -> 3ms

  Passed! [2.1s]

  Test Run Successful.
  Total tests: 2
  Passed:      2
  Duration:    6.8s

$ docker compose -f generated/docker-compose.chaos.yaml down -v
  ✔ Container test-runner     Removed
  ✔ Container toxiproxy       Removed
  ✔ Container postgres        Removed
  ✔ Container rabbitmq        Removed
  ✔ Volume postgres-data      Removed

For CI, a GitHub Actions workflow:

# .github/workflows/chaos-container.yaml
name: Container Chaos Tests

on:
  push:
    paths:
      - 'src/OrderService/**'
      - 'tests/OrderService.Chaos.Tests/**'

jobs:
  chaos-container:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '9.0.x'

      - name: Build and generate chaos artifacts
        run: dotnet build tests/OrderService.Chaos.Tests

      - name: Start chaos environment
        run: |
          docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml up -d --wait
          sleep 5  # Extra buffer for Toxiproxy initialization

      - name: Run container chaos tests
        run: |
          dotnet test tests/OrderService.Chaos.Tests \
            --filter "FullyQualifiedName~DatabaseLatencyChaos" \
            --logger "trx;LogFileName=chaos-results.trx"

      - name: Collect chaos logs
        if: always()
        run: |
          docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml logs > chaos-logs.txt 2>&1

      - name: Tear down
        if: always()
        run: |
          docker compose -f tests/OrderService.Chaos.Tests/generated/docker-compose.chaos.yaml down -v

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: chaos-results
          path: |
            tests/OrderService.Chaos.Tests/TestResults/chaos-results.trx
            chaos-logs.txt

The chaos environment starts, the tests run, the results are collected, and everything is torn down. The if: always() ensures cleanup happens even when tests fail.


What This Proves Beyond InProcess

The InProcess tier proved the circuit breaker trips. The Container tier proves three additional things:

  1. Real driver behavior under latency. Npgsql's connection pool handles 500ms latency without exhausting connections. The Command Timeout=10 in the connection string is sufficient. This is not something a Task.Delay can test.

  2. Recovery after fault removal. When the toxic is removed, query times drop from 600ms to 3ms. The connection pool recovers. The driver does not hold stale connections. This validates the recovery path.

  3. Multi-service isolation. PostgreSQL is degraded, but RabbitMQ is unaffected (no faults on rabbitmq-proxy). The test can verify that event publishing continues even when the database is slow. This is a real integration concern that InProcess cannot simulate.

The generated artifacts -- docker-compose.chaos.yaml, toxiproxy-config.json, TestInfraFixture.g.cs, ToxiProxyClient.g.cs -- are all produced from eight attributes on one C# class. No YAML was written by hand. No Toxiproxy configuration was copy-pasted from a wiki. The single source of truth is the attribute declaration.
