Part XV: Testing Strategy -- From Parser to Deployment

5 testing layers, 400+ tests, no Docker daemon required for 95% of them.

The Testing Problem Nobody Talks About

Here is the question I get most often about this project: "How do you test it? You need a Docker daemon running, right?"

No. That is the whole point.

The typed Docker stack has five layers (Part II), and each needs its own testing strategy. The key insight: more than 95% of the tests don't need a Docker daemon. Parsers work on text fixtures. Generators work on JSON fixtures. Builders are pure logic. Output parsers work on stdout fixtures. Only the integration layer touches a process at all, and even there FakeProcessRunner can stand in for the real one.

I have 425 tests. Fifteen of them need Docker. The other 410 run in under ten seconds on a laptop with no containers, no daemon, no network access. That ratio is not an accident -- it is the direct consequence of the layered architecture. Each layer was designed to be testable in isolation, and the seams between layers are the test boundaries.

Most CLI wrapper projects I have seen have the opposite ratio. They start with integration tests because the whole thing is one monolithic Process.Start call. You cannot test the argument serialization without running the process. You cannot test the output parsing without real output. Everything requires a running daemon, so the test suite takes minutes and breaks when the CI server does not have Docker installed.

I refused to accept that. If parsers are pure functions from text to data, test them with text. If generators are pure functions from JSON to C#, test them with JSON. If builders are pure functions from method calls to argument lists, test them with assertions. Only test the real process when you actually need the real process.

Let me walk through each layer.

The Test Pyramid

Diagram — The test pyramid the layered architecture makes possible — only fifteen of the 425 tests need a real Docker daemon, because every seam between layers is a pure function that can be exercised from text or JSON fixtures.

Wide at the base, narrow at the top. The expensive tests that need infrastructure are a sliver. The cheap tests that exercise pure logic are the foundation. This is not a test strategy I invented -- it is the standard test pyramid, but the architecture makes it possible. If your architecture is a single Process.Start call, your pyramid is inverted by force.

Layer 0: Parser Tests (CobraHelpParser)

The parser is the foundation of the entire stack. It takes raw --help text from Docker CLI commands and produces a CommandNode tree. If the parser is wrong, everything downstream is wrong. So the parser has the most tests -- 120 of them.

Fixture-Based Unit Tests

Every Docker version I support has its help text saved as plain .txt files in a fixtures directory. One file per command per version. The directory looks like this:

test/fixtures/docker-help/
    24.0.0/
        container-run.txt
        container-ls.txt
        container-exec.txt
        image-build.txt
        image-push.txt
        image-pull.txt
        network-create.txt
        volume-create.txt
        compose-up.txt
        compose-down.txt
        ...
    25.0.0/
        container-run.txt
        container-ls.txt
        ...
    26.0.0/
        ...

These files are checked into the repository. They are the ground truth. When I scrape a new Docker version (Part III), the scraper saves these files as a side effect, and I commit them. They are never modified after that -- they are historical records of what Docker actually printed at that version.

The simplest test loads a fixture and asserts on the parsed result:

[Fact]
public void ParsesDockerContainerRunHelp()
{
    var helpText = File.ReadAllText("fixtures/docker-help/24.0.0/container-run.txt");
    var parser = new CobraHelpParser();

    var result = parser.ParseHelp(helpText, "docker container run");

    Assert.Equal("run", result.Name);
    Assert.True(result.Options.Count >= 50);
    Assert.Contains(result.Options, o => o.LongName == "detach" && o.ShortName == "d");
    Assert.Contains(result.Options, o => o.LongName == "platform" && o.ClrType == "string");
}

Nothing exotic. Load text, parse it, check the tree. The key is coverage: I have these tests for every command I support, across every version I support. That means roughly 30 commands times 4 versions, or about 120 fixture tests, each with several assertions.

Structural Assertions

Some tests do not check specific options. They check structural invariants that must hold for any valid parse result:

[Fact]
public void AllOptionsMustHaveLongName()
{
    var helpText = File.ReadAllText("fixtures/docker-help/24.0.0/container-run.txt");
    var parser = new CobraHelpParser();

    var result = parser.ParseHelp(helpText, "docker container run");

    Assert.All(result.Options, option =>
    {
        Assert.NotNull(option.LongName);
        Assert.NotEmpty(option.LongName);
        Assert.Matches("^[a-z][a-z0-9-]*$", option.LongName);
    });
}

[Fact]
public void ShortNamesMustBeSingleCharacter()
{
    var helpText = File.ReadAllText("fixtures/docker-help/24.0.0/container-run.txt");
    var parser = new CobraHelpParser();

    var result = parser.ParseHelp(helpText, "docker container run");

    Assert.All(
        result.Options.Where(o => o.ShortName is not null),
        option =>
        {
            Assert.Single(option.ShortName!);
            Assert.Matches("^[a-zA-Z]$", option.ShortName);
        });
}

[Fact]
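public void BooleanOptionDefaults_AreTrueOrAbsent()
{
    var helpText = File.ReadAllText("fixtures/docker-help/24.0.0/container-run.txt");
    var parser = new CobraHelpParser();

    var result = parser.ParseHelp(helpText, "docker container run");

    // Cobra only prints a default when it differs from the zero value, so a boolean
    // flag has either no default (false) or "(default true)", as --sig-proxy does.
    Assert.All(
        result.Options.Where(o => o.ClrType == "bool"),
        option =>
        {
            Assert.True(
                option.DefaultValue is null or "true",
                $"--{option.LongName} has unexpected default '{option.DefaultValue}'");
        });
}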

These structural tests catch parser bugs that version-specific tests miss. If a future Docker version introduces a flag with a two-character short name (which Cobra does not allow, but stranger things have happened), this test catches it immediately.
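The same invariants can also be collected into one reusable checker that reports every violation instead of stopping at the first failed assertion. This is a sketch under assumptions: ParsedOption is a hypothetical stand-in for the parser's option node (only the fields the checks need), and the duplicate-long-name rule is an extra invariant added for illustration.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the parser's option node; not the project's real type.
public sealed record ParsedOption(string LongName, string? ShortName);

public static class ParseInvariants
{
    private static readonly Regex LongNamePattern = new("^[a-z][a-z0-9-]*$");
    private static readonly Regex ShortNamePattern = new("^[a-zA-Z]$");

    // Returns one message per violated invariant; an empty list means the result is structurally sound.
    public static List<string> Check(IEnumerable<ParsedOption> options)
    {
        var violations = new List<string>();
        var seen = new HashSet<string>();

        foreach (var option in options)
        {
            if (string.IsNullOrEmpty(option.LongName) || !LongNamePattern.IsMatch(option.LongName))
                violations.Add($"invalid long name '{option.LongName}'");
            else if (!seen.Add(option.LongName))
                violations.Add($"duplicate long name '{option.LongName}'");

            if (option.ShortName is not null && !ShortNamePattern.IsMatch(option.ShortName))
                violations.Add($"--{option.LongName}: short name '{option.ShortName}' is not a single letter");
        }

        return violations;
    }
}
```

Returning a list rather than asserting keeps the checker usable from any test framework, and one failing fixture then reports all of its problems at once.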

Property-Based Tests with CsCheck

For the parser specifically, I use CsCheck for property-based testing. The idea: generate synthetic help text that follows the Cobra format, parse it, and verify that structural properties hold regardless of the specific content.

[Fact]
public void ParsedOptionsCountMatchesInputOptions()
{
    Gen.Int[1, 20].SelectMany(count => Gen.String.Array[count])
    .Sample(rawNames =>
    {
        // Sanitize first, then de-duplicate, so the expected count is exact
        var names = rawNames.Select(Sanitize).Distinct().ToArray();
        var helpText = GenerateCobraHelp("testcmd", names);

        var result = new CobraHelpParser().ParseHelp(helpText, "docker testcmd");

        // The number of parsed options should match what we generated
        Assert.Equal(names.Length, result.Options.Count);
        Assert.All(result.Options, o => Assert.Contains(o.LongName, names));
    });
}

// Keep only ASCII lowercase letters so every name is a valid Cobra long flag
private static string Sanitize(string raw)
{
    var sanitized = new string(raw.ToLowerInvariant().Where(c => c is >= 'a' and <= 'z').ToArray());
    return sanitized.Length > 0 ? sanitized : "option";
}

private static string GenerateCobraHelp(string cmdName, string[] optionNames)
{
    var sb = new StringBuilder();
    sb.AppendLine($"Usage:  docker {cmdName} [OPTIONS]");
    sb.AppendLine();
    sb.AppendLine($"Run a {cmdName}");
    sb.AppendLine();
    sb.AppendLine("Options:");

    foreach (var name in optionNames)
    {
        // Cobra layout: flag name, one space, value type, then padding and description
        sb.AppendLine($"      --{name} string   Description for {name}");
    }

    return sb.ToString();
}

Property-based tests are not a substitute for fixture-based tests. They cover different failure modes. Fixtures test "does this parser handle real Docker output correctly?" Property tests test "is this parser structurally sound for any valid input?"
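The round-trip property behind that test fits in a few lines of plain C#. In this sketch, a seeded System.Random loop replaces CsCheck, so there is no shrinking of failing inputs. A hypothetical one-regex option-line reader also replaces CobraHelpParser. The result illustrates the property; it does not reproduce the project's parser.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class RoundTripProperty
{
    // Cobra-style option lines: "  -d, --detach   Run..." or "      --name string   Assign...".
    private static readonly Regex OptionLine =
        new(@"^\s+(?:-([a-zA-Z]),\s+)?--([a-z][a-z0-9-]*)(?: (\w+))?\s{2,}");

    // Toy stand-in for a help parser: returns the long names of all option lines, in order.
    public static List<string> ParseLongNames(string helpText) =>
        helpText.Split('\n')
            .Select(line => OptionLine.Match(line.TrimEnd('\r')))
            .Where(m => m.Success)
            .Select(m => m.Groups[2].Value)
            .ToList();

    public static string GenerateHelp(string command, IReadOnlyList<string> names)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"Usage:  docker {command} [OPTIONS]");
        sb.AppendLine();
        sb.AppendLine("Options:");
        foreach (var name in names)
            sb.AppendLine($"      --{name} string   Description for {name}");
        return sb.ToString();
    }

    // Generates random distinct lowercase names, renders help text, parses it back, and
    // returns the first counterexample, or null if the property held for every sample.
    public static string[]? FindCounterexample(int samples, int seed)
    {
        var random = new Random(seed);
        for (var i = 0; i < samples; i++)
        {
            var names = Enumerable.Range(0, random.Next(1, 21))
                .Select(n => new string(Enumerable.Range(0, random.Next(1, 12))
                    .Select(k => (char)('a' + random.Next(26)))
                    .ToArray()))
                .Distinct()
                .ToArray();

            var parsed = ParseLongNames(GenerateHelp("testcmd", names));
            if (!parsed.SequenceEqual(names))
                return names;
        }
        return null;
    }
}
```

A fixed seed keeps the run reproducible. A library like CsCheck adds the parts this loop lacks, chiefly shrinking a failure down to a minimal input.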

Regression Snapshot Tests

The most important parser test is the regression test. It reparses every cached fixture using the current parser version and compares the output against a blessed JSON snapshot:

[Theory]
[MemberData(nameof(AllFixtures))]
public void ReparseMatchesBlessedSnapshot(string fixturePath, string snapshotPath)
{
    var helpText = File.ReadAllText(fixturePath);
    var commandName = ExtractCommandName(fixturePath);
    var parser = new CobraHelpParser();

    var result = parser.ParseHelp(helpText, commandName);
    var actualJson = JsonSerializer.Serialize(result, SerializerOptions);

    var expectedJson = File.ReadAllText(snapshotPath);

    Assert.Equal(
        NormalizeJson(expectedJson),
        NormalizeJson(actualJson));
}

public static IEnumerable<object[]> AllFixtures()
{
    foreach (var versionDir in Directory.GetDirectories("fixtures/docker-help"))
    {
        foreach (var fixture in Directory.GetFiles(versionDir, "*.txt"))
        {
            var snapshot = Path.ChangeExtension(fixture, ".blessed.json");
            if (File.Exists(snapshot))
                yield return [fixture, snapshot];
        }
    }
}

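The flow is simple. The test reparses each fixture with the current parser, serializes the result to JSON, normalizes both documents, and compares them. Any difference fails the test.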

When I improve the parser -- say, to handle a new Cobra flag format -- some snapshots will change. That is expected. The test fails, I review the diff, and if the new output is correct, I update the blessed snapshot. If the new output is wrong, I fix the parser. The blessed snapshots are the contract.

This is the most valuable test in the entire suite. Every other test checks "does this specific input produce this specific output?" The regression test checks "did anything change that I did not expect?" It is the safety net that lets me refactor the parser without fear.
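The snapshot test relies on a NormalizeJson helper that the listing does not show. One plausible sketch, assuming the goal is to ignore whitespace and property order but not array order, parses the JSON with System.Text.Json and rewrites it with object properties sorted by name. The class name is hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

public static class SnapshotJson
{
    // Re-serializes JSON with indentation and ordinally sorted object properties, so documents
    // that differ only in whitespace or property order normalize to the same string.
    public static string NormalizeJson(string json)
    {
        using var document = JsonDocument.Parse(json);
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { Indented = true }))
        {
            WriteSorted(document.RootElement, writer);
        } // disposing the writer flushes it into the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }

    private static void WriteSorted(JsonElement element, Utf8JsonWriter writer)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                writer.WriteStartObject();
                foreach (var property in element.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
                {
                    writer.WritePropertyName(property.Name);
                    WriteSorted(property.Value, writer);
                }
                writer.WriteEndObject();
                break;
            case JsonValueKind.Array:
                // Array order is kept: reordering options in the tree counts as a real change.
                writer.WriteStartArray();
                foreach (var item in element.EnumerateArray())
                    WriteSorted(item, writer);
                writer.WriteEndArray();
                break;
            default:
                // Strings, numbers, booleans and null are copied as-is.
                element.WriteTo(writer);
                break;
        }
    }
}
```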

Layer 1: Source Generator Tests

The source generator takes a parsed JSON tree and emits C# code. Testing it requires a different approach: you cannot just assert on raw strings, because generated code has formatting, whitespace, and ordering that might change without affecting correctness. I use four kinds of tests: snapshot tests, compilation tests, VersionDiffer merge tests, and diagnostic tests.

Snapshot Tests

The simplest: feed known JSON into the generator, capture the emitted C#, and compare against a blessed file:

[Fact]
public void GeneratesCommandClassFromJson()
{
    var json = File.ReadAllText("fixtures/simple-binary-1.0.0.json");
    var result = RunGenerator(json, "[BinaryWrapper(\"simple\")]");

    // Generated hint names carry a file suffix (e.g. "SimpleRunCommand.g.cs"), so match
    // on the name plus a dot instead of exact equality; this also skips "...Builder" files
    var source = Assert.Single(
        result.GeneratedSources,
        s => s.HintName.Contains("SimpleRunCommand."));

    VerifySnapshot(source.SourceText, "expected/SimpleRunCommand.g.cs");
}

The RunGenerator helper creates an in-memory compilation with the attribute applied, feeds it the JSON fixture as an additional file, and runs the incremental generator. The VerifySnapshot helper compares the output line-by-line, ignoring leading/trailing whitespace differences.

The blessed .g.cs files live in the test project alongside the JSON fixtures. They are the expected output. When the generator changes, I regenerate them and review the diff -- same pattern as the parser snapshots.
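The comparison rule described above (line by line, ignoring leading and trailing whitespace) can be sketched as a pure function. The sketch assumes it returns a description of the first mismatch instead of asserting directly, and that it also ignores line-ending style and trailing whitespace at the end of the file. The real helper's signature is not shown, so SnapshotText is hypothetical.

```csharp
#nullable enable
using System;

public static class SnapshotText
{
    // Compares two texts line by line, ignoring each line's leading/trailing whitespace,
    // the line-ending style, and whitespace at the very end of the file.
    // Returns null when they match, otherwise a message describing the first difference.
    public static string? FirstDifference(string expected, string actual)
    {
        var expectedLines = SplitLines(expected);
        var actualLines = SplitLines(actual);
        var common = Math.Min(expectedLines.Length, actualLines.Length);

        for (var i = 0; i < common; i++)
        {
            var e = expectedLines[i].Trim();
            var a = actualLines[i].Trim();
            if (!string.Equals(e, a, StringComparison.Ordinal))
                return $"line {i + 1}: expected '{e}' but got '{a}'";
        }

        return expectedLines.Length == actualLines.Length
            ? null
            : $"line count differs: expected {expectedLines.Length}, got {actualLines.Length}";
    }

    private static string[] SplitLines(string text) =>
        text.TrimEnd().Replace("\r\n", "\n").Split('\n');
}
```

Reporting the first differing line number makes a failed snapshot much quicker to review than a single "strings differ" assertion.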

Compilation Tests

Snapshot tests verify the text of the generated code. Compilation tests verify that the text is valid C#:

[Fact]
public void GeneratedCodeCompiles()
{
    var json = File.ReadAllText("fixtures/docker-24.0.0.json");
    var result = RunGenerator(json, "[BinaryWrapper(\"docker\")]");

    Assert.Empty(
        result.Diagnostics.Where(d => d.Severity == DiagnosticSeverity.Error));
}

[Fact]
public void GeneratedCodeCompiles_AllVersions()
{
    foreach (var jsonPath in Directory.GetFiles("fixtures", "docker-*.json"))
    {
        var json = File.ReadAllText(jsonPath);
        var result = RunGenerator(json, "[BinaryWrapper(\"docker\")]");

        Assert.Empty(
            result.Diagnostics
                .Where(d => d.Severity == DiagnosticSeverity.Error)
                .Select(d => $"{Path.GetFileName(jsonPath)}: {d.GetMessage()}"));
    }
}

This catches a class of bugs that snapshot tests miss: the generated code looks right but references a type that does not exist, or uses a syntax that the target language version does not support. The compilation test is the final word on "is this valid C#?"

VersionDiffer Tests

The VersionDiffer (Part III) merges multiple version trees into a single tree with SinceVersion and UntilVersion annotations. These tests verify the merge logic:

[Fact]
public void MergeDetectsNewOptions()
{
    var v1 = LoadTree("fixtures/simple-1.0.0.json");
    var v2 = LoadTree("fixtures/simple-2.0.0.json");

    var merged = VersionDiffer.Merge([(v1.Version, v1), (v2.Version, v2)]);

    var run = merged.Root.SubCommands.First(c => c.Name == "run");
    var platform = run.Options.First(o => o.LongName == "platform");

    Assert.Equal(SemanticVersion.Parse("2.0.0"), platform.SinceVersion);
    Assert.Null(platform.UntilVersion);
}

[Fact]
public void MergeDetectsRemovedOptions()
{
    var v1 = LoadTree("fixtures/simple-1.0.0.json"); // has --deprecated-flag
    var v2 = LoadTree("fixtures/simple-2.0.0.json"); // does not have --deprecated-flag

    var merged = VersionDiffer.Merge([(v1.Version, v1), (v2.Version, v2)]);

    var run = merged.Root.SubCommands.First(c => c.Name == "run");
    var deprecated = run.Options.First(o => o.LongName == "deprecated-flag");

    Assert.Equal(SemanticVersion.Parse("1.0.0"), deprecated.SinceVersion);
    Assert.Equal(SemanticVersion.Parse("2.0.0"), deprecated.UntilVersion);
}

[Fact]
public void MergePreservesStableOptions()
{
    var v1 = LoadTree("fixtures/simple-1.0.0.json");
    var v2 = LoadTree("fixtures/simple-2.0.0.json");
    var v3 = LoadTree("fixtures/simple-3.0.0.json");

    var merged = VersionDiffer.Merge([
        (v1.Version, v1),
        (v2.Version, v2),
        (v3.Version, v3)
    ]);

    var run = merged.Root.SubCommands.First(c => c.Name == "run");
    var name = run.Options.First(o => o.LongName == "name");

    // --name exists in all three versions
    Assert.Equal(SemanticVersion.Parse("1.0.0"), name.SinceVersion);
    Assert.Null(name.UntilVersion);
}

The VersionDiffer fixtures are synthetic -- I create them by hand with known differences. Real Docker JSON is too large and too noisy for targeted merge testing. The synthetic fixtures have three or four commands with five or six options each, and the differences between versions are deliberate and documented.
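These tests pin down the merge rule. SinceVersion is the first version that has the option. UntilVersion is the first later version that lacks it, and stays null while the option still exists. The rule can be sketched over plain option-name sets. This illustrates the rule and is not the project's VersionDiffer: it uses System.Version instead of SemanticVersion, it handles one command's options rather than a whole tree, and it does not model an option that disappears and later comes back.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record OptionSpan(string LongName, Version Since, Version? Until);

public static class OptionMerge
{
    public static List<OptionSpan> Merge(IEnumerable<(Version Version, ISet<string> Options)> snapshots)
    {
        // Oldest first, so "first seen" and "first missing" are well defined.
        var ordered = snapshots.OrderBy(s => s.Version).ToList();
        var since = new Dictionary<string, Version>();
        var until = new Dictionary<string, Version>();

        foreach (var (version, options) in ordered)
        {
            foreach (var name in options)
                since.TryAdd(name, version);

            // An option seen earlier but absent now is closed at this version (first absence wins).
            foreach (var name in since.Keys)
                if (!options.Contains(name))
                    until.TryAdd(name, version);
        }

        return since
            .Select(kv => new OptionSpan(kv.Key, kv.Value, until.GetValueOrDefault(kv.Key)))
            .OrderBy(span => span.LongName, StringComparer.Ordinal)
            .ToList();
    }
}
```

Fed the three synthetic versions from the tests above, this yields --deprecated-flag as 1.0.0 to 2.0.0, --platform from 2.0.0 onward, and --name from 1.0.0 onward.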

Diagnostic Tests

The generator also emits diagnostics (warnings and errors) for invalid configurations. I test those explicitly:

[Fact]
public void EmitsWarningForUnknownOptionType()
{
    var json = File.ReadAllText("fixtures/unknown-type.json");
    var result = RunGenerator(json, "[BinaryWrapper(\"test\")]");

    var warning = Assert.Single(
        result.Diagnostics.Where(d => d.Id == "BW0012"));

    Assert.Contains("unknown-type", warning.GetMessage());
    Assert.Equal(DiagnosticSeverity.Warning, warning.Severity);
}

[Fact]
public void EmitsErrorForMissingJsonFixture()
{
    var result = RunGenerator(
        jsonContent: null,
        "[BinaryWrapper(\"nonexistent\")]");

    var error = Assert.Single(
        result.Diagnostics.Where(d => d.Id == "BW0001"));

    Assert.Equal(DiagnosticSeverity.Error, error.Severity);
}

Diagnostic tests are the most overlooked category in source generator testing. If the generator silently ignores bad input, the consumer gets no feedback. If it emits a confusing diagnostic, the consumer wastes time. These tests verify that the error messages are both present and useful.

Layer 2: Builder Tests

Builders are the public API surface. They are what consumers interact with daily. Builder tests verify three things: argument correctness, version guards, and validation.

Argument Correctness

The simplest builder test: call methods, build, check arguments.

[Fact]
public void WithDetach_AddsFlag()
{
    var cmd = new DockerContainerRunCommandBuilder()
        .WithDetach(true)
        .WithName("web")
        .WithImage("nginx:latest")
        .Build().ValueOrThrow();

    var args = cmd.ToArguments();

    Assert.Contains("--detach", args);
    Assert.Contains("--name", args);
    Assert.Equal("web", args[args.IndexOf("--name") + 1]);
    Assert.Equal("nginx:latest", args[^1]); // image is last because no container command follows it
}

[Fact]
public void MultipleEnvironmentVariables_ProducesMultipleFlags()
{
    var cmd = new DockerContainerRunCommandBuilder()
        .WithEnv("DB_HOST=localhost")
        .WithEnv("DB_PORT=5432")
        .WithEnv("DB_NAME=mydb")
        .WithImage("myapp:latest")
        .Build().ValueOrThrow();

    var args = cmd.ToArguments();
    var envCount = args.Count(a => a == "--env" || a == "-e");

    Assert.Equal(3, envCount);
}

[Fact]
public void ArgumentOrder_IsDeterministic()
{
    // Build the same command twice -- arguments must be identical
    var build1 = new DockerContainerRunCommandBuilder()
        .WithDetach(true)
        .WithName("web")
        .WithMemory("512m")
        .WithPublish("8080:80")
        .WithImage("nginx:latest")
        .Build().ValueOrThrow();

    var build2 = new DockerContainerRunCommandBuilder()
        .WithDetach(true)
        .WithName("web")
        .WithMemory("512m")
        .WithPublish("8080:80")
        .WithImage("nginx:latest")
        .Build().ValueOrThrow();

    Assert.Equal(build1.ToArguments(), build2.ToArguments());
}

Deterministic argument order matters for caching and for test stability. If the builder produces --name web --detach on one run and --detach --name web on the next, every downstream test becomes flaky. The builder sorts flags in a defined order: global flags first, then command flags alphabetically, then positional arguments.
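The ordering rule is easy to express as a pure function. The sketch assumes a hypothetical CliFlag record (name, optional value, global or not) and ordinal alphabetical sorting. How the real builder classifies global flags is not shown here. Global flags go before the command path because Docker's usage line is "docker [OPTIONS] COMMAND".

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flag model: Value is null for boolean switches such as --detach.
public sealed record CliFlag(string Name, string? Value, bool IsGlobal = false);

public static class ArgumentOrder
{
    // Global flags, then the command path, then command flags, then positional arguments.
    public static List<string> Serialize(
        IEnumerable<string> commandPath,
        IEnumerable<CliFlag> flags,
        IEnumerable<string> positionals)
    {
        var all = flags.ToList();
        var args = new List<string>();

        AppendSorted(args, all.Where(f => f.IsGlobal));
        args.AddRange(commandPath);
        AppendSorted(args, all.Where(f => !f.IsGlobal));
        args.AddRange(positionals); // the image, then any container command
        return args;
    }

    private static void AppendSorted(List<string> args, IEnumerable<CliFlag> flags)
    {
        // OrderBy is a stable sort, so repeated flags (three --env values) keep insertion order.
        foreach (var flag in flags.OrderBy(f => f.Name, StringComparer.Ordinal))
        {
            args.Add("--" + flag.Name);
            if (flag.Value is not null)
                args.Add(flag.Value);
        }
    }
}
```

Stability matters as much as sorting. Sorting by name alone would be deterministic, but an unstable sort could still shuffle repeated --env values between runs.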

Version Guard Tests

Version guards prevent consumers from using options that do not exist in their Docker version. These tests verify that the guard triggers correctly:

[Fact]
public void VersionGuard_ThrowsForOldVersion()
{
    var builder = new DockerContainerRunCommandBuilder(
        detectedVersion: SemanticVersion.Parse("18.09.0"));

    var ex = Assert.Throws<OptionNotSupportedException>(() =>
        builder.WithPlatform("linux/arm64"));

    Assert.Contains("19.03.0", ex.Message);
    Assert.Contains("18.09.0", ex.Message);
    Assert.Contains("--platform", ex.Message);
}

[Fact]
public void VersionGuard_AllowsExactVersion()
{
    var builder = new DockerContainerRunCommandBuilder(
        detectedVersion: SemanticVersion.Parse("19.03.0"));

    // Should not throw -- 19.03.0 is the exact SinceVersion for --platform
    var cmd = builder
        .WithPlatform("linux/arm64")
        .WithImage("nginx:latest")
        .Build();

    Assert.True(cmd.IsOk);
}

[Fact]
public void VersionGuard_ThrowsForRemovedOption()
{
    var builder = new DockerContainerRunCommandBuilder(
        detectedVersion: SemanticVersion.Parse("25.0.0"));

    var ex = Assert.Throws<OptionNotSupportedException>(() =>
        builder.WithLinks("db:database"));

    Assert.Contains("removed", ex.Message.ToLower());
    Assert.Contains("--link", ex.Message);
}

The version guard tests are parameterized across every option that has a SinceVersion or UntilVersion. That is a lot of tests -- roughly 30 options across the main commands have version bounds. Each one gets a "too old" test, an "exact version" test, and a "newer version" test. If the generator produces wrong version annotations, these tests catch it.
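The guard reduces to one half-open interval check. In this sketch, System.Version stands in for the project's SemanticVersion, and the bounds follow the VersionDiffer convention: an option is supported from SinceVersion (inclusive) up to UntilVersion (exclusive). The exception type and messages are illustrative, not the real OptionNotSupportedException. System.Version also normalizes "19.03.0" to 19.3.0 when printed.

```csharp
#nullable enable
using System;

// Illustrative exception type; not the project's OptionNotSupportedException.
public sealed class OptionVersionException : Exception
{
    public OptionVersionException(string message) : base(message) { }
}

public static class VersionGuard
{
    // Supported when since <= detected < until; until is the first version WITHOUT the option,
    // matching the VersionDiffer tests (--deprecated-flag: since 1.0.0, until 2.0.0).
    public static bool IsSupported(Version detected, Version? since, Version? until) =>
        (since is null || detected >= since) && (until is null || detected < until);

    // Throws with a message naming the flag, the violated bound, and the detected version.
    public static void Ensure(string flag, Version detected, Version? since, Version? until)
    {
        if (since is not null && detected < since)
            throw new OptionVersionException(
                $"{flag} requires Docker {since} or later; detected {detected}.");
        if (until is not null && detected >= until)
            throw new OptionVersionException(
                $"{flag} was removed in Docker {until}; detected {detected}.");
    }
}
```

The three cases per bound map directly onto this check. "Too old" falls below Since, "exact" sits on Since, and "newer" lies above it. For a removal bound, the exact UntilVersion is the first unsupported version.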

Validation Tests

Some option combinations are invalid. The builder rejects them when Build() is called, before any process is started:

[Fact]
public void Build_FailsWithoutImage()
{
    var result = new DockerContainerRunCommandBuilder()
        .WithDetach(true)
        .WithName("web")
        .Build();

    Assert.True(result.IsError);
    Assert.Contains("image", result.Error.Message.ToLower());
}

[Fact]
public void Build_FailsWithConflictingOptions()
{
    // docker run rejects --attach together with --detach (-dit, by contrast, is valid)
    var result = new DockerContainerRunCommandBuilder()
        .WithDetach(true)
        .WithAttach("stdout")
        .WithImage("nginx:latest")
        .Build();

    Assert.True(result.IsError);
    Assert.Contains("--detach", result.Error.Message);
    Assert.Contains("--attach", result.Error.Message);
}

[Fact]
public void Build_FailsWithInvalidMemoryFormat()
{
    var result = new DockerContainerRunCommandBuilder()
        .WithMemory("not-a-memory-value")
        .WithImage("nginx:latest")
        .Build();

    Assert.True(result.IsError);
    Assert.Contains("memory", result.Error.Message.ToLower());
}

These tests use Result<T> instead of exceptions. The builder's Build() method returns Result<ICliCommand>, not ICliCommand. Invalid combinations produce Result.Failure with a descriptive error, not an exception. The tests verify both the failure and the error message -- because a failure with a useless message is almost as bad as no failure at all.
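The Result-returning validation pattern can be sketched without the project's types. Everything here is hypothetical: a minimal Result<T>, a RunSpec record standing in for the builder's accumulated state, and a deliberately simplified memory pattern (digits plus an optional b/k/m/g suffix). The Docker CLI accepts more forms than this. The conflict rule is the --detach/--attach pair that docker run itself rejects. The validator collects every problem into one message rather than stopping at the first.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed record BuildError(string Message);

// Minimal result type: exactly one of Value or Error is meaningful.
public readonly record struct Result<T>(T? Value, BuildError? Error)
{
    public bool IsOk => Error is null;
    public bool IsError => Error is not null;
    public static Result<T> Ok(T value) => new(value, null);
    public static Result<T> Fail(string message) => new(default, new BuildError(message));
}

// Hypothetical snapshot of builder state at Build() time.
public sealed record RunSpec(string? Image, bool Detach, IReadOnlyList<string> Attach, string? Memory);

public static class RunValidator
{
    // Simplified on purpose: "512m", "1g", "1024" pass; decimals and spaced units do not.
    private static readonly Regex MemoryPattern = new("^[0-9]+[bkmgBKMG]?$");

    public static Result<RunSpec> Validate(RunSpec spec)
    {
        var problems = new List<string>();

        if (string.IsNullOrWhiteSpace(spec.Image))
            problems.Add("an image is required");
        if (spec.Detach && spec.Attach.Count > 0)
            problems.Add("--detach cannot be combined with --attach");
        if (spec.Memory is not null && !MemoryPattern.IsMatch(spec.Memory))
            problems.Add($"invalid --memory value '{spec.Memory}'");

        return problems.Count == 0
            ? Result<RunSpec>.Ok(spec)
            : Result<RunSpec>.Fail(string.Join("; ", problems));
    }
}
```

Naming the offending flags in the message is what lets the tests above assert on "--detach" and "memory" instead of just on IsError.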

Layer 3: Output Parser Tests

Output parsers (Part IX) transform raw stdout/stderr lines into typed domain events. Testing them requires saved output from real Docker commands -- another set of fixtures.

Basic Parse Tests

[Fact]
public void DockerBuildParser_ParsesSteps()
{
    var lines = File.ReadAllLines("fixtures/docker-build-output.txt")
        .Select(l => new OutputLine(l, IsError: false, DateTimeOffset.Now));
    var parser = new DockerBuildParser();

    var events = lines
        .SelectMany(l => parser.ParseLine(l))
        .Concat(parser.Complete(0))
        .ToList();

    Assert.Contains(events, e => e is BuildStepStarted { Step: 1, Total: 5 });
    Assert.Contains(events, e => e is BuildLayerCached);
    Assert.Single(events.OfType<BuildComplete>());
}

[Fact]
public void DockerBuildParser_ParsesBuildKitFormat()
{
    var lines = File.ReadAllLines("fixtures/docker-buildkit-output.txt")
        .Select(l => new OutputLine(l, IsError: false, DateTimeOffset.Now));
    var parser = new DockerBuildParser();

    var events = lines
        .SelectMany(l => parser.ParseLine(l))
        .Concat(parser.Complete(0))
        .ToList();

    // BuildKit uses different step format: [1/3] FROM ...
    Assert.Contains(events, e => e is BuildStepStarted);
    Assert.Single(events.OfType<BuildComplete>());
}

The fixtures directory for output parsers:

test/fixtures/
    docker-build-output.txt
    docker-buildkit-output.txt
    docker-run-output.txt
    docker-pull-output.txt
    docker-push-output.txt
    docker-compose-up-output.txt
    docker-compose-down-output.txt
    docker-container-ls-output.txt
    docker-container-ls-output-empty.txt
    docker-build-output-with-warnings.txt
    docker-build-output-with-errors.txt

Every fixture is captured from a real Docker command. I run the command, pipe stdout and stderr to files, and check them in. This means the fixtures reflect actual Docker behavior, not my imagination of what Docker might print.

Edge Case Tests

Real Docker output is messy. These tests cover the edge cases I discovered the hard way:

[Fact]
public void HandlesAnsiColorCodes()
{
    // Docker sometimes emits ANSI escape sequences for colored output
    var line = new OutputLine(
        "\u001b[36mStep 1/3 : FROM nginx:latest\u001b[0m",
        IsError: false,
        DateTimeOffset.Now);
    var parser = new DockerBuildParser();

    var events = parser.ParseLine(line).ToList();

    Assert.Single(events);
    Assert.IsType<BuildStepStarted>(events[0]);
    var step = (BuildStepStarted)events[0];
    Assert.Equal(1, step.Step);
    Assert.Equal(3, step.Total);
}

[Fact]
public void HandlesInterleavedStdoutStderr()
{
    var lines = new[]
    {
        new OutputLine("Step 1/2 : FROM nginx", IsError: false, DateTimeOffset.Now),
        new OutputLine("WARNING: apt does not have a stable CLI interface",
            IsError: true, DateTimeOffset.Now),
        new OutputLine(" ---> abc123", IsError: false, DateTimeOffset.Now),
        new OutputLine("Step 2/2 : COPY . /app", IsError: false, DateTimeOffset.Now),
        new OutputLine("debconf: delaying package configuration",
            IsError: true, DateTimeOffset.Now),
        new OutputLine(" ---> def456", IsError: false, DateTimeOffset.Now),
        new OutputLine("Successfully built def456", IsError: false, DateTimeOffset.Now),
    };
    var parser = new DockerBuildParser();

    var events = lines
        .SelectMany(l => parser.ParseLine(l))
        .Concat(parser.Complete(0))
        .ToList();

    Assert.Equal(2, events.OfType<BuildStepStarted>().Count());
    Assert.Single(events.OfType<BuildComplete>());
    Assert.Contains(events, e => e is BuildWarning);
}

[Fact]
public void HandlesPartialLines()
{
    // Some Docker commands emit progress indicators that overwrite the line
    // using \r without \n -- these arrive as partial lines
    var lines = new[]
    {
        new OutputLine("Pulling from library/nginx", IsError: false, DateTimeOffset.Now),
        new OutputLine("abc123: Pulling fs layer", IsError: false, DateTimeOffset.Now),
        new OutputLine("abc123: Downloading  10%", IsError: false, DateTimeOffset.Now),
        new OutputLine("abc123: Downloading  50%", IsError: false, DateTimeOffset.Now),
        new OutputLine("abc123: Downloading 100%", IsError: false, DateTimeOffset.Now),
        new OutputLine("abc123: Pull complete", IsError: false, DateTimeOffset.Now),
    };
    var parser = new DockerPullParser();

    var events = lines
        .SelectMany(l => parser.ParseLine(l))
        .Concat(parser.Complete(0))
        .ToList();

    var progress = events.OfType<PullProgress>().ToList();
    Assert.True(progress.Count >= 3);
    Assert.Equal(100, progress[^1].Percentage);
}
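Both kinds of edge case above come down to normalizing raw text before a parser sees it. The following sketch shows one such pre-pass, assuming a hypothetical LineNormalizer that the parsers could share. It strips ANSI CSI escape sequences such as the color codes in the first test, and it treats a lone \r as a line break, so each progress redraw becomes its own line.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineNormalizer
{
    // CSI sequence: ESC '[' + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F)
    // + one final byte (0x40-0x7E), e.g. "\u001b[36m" or "\u001b[2K".
    private static readonly Regex Csi = new(@"\u001b\[[0-?]*[ -/]*[@-~]");

    public static string StripAnsi(string text) => Csi.Replace(text, "");

    // Splits a raw chunk on \r\n, \n, or a lone \r, strips escapes, and drops lines left empty.
    public static List<string> SplitLogicalLines(string chunk)
    {
        var lines = new List<string>();
        foreach (var part in Regex.Split(chunk, "\r\n|\n|\r"))
        {
            var clean = StripAnsi(part);
            if (clean.Length > 0)
                lines.Add(clean);
        }
        return lines;
    }
}
```

Keeping this pre-pass separate from the parsers means each parser's patterns can assume clean, complete lines.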

Compose Output Parser Tests

Docker Compose output has its own format -- service names prefixed with color codes, health check status lines, dependency ordering messages:

[Fact]
public void ComposeUpParser_ParsesServiceStarted()
{
    var lines = File.ReadAllLines("fixtures/docker-compose-up-output.txt")
        .Select(l => new OutputLine(l, IsError: false, DateTimeOffset.Now));
    var parser = new ComposeUpParser();

    var events = lines
        .SelectMany(l => parser.ParseLine(l))
        .Concat(parser.Complete(0))
        .ToList();

    Assert.Contains(events,
        e => e is ComposeServiceStarted { ServiceName: "postgres" });
    Assert.Contains(events,
        e => e is ComposeServiceStarted { ServiceName: "redis" });
    Assert.Contains(events,
        e => e is ComposeServiceHealthy { ServiceName: "postgres" });
}

[Fact]
public void ComposeUpParser_ParsesDependencyWaiting()
{
    var line = new OutputLine(
        " Container myapp-web-1  Waiting",
        IsError: false, DateTimeOffset.Now);
    var parser = new ComposeUpParser();

    var events = parser.ParseLine(line).ToList();

    Assert.Single(events);
    Assert.IsType<ComposeServiceWaiting>(events[0]);
    var waiting = (ComposeServiceWaiting)events[0];
    Assert.Equal("web", waiting.ServiceName);
}

Compose output parsing is harder than Docker CLI output parsing because Docker Compose changed its output format between v1 and v2. v1 prefixes log lines with the service name and index (service_name_1 | log line), while v2 prints status lines that name the full container (Container project-service-1  Status). Both formats need to work, and both are covered by fixtures.
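Recovering the service name is the fiddly part, because project and service names can themselves contain hyphens (or underscores, in v1). The sketch below sidesteps that ambiguity by assuming the project name is already known, for example from the compose invocation. The ComposeLine record and both patterns are hypothetical, and they only cover the two line shapes quoted above.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

public sealed record ComposeLine(string Service, string Kind, string Text);

public static class ComposeLineParser
{
    // v2 status line: " Container <project>-<service>-<index>  <Status>"
    private static readonly Regex V2Status = new(@"^\s*Container\s+(\S+)\s+(\S.*?)\s*$");
    // v1 log line: "<service>_<index>  | <message>"
    private static readonly Regex V1Log = new(@"^(\S+)_\d+\s+\|\s?(.*)$");

    public static ComposeLine? Parse(string line, string project)
    {
        var status = V2Status.Match(line);
        if (status.Success)
        {
            var container = status.Groups[1].Value;
            var prefix = project + "-";
            // Strip the known project prefix and the trailing "-<index>" to get the service name.
            var dash = container.LastIndexOf('-');
            if (container.StartsWith(prefix, StringComparison.Ordinal) && dash > prefix.Length)
                return new ComposeLine(container[prefix.Length..dash], "status", status.Groups[2].Value);
            return null;
        }

        var log = V1Log.Match(line);
        return log.Success ? new ComposeLine(log.Groups[1].Value, "log", log.Groups[2].Value) : null;
    }
}
```

Knowing the project name turns an ambiguous split into a prefix strip. That is why "my-app-api-gateway-1" correctly yields the service "api-gateway".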

Layer 4: Integration Tests (FakeProcessRunner)

This is where the layers come together. Integration tests exercise the full pipeline -- builder to executor to parser to result -- but with FakeProcessRunner instead of a real process. The fake replaces the process runner with scripted responses: you tell it "when you see these arguments, emit these lines." No Docker daemon required.
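Stripped to its core, a scripted fake is a list of (predicate, canned output) pairs plus a log of invocations. The sketch below is hypothetical and far smaller than the project's FakeProcessRunner. MiniFakeRunner matches on the argument list, returns the first matching script's output, and records every call so a test can assert on the arguments afterwards.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public sealed record Invocation(string ExecutablePath, IReadOnlyList<string> Arguments);

public sealed record ScriptedResult(IReadOnlyList<string> Stdout, IReadOnlyList<string> Stderr, int ExitCode);

public sealed class MiniFakeRunner
{
    private readonly List<(Func<Invocation, bool> Match, ScriptedResult Result)> _scripts = new();
    private readonly List<Invocation> _invocations = new();

    public IReadOnlyList<Invocation> Invocations => _invocations;

    public MiniFakeRunner Script(
        Func<Invocation, bool> match,
        IReadOnlyList<string>? stdout = null,
        IReadOnlyList<string>? stderr = null,
        int exitCode = 0)
    {
        _scripts.Add((match, new ScriptedResult(stdout ?? Array.Empty<string>(), stderr ?? Array.Empty<string>(), exitCode)));
        return this; // fluent, so scripts can be chained
    }

    public Task<ScriptedResult> RunAsync(
        string executablePath, IReadOnlyList<string> arguments, CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();
        var invocation = new Invocation(executablePath, arguments.ToArray());
        _invocations.Add(invocation);

        // First matching script wins; an unscripted call is a test bug, so fail loudly.
        foreach (var (match, result) in _scripts)
            if (match(invocation))
                return Task.FromResult(result);

        throw new InvalidOperationException(
            $"No script matches: {executablePath} {string.Join(" ", arguments)}");
    }
}
```

Failing loudly on an unscripted call is a deliberate choice. A fake that silently returns empty output would let a wrong argument list pass unnoticed.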

[Fact]
public async Task FullPipeline_BuildAndRun()
{
    var runner = new FakeProcessRunner()
        .Script(
            match: s => s.Arguments.Contains("build"),
            stdoutLines: [
                "Step 1/3 : FROM nginx:latest",
                " ---> abc123",
                "Step 2/3 : COPY . /usr/share/nginx/html",
                " ---> def456",
                "Step 3/3 : EXPOSE 80",
                " ---> ghi789",
                "Successfully built ghi789",
                "Successfully tagged myapp:latest"
            ])
        .Script(
            match: s => s.Arguments.Contains("run"),
            stdoutLines: ["container_id_123"]);

    var executor = new CommandExecutor(runner);
    var binding = new BinaryBinding(
        new BinaryIdentifier("docker"),
        executablePath: "docker",
        detectedVersion: SemanticVersion.Parse("24.0.0"));
    var client = Docker.Create(binding);

    // Build
    var buildCmd = client.Image.Build(b => b
        .WithTag(["myapp:latest"])
        .WithPath("."));
    var buildResult = await executor.ExecuteAsync<BuildEvent, BuildResult>(
        binding, buildCmd, new DockerBuildParser(), new BuildResultCollector());

    Assert.Equal("ghi789", buildResult.ImageId);
    Assert.Equal("myapp:latest", buildResult.Tag);

    // Run
    var runCmd = client.Container.Run(b => b
        .WithDetach(true)
        .WithImage("myapp:latest"));
    var runResult = await executor.ExecuteAsync(binding, runCmd);

    Assert.True(runResult.IsOk);
}

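The interaction runs in one direction. The test scripts the fake runner. The executor hands the builder's arguments to it, and the fake matches them against its scripts and emits the canned lines. The parser and collector then turn those lines into the typed result the test asserts on.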

Error Handling Tests

The fake can also simulate failures:

[Fact]
public async Task HandlesNonZeroExitCode()
{
    var runner = new FakeProcessRunner()
        .Script(
            match: s => s.Arguments.Contains("run"),
            stderrLines: [
                "docker: Error response from daemon: Conflict.",
                "The container name \"/web\" is already in use."
            ],
            exitCode: 125);

    var executor = new CommandExecutor(runner);
    var binding = CreateDockerBinding();

    var cmd = Docker.Create(binding).Container.Run(b => b
        .WithName("web")
        .WithImage("nginx:latest"));

    var result = await executor.ExecuteAsync(binding, cmd);

    Assert.True(result.IsError);
    Assert.Equal(125, result.Error.ExitCode);
    Assert.Contains("already in use", result.Error.Stderr);
}

[Fact]
public async Task HandlesProcessTimeout()
{
    var runner = new FakeProcessRunner()
        .Script(
            match: _ => true,
            behavior: FakeProcessBehavior.NeverComplete);

    var executor = new CommandExecutor(runner);
    var binding = CreateDockerBinding();

    var cmd = Docker.Create(binding).Container.Run(b => b
        .WithImage("nginx:latest"));

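    using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));

    // ThrowsAnyAsync also accepts TaskCanceledException, which derives from OperationCanceledException
    await Assert.ThrowsAnyAsync<OperationCanceledException>(() =>
        executor.ExecuteAsync(binding, cmd, cancellationToken: cts.Token));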
}
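A detail worth knowing when asserting on cancellation: awaiting a canceled Task usually throws TaskCanceledException, which derives from OperationCanceledException. xUnit's Assert.ThrowsAsync<T> requires the exact exception type, while Assert.ThrowsAnyAsync<T> also accepts derived types. The following self-contained demonstration needs no xUnit. CancellationDemo is hypothetical, and a never-completing Task.Delay stands in for a hung process.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Simulates a process that never exits; only the token can end the wait.
    public static Task NeverCompletes(CancellationToken token) =>
        Task.Delay(Timeout.Infinite, token);

    // Returns the concrete exception type observed when the token fires after the timeout.
    public static async Task<Type?> ObservedExceptionTypeAsync(TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await NeverCompletes(cts.Token);
            return null;
        }
        catch (OperationCanceledException ex) // also catches the derived TaskCanceledException
        {
            return ex.GetType();
        }
    }
}
```

The observed type is TaskCanceledException, so an exact-type assertion on OperationCanceledException would fail even though cancellation worked as intended.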

[Fact]
public async Task HandlesStderrWarningsWithSuccessfulExit()
{
    var runner = new FakeProcessRunner()
        .Script(
            match: s => s.Arguments.Contains("build"),
            stdoutLines: [
                "Step 1/1 : FROM nginx:latest",
                " ---> abc123",
                "Successfully built abc123"
            ],
            stderrLines: [
                "WARNING: No swap limit support"
            ],
            exitCode: 0);

    var executor = new CommandExecutor(runner);
    var binding = CreateDockerBinding();

    var cmd = Docker.Create(binding).Image.Build(b => b
        .WithPath(".")
        .WithTag(["myapp:latest"]));

    var result = await executor.ExecuteAsync<BuildEvent, BuildResult>(
        binding, cmd, new DockerBuildParser(), new BuildResultCollector());

    // Exit code 0 means success, even with stderr warnings
    Assert.Equal("abc123", result.ImageId);
}

FakeProcessRunner Verification

The fake also verifies that the correct arguments were passed:

[Fact]
public async Task VerifiesCorrectArguments()
{
    var runner = new FakeProcessRunner()
        .Script(
            match: s => true,
            stdoutLines: ["container_id"]);

    var executor = new CommandExecutor(runner);
    var binding = CreateDockerBinding();

    var cmd = Docker.Create(binding).Container.Run(b => b
        .WithDetach(true)
        .WithName("web")
        .WithMemory("512m")
        .WithImage("nginx:latest"));

    await executor.ExecuteAsync(binding, cmd);

    var invocation = Assert.Single(runner.Invocations);
    Assert.Equal("docker", invocation.ExecutablePath);
    Assert.Equal(
        ["container", "run", "--detach", "--memory", "512m", "--name", "web", "nginx:latest"],
        invocation.Arguments);
}

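This is the glue test. It verifies that the builder produces the right arguments, that the executor passes them to the runner unchanged, and that every option comes before the image name, which matters because docker container run treats everything after the image as the container's command and its arguments. Note that the expected list follows the builder's own option order (--memory before --name), not the order of the With* calls. If any layer is wrong, this test catches it.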
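The record-and-script idea behind FakeProcessRunner fits in a few lines. This is a sketch with a deliberately simplified shape, not the real FakeProcessRunner API: a script is a (match, stdout, exit code) triple, the first matching script wins, an unmatched invocation throws, and every call is recorded for later assertions. Run, scripts, and invocations are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, heavily simplified fake (not the real FakeProcessRunner API).
// Scripts are checked in order; the first one whose predicate matches supplies the output.
var scripts = new List<(Func<IReadOnlyList<string>, bool> Match, string[] Stdout, int ExitCode)>
{
    (args => args.Contains("up"), new[] { "Container myapp-web-1  Started" }, 0),
    (_ => true, new[] { "container_id" }, 0),
};

// Every call is recorded so a test can assert on the exact executable and arguments.
var invocations = new List<(string ExecutablePath, IReadOnlyList<string> Arguments)>();

(string[] Stdout, int ExitCode) Run(string executablePath, IReadOnlyList<string> arguments)
{
    invocations.Add((executablePath, arguments));

    var script = scripts.FirstOrDefault(s => s.Match(arguments));
    if (script.Match is null)
        throw new InvalidOperationException(
            $"No script matches: {executablePath} {string.Join(" ", arguments)}");

    return (script.Stdout, script.ExitCode);
}

var output = Run("docker", new[] { "container", "run", "--detach", "nginx:latest" });
Console.WriteLine(output.Stdout[0]);                      // container_id
Console.WriteLine(invocations.Single().Arguments.Count);  // 4
```

Throwing on an unmatched invocation is a deliberate choice in this sketch: silently returning empty output would let a test pass against a command it never scripted.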

Multi-Command Workflow Tests

Some tests exercise realistic multi-command workflows:

[Fact]
public async Task ComposeWorkflow_UpAndDown()
{
    var runner = new FakeProcessRunner()
        .Script(
            match: s => s.Arguments.Contains("up"),
            stdoutLines: [
                " Container myapp-postgres-1  Starting",
                " Container myapp-postgres-1  Started",
                " Container myapp-postgres-1  Healthy",
                " Container myapp-redis-1  Starting",
                " Container myapp-redis-1  Started",
                " Container myapp-web-1  Starting",
                " Container myapp-web-1  Started",
            ])
        .Script(
            match: s => s.Arguments.Contains("down"),
            stdoutLines: [
                " Container myapp-web-1  Stopping",
                " Container myapp-web-1  Stopped",
                " Container myapp-redis-1  Stopping",
                " Container myapp-redis-1  Stopped",
                " Container myapp-postgres-1  Stopping",
                " Container myapp-postgres-1  Stopped",
                " Container myapp-postgres-1  Removing",
                " Container myapp-postgres-1  Removed",
            ]);

    var executor = new CommandExecutor(runner);
    var binding = CreateDockerBinding();
    var compose = DockerCompose.Create(binding);

    // Up
    var upCmd = compose.Up(b => b.WithDetach(true));
    var upResult = await executor.ExecuteAsync<ComposeEvent, ComposeUpResult>(
        binding, upCmd, new ComposeUpParser(), new ComposeUpResultCollector());

    Assert.Equal(3, upResult.StartedServices.Count);
    Assert.Contains("postgres", upResult.StartedServices);
    Assert.Contains("redis", upResult.StartedServices);
    Assert.Contains("web", upResult.StartedServices);

    // Down
    var downCmd = compose.Down(b => b.WithRemoveOrphans(true));
    var downResult = await executor.ExecuteAsync(binding, downCmd);

    Assert.True(downResult.IsOk);
    Assert.Equal(2, runner.Invocations.Count);
}
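The workflow test expects the bare service names postgres, redis, and web, while the output only contains container names like myapp-postgres-1. Compose v2 names containers project-service-index, and service names may themselves contain hyphens, so a blind split on "-" is not enough. A sketch of one way to recover the service name, assuming the project name is known and the line format matches the fixture above; TryParseStartedService is hypothetical (not the real ComposeUpParser), and api-gateway is a made-up service used only to exercise the hyphen case.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extract the service name from a Compose v2 progress line such as
// " Container myapp-postgres-1  Started". Assumes the project name is known up front,
// because service names may themselves contain hyphens.
static string? TryParseStartedService(string line, string project)
{
    var match = Regex.Match(line, @"^\s*Container\s+(\S+)\s+Started\s*$");
    if (!match.Success)
        return null;

    var containerName = match.Groups[1].Value;          // e.g. "myapp-postgres-1"
    var prefix = project + "-";
    if (!containerName.StartsWith(prefix, StringComparison.Ordinal))
        return null;

    // Drop the trailing "-<replica index>" that Compose appends.
    var lastDash = containerName.LastIndexOf('-');
    if (lastDash <= prefix.Length || !int.TryParse(containerName.AsSpan(lastDash + 1), out _))
        return null;

    return containerName.Substring(prefix.Length, lastDash - prefix.Length);
}

var started = new List<string>();
foreach (var line in new[]
{
    " Container myapp-postgres-1  Starting",
    " Container myapp-postgres-1  Started",
    " Container myapp-api-gateway-2  Started",
})
{
    if (TryParseStartedService(line, "myapp") is { } service)
        started.Add(service);
}

Console.WriteLine(string.Join(",", started)); // postgres,api-gateway
```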

Compose Bundle Tests

The Compose Bundle layer (Part IV, Part VIII) has its own test category. It reads JSON Schema files, merges versions, and generates typed C# for Compose YAML authoring.

SchemaReader Tests

[Fact]
public void ParsesServiceDefinition()
{
    var schema = File.ReadAllText("fixtures/compose-spec-3.8.json");
    var reader = new ComposeSchemaReader();

    var model = reader.Parse(schema);

    var service = model.Definitions["service"];
    Assert.Contains("image", service.Properties.Keys);
    Assert.Contains("build", service.Properties.Keys);
    Assert.Contains("ports", service.Properties.Keys);
    Assert.Equal("string", service.Properties["image"].Type);
}

[Fact]
public void ParsesPortFormats()
{
    var schema = File.ReadAllText("fixtures/compose-spec-3.8.json");
    var reader = new ComposeSchemaReader();

    var model = reader.Parse(schema);

    var ports = model.Definitions["service"].Properties["ports"];
    // ports can be string[] or object[] -- the schema uses oneOf
    Assert.True(ports.IsArrayType);
    Assert.True(ports.Items.IsOneOf);
    Assert.Equal(2, ports.Items.OneOf.Count);
}
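ParsesPortFormats depends on the reader noticing that an array property's items are a oneOf. A minimal sketch of that check using System.Text.Json's JsonDocument; the schema fragment is written for this example rather than copied from the Compose spec, and DescribeArrayItems is a hypothetical helper, not ComposeSchemaReader's API.

```csharp
using System;
using System.Text.Json;

// Illustrative fragment in JSON Schema style; not the actual Compose spec text.
const string schema = """
{
  "properties": {
    "ports": {
      "type": "array",
      "items": { "oneOf": [ { "type": "string" }, { "type": "object" } ] }
    }
  }
}
""";

// Hypothetical helper: report whether a property is an array and, if its items use
// oneOf, how many alternatives they allow (0 when the items are not a oneOf).
static (bool IsArray, int OneOfCount) DescribeArrayItems(JsonElement property)
{
    if (property.ValueKind != JsonValueKind.Object)
        return (false, 0);

    var isArray = property.TryGetProperty("type", out var type)
        && type.ValueKind == JsonValueKind.String
        && type.GetString() == "array";

    var oneOfCount = isArray
        && property.TryGetProperty("items", out var items)
        && items.ValueKind == JsonValueKind.Object
        && items.TryGetProperty("oneOf", out var oneOf)
        && oneOf.ValueKind == JsonValueKind.Array
            ? oneOf.GetArrayLength()
            : 0;

    return (isArray, oneOfCount);
}

using var document = JsonDocument.Parse(schema);
var ports = document.RootElement.GetProperty("properties").GetProperty("ports");
var (isArray, oneOfCount) = DescribeArrayItems(ports);
Console.WriteLine($"{isArray} {oneOfCount}"); // True 2
```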

SchemaVersionMerger Tests

[Fact]
public void MergesThreeVersions()
{
    var v38 = ParseSchema("fixtures/compose-spec-3.8.json");
    var v39 = ParseSchema("fixtures/compose-spec-3.9.json");
    var v2  = ParseSchema("fixtures/compose-spec-2.0.json");

    var merged = ComposeSchemaVersionMerger.Merge([
        ("3.8", v38),
        ("3.9", v39),
        ("2.0", v2)
    ]);

    var service = merged.Definitions["service"];
    var profiles = service.Properties["profiles"];

    // profiles was added in 3.9
    Assert.Equal("3.9", profiles.SinceVersion);
    Assert.Null(profiles.UntilVersion);
}
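MergesThreeVersions hands the merger its inputs out of order (3.8, 3.9, 2.0) and still expects profiles to be reported as since 3.9 with no until version. That only works if the merger orders schemas by version rather than by input position. A sketch of the idea under a heavy simplification: each version is reduced to the set of property names on its service definition, and Until is taken to mean the first version that dropped the property. ComputeLifetime and both conventions are assumptions, not the real ComposeSchemaVersionMerger.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplification (not the real ComposeSchemaVersionMerger): each schema
// version is reduced to the set of property names on its "service" definition.
static (string? Since, string? Until) ComputeLifetime(
    IEnumerable<(string Version, HashSet<string> Properties)> schemaVersions, string property)
{
    // Order by parsed version, not input order: ("3.8", "3.9", "2.0") becomes 2.0, 3.8, 3.9.
    var ordered = schemaVersions.OrderBy(v => Version.Parse(v.Version)).ToList();

    var firstIndex = ordered.FindIndex(v => v.Properties.Contains(property));
    if (firstIndex < 0)
        return (null, null);

    var lastIndex = ordered.FindLastIndex(v => v.Properties.Contains(property));

    // Assumed convention: Until is the first version that no longer declares the property,
    // and null while the newest version still has it.
    string? untilVersion = lastIndex == ordered.Count - 1 ? null : ordered[lastIndex + 1].Version;
    return (ordered[firstIndex].Version, untilVersion);
}

var versions = new List<(string Version, HashSet<string> Properties)>
{
    ("3.8", new HashSet<string> { "image", "build", "ports" }),
    ("3.9", new HashSet<string> { "image", "build", "ports", "profiles" }),
    ("2.0", new HashSet<string> { "image", "build", "ports" }),
};

var (since, until) = ComputeLifetime(versions, "profiles");
Console.WriteLine($"since={since}, until={until ?? "null"}"); // since=3.9, until=null
```

A property that disappears and later returns would need a list of ranges rather than a single since/until pair; this sketch does not model that.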

YAML Round-Trip Tests

These tests verify that the typed Compose API generates well-formed YAML with the structure Docker Compose expects: each test emits YAML, parses it back, and checks the values that matter:

[Fact]
public void GeneratedYaml_RoundTrips()
{
    var compose = new ComposeFile()
        .AddService("web", service => service
            .WithImage("nginx:latest")
            .WithPorts(["8080:80"])
            .WithRestart("unless-stopped"))
        .AddService("db", service => service
            .WithImage("postgres:16")
            .WithEnvironment(new Dictionary<string, string>
            {
                ["POSTGRES_DB"] = "mydb",
                ["POSTGRES_PASSWORD"] = "secret"
            })
            .WithVolumes(["pgdata:/var/lib/postgresql/data"]))
        .AddVolume("pgdata");

    var yaml = compose.ToYaml();
    var parsed = new YamlParser().Parse(yaml);

    Assert.Equal("nginx:latest", parsed["services"]["web"]["image"].ToString());
    Assert.Equal("postgres:16", parsed["services"]["db"]["image"].ToString());
    Assert.Contains("8080:80", parsed["services"]["web"]["ports"].AsSequence());
}
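One reason this round trip earns its place: YAML 1.1 parsers read an unquoted value such as 56:56 as a base-60 integer, which is why Docker's Compose documentation recommends writing port mappings as strings. The 8080:80 in the test above happens to be safe, but 22:22 would not be. Below is a toy emitter that sidesteps the problem by double-quoting every scalar value. EmitService and Quote are hypothetical and not the library's ToYaml, and the escaping only covers backslashes and double quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical toy emitter (not the library's ToYaml). Every scalar value is written as a
// double-quoted YAML string, so no parser can reinterpret "56:56" as a number.
static string Quote(string value) =>
    "\"" + value.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";

static string EmitService(string name, string image, IEnumerable<string> ports)
{
    var builder = new StringBuilder();
    builder.AppendLine("services:");
    builder.AppendLine($"  {name}:");
    builder.AppendLine($"    image: {Quote(image)}");
    builder.AppendLine("    ports:");
    foreach (var port in ports)
        builder.AppendLine($"      - {Quote(port)}");
    return builder.ToString();
}

var composeYaml = EmitService("web", "nginx:latest", new[] { "8080:80", "56:56" });
Console.Write(composeYaml);
```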

Layer 5: Real Docker Validation

Fifteen tests need an actual Docker daemon. These tests are in a separate test project, excluded from the default dotnet test run, and only executed when Docker is available:

// The trait lets CI exclude this class with --filter "Category!=DockerRequired";
// [DockerRequired] on each test skips it at run time when no daemon responds.
[Trait("Category", "DockerRequired")]
public class RealDockerTests
{
    [DockerRequired]
    public async Task CanRunContainerAndGetOutput()
    {
        var binding = await BinaryResolver.ResolveAsync("docker");
        var executor = new CommandExecutor(new RealProcessRunner());
        var client = Docker.Create(binding);

        var cmd = client.Container.Run(b => b
            .WithRm(true)
            .WithImage("hello-world"));

        var result = await executor.ExecuteAsync(binding, cmd);

        Assert.True(result.IsOk);
    }

    [DockerRequired]
    public async Task VersionDetectionMatchesReality()
    {
        var binding = await BinaryResolver.ResolveAsync("docker");

        Assert.NotNull(binding.DetectedVersion);
        Assert.True(binding.DetectedVersion >= SemanticVersion.Parse("20.0.0"));
    }

    [DockerRequired]
    public async Task BuildProducesTypedEvents()
    {
        var binding = await BinaryResolver.ResolveAsync("docker");
        var executor = new CommandExecutor(new RealProcessRunner());
        var client = Docker.Create(binding);

        // Build a minimal Dockerfile from the test project
        var cmd = client.Image.Build(b => b
            .WithTag(["typed-docker-test:latest"])
            .WithPath("fixtures/minimal-dockerfile/"));

        try
        {
            var result = await executor.ExecuteAsync<BuildEvent, BuildResult>(
                binding, cmd, new DockerBuildParser(), new BuildResultCollector());

            Assert.NotNull(result.ImageId);
            Assert.NotEmpty(result.ImageId);
        }
        finally
        {
            // Cleanup runs even if an assertion fails, so no test image is left behind
            await executor.ExecuteAsync(
                binding,
                client.Image.Rm(b => b.WithForce(true).WithImage("typed-docker-test:latest")));
        }
    }
}

The [DockerRequired] attribute is a custom xunit FactAttribute subclass (not a trait) that marks the test as skipped when Docker is not available:

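using System;
using System.Diagnostics;
using Xunit;

public sealed class DockerRequiredAttribute : FactAttribute
{
    // Probe once per test run instead of once per attributed test
    private static readonly Lazy<bool> DockerAvailable = new(IsDockerAvailable);

    public DockerRequiredAttribute()
    {
        if (!DockerAvailable.Value)
            Skip = "Docker daemon is not available";
    }

    private static bool IsDockerAvailable()
    {
        try
        {
            var startInfo = new ProcessStartInfo("docker")
            {
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true
            };

            // ArgumentList passes each argument verbatim, with no shell-style quoting.
            // docker version exits non-zero when it cannot reach the daemon.
            startInfo.ArgumentList.Add("version");
            startInfo.ArgumentList.Add("--format");
            startInfo.ArgumentList.Add("{{.Server.Version}}");

            using var process = Process.Start(startInfo);
            if (process is null)
                return false;

            if (!process.WaitForExit(5000))
            {
                process.Kill();
                return false;
            }

            return process.ExitCode == 0;
        }
        catch
        {
            // docker missing from PATH, Kill racing with exit, and so on: treat as unavailable
            return false;
        }
    }
}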

These tests are the validation layer. They do not test logic -- they validate assumptions. "Does Docker still format build output the way I think it does?" "Does version detection still work against the latest Docker release?" "Does the real process runner handle async output correctly?" If a Docker upgrade changes behavior, these fifteen tests are the canary.
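"Does Docker still format build output the way I think it does?" is a concrete question. The stderr-warning test earlier in this part takes the image ID from a line like "Successfully built abc123", which is the legacy builder's format; BuildKit output, which has its own fixture (docker-buildkit-output.txt) in the project layout, needs a different rule. Below is a minimal sketch of the parse-then-collect shape for the legacy format only, with hypothetical names (ParseBuildLine, CollectBuildResult) rather than the real DockerBuildParser and BuildResultCollector. It also encodes the rule from that test: the exit code alone decides success, and stderr lines are kept as warnings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parser step (not the real DockerBuildParser): map one stdout line to an
// image ID, or null. Only the legacy builder's "Successfully built <id>" line is handled.
static string? ParseBuildLine(string line)
{
    const string prefix = "Successfully built ";
    return line.StartsWith(prefix, StringComparison.Ordinal)
        ? line.Substring(prefix.Length).Trim()
        : null;
}

// Hypothetical collector step (not the real BuildResultCollector): success is decided
// by the exit code alone; stderr lines are kept as warnings, never treated as errors.
static (bool Succeeded, string? ImageId, IReadOnlyList<string> Warnings) CollectBuildResult(
    IEnumerable<string> stdoutLines, IEnumerable<string> stderrLines, int exitCode)
{
    var imageId = stdoutLines.Select(ParseBuildLine).LastOrDefault(id => id is not null);
    return (exitCode == 0, imageId, stderrLines.ToList());
}

var result = CollectBuildResult(
    new[] { "Step 1/1 : FROM nginx:latest", " ---> abc123", "Successfully built abc123" },
    new[] { "WARNING: No swap limit support" },
    exitCode: 0);

Console.WriteLine($"{result.Succeeded} {result.ImageId} {result.Warnings.Count}"); // True abc123 1
```

If a Docker release changed that line's wording, every fixture-based test would keep passing against the recorded fixtures; only a test against a real daemon would notice, which is exactly what the canary tests are for.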

Test Infrastructure

The RunGenerator Helper

Source generator tests need a helper that creates an in-memory compilation and runs the generator. This helper is shared across all generator tests:

internal static class GeneratorTestHelper
{
    private static readonly MetadataReference[] References =
    [
        MetadataReference.CreateFromFile(typeof(object).Assembly.Location),
        MetadataReference.CreateFromFile(typeof(Attribute).Assembly.Location),
        MetadataReference.CreateFromFile(typeof(BinaryWrapperAttribute).Assembly.Location),
        MetadataReference.CreateFromFile(
            Assembly.Load("System.Runtime").Location),
    ];

    public static GeneratorDriverRunResult RunGenerator(
        string? jsonContent,
        string attributeSource)
    {
        var syntaxTree = CSharpSyntaxTree.ParseText($$"""
            using BinaryWrapper;
            
            namespace TestNamespace;
            
            {{attributeSource}}
            public partial class TestBinary;
            """);

        var compilation = CSharpCompilation.Create(
            assemblyName: "TestAssembly",
            syntaxTrees: [syntaxTree],
            references: References,
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        var generator = new BinaryWrapperGenerator();
        var driver = CSharpGeneratorDriver.Create(generator);

        if (jsonContent is not null)
        {
            var additionalText = new InMemoryAdditionalText(
                "simple-binary-1.0.0.json", jsonContent);
            driver = driver.AddAdditionalTexts([additionalText]);
        }

        driver = driver.RunGeneratorsAndUpdateCompilation(
            compilation, out var outputCompilation, out _);

        return driver.GetRunResult();
    }
}

The VerifySnapshot Helper

Snapshot comparison that ignores trailing whitespace and line-ending (CRLF vs LF) differences:

internal static class SnapshotHelper
{
    public static void VerifySnapshot(SourceText actual, string expectedPath)
    {
        var expectedText = File.ReadAllText(expectedPath);
        var actualText = actual.ToString();

        // Splitting on '\n' and trimming line ends absorbs CRLF vs LF and trailing spaces
        var expectedLines = expectedText.Split('\n').Select(l => l.TrimEnd()).ToArray();
        var actualLines = actualText.Split('\n').Select(l => l.TrimEnd()).ToArray();

        // Compare the overlapping lines first, so a failure names the first differing line
        // instead of only reporting that the line counts differ
        var common = Math.Min(expectedLines.Length, actualLines.Length);
        for (var i = 0; i < common; i++)
        {
            Assert.True(
                expectedLines[i] == actualLines[i],
                $"Line {i + 1} differs:\n" +
                $"  Expected: {expectedLines[i]}\n" +
                $"  Actual:   {actualLines[i]}");
        }

        Assert.Equal(expectedLines.Length, actualLines.Length);
    }

    public static void UpdateSnapshot(SourceText actual, string snapshotPath)
    {
        File.WriteAllText(snapshotPath, actual.ToString());
    }
}

Test Project Structure

test/
    TypedDocker.Parser.Tests/
        CobraHelpParserTests.cs
        CobraHelpPropertyTests.cs
        ReparseSnapshotTests.cs
        fixtures/
            docker-help/
                24.0.0/
                25.0.0/
                26.0.0/

    TypedDocker.Generator.Tests/
        SnapshotTests.cs
        CompilationTests.cs
        VersionDifferTests.cs
        DiagnosticTests.cs
        fixtures/
            simple-binary-1.0.0.json
            simple-binary-2.0.0.json
            docker-24.0.0.json
        expected/
            SimpleRunCommand.g.cs
            DockerContainerRun.g.cs

    TypedDocker.Builder.Tests/
        ArgumentTests.cs
        VersionGuardTests.cs
        ValidationTests.cs

    TypedDocker.OutputParser.Tests/
        DockerBuildParserTests.cs
        DockerPullParserTests.cs
        ComposeUpParserTests.cs
        EdgeCaseTests.cs
        fixtures/
            docker-build-output.txt
            docker-buildkit-output.txt
            docker-compose-up-output.txt

    TypedDocker.Integration.Tests/
        FakeProcessRunnerTests.cs
        FullPipelineTests.cs
        WorkflowTests.cs

    TypedDocker.ComposeBundle.Tests/
        SchemaReaderTests.cs
        SchemaVersionMergerTests.cs
        YamlRoundTripTests.cs

    TypedDocker.RealDocker.Tests/
        RealDockerTests.cs
        DockerRequiredAttribute.cs
        fixtures/
            minimal-dockerfile/
                Dockerfile

Each layer has its own test project. This is deliberate -- it means you can run dotnet test TypedDocker.Parser.Tests without pulling in the generator, the builder, or anything else. Each project has the minimum dependencies it needs. The parser tests depend on the parser library and xunit. The generator tests depend on the generator, Roslyn test infrastructure, and xunit. The integration tests depend on everything except the real Docker test project.

Test Statistics

Here is the complete breakdown:

Layer	Tests	Needs Docker?	Typical Duration
Parser fixtures	120	No	<1s
Generator snapshots	45	No	~3s
Builder assertions	85	No	<1s
Output parser sequences	60	No	<1s
Integration (FakeProcessRunner)	70	No	~2s
Compose Bundle	30	No	~2s
Real Docker validation	15	Yes	~30s
Total	425	3.5% need Docker	~40s

The generator tests are the slowest non-Docker tests because they spin up Roslyn compilations. Three seconds for 45 tests is acceptable -- each test creates a compilation, runs the generator, and either compares a snapshot or checks diagnostics. The parser tests are the fastest because they are pure string-to-data transformations. The builder tests are similarly fast because they are pure method-call-to-list transformations.

The real Docker tests take about 30 seconds, not because they do anything complex, but because Docker image pulls and builds are I/O bound. Had I not designed for testability, every test in the suite would pay that I/O cost, not just these fifteen.

What This Testing Strategy Enables

Refactoring Without Fear

When I rewrote the Cobra help parser to handle BuildKit-style --help output, I changed roughly 200 lines of parsing logic. The 120 parser tests told me exactly what broke and exactly what improved. Three snapshots changed because the new parser extracted more information from the same input. Two tests failed because I had a regression in short-name detection. I fixed the regression, updated the three improved snapshots, and had confidence that the rewrite was correct.

Without those tests, I would have had to manually run docker container run --help on three Docker versions and visually compare the output. That is what "testing" looks like in most CLI wrapper projects.

Version Upgrades Without Surprises

When Docker 26 was released, I ran the scraper, committed the new fixtures, and ran the test suite. Fourteen parser tests failed because Docker 26 added fourteen new options across various commands. None of those failures was a regression: each one was a new option that simply was not in the blessed snapshots yet. I updated the snapshots, ran the generator tests, confirmed that the new options produced compilable code, and shipped the update. Total effort: twenty minutes, most of it spent reviewing the new options.
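The triage I did by eye on that upgrade, separating snapshot diffs that only add options from diffs that remove them, is mechanical enough to sketch. This assumes each command's snapshot can be reduced to a set of option names (the real snapshots carry much more, such as types and descriptions). ClassifyDiff is hypothetical, and --example-new-option is a placeholder rather than a real Docker flag.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical triage helper: compares the blessed option names for one command with the
// freshly parsed ones. Pure additions are expected on upgrades; removals need review.
static (string Verdict, IReadOnlyList<string> Added, IReadOnlyList<string> Removed) ClassifyDiff(
    IReadOnlySet<string> blessed, IReadOnlySet<string> current)
{
    var added = current.Except(blessed).OrderBy(o => o, StringComparer.Ordinal).ToList();
    var removed = blessed.Except(current).OrderBy(o => o, StringComparer.Ordinal).ToList();

    var verdict = (added.Count, removed.Count) switch
    {
        (0, 0) => "unchanged",
        (_, 0) => "additions only",
        _ => "possible regression",
    };
    return (verdict, added, removed);
}

var blessed = new HashSet<string> { "--detach", "--memory", "--name" };
var parsed = new HashSet<string> { "--detach", "--memory", "--name", "--example-new-option" };

var diff = ClassifyDiff(blessed, parsed);
Console.WriteLine($"{diff.Verdict}: +{string.Join(",", diff.Added)}"); // additions only: +--example-new-option
```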

CI Without Docker

The CI pipeline runs dotnet test --filter "Category!=DockerRequired" and completes in under ten seconds. No Docker-in-Docker. No Docker socket mounting. No daemon management. The CI server is a basic build agent with the .NET SDK installed. The real Docker tests run on a separate schedule -- nightly, on a machine that has Docker -- and they validate the assumptions that the fast tests rely on.

This is the payoff of the layered architecture. It is not just about clean code or separation of concerns. It is about testability. Every seam between layers is a test boundary. Every interface is a fake point. Every fixture is a time capsule of real behavior. The tests are fast because the architecture is layered, and the architecture is layered because I wanted the tests to be fast.

Closing

425 tests. 96.5% run without Docker. The test pyramid works: parsers at the base, generators in the middle, integration at the top. The expensive tests validate assumptions. The cheap tests verify logic. The fixtures are the ground truth.

The most important decision was not which test framework to use or how to structure the assertions. It was the architectural decision to make every layer testable in isolation. FakeProcessRunner is not a testing hack -- it is a first-class citizen of the design, the same way IOutputParser and IResultCollector are. The test strategy was not bolted on after the fact. It was a design constraint from day one.

Next: Part XVI -- the design philosophy, comparison with alternatives, and what comes next.
