Part 14: The Test Pyramid — From Unit to CLI Harness

"A test that doesn't run on every commit is a test that doesn't exist."

Why

Part 02 said that the CLI is the only test surface that exists. That sentence is true at one level — every meaningful behaviour must be expressible as a CLI invocation — but it would be a disaster to take literally. If every test were an end-to-end CLI test, the test suite would take an hour to run, the CI cost would be unsustainable, and the dev loop would be dead.

The actual test strategy has four surfaces, at four very different speeds, each catching a different class of bug. The four surfaces form a pyramid: lots of fast tests at the bottom, fewer slow tests at the top. The pyramid is the standard advice for any large codebase; what is not standard is the discipline that prevents the pyramid from inverting under pressure.

This part shows the four surfaces, the speed budget for each, the kind of bug each catches, and the architecture rules that keep the pyramid right-side-up.

The four surfaces

                  ╭──────────────────────╮
                  │ CLI E2E (~3 s/test)  │   ← few; full pipeline through Process.Start
                  ╰──────────────────────╯
              ╭──────────────────────────────╮
              │ Pipeline integration (50 ms) │   ← stages composed in-memory, no Process
              ╰──────────────────────────────╯
          ╭──────────────────────────────────────╮
          │        Unit tests (3 ms each)        │   ← per-handler, per-projector, per-builder
          ╰──────────────────────────────────────╯
      ╭──────────────────────────────────────────────╮
      │ Architecture tests (10 ms each, all in one)  │   ← static rules, IL scanning
      ╰──────────────────────────────────────────────╯

Surface	Speed	Quantity	Catches
Architecture	10 ms each (~0.1 s total)	1 test class	SOLID/DRY violations, layering bugs, forbidden dependencies
Unit	3 ms each	hundreds	Logic bugs in single classes
Pipeline integration	50 ms each	dozens	Inter-stage bugs, Result propagation, event ordering
CLI E2E	3 s each	tens	Wiring bugs, real binary failures, real file I/O

The total wall-clock time for the fast suite (architecture, unit, and pipeline integration tests) on a fast laptop should stay under 60 seconds. That is the budget. If it grows past 60 seconds, the team starts skipping tests, and the tests stop being a safety net.
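A quick sanity check of that budget, using the per-test speeds from the table. The test counts (8 architecture rules, 500 unit tests, 40 integration tests) are illustrative assumptions, not figures from the project, and FastSuiteBudget is a hypothetical helper written for this calculation only.

```csharp
using System;
using System.Linq;

// Rough fast-suite budget check. Per-test timings come from the table above;
// the test counts are illustrative assumptions, not measured values.
public static class FastSuiteBudget
{
    public sealed record Surface(string Name, int Tests, double MsPerTest)
    {
        public double TotalSeconds => Tests * MsPerTest / 1000.0;
    }

    public static readonly Surface[] Surfaces =
    {
        new("Architecture", 8, 10),          // assumed: eight rules in one test class
        new("Unit", 500, 3),                 // "hundreds" (assumed 500)
        new("Pipeline integration", 40, 50), // "dozens" (assumed 40)
    };

    // Pure test execution time; build and test discovery come on top of this.
    public static double TotalSeconds() => Surfaces.Sum(s => s.TotalSeconds);

    public static bool WithinBudget(double budgetSeconds = 60) =>
        TotalSeconds() <= budgetSeconds;
}
```

With these assumed counts the pure execution time is about 3.6 seconds, so most of the 60 seconds is headroom for build, test discovery, and growth. Thirty E2E tests at 3 s each would add 90 seconds on their own, which is why they stay out of the fast run.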

Surface 1 — Architecture tests

These are the cheapest and the most underused. An architecture test does not exercise behaviour. It exercises structure. It walks the type system, the IL, or the dependency graph and asserts a static rule.

public sealed class HomeLabArchitectureTests
{
    private static Assembly Lib => typeof(IHomeLabPipeline).Assembly;
    private static Assembly Cli => typeof(HomeLabRootCommand).Assembly;

    [Fact]
    public void cli_must_not_reference_System_IO_directly_in_verbs()
    {
        var verbs = Cli.GetTypes()
            .Where(t => typeof(IHomeLabVerbCommand).IsAssignableFrom(t) && !t.IsInterface);

        foreach (var verb in verbs)
        {
            var calls = MethodCallScanner.Scan(verb, new[]
            {
                typeof(File), typeof(Directory), typeof(Process), typeof(Path)
            });
            calls.Should().BeEmpty($"{verb.Name} must delegate to the lib (no I/O in verbs)");
        }
    }

    [Fact]
    public void lib_must_not_reference_System_CommandLine()
    {
        Lib.GetReferencedAssemblies()
           .Should().NotContain(a => a.Name == "System.CommandLine");
    }

    [Fact]
    public void all_stages_have_unique_order()
    {
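        var stages = Lib.GetTypes()
            // Abstract base stages cannot be instantiated, so skip them along with interfaces.
            .Where(t => typeof(IHomeLabStage).IsAssignableFrom(t) && !t.IsInterface && !t.IsAbstract)
            .Select(t => (IHomeLabStage)Activator.CreateInstance(t, FakeDeps(t))!);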

        var orders = stages.Select(s => s.Order).ToList();
        orders.Should().OnlyHaveUniqueItems("two stages with the same order would be ambiguous");
    }

    [Fact]
    public void all_event_records_implement_IHomeLabEvent()
    {
        var eventTypes = Lib.GetTypes()
            .Where(t => t.Name.EndsWith("Started")
                     || t.Name.EndsWith("Completed")
                     || t.Name.EndsWith("Failed")
                     || t.Name.EndsWith("Added")
                     || t.Name.EndsWith("Removed"));

        eventTypes.Should().OnlyContain(t => typeof(IHomeLabEvent).IsAssignableFrom(t),
            "every event record must implement IHomeLabEvent");
    }

    [Fact]
    public void no_stage_or_contributor_may_call_File_directly()
    {
        var classes = Lib.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract)
            .Where(t => typeof(IHomeLabStage).IsAssignableFrom(t)
                     || t.Name.EndsWith("Contributor"));

        foreach (var c in classes)
        {
            var calls = MethodCallScanner.Scan(c, new[] { typeof(File), typeof(StreamWriter) });
            calls.Should().BeEmpty($"{c.Name} must use IBundleWriter (DRY)");
        }
    }

    [Fact]
    public void schema_is_in_sync_with_csharp_types()
    {
        var fromDisk = File.ReadAllText("schemas/homelab-config.schema.json");
        var freshlyGenerated = SchemaGenerator.Generate(typeof(HomeLabConfig));
        freshlyGenerated.Should().Be(fromDisk,
            "the committed schema must match the C# types — run `dotnet build` and commit the diff");
    }

    [Fact]
    public void contributor_interfaces_must_have_exactly_one_method()
    {
        var contributorIfaces = Lib.GetTypes()
            .Where(t => t.IsInterface && t.Name.EndsWith("Contributor"));
        foreach (var i in contributorIfaces)
            i.GetMethods().Should().HaveCount(1, $"{i.Name} should have exactly one method (ISP)");
    }

    [Fact]
    public void no_static_state_in_lib()
    {
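        var staticFields = Lib.GetTypes()
            .Where(t => t.IsClass)
            // Skip compiler-generated types (lambda caches, state machines); the compiler gives them static fields.
            .Where(t => !t.IsDefined(typeof(System.Runtime.CompilerServices.CompilerGeneratedAttribute), inherit: false))
            .SelectMany(t => t.GetFields(BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.Public))
            .Where(f => !f.IsLiteral && !f.IsInitOnly)
            .Where(f => !f.DeclaringType!.Name.EndsWith("Tests"));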

        staticFields.Should().BeEmpty("static mutable state breaks tests and concurrency");
    }
}

Eight tests. They run in a tenth of a second. They prevent the entire class of "someone added a side door to the architecture" slippage. Every architecture rule we have stated in Acts I and II has a corresponding test here.

The cost is the MethodCallScanner and ConstructorScanner utilities (about 80 lines of Mono.Cecil each). The benefit is that the architecture is enforced, not aspired to. You cannot land a PR that breaks SOLID without making one of these tests fail.
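The scanner itself is not shown in this part. Here is a minimal sketch of what MethodCallScanner.Scan could look like, assuming the Mono.Cecil NuGet package. Only the name and the call shape come from the tests above; the body and the IReadOnlyList<string> return type are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Mono.Cecil;

// Sketch of an IL call scanner (assumes the Mono.Cecil NuGet package).
// The body is an assumption, not the series' actual implementation.
public static class MethodCallScanner
{
    // Returns one "Caller::Method -> Callee.Method" entry for every reference
    // from `type` into a member of any of the `forbidden` types.
    public static IReadOnlyList<string> Scan(Type type, IEnumerable<Type> forbidden)
    {
        var forbiddenNames = forbidden.Select(f => f.FullName!).ToHashSet();

        using var assembly = AssemblyDefinition.ReadAssembly(type.Assembly.Location);

        // Cecil separates nested type names with '/', reflection uses '+'.
        var definition = assembly.MainModule.GetType(type.FullName!.Replace('+', '/'));
        if (definition is null)
            throw new InvalidOperationException($"Type {type.FullName} not found on disk");

        var hits = new List<string>();
        foreach (var method in WithNestedTypes(definition).SelectMany(t => t.Methods))
        {
            if (!method.HasBody) continue; // abstract or extern methods have no IL

            foreach (var instruction in method.Body.Instructions)
            {
                // call, callvirt, newobj and ldftn all carry a MethodReference operand.
                if (instruction.Operand is MethodReference callee
                    && forbiddenNames.Contains(callee.DeclaringType.FullName))
                {
                    hits.Add($"{method.DeclaringType.Name}::{method.Name} -> " +
                             $"{callee.DeclaringType.Name}.{callee.Name}");
                }
            }
        }
        return hits;
    }

    // Lambdas and async methods compile to nested types, so those are scanned too.
    private static IEnumerable<TypeDefinition> WithNestedTypes(TypeDefinition type) =>
        new[] { type }.Concat(type.NestedTypes.SelectMany(WithNestedTypes));
}
```

The sketch reads the assembly from disk, so it only works where Assembly.Location is populated (a normal test run, not a single-file publish). It also only sees direct references: a verb that reaches File through a helper in another class passes this scan, which is why the lib-side rule scans stages and contributors separately.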

Surface 2 — Unit tests

Unit tests target one class. They use fakes for every dependency. They run in milliseconds. There are hundreds of them, and they run on every save in dotnet watch test.

public sealed class HomeLabPlanProjectorTests
{
    [Fact]
    public void projector_emits_one_target_per_compose_service()
    {
        var contributors = new IComposeFileContributor[]
        {
            new FakeComposeContributor("gitlab"),
            new FakeComposeContributor("postgres"),
            new FakeComposeContributor("traefik")
        };
        var projector = new HomeLabPlanProjector(contributors);
        var config = new HomeLabConfig { Name = "x", Topology = "single", Engine = "docker" };

        var ir = projector.Project(config);

        ir.Targets.Select(t => t.Identifier).Should().Equal("gitlab", "postgres", "traefik");
    }

    [Fact]
    public void projector_assigns_topology_tag_to_every_target()
    {
        var contributors = new[] { new FakeComposeContributor("a") };
        var projector = new HomeLabPlanProjector(contributors);
        var config = new HomeLabConfig { Name = "x", Topology = "ha", Engine = "docker" };

        var ir = projector.Project(config);

        ir.Targets.Should().OnlyContain(t => t.Tags["topology"] == "ha");
    }

    [Fact]
    public void projector_skips_probes_for_services_without_healthcheck()
    {
        var contributors = new[] { new FakeComposeContributor("svc-without-hc", healthCheck: null) };
        var projector = new HomeLabPlanProjector(contributors);
        var config = new HomeLabConfig { Name = "x", Topology = "single", Engine = "docker" };

        var ir = projector.Project(config);

        ir.Probes.Should().BeEmpty();
    }
}

Three tests for one class. Each runs in 3 ms. Each fakes its dependencies. None of them touch the file system. None of them spawn a process. None of them care about the rest of the lib.

A unit test that needs more than two fakes is a sign that the class under test has too many dependencies — which is a sign that the class violates SRP. The architecture tests cannot catch that, but the act of writing the unit test will. The test is the design feedback.

Surface 3 — Pipeline integration tests

These tests compose multiple real components in-memory and assert on the result. They use RecordingEventBus (from Part 09) to inspect events. They use MockFileSystem from System.IO.Abstractions.TestingHelpers to avoid touching the disk. They use fake binary clients (FakeVagrantClient, FakePackerClient) instead of Process.Start.

public sealed class HomeLabPipelineIntegrationTests
{
    [Fact]
    public async Task full_pipeline_succeeds_for_minimal_single_topology_config()
    {
        var fs = new MockFileSystem();
        fs.AddFile("/lab/config-homelab.yaml", new MockFileData("name: t\ntopology: single"));

        var bus = new RecordingEventBus();
        var clock = new FakeClock(DateTimeOffset.Parse("2026-04-08T12:00:00Z"));
        var sp = new ServiceCollection()
            .AddHomeLab()
            .Replace(ServiceDescriptor.Singleton<IFileSystem>(fs))
            .Replace(ServiceDescriptor.Singleton<IHomeLabEventBus>(bus))
            .Replace(ServiceDescriptor.Singleton<IClock>(clock))
            .Replace(ServiceDescriptor.Singleton<IPackerClient, FakePackerClient>())
            .Replace(ServiceDescriptor.Singleton<IVagrantClient, FakeVagrantClient>())
            .Replace(ServiceDescriptor.Singleton<IDockerComposeClient, FakeDockerComposeClient>())
            .BuildServiceProvider();

        var pipeline = sp.GetRequiredService<IHomeLabPipeline>();
        var result = await pipeline.RunAsync(new VosUpRequest(MachineName: null, ConfigPath: new("/lab/config-homelab.yaml")), CancellationToken.None);

        result.IsSuccess.Should().BeTrue();
        bus.Recorded.OfType<PipelineCompleted>().Should().ContainSingle();
        bus.Recorded.OfType<StageCompleted>().Should().HaveCount(6); // all six stages
    }

    [Fact]
    public async Task pipeline_short_circuits_if_validate_stage_fails()
    {
        var fs = new MockFileSystem();
        fs.AddFile("/lab/config-homelab.yaml", new MockFileData("name: t\ntopology: BOGUS"));

        var bus = new RecordingEventBus();
        var sp = StandardTestServiceProvider(fs, bus);
        var pipeline = sp.GetRequiredService<IHomeLabPipeline>();

        var result = await pipeline.RunAsync(new VosUpRequest(null, new("/lab/config-homelab.yaml")), CancellationToken.None);

        result.IsFailure.Should().BeTrue();
        bus.Recorded.OfType<StageFailed>().Single().StageName.Should().Be("validate");
        bus.Recorded.OfType<PipelineFailed>().Single().FailedAtStage.Should().Be("validate");
        bus.Recorded.OfType<StageStarted>().Should().HaveCount(1); // only validate started
    }
}

These tests run in 50 ms each. They exercise the real pipeline, the real stages, the real DI graph — only the file system and the binary wrappers are faked. They catch bugs that unit tests miss: wrong stage order, missing event publication, broken Result<T> propagation, missing service registration.
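RecordingEventBus is defined in Part 09. As a reminder of the idea, here is a minimal sketch. The IHomeLabEventBus shape (a single PublishAsync method) and the two event records are hypothetical stand-ins, not the series' actual contracts.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins: the real contracts live in Part 09 and may differ.
public interface IHomeLabEvent { }

public interface IHomeLabEventBus
{
    ValueTask PublishAsync(IHomeLabEvent evt, CancellationToken ct = default);
}

public sealed record StageStarted(string StageName) : IHomeLabEvent;
public sealed record StageCompleted(string StageName) : IHomeLabEvent;

// Test double that stores every published event in publication order.
public sealed class RecordingEventBus : IHomeLabEventBus
{
    private readonly ConcurrentQueue<IHomeLabEvent> _events = new();

    // Returns a snapshot, so an assertion is not affected by later publications.
    public IReadOnlyList<IHomeLabEvent> Recorded => _events.ToArray();

    public ValueTask PublishAsync(IHomeLabEvent evt, CancellationToken ct = default)
    {
        _events.Enqueue(evt);
        return ValueTask.CompletedTask;
    }
}
```

Because Recorded preserves order, the same bus supports both counting assertions (HaveCount, ContainSingle) and ordering assertions such as "StageStarted for validate comes before StageCompleted for validate".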

Surface 4 — CLI E2E tests

These are the slow, expensive, real-deal tests. They invoke dotnet run --project HomeLab.Cli against a real temp directory, a real Vagrant binary, and a real VirtualBox VM. They take seconds. There are no more than a few dozen of them. They run in CI on merge requests and on the main branch, with the slowest reserved for the nightly job; they do not run on every push.

public sealed class HomeLabCliE2ETests
{
    [Fact]
    [Trait("category", "e2e")]
    public async Task vos_up_boots_a_real_vm()
    {
        using var lab = await TestLab.NewAsync(name: nameof(vos_up_boots_a_real_vm), topology: "single");
        await lab.WriteConfigAsync(@"
            name: e2e-vos-up
            topology: single
            packer: { distro: alpine, version: '3.21', kind: dockerhost }
            vos:    { box: e2e/alpine-3.21-dockerhost, memory: 1024, cpus: 1 }
            compose:{ traefik: false, gitlab: false }
        ");

        var result = await lab.Cli("vos", "up", "--config", lab.ConfigPath);

        result.ExitCode.Should().Be(0);
        result.StdOut.Should().Contain("Booted 1 VM");
        var vagrantStatus = await lab.Run("vagrant", "status");
        vagrantStatus.StdOut.Should().Contain("running");
    }

    [Fact]
    [Trait("category", "e2e")]
    [Trait("category", "slow")]    // ~15 minutes; runs nightly
    public async Task devlab_full_bootstrap_and_dogfood_loop()
    {
        // The most important test in the suite — the dogfood loop closure test
        // (we saw the body of this in Part 06)
    }
}

E2E tests are gated by a category trait. The fast test run excludes them with a --filter expression; a bare dotnet test with no filter would run everything. CI runs them on merge requests and on the main branch. The nightly job runs the slow ones (full DevLab bootstrap, dogfood loop validation).
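The lab.Cli(...) and lab.Run(...) helpers boil down to starting a process and capturing its exit code and output. A minimal sketch of that core, under the assumption that TestLab wraps something similar; CliRunner and CliResult are hypothetical names, not the series' actual harness.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public sealed record CliResult(int ExitCode, string StdOut, string StdErr);

// Hypothetical helper in the spirit of lab.Cli(...) / lab.Run(...).
public static class CliRunner
{
    public static async Task<CliResult> RunAsync(
        string fileName, string workingDirectory, CancellationToken ct, params string[] args)
    {
        var psi = new ProcessStartInfo(fileName)
        {
            WorkingDirectory = workingDirectory,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false, // required for stream redirection
        };
        foreach (var arg in args) psi.ArgumentList.Add(arg); // no manual quoting needed

        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException($"Could not start {fileName}");

        // Drain both pipes concurrently; reading them one after the other can
        // deadlock when the child fills the pipe that is not being read.
        var stdout = process.StandardOutput.ReadToEndAsync();
        var stderr = process.StandardError.ReadToEndAsync();

        try
        {
            await process.WaitForExitAsync(ct);
        }
        catch (OperationCanceledException)
        {
            // Do not leave a half-booted child process behind on timeout.
            if (!process.HasExited) process.Kill(entireProcessTree: true);
            throw;
        }

        return new CliResult(process.ExitCode, await stdout, await stderr);
    }
}
```

A call such as CliRunner.RunAsync("dotnet", labRoot, ct, "run", "--project", "HomeLab.Cli", "--", "vos", "up") would then produce the ExitCode and StdOut values the assertions above inspect (labRoot is a placeholder for the lab's temp directory). Passing a CancellationToken with a timeout keeps a hung VM boot from stalling the whole CI job.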

The wiring

Test categorization is one xUnit attribute. The CI matrix runs three jobs:

# .gitlab-ci.yml — generated by GitLab.Ci.Yaml from C#
test-fast:
  stage: test
  script:
    - dotnet test --filter "category!=e2e&category!=slow"   # arch + unit + integration; ~30 s

test-e2e:
  stage: test
  script:
    - dotnet test --filter "category=e2e&category!=slow"     # CLI E2E; ~5 min
  only:
    - merge_requests
    - main

test-slow:
  stage: test
  script:
    - dotnet test --filter "category=slow"                    # full dogfood loop; ~30 min
  only:
    - schedules        # nightly cron

Three jobs. Three speeds. Each catches a different class of failure. The fast job runs on every push. The e2e job runs on every MR. The slow job runs nightly. The pyramid is intact.

What this gives you that bash doesn't

A bash project's test strategy is, in practice, "I ran it once and it worked". There is no architecture test (bash has no architecture). There is no unit test (every line of bash is wired into globals). There is no pipeline integration test (the pipeline is the script). There is the script and the bug report.

A four-surface test pyramid gives you, for the same surface area:

Architecture tests that prevent SOLID/DRY/layering slippage at unit-test speed
Unit tests that exercise individual classes in milliseconds
Pipeline integration tests that exercise the lib in-memory, with fake binaries, in 50 ms
CLI E2E tests that exercise the production binary against real infrastructure
A 60-second total budget for the fast suite, so it runs on every save
Categorized slow tests that run on schedule, not on every push
A coverage report generated by QualityGate from the unit + integration coverage

The bargain is the most familiar of all the architectural decisions in the series — every senior developer has heard of the test pyramid — and the hardest to actually live by, because the temptation to "just write an E2E test" is constant. The discipline is the value. The pyramid is upside-down by default; gravity pulls slow tests to the top. The architecture tests are the only thing that forces the pyramid to stay right-side-up, because they fail loudly when someone tries to put I/O in a verb or business logic in a contributor.

End of Act II

We have now defined the entire HomeLab architecture: the CLI thesis, the thin-CLI/fat-lib boundary, schema-validated YAML, git-composable config, the dogfood target, the six-stage pipeline, SOLID/DRY in practice, the event bus, the plugin system, the toolbelt, Ops.Dsl as the substrate, the eight sub-DSLs we consume, and the four-surface test pyramid.

Act III drops one level: we leave the lib internals and start talking to the tools. Docker, Podman, Packer, Vagrant, Traefik, GitLab. Each one is a binary with a CLI surface, and each one needs a typed wrapper. We will see how [BinaryWrapper] generates those wrappers from --help output, what the gotchas are with each tool, and how the wrappers slot into the pipeline as [Injectable] services.
