
Part 52: Tearing It All Down — Cleanly

"A system that cannot be cleanly torn down cannot be cleanly tested."


Why

A homelab user does not just create labs. They destroy them. They destroy a pr-1234 after CI finishes. They destroy a dev-single after a week of HomeLab development. They destroy a stale ha-stage to free disk. They destroy everything before reformatting their workstation.

A clean teardown is harder than it looks. The naive vagrant destroy leaves behind:

  • Docker volumes (in MinIO storage, in Postgres data, etc.)
  • DNS entries in PiHole or /etc/hosts
  • Hostname certs in the OS trust store
  • The instance registry entry
  • The instance's working directory
  • Vagrant box references (though the boxes themselves are shared and stay)
  • Cron-scheduled backup jobs
  • Open ports forwarded by the hypervisor

Even worse, the naive teardown of one instance can disturb another — for example, removing a docker network that another instance happens to use, or deleting /etc/hosts entries with an overly greedy regex.

The thesis of this part is: homelab destroy --instance <name> walks every resource the instance created, removes each one, verifies the removal, and refuses to touch anything that does not belong to the instance. The teardown is itself a typed pipeline, run in reverse order, with Result<T> at every step. Multi-instance isolation is preserved through teardown as well as setup.


The shape

[Injectable(ServiceLifetime.Singleton)]
public sealed class DestroyRequestHandler : IRequestHandler<DestroyRequest, Result<DestroyResponse>>
{
    private readonly IInstanceRegistry _registry;
    private readonly IVosOrchestrator _vos;
    private readonly IContainerEngine _engine;
    private readonly IDnsProvider _dns;
    private readonly IBackupSchedulerControl _backupSched;
    private readonly IFileSystem _fs;
    private readonly IHomeLabEventBus _events;
    private readonly IClock _clock;
    // (per-stage helpers such as _docker, _security, _shell, and _certutil
    // are injected the same way; elided here for brevity)

    public async Task<Result<DestroyResponse>> HandleAsync(DestroyRequest req, CancellationToken ct)
    {
        var scopeResult = await _registry.GetAsync(req.InstanceName, ct);
        if (scopeResult.IsFailure)
            return Result.Failure<DestroyResponse>($"instance not found: {req.InstanceName}");

        var scope = scopeResult.Value;

        if (!req.Force)
        {
            var confirmation = await PromptAsync($"This will destroy instance '{req.InstanceName}' and all its data. Continue? [y/N]");
            if (!confirmation) return Result.Failure<DestroyResponse>("aborted by user");
        }

        await _events.PublishAsync(new InstanceDestroyStarted(req.InstanceName, _clock.UtcNow), ct);

        // Run the teardown stages in reverse order of setup
        var stages = new Func<Task<Result>>[]
        {
            () => StopBackupSchedules(scope, ct),
            () => StopComposeStacks(scope, ct),
            () => DestroyVms(scope, ct),
            () => RemoveDnsEntries(scope, ct),
            () => RemoveCertsFromOsStore(scope, ct),
            () => RemoveDockerNetworks(scope, ct),
            () => RemoveDockerVolumes(scope, ct),
            () => RemoveWorkdir(scope, ct),
            () => ReleaseInstanceRegistry(scope, ct),
        };

        var failures = new List<string>();
        foreach (var stage in stages)
        {
            var result = await stage();
            if (result.IsFailure)
            {
                failures.AddRange(result.Errors);
                if (!req.ContinueOnError) break;
            }
        }

        await _events.PublishAsync(failures.Count == 0
            ? new InstanceDestroyCompleted(req.InstanceName, _clock.UtcNow)
            : new InstanceDestroyFailed(req.InstanceName, failures, _clock.UtcNow), ct);

        return failures.Count == 0
            ? Result.Success(new DestroyResponse(req.InstanceName))
            : Result.Failure<DestroyResponse>(string.Join("\n", failures));
    }

    // ... per-stage methods below
}

Nine stages, run in reverse order of setup. Each one:

  1. Acts only on resources prefixed with the instance scope name. The DNS removal does not touch entries belonging to other instances. The docker volume cleanup uses a label filter (homelab.instance=prod-multi) so it only finds the right volumes.
  2. Returns Result<T>. A failure does not necessarily abort the teardown — --continue-on-error lets the user clean up everything that can be cleaned up, with a list of warnings at the end.
  3. Publishes events. Every step is observable.

Stop backup schedules

The backup scheduler from Part 45 is told to drop any policy that targets this instance:

private async Task<Result> StopBackupSchedules(InstanceScope scope, CancellationToken ct)
{
    return await _backupSched.RemoveInstanceAsync(scope.Name, ct);
}

If the scheduler is running as a sidecar in the instance itself, this is automatic (the sidecar dies with the instance). For external schedulers, the call is explicit.

Stop compose stacks

private async Task<Result> StopComposeStacks(InstanceScope scope, CancellationToken ct)
{
    var projectName = $"{scope.Name}-devlab";
    var result = await _engine.StopComposeAsync(projectName, removeVolumes: false, ct);
    // We don't remove volumes here — that's a separate stage with its own opt-out
    return result;
}

docker compose down (or podman-compose down) for the instance's project.

Destroy VMs

private async Task<Result> DestroyVms(InstanceScope scope, CancellationToken ct)
{
    var machines = await _vos.ListMachinesAsync(ct);
    if (machines.IsFailure) return machines.Map();

    var instanceMachines = machines.Value.Where(m => m.Name.StartsWith(scope.Name + "-")).ToList();

    foreach (var m in instanceMachines)
    {
        var result = await _vos.DestroyAsync(m.Name, force: true, ct);
        if (result.IsFailure) return result.Map();
        await _events.PublishAsync(new VosDestroyCompleted(m.Name, _clock.UtcNow), ct);
    }
    return Result.Success();
}

Only machines whose name starts with the instance prefix get destroyed. A vos destroy on prod-multi-platform does not affect dev-single-main.
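The trailing "-" in that prefix check is doing real work: it keeps prod-multi from claiming the machines of a hypothetical prod-multi2 instance. A minimal standalone sketch (VmScope is an illustrative helper, not HomeLab's actual type):

```csharp
using System;

// "prod-multi" owns "prod-multi-gateway" ...
if (!VmScope.BelongsTo("prod-multi-gateway", "prod-multi")) throw new Exception("expected match");
// ... but not machines of an instance whose name merely shares leading characters.
if (VmScope.BelongsTo("prod-multi2-gateway", "prod-multi")) throw new Exception("over-match");
if (VmScope.BelongsTo("dev-single-main", "prod-multi")) throw new Exception("over-match");
Console.WriteLine("prefix scoping ok");

static class VmScope
{
    // Mirrors the filter in DestroyVms: a machine belongs to an instance
    // only if its name starts with "<instance>-".
    public static bool BelongsTo(string machineName, string instanceName)
        => machineName.StartsWith(instanceName + "-", StringComparison.Ordinal);
}
```

One caveat the sketch makes visible: prefix scoping still assumes no instance name is itself a prefix of another ("prod" would claim "prod-multi-gateway"), which is worth enforcing in the registry at init time.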

Remove DNS entries

private async Task<Result> RemoveDnsEntries(InstanceScope scope, CancellationToken ct)
{
    var allEntries = await _dns.ListAsync(ct);
    if (allEntries.IsFailure) return allEntries.Map();

    var instanceEntries = allEntries.Value.Where(e => e.Hostname.EndsWith($".{scope.TldPrefix}.lab")).ToList();
    foreach (var e in instanceEntries)
    {
        var result = await _dns.RemoveAsync(e.Hostname, ct);
        if (result.IsFailure) return result;
    }
    return Result.Success();
}

The list filter is by the instance's TLD prefix, so gitlab.prod-multi.lab is removed but gitlab.dev-single.lab is not.
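As a standalone illustration of the suffix rule (DnsEntry and DnsScope here are hypothetical stand-ins for the provider's types):

```csharp
using System;
using System.Linq;

var all = new[]
{
    new DnsEntry("gitlab.prod-multi.lab", "192.168.57.10"),
    new DnsEntry("gitlab.dev-single.lab", "192.168.56.10"),
    new DnsEntry("registry.prod-multi.lab", "192.168.57.10"),
};

// Only the two prod-multi entries are selected; dev-single survives.
foreach (var e in DnsScope.ForInstance(all, "prod-multi"))
    Console.WriteLine($"remove {e.Hostname}");

record DnsEntry(string Hostname, string Address);

static class DnsScope
{
    // Mirrors the filter in RemoveDnsEntries: match the full ".<prefix>.lab"
    // suffix, never a bare substring.
    public static DnsEntry[] ForInstance(DnsEntry[] all, string tldPrefix)
        => all.Where(e => e.Hostname.EndsWith($".{tldPrefix}.lab", StringComparison.Ordinal))
              .ToArray();
}
```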

Remove certs from OS store

private async Task<Result> RemoveCertsFromOsStore(InstanceScope scope, CancellationToken ct)
{
    if (OperatingSystem.IsMacOS())
        return (await _security.RunAsync(new[] { "delete-certificate", "-c", $"HomeLab CA - {scope.Name}", "/Library/Keychains/System.keychain" }, ct)).Map();

    if (OperatingSystem.IsLinux())
    {
        var caPath = $"/usr/local/share/ca-certificates/homelab-{scope.Name}.crt";
        if (_fs.File.Exists(caPath)) _fs.File.Delete(caPath);
        return (await _shell.RunAsync("update-ca-certificates", "--fresh", ct)).Map();
    }

    if (OperatingSystem.IsWindows())
        return (await _certutil.RunAsync(new[] { "-delstore", "ROOT", $"HomeLab CA - {scope.Name}" }, ct)).Map();

    return Result.Failure("unsupported OS");
}

Only the instance's CA is removed from the OS trust store. Other instances' CAs are untouched.

Remove Docker networks and volumes

private async Task<Result> RemoveDockerNetworks(InstanceScope scope, CancellationToken ct)
{
    // Use a label filter — only networks created by this instance
    var labelFilter = $"homelab.instance={scope.Name}";
    var listResult = await _docker.NetworkListAsync(filter: new[] { $"label={labelFilter}" }, ct);
    if (listResult.IsFailure) return listResult.Map();

    foreach (var net in listResult.Value.Networks)
    {
        var rmResult = await _docker.NetworkRemoveAsync(net.Id, ct);
        if (rmResult.IsFailure) return rmResult.Map();
    }
    return Result.Success();
}

Every docker network HomeLab creates is labeled homelab.instance=<name>. The removal filter ensures only the instance's networks are touched.
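The same convention is easy to exercise from the CLI. A tiny sketch (DockerLabels is an illustrative helper) of building the filter arguments:

```csharp
using System;

// Listing only this instance's networks is equivalent to:
//   docker network ls --filter label=homelab.instance=prod-multi -q
var args = DockerLabels.FilterArgs("prod-multi");
Console.WriteLine(string.Join(" ", args));

static class DockerLabels
{
    public const string InstanceLabel = "homelab.instance";

    // Filter arguments for docker/podman list commands, scoped by instance label.
    public static string[] FilterArgs(string instance)
        => new[] { "--filter", $"label={InstanceLabel}={instance}" };
}
```

Note that labels must be applied when the network is created (docker network create --label homelab.instance=<name> ...); Docker has no command for adding a label to an existing network, which is why the setup pipeline owns this convention.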

Remove the working directory

private Task<Result> RemoveWorkdir(InstanceScope scope, CancellationToken ct)
{
    var workdir = $"./{scope.Name}";
    if (_fs.Directory.Exists(workdir))
    {
        _fs.Directory.Delete(workdir, recursive: true);
    }
    return Task.FromResult(Result.Success());
}

The instance's ./prod-multi/ directory is removed. Other instances' directories are untouched.

Release the registry entry

private Task<Result> ReleaseInstanceRegistry(InstanceScope scope, CancellationToken ct)
    => _registry.ReleaseAsync(scope.Name, ct);

The subnet returns to the pool. The next homelab init --name new-instance may be allocated this subnet again.
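A minimal sketch of those pool semantics (SubnetPool is an illustrative stand-in for the registry's allocator, assuming third-octet /24 allocation as in the examples above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var pool = new SubnetPool(first: 56, last: 60);
var a = pool.Allocate("a");          // 192.168.56.0/24
var b = pool.Allocate("b");          // 192.168.57.0/24
pool.Release("a");                   // 56 returns to the pool
var c = pool.Allocate("c");         // reuses 192.168.56.0/24
Console.WriteLine($"a={a} b={b} c={c}"); // a=56 b=57 c=56

sealed class SubnetPool
{
    private readonly SortedSet<int> _free;
    private readonly Dictionary<string, int> _owned = new();

    public SubnetPool(int first, int last)
        => _free = new SortedSet<int>(Enumerable.Range(first, last - first + 1));

    // Hand out the lowest free third octet (192.168.<octet>.0/24).
    public int Allocate(string instance)
    {
        var octet = _free.Min;
        _free.Remove(octet);
        _owned[instance] = octet;
        return octet;
    }

    // Release returns the octet to the pool; the next Allocate may reuse it.
    public void Release(string instance)
    {
        if (_owned.Remove(instance, out var octet))
            _free.Add(octet);
    }
}
```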


The dry-run mode

homelab destroy --instance prod-multi --dry-run prints what would happen without doing it:

$ homelab destroy --instance prod-multi --dry-run

Would destroy instance 'prod-multi':

Backups:
  - Stop scheduler entry for 'prod-multi'

Compose stacks:
  - docker compose down -p prod-multi-devlab

VMs:
  - vagrant destroy --force prod-multi-gateway
  - vagrant destroy --force prod-multi-platform
  - vagrant destroy --force prod-multi-data
  - vagrant destroy --force prod-multi-obs

DNS entries:
  - gitlab.prod-multi.lab → 192.168.57.10
  - registry.prod-multi.lab → 192.168.57.10
  - baget.prod-multi.lab → 192.168.57.11
  - data.prod-multi.lab → 192.168.57.12
  - obs.prod-multi.lab → 192.168.57.13

Certificates:
  - Remove 'HomeLab CA - prod-multi' from OS trust store

Docker networks:
  - prod-multi-platform (id: a1b2c3...)
  - prod-multi-data-net (id: d4e5f6...)
  - prod-multi-obs-net (id: 789012...)

Working directory:
  - ./prod-multi/

Instance registry:
  - Release subnet 192.168.57

Use --force to skip the confirmation prompt.
Use --no-dry-run to actually do this.

The dry-run is the most important mode for destructive operations. Always show the user what will happen before doing it.
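One way to keep a dry-run honest is to have every stage emit plan lines through the same scoping code paths it would execute, skipping only the final effect. A hedged sketch (DestroyPlan is illustrative, not HomeLab's actual type):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var plan = new DestroyPlan("prod-multi");
plan.Add("VMs", "vagrant destroy --force prod-multi-gateway");
plan.Add("VMs", "vagrant destroy --force prod-multi-platform");
plan.Add("DNS entries", "gitlab.prod-multi.lab -> 192.168.57.10");
Console.Write(plan.Render());

sealed class DestroyPlan
{
    private readonly string _instance;
    private readonly List<(string Section, string Action)> _steps = new();

    public DestroyPlan(string instance) => _instance = instance;

    // Stages call Add instead of executing when --dry-run is set, so the plan
    // is produced by the same filters the real teardown would use.
    public void Add(string section, string action) => _steps.Add((section, action));

    public string Render()
    {
        var sb = new StringBuilder($"Would destroy instance '{_instance}':\n");
        string? current = null;
        foreach (var (section, action) in _steps)
        {
            if (section != current) { sb.Append($"\n{section}:\n"); current = section; }
            sb.Append($"  - {action}\n");
        }
        return sb.ToString();
    }
}
```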


The test

[Fact]
public async Task destroy_only_removes_resources_belonging_to_the_instance()
{
    using var fixture = await MultiInstanceFixture.NewAsync(instances: new[] { "a", "b" });

    // Both instances are running; both have DNS, certs, networks, volumes
    await fixture.UpAsync("a");
    await fixture.UpAsync("b");

    fixture.GetDockerNetworks().Should().Contain(n => n.Name == "a-platform");
    fixture.GetDockerNetworks().Should().Contain(n => n.Name == "b-platform");

    // Destroy only "a"
    var result = await fixture.Cli("destroy", "--instance", "a", "--force");
    result.ExitCode.Should().Be(0);

    // "a" is gone, "b" is intact
    fixture.GetDockerNetworks().Should().NotContain(n => n.Name == "a-platform");
    fixture.GetDockerNetworks().Should().Contain(n => n.Name == "b-platform");
    fixture.GetDnsEntries("b").Should().NotBeEmpty();
    fixture.GetVms().Should().Contain(v => v.Name.StartsWith("b-"));
}

[Fact]
public async Task destroy_with_continue_on_error_reports_all_failures_at_end()
{
    var registry = new ScriptedInstanceRegistry();
    var dns = new ScriptedDnsProvider();
    dns.OnRemove("missing.lab", returnFailure: true);
    var handler = TestHandlers.Destroy(registry, dns);

    var result = await handler.HandleAsync(
        new DestroyRequest("test", Force: true, ContinueOnError: true),
        default);

    result.IsFailure.Should().BeTrue();
    result.Errors.Should().Contain(e => e.Contains("missing.lab"));
}

What this gives you that bash doesn't

A bash teardown script is vagrant destroy --force followed by rm -rf followed by hope. There is no per-resource cleanup. There is no instance scoping. There is no test that proves the teardown is clean.

A typed nine-stage teardown pipeline gives you, for the same surface area:

  • One verb (homelab destroy --instance <name>) for every layer
  • Reverse-order stages mirroring setup
  • Per-instance scoping at every stage (label filters, name prefixes, hostname suffixes)
  • Multi-instance safety — destroying one does not disturb others
  • Dry-run mode that shows the plan before executing
  • --continue-on-error for partial cleanup
  • Tests for the multi-instance isolation

The bargain pays back the first time you destroy a pr-1234 ephemeral instance and watch prod-multi keep running, completely untouched, hosting the runner that just destroyed the ephemeral.


End of Act VIII

Day-2 operations: covered. The Podman variant: covered. Multiple labs side-by-side: covered. Clean teardown: covered. DevLab can now be created, used, upgraded, observed, backed up, restored, multi-instanced, and destroyed — all through typed verbs, all idempotent, all tested.

Act IX is the developer ergonomics: writing a HomeLab plugin from scratch, and extending Ops.Dsl from a plugin. Two parts. Both are about how to ship a NuGet that adds a new capability to HomeLab without forking it.

