
Part 52: Tearing It All Down — Cleanly

"A system that cannot be cleanly torn down cannot be cleanly tested."


Why

A homelab user does not just create labs. They destroy them. They destroy a pr-1234 after CI finishes. They destroy a dev-single after a week of HomeLab development. They destroy a stale ha-stage to free disk. They destroy everything before reformatting their workstation.

A clean teardown is harder than it looks. The naive vagrant destroy leaves behind:

  • Docker volumes (in MinIO storage, in Postgres data, etc.)
  • DNS entries in PiHole or /etc/hosts
  • Hostname certs in the OS trust store
  • The instance registry entry
  • The instance's working directory
  • Vagrant box references (though the boxes themselves are shared and stay)
  • Cron-scheduled backup jobs
  • Open ports forwarded by the hypervisor

Even worse, the naive teardown of one instance can disturb another — for example, removing a docker network that another instance happens to use, or deleting /etc/hosts entries with an overly greedy regex.

The thesis of this part is: homelab destroy --instance <name> walks every resource the instance created, removes each one, verifies the removal, and refuses to touch anything that does not belong to the instance. The teardown is itself a typed pipeline, run in reverse order, with Result<T> at every step. Multi-instance isolation is preserved through teardown as well as setup.


The shape

[Injectable(ServiceLifetime.Singleton)]
public sealed class DestroyRequestHandler : IRequestHandler<DestroyRequest, Result<DestroyResponse>>
{
    private readonly IInstanceRegistry _registry;
    private readonly IVosOrchestrator _vos;
    private readonly IContainerEngine _engine;
    private readonly IDnsProvider _dns;
    private readonly IBackupSchedulerControl _backupSched;
    private readonly IFileSystem _fs;
    private readonly IHomeLabEventBus _events;
    private readonly IClock _clock;
    // (per-stage helpers such as _docker, _security, _shell, and _certutil
    // are injected the same way; elided here for brevity)

    public async Task<Result<DestroyResponse>> HandleAsync(DestroyRequest req, CancellationToken ct)
    {
        var scopeResult = await _registry.GetAsync(req.InstanceName, ct);
        if (scopeResult.IsFailure)
            return Result.Failure<DestroyResponse>($"instance not found: {req.InstanceName}");

        var scope = scopeResult.Value;

        if (!req.Force)
        {
            var confirmation = await PromptAsync($"This will destroy instance '{req.InstanceName}' and all its data. Continue? [y/N]");
            if (!confirmation) return Result.Failure<DestroyResponse>("aborted by user");
        }

        await _events.PublishAsync(new InstanceDestroyStarted(req.InstanceName, _clock.UtcNow), ct);

        // Run the teardown stages in reverse order of setup
        var stages = new Func<Task<Result>>[]
        {
            () => StopBackupSchedules(scope, ct),
            () => StopComposeStacks(scope, ct),
            () => DestroyVms(scope, ct),
            () => RemoveDnsEntries(scope, ct),
            () => RemoveCertsFromOsStore(scope, ct),
            () => RemoveDockerNetworks(scope, ct),
            () => RemoveDockerVolumes(scope, ct),
            () => RemoveWorkdir(scope, ct),
            () => ReleaseInstanceRegistry(scope, ct),
        };

        var failures = new List<string>();
        foreach (var stage in stages)
        {
            var result = await stage();
            if (result.IsFailure)
            {
                failures.AddRange(result.Errors);
                if (!req.ContinueOnError) break;
            }
        }

        await _events.PublishAsync(failures.Count == 0
            ? new InstanceDestroyCompleted(req.InstanceName, _clock.UtcNow)
            : new InstanceDestroyFailed(req.InstanceName, failures, _clock.UtcNow), ct);

        return failures.Count == 0
            ? Result.Success(new DestroyResponse(req.InstanceName))
            : Result.Failure<DestroyResponse>(string.Join("\n", failures));
    }

    // ... per-stage methods below
}

Nine stages, run in reverse order of setup. Each one:

  1. Acts only on resources prefixed with the instance scope name. The DNS removal does not touch entries belonging to other instances. The docker volume cleanup uses a label filter (homelab.instance=prod-multi) so it only finds the right volumes.
  2. Returns Result<T>. A failure does not necessarily abort the teardown — --continue-on-error lets the user clean up everything that can be cleaned up, with a list of warnings at the end.
  3. Publishes events. Every step is observable.

Stop backup schedules

The backup scheduler from Part 45 is told to drop any policy that targets this instance:

private async Task<Result> StopBackupSchedules(InstanceScope scope, CancellationToken ct)
{
    return await _backupSched.RemoveInstanceAsync(scope.Name, ct);
}

If the scheduler is running as a sidecar in the instance itself, this is automatic (the sidecar dies with the instance). For external schedulers, the call is explicit.

Stop compose stacks

private async Task<Result> StopComposeStacks(InstanceScope scope, CancellationToken ct)
{
    var projectName = $"{scope.Name}-devlab";
    var result = await _engine.StopComposeAsync(projectName, removeVolumes: false, ct);
    // We don't remove volumes here — that's a separate stage with its own opt-out
    return result;
}

docker compose down (or podman-compose down) for the instance's project.

Destroy VMs

private async Task<Result> DestroyVms(InstanceScope scope, CancellationToken ct)
{
    var machines = await _vos.ListMachinesAsync(ct);
    if (machines.IsFailure) return machines.Map();

    var instanceMachines = machines.Value.Where(m => m.Name.StartsWith(scope.Name + "-")).ToList();

    foreach (var m in instanceMachines)
    {
        var result = await _vos.DestroyAsync(m.Name, force: true, ct);
        if (result.IsFailure) return result.Map();
        await _events.PublishAsync(new VosDestroyCompleted(m.Name, _clock.UtcNow), ct);
    }
    return Result.Success();
}

Only machines whose name starts with the instance prefix get destroyed. A vos destroy on prod-multi-platform does not affect dev-single-main.
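The trailing "-" in that prefix check is doing real work: it keeps prod-multi from claiming the machines of a hypothetical prod-multi2 instance. A minimal standalone sketch (VmScope is an illustrative helper, not HomeLab's actual type):

```csharp
using System;

// "prod-multi" owns "prod-multi-gateway" ...
if (!VmScope.BelongsTo("prod-multi-gateway", "prod-multi")) throw new Exception("expected match");
// ... but not machines of an instance whose name merely shares leading characters.
if (VmScope.BelongsTo("prod-multi2-gateway", "prod-multi")) throw new Exception("over-match");
if (VmScope.BelongsTo("dev-single-main", "prod-multi")) throw new Exception("over-match");
Console.WriteLine("prefix scoping ok");

static class VmScope
{
    // Mirrors the filter in DestroyVms: a machine belongs to an instance
    // only if its name starts with "<instance>-".
    public static bool BelongsTo(string machineName, string instanceName)
        => machineName.StartsWith(instanceName + "-", StringComparison.Ordinal);
}
```

One caveat the sketch makes visible: prefix scoping still assumes no instance name is itself a prefix of another ("prod" would claim "prod-multi-gateway"), which is worth enforcing in the registry at init time.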

Remove DNS entries

private async Task<Result> RemoveDnsEntries(InstanceScope scope, CancellationToken ct)
{
    var allEntries = await _dns.ListAsync(ct);
    if (allEntries.IsFailure) return allEntries.Map();

    var instanceEntries = allEntries.Value.Where(e => e.Hostname.EndsWith($".{scope.TldPrefix}.lab")).ToList();
    foreach (var e in instanceEntries)
    {
        var result = await _dns.RemoveAsync(e.Hostname, ct);
        if (result.IsFailure) return result;
    }
    return Result.Success();
}

The list filter is by the instance's TLD prefix, so gitlab.prod-multi.lab is removed but gitlab.dev-single.lab is not.
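As a standalone illustration of the suffix rule (DnsEntry and DnsScope here are hypothetical stand-ins for the provider's types):

```csharp
using System;
using System.Linq;

var all = new[]
{
    new DnsEntry("gitlab.prod-multi.lab", "192.168.57.10"),
    new DnsEntry("gitlab.dev-single.lab", "192.168.56.10"),
    new DnsEntry("registry.prod-multi.lab", "192.168.57.10"),
};

// Only the two prod-multi entries are selected; dev-single survives.
foreach (var e in DnsScope.ForInstance(all, "prod-multi"))
    Console.WriteLine($"remove {e.Hostname}");

record DnsEntry(string Hostname, string Address);

static class DnsScope
{
    // Mirrors the filter in RemoveDnsEntries: match the full ".<prefix>.lab"
    // suffix, never a bare substring.
    public static DnsEntry[] ForInstance(DnsEntry[] all, string tldPrefix)
        => all.Where(e => e.Hostname.EndsWith($".{tldPrefix}.lab", StringComparison.Ordinal))
              .ToArray();
}
```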

Remove certs from OS store

private async Task<Result> RemoveCertsFromOsStore(InstanceScope scope, CancellationToken ct)
{
    if (OperatingSystem.IsMacOS())
        return (await _security.RunAsync(new[] { "delete-certificate", "-c", $"HomeLab CA - {scope.Name}", "/Library/Keychains/System.keychain" }, ct)).Map();

    if (OperatingSystem.IsLinux())
    {
        var caPath = $"/usr/local/share/ca-certificates/homelab-{scope.Name}.crt";
        if (_fs.File.Exists(caPath)) _fs.File.Delete(caPath);
        return (await _shell.RunAsync("update-ca-certificates", "--fresh", ct)).Map();
    }

    if (OperatingSystem.IsWindows())
        return (await _certutil.RunAsync(new[] { "-delstore", "ROOT", $"HomeLab CA - {scope.Name}" }, ct)).Map();

    return Result.Failure("unsupported OS");
}

Only the instance's CA is removed from the OS trust store. Other instances' CAs are untouched.

Remove Docker networks and volumes

private async Task<Result> RemoveDockerNetworks(InstanceScope scope, CancellationToken ct)
{
    // Use a label filter — only networks created by this instance
    var labelFilter = $"homelab.instance={scope.Name}";
    var listResult = await _docker.NetworkListAsync(filter: new[] { $"label={labelFilter}" }, ct);
    if (listResult.IsFailure) return listResult.Map();

    foreach (var net in listResult.Value.Networks)
    {
        var rmResult = await _docker.NetworkRemoveAsync(net.Id, ct);
        if (rmResult.IsFailure) return rmResult.Map();
    }
    return Result.Success();
}

Every docker network HomeLab creates is labeled homelab.instance=<name>. The removal filter ensures only the instance's networks are touched.
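The same convention is easy to exercise from the CLI. A tiny sketch (DockerLabels is an illustrative helper) of building the filter arguments:

```csharp
using System;

// Listing only this instance's networks is equivalent to:
//   docker network ls --filter label=homelab.instance=prod-multi -q
var args = DockerLabels.FilterArgs("prod-multi");
Console.WriteLine(string.Join(" ", args));

static class DockerLabels
{
    public const string InstanceLabel = "homelab.instance";

    // Filter arguments for docker/podman list commands, scoped by instance label.
    public static string[] FilterArgs(string instance)
        => new[] { "--filter", $"label={InstanceLabel}={instance}" };
}
```

Note that labels must be applied when the network is created (docker network create --label homelab.instance=<name> ...); Docker has no command for adding a label to an existing network, which is why the setup pipeline owns this convention.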

Remove the working directory

private Task<Result> RemoveWorkdir(InstanceScope scope, CancellationToken ct)
{
    var workdir = $"./{scope.Name}";
    if (_fs.Directory.Exists(workdir))
    {
        _fs.Directory.Delete(workdir, recursive: true);
    }
    return Task.FromResult(Result.Success());
}

The instance's ./prod-multi/ directory is removed. Other instances' directories are untouched.

Release the registry entry

private Task<Result> ReleaseInstanceRegistry(InstanceScope scope, CancellationToken ct)
    => _registry.ReleaseAsync(scope.Name, ct);

The subnet returns to the pool. The next homelab init --name new-instance may be allocated this subnet again.
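A minimal sketch of those pool semantics (SubnetPool is an illustrative stand-in for the registry's allocator, assuming third-octet /24 allocation as in the examples above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var pool = new SubnetPool(first: 56, last: 60);
var a = pool.Allocate("a");          // 192.168.56.0/24
var b = pool.Allocate("b");          // 192.168.57.0/24
pool.Release("a");                   // 56 returns to the pool
var c = pool.Allocate("c");         // reuses 192.168.56.0/24
Console.WriteLine($"a={a} b={b} c={c}"); // a=56 b=57 c=56

sealed class SubnetPool
{
    private readonly SortedSet<int> _free;
    private readonly Dictionary<string, int> _owned = new();

    public SubnetPool(int first, int last)
        => _free = new SortedSet<int>(Enumerable.Range(first, last - first + 1));

    // Hand out the lowest free third octet (192.168.<octet>.0/24).
    public int Allocate(string instance)
    {
        var octet = _free.Min;
        _free.Remove(octet);
        _owned[instance] = octet;
        return octet;
    }

    // Release returns the octet to the pool; the next Allocate may reuse it.
    public void Release(string instance)
    {
        if (_owned.Remove(instance, out var octet))
            _free.Add(octet);
    }
}
```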


The dry-run mode

homelab destroy --instance prod-multi --dry-run prints what would happen without doing it:

$ homelab destroy --instance prod-multi --dry-run

Would destroy instance 'prod-multi':

Backups:
  - Stop scheduler entry for 'prod-multi'

Compose stacks:
  - docker compose down -p prod-multi-devlab

VMs:
  - vagrant destroy --force prod-multi-gateway
  - vagrant destroy --force prod-multi-platform
  - vagrant destroy --force prod-multi-data
  - vagrant destroy --force prod-multi-obs

DNS entries:
  - gitlab.prod-multi.lab → 192.168.57.10
  - registry.prod-multi.lab → 192.168.57.10
  - baget.prod-multi.lab → 192.168.57.11
  - data.prod-multi.lab → 192.168.57.12
  - obs.prod-multi.lab → 192.168.57.13

Certificates:
  - Remove 'HomeLab CA - prod-multi' from OS trust store

Docker networks:
  - prod-multi-platform (id: a1b2c3...)
  - prod-multi-data-net (id: d4e5f6...)
  - prod-multi-obs-net (id: 789012...)

Working directory:
  - ./prod-multi/

Instance registry:
  - Release subnet 192.168.57

Use --force to skip the confirmation prompt.
Use --no-dry-run to actually do this.

The dry-run is the most important mode for destructive operations. Always show the user what will happen before doing it.
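One way to keep a dry-run honest is to have every stage emit plan lines through the same scoping code paths it would execute, skipping only the final effect. A hedged sketch (DestroyPlan is illustrative, not HomeLab's actual type):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var plan = new DestroyPlan("prod-multi");
plan.Add("VMs", "vagrant destroy --force prod-multi-gateway");
plan.Add("VMs", "vagrant destroy --force prod-multi-platform");
plan.Add("DNS entries", "gitlab.prod-multi.lab -> 192.168.57.10");
Console.Write(plan.Render());

sealed class DestroyPlan
{
    private readonly string _instance;
    private readonly List<(string Section, string Action)> _steps = new();

    public DestroyPlan(string instance) => _instance = instance;

    // Stages call Add instead of executing when --dry-run is set, so the plan
    // is produced by the same filters the real teardown would use.
    public void Add(string section, string action) => _steps.Add((section, action));

    public string Render()
    {
        var sb = new StringBuilder($"Would destroy instance '{_instance}':\n");
        string? current = null;
        foreach (var (section, action) in _steps)
        {
            if (section != current) { sb.Append($"\n{section}:\n"); current = section; }
            sb.Append($"  - {action}\n");
        }
        return sb.ToString();
    }
}
```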


The test

[Fact]
public async Task destroy_only_removes_resources_belonging_to_the_instance()
{
    using var fixture = await MultiInstanceFixture.NewAsync(instances: new[] { "a", "b" });

    // Both instances are running; both have DNS, certs, networks, volumes
    await fixture.UpAsync("a");
    await fixture.UpAsync("b");

    fixture.GetDockerNetworks().Should().Contain(n => n.Name == "a-platform");
    fixture.GetDockerNetworks().Should().Contain(n => n.Name == "b-platform");

    // Destroy only "a"
    var result = await fixture.Cli("destroy", "--instance", "a", "--force");
    result.ExitCode.Should().Be(0);

    // "a" is gone, "b" is intact
    fixture.GetDockerNetworks().Should().NotContain(n => n.Name == "a-platform");
    fixture.GetDockerNetworks().Should().Contain(n => n.Name == "b-platform");
    fixture.GetDnsEntries("b").Should().NotBeEmpty();
    fixture.GetVms().Should().Contain(v => v.Name.StartsWith("b-"));
}

[Fact]
public async Task destroy_with_continue_on_error_reports_all_failures_at_end()
{
    var registry = new ScriptedInstanceRegistry();
    var dns = new ScriptedDnsProvider();
    dns.OnRemove("missing.lab", returnFailure: true);
    var handler = TestHandlers.Destroy(registry, dns);

    var result = await handler.HandleAsync(
        new DestroyRequest("test", Force: true, ContinueOnError: true),
        default);

    result.IsFailure.Should().BeTrue();
    result.Errors.Should().Contain(e => e.Contains("missing.lab"));
}

What this gives you that bash doesn't

A bash teardown script is vagrant destroy --force followed by rm -rf followed by hope. There is no per-resource cleanup. There is no instance scoping. There is no test that proves the teardown is clean.

A typed nine-stage teardown pipeline gives you, for the same surface area:

  • One verb (homelab destroy --instance <name>) for every layer
  • Reverse-order stages mirroring setup
  • Per-instance scoping at every stage (label filters, name prefixes, hostname suffixes)
  • Multi-instance safety — destroying one does not disturb others
  • Dry-run mode that shows the plan before executing
  • --continue-on-error for partial cleanup
  • Tests for the multi-instance isolation

The bargain pays back the first time you destroy a pr-1234 ephemeral instance and watch prod-multi keep running, completely untouched, hosting the runner that just destroyed the ephemeral.


End of Act VIII

Day-2 operations: covered. The Podman variant: covered. Multiple labs side-by-side: covered. Clean teardown: covered. DevLab can now be created, used, upgraded, observed, backed up, restored, multi-instanced, and destroyed — all through typed verbs, all idempotent, all tested.

Act IX is the developer ergonomics: writing a HomeLab plugin from scratch, and extending Ops.Dsl from a plugin. Two parts. Both are about how to ship a NuGet that adds a new capability to HomeLab without forking it.

