Part 52: Tearing It All Down — Cleanly
"A system that cannot be cleanly torn down cannot be cleanly tested."
Why
A homelab user does not just create labs. They destroy them. They destroy a pr-1234 after CI finishes. They destroy a dev-single after a week of HomeLab development. They destroy a stale ha-stage to free disk. They destroy everything before reformatting their workstation.
A clean teardown is harder than it looks. The naive vagrant destroy leaves behind:
- Docker volumes (in MinIO storage, in Postgres data, etc.)
- DNS entries in PiHole or /etc/hosts
- Hostname certs in the OS trust store
- The instance registry entry
- The instance's working directory
- Vagrant box references (though the boxes themselves are shared and stay)
- Cron-scheduled backup jobs
- Open ports forwarded by the hypervisor
Even worse, the naive teardown of one instance can disturb another — for example, removing a Docker network that another instance happens to use, or deleting /etc/hosts entries with an overly greedy regex.
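The /etc/hosts hazard can be made concrete with a minimal shell sketch (hostnames and addresses are hypothetical). An instance-scoped filter deletes only entries under the instance's own suffix, where a greedy pattern such as `gitlab` would also hit the other instance's entry:

```shell
# Hypothetical /etc/hosts entries for two instances.
hosts='192.168.57.10 gitlab.prod-multi.lab
192.168.58.10 gitlab.dev-single.lab'
scope="prod-multi"
# Scoped cleanup: keep every line that does NOT end in this instance's suffix.
printf '%s\n' "$hosts" | grep -v "\.$scope\.lab$"
# → 192.168.58.10 gitlab.dev-single.lab
```

The anchored suffix match is the shell-level analogue of the per-instance scoping that every stage below enforces.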
The thesis of this part is: homelab destroy --instance <name> walks every resource the instance created, removes each one, verifies the removal, and refuses to touch anything that does not belong to the instance. The teardown is itself a typed pipeline, run in reverse order, with Result<T> at every step. Multi-instance isolation is preserved through teardown as well as setup.
The shape
[Injectable(ServiceLifetime.Singleton)]
public sealed class DestroyRequestHandler : IRequestHandler<DestroyRequest, Result<DestroyResponse>>
{
private readonly IInstanceRegistry _registry;
private readonly IVosOrchestrator _vos;
private readonly IContainerEngine _engine;
private readonly IDnsProvider _dns;
private readonly IBackupSchedulerControl _backupSched;
private readonly IFileSystem _fs;
private readonly IHomeLabEventBus _events;
private readonly ISystemClock _clock; // type name assumed; the handler reads _clock.UtcNow below
public async Task<Result<DestroyResponse>> HandleAsync(DestroyRequest req, CancellationToken ct)
{
var scopeResult = await _registry.GetAsync(req.InstanceName, ct);
if (scopeResult.IsFailure)
return Result.Failure<DestroyResponse>($"instance not found: {req.InstanceName}");
var scope = scopeResult.Value;
if (!req.Force)
{
var confirmation = await PromptAsync($"This will destroy instance '{req.InstanceName}' and all its data. Continue? [y/N]");
if (!confirmation) return Result.Failure<DestroyResponse>("aborted by user");
}
await _events.PublishAsync(new InstanceDestroyStarted(req.InstanceName, _clock.UtcNow), ct);
// Run the teardown stages in reverse order of setup
var stages = new Func<Task<Result>>[]
{
() => StopBackupSchedules(scope, ct),
() => StopComposeStacks(scope, ct),
() => DestroyVms(scope, ct),
() => RemoveDnsEntries(scope, ct),
() => RemoveCertsFromOsStore(scope, ct),
() => RemoveDockerNetworks(scope, ct),
() => RemoveDockerVolumes(scope, ct),
() => RemoveWorkdir(scope, ct),
() => ReleaseInstanceRegistry(scope, ct),
};
var failures = new List<string>();
foreach (var stage in stages)
{
var result = await stage();
if (result.IsFailure)
{
failures.AddRange(result.Errors);
if (!req.ContinueOnError) break;
}
}
await _events.PublishAsync(failures.Count == 0
? new InstanceDestroyCompleted(req.InstanceName, _clock.UtcNow)
: new InstanceDestroyFailed(req.InstanceName, failures, _clock.UtcNow), ct);
return failures.Count == 0
? Result.Success(new DestroyResponse(req.InstanceName))
: Result.Failure<DestroyResponse>(string.Join("\n", failures));
}
// ... per-stage methods below
}
Nine stages, run in reverse order of setup. Each one:
- Acts only on resources prefixed with the instance scope name. The DNS removal does not touch entries belonging to other instances. The Docker volume cleanup uses a label filter (homelab.instance=prod-multi) so it only finds the right volumes.
- Returns Result<T>. A failure does not necessarily abort the teardown — --continue-on-error lets the user clean up everything that can be cleaned up, with a list of warnings at the end.
- Publishes events. Every step is observable.
Stop backup schedules
The backup scheduler from Part 45 is told to drop any policy that targets this instance:
private async Task<Result> StopBackupSchedules(InstanceScope scope, CancellationToken ct)
{
return await _backupSched.RemoveInstanceAsync(scope.Name, ct);
}
If the scheduler is running as a sidecar in the instance itself, this is automatic (the sidecar dies with the instance). For external schedulers, the call is explicit.
Stop compose stacks
private async Task<Result> StopComposeStacks(InstanceScope scope, CancellationToken ct)
{
var projectName = $"{scope.Name}-devlab";
var result = await _engine.StopComposeAsync(projectName, removeVolumes: false, ct);
// We don't remove volumes here — that's a separate stage with its own opt-out
return result;
}
docker compose down (or podman-compose down) for the instance's project.
Destroy VMs
private async Task<Result> DestroyVms(InstanceScope scope, CancellationToken ct)
{
var machines = await _vos.ListMachinesAsync(ct);
if (machines.IsFailure) return machines.Map(); // check the list call before dereferencing Value
var instanceMachines = machines.Value.Where(m => m.Name.StartsWith(scope.Name + "-")).ToList();
foreach (var m in instanceMachines)
{
var result = await _vos.DestroyAsync(m.Name, force: true, ct);
if (result.IsFailure) return result.Map();
await _events.PublishAsync(new VosDestroyCompleted(m.Name, _clock.UtcNow), ct);
}
return Result.Success();
}
Only machines whose name starts with the instance prefix get destroyed. A vos destroy on prod-multi-platform does not affect dev-single-main.
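The same guard can be sketched at the shell level (machine names are hypothetical); a `case` pattern of `"$scope"-*` mirrors the StartsWith(scope.Name + "-") filter above:

```shell
scope="dev-single"
# Only names that begin with '<scope>-' are selected for destruction.
for vm in dev-single-main prod-multi-platform prod-multi-data; do
  case "$vm" in
    "$scope"-*) echo "destroy $vm" ;;
  esac
done
# → destroy dev-single-main
```

The trailing dash matters: without it, a scope named prod would also select machines of an instance named prod-multi.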
Remove DNS entries
private async Task<Result> RemoveDnsEntries(InstanceScope scope, CancellationToken ct)
{
var allEntries = await _dns.ListAsync(ct);
if (allEntries.IsFailure) return allEntries.Map();
var instanceEntries = allEntries.Value.Where(e => e.Hostname.EndsWith($".{scope.TldPrefix}.lab")).ToList();
foreach (var e in instanceEntries)
{
var result = await _dns.RemoveAsync(e.Hostname, ct);
if (result.IsFailure) return result;
}
return Result.Success();
}
The list filter is by the instance's TLD prefix, so gitlab.prod-multi.lab is removed but gitlab.dev-single.lab is not.
Remove certs from OS store
private async Task<Result> RemoveCertsFromOsStore(InstanceScope scope, CancellationToken ct)
{
if (OperatingSystem.IsMacOS())
return await _security.RunAsync(new[] { "delete-certificate", "-c", $"HomeLab CA - {scope.Name}", "/Library/Keychains/System.keychain" }, ct).Map();
if (OperatingSystem.IsLinux())
{
var caPath = $"/usr/local/share/ca-certificates/homelab-{scope.Name}.crt";
if (_fs.File.Exists(caPath)) _fs.File.Delete(caPath);
return await _shell.RunAsync("update-ca-certificates", "--fresh", ct).Map();
}
if (OperatingSystem.IsWindows())
return await _certutil.RunAsync(new[] { "-delstore", "ROOT", $"HomeLab CA - {scope.Name}" }, ct).Map();
return Result.Failure("unsupported OS");
}
Only the instance's CA is removed from the OS trust store. Other instances' CAs are untouched.
Remove Docker networks and volumes
private async Task<Result> RemoveDockerNetworks(InstanceScope scope, CancellationToken ct)
{
// Use a label filter — only networks created by this instance
var labelFilter = $"homelab.instance={scope.Name}";
var listResult = await _docker.NetworkListAsync(filter: new[] { $"label={labelFilter}" }, ct);
if (listResult.IsFailure) return listResult.Map();
foreach (var net in listResult.Value.Networks)
{
var rmResult = await _docker.NetworkRemoveAsync(net.Id, ct);
if (rmResult.IsFailure) return rmResult.Map();
}
return Result.Success();
}
Every Docker network HomeLab creates is labeled homelab.instance=<name>. The removal filter ensures only the instance's networks are touched.
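A rough CLI-level equivalent of this stage, simulated over a fake network listing (ids and labels are hypothetical) so the selection logic is visible without a Docker daemon:

```shell
# Columns: id, name, label. Only rows carrying the instance's label are
# turned into removal commands; name similarity alone is never trusted.
printf '%s\n' \
  'a1b2c3 prod-multi-platform homelab.instance=prod-multi' \
  'd4e5f6 dev-single-platform homelab.instance=dev-single' |
awk '$3 == "homelab.instance=prod-multi" { print "docker network rm " $1 }'
# → docker network rm a1b2c3
```

Against a real daemon the same idea is docker network ls --filter label=homelab.instance=<name>, which is exactly what labeling every created network buys you.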
Remove the working directory
private Task<Result> RemoveWorkdir(InstanceScope scope, CancellationToken ct)
{
var workdir = $"./{scope.Name}";
if (_fs.Directory.Exists(workdir))
{
_fs.Directory.Delete(workdir, recursive: true);
}
return Task.FromResult(Result.Success());
}
The instance's ./prod-multi/ directory is removed. Other instances' directories are untouched.
Release the registry entry
private Task<Result> ReleaseInstanceRegistry(InstanceScope scope, CancellationToken ct)
=> _registry.ReleaseAsync(scope.Name, ct);
The subnet returns to the pool. The next homelab init --name new-instance may be allocated this subnet again.
The dry-run mode
homelab destroy --instance prod-multi --dry-run prints what would happen without doing it:
$ homelab destroy --instance prod-multi --dry-run
Would destroy instance 'prod-multi':
Backups:
- Stop scheduler entry for 'prod-multi'
Compose stacks:
- docker compose down -p prod-multi-devlab
VMs:
- vagrant destroy --force prod-multi-gateway
- vagrant destroy --force prod-multi-platform
- vagrant destroy --force prod-multi-data
- vagrant destroy --force prod-multi-obs
DNS entries:
- gitlab.prod-multi.lab → 192.168.57.10
- registry.prod-multi.lab → 192.168.57.10
- baget.prod-multi.lab → 192.168.57.11
- data.prod-multi.lab → 192.168.57.12
- obs.prod-multi.lab → 192.168.57.13
Certificates:
- Remove 'HomeLab CA - prod-multi' from OS trust store
Docker networks:
- prod-multi-platform (id: a1b2c3...)
- prod-multi-data-net (id: d4e5f6...)
- prod-multi-obs-net (id: 789012...)
Working directory:
- ./prod-multi/
Instance registry:
- Release subnet 192.168.57
Use --force to skip the confirmation prompt.
Use --no-dry-run to actually do this.
The dry-run is the most-used mode for destructive operations. Always show the user what will happen before doing it.
The test
[Fact]
public async Task destroy_only_removes_resources_belonging_to_the_instance()
{
using var fixture = await MultiInstanceFixture.NewAsync(instances: new[] { "a", "b" });
// Both instances are running; both have DNS, certs, networks, volumes
await fixture.UpAsync("a");
await fixture.UpAsync("b");
fixture.GetDockerNetworks().Should().Contain(n => n.Name == "a-platform");
fixture.GetDockerNetworks().Should().Contain(n => n.Name == "b-platform");
// Destroy only "a"
var result = await fixture.Cli("destroy", "--instance", "a", "--force");
result.ExitCode.Should().Be(0);
// "a" is gone, "b" is intact
fixture.GetDockerNetworks().Should().NotContain(n => n.Name == "a-platform");
fixture.GetDockerNetworks().Should().Contain(n => n.Name == "b-platform");
fixture.GetDnsEntries("b").Should().NotBeEmpty();
fixture.GetVms().Should().Contain(v => v.Name.StartsWith("b-"));
}
[Fact]
public async Task destroy_with_continue_on_error_reports_all_failures_at_end()
{
var registry = new ScriptedInstanceRegistry();
var dns = new ScriptedDnsProvider();
dns.OnRemove("missing.lab", returnFailure: true);
var handler = TestHandlers.Destroy(registry, dns);
var result = await handler.HandleAsync(
new DestroyRequest("test", Force: true, ContinueOnError: true),
default);
result.IsFailure.Should().BeTrue();
result.Errors.Should().Contain(e => e.Contains("missing.lab"));
}
What this gives you that bash doesn't
A bash teardown script is vagrant destroy --force followed by rm -rf followed by hopes. There is no per-resource cleanup. There is no instance scoping. There is no test that proves the teardown is clean.
A typed nine-stage teardown pipeline gives you, for the same surface area:
- One verb (homelab destroy --instance <name>) for every layer
- Reverse-order stages mirroring setup
- Per-instance scoping at every stage (label filters, name prefixes, hostname suffixes)
- Multi-instance safety — destroying one does not disturb others
- Dry-run mode that shows the plan before executing
- --continue-on-error for partial cleanup
- Tests for the multi-instance isolation
The bargain pays back the first time you destroy a pr-1234 ephemeral instance and watch prod-multi keep running, completely untouched, hosting the runner that just destroyed the ephemeral.
End of Act VIII
Day-2 operations: covered. The Podman variant: covered. Multiple labs side-by-side: covered. Clean teardown: covered. DevLab can now be created, used, upgraded, observed, backed up, restored, multi-instanced, and destroyed — all through typed verbs, all idempotent, all tested.
Act IX is the developer ergonomics: writing a HomeLab plugin from scratch, and extending Ops.Dsl from a plugin. Two parts. Both are about how to ship a NuGet that adds a new capability to HomeLab without forking it.