
Part 45: The Backup / Restore Framework

"A backup that has not been restored is not a backup. It is a hopeful tarball."


Why

Every service in DevLab has data. GitLab has Postgres + Gitaly's repos + LFS storage. Postgres has the database. MinIO has six buckets of varying importance. Meilisearch has its index. Grafana has its dashboards. And the configuration files in data/certs/ hold the CA private key.

Losing any of these is bad. Losing GitLab's Postgres is catastrophic — it contains the source code metadata for every project the team is working on. Losing the CA private key means every cert in DevLab has to be regenerated and re-trusted.

The thesis of this part is: HomeLab ships an IBackupProvider plugin contract with three providers (restic for files/configs, pgbackrest for Postgres, mc mirror for MinIO), driven by Ops.DataGovernance [BackupPolicy] declarations attached to compose contributors. Backups are scheduled, encrypted, and stored in a backup MinIO bucket. The backups are tested by a periodic restore-into-throwaway-lab job that proves recovery works.


The shape

public interface IBackupProvider
{
    string Name { get; }
    Task<Result<BackupId>> RunAsync(BackupSpec spec, CancellationToken ct);
    Task<Result> RestoreAsync(BackupId id, RestoreSpec spec, CancellationToken ct);
    Task<Result<IReadOnlyList<BackupRecord>>> ListAsync(string targetService, CancellationToken ct);
    Task<Result> PruneAsync(string targetService, RetentionPolicy retention, CancellationToken ct);
}

public sealed record BackupSpec(string TargetService, string Source, string DestinationBucket);
public sealed record RestoreSpec(string DestinationPath, DateTimeOffset? Snapshot = null);
public sealed record BackupId(string Provider, string SnapshotId, DateTimeOffset CreatedAt, long SizeBytes);
public sealed record BackupRecord(BackupId Id, string TargetService, string Description);
public sealed record RetentionPolicy(int KeepLast, int KeepDaily, int KeepWeekly, int KeepMonthly);

Four methods and a Name, every method returning a Result. The provider is selected by the [BackupPolicy] declaration on the contributor.
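Everything below leans on a Result type that this part does not define. As a reading aid, a minimal sketch of the shape the call sites assume (the real HomeLab type is richer; Map in particular is inferred from usage):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A minimal Result sketch (assumed shape, inferred from usage in this part).
public record Result(bool IsSuccess, IReadOnlyList<string> Errors)
{
    public bool IsFailure => !IsSuccess;
    public static Result Success() => new(true, Array.Empty<string>());
    public static Result Failure(params string[] errors) => new(false, errors);
    // Lets call sites write Result.Success(value) and infer Result<T>.
    public static Result<T> Success<T>(T value) => Result<T>.Success(value);
}

public sealed record Result<T>(bool IsSuccess, T? Value, IReadOnlyList<string> Errors)
{
    public bool IsFailure => !IsSuccess;
    public static Result<T> Success(T value) => new(true, value, Array.Empty<string>());
    public static Result<T> Failure(params string[] errors) => new(false, default, errors);

    // Re-shapes a failed Result<T> into another Result type,
    // carrying the errors across (only meaningful on failures).
    public Result<TOther> Map<TOther>() => Result<TOther>.Failure(Errors.ToArray());
    public Result Map() => IsSuccess ? Result.Success() : Result.Failure(Errors.ToArray());
}
```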

Provider 1: ResticBackupProvider

[Injectable(ServiceLifetime.Singleton)]
public sealed class ResticBackupProvider : IBackupProvider
{
    public string Name => "restic";
    private readonly ResticClient _restic;
    private readonly ISecretStore _secrets;

    public ResticBackupProvider(ResticClient restic, ISecretStore secrets)
        => (_restic, _secrets) = (restic, secrets);

    public async Task<Result<BackupId>> RunAsync(BackupSpec spec, CancellationToken ct)
    {
        var pwd = await _secrets.ReadAsync("RESTIC_PASSWORD", ct);
        if (pwd.IsFailure) return pwd.Map<BackupId>();

        var repo = $"s3:https://minio.frenchexdev.lab/{spec.DestinationBucket}/restic-{spec.TargetService}";

        // init fails if the repo is already initialized; that is expected on
        // every run after the first, so the result is deliberately discarded
        _ = await _restic.InitAsync(repo, pwd.Value, ct);

        var backupResult = await _restic.BackupAsync(
            repo: repo,
            password: pwd.Value,
            paths: new[] { spec.Source },
            tag: $"{spec.TargetService}-{DateTimeOffset.UtcNow:yyyyMMdd-HHmmss}",
            ct: ct);

        if (backupResult.IsFailure) return backupResult.Map<BackupId>();

        return Result.Success(new BackupId(
            Provider: "restic",
            SnapshotId: backupResult.Value.SnapshotId,
            CreatedAt: DateTimeOffset.UtcNow,
            SizeBytes: backupResult.Value.SizeBytes));
    }

    public async Task<Result> RestoreAsync(BackupId id, RestoreSpec spec, CancellationToken ct)
    {
        var pwd = await _secrets.ReadAsync("RESTIC_PASSWORD", ct);
        if (pwd.IsFailure) return pwd.Map();

        var restore = await _restic.RestoreAsync(id.SnapshotId, pwd.Value, spec.DestinationPath, ct);
        return restore.Map();
    }

    // ...
}

restic is the right tool for backing up directory trees: deduplication, encryption, snapshots, S3-compatible backends. We use it for data/certs/ (the CA + wildcard cert), the GitLab Gitaly repos directory, the MinIO vagrant-boxes bucket (via the s3 backend), and any other file-based state.
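The ResticClient wrapper is not shown here; the interesting part is parsing restic's output. With --json, restic backup emits newline-delimited JSON events, and the final summary event carries the snapshot id and byte counts. A sketch of what BackupAsync might parse (the record and helper names are assumptions, not HomeLab's real client):

```csharp
using System;
using System.Text.Json;

// restic backup --json emits one JSON event per line; the event with
// "message_type": "summary" carries snapshot_id and total_bytes_processed.
public sealed record ResticBackupSummary(string SnapshotId, long SizeBytes);

public static class ResticOutput
{
    public static ResticBackupSummary? ParseBackupSummary(string stdout)
    {
        foreach (var line in stdout.Split('\n', StringSplitOptions.RemoveEmptyEntries))
        {
            using var doc = JsonDocument.Parse(line);
            if (doc.RootElement.TryGetProperty("message_type", out var type)
                && type.GetString() == "summary")
            {
                return new ResticBackupSummary(
                    doc.RootElement.GetProperty("snapshot_id").GetString()!,
                    doc.RootElement.GetProperty("total_bytes_processed").GetInt64());
            }
        }
        return null; // no summary event: the backup never completed
    }
}
```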

Provider 2: PgBackRestBackupProvider

[Injectable(ServiceLifetime.Singleton)]
public sealed class PgBackRestBackupProvider : IBackupProvider
{
    public string Name => "pgbackrest";
    private readonly DockerClient _docker;
    private readonly ISecretStore _secrets;

    public PgBackRestBackupProvider(DockerClient docker, ISecretStore secrets)
        => (_docker, _secrets) = (docker, secrets);

    public async Task<Result<BackupId>> RunAsync(BackupSpec spec, CancellationToken ct)
    {
        // pgbackrest runs inside the postgres container, configured to push to S3
        var execResult = await _docker.ExecAsync(
            container: "postgres",
            command: new[] { "pgbackrest", "--stanza=devlab", "backup", "--type=full" },
            ct: ct);

        if (execResult.IsFailure) return execResult.Map<BackupId>();

        // Parse the snapshot ID from pgbackrest's output
        var snapshotId = ExtractSnapshotId(execResult.Value.StdOut);

        return Result.Success(new BackupId("pgbackrest", snapshotId, DateTimeOffset.UtcNow, /* size */ 0));
    }

    public async Task<Result> RestoreAsync(BackupId id, RestoreSpec spec, CancellationToken ct)
    {
        var execResult = await _docker.ExecAsync(
            container: "postgres",
            command: new[] { "pgbackrest", "--stanza=devlab", "restore", "--set=" + id.SnapshotId },
            ct: ct);

        return execResult.Map();
    }

    // ...
}

pgbackrest is the dedicated Postgres backup tool. It does point-in-time recovery, parallel restore, encrypted S3 backends, and continuous WAL archiving. We delegate to it for Postgres specifically because hand-rolled pg_dump backups have too many edge cases (large databases, schema migrations mid-dump, etc.).
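ExtractSnapshotId above is the same kind of output scraping: pgbackrest logs the new backup's label on completion, and that label is exactly what --set expects on restore. One plausible implementation (the exact log line is an assumption about the pgbackrest version in use, so pin it with a test):

```csharp
using System.Text.RegularExpressions;

public static class PgBackRestOutput
{
    // pgbackrest logs the new backup's label on completion, e.g.:
    //   P00   INFO: new backup label = 20240101-020000F
    // The label doubles as the --set value for restore.
    private static readonly Regex LabelPattern =
        new(@"new backup label = (\S+)", RegexOptions.Compiled);

    public static string? ExtractSnapshotId(string stdout)
    {
        var match = LabelPattern.Match(stdout);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```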

Provider 3: MinioMirrorBackupProvider

[Injectable(ServiceLifetime.Singleton)]
public sealed class MinioMirrorBackupProvider : IBackupProvider
{
    public string Name => "minio-mirror";
    private readonly DockerClient _docker;

    public MinioMirrorBackupProvider(DockerClient docker) => _docker = docker;

    public async Task<Result<BackupId>> RunAsync(BackupSpec spec, CancellationToken ct)
    {
        // mc mirror copies one bucket prefix to another; each run writes into
        // a fresh timestamped prefix, so earlier snapshots are left untouched
        var snapshotPrefix = $"snapshot-{DateTimeOffset.UtcNow:yyyyMMdd-HHmmss}";

        var execResult = await _docker.ExecAsync(
            container: "minio-init",   // re-use the init sidecar's mc client
            command: new[]
            {
                "mc", "mirror",
                $"local/{spec.Source}",
                $"local/{spec.DestinationBucket}/{snapshotPrefix}/"
            },
            ct: ct);

        if (execResult.IsFailure) return execResult.Map<BackupId>();
        return Result.Success(new BackupId("minio-mirror", snapshotPrefix, DateTimeOffset.UtcNow, /* size */ 0));
    }

    // ...
}

For MinIO buckets, we use mc mirror to copy them into a backups bucket with a timestamped prefix. This is appropriate for the artifact buckets (gitlab-artifacts, registry, nuget-packages) where we want point-in-time snapshots without re-encrypting an already-large dataset.
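The selection side of the mirror provider's PruneAsync is a pure function over those timestamped prefixes, which makes it trivially testable. A sketch (the helper is ours; the real PruneAsync would list prefixes with mc ls and delete with mc rm --recursive --force):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotPrefixRetention
{
    // Given the snapshot-YYYYMMDD-HHMMSS prefixes the mirror provider
    // writes, pick the prefixes to delete to honour KeepLast. The
    // timestamped names sort chronologically as strings, so ordinal
    // order is creation order.
    public static IReadOnlyList<string> PrefixesToDelete(
        IEnumerable<string> prefixes, int keepLast)
        => prefixes
            .OrderByDescending(p => p, StringComparer.Ordinal)
            .Skip(keepLast)
            .ToList();
}
```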


Declarations

Compose contributors declare their backup needs via [BackupPolicy]:

public sealed class GitLabComposeContributor : IComposeFileContributor
{
    public IEnumerable<OpsDataGovernanceDeclaration> DataGovernance => new[]
    {
        new OpsBackupPolicy(
            Target: "postgres",
            Provider: "pgbackrest",
            Cron: "0 2 * * *",                                        // every day at 02:00
            Retention: new RetentionPolicy(KeepLast: 3, KeepDaily: 7, KeepWeekly: 4, KeepMonthly: 6),
            Description: "Daily Postgres full backup"),

        new OpsBackupPolicy(
            Target: "gitlab-config",
            Provider: "restic",
            Source: "./gitlab/config",
            Cron: "0 3 * * *",
            Retention: new RetentionPolicy(KeepLast: 7, KeepDaily: 14, KeepWeekly: 4, KeepMonthly: 6),
            Description: "Daily restic backup of GitLab Omnibus configuration"),

        new OpsBackupPolicy(
            Target: "gitlab-lfs",
            Provider: "minio-mirror",
            Source: "gitlab-lfs",
            Cron: "0 4 * * 0",                                        // weekly
            Retention: new RetentionPolicy(KeepLast: 4, KeepDaily: 0, KeepWeekly: 4, KeepMonthly: 6),
            Description: "Weekly LFS bucket snapshot")
    };
}

Three policies for GitLab. Each one specifies its provider, its schedule, and its retention. The framework walks every contributor and generates a cron entry per policy.
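OpsBackupPolicy itself is not shown in this part. From the call sites above, its shape is roughly the following (an inference, not the real declaration; the base record name comes from the DataGovernance property type):

```csharp
// Base type inferred from the DataGovernance property above.
public abstract record OpsDataGovernanceDeclaration;

// From the contract earlier in this part.
public sealed record RetentionPolicy(int KeepLast, int KeepDaily, int KeepWeekly, int KeepMonthly);

// Inferred from the call sites; not the real HomeLab declaration.
public sealed record OpsBackupPolicy(
    string Target,          // logical backup target, matched by the scheduler
    string Provider,        // IBackupProvider.Name to route to
    string Cron,            // five-field cron expression
    RetentionPolicy Retention,
    string Description,
    string? Source = null)  // path or bucket; pgbackrest derives its own source
    : OpsDataGovernanceDeclaration;
```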


The cron orchestrator

homelab backup run --schedule is a long-running mode that drives the schedules:

[Injectable(ServiceLifetime.Singleton)]
public sealed class BackupScheduler : IBackupScheduler
{
    private readonly IEnumerable<IBackupProvider> _providers;
    private readonly IClock _clock;
    private readonly IHomeLabEventBus _events;
    private readonly IBackupPolicyRegistry _policies;

    public BackupScheduler(
        IEnumerable<IBackupProvider> providers,
        IClock clock,
        IHomeLabEventBus events,
        IBackupPolicyRegistry policies)
        => (_providers, _clock, _events, _policies) = (providers, clock, events, policies);
    public async Task RunAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var due = _policies.GetDuePolicies(_clock.UtcNow);
            foreach (var policy in due)
            {
                var provider = _providers.SingleOrDefault(p => p.Name == policy.Provider);
                if (provider is null)
                {
                    // a policy naming an unregistered provider should be loud, not silent
                    await _events.PublishAsync(new BackupFailed(
                        policy.Target, new[] { $"unknown provider '{policy.Provider}'" }, _clock.UtcNow), ct);
                    continue;
                }

                await _events.PublishAsync(new BackupStarted(policy.Target, policy.Provider, _clock.UtcNow), ct);
                var result = await provider.RunAsync(new BackupSpec(policy.Target, policy.Source, "backups"), ct);

                if (result.IsSuccess)
                    await _events.PublishAsync(new BackupCompleted(policy.Target, result.Value, _clock.UtcNow), ct);
                else
                    await _events.PublishAsync(new BackupFailed(policy.Target, result.Errors, _clock.UtcNow), ct);
            }

            await Task.Delay(TimeSpan.FromMinutes(1), ct);
        }
    }
}

The scheduler runs in a sidecar container, ticks every minute, evaluates the cron expressions, and triggers due backups. The retention policies are applied periodically by PruneAsync.
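The cron evaluation needs nothing exotic for the schedules this part declares. A minimal five-field matcher, supporting only * and plain numbers (a sketch; a real deployment would want a full parser such as the Cronos package):

```csharp
using System;

public static class MiniCron
{
    // Minimal five-field cron matcher: supports "*" and plain numbers,
    // which covers every schedule declared in this part (ranges, steps,
    // and lists would need a real parser).
    public static bool Matches(string cron, DateTimeOffset t)
    {
        var fields = cron.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        return FieldMatches(fields[0], t.Minute)
            && FieldMatches(fields[1], t.Hour)
            && FieldMatches(fields[2], t.Day)
            && FieldMatches(fields[3], t.Month)
            && FieldMatches(fields[4], (int)t.DayOfWeek); // 0 = Sunday, as in cron
    }

    private static bool FieldMatches(string spec, int value)
        => spec == "*" || int.Parse(spec) == value;
}
```

On top of this, GetDuePolicies reduces to checking each policy's expression against the current minute, remembering what already fired so a policy runs at most once per matching minute.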

For users who do not want to run the scheduler in-band, the same homelab backup run --target gitlab-config command works on demand and can be triggered by an external cron daemon or a CI job.
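On the restic side, the periodic prune is mostly a translation exercise: a RetentionPolicy maps directly onto restic forget flags. The --keep-* and --prune flags are restic's own; the helper and the path-based scoping are assumptions:

```csharp
// RetentionPolicy as declared in the contract earlier in this part.
public sealed record RetentionPolicy(int KeepLast, int KeepDaily, int KeepWeekly, int KeepMonthly);

public static class ResticRetention
{
    // restic forget selects which snapshots to keep; --prune additionally
    // deletes the data the dropped snapshots referenced. Scoping by path
    // keeps one target's retention from eating another's snapshots, since
    // each target backs up a distinct path.
    public static string[] ToForgetArgs(RetentionPolicy retention, string sourcePath) => new[]
    {
        "forget",
        "--path", sourcePath,
        "--keep-last", retention.KeepLast.ToString(),
        "--keep-daily", retention.KeepDaily.ToString(),
        "--keep-weekly", retention.KeepWeekly.ToString(),
        "--keep-monthly", retention.KeepMonthly.ToString(),
        "--prune"
    };
}
```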


The restore-test job

Dogfood loop #4: a scheduled job spins up a throwaway HomeLab instance, restores the latest backup into it, and asserts that it works:

public async Task<Result> RunRestoreTestAsync(CancellationToken ct)
{
    using var throwaway = await _testLab.NewInstanceAsync(name: $"restore-test-{_clock.UtcNow:yyyyMMddHHmm}");
    await throwaway.UpAllAsync(ct);

    var latest = await _provider.GetLatestAsync("postgres", ct);
    var restoreResult = await _provider.RestoreAsync(
        id: latest,
        spec: new RestoreSpec(throwaway.PostgresDataDir),
        ct);
    if (restoreResult.IsFailure)
    {
        await _events.PublishAsync(new RestoreTestFailed(restoreResult.Errors, _clock.UtcNow), ct);
        return restoreResult;
    }

    var verifyResult = await throwaway.HttpGetAsync("https://gitlab/api/v4/version", ct);
    if (verifyResult.IsFailure || !verifyResult.Value.Contains("version"))
    {
        await _events.PublishAsync(new RestoreTestFailed(new[] { "GitLab unreachable after restore" }, _clock.UtcNow), ct);
        return Result.Failure("restore test failed");
    }

    await _events.PublishAsync(new RestoreTestPassed(_clock.UtcNow), ct);
    await throwaway.DestroyAsync(ct);
    return Result.Success();
}

The job runs nightly. If it fails, the alert wakes you up. If it succeeds, you have fresh evidence, from last night rather than from the day you wrote the script, that your backups are recoverable. Both outcomes are valuable, and the alert is the more valuable of the two: it is the difference between a backup framework you trust and a backup framework you merely have.


The test

[Fact]
public async Task restic_provider_invokes_restic_with_correct_args()
{
    var restic = new ScriptedResticClient();
    restic.OnBackup(exitCode: 0, snapshotId: "abc123", sizeBytes: 1024);
    var secrets = new InMemorySecretStore();
    await secrets.WriteAsync("RESTIC_PASSWORD", "secret", default);

    var provider = new ResticBackupProvider(restic, secrets);
    var result = await provider.RunAsync(
        new BackupSpec("gitlab-config", "./gitlab/config", "backups"),
        default);

    result.IsSuccess.Should().BeTrue();
    result.Value.SnapshotId.Should().Be("abc123");
    restic.Calls.Should().ContainSingle(c => c.Method == "Backup" && c.Tag.Contains("gitlab-config"));
}

[Fact]
[Trait("category", "e2e")]
[Trait("category", "slow")]
public async Task nightly_restore_test_succeeds_against_real_devlab()
{
    using var devlab = await TestLab.NewAsync(name: "restore-loop", topology: "single");
    await devlab.UpAllAsync();

    // Trigger a real backup
    await devlab.Cli("backup", "run", "--target", "postgres").AssertExitZero();

    // Run the restore test
    var restoreResult = await devlab.Cli("backup", "restore-test", "--target", "postgres");
    restoreResult.ExitCode.Should().Be(0);
    restoreResult.StdOut.Should().Contain("restore test passed");
}

What this gives you that bash doesn't

A bash script that "does backups" is pg_dump | gzip > /backups/pg-$(date +%F).sql.gz, plus a cron entry, plus a hand-rolled retention loop built on find -mtime. There is no test that proves the dump is valid, and no test that proves it can be restored.

A typed three-provider backup framework with restore-test automation gives you, for the same surface area:

  • One contract (IBackupProvider) for every backup target
  • Three providers (restic, pgbackrest, minio-mirror) selected per target via [BackupPolicy]
  • Cron-driven scheduling with retention policies
  • A restore-test job that proves backups are recoverable
  • Tests that exercise each provider against fakes
  • Plugin extensibility for additional providers (Borg, Duplicacy, cloud-native backup services)

The bargain pays back the first time the restore-test fails and you fix the problem before you actually need a restore.

