
Part 40: kubeadm Upgrades — The Painful One

"kubeadm upgrade is one command per step. There are seventeen steps. We make every one of them a saga."


Why

Upgrading a kubeadm cluster is the highest-risk routine operation in production Kubernetes. The flow is well-documented but unforgiving: you must walk a specific sequence of kubeadm upgrade plan, kubeadm upgrade apply, kubectl drain, package upgrade, kubelet restart, kubectl uncordon, repeated for each control plane and then for each worker. A misstep at any point can leave the cluster in a half-upgraded state where some components are at v1.31 and others at v1.32, and the API server refuses operations.

The thesis: K8s.Dsl wraps the kubeadm upgrade flow in a KubeadmUpgradeSaga from the Saga toolbelt. Each step is a saga step with a compensation. The user runs homelab k8s upgrade --to v1.32.0 and the saga walks the path. On failure, the saga rolls back the most recent step and leaves the cluster in a known state.


The upgrade path resolver

[Injectable(ServiceLifetime.Singleton)]
public sealed class KubernetesUpgradePathResolver
{
    // Kubernetes supports skipping at most one minor version per upgrade.
    // 1.30 → 1.31 → 1.32 is allowed. 1.30 → 1.32 is not.
    public IReadOnlyList<string> ResolvePath(string from, string to)
    {
        var fromV = ParseSemver(from);
        var toV = ParseSemver(to);

        if (fromV.Minor == toV.Minor)
            return new[] { to };   // patch upgrade only

        var path = new List<string>();
        for (var minor = fromV.Minor + 1; minor <= toV.Minor; minor++)
        {
            // Latest patch of each minor on the path (in practice, queried from a release feed)
            var version = LatestPatchOf(toV.Major, minor);
            path.Add(version);
        }

        return path;
    }
}

A user upgrading from v1.30.5 to v1.32.0 gets a two-step path: v1.31.4 → v1.32.0. The intermediate stop is mandatory; kubeadm refuses to skip minor versions.
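ParseSemver and LatestPatchOf are left undefined in the resolver above. A minimal sketch of what they might look like — the helper names match the resolver's calls, but the hard-coded patch table is an assumption standing in for the release-feed lookup the comment mentions:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers for KubernetesUpgradePathResolver. ParseSemver splits
// "v1.30.5" into numeric components; LatestPatchOf stands in for a release-feed
// query (the patch numbers here are hard-coded purely for illustration).
public static class VersionHelpers
{
    public static (int Major, int Minor, int Patch) ParseSemver(string version)
    {
        var parts = version.TrimStart('v').Split('.');
        return (int.Parse(parts[0]), int.Parse(parts[1]), int.Parse(parts[2]));
    }

    // Latest known patch per minor; a real implementation would query
    // the Kubernetes release feed instead of this table.
    private static readonly Dictionary<(int Major, int Minor), int> KnownPatches = new()
    {
        [(1, 31)] = 4,
        [(1, 32)] = 0,
    };

    public static string LatestPatchOf(int major, int minor)
        => $"v{major}.{minor}.{KnownPatches[(major, minor)]}";
}
```

With that table, the path for v1.30.5 → v1.32.0 resolves exactly as shown in the transcript below.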


The saga (single control plane)

[Saga]
public sealed class KubeadmUpgradeSaga
{
    private readonly IVeleroClient _velero;
    private readonly IVagrantBackend _vagrant;
    private readonly IKubectlClient _kubectl;   // interface name illustrative

    public KubeadmUpgradeSaga(IVeleroClient velero, IVagrantBackend vagrant, IKubectlClient kubectl)
        => (_velero, _vagrant, _kubectl) = (velero, vagrant, kubectl);
    [SagaStep(Order = 1, Compensation = nameof(NothingToCompensate))]
    public async Task<Result> BackupBeforeUpgrade(KubeadmUpgradeContext ctx, CancellationToken ct)
    {
        var result = await _velero.RunBackupAsync($"pre-upgrade-{ctx.TargetVersion}", new[] { "kube-system" }, ct);
        if (result.IsSuccess) ctx.PreUpgradeBackupId = result.Value;
        return result.Map();
    }

    [SagaStep(Order = 2, Compensation = nameof(RestoreFromBackup))]
    public async Task<Result> CheckUpgradePlan(KubeadmUpgradeContext ctx, CancellationToken ct)
    {
        var planResult = await _vagrant.SshCommandAsync(
            ctx.FirstControlPlaneNode,
            $"sudo kubeadm upgrade plan {ctx.TargetVersion}",
            ct);

        // The plan output tells us if the upgrade is even valid
        if (planResult.IsFailure) return planResult.Map();
        if (!planResult.Value.StdOut.Contains("[upgrade/versions] Latest version"))
            return Result.Failure("kubeadm upgrade plan returned unexpected output");
        return Result.Success();
    }

    [SagaStep(Order = 3, Compensation = nameof(RestoreFromBackup))]
    public async Task<Result> UpgradeFirstControlPlane(KubeadmUpgradeContext ctx, CancellationToken ct)
    {
        // 1. Pin and install the new kubeadm package on the first cp node
        var pkgResult = await _vagrant.SshCommandAsync(
            ctx.FirstControlPlaneNode,
            $"sudo apk add --no-cache kubeadm={ctx.TargetVersion.TrimStart('v')}-r0",
            ct);
        if (pkgResult.IsFailure) return pkgResult.Map();

        // 2. Run kubeadm upgrade apply
        var applyResult = await _vagrant.SshCommandAsync(
            ctx.FirstControlPlaneNode,
            $"sudo kubeadm upgrade apply -y {ctx.TargetVersion}",
            ct);
        return applyResult.Map();
    }

    [SagaStep(Order = 4, Compensation = nameof(NothingToCompensate))]
    public async Task<Result> DrainControlPlaneNode(KubeadmUpgradeContext ctx, CancellationToken ct)
    {
        return await _kubectl.DrainAsync(
            ctx.FirstControlPlaneNode,
            ignoreDaemonsets: true,
            deleteEmptyDirData: true,
            force: true,
            ct).Map();
    }

    [SagaStep(Order = 5, Compensation = nameof(NothingToCompensate))]
    public async Task<Result> UpgradeKubeletAndKubectl(KubeadmUpgradeContext ctx, CancellationToken ct)
    {
        return await _vagrant.SshCommandAsync(
            ctx.FirstControlPlaneNode,
            $"sudo apk add --no-cache kubelet={ctx.TargetVersion.TrimStart('v')}-r0 kubectl={ctx.TargetVersion.TrimStart('v')}-r0 && " +
            $"sudo service kubelet restart",
            ct).Map();
    }

    [SagaStep(Order = 6, Compensation = nameof(NothingToCompensate))]
    public async Task<Result> UncordonControlPlaneNode(KubeadmUpgradeContext ctx, CancellationToken ct)
    {
        return await _kubectl.UncordonAsync(ctx.FirstControlPlaneNode, ct).Map();
    }

    // Subsequent steps repeat the drain → upgrade → restart → uncordon sequence
    // for each additional control plane (in HA topology), then for each worker

    public async Task<Result> RestoreFromBackup(KubeadmUpgradeContext ctx, CancellationToken ct)
    {
        if (ctx.PreUpgradeBackupId is null) return Result.Success();
        return await _velero.RestoreAsync(ctx.PreUpgradeBackupId, ct).Map();
    }

    public Task<Result> NothingToCompensate(KubeadmUpgradeContext ctx, CancellationToken ct)
        => Task.FromResult(Result.Success());
}
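The KubeadmUpgradeContext the steps thread through is never shown. A minimal sketch — only TargetVersion, FirstControlPlaneNode, and PreUpgradeBackupId appear in the steps above; the node-list properties are assumptions for the HA case:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the context object passed to every saga step. The two node-list
// properties are hypothetical; the other three are used by the steps above.
public sealed class KubeadmUpgradeContext
{
    public required string TargetVersion { get; init; }            // e.g. "v1.32.0"
    public required string FirstControlPlaneNode { get; init; }    // e.g. "cp-1"
    public IReadOnlyList<string> AdditionalControlPlaneNodes { get; init; } = Array.Empty<string>();
    public IReadOnlyList<string> WorkerNodes { get; init; } = Array.Empty<string>();

    // Set by the backup step, read by RestoreFromBackup; null means no
    // backup was taken yet, so there is nothing to restore.
    public string? PreUpgradeBackupId { get; set; }
}
```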

For an HA topology, the saga has additional steps for cp-2 and cp-3 (each one is a five-step sequence: drain, package upgrade, kubelet restart, uncordon, wait for ready). Then the same five-step sequence runs for each worker. A typical HA upgrade has ~30 saga steps total.
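The per-node expansion is a simple product of the remaining nodes and the five-step sequence. A sketch (BuildNodeSteps is illustrative, not the toolbelt's real API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical planner showing how the HA step list grows: every remaining
// node contributes the same five-step sequence, in node order.
public static class UpgradeStepPlanner
{
    private static readonly string[] PerNodeSequence =
    {
        "drain", "package upgrade", "kubelet restart", "uncordon", "wait for ready",
    };

    // Expands the remaining nodes into a flat, ordered list of step names.
    public static IReadOnlyList<string> BuildNodeSteps(IEnumerable<string> nodes)
        => nodes.SelectMany(node => PerNodeSequence.Select(step => $"{step} {node}")).ToList();
}
```

Five remaining nodes times five steps is 25; together with the six steps shown for cp-1, that is where the ~30-step figure comes from.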

The compensation steps are deliberate: most steps have NothingToCompensate because reverting them mid-upgrade is more dangerous than leaving them as-is. The real recovery path on failure is "restore from the pre-upgrade Velero backup" (the compensation on step 2 and step 3). This is the entire reason backup-before-upgrade is mandatory.


The user experience

$ homelab k8s upgrade --to v1.32.0
Current version: v1.30.5
Target version:  v1.32.0
Upgrade path:
  v1.31.4 (intermediate stop)
  v1.32.0 (target)

This is a 2-step upgrade across 6 nodes. Estimated time: ~45 minutes.
A pre-upgrade backup will be taken to MinIO before each step.

Continue? [y/N] y

[1/2] Upgrading to v1.31.4
  ✓ pre-upgrade backup (acme-pre-upgrade-v1.31.4, 1.4 GB)
  ✓ kubeadm upgrade plan v1.31.4
  ✓ kubeadm upgrade apply v1.31.4 (cp-1) — 4m22s
  ✓ drain cp-1
  ✓ kubelet upgrade on cp-1
  ✓ uncordon cp-1
  ✓ drain cp-2
  ✓ kubelet upgrade on cp-2
  ✓ uncordon cp-2
  ✓ drain cp-3
  ✓ kubelet upgrade on cp-3
  ✓ uncordon cp-3
  ✓ drain w-1
  ✓ kubelet upgrade on w-1
  ✓ uncordon w-1
  ✓ drain w-2
  ✓ kubelet upgrade on w-2
  ✓ uncordon w-2
  ✓ drain w-3
  ✓ kubelet upgrade on w-3
  ✓ uncordon w-3
  ✓ all nodes at v1.31.4

[2/2] Upgrading to v1.32.0
  ... same flow ...

✓ cluster upgrade complete: all nodes at v1.32.0
  total elapsed: 41m18s
  pre-upgrade backups: acme-pre-upgrade-v1.31.4, acme-pre-upgrade-v1.32.0

The whole flow is one verb. The user types homelab k8s upgrade --to v1.32.0, walks away for 45 minutes, comes back to either a successfully upgraded cluster or a clean rollback to v1.30.5.


What happens on failure

If the uncordon of cp-1 succeeds but the subsequent drain of cp-2 fails because cp-2 is unresponsive, the saga:

  1. Stops at the failure
  2. Calls the compensation chain in reverse order from the failed step backward
  3. Most compensations are NothingToCompensate (the upgrade-in-place is hard to undo)
  4. The compensation registered on the kubeadm upgrade apply step is RestoreFromBackup, which restores from pre-upgrade-v1.31.4
  5. The cluster ends up in its pre-upgrade state, all nodes at v1.30.5

The user is alerted via the event bus. The error message identifies the failing step and the exit code from kubeadm. The user can inspect cp-2, fix the underlying problem (often a stuck pod that refused to drain), and re-run homelab k8s upgrade --to v1.32.0.
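The rollback walk itself is simple in principle. A hedged sketch of what the saga runner does on failure — the runner's real internals are not shown in this post, so the shape here is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative core of a saga rollback: compensate completed steps in
// reverse order, so the most recently completed step is undone first.
public static class SagaRollback
{
    public static async Task RollbackAsync(IReadOnlyList<Func<Task>> completedCompensations)
    {
        for (var i = completedCompensations.Count - 1; i >= 0; i--)
            await completedCompensations[i]();
    }
}
```

Because most compensations are no-ops, the walk is fast; the only expensive one is the Velero restore, and it only runs when a backup id was actually recorded.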


What this gives you that hand-rolled upgrades don't

Hand-rolled is a wiki page with 17 steps that the on-call engineer copies and pastes at 3 AM. The hand-rolled flow has no compensation. If step 11 fails, the engineer looks at the wiki for the rollback procedure, which usually says "restore from backup", which usually means the engineer scrambles to find a backup that may or may not exist.

The KubeadmUpgradeSaga gives you, for the same surface area:

  • One verb to upgrade
  • Mandatory backup before each step
  • A typed path resolver that respects required intermediate versions
  • Compensation that restores from backup on failure
  • Tests that exercise the saga against fake IVagrantBackend + fake IVeleroClient
  • Same flow in dev as in prod (because the saga IS the production runbook)

The bargain pays for itself the first time you upgrade your dev cluster and hit a snag: the saga rolls back, you fix the snag, and you run the same upgrade in production knowing exactly what to expect.

