
Part 23: k8s-multi — One Control Plane and Three Workers

"The smallest topology that exercises real multi-node Kubernetes. Three workers because Longhorn wants three. One control plane because nothing is HA below this point."


Why

k8s-multi is the realistic topology. Four VMs and ~32 GB of RAM buy everything that k8s-single cannot exercise:

  • Real Longhorn with three-replica volumes
  • Real Cilium with cross-node CNI traffic
  • Real anti-affinity so pods spread across workers
  • Real rolling deploys, with at least two pods of every ReplicaSet serving at all times
  • Real PodDisruptionBudgets that block draining a worker when it would violate availability
  • Real kubectl drain exercises when you want to test what happens

The thesis: k8s-multi is the default for HomeLab K8s, the topology where workloads actually exercise the same code paths as production. It costs 32 GB of RAM and takes about 8 minutes to bootstrap.


The configuration

# config-homelab.yaml
name: dev
topology: multi             # the HomeLab topology — four VMs in different roles
k8s:
  distribution: kubeadm     # we want real kubeadm for this topology
  topology: k8s-multi
  version: "v1.31.4"
  cni: cilium
  csi: longhorn
  ingress: nginx
vos:
  cpus: 12                  # total across all VMs (the resolver allocates)
  memory: 28672             # 28 GB of VM RAM (rest is host overhead)
  subnet: "192.168.62"
acme:
  name: dev
  tld: lab

The resolver from Part 08 splits this into:

VM         Role            vCPUs   RAM    IP
dev-cp-1   control-plane   2       4 GB   192.168.62.10
dev-w-1    worker          2       8 GB   192.168.62.21
dev-w-2    worker          2       8 GB   192.168.62.22
dev-w-3    worker          2       8 GB   192.168.62.23

Add host OS overhead (~3 GB) and the total comes to ~32 GB. The control plane is small (4 GB) because it does not run workloads (the taint stays on). The workers get 8 GB each because they run everything else.


The boot sequence

$ homelab init --name dev
$ cd dev
$ vim config-homelab.yaml          # set topology: multi, k8s.topology: k8s-multi
$ homelab packer build             # build alpine-3.21-k8snode-kubeadm box
$ homelab box add --local
$ homelab vos init
$ homelab vos up                   # boot 4 VMs in parallel (~3 min)
$ homelab dns add api.dev.lab 192.168.62.21
$ homelab tls init && homelab tls install && homelab tls trust
$ homelab k8s create               # bootstrap the control plane (~3 min)
$ homelab k8s node add dev-w-1     # join worker 1
$ homelab k8s node add dev-w-2     # join worker 2
$ homelab k8s node add dev-w-3     # join worker 3
$ homelab k8s apply                # install Cilium, Longhorn, ingress, ...
$ homelab k8s use-context dev
$ kubectl get nodes
NAME       STATUS   ROLES           AGE     VERSION
dev-cp-1   Ready    control-plane   8m25s   v1.31.4
dev-w-1    Ready    <none>          5m12s   v1.31.4
dev-w-2    Ready    <none>          4m33s   v1.31.4
dev-w-3    Ready    <none>          3m51s   v1.31.4

In practice, homelab k8s create does all of this in one command: it calls the topology resolver, brings up the VMs, runs the bootstrap saga for the control plane, runs the join saga for the workers in parallel, and then runs the apply step. The sequence above is split out to show the constituent parts.

End to end: about 18 minutes the first time, ~10 minutes thereafter.


What k8s-multi enables

The five things k8s-single cannot do, all of which k8s-multi does:

1. Longhorn with three replicas

Longhorn's default is three replicas. On k8s-single we drop to one (no other nodes to replicate to). On k8s-multi we get the real three. A PersistentVolumeClaim of size 10Gi creates a Longhorn volume with three replicas, one on each worker. If you vagrant halt dev-w-2, the volume keeps serving from dev-w-1 and dev-w-3 while Longhorn marks the missing replica as degraded; when you bring dev-w-2 back, Longhorn rebuilds it automatically.
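A minimal PVC sketch that triggers this behavior, assuming Longhorn's StorageClass is installed under its default name longhorn (the claim name is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce       # Longhorn volumes are block devices, attached to one node
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi       # becomes a Longhorn volume with three replicas
```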

2. Anti-affinity

A workload that requests podAntiAffinity with topologyKey: kubernetes.io/hostname and replicas: 3 lands one pod per worker. On k8s-single the affinity has no effect (everything is on one node). On k8s-multi you can verify the spread with kubectl get pods -o wide.
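Such a spec can be sketched as a Deployment fragment along these lines (the app: web label is a hypothetical example; any consistent pod label works):

```yaml
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAntiAffinity:
          # "required" rather than "preferred": the scheduler refuses to
          # co-locate two of these pods on the same hostname
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web              # assumes pods are labeled app: web
              topologyKey: kubernetes.io/hostname
```

With three workers and three replicas, each pod lands on a different node; a fourth replica would stay Pending.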

3. Real rolling deploys

A Deployment with strategy: RollingUpdate and replicas: 3 rolls one pod at a time. While the rolling update is in progress, two pods are still serving traffic on different workers. The Service round-robins between them. The user actually sees the rolling update in real time.
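A strategy block matching that one-at-a-time behavior might look like this (the explicit values are illustrative choices, not the tool's defaults):

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod down, so two keep serving
      maxSurge: 0         # never run more than three pods during the roll
```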

4. PodDisruptionBudgets

A PodDisruptionBudget with minAvailable: 2 for a deployment with replicas: 3 allows at most one pod to be down at a time. kubectl drain dev-w-2 succeeds (one pod can move). Trying to drain dev-w-3 before dev-w-2's pod has rescheduled fails with "would violate the pod's disruption budget". (kubectl drain --disable-eviction sidesteps the budget by deleting pods directly instead of evicting them; worth knowing, rarely worth using.) This is the first time most developers see PDBs in action, and the lesson is permanent.
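A sketch of such a budget, assuming the deployment's pods carry a hypothetical app: web label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb           # hypothetical name
spec:
  minAvailable: 2         # with replicas: 3, at most one voluntary disruption
  selector:
    matchLabels:
      app: web            # must match the deployment's pod labels
```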

5. Realistic Cilium

Cilium running on three workers exercises BPF programs across nodes, the inter-node tunnel (VXLAN by default), and the kube-proxy replacement. On k8s-single all traffic stays on one node so the inter-node BPF code path never runs. On k8s-multi it does, and bugs in the Cilium configuration (e.g. wrong tunnel mode, MTU mismatch) surface immediately.
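For reference, the knobs involved look roughly like this as Cilium Helm values. Treat this as a hedged sketch: key names have changed across chart versions (older charts use tunnel: vxlan instead of routingMode/tunnelProtocol), so check the values reference for your release:

```yaml
routingMode: tunnel        # inter-node traffic is encapsulated
tunnelProtocol: vxlan      # the VXLAN default mentioned above
kubeProxyReplacement: true # eBPF service handling instead of kube-proxy
MTU: 1450                  # leave room for the VXLAN header; a mismatch breaks cross-node traffic
```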


What 32 GB does NOT cover

The known limitations of k8s-multi:

  • No control plane HA. One control plane, one etcd. If dev-cp-1 goes down, the cluster loses its API server. Workloads keep running (the kubelets are autonomous) but you cannot deploy anything new.
  • No kubeadm upgrade of HA. The upgrade flow on a single control plane is straightforward; the HA flow (drain-upgrade-uncordon for each control plane) is not exercised.
  • No realistic etcd quorum scenarios. etcd is single-node, so quorum loss is not a thing.

For these, k8s-ha is the topology to use.


Why this is the default

The default config field for K8s.Dsl is topology: k8s-multi. The reasons:

  1. It is the realistic topology. Most production clusters have at least three workers; matching that in dev catches the most bugs.
  2. It fits in 32 GB. A 32 GB workstation can run it; a 64 GB workstation can run it alongside browser, IDE, and host services with comfortable headroom.
  3. It is the right starting point. Users who need less can drop to k8s-single. Users who need more can move to k8s-ha. Most users land at multi and stay there.

What this gives you that k8s-single doesn't

Five capabilities (Longhorn replication, anti-affinity, rolling deploys, PDBs, real Cilium) that k8s-single simply cannot exercise. The cost is double the RAM (32 GB instead of 16 GB) and nearly double the boot time (10 minutes instead of 6). The benefit is that every multi-node bug class becomes testable.

The bargain pays back the first time you write a workload that seems to work in k8s-single but breaks in k8s-multi because the anti-affinity rule was wrong, or the RWO PVC fails to attach during a rolling deploy.

