# Part 23: k8s-multi — One Control Plane and Three Workers
"The smallest topology that exercises real multi-node Kubernetes. Three workers because Longhorn wants three. One control plane because nothing is HA below this point."
## Why
k8s-multi is the realistic topology: four VMs and ~32 GB of RAM, exercising everything that k8s-single cannot:
- Real Longhorn with three-replica volumes
- Real Cilium with cross-node CNI traffic
- Real anti-affinity so pods spread across workers
- Real rolling deploys with at least two pods of every replica set
- Real PodDisruptionBudgets that prevent draining a worker if it would violate availability
- Real `kubectl drain` exercises when you want to test what happens
The thesis: k8s-multi is the default for HomeLab K8s, the topology where workloads actually exercise the same code paths as production. It costs 32 GB of RAM and takes about 8 minutes to bootstrap.
## The configuration
```yaml
# config-homelab.yaml
name: dev
topology: multi            # the HomeLab topology — four VMs in different roles
k8s:
  distribution: kubeadm    # we want real kubeadm for this topology
  topology: k8s-multi
  version: "v1.31.4"
  cni: cilium
  csi: longhorn
  ingress: nginx
vos:
  cpus: 12                 # total across all VMs (the resolver allocates)
  memory: 28672            # 28 GB of VM RAM (rest is host overhead)
  subnet: "192.168.62"
acme:
  name: dev
  tld: lab
```

The resolver from Part 08 splits this into:
| VM | Role | vCPUs | RAM | IP |
|---|---|---|---|---|
| dev-cp-1 | control-plane | 2 | 4 GB | 192.168.62.10 |
| dev-w-1 | worker | 2 | 8 GB | 192.168.62.21 |
| dev-w-2 | worker | 2 | 8 GB | 192.168.62.22 |
| dev-w-3 | worker | 2 | 8 GB | 192.168.62.23 |
Plus host OS overhead (~3 GB), total is ~32 GB. The control plane is small (4 GB) because it does not run workloads — the taint stays on. The workers are 8 GB each because they run everything else.
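A quick check of the arithmetic, using the per-VM numbers from the table (illustrative; the resolver's actual accounting may differ):

```shell
# Sanity-check the RAM budget against the table above
cp_mb=4096; worker_mb=8192
vm_total_mb=$(( cp_mb + 3 * worker_mb ))   # 28672 MB = vos.memory
host_overhead_mb=3072                      # ~3 GB for the host OS
echo "$(( (vm_total_mb + host_overhead_mb) / 1024 )) GB total"
```

That lands just over 31 GB, which is the "~32 GB" figure quoted above.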
## The boot sequence
```shell
$ homelab init --name dev
$ cd dev
$ vim config-homelab.yaml       # set topology: multi, k8s.topology: k8s-multi
$ homelab packer build          # build alpine-3.21-k8snode-kubeadm box
$ homelab box add --local
$ homelab vos init
$ homelab vos up                # boot 4 VMs in parallel (~3 min)
$ homelab dns add api.dev.lab 192.168.62.21
$ homelab tls init && homelab tls install && homelab tls trust
$ homelab k8s create            # bootstrap the control plane (~3 min)
$ homelab k8s node add dev-w-1  # join worker 1
$ homelab k8s node add dev-w-2  # join worker 2
$ homelab k8s node add dev-w-3  # join worker 3
$ homelab k8s apply             # install Cilium, Longhorn, ingress, ...
$ homelab k8s use-context dev
$ kubectl get nodes
NAME       STATUS   ROLES           AGE     VERSION
dev-cp-1   Ready    control-plane   8m25s   v1.31.4
dev-w-1    Ready    <none>          5m12s   v1.31.4
dev-w-2    Ready    <none>          4m33s   v1.31.4
dev-w-3   Ready    <none>          3m51s   v1.31.4
```

The `homelab k8s create` verb actually does all of this in one command — it calls the topology resolver, brings up the VMs, runs the bootstrap saga for the control plane, runs the join saga for the workers in parallel, and then runs the apply step. The split above is to show the constituent parts.
End to end: about 18 minutes the first time, ~10 minutes thereafter.
## What k8s-multi enables
The five things k8s-single cannot do, all of which k8s-multi does:
### 1. Longhorn with three replicas
Longhorn's default is three replicas. On k8s-single we drop to one (no other nodes to replicate to). On k8s-multi we get the real three. A PersistentVolumeClaim of size `10Gi` creates a Longhorn volume with three replicas, one on each worker. If you `vagrant halt dev-w-2`, the volume keeps serving from `dev-w-1` and `dev-w-3` and Longhorn marks the volume as degraded; when you bring `dev-w-2` back, Longhorn syncs the missing data automatically.
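A minimal claim that exercises this, assuming Longhorn's StorageClass is installed under its default name `longhorn` (the claim name is illustrative):

```yaml
# Illustrative PVC; assumes the default StorageClass name "longhorn"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data          # illustrative name
spec:
  accessModes:
    - ReadWriteOnce   # Longhorn volumes are RWO by default
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi   # backed by three replicas, one per worker
```

After it binds, the Longhorn UI (or the `replicas.longhorn.io` resources in `longhorn-system`) shows one replica per worker.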
### 2. Anti-affinity
A workload that requests `podAntiAffinity` with `topologyKey: kubernetes.io/hostname` and `replicas: 3` lands one pod per worker. On k8s-single the affinity has no effect (everything is on one node). On k8s-multi you can verify the spread with `kubectl get pods -o wide`.
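The rule in question, as it would sit in a Deployment's pod template (the `app: web` label is an assumption, not from the original):

```yaml
# Pod template fragment: require one pod per node (label is illustrative)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname   # spread across hostnames
        labelSelector:
          matchLabels:
            app: web                          # assumed pod label
```

With `required...` semantics and three workers, a fourth replica would stay Pending — a behavior you can only observe on a multi-node topology.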
### 3. Real rolling deploys
A Deployment with `strategy: RollingUpdate` and `replicas: 3` rolls one pod at a time. While the rolling update is in progress, two pods are still serving traffic on different workers, and the Service round-robins between them. You actually see the rolling update happen in real time.
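A sketch of the strategy stanza that produces this one-pod-at-a-time behavior (the surrounding Deployment spec is omitted; values are the common choice, not necessarily HomeLab's defaults):

```yaml
# Deployment fragment: roll one pod at a time, keep two of three serving
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod down during the roll
      maxSurge: 1         # at most one extra pod created during the roll
```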
### 4. PodDisruptionBudgets
A PodDisruptionBudget with `minAvailable: 2` for a deployment with `replicas: 3` allows at most one pod to be down at a time. `kubectl drain dev-w-2` succeeds (one pod can move). `kubectl drain dev-w-2 --disable-eviction` followed by an attempt to drain `dev-w-3` while `dev-w-2` is still draining fails with "would violate disruption budget". This is the first time most developers see PDBs in action — and the lesson is permanent.
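The budget described above, assuming the pods carry an illustrative `app: web` label:

```yaml
# PDB: evictions may never leave fewer than two pods available
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb       # illustrative name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web        # must match the deployment's pod labels
```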
### 5. Realistic Cilium
Cilium running on three workers exercises BPF programs across nodes, the inter-node tunnel (VXLAN by default), and the kube-proxy replacement. On k8s-single all traffic stays on one node, so the inter-node BPF code path never runs. On k8s-multi it does, and bugs in the Cilium configuration (e.g. wrong tunnel mode, MTU mismatch) surface immediately.
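The knobs involved, sketched as Cilium Helm values — key names differ across chart versions, so treat this as illustrative rather than the chart's exact schema:

```yaml
# Illustrative Cilium Helm values (key names vary by chart version)
tunnelProtocol: vxlan        # the default inter-node tunnel
kubeProxyReplacement: true   # serve Services via eBPF instead of kube-proxy
MTU: 0                       # 0 = auto-detect; a mismatch shows up as cross-node packet loss
```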
## What 32 GB does NOT cover
The known limitations of k8s-multi:
- No control plane HA. One control plane, one etcd. If `dev-cp-1` goes down, the cluster loses its API server. Workloads keep running (the kubelets are autonomous) but you cannot deploy anything new.
- No `kubeadm upgrade` of HA. The upgrade flow on a single control plane is straightforward; the HA flow (drain-upgrade-uncordon for each control plane) is not exercised.
- No realistic etcd quorum scenarios. etcd is single-node, so quorum loss is not a thing.
For these, k8s-ha is the topology to use.
## Why this is the default
The default for the `K8s.Dsl` config field is `topology: k8s-multi`. The reasons:
- It is the realistic topology. Most production clusters have at least three workers; matching that in dev catches the most bugs.
- It fits in 32 GB. A 32 GB workstation can run it; a 64 GB workstation can run it alongside browser, IDE, and host services with comfortable headroom.
- It is the right starting point. Users who need less can drop to `k8s-single`. Users who need more can move to `k8s-ha`. Most users land at multi and stay there.
## What this gives you that k8s-single doesn't
Five capabilities (Longhorn replication, anti-affinity, rolling deploys, PDBs, real Cilium) that k8s-single simply cannot exercise. The cost is double the RAM (32 GB instead of 16 GB) and double the boot time (10 minutes instead of 6 minutes). The benefit is every multi-node bug class becomes testable.
The bargain pays back the first time you write a workload that seems to work in k8s-single but breaks in k8s-multi because the anti-affinity rule was wrong, or the RWO PVC fails to attach during a rolling deploy.