Part 01: The Problem — Why Every Dev Rebuilds the Same Homelab
"I just want a GitLab on my laptop." — every developer, the day before they spend three weekends on it.
The honest starting point
I want a homelab.
Not a Kubernetes cluster on a Raspberry Pi farm. Not a multi-region Terraform monstrosity. Not a NAS with ten Docker containers I forgot the password for. I just want a local GitLab that hosts my code, runs my CI, ships my NuGet packages, serves my docs, and survives a reboot. I want to type one command, walk away, and come back to a working environment. I want the same setup on my laptop, on my desktop, and on the colleague's machine I onboarded last week.
That is the promise of a homelab. And every developer who has ever pursued that promise has discovered the same thing: it does not exist as a product. It exists as a pile.
The pile has a recognisable shape. There is a Packer directory with two or three .pkr.hcl files (or, if you started in 2016, a few .json templates that the Packer binary still grudgingly accepts). There is a Vagrantfile next to a vagrant.yaml, glued together by an inline Ruby block that everyone is afraid to touch. There is a docker-compose.yaml (or three) with a comment at the top that says # v2.4 — DO NOT EDIT. There is a traefik.yml and a dynamic.yml and a directory full of .crt and .key files whose origin nobody remembers. There is a gitlab.rb from the Omnibus install that someone tweaked once, in 2022. And tying it all together, there are bash scripts. Not one. Several. With names like setup.sh, bootstrap.sh, provision-gitlab.sh, init-runners.sh, and the inevitable fix-everything.sh that nobody admits to writing.
This is not a strawman. Open the personal infrastructure repo of any senior backend developer who has ever wanted to host their own GitLab and you will find some version of this pile. I have written six of them over the last decade. Each rewrite started with "this time I'll do it properly" and ended with the same pile, slightly differently arranged.
The pile is the disease. This series is the cure.
The pile, anatomized
Let us be specific. Here is the file inventory of the median "I have a homelab" repository. If your repo has fewer files, you have not finished. If it has more, you are further down the same road.
```
my-homelab/
├── README.md                          # last updated 18 months ago
├── packer/
│   ├── alpine-3.18.json               # legacy, kept "just in case"
│   ├── alpine-3.21.pkr.hcl
│   ├── http/
│   │   └── answers                    # autoinstall answer file
│   ├── scripts/
│   │   ├── install-docker.sh
│   │   ├── enable-docker-tcp.sh       # opens 2375, no TLS, "TODO"
│   │   └── cleanup.sh
│   └── output-vagrant/
│       └── package.box                # 1.4 GB, gitignored
├── vagrant/
│   ├── Vagrantfile                    # 240 lines of Ruby
│   ├── vagrant.yaml                   # data file the Vagrantfile reads
│   └── .vagrant/                      # gitignored
├── compose/
│   ├── docker-compose.yaml            # gitlab + runner + traefik + postgres
│   ├── docker-compose.override.yaml   # local secrets, gitignored
│   ├── traefik/
│   │   ├── traefik.yml                # static config
│   │   ├── dynamic.yml                # dynamic config
│   │   └── certs/
│   │       ├── ca.crt
│   │       ├── ca.key
│   │       ├── wildcard.crt
│   │       └── wildcard.key
│   └── gitlab/
│       ├── gitlab.rb                  # tweaked in 2022, nobody remembers why
│       └── runner-config.toml.template
├── scripts/
│   ├── setup.sh                       # the one you run first
│   ├── bootstrap.sh                   # the one setup.sh calls
│   ├── provision-gitlab.sh
│   ├── init-runners.sh
│   ├── add-dns.sh                     # writes to /etc/hosts via sudo
│   ├── trust-ca.sh                    # macOS-specific, broken on Linux
│   ├── teardown.sh                    # works ~70% of the time
│   └── fix-everything.sh              # the one nobody admits to running
├── docs/
│   └── HOW-TO.md                      # contradicts setup.sh in three places
└── .env.example                       # the only file that is honest
```

That repo has, depending on how thorough the author was, between 1,500 and 4,000 lines of operationally significant text. Of those lines, fewer than 100 are typed. The rest is YAML, HCL, JSON, bash, Ruby, and free-form Markdown. There is no compiler that will tell you when one of those 4,000 lines drifts out of sync with another. There is no test that fails when the `gitlab.rb` references an `external_url` that the Traefik config no longer routes. There is no IDE that will autocomplete the volume name in the `traefik` service that the `gitlab` service expects to mount. There is no analyzer that will warn you when the cert SAN list in `wildcard.crt` no longer matches the hostnames the `dynamic.yml` is serving.
Every line is a string. Every string can drift. Every drift is a Saturday afternoon you do not get back.
A short tour of the drift
Let me walk you through the drift, in the order it actually bites.
Drift #1 — The packer image and the docker version
You build an Alpine image with Packer. The provisioning script installs Docker via `apk add docker`. You don't pin the version. Six months later you rebuild the image and get a new Docker. The `docker-compose.yaml` you wrote uses `version: "3.8"` and a `depends_on` syntax that the new Docker Compose v2 reads slightly differently. The `gitlab` service no longer waits for `postgres` to be healthy before it starts. The first start of GitLab now races the database. Half the time it works. The other half you get a 502 from the runner registration script. The fix is one line in the compose file. Finding it costs you half a day of staring at logs.
Nothing you wrote in Packer told you the Docker version. Nothing you wrote in Compose told you the Packer image version. Nothing connected the two.
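The race itself has a mechanical fix that the pile never records anywhere durable. A sketch, with credential handling omitted and service names taken from the article, of making the dependency explicit so Compose v2 waits for a passing healthcheck rather than a merely started container:

```yaml
# Sketch only: user/password wiring omitted.
services:
  postgres:
    image: postgres:15-alpine            # pin the major you actually tested
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U gitlab"]
      interval: 5s
      timeout: 3s
      retries: 10
  gitlab:
    image: gitlab/gitlab-ce:16.4.2-ce.0
    depends_on:
      postgres:
        condition: service_healthy       # wait for healthy, not merely started
```

Pinning the Docker version in the Packer provisioning script closes the other half of the gap, but nothing reminds you to keep the two pins in step; that is exactly the missing connection.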
Drift #2 — The Vagrant box name and the registry path
You build the box with `packer build`. It gets dropped at `packer/output-vagrant/package.box`. You add it to Vagrant locally with `vagrant box add my-homelab/alpine-3.21-dockerhost packer/output-vagrant/package.box`. The Vagrantfile references it as `config.vm.box = "my-homelab/alpine-3.21-dockerhost"`. Six weeks later you publish it to a self-hosted registry as `frenchexdev/alpine-3.21-dockerhost-v1.0.2`. You forget to update the Vagrantfile. The next `vagrant up` on a fresh machine tries to pull the old box from the public Vagrant Cloud — except there isn't one — and fails with a 404. You discover that the Vagrantfile has three different box references, two of which are stale.
Nothing you wrote in the Packer post-processor told the Vagrantfile what name to use. Nothing you wrote in the registry catalog told the Vagrantfile what version to expect.
Drift #3 — The compose volume name and the Traefik file provider
You add a new service to `docker-compose.yaml`. It needs to read certs. You mount a named volume called `traefik-certs`. The Traefik file provider in `dynamic.yml` references `/etc/traefik/certs/wildcard.crt`. Inside Traefik's container, the volume is mounted at `/etc/traefik/certs`. So the mount path matches and the cert is found. So far so good.

Now you rename the volume to `tls-certs` because you decided to use it for things other than Traefik. You update the `traefik` service. You forget the `gitlab` service that also mounted it (because GitLab needs the same wildcard cert for its built-in Nginx). The next `compose up` starts GitLab with no cert. GitLab serves HTTP on port 80, Traefik tries to forward HTTPS to port 443, and you get a redirect loop. The fix is one line. Finding it costs you 90 minutes of `curl` with `-v`.
Nothing in the Compose file checks that the mount paths in two different services are consistent. Nothing in Traefik checks that the file referenced in the dynamic config actually exists in the volume.
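Nothing ships this check, but it is small enough to write yourself. A sketch (the function name is hypothetical) that takes the already-parsed Compose mapping — parsing the YAML itself is out of scope here, since the Python standard library has no YAML parser — and complains about named volumes that are mounted but never declared, or declared but never mounted:

```python
def check_volume_consistency(compose: dict) -> list[str]:
    """Return drift complaints about named-volume usage in a Compose mapping."""
    problems: list[str] = []
    declared = set(compose.get("volumes", {}) or {})
    mounts: dict[str, dict[str, str]] = {}  # volume -> {service: container path}
    for svc_name, svc in (compose.get("services", {}) or {}).items():
        for spec in svc.get("volumes", []) or []:
            if not isinstance(spec, str):
                continue  # long-form mount syntax: out of scope for this sketch
            parts = spec.split(":")
            if len(parts) < 2 or parts[0].startswith(("/", ".", "~")):
                continue  # bind mount, not a named volume
            mounts.setdefault(parts[0], {})[svc_name] = parts[1]
    for vol in mounts:
        if vol not in declared:
            problems.append(f"volume '{vol}' is mounted but never declared")
    for vol in declared - set(mounts):
        problems.append(f"volume '{vol}' is declared but never mounted")
    return problems
```

Run against the rename scenario above — `traefik` updated to `tls-certs`, `gitlab` forgotten — it flags the stale reference in seconds instead of 90 minutes.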
Drift #4 — The gitlab.rb external_url and the DNS entry
You set `external_url 'https://gitlab.frenchexdev.lab'` in `gitlab.rb`. You add `192.168.56.10 gitlab.frenchexdev.lab` to your hosts file. You set up Traefik to route on a `Host` rule for `gitlab.frenchexdev.lab`. You generate a wildcard cert for `*.frenchexdev.lab`. Everything works.
Six months later, you decide to switch from a single-VM topology to a multi-VM topology. The gateway VM gets 192.168.56.10. The platform VM (where GitLab now lives) gets 192.168.56.11. You update Traefik. You update the DNS entry to point at .10 (the gateway). You forget that the `gitlab.rb` `external_url` is the URL GitLab thinks it is at — and it is still correct. But you also forget that CI clones use the runner's `clone_url` (a per-runner setting in its `config.toml`), which you never set, so it falls back to a guess based on the hostname Docker assigned to the GitLab container. The runner registers, but every CI job tries to clone from `http://abc123def456:80/...` which does not resolve from inside the runner container. CI jobs fail with `fatal: unable to access`. You spend three hours convinced it is a network problem.
Nothing in the GitLab config knows about the DNS topology. Nothing in the DNS knows about the GitLab clone URL. Nothing in the runner knows what the GitLab thinks its URL is.
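This particular failure has a one-line cure, once you know the knob exists: GitLab Runner's `config.toml` accepts a `clone_url` per runner that overrides the guessed address. A sketch, with the hostnames from the article and a placeholder token:

```toml
# Runner config.toml sketch; TOKEN is a placeholder.
[[runners]]
  name = "platform-runner"
  url = "https://gitlab.frenchexdev.lab"        # where the runner phones home
  clone_url = "https://gitlab.frenchexdev.lab"  # what CI jobs use for git clone
  token = "TOKEN"
  executor = "docker"
```

The cure costs one line; the disease is that nothing ever told you the line was needed.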
Drift #5 — The cert expiry and the calendar reminder
You generate a self-signed wildcard cert for `*.frenchexdev.lab` with a 365-day validity. You add a calendar reminder for "renew homelab cert". The calendar reminder fires 12 months later. You ignore it because the lab is working. The cert expires three days later. Every browser refuses to load `gitlab.frenchexdev.lab`. Every `git push` fails with a TLS error. Every CI job that pulls from `baget.frenchexdev.lab` fails. You spend the morning regenerating certs, distributing them, and restarting Traefik. You make a note to set the next cert to 10 years. You make the same note next year.
Nothing in the cert tells the lab when it expires. Nothing in the lab fails loudly enough, early enough, to warn you. Monitoring would catch it — if you had set up monitoring.
Drift #6 — The bash scripts and reality
The `setup.sh` script begins:

```bash
#!/usr/bin/env bash
set -euo pipefail
cd "$(dirname "$0")/.."
./scripts/bootstrap.sh
./scripts/provision-gitlab.sh
./scripts/init-runners.sh
./scripts/add-dns.sh gitlab.frenchexdev.lab 192.168.56.10
./scripts/trust-ca.sh
echo "Done. Open https://gitlab.frenchexdev.lab in your browser."
```

That script has not been run end-to-end on a fresh machine in 14 months. It still works on the author's laptop, because the author's laptop has all the side effects of every previous attempt baked in. On a fresh machine — say, a colleague's brand-new ThinkPad — `bootstrap.sh` fails on line 47 because it assumes `mkcert` is on the PATH, and the colleague has never installed mkcert. The colleague installs mkcert. The script then fails on line 102 because `provision-gitlab.sh` assumes a specific Postgres major version. The colleague upgrades. The script fails on line 156 because `add-dns.sh` uses `sudo dscacheutil -flushcache`, which is a macOS command, and the colleague is on Linux. The colleague rewrites the line. The script fails on line 203 because `trust-ca.sh` assumes the Linux user is in the `sudo` group, and on this Ubuntu install they are not. The colleague gives up and pings the author on Slack. The author says "oh yeah, I think I fixed that on my machine". They never push the fix.
Nothing in the bash script declares its preconditions. Nothing in the bash script knows what platform it is running on. Nothing in the bash script can be unit-tested without spinning up a VM. Nothing in the bash script returns a `Result<T>` that the caller can branch on. Bash returns 0 or non-zero, and if you forget `set -e`, even that lies.
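What would "declaring preconditions" even look like? A sketch in Python rather than bash, purely because Python can return a structured result instead of an exit code; the class name and binary list are illustrative:

```python
import platform
import shutil
from dataclasses import dataclass, field

@dataclass
class Preflight:
    """Declared preconditions, checked up front, reported all at once."""
    binaries: list[str]                      # commands that must be on PATH
    platforms: list[str] = field(default_factory=lambda: ["Linux", "Darwin"])
    missing: list[str] = field(default_factory=list, init=False)

    def check(self) -> bool:
        # Collect every failure instead of crashing on the first one.
        self.missing = [b for b in self.binaries if shutil.which(b) is None]
        if platform.system() not in self.platforms:
            self.missing.append(f"platform:{platform.system()}")
        return not self.missing
```

Every unmet precondition surfaces together, before anything runs; `setup.sh` discovers the same facts one crash at a time, on someone else's laptop.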
Drift #7 — The CI runner that has no idea where it is
You finally get GitLab running. You register a runner. The runner uses the docker executor. The runner needs to talk back to GitLab. The runner is a container running on the same VM as GitLab. The runner config has `url = "http://gitlab"` because that worked when you tested it manually. Then you add a second runner on a different VM. Its config also has `url = "http://gitlab"`. The second runner cannot resolve `gitlab` because it is in a different Docker network. You change it to `http://gitlab.frenchexdev.lab`. The second runner can now resolve it (via PiHole) but the first runner now cannot, because Traefik intercepts the request and the runner does not trust the self-signed cert.
You write a bash script that templates the runner config based on the runner's location. The bash script has a hard-coded list of locations. You add a third runner. You forget to update the list. The third runner starts with the wrong URL. CI jobs fail intermittently — only on the third runner. The intermittency is the worst part: it costs you a day to even notice the pattern.
Nothing in the runner config knows where the runner is. Nothing in GitLab knows that there are runners in different network zones. Nothing in the topology declaration explains that the routing depends on the source VM.
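The hard-coded location list is the bug; the cure is deriving each runner's URL from a declared topology. A sketch with hypothetical zone names and function signature:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Runner:
    name: str
    zone: str  # which network the runner lives in

def resolve_gitlab_url(runner: Runner, gitlab_zone: str,
                       internal_url: str, external_url: str) -> str:
    """Runners sharing GitLab's Docker network talk directly; others go via DNS."""
    return internal_url if runner.zone == gitlab_zone else external_url
```

Adding a fourth runner means adding one declaration, not remembering to edit a templating script's hard-coded list.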
Counting the cost
Let us make this concrete. Take a developer who maintains a personal homelab — call her Alice. Alice is a competent senior backend engineer. She has been doing this for years.
Over an average year, here is what Alice spends on her homelab, conservatively:
| Activity | Hours per year | Notes |
|---|---|---|
| Initial setup of a new lab on a new machine | 20 | Once per year, on average; includes the inevitable "fix-everything" sessions |
| Renewing certs | 4 | Two cycles a year, two hours each, including the panic |
| Recovering from a Docker / Compose / Vagrant version bump | 8 | Two events a year, four hours each |
| Recovering from a GitLab Omnibus upgrade | 6 | One major upgrade a year, six hours of grief |
| Onboarding a colleague to the same lab setup | 8 | One colleague a year, eight hours of pair-debugging |
| Saturday-afternoon "why is this broken now" sessions | 16 | Four sessions a year, four hours each |
| Writing the bash script that "fixes everything for next time" | 6 | Three sessions a year, two hours each, never finished |
| Total | 68 | Almost two work-weeks |
Two work-weeks per year. Per developer. On a homelab. That nobody else uses. To do a thing — host a GitLab — that GitLab.com offers for free.
This is not a rational allocation of attention. It is a series of small individual mistakes, each of which felt necessary at the time, accumulating into a pile of work that has no compile-time check, no test, and no one to ask when it breaks. The pile is the disease. Two work-weeks a year is the symptom.
Why YAML didn't save us
The honest objection is: "YAML solved this. Look at Kubernetes. Look at Helm. Look at GitOps. Look at Docker Compose. We have declarative infrastructure. We don't need to write bash anymore."
YAML did not solve this. YAML moved the drift from one file format to another. Here is the proof.
YAML is untyped
When you write:
```yaml
services:
  gitlab:
    image: gitlab/gitlab-ce:16.4.2-ce.0
    volumes:
      - traefik-certs:/etc/gitlab/ssl:ro
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'https://gitlab.frenchexdev.lab'
        nginx['ssl_certificate'] = '/etc/gitlab/ssl/wildcard.crt'
        nginx['ssl_certificate_key'] = '/etc/gitlab/ssl/wildcard.key'
```

…you have written exactly four strings (`gitlab/gitlab-ce:16.4.2-ce.0`, `traefik-certs:/etc/gitlab/ssl:ro`, the embedded Ruby for Omnibus, and the volume name) that have no compile-time relationship to each other. The volume `traefik-certs` had better be defined in the top-level `volumes:` block. The mount path `/etc/gitlab/ssl` had better be one that GitLab actually uses. The Omnibus config block had better reference files that exist at the path you mounted. The image tag had better be one that exists on Docker Hub. None of these are checked. They are strings. The schema YAML provides — the docker-compose schema — checks that you used the right YAML keys, and that the values are roughly the right shape, but it cannot tell you that `wildcard.crt` exists in the `traefik-certs` volume. It cannot tell you that the paths in the embedded Ruby match the mount path in the `volumes` entry. It cannot tell you that the runner you registered four files away expects `external_url` to be a different value.
YAML is a transport format. It is good at being read by machines and bad at being typed by humans. Treating it as a programming language was a mistake. Treating it as a generated artifact, on the other hand, is the right answer — and it is the answer this series will defend, page after page.
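What "generated artifact" means in practice: the volume name exists once, as a typed object, and every reference to it is a use of that object. The emitter below is a toy sketch (a real Compose file needs far more keys), but it demonstrates the property that matters: a renamed volume cannot drift, because there is no second copy of the string to forget.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Volume:
    name: str

@dataclass
class Service:
    name: str
    image: str
    mounts: list[tuple[Volume, str]] = field(default_factory=list)  # (volume, path)

def render_compose(services: list[Service]) -> str:
    """Emit a minimal docker-compose.yaml; every mount references a Volume object."""
    lines = ["services:"]
    volumes: set[str] = set()
    for svc in services:
        lines += [f"  {svc.name}:", f"    image: {svc.image}"]
        if svc.mounts:
            lines.append("    volumes:")
            for vol, path in svc.mounts:
                lines.append(f"      - {vol.name}:{path}:ro")
                volumes.add(vol.name)
    lines.append("volumes:")
    lines += [f"  {name}:" for name in sorted(volumes)]
    return "\n".join(lines) + "\n"

certs = Volume("tls-certs")  # rename here, and every reference follows
yaml_text = render_compose([
    Service("traefik", "traefik:v3.0", [(certs, "/etc/traefik/certs")]),
    Service("gitlab", "gitlab/gitlab-ce:16.4.2-ce.0", [(certs, "/etc/gitlab/ssl")]),
])
```

Renaming `certs` to anything else regenerates a consistent file; the top-level `volumes:` block is derived from the mounts, so it can never list a volume that is not used.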
Helm templates are bash with extra steps
Helm replaces the question "is my YAML correct" with the question "is my Go template correct, and does the YAML it produces correspond to a valid manifest". The first question is unanswerable until you run helm template. The second question is unanswerable until you run kubectl apply --dry-run=server. Neither is a compile. Neither is checked when you save the file. The feedback loop is seconds longer than YAML, and the mental model is twice as complex, because now you are reasoning in two languages at once.
Helm is great for one specific job — distributing a parameterised Kubernetes app to other teams — and disastrous for the job we actually have, which is "describe the operational state of a single environment with no external consumers". Using Helm for a homelab is using a courier service to hand a note to your roommate.
Kustomize is YAML inheritance, and YAML inheritance was never a good idea
Kustomize is the most honest of the three: it admits that YAML is the format and merely tries to give you a sane way to compose files. It is also the hardest to reason about, because the result of a kustomize build depends on a tree of overlays that nobody can mentally evaluate without running the tool. The same drift we have already enumerated — between two files, between two strings, between a name and a reference — exists in Kustomize, but now it is hidden across an inheritance graph. You have replaced "find the typo in the YAML" with "find the typo in the YAML and also figure out which overlay it came from".
Terraform is for nouns, not verbs
Terraform is a different category. It types the nouns of infrastructure: VMs, networks, disks, security groups, DNS records. It gives you a great answer to "is this VM the size I asked for". It gives you no answer at all to "does this VM run the Docker version my Compose file expects" or "does this DNS record point at the IP of the load balancer my GitLab thinks it's behind". Terraform's job is provisioning. Operational state — the verbs of running a system, not the nouns of provisioning it — is not in scope, and the Terraform community is the first to tell you so.
The Ops DSL Ecosystem series makes the long version of this argument; if you have not read it, the short version is: Terraform types the nouns, Ops.Dsl types the verbs, and a homelab needs both. This series is the one that ties them together with a CLI a developer can actually run on their laptop.
What we want instead
Let us state the requirement positively. We want a tool — one tool, one CLI binary — that:
- Reads a single configuration file (`config-homelab.yaml`) that is schema-validated against types declared in C#, so VSCode autocompletes every key and warns on every typo.
- Generates every artifact: the Packer HCL, the Vagrantfile, the `vagrant.yaml`, the `docker-compose.yaml`, the Traefik static and dynamic config, the `gitlab.rb`, the runner `config.toml`, the certificate files, the DNS entries. Generated, not hand-written. If you find yourself editing one of those files by hand, the tool has failed.
- Talks to every binary it needs — `packer`, `vagrant`, `docker`, `podman`, `git`, `mkcert` — through typed wrappers generated from `--help` output, so the CLI surface is checked at compile time and exit codes turn into `Result<T>`.
- Stands the lab up via a deterministic pipeline of stages — Validate, Resolve, Plan, Generate, Apply, Verify — each of which can be tested in isolation, replaced by a plugin, or replayed against a recorded fixture.
- Publishes events at every meaningful step (`PackerBuildStarted`, `VagrantBoxAdded`, `ComposeStackDeployed`, `TlsCaGenerated`, etc.) so plugins, progress reporters, audit logs, and observability can subscribe without coupling to the pipeline.
- Lets plugins add new machine kinds, new compose services, new TLS providers, new DNS providers, new container engines, new sub-DSLs — all via NuGet, all without forking the core.
- Supports multiple instances on the same machine, so you can run a `dev`, a `prod`, and an `ha-stage` lab in parallel without collision.
- Eats its own dog food: the GitLab the tool stands up is the same GitLab that hosts the tool's source code, runs the tool's CI, and ships the tool's NuGet packages. The tool builds the lab that builds the tool.
That tool is HomeLab. It does not exist yet. This series is the design.
The shape of the answer (preview)
The rest of this series will fill in the details, but the architectural commitments are simple enough to state in one diagram:
Every box in that diagram is an interface. Every interface is registered in DI via `[Injectable]`. Every concrete implementation is replaceable. Every plugin slots into the pipeline without recompiling the core. Every event flows through the bus.
The result is a tool that is not built for this homelab — it is built for any homelab that fits the contributor model. Mine. Yours. The colleague's. The CI's ephemeral one. The HA staging instance. All of them. Same code, different config.
Why now
You might reasonably ask: "if this is so obvious, why hasn't anybody done it?" The honest answer is: people have, badly. There are projects called Devbox, Coolify, Dokku, Portainer, k3d, Rancher Desktop, Lima, Multipass, Lazydocker, and a dozen others that solve some slice of this problem. Most of them are either (a) opinionated about a specific stack you don't want, (b) tightly coupled to a specific cloud or runtime, or (c) good at one of the seven things HomeLab needs to do and bad at the other six.
What is new in 2026 is that the C# / .NET ecosystem has finally accumulated enough source-generator infrastructure, enough typed-DSL primitives, and enough binary-wrapper patterns that the meta-orchestrator can be written in a single language, with a single composition root, and with a single test harness. The pieces are all there:
- `BinaryWrapper` generates typed CLIs for `docker`, `podman`, `packer`, `vagrant`, `git`, `mkcert`
- `Builder` generates async, validated, cycle-safe config builders
- `Injectable` generates DI composition roots and decorator chains
- `FrenchExDev.Net.Dsl` provides the M3 meta-metamodel that Ops.Dsl is built on
- `Vos` already wraps Vagrant with a 28-command typed backend and a data-driven Vagrantfile
- `Packer.Bundle` already produces multi-file HCL2 via a contributor pipeline
- `DockerCompose.Bundle` already produces typed Compose YAML via builders
- `Traefik.Bundle` already produces typed static + dynamic config
- `GitLab.Ci.Yaml` already produces typed `.gitlab-ci.yml` from the official schema
- `QualityGate` already gives the dev-loop a tight feedback bar
- `Result`, `Option`, `Clock`, `Guard`, `FiniteStateMachine`, `Saga`, `Reactive` — all the cross-cutting libraries are written
The only thing missing is the glue. The thing that takes a config-homelab.yaml, runs it through the pipeline, fires the events, calls the plugins, generates every artifact, runs every binary, and verifies the result. That glue is HomeLab. The next 55 parts of this series build it.
What this part gives you that bash doesn't
Nothing yet. This part is the indictment, not the cure. The cure starts in Part 02, where we make the case that every action in HomeLab must be a CLI command — not because CLIs are fashionable, but because CLI-first equals testable-first, and the test harness for an operations tool is the most important piece of code in the project.
For now, the contribution of this part is the indictment. Seven drifts. Two work-weeks a year. Seven file formats. One bash script that nobody admits to running. The pile is real. The drift is real. The cost is real. And the next 55 parts are the receipt that proves it can be different.