
Part 46: Multi-Host Scheduling — N VMs Across M Machines

"One workstation eventually runs out of cores. Two workstations should not require two HomeLabs."


Why

DevLab's HA topology has ~10 VMs. A modest workstation (32 GB of RAM, 8 cores) can host them, but it is uncomfortable. Two workstations each with the same hardware can host them comfortably, if you can split the lab across them.

The naive approach is to run two HomeLabs, one on each workstation, under different lab names. That works for parallel labs (see Part 51) but not for one lab spread across two machines. For one lab, the VMs need to know about each other, the routing has to cross hosts, and the user wants to run a single homelab vos up and have it provision VMs on both machines.

The thesis of this part is: HomeLab supports multi-host scheduling via an IRemoteHost plugin contract, a hosts.yaml registry, and SSH-based remote Vagrant invocation. The compose contributors do not change. The Vagrant data file gains a host field per VM. The pipeline routes each VM's lifecycle commands to the right host.
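
Concretely, the generated Vos data file carries the new host field per machine. A sketch (field names other than host are illustrative; the machine shapes match the examples used later in this part):

```yaml
# config-vos.yaml fragment — each machine names the physical host it lands on
machines:
  - name: devlab-gateway
    host: laptop          # written by the placement resolver, not by hand
    cpus: 2
    memory: 1024
  - name: devlab-platform
    host: workstation
    cpus: 4
    memory: 8192
```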


The shape

public interface IRemoteHost
{
    string Name { get; }
    string Address { get; }
    string Workdir { get; }
    Task<Result<string>> RunCommandAsync(string command, string[] args, CancellationToken ct);
    Task<Result> CopyFileAsync(string localPath, string remotePath, CancellationToken ct);
    Task<Result<string>> ExecVagrantAsync(string vagrantCommand, string machineName, CancellationToken ct);
}

[Injectable(ServiceLifetime.Singleton)]
public sealed class SshRemoteHost : IRemoteHost
{
    public string Name { get; }
    public string Address { get; }
    private readonly SshClient _ssh;
    private readonly ScpClient _scp;
    private readonly string _workdir;

    public string Workdir => _workdir;

    public SshRemoteHost(string name, string address, SshClient ssh, ScpClient scp, string workdir)
    {
        Name = name;
        Address = address;
        _ssh = ssh;
        _scp = scp;
        _workdir = workdir;
    }

    public async Task<Result<string>> RunCommandAsync(string command, string[] args, CancellationToken ct)
    {
        var cmdLine = $"cd {_workdir} && {command} {string.Join(' ', args)}";
        var result = await _ssh.RunAsync(cmdLine, ct);
        return result.ExitCode == 0
            ? Result.Success(result.Stdout)
            : Result.Failure<string>($"remote {Name}: {result.Stderr}");
    }

    public async Task<Result> CopyFileAsync(string localPath, string remotePath, CancellationToken ct)
        => await _scp.CopyAsync(localPath, remotePath, ct);

    public async Task<Result<string>> ExecVagrantAsync(string vagrantCommand, string machineName, CancellationToken ct)
    {
        var args = string.IsNullOrEmpty(machineName)
            ? new[] { vagrantCommand }
            : new[] { vagrantCommand, machineName };
        var result = await _ssh.RunAsync($"cd {_workdir} && vagrant {string.Join(' ', args)}", ct);
        return result.ExitCode == 0
            ? Result.Success(result.Stdout)
            : Result.Failure<string>($"remote {Name}: {result.Stderr}");
    }
}

The SshRemoteHost wraps an SSH connection. It can run arbitrary commands on the host, copy files (the Vagrantfile, the data YAML, the secrets, the compose files, the certs), and run Vagrant. It assumes the remote host has Vagrant installed and a working hypervisor.
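
On the wire, ExecVagrantAsync amounts to a single remote command line. A sketch of how that line is assembled (illustrative; the real transport is the SSH client library, not a shell out):

```shell
workdir=/home/ops/devlab
machine=devlab-platform
# The exact string the SSH session receives for ExecVagrantAsync("up", "devlab-platform")
cmd="cd $workdir && vagrant up $machine"
echo "$cmd"
# → cd /home/ops/devlab && vagrant up devlab-platform
```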


The hosts.yaml registry

# hosts.yaml — declares the physical hosts available to this HomeLab instance
hosts:
  - name: laptop
    address: 127.0.0.1                   # the local machine, no SSH
    capacity:
      cpus: 8
      memory_mb: 32768
    placements:
      - gateway      # which VM roles can run here
      - data
      - obs

  - name: workstation
    address: workstation.local          # SSH target (resolved via mDNS or hosts file)
    ssh_user: ops
    ssh_key: ~/.ssh/id_ed25519
    workdir: /home/ops/devlab
    capacity:
      cpus: 16
      memory_mb: 65536
    placements:
      - platform                        # the heavy VM lives here
      - rails-1
      - rails-2
      - gitaly-1
      - gitaly-2
      - gitaly-3

The registry tells HomeLab which physical machines are available, what each can host, and how to reach them. The user maintains it once per HomeLab instance.
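
The single-host case degenerates to a one-entry registry, which is why the same homelab vos up works unchanged on one machine (a sketch; roles follow the example above):

```yaml
# hosts.yaml — single-host: every role placed locally
hosts:
  - name: laptop
    address: 127.0.0.1
    capacity: { cpus: 8, memory_mb: 32768 }
    placements: [gateway, data, obs, platform]
```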


Placement: which VM goes where

The topology resolver (from Part 30) gains a placement step that consults the hosts registry:

[Injectable(ServiceLifetime.Singleton)]
public sealed class HostPlacementResolver : IHostPlacementResolver
{
    private readonly IHostsRegistry _hosts;

    public IReadOnlyDictionary<string, string> Resolve(IReadOnlyList<VosMachineConfig> machines)
    {
        var placements = new Dictionary<string, string>();
        var hostsByName = _hosts.GetAll().ToDictionary(h => h.Name);
        var hostUsage = hostsByName.ToDictionary(kv => kv.Key, _ => new HostUsage());

        // First-fit decreasing: sort machines by memory descending, place each on the host
        // that (a) accepts the role and (b) still has capacity
        var ordered = machines.OrderByDescending(m => m.Memory);

        foreach (var m in ordered)
        {
            var candidates = hostsByName.Values
                .Where(h => h.Placements.Contains(m.Role))
                .Where(h => hostUsage[h.Name].CanFit(m))
                .OrderBy(h => hostUsage[h.Name].PercentUsed())  // least used first
                .ToList();

            if (candidates.Count == 0)
                throw new InvalidOperationException($"no host can place machine {m.Name} (role {m.Role}, {m.Cpus} cpu, {m.Memory} MB)");

            var selected = candidates.First();
            placements[m.Name] = selected.Name;
            hostUsage[selected.Name].Add(m);
        }

        return placements;
    }
}

The algorithm is a first-fit-decreasing variant: sort machines by RAM descending, then place each on the least-used host that accepts the role and has capacity. It is not optimal, but it is deterministic and easy to reason about.
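
A trace on a small example makes the order concrete (host a: 16384 MB, accepts gateway and data; host b: 65536 MB, accepts platform):

```
sorted by memory desc: platform (8192) → data (4096) → gateway (1024)
platform: candidates accepting "platform" = {b}  → place on b  (b now 8192/65536 MB)
data:     candidates accepting "data"     = {a}  → place on a  (a now 4096/16384 MB)
gateway:  candidates accepting "gateway"  = {a}  → place on a  (a now 5120/16384 MB)
```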

The resolver runs in the Plan stage. The output is a Dictionary<string, string> (machine name → host name) that the Apply stage uses to route Vagrant commands.


The Apply stage with multi-host

public async Task<Result<HomeLabContext>> RunAsync(HomeLabContext ctx, CancellationToken ct)
{
    foreach (var machine in ctx.Plan!.Machines)
    {
        var hostName = ctx.Plan.Placements[machine.Name];
        var host = _hosts.Get(hostName);

        // 1+2. Sync the box file (if not already present) and the working directory
        //      (Vagrantfile, config-vos.yaml, secrets, certs, compose files)
        if (host.Name != "laptop")
        {
            var boxSync = await SyncBoxIfNeededAsync(host, machine, ct);
            if (boxSync.IsFailure) return boxSync.Map<HomeLabContext>();

            var dirSync = await SyncWorkdirAsync(host, machine, ctx.Request.OutputDir, ct);
            if (dirSync.IsFailure) return dirSync.Map<HomeLabContext>();
        }

        // 3. Run vagrant up on the remote host
        var upResult = await host.ExecVagrantAsync("up", machine.Name, ct);
        if (upResult.IsFailure) return upResult.Map<HomeLabContext>();

        await _events.PublishAsync(new VosUpCompleted(machine.Name, /* duration */ default, _clock.UtcNow), ct);
    }

    return Result.Success(ctx);
}

The same vagrant up runs on whichever host the placement resolver chose. The local machine (host.Name == "laptop") skips the SSH/SCP steps; remote hosts get the working directory synced via SCP, then Vagrant runs over SSH.


Box sync

The .box file is large (1–2 GB). We do not want to SCP it on every vos up. The first run syncs it; subsequent runs check the remote box list and skip if the version is already present:

private async Task<Result> SyncBoxIfNeededAsync(IRemoteHost host, VosMachineConfig machine, CancellationToken ct)
{
    var listResult = await host.RunCommandAsync("vagrant", new[] { "box", "list" }, ct);
    if (listResult.IsFailure) return listResult.Map();

    if (listResult.Value.Contains($"{machine.Box} ({machine.Provider}, {machine.BoxVersion})"))
        return Result.Success();   // already there, skip

    // Compute the local box file path
    var localBox = $"./packer/output-vagrant/{machine.Box.Replace('/', '-')}-{machine.Provider}.box";

    // Copy the box over, then register it with Vagrant on the remote host
    var remoteBox = $"{host.Workdir}/boxes/{Path.GetFileName(localBox)}";
    var copyResult = await host.CopyFileAsync(localBox, remoteBox, ct);
    if (copyResult.IsFailure) return copyResult;

    var addResult = await host.RunCommandAsync("vagrant", new[] { "box", "add", "--name", machine.Box, "--box-version", machine.BoxVersion ?? "1.0.0", remoteBox }, ct);
    return addResult.Map();
}

For users who run a Vagrant box registry inside DevLab (which they probably do, see Part 40), the remote host can be configured to fetch from the registry instead of receiving the box via SCP — eliminating the bandwidth cost entirely.
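
The skip check in SyncBoxIfNeededAsync is a plain substring match against the vagrant box list output. A shell sketch of the same test (box name from the data file above; the provider and version values are illustrative):

```shell
# Hypothetical remote `vagrant box list` output, captured over SSH
box_list="frenchexdev/alpine-3.21-dockerhost (virtualbox, 1.0.0)"
# The box/provider/version triple the sync step is looking for
needle="frenchexdev/alpine-3.21-dockerhost (virtualbox, 1.0.0)"
if printf '%s\n' "$box_list" | grep -qF "$needle"; then
  echo "skip sync"     # box already present on the remote host
else
  echo "sync box"      # SCP the .box file and run `vagrant box add`
fi
# → skip sync
```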


Cross-host networking

Multi-host topology has a wrinkle: the Vagrant private network is per-host. A VM on laptop cannot directly reach a VM on workstation via 192.168.56.x because each host has its own VirtualBox host-only adapter on a different bridge.

The fix is bridged networking for VMs that need cross-host reachability. The gateway VM on laptop and the platform VM on workstation get bridged network interfaces in addition to their private ones. Traefik routes traffic between them via the bridged IPs. PiHole on gateway returns those IPs for the relevant hostnames.

# Vos data file generated for multi-host
machines:
  - name: devlab-gateway
    box: frenchexdev/alpine-3.21-dockerhost
    networks:
      - type: private_network
        ip: 192.168.56.10               # local-only, for sibling VMs on the same host
      - type: public_network
        bridge: en0                     # cross-host, gets an IP from the LAN DHCP

The bridged interface is opt-in per machine. The placement resolver knows which machines need cross-host reachability (from the topology DAG: any machine whose Ops.Deployment dependency lives on a different host) and adds the bridge automatically.
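
The automatic opt-in can be stated as a rule over the placements map and the topology DAG (pseudocode; names follow the prose above):

```
for each machine m in topology:
  for each dependency d in m.Ops.Deployment:
    if placements[m] != placements[d]:
      mark both m and d as needing a public_network bridge
```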


The test

[Fact]
public void placement_resolver_packs_machines_into_smallest_set_of_hosts()
{
    var hosts = new[]
    {
        new HostConfig { Name = "a", Capacity = new() { Cpus = 8, MemoryMb = 16384 }, Placements = new[] { "gateway", "data" } },
        new HostConfig { Name = "b", Capacity = new() { Cpus = 16, MemoryMb = 65536 }, Placements = new[] { "platform" } }
    };
    var machines = new[]
    {
        new VosMachineConfig { Name = "devlab-gateway", Role = "gateway", Cpus = 2, Memory = 1024 },
        new VosMachineConfig { Name = "devlab-platform", Role = "platform", Cpus = 4, Memory = 8192 },
        new VosMachineConfig { Name = "devlab-data", Role = "data", Cpus = 2, Memory = 4096 }
    };

    var resolver = new HostPlacementResolver(new InMemoryHostsRegistry(hosts));
    var placements = resolver.Resolve(machines);

    placements["devlab-gateway"].Should().Be("a");
    placements["devlab-data"].Should().Be("a");
    placements["devlab-platform"].Should().Be("b");
}

[Fact]
public void placement_resolver_throws_when_no_host_accepts_role()
{
    var hosts = new[] { new HostConfig { Name = "a", Placements = new[] { "gateway" } } };
    var machines = new[] { new VosMachineConfig { Name = "x", Role = "platform", Cpus = 1, Memory = 256 } };
    var resolver = new HostPlacementResolver(new InMemoryHostsRegistry(hosts));

    Action act = () => resolver.Resolve(machines);
    act.Should().Throw<InvalidOperationException>().WithMessage("*no host can place*");
}

[Fact]
public async Task ssh_remote_host_runs_vagrant_via_ssh()
{
    var ssh = new ScriptedSshClient();
    ssh.OnRun("cd /home/ops/devlab && vagrant up devlab-platform", exitCode: 0, stdout: "Bringing machine 'devlab-platform' up");
    var host = new SshRemoteHost("workstation", "workstation.local", ssh, scp: null!, "/home/ops/devlab");

    var result = await host.ExecVagrantAsync("up", "devlab-platform", default);

    result.IsSuccess.Should().BeTrue();
    result.Value.Should().Contain("Bringing machine 'devlab-platform' up");
}

What this gives you that bash doesn't

A bash script that runs Vagrant on two machines is two bash scripts with ssh user@host vagrant up plus a hard-coded list of which machine goes where, plus a third script that copies the box files in advance, plus a fourth that updates the IP allowlists when the LAN DHCP changes.

A typed multi-host story with IRemoteHost, hosts.yaml, and a placement resolver gives you, for the same surface area:

  • One config file (hosts.yaml) declaring available machines
  • A placement resolver that fits VMs onto hosts deterministically
  • An SSH-based IRemoteHost that handles the box sync, the workdir sync, and the Vagrant invocation
  • Bridged networking added automatically when cross-host reachability is needed
  • Tests for the placement algorithm and the SSH wrapper
  • The same homelab vos up for single-host and multi-host

The bargain pays back the first time you run a 12-VM HA DevLab across two laptops without anyone manually editing a Vagrantfile.

