
Part 48: GPU Passthrough — The Subset That Actually Works

"GPU passthrough on a dev workstation is possible. It is also fragile. We support the subset that does not require rebuilding your kernel."


Why

Some workloads need a GPU: ML training, inference benchmarks, video transcoding, CUDA-accelerated simulations. A homelab user with an NVIDIA card on their workstation might reasonably want to expose it to a container running inside a VM. The naive approach is to install the NVIDIA driver inside the VM, mount the device, and hope. The reality is more complicated:

  • VirtualBox does not officially support GPU passthrough at all. Workarounds exist, but they involve PCI device pass-through that only works on Linux hosts with VFIO and IOMMU. Not portable.
  • Hyper-V supports GPU partitioning (Set-VMGpuPartitionAdapter) on Windows hosts only. Reasonably reliable.
  • Parallels on macOS supports limited GPU virtualisation but not raw passthrough.
  • libvirt/KVM supports VFIO passthrough on Linux. The most reliable, but requires Linux host.

The thesis of this part is: HomeLab supports GPU passthrough on the intersection of "your hypervisor supports it" and "your card is supported", with a clear failure mode when either condition does not hold. The rest of the lab works without a GPU. The user gets one config field; the contributor handles the platform-specific bits.


The shape

public sealed record GpuPassthroughSpec
{
    public bool Enabled { get; init; }
    public string? VendorId { get; init; }    // "10de" for NVIDIA
    public string? DeviceId { get; init; }    // PCI device ID
    public string Mode { get; init; } = "passthrough";   // "passthrough" | "partition"
    public string? PartitionCount { get; init; }          // for Hyper-V GPU-P
}

The config:

machines:
  - name: devlab-gpu-worker
    box: frenchexdev/alpine-3.21-gpuhost
    cpus: 8
    memory: 16384
    gpu:
      enabled: true
      vendor_id: 10de
      device_id: 2786       # RTX 4070
      mode: passthrough
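The vendor_id and device_id values come from the bracketed pair at the end of lspci -nn output. Here is a sketch of extracting them, run against a canned sample line so it works without the hardware (on a real host you would grep the output of lspci -nn for the NVIDIA VGA controller instead):

```shell
# Canned sample of an `lspci -nn` line for an RTX 4070 (illustrative):
line='0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD104 [GeForce RTX 4070] [10de:2786]'

# Pull the trailing [vendor:device] pair apart.
ids=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\]$/\1 \2/p')
echo "$ids"   # -> "10de 2786": the vendor_id and device_id for the config above
```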

The contributor that handles this lives in the GpuPassthroughContributor:

[Injectable(ServiceLifetime.Singleton)]
[Order(40)]   // after the host overlay
public sealed class GpuPassthroughContributor : IPackerBundleContributor, IMachineTypeContributor
{
    private readonly HomeLabConfig _config;
    private readonly IPlatformInfo _platform;

    public GpuPassthroughContributor(IOptions<HomeLabConfig> config, IPlatformInfo platform)
    {
        _config = config.Value;
        _platform = platform;
    }

    public bool ShouldContribute() => _config.Machines.Any(m => m.Gpu?.Enabled == true);

    public void Contribute(PackerBundle bundle)
    {
        if (!ShouldContribute()) return;

        // 1. Add NVIDIA driver + nvidia-container-toolkit to the image
        bundle.Scripts.Add(new PackerScript("install-nvidia.sh", InstallNvidiaScript()));
        bundle.Provisioners.Add(new PackerProvisioner
        {
            Type = "shell",
            Properties = new()
            {
                ["scripts"] = new[] { "scripts/install-nvidia.sh" },
                ["execute_command"] = "{{ .Vars }} sh '{{ .Path }}'"
            }
        });
    }

    public void Contribute(VosMachine machine)
    {
        var gpu = machine.Config.Gpu;
        if (gpu is null || !gpu.Enabled) return;

        // 2. Add provider-specific Vagrant customisation
        switch (machine.Provider)
        {
            case "virtualbox":
                ConfigureVirtualBoxPassthrough(machine, gpu);
                break;
            case "hyperv":
                ConfigureHyperVPartition(machine, gpu);
                break;
            case "libvirt":
                ConfigureLibvirtPassthrough(machine, gpu);
                break;
            case "parallels":
                throw new NotSupportedException("Parallels does not support GPU passthrough");
            default:
                throw new InvalidOperationException($"GPU passthrough not implemented for {machine.Provider}");
        }
    }

    private void ConfigureVirtualBoxPassthrough(VosMachine machine, GpuPassthroughSpec gpu)
    {
        // VirtualBox PCI passthrough requires Linux host with IOMMU enabled
        if (!_platform.IsLinux)
            throw new NotSupportedException("VirtualBox GPU passthrough requires a Linux host");

        machine.VagrantCustomizations.Add(new VagrantCustomization(
            Provider: "virtualbox",
            Lines: new[]
            {
                "v.customize ['modifyvm', :id, '--pciattach', '0a:00.0@01:00.0']",  // host PCI address @ guest PCI address
                "v.customize ['modifyvm', :id, '--vrde', 'on']"
            }));
    }

    private void ConfigureHyperVPartition(VosMachine machine, GpuPassthroughSpec gpu)
    {
        if (!_platform.IsWindows)
            throw new NotSupportedException("Hyper-V GPU partitioning requires a Windows host");

        machine.VagrantCustomizations.Add(new VagrantCustomization(
            Provider: "hyperv",
            Lines: new[]
            {
                "h.gpu_partition_adapter = true",
                $"h.gpu_partition_count = {gpu.PartitionCount ?? "1"}"
            }));
    }

    private void ConfigureLibvirtPassthrough(VosMachine machine, GpuPassthroughSpec gpu)
    {
        machine.VagrantCustomizations.Add(new VagrantCustomization(
            Provider: "libvirt",
            Lines: new[]
            {
                "l.pci :bus => '0x0a', :slot => '0x00', :function => '0x0'"
            }));
    }

    private string InstallNvidiaScript() => """
        #!/bin/sh
        set -eux

        # Alpine has nvidia-driver-560 in testing repo (as of mid-2025)
        echo 'https://dl-cdn.alpinelinux.org/alpine/edge/testing' >> /etc/apk/repositories
        apk update
        apk add --no-cache nvidia-driver nvidia-utils nvidia-container-toolkit jq

        # Configure docker to use the nvidia runtime
        cat > /etc/docker/daemon.json.gpu <<'EOF'
        {
          "runtimes": {
            "nvidia": {
              "path": "/usr/bin/nvidia-container-runtime",
              "runtimeArgs": []
            }
          },
          "default-runtime": "nvidia"
        }
        EOF

        # Merge with existing daemon.json (create an empty one if absent)
        [ -f /etc/docker/daemon.json ] || echo '{}' > /etc/docker/daemon.json
        jq -s '.[0] * .[1]' /etc/docker/daemon.json /etc/docker/daemon.json.gpu > /etc/docker/daemon.json.tmp
        mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json
        rm /etc/docker/daemon.json.gpu

        service docker restart

        # Verify
        nvidia-smi || echo 'nvidia-smi not available — GPU not actually passed through?'
        """;
}

The contributor implements two role interfaces: it touches both the Packer bundle (to install the driver) and the machine config (to add the provider-specific Vagrant customisation).


Exposing the GPU to a compose service

A service that wants the GPU declares it in its compose contributor:

public void Contribute(ComposeFile compose)
{
    compose.Services["ml-worker"] = new ComposeService
    {
        Image = "nvidia/cuda:12.4.0-base-ubuntu22.04",
        Restart = "always",
        Deploy = new ComposeDeploy
        {
            Resources = new ComposeResources
            {
                Reservations = new ComposeReservations
                {
                    Devices = new[]
                    {
                        new ComposeDevice
                        {
                            Driver = "nvidia",
                            Count = 1,
                            Capabilities = new[] { "gpu" }
                        }
                    }
                }
            }
        }
    };
}

Docker Compose v2 reads deploy.resources.reservations.devices and asks the nvidia container runtime to expose the GPU. Inside the container, nvidia-smi works.
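For reference, the ComposeService above serialises to roughly this fragment (a sketch; exact key order depends on the serialiser):

```yaml
services:
  ml-worker:
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
```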


Failure modes

GPU passthrough has many failure modes. The contributor surfaces them as explicit errors (Result.Failure at validation time, NotSupportedException for impossible host/provider combinations) rather than crashing mid-boot:

  • No NVIDIA card on the host: detected by checking for /dev/nvidia0 (Linux) or the GPU device list (Windows). Failure with "no NVIDIA GPU detected on host".
  • Host hypervisor does not support passthrough: detected by the platform / provider combination. Failure with "VirtualBox GPU passthrough requires Linux host".
  • IOMMU not enabled: detected by checking /sys/kernel/iommu_groups/. Failure with "IOMMU is not enabled in your BIOS".
  • Driver mismatch: the container's CUDA version must match the host's driver. The contributor warns if the major versions differ.
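The last check reduces to a major-version comparison. A sketch with illustrative version strings (in the contributor, the host value would be parsed from nvidia-smi output and the container value from the image tag):

```shell
# Illustrative values; real ones come from the host driver and the image tag.
host_cuda="12.6"
container_cuda="12.4"

if [ "${host_cuda%%.*}" = "${container_cuda%%.*}" ]; then
  echo "CUDA major versions match"        # prints for 12.6 vs 12.4
else
  echo "warning: CUDA major version mismatch (host $host_cuda, container $container_cuda)"
fi
```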

Each failure is a clear message at validation time, before anyone tries to boot the VM.
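The IOMMU check can also be run by hand before provisioning. A minimal sketch (the `iommu_enabled` helper name is ours; the demo points it at temp directories so it runs anywhere, while a real host would use /sys/kernel/iommu_groups):

```shell
# Succeeds when the kernel has populated at least one IOMMU group.
iommu_enabled() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Simulate both outcomes with temp directories.
on=$(mktemp -d);  mkdir "$on/0"
off=$(mktemp -d)
iommu_enabled "$on"  && echo "IOMMU enabled"
iommu_enabled "$off" || echo "IOMMU is not enabled in your BIOS"
```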


The test

[Fact]
public void contributor_skips_if_no_machine_has_gpu_enabled()
{
    var bundle = new PackerBundle();
    var c = new GpuPassthroughContributor(
        Options.Create(new HomeLabConfig { Machines = new() { new() { Gpu = new() { Enabled = false } } } }),
        new FakePlatformInfo(IsLinux: true));

    c.Contribute(bundle);

    bundle.Scripts.Should().NotContain(s => s.FileName.Contains("nvidia"));
}

[Fact]
public void contributor_throws_when_virtualbox_passthrough_requested_on_macos_host()
{
    var c = new GpuPassthroughContributor(
        Options.Create(new HomeLabConfig
        {
            Machines = new() { new() { Provider = "virtualbox", Gpu = new() { Enabled = true } } }
        }),
        new FakePlatformInfo(IsMacOs: true));

    var machine = new VosMachine { Provider = "virtualbox", Config = new() { Gpu = new() { Enabled = true } } };

    Action act = () => c.Contribute(machine);
    act.Should().Throw<NotSupportedException>().WithMessage("*requires a Linux host*");
}

[Fact]
public void hyperv_partition_emits_correct_vagrant_customization()
{
    var c = new GpuPassthroughContributor(
        Options.Create(StandardConfigWithGpu()),
        new FakePlatformInfo(IsWindows: true));

    var machine = new VosMachine { Provider = "hyperv", Config = new() { Gpu = new() { Enabled = true, PartitionCount = "2" } } };
    c.Contribute(machine);

    machine.VagrantCustomizations.Should().ContainSingle();
    machine.VagrantCustomizations[0].Lines.Should().Contain("h.gpu_partition_adapter = true");
    machine.VagrantCustomizations[0].Lines.Should().Contain("h.gpu_partition_count = 2");
}

What this gives you that bash doesn't

A bash script that "passes through a GPU" is a snippet cobbled together from Stack Overflow: it works on the author's machine and breaks on every other configuration. There is no test, no platform check, and no clear error.

A typed GpuPassthroughContributor gives you, for the same surface area:

  • Provider-aware behaviour (VirtualBox / Hyper-V / libvirt / Parallels)
  • Platform validation (Windows for Hyper-V, Linux for libvirt and VBox passthrough)
  • NVIDIA driver installation in the Packer image
  • nvidia-container-runtime registration in the Docker daemon
  • Compose service GPU reservation via the Docker Compose v2 deploy syntax
  • Clear failure messages for unsupported combinations

The bargain pays back the first time you run an ML training container inside a HomeLab VM and nvidia-smi works.


End of Act VII

We have now added every operational concern HomeLab promised: secrets, observability, backup with restore tests, multi-host scheduling, cost tracking, and GPU passthrough. With these in hand, DevLab is not just running — it is operable, with the depth a security-conscious SRE would actually accept.

Act VIII covers the day-2 operations and the variants: how to upgrade DevLab, how to run it on Podman, how to run multiple instances side-by-side, and how to tear it all down cleanly when you are done.

