
Part 48: GPU Passthrough — The Subset That Actually Works

"GPU passthrough on a dev workstation is possible. It is also fragile. We support the subset that does not require rebuilding your kernel."


Why

Some workloads need a GPU: ML training, inference benchmarks, video transcoding, CUDA-accelerated simulations. A homelab user with an NVIDIA card on their workstation might reasonably want to expose it to a container running inside a VM. The naive approach is to install the NVIDIA driver inside the VM, mount the device, and hope. The reality is more complicated:

  • VirtualBox does not officially support GPU passthrough at all. Workarounds exist, but they involve PCI device pass-through that only works on Linux hosts with VFIO and IOMMU. Not portable.
  • Hyper-V supports GPU partitioning (Set-VMGpuPartitionAdapter) on Windows hosts only. Reasonably reliable.
  • Parallels on macOS supports limited GPU virtualisation but not raw passthrough.
  • libvirt/KVM supports VFIO passthrough on Linux. The most reliable, but requires Linux host.

The thesis of this part is: HomeLab supports GPU passthrough on the intersection of "your hypervisor supports it" and "your card is supported", with a clear failure mode when either condition does not hold. The rest of the lab works without a GPU. The user gets one config field; the contributor handles the platform-specific bits.


The shape

public sealed record GpuPassthroughSpec
{
    public bool Enabled { get; init; }
    public string? VendorId { get; init; }    // "10de" for NVIDIA
    public string? DeviceId { get; init; }    // PCI device ID
    public string Mode { get; init; } = "passthrough";   // "passthrough" | "partition"
    public string? PartitionCount { get; init; }          // for Hyper-V GPU-P
}

The config:

machines:
  - name: devlab-gpu-worker
    box: frenchexdev/alpine-3.21-gpuhost
    cpus: 8
    memory: 16384
    gpu:
      enabled: true
      vendor_id: 10de
      device_id: 2786       # RTX 4070
      mode: passthrough
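The vendor_id and device_id values come from the bracketed pair at the end of lspci -nn output. Here is a sketch of extracting them, run against a canned sample line so it works without the hardware (on a real host you would grep the output of lspci -nn for the NVIDIA VGA controller instead):

```shell
# Canned sample of an `lspci -nn` line for an RTX 4070 (illustrative):
line='0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD104 [GeForce RTX 4070] [10de:2786]'

# Pull the trailing [vendor:device] pair apart.
ids=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\]$/\1 \2/p')
echo "$ids"   # -> "10de 2786": the vendor_id and device_id for the config above
```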

The contributor that handles this lives in the GpuPassthroughContributor:

[Injectable(ServiceLifetime.Singleton)]
[Order(40)]   // after the host overlay
public sealed class GpuPassthroughContributor : IPackerBundleContributor, IMachineTypeContributor
{
    private readonly HomeLabConfig _config;
    private readonly IPlatformInfo _platform;

    public GpuPassthroughContributor(IOptions<HomeLabConfig> config, IPlatformInfo platform)
    {
        _config = config.Value;
        _platform = platform;
    }

    public bool ShouldContribute() => _config.Machines.Any(m => m.Gpu?.Enabled == true);

    public void Contribute(PackerBundle bundle)
    {
        if (!ShouldContribute()) return;

        // 1. Add NVIDIA driver + nvidia-container-toolkit to the image
        bundle.Scripts.Add(new PackerScript("install-nvidia.sh", InstallNvidiaScript()));
        bundle.Provisioners.Add(new PackerProvisioner
        {
            Type = "shell",
            Properties = new()
            {
                ["scripts"] = new[] { "scripts/install-nvidia.sh" },
                ["execute_command"] = "{{ .Vars }} sh '{{ .Path }}'"
            }
        });
    }

    public void Contribute(VosMachine machine)
    {
        var gpu = machine.Config.Gpu;
        if (gpu is null || !gpu.Enabled) return;

        // 2. Add provider-specific Vagrant customisation
        switch (machine.Provider)
        {
            case "virtualbox":
                ConfigureVirtualBoxPassthrough(machine, gpu);
                break;
            case "hyperv":
                ConfigureHyperVPartition(machine, gpu);
                break;
            case "libvirt":
                ConfigureLibvirtPassthrough(machine, gpu);
                break;
            case "parallels":
                throw new NotSupportedException("Parallels does not support GPU passthrough");
            default:
                throw new InvalidOperationException($"GPU passthrough not implemented for {machine.Provider}");
        }
    }

    private void ConfigureVirtualBoxPassthrough(VosMachine machine, GpuPassthroughSpec gpu)
    {
        // VirtualBox PCI passthrough requires Linux host with IOMMU enabled
        if (!_platform.IsLinux)
            throw new NotSupportedException("VirtualBox GPU passthrough requires a Linux host");

        machine.VagrantCustomizations.Add(new VagrantCustomization(
            Provider: "virtualbox",
            Lines: new[]
            {
                "v.customize ['modifyvm', :id, '--pciattach', '0a:00.0@01:00.0']",  // host PCI address @ guest PCI address
                "v.customize ['modifyvm', :id, '--vrde', 'on']"
            }));
    }

    private void ConfigureHyperVPartition(VosMachine machine, GpuPassthroughSpec gpu)
    {
        if (!_platform.IsWindows)
            throw new NotSupportedException("Hyper-V GPU partitioning requires a Windows host");

        machine.VagrantCustomizations.Add(new VagrantCustomization(
            Provider: "hyperv",
            Lines: new[]
            {
                "h.gpu_partition_adapter = true",
                $"h.gpu_partition_count = {gpu.PartitionCount ?? "1"}"
            }));
    }

    private void ConfigureLibvirtPassthrough(VosMachine machine, GpuPassthroughSpec gpu)
    {
        machine.VagrantCustomizations.Add(new VagrantCustomization(
            Provider: "libvirt",
            Lines: new[]
            {
                "l.pci :bus => '0x0a', :slot => '0x00', :function => '0x0'"
            }));
    }

    private string InstallNvidiaScript() => """
        #!/bin/sh
        set -eux

        # Alpine has nvidia-driver-560 in testing repo (as of mid-2025)
        echo 'https://dl-cdn.alpinelinux.org/alpine/edge/testing' >> /etc/apk/repositories
        apk update
        apk add --no-cache nvidia-driver nvidia-utils nvidia-container-toolkit jq

        # Configure docker to use the nvidia runtime
        cat > /etc/docker/daemon.json.gpu <<'EOF'
        {
          "runtimes": {
            "nvidia": {
              "path": "/usr/bin/nvidia-container-runtime",
              "runtimeArgs": []
            }
          },
          "default-runtime": "nvidia"
        }
        EOF

        # Merge with existing daemon.json (create an empty one if absent)
        [ -f /etc/docker/daemon.json ] || echo '{}' > /etc/docker/daemon.json
        jq -s '.[0] * .[1]' /etc/docker/daemon.json /etc/docker/daemon.json.gpu > /etc/docker/daemon.json.tmp
        mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json
        rm /etc/docker/daemon.json.gpu

        service docker restart

        # Verify
        nvidia-smi || echo 'nvidia-smi not available — GPU not actually passed through?'
        """;
}

The contributor implements two role interfaces: it touches both the Packer bundle (to install the driver) and the machine config (to add the provider-specific Vagrant customisation).


Exposing the GPU to a compose service

A service that wants the GPU declares it in its compose contributor:

public void Contribute(ComposeFile compose)
{
    compose.Services["ml-worker"] = new ComposeService
    {
        Image = "nvidia/cuda:12.4.0-base-ubuntu22.04",
        Restart = "always",
        Deploy = new ComposeDeploy
        {
            Resources = new ComposeResources
            {
                Reservations = new ComposeReservations
                {
                    Devices = new[]
                    {
                        new ComposeDevice
                        {
                            Driver = "nvidia",
                            Count = 1,
                            Capabilities = new[] { "gpu" }
                        }
                    }
                }
            }
        }
    };
}

Docker Compose v2 reads deploy.resources.reservations.devices and asks the nvidia container runtime to expose the GPU. Inside the container, nvidia-smi works.
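For reference, the ComposeService above serialises to roughly this fragment (a sketch; exact key order depends on the serialiser):

```yaml
services:
  ml-worker:
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
```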


Failure modes

GPU passthrough has many failure modes. The contributor surfaces them as explicit errors (Result.Failure at validation time, NotSupportedException for impossible host/provider combinations) rather than crashing mid-boot:

  • No NVIDIA card on the host: detected by checking for /dev/nvidia0 (Linux) or the GPU device list (Windows). Failure with "no NVIDIA GPU detected on host".
  • Host hypervisor does not support passthrough: detected by the platform / provider combination. Failure with "VirtualBox GPU passthrough requires Linux host".
  • IOMMU not enabled: detected by checking /sys/kernel/iommu_groups/. Failure with "IOMMU is not enabled in your BIOS".
  • Driver mismatch: the container's CUDA version must match the host's driver. The contributor warns if the major versions differ.
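The last check reduces to a major-version comparison. A sketch with illustrative version strings (in the contributor, the host value would be parsed from nvidia-smi output and the container value from the image tag):

```shell
# Illustrative values; real ones come from the host driver and the image tag.
host_cuda="12.6"
container_cuda="12.4"

if [ "${host_cuda%%.*}" = "${container_cuda%%.*}" ]; then
  echo "CUDA major versions match"        # prints for 12.6 vs 12.4
else
  echo "warning: CUDA major version mismatch (host $host_cuda, container $container_cuda)"
fi
```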

Each failure is a clear message at validation time, before anyone tries to boot the VM.
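The IOMMU check can also be run by hand before provisioning. A minimal sketch (the `iommu_enabled` helper name is ours; the demo points it at temp directories so it runs anywhere, while a real host would use /sys/kernel/iommu_groups):

```shell
# Succeeds when the kernel has populated at least one IOMMU group.
iommu_enabled() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Simulate both outcomes with temp directories.
on=$(mktemp -d);  mkdir "$on/0"
off=$(mktemp -d)
iommu_enabled "$on"  && echo "IOMMU enabled"
iommu_enabled "$off" || echo "IOMMU is not enabled in your BIOS"
```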


The test

[Fact]
public void contributor_skips_if_no_machine_has_gpu_enabled()
{
    var bundle = new PackerBundle();
    var c = new GpuPassthroughContributor(
        Options.Create(new HomeLabConfig { Machines = new() { new() { Gpu = new() { Enabled = false } } } }),
        new FakePlatformInfo(IsLinux: true));

    c.Contribute(bundle);

    bundle.Scripts.Should().NotContain(s => s.FileName.Contains("nvidia"));
}

[Fact]
public void contributor_throws_when_virtualbox_passthrough_requested_on_macos_host()
{
    var c = new GpuPassthroughContributor(
        Options.Create(new HomeLabConfig
        {
            Machines = new() { new() { Provider = "virtualbox", Gpu = new() { Enabled = true } } }
        }),
        new FakePlatformInfo(IsMacOs: true));

    var machine = new VosMachine { Provider = "virtualbox", Config = new() { Gpu = new() { Enabled = true } } };

    Action act = () => c.Contribute(machine);
    act.Should().Throw<NotSupportedException>().WithMessage("*requires a Linux host*");
}

[Fact]
public void hyperv_partition_emits_correct_vagrant_customization()
{
    var c = new GpuPassthroughContributor(
        Options.Create(StandardConfigWithGpu()),
        new FakePlatformInfo(IsWindows: true));

    var machine = new VosMachine { Provider = "hyperv", Config = new() { Gpu = new() { Enabled = true, PartitionCount = "2" } } };
    c.Contribute(machine);

    machine.VagrantCustomizations.Should().ContainSingle();
    machine.VagrantCustomizations[0].Lines.Should().Contain("h.gpu_partition_adapter = true");
    machine.VagrantCustomizations[0].Lines.Should().Contain("h.gpu_partition_count = 2");
}

What this gives you that bash doesn't

A bash script that "passes through a GPU" is a snippet cobbled together from Stack Overflow: it works on the author's machine and breaks on every other configuration. There is no test, no platform check, and no clear error.

A typed GpuPassthroughContributor gives you, for the same surface area:

  • Provider-aware behaviour (VirtualBox / Hyper-V / libvirt / Parallels)
  • Platform validation (Windows for Hyper-V, Linux for libvirt and VBox passthrough)
  • NVIDIA driver installation in the Packer image
  • nvidia-container-runtime registration in the Docker daemon
  • Compose service GPU reservation via the Docker Compose v2 deploy syntax
  • Clear failure messages for unsupported combinations

The bargain pays back the first time you run an ML training container inside a HomeLab VM and nvidia-smi works.


End of Act VII

We have now added every operational concern HomeLab promised: secrets, observability, backup with restore tests, multi-host scheduling, cost tracking, and GPU passthrough. With these in hand, DevLab is not just running — it is operable, with the depth a security-conscious SRE would actually accept.

Act VIII covers the day-2 operations and the variants: how to upgrade DevLab, how to run it on Podman, how to run multiple instances side-by-side, and how to tear it all down cleanly when you are done.

