VFIO tips and tricks: July 2016

Hey folks, it feels like it's time to mention that assignment of Intel graphics devices (IGD) is currently available in qemu.git and will be part of the upcoming QEMU 2.7 release. There's already pretty thorough documentation of the available modes in the source tree, please give it a read. There are two modes described there, "legacy" and "Universal Passthrough" (UPT), each with its own pros and cons. Which ones are available to you depends on your hardware: UPT mode is only available on Broadwell and newer processors, while legacy mode is available all the way back through SandyBridge. If you have a processor older than SandyBridge, stop now, this is not going to work for you. If you don't know what any of these strange names mean, head to Wikipedia and Ark to figure it out.

The high level overview is that "legacy" mode is much like our GeForce support: the IGD is meant to be the primary and exclusive graphics in the VM. Additionally, the IGD address in the VM must be PCI 00:02.0, only SeaBIOS is currently supported, only the 440FX chipset model is supported (no Q35), the IGD must be the primary host graphics device, and the host needs to be running kernel v4.6 or newer. Clearly, assigning the host primary graphics is a bit of an about-face for our GPU assignment strategy, but we depend on running the IGD video ROM, which depends on VGA and imposes most of the above requirements as well (oh, and add CONFIG_VFIO_PCI_VGA to the requirements list). I have yet to see an IGD ROM with UEFI support, which is why OVMF is not yet supported, though it seems possible to support with a CSM and some additional code in OVMF.
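If you'd rather have a script tell you whether the host side of that list is satisfied, here is a minimal sketch. It only covers the two requirements that are easy to test mechanically: the kernel version and CONFIG_VFIO_PCI_VGA. The function names are made up for illustration, and the /boot/config-<release> path is an assumption about how your distro ships its kernel config.

```python
import platform
import re


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '4.6.3-300.fc24.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_new_enough(release, minimum=(4, 6)):
    """True if the release is at least the v4.6 kernel the post asks for."""
    return parse_kernel_release(release) >= minimum


def config_enabled(config_text, option):
    """True if the kernel config text has the boolean option built in (option=y)."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())


if __name__ == "__main__":
    release = platform.release()
    verdict = "ok" if kernel_new_enough(release) else "too old"
    print(f"kernel {release}: {verdict}")
    # Assumption: the running kernel's config lives at /boot/config-<release>.
    path = f"/boot/config-{release}"
    try:
        with open(path) as f:
            state = "enabled" if config_enabled(f.read(), "CONFIG_VFIO_PCI_VGA") else "not enabled"
        print(f"CONFIG_VFIO_PCI_VGA: {state}")
    except OSError:
        print(f"could not read {path}, check your kernel config some other way")
```

The version comparison is done on integer tuples so that something like 4.10 correctly sorts after 4.6.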

Legacy mode should work with both Linux and Windows guests (and hopefully others if you're so inclined). The i915 driver does suffer from the typical video driver problem that sometimes the whole system explodes (not literally) when unbinding or re-binding the IGD to the driver. Personally, I avoid this by blacklisting the i915 driver. Of course, as some have found out trying to do this with discrete GPUs, there are plenty of other drivers ready to jump on the device to keep the console working. The primary ones I've seen are vesafb and efifb. Which one is used on your system depends on your host firmware settings, legacy BIOS vs UEFI respectively. To disable these, simply add video=vesafb:off or video=efifb:off to the kernel command line (not sure which to use? try both: video=vesafb:off,efifb:off). The first thing you'll notice when you boot an IGD system with i915 blacklisted and the more basic framebuffer drivers disabled is that you don't get anything on the graphics head after grub. Plan for this. I use a serial console, but perhaps you're more comfortable running blind, hoping the system boots so you can ssh into it remotely.
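Since you'll be running without a display on the host, it's worth confirming over ssh or serial that the settings actually took. The following is a rough sketch that parses the video= options described above out of the kernel command line and looks for i915 in the loaded module list. The helper names are hypothetical, and it only understands the simple video=<driver>:off form used in this post.

```python
def disabled_framebuffers(cmdline):
    """Return the set of drivers turned off with video=<driver>:off on the command line."""
    disabled = set()
    for token in cmdline.split():
        if not token.startswith("video="):
            continue
        # A single video= option may carry several comma-separated entries.
        for item in token[len("video="):].split(","):
            driver, _, setting = item.partition(":")
            if setting == "off":
                disabled.add(driver)
    return disabled


def module_loaded(proc_modules_text, name):
    """True if 'name' is the first field of any line in /proc/modules-style text."""
    return any(
        line.split()[0] == name
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("framebuffers disabled:", sorted(disabled_framebuffers(f.read())))
        with open("/proc/modules") as f:
            print("i915 loaded:", module_loaded(f.read(), "i915"))
    except OSError as err:
        print("could not read /proc:", err)
```

If i915 still shows up as loaded, the blacklist didn't take and you're back to gambling on unbind.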

If you've followed along with this procedure, you should be able to simply create a <hostdev> entry in your libvirt XML, which ought to look something like this:

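<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</hostdev>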

Again, assigning the IGD device (which is always 00:02.0 on the host) to address 00:02.0 in the VM is required. Delete the <video> and <graphics> sections and everything should just magically work. Caveat emptor: my newest CPU is Broadwell. I've been told this works with Skylake, but IGD is hardly standardized and each new implementation seems to tweak things just a bit.
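For those who like a sanity check before hitting start, here's a small sketch that reads a domain XML and reports the legacy-mode mistakes mentioned above: no hostdev for host 00:02.0, the wrong guest address, or leftover <video> and <graphics> sections. It uses only Python's standard xml.etree module. The function names are invented for this example, and it ignores the PCI domain and everything else libvirt might care about.

```python
import sys
import xml.etree.ElementTree as ET

IGD_BSF = (0x00, 0x02, 0x0)  # bus, slot, function of 00:02.0


def bsf(address):
    """Return (bus, slot, function) as integers, or None if the element is missing."""
    if address is None:
        return None
    return tuple(int(address.get(key, "0"), 16) for key in ("bus", "slot", "function"))


def check_igd_legacy(domain_xml):
    """Return a list of problems; an empty list means the basics look right."""
    devices = ET.fromstring(domain_xml).find("devices")
    if devices is None:
        return ["no <devices> section found"]
    problems = []
    # Host side: which <hostdev> entries point at the IGD?
    igd = [
        hostdev for hostdev in devices.findall("hostdev")
        if hostdev.get("type") == "pci" and bsf(hostdev.find("source/address")) == IGD_BSF
    ]
    if not igd:
        problems.append("no <hostdev> for host device 00:02.0")
    # Guest side: the direct <address> child is where the device lands in the VM.
    for hostdev in igd:
        if bsf(hostdev.find("address")) != IGD_BSF:
            problems.append("IGD is not at guest address 00:02.0")
    for tag in ("video", "graphics"):
        if devices.find(tag) is not None:
            problems.append(f"<{tag}> section still present")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for problem in check_igd_legacy(f.read()) or ["looks fine"]:
            print(problem)
```

Feed it the output of virsh dumpxml saved to a file. Passing this check obviously doesn't guarantee the VM boots, it only catches the easy mistakes.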

Some of you are probably also curious why this doesn't work on Q35, which leads into the discussion of UPT mode. IGD clearly is not a discrete GPU, but "integrated" doesn't only mean that the GPU is embedded in the system; in this case it also means that the GPU is kind of smeared across the system. This is why IGD assignment hasn't "just worked" and why you need a host kernel with support for exposing certain regions through vfio, a BIOS that's aware of IGD, the device at a specific address, etc, etc, etc. One of those requirements is that the video ROM also cares about a few properties of the device at PCI address 00:1f.0, the ISA/LPC bridge. Q35 includes its own bridge at that location and we cannot simply modify the IDs of that bridge, for compatibility reasons. Therefore, that bridge being an implicit part of Q35 means that IGD assignment doesn't work on Q35. This also means that PCI address 00:1f.0 is not available for use in a 440FX machine.

Ok, so UPT. Intel has known for a while that the sprawl of IGD has made it difficult to deal with for device assignment. To combat this, both software and hardware changes have been made that help to consolidate IGD to be more assignment-friendly. Great news, right? Well, sort of. First off, in UPT mode the IGD is meant to be a secondary graphics device in the VM, and there's no VGA mode support (oh, BTW, x-vga=on is automatically added by QEMU in legacy mode). In fact, um, there's no output support of any kind by default in UPT mode. How's this useful, you ask? Well, between the emulated graphics and IGD you can set up mirroring so you actually have a remote-capable, hardware accelerated graphics VM. Plus, if you add the option x-igd-opregion=on to the vfio-pci device, you can get output to a physical display, but there again you're going to need the host running kernel v4.6 or newer and the upcoming QEMU 2.7 support, while no-output UPT has probably actually worked for quite a while. UPT mode has no requirements for the IGD PCI address, but note that most VM firmware, SeaBIOS or OVMF, will define the primary graphics as the one having the lowest PCI address. Usually not a problem, but some of you create some crazy configs. You'll also still need to do all the blacklisting and video disabling above, or just risk binding and unbinding i915 from the host, gambling each time whether it'll explode.

So UPT sounds great, except why is this opregion thing optional? Well, it turns out that if you want to do that cool mirroring thing I mentioned above and a physical output is enabled with the opregion, you actually need to have a monitor attached to the device or else your apps don't get any hardware acceleration love. Whereas if IGD doesn't know about any outputs, it's happy to apply hardware acceleration regardless of what's physically connected. Sucks, but readers here should already know how to create wrapper scripts to add this extra option if they want it (similar to x-vga=on). I don't think Intel really wants to support this hacky hybrid mode either, thus the experimental x- option prefix tag.
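For anyone who hasn't written one of those wrappers before, here's one possible sketch in Python. The idea is that libvirt's <emulator> points at this script, which tacks x-igd-opregion=on onto the vfio-pci -device argument for host 00:02.0 and then execs the real QEMU. REAL_QEMU is an assumed path, so adjust it for your install. The function name is made up, and the matching assumes the plain comma-separated -device syntax, so check what your libvirt actually generates before trusting it.

```python
#!/usr/bin/env python3
import os
import sys

REAL_QEMU = "/usr/bin/qemu-system-x86_64"  # assumption: path to the real binary
IGD_HOSTS = ("host=00:02.0", "host=0000:00:02.0")
OPTION = "x-igd-opregion=on"


def add_opregion(args):
    """Return a copy of args with OPTION appended to the IGD's vfio-pci -device value."""
    out = list(args)
    for i in range(1, len(out)):
        value = out[i]
        # Only touch the value that directly follows a -device flag.
        if out[i - 1] != "-device" or not value.startswith("vfio-pci"):
            continue
        fields = value.split(",")
        if any(host in fields for host in IGD_HOSTS) and OPTION not in fields:
            out[i] = value + "," + OPTION
    return out


# With no arguments there is nothing to wrap, so only exec when called with some.
if __name__ == "__main__" and len(sys.argv) > 1:
    os.execv(REAL_QEMU, [REAL_QEMU] + add_opregion(sys.argv[1:]))
```

Other assigned devices pass through untouched, and running the arguments through twice doesn't add the option a second time.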

Oh, one more gotcha for UPT mode: Intel seems to expect otherwise, but I've had zero success trying to run Linux guests with UPT. Just go ahead and assume this is for your Windows guests only at this point.

What else... laptop displays should work, and I believe switching outputs even works, but working on laptops is rather inconvenient since you're unlikely to have a serial console available. Also note that while you can use input-linux to attach a laptop keyboard and mouse (not trackpad IME), I don't know how to make the hotkeys work, so that's a bummer. Some IGD devices will generate DMAR error spew on the host when assigned, particularly the first time per host boot. Don't be too alarmed by this, especially if it stops before the display is initialized. This seems to be caused by resetting the IGD in an IOMMU context where it can't access the page tables set up for it by the BIOS/host. Unless you have an ongoing spew of these, they can probably be ignored. If you have something older than SandyBridge that you wish you could use this with and continued reading even after being told to stop, sorry, there was a hardware change at SandyBridge, I don't have anything older to test with, and I don't really want to support additional code for such outdated hardware. Besides, those are pretty old and you need an excuse for an upgrade anyway.

With this support I've switched my desktop system so that the host actually runs from a USB stick and the previous bare-metal Fedora install is virtualized with IGD, running alongside my existing GeForce VM. Give it a try and good luck.

Friday, July 15, 2016

Intel Graphics assignment