Friday, July 15, 2016

Intel Graphics assignment

Hey folks, it feels like it's time to mention that assignment of Intel graphics devices (IGD) is currently available in qemu.git and will be part of the upcoming QEMU 2.7 release.  There's already pretty thorough documentation of the available modes in the source tree, so please give it a read.  It describes two modes, "legacy" and "Universal Passthrough" (UPT), each with its own pros and cons.  Which ones are available to you depends on your hardware: UPT mode is only available on Broadwell and newer processors, while legacy mode is available all the way back to SandyBridge.  If you have a processor older than SandyBridge, stop now; this is not going to work for you.  If you don't know what any of these strange names mean, head to Wikipedia and Intel's ARK to figure it out.
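
The generation rules above can be made concrete with a minimal Python sketch. It is not part of QEMU or any tool. The `igd_modes` helper and the `GENERATIONS` list are hypothetical. The list covers only the generations from SandyBridge through Skylake, which is the newest one mentioned in this post.

```python
# Hypothetical helper encoding the generation rules from this post.
# Only SandyBridge through Skylake are listed; any other name returns an
# empty list, which for newer parts means "unknown", not "unsupported".
GENERATIONS = ["SandyBridge", "IvyBridge", "Haswell", "Broadwell", "Skylake"]


def igd_modes(generation: str) -> list[str]:
    """Return the IGD assignment modes available for a CPU generation."""
    if generation not in GENERATIONS:
        return []  # e.g. pre-SandyBridge parts: stop now
    modes = ["legacy"]  # legacy mode: SandyBridge and newer
    if GENERATIONS.index(generation) >= GENERATIONS.index("Broadwell"):
        modes.append("UPT")  # UPT mode: Broadwell and newer
    return modes


for gen in ("Nehalem", "Haswell", "Broadwell"):
    print(gen, igd_modes(gen))
```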

The high-level overview is that "legacy" mode is much like our GeForce support: the IGD is meant to be the primary and exclusive graphics in the VM.  Additionally, the IGD address in the VM must be PCI 00:02.0, only SeaBIOS is currently supported, only the 440FX chipset model is supported (no Q35), the IGD device must be the primary host graphics device, and the host needs to be running kernel v4.6 or newer.  Clearly, assigning the host primary graphics is a bit of an about-face for our GPU assignment strategy.  However, we depend on running the IGD video ROM, which depends on VGA and imposes most of the above requirements as well (oh, and add CONFIG_VFIO_PCI_VGA to the requirements list).  I have yet to see an IGD ROM with UEFI support, which is why OVMF is not yet supported, but it seems possible to support with a CSM and some additional code in OVMF.
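
The legacy-mode requirements above read naturally as a checklist. Here is a minimal sketch that assumes you already know these facts about your setup. `LegacySetup` and `legacy_problems` are hypothetical names, not a QEMU or libvirt API.

```python
from dataclasses import dataclass


@dataclass
class LegacySetup:
    """Hypothetical summary of a host/VM setup; not a QEMU or libvirt API."""
    guest_igd_address: str        # guest PCI address of the IGD, e.g. "00:02.0"
    firmware: str                 # "seabios" or "ovmf"
    machine: str                  # "i440fx" or "q35"
    igd_is_host_primary: bool     # IGD is the host's primary graphics
    host_kernel: tuple[int, int]  # (major, minor), e.g. (4, 6)
    vfio_pci_vga: bool            # host kernel built with CONFIG_VFIO_PCI_VGA


def legacy_problems(s: LegacySetup) -> list[str]:
    """Return the legacy-mode requirements from the post that s violates."""
    problems = []
    if s.guest_igd_address != "00:02.0":
        problems.append("IGD must be at guest PCI address 00:02.0")
    if s.firmware != "seabios":
        problems.append("only SeaBIOS is supported (no OVMF yet)")
    if s.machine != "i440fx":
        problems.append("only the 440FX machine model is supported (no Q35)")
    if not s.igd_is_host_primary:
        problems.append("IGD must be the primary host graphics device")
    if s.host_kernel < (4, 6):  # tuples compare element by element
        problems.append("host kernel must be v4.6 or newer")
    if not s.vfio_pci_vga:
        problems.append("host kernel needs CONFIG_VFIO_PCI_VGA")
    return problems


setup = LegacySetup("00:02.0", "seabios", "q35", True, (4, 7), True)
print(legacy_problems(setup))
```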

Legacy mode should work with both Linux and Windows guests (and hopefully others, if you're so inclined).  The i915 driver does suffer from the typical video driver problem: sometimes the whole system explodes (not literally) when unbinding or re-binding the IGD to the driver.  Personally, I avoid this by blacklisting the i915 driver.  Of course, as some have found out trying to do this with discrete GPUs, there are plenty of other drivers ready to jump on the device to keep the console working.  The primary ones I've seen are vesafb and efifb.  Which one your system uses depends on your host firmware settings: vesafb for legacy BIOS, efifb for UEFI.  To disable these, simply add video=vesafb:off or video=efifb:off to the kernel command line (not sure which to use?  try both: video=vesafb:off,efifb:off).  The first thing you'll notice when you boot an IGD system with i915 blacklisted and the more basic framebuffer drivers disabled is that you don't get anything on the graphics head after grub.  Plan for this.  I use a serial console, but perhaps you're more comfortable running blind, hoping the system boots, and ssh'ing into it remotely.
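
The framebuffer drivers are easy to forget, so here is a hypothetical helper that checks a kernel command line for the video= options described above.  It's only a sketch and understands just the forms used in this post.  On the host you would feed it the contents of /proc/cmdline.

```python
def framebuffers_left_enabled(cmdline: str) -> set[str]:
    """Return which of vesafb/efifb are not turned off by a video= option.

    Understands the forms used in this post: video=vesafb:off,
    video=efifb:off, and the combined video=vesafb:off,efifb:off.
    """
    disabled = set()
    for token in cmdline.split():
        if not token.startswith("video="):
            continue
        # One video= token may hold several comma-separated <driver>:<option> entries.
        for entry in token[len("video="):].split(","):
            driver, _, option = entry.partition(":")
            if option == "off":
                disabled.add(driver)
    return {"vesafb", "efifb"} - disabled


print(framebuffers_left_enabled("BOOT_IMAGE=/vmlinuz ro video=efifb:off"))
```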

If you've followed along with this procedure, you should be able to simply create a <hostdev> entry in your libvirt XML, which ought to look something like this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </hostdev>

Again, the IGD device (which is always at host address 00:02.0) must be assigned to address 00:02.0 in the VM.  Delete the <video> and <graphics> sections and everything should just magically work.  Caveat emptor: my newest CPU is Broadwell.  I've been told this works with Skylake, but IGD is hardly standardized and each new implementation seems to tweak things just a bit.
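
`virsh edit` is the usual way to delete those sections by hand.  If you'd rather script it, a minimal sketch using Python's standard ElementTree might look like the code below.  The function name and the trimmed sample domain XML are illustrative assumptions.  Writing the result back (for example with `virsh define`) is left out.

```python
import xml.etree.ElementTree as ET


def strip_emulated_display(domain_xml: str) -> str:
    """Remove every <video> and <graphics> element under <devices>."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is not None:
        for tag in ("video", "graphics"):
            # findall() returns a list, so removing while iterating is safe.
            for elem in devices.findall(tag):
                devices.remove(elem)
    return ET.tostring(root, encoding="unicode")


sample = """<domain type='kvm'>
  <devices>
    <graphics type='spice' autoport='yes'/>
    <video><model type='qxl'/></video>
    <hostdev mode='subsystem' type='pci' managed='yes'/>
  </devices>
</domain>"""
print(strip_emulated_display(sample))
```

ElementTree's default parser drops XML comments, so don't run this over a hand-annotated file whose comments you care about.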

Some of you are probably also curious why this doesn't work on Q35, which leads into the discussion of UPT mode.  IGD is clearly not a discrete GPU, but here "integrated" doesn't just mean that the GPU is embedded in the system; it means that the GPU is kind of smeared across the system.  This is why IGD assignment hasn't "just worked".  It's also why you need a host kernel that exposes certain regions through vfio, a BIOS that's aware of IGD, a specific device address, etc., etc.  One of those requirements is that the video ROM also cares about a few properties of the device at PCI address 00:1f.0, the ISA/LPC bridge.  Q35 includes its own bridge at that location, and we cannot simply modify the IDs of that bridge for compatibility reasons.  Because that bridge is an implicit part of Q35, IGD assignment doesn't work on Q35.  This also means that PCI address 00:1f.0 is not available for other devices in a 440FX machine using legacy IGD assignment.
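
Given that constraint, a quick way to check an existing 440FX config is to look for any guest device already placed at 00:1f.0.  Here is a hypothetical sketch, again using ElementTree on a trimmed sample XML.  It only inspects each device's own <address> element, which is where libvirt puts the guest-side address.

```python
import xml.etree.ElementTree as ET


def _num(value: str) -> int:
    # libvirt addresses are usually hex like '0x1f'; base 0 accepts hex or decimal.
    return int(value, 0)


def guest_devices_at_1f0(domain_xml: str) -> list[str]:
    """Return the tags of devices whose guest PCI address is 0000:00:1f.0."""
    root = ET.fromstring(domain_xml)
    hits = []
    devices = root.find("devices")
    if devices is None:
        return hits
    for dev in devices:
        # Only the device's direct <address> child is its guest address; a
        # <hostdev>'s host address sits under <source> and is not examined.
        addr = dev.find("address")
        if addr is None or addr.get("type") != "pci":
            continue
        if (_num(addr.get("domain", "0")) == 0
                and _num(addr.get("bus", "0")) == 0
                and _num(addr.get("slot", "0")) == 0x1F
                and _num(addr.get("function", "0")) == 0):
            hits.append(dev.tag)
    return hits


sample = """<domain type='kvm'>
  <devices>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x0'/>
    </controller>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x00' slot='0x02' function='0x0'/></source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </hostdev>
  </devices>
</domain>"""
print(guest_devices_at_1f0(sample))
```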

Ok, so UPT.  Intel has known for a while that the sprawl of IGD has made it difficult to deal with for device assignment.  To combat this, both software and hardware changes have been made that help consolidate IGD into something more assignment-friendly.  Great news, right?  Well, sort of.  First off, in UPT mode the IGD is meant to be a secondary graphics device in the VM; there's no VGA mode support (oh, BTW, x-vga=on is automatically added by QEMU in legacy mode).  In fact, um, there's no output support of any kind by default in UPT mode.  How's this useful, you ask?  Well, between the emulated graphics and IGD you can set up mirroring, so you actually have a remote-capable, hardware-accelerated graphics VM.  Plus, if you add the option x-igd-opregion=on to the vfio-pci device, you can get output to a physical display.  For that, though, you again need the host running kernel v4.6 or newer and the upcoming QEMU 2.7 support, whereas no-output UPT has probably worked for quite a while.  UPT mode has no requirements for the IGD PCI address, but note that most VM firmware, SeaBIOS or OVMF, will define the primary graphics as the device with the lowest PCI address.  Usually that's not a problem, but some of you create some crazy configs.  You'll also still need to do all the blacklisting and video disabling above.  Otherwise, just risk binding and unbinding i915 on the host, gambling each time on whether it'll explode.
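
The "lowest PCI address becomes primary" rule is simple enough to model.  This is a simplified sketch that assumes everything sits in one PCI domain and compares addresses numerically.  The device names are made up, and real firmware enumeration has more nuance than a single `min()`.

```python
def parse_bdf(addr: str) -> tuple[int, int, int]:
    """Parse a 'bus:slot.function' string such as '00:02.0' into integers."""
    bus, rest = addr.split(":")
    slot, function = rest.split(".")
    return int(bus, 16), int(slot, 16), int(function, 16)


def likely_primary(graphics: dict[str, str]) -> str:
    """Given {device name: guest PCI address}, return the lowest-addressed device."""
    return min(graphics, key=lambda name: parse_bdf(graphics[name]))


vm = {"IGD (UPT)": "00:05.0", "emulated VGA": "00:02.0"}
print(likely_primary(vm))
```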

So UPT sounds great, except why is this opregion thing optional?  It turns out that you may want the cool mirroring I mentioned above while a physical output is enabled through the opregion.  In that case, you actually need a monitor attached to the device, or else your apps don't get any hardware acceleration love.  If IGD doesn't know about any outputs, on the other hand, it's happy to apply hardware acceleration regardless of what's physically connected.  Sucks, but readers here should already know how to create wrapper scripts to add this extra option if they want it (similar to x-vga=on).  I don't think Intel really wants to support this hacky hybrid mode either, hence the experimental x- prefix on the option.
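
For the opregion option, a wrapper could look something like the following minimal Python sketch.  It assumes libvirt passes the IGD as a comma-separated `-device vfio-pci,host=00:02.0,...` argument.  Newer libvirt releases may emit JSON-style `-device` arguments, which this sketch doesn't handle.  The names `add_opregion` and `main` are hypothetical.

```python
import os
import sys

# Accept both the short and domain-qualified forms of the IGD host address.
IGD_HOSTS = {"00:02.0", "0000:00:02.0"}
OPREGION = "x-igd-opregion=on"


def add_opregion(argv: list[str]) -> list[str]:
    """Return a copy of argv with x-igd-opregion=on added to the IGD vfio-pci device.

    Only handles the comma-separated '-device vfio-pci,host=...' syntax.
    """
    out = list(argv)
    for i in range(len(out) - 1):
        if out[i] != "-device":
            continue
        opts = out[i + 1].split(",")
        if opts[0] != "vfio-pci":
            continue
        is_igd = any(o.startswith("host=") and o[len("host="):] in IGD_HOSTS
                     for o in opts)
        if is_igd and OPREGION not in opts:
            out[i + 1] += "," + OPREGION
    return out


def main(real_qemu: str) -> None:
    """Replace this process with the real QEMU, passing the rewritten arguments."""
    args = add_opregion(sys.argv[1:])
    os.execv(real_qemu, [real_qemu] + args)
```

To use it, you would save it as an executable script that calls `main()` with the path of your real QEMU binary, then point the `<emulator>` element of your domain XML at that script.  The rewrite is idempotent, so arguments that already carry the option pass through unchanged.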

Oh, one more gotcha for UPT mode: Intel seems to expect otherwise, but I've had zero success trying to run Linux guests with UPT.  Just go ahead and assume this is for your Windows guests only at this point.

What else... laptop displays should work, and I believe switching outputs even works.  Working on laptops is rather inconvenient, though, since you're unlikely to have a serial console available.  Also note that you can use input-linux to attach a laptop keyboard and mouse (though not the trackpad, in my experience), but I don't know how to make the hotkeys work, so that's a bummer.  Some IGD devices will generate DMAR error spew on the host when assigned, particularly the first time per host boot.  Don't be too alarmed by this, especially if it stops before the display is initialized.  This seems to be caused by resetting the IGD in an IOMMU context where it can't access the page tables set up for it by the BIOS/host.  Unless you have an ongoing spew of these, they can probably be ignored.  Maybe you have something older than SandyBridge that you wish you could use this with, and you kept reading even after being told to stop.  Sorry: there was a hardware change at SandyBridge, I don't have anything older to test with, and I don't really want to maintain additional code for such outdated hardware.  Besides, those are pretty old and you need an excuse for an upgrade anyway.
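
One rough way to tell a one-time burst from ongoing spew is to check whether DMAR messages keep showing up after a point you pick, such as when the guest display came up.  Here is a minimal sketch.  It only matches the substring "DMAR" and the bracketed timestamp format seen in the dmesg log in the comments below.  The sample log text is an invented placeholder, not real kernel output.

```python
import re

# dmesg lines start with a "[ seconds.micros]" timestamp, possibly space-padded.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def dmar_lines_after(dmesg: str, cutoff: float) -> list[str]:
    """Return DMAR-related dmesg lines logged later than `cutoff` seconds."""
    late = []
    for line in dmesg.splitlines():
        m = TIMESTAMP.match(line)
        if m and "DMAR" in line and float(m.group(1)) > cutoff:
            late.append(line)
    return late


# Made-up log: the message text is a placeholder, not real kernel output.
sample = """[   70.100000] DMAR: (example fault message)
[   70.100500] DMAR: (example fault message)
[   95.000000] some-driver: (example unrelated message)"""
print(dmar_lines_after(sample, 80.0))
```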

With this support I've switched my desktop system so that the host actually runs from a USB stick and the previous bare-metal Fedora install is virtualized with IGD, running alongside my existing GeForce VM.  Give it a try and good luck.

8 comments:

  1. Nice update Alex. I gave this a quick go but couldn't get anywhere with it.

    I'm running Linux 4.7.0-rc7 and the very latest qemu-git. I blacklisted i915 and, after some troubleshooting, some other modules by inserting the following into /etc/modprobe.d/blacklist.conf:

    blacklist snd_hda_intel
    blacklist i915
    blacklist drm
    blacklist drm_kms_helper
    blacklist i2c_algo_bit
    install snd_hda_intel /bin/false
    install i915 /bin/false

    I also added video=vesafb:off,efifb:off to my kernel boot parameters. I can see that 00:02.0 has no kernel module in use when the system boots; however, I still get console output on the screen. When I start the guest, the console disappears and I see the following in dmesg, but the guest never starts and I get no output on the screen:

    [ 67.980294] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
    [ 68.025793] br0: port 3(tap11) entered blocking state
    [ 68.025796] br0: port 3(tap11) entered disabled state
    [ 68.025854] device tap11 entered promiscuous mode
    [ 68.052476] br0: port 3(tap11) entered blocking state
    [ 68.052479] br0: port 3(tap11) entered forwarding state
    [ 69.158471] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
    [ 69.158473] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
    [ 78.675389] br0: port 3(tap11) entered disabled state
    [ 78.676919] device tap11 left promiscuous mode
    [ 78.676922] br0: port 3(tap11) entered disabled state
    [ 78.912853] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem

    1. Can you pastebin /proc/iomem, /proc/ioports, and 'lspci -vvvs 00:02.0'? You'll need sudo for all of these on recent kernels. dmesg may also be useful. Clearly some console driver is still attaching to IGD. You might also have more success letting i915 grab IGD, then unbinding it and preventing future binding. Generally the problems I see with i915 involve re-binding after use by vfio-pci. Running 'virsh nodedev-detach pci_0000_00_02_0' at boot should do this.

    2. I can see a difference now after letting i915 grab the IGD and running 'virsh nodedev-detach pci_0000_00_02_0': the console output on the screen disappears. However, the guest still fails to boot, with no sign of any output on the screen.

      Here's the pastebin to the outputs after running 'virsh nodedev-detach pci_0000_00_02_0'

      http://pastebin.com/7WFFpG1T

    3. Can we see your guest XML/command line? What's the processor version?

    4. Here's the XML : http://pastebin.com/mLwrK4Bw

      The processor is an i7-4770s (Haswell)

    5. You're using /home/user/vfio-bios/bios.bin for your BIOS and /usr/local/qemu-2.7.0-git-20072106/bin/qemu-system-x86_64 for the emulator, but qemu.git has the correct BIOS in it, so why aren't you using /usr/local/qemu-2.7.0-git-20072106/share/qemu/bios.bin for the BIOS? Where does vfio-bios come from? Are there any error messages in /var/log/libvirt/qemu/arch-gns3-2016-idg.log? I don't spot any other issues, but using the correct BIOS is critical. There was churn on it during development, so anything other than the final bits that went upstream won't work.

    6. Bingo! Thank you very much sir! I've just changed to the new SeaBIOS and it all works now.

      I've also got it working by blacklisting only i915 in /etc/modprobe.d/blacklist.conf so I don't have to execute 'virsh nodedev-detach pci_0000_00_02_0' after boot.
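
The two fixes in this thread can be sketched in Python.  `detach_plan` lists the standard sysfs writes (driver_override, unbind, drivers_probe) that hand a device over to vfio-pci.  It is roughly a manual counterpart of what the reply suggests doing with `virsh nodedev-detach`, and it assumes the vfio-pci module is already loaded.  `bundled_bios` assumes a self-built QEMU installed under its own prefix, like the one in this thread.  All function names are hypothetical.

```python
from pathlib import Path


def detach_plan(dev: str = "0000:00:02.0", sysfs: str = "/sys") -> list[tuple[Path, str]]:
    """List the sysfs writes that hand `dev` from its current driver to vfio-pci."""
    devdir = Path(sysfs, "bus", "pci", "devices", dev)
    # 1. Only vfio-pci may claim the device on its next probe.
    plan = [(devdir / "driver_override", "vfio-pci")]
    # 2. Release it from whatever driver (e.g. i915) currently owns it.
    if (devdir / "driver").exists():
        plan.append((devdir / "driver" / "unbind", dev))
    # 3. Ask the PCI core to re-probe the device.
    plan.append((Path(sysfs, "bus", "pci", "drivers_probe"), dev))
    return plan


def apply_plan(plan: list[tuple[Path, str]]) -> None:
    """Perform the writes (needs root on a real host)."""
    for path, value in plan:
        path.write_text(value)


def bundled_bios(emulator: str) -> Path:
    """Map <prefix>/bin/qemu-system-x86_64 to <prefix>/share/qemu/bios.bin."""
    return Path(emulator).parent.parent / "share" / "qemu" / "bios.bin"


print(bundled_bios("/usr/local/qemu-2.7.0-git-20072106/bin/qemu-system-x86_64"))
```

Splitting planning from applying lets you review the writes (or test them against a fake sysfs root) before touching a real host.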

  2. If you experience problems with BSODs in Windows, try enabling the kvm module option ignore_msrs. The risk of this is that unknown/unsupported MSRs will return 0, which may not always be the correct return value and may lead to other issues. The Intel graphics drivers are known to need this for now though.
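
Here is a minimal sketch for checking and flipping that option at runtime.  It assumes the kvm module is loaded and exposes the option at /sys/module/kvm/parameters/ignore_msrs.  The helper names are hypothetical.  To make the setting persistent, the usual approach is an `options kvm ignore_msrs=1` line in a file under /etc/modprobe.d/.

```python
from pathlib import Path

# kvm exposes its module parameters here; bool parameters read back as Y or N.
PARAM = Path("/sys/module/kvm/parameters/ignore_msrs")


def parse_bool_param(text: str) -> bool:
    """Interpret the contents of a bool module parameter file."""
    return text.strip() in ("Y", "1")


def ignore_msrs_enabled(param: Path = PARAM) -> bool:
    """Report whether kvm is currently ignoring unknown MSRs."""
    return parse_bool_param(param.read_text())


def enable_ignore_msrs(param: Path = PARAM) -> None:
    """Turn ignore_msrs on at runtime (root only; does not persist across reboots)."""
    param.write_text("1")
```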

Comments are not a support forum. For help with problems, please try the vfio-users mailing list (https://www.redhat.com/mailman/listinfo/vfio-users)