The first thing we need to do on the host is enable the IOMMU. To do this, verify that IOMMU support is enabled in the host BIOS. How to do this will be specific to your hardware/BIOS vendor. If you can't find an option, don't fret; it may be tied to processor virtualization support. If you're using an Intel processor, check http://ark.intel.com to verify that your processor supports VT-d before going any further.
Next we need to modify the kernel commandline to allow the kernel to enable IOMMU support. This will be similar between distributions, but not identical. On Fedora we need to edit /etc/sysconfig/grub. Find the GRUB_CMDLINE_LINUX line and within the quotes add either intel_iommu=on or amd_iommu=on, depending on whether your platform is Intel or AMD. You may also want to add the option iommu=pt, which sets the IOMMU into passthrough mode for host devices. This reduces the overhead of the IOMMU for host-owned devices, but also removes any protection the IOMMU may have provided against errant DMA from devices. If you weren't using the IOMMU before, there's nothing lost. Regardless of passthrough mode, the IOMMU will provide the same degree of isolation for assigned devices.
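For example, on an Intel system the relevant line in /etc/sysconfig/grub might end up looking something like the sketch below; the rhgb and quiet options are just placeholders for whatever your distribution already puts there.
GRUB_CMDLINE_LINUX="rhgb quiet intel_iommu=on iommu=pt"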
Save the system grub configuration file and use your distribution-provided update scripts to apply this configuration to the boot-time grub config file. On Fedora, the command is:
# grub2-mkconfig -o /etc/grub2.cfg
If your host system boots via UEFI, the correct target file is /etc/grub2-efi.cfg.
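In that case the command would simply point at that file instead:
# grub2-mkconfig -o /etc/grub2-efi.cfg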
With these changes, reboot the system and verify that the IOMMU is enabled. To do this, first verify that the kernel booted with the desired updates to the commandline. We can check this using:
# cat /proc/cmdline
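On an Intel system you'd expect to see your additions at the end of the line, something like the following sketch; everything before them will obviously be specific to your own install:
BOOT_IMAGE=/vmlinuz-<version> root=/dev/mapper/fedora-root ro rhgb quiet intel_iommu=on iommu=pt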
If the changes are not there, verify that you've booted the correct kernel or double check instructions specific to your distribution. If they are there, then we next need to check that the IOMMU is actually functional. The easiest way to do this is to check for IOMMU groups, which are setup by the IOMMU and will be used by VFIO for assignment. To do this, run the following:
# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/5/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.5
/sys/kernel/iommu_groups/8/devices/0000:00:1c.6
/sys/kernel/iommu_groups/9/devices/0000:00:1c.7
/sys/kernel/iommu_groups/9/devices/0000:05:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
/sys/kernel/iommu_groups/12/devices/0000:02:00.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.1
/sys/kernel/iommu_groups/13/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
If you get output like above, then the IOMMU is working. If you do not get a list of devices, then something is wrong with the IOMMU configuration on your system, either not properly enabled or not supported by the hardware and you'll need to figure out the problem before moving forward.
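Another quick sanity check, if you're so inclined, is to look for IOMMU initialization messages in dmesg; the exact strings vary by platform and kernel version, but a search along these lines usually turns them up:
# dmesg | grep -i -e DMAR -e IOMMU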
This is also a good time to verify that we have the desired isolation via the IOMMU groups. In the above example, there's a separate group per device except for the following groups: 1, 9, 11, and 12. Group 1 includes:
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
This includes the processor root port and my GeForce card. This is a case where the processor root port does not provide isolation and is therefore included in the IOMMU group. The host driver for the root port should remain in place, with only the two endpoint devices, the GPU itself and its companion audio function bound to vfio-pci.
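As an aside, if you want to reproduce a listing like the one above for any group, a small loop over the group's devices directory does the trick (shown here for group 1):
# for dev in /sys/kernel/iommu_groups/1/devices/*; do lspci -nns "${dev##*/}"; done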
Group 9 has a similar constraint, though in this case device 0000:00:1c.7 is not a root port, but a PCI bridge. Since this is conventional PCI, the bridge and all of the devices behind it are grouped together. Device 0000:05:00.0 is another bridge, so there's nothing assignable in the IOMMU group anyway.
Group 11 is composed of internal components: an ISA bridge, SATA controller, and SMBus device. These are grouped because there's no ACS between the devices and therefore no isolation. I don't plan to assign any of these devices anyway, so it's not an issue.
Group 12 includes only the functions of my second graphics card, so the grouping here is also reasonable and perfectly usable for our purposes.
If your grouping is not reasonable, or usable, you may be able to "fix" this by using the ACS override patch, but carefully consider the implications of doing this. There is a potential for putting your data at risk. Read my IOMMU groups article again to make sure you understand the issue.
Next we need to handle the problem that we only intend to use the discrete GPUs for guests; we do not want host drivers attaching to them. This avoids issues with the host driver unbinding and re-binding to the device. Generally this is only necessary for graphics cards, though I also throw in the companion audio function to keep the host desktop from getting confused about which audio device to use. We have a couple of options for doing this. The most common option is to use the pci-stub driver to claim these devices before native host drivers have the opportunity. Fedora builds the pci-stub driver statically into the kernel, giving it loading priority over any loadable modules, simplifying this even further. If your distro doesn't, keep reading; we'll cover a similar scenario with vfio-pci.
The first step is to determine the PCI vendor and device IDs we need to bind to pci-stub. For this we use lspci:
$ lspci -n -s 1:
01:00.0 0300: 10de:1381 (rev a2)
01:00.1 0403: 10de:0fbc (rev a1)
$ lspci -n -s 2:
02:00.0 0300: 1002:6611
02:00.1 0403: 1002:aab0
The Vendor:Device IDs for my GPUs and audio functions are therefore 10de:1381, 10de:0fbc, 1002:6611, and 1002:aab0. From this, we can craft a new option to add to our kernel commandline using the same procedure as above for the IOMMU. In this case the commandline addition looks like this:
pci-stub.ids=10de:1381,10de:0fbc,1002:6611,1002:aab0
After adding this to our grub configuration, running grub2-mkconfig, and rebooting, lspci -nnk for these devices should list pci-stub for the kernel driver in use.
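A successful result looks roughly like the following sketch; the device names, IDs, and available modules will of course match your own hardware rather than mine:
$ lspci -nnk -s 1:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107 [GeForce GTX 750] [10de:1381] (rev a2)
	Kernel driver in use: pci-stub
	Kernel modules: nouveau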
A further trick we can use is to craft an ids list using the advanced parsing of PCI vendor and class attributes to create an option list that will claim any Nvidia or AMD GPU or audio device:
pci-stub.ids=1002:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,1002:ffffffff:ffffffff:ffffffff:00040300:ffffffff,10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,10de:ffffffff:ffffffff:ffffffff:00040300:ffffffff
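To unpack that a bit, each entry follows the format vendor:device[:subvendor[:subdevice[:class[:class_mask]]]], with ffffffff acting as a wildcard, so my reading of the NVIDIA entries is roughly as follows (double-check the module parameter documentation if you depend on it):
10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff  matches vendor 10de, any device/subsystem, base class 0x03 (display controller, any subclass)
10de:ffffffff:ffffffff:ffffffff:00040300:ffffffff  matches vendor 10de, any device/subsystem, class 0x040300 (HD audio) exactly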
If you're using kernel v4.1 or newer, the vfio-pci driver supports the same ids option, so you can attach devices directly to vfio-pci and skip pci-stub. vfio-pci is not generally built statically into the kernel, so we need to force it to be loaded early. To do this on Fedora we need to set up the module options we want to use with modprobe.d. I typically use a file named /etc/modprobe.d/local.conf for local, i.e. system-specific, configuration. In this case, that file would include:
options vfio-pci ids=1002:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,1002:ffffffff:ffffffff:ffffffff:00040300:ffffffff,10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,10de:ffffffff:ffffffff:ffffffff:00040300:ffffffff
Next we need to ensure that dracut includes the necessary modules to load vfio-pci. I therefore create /etc/dracut.conf.d/local.conf with the following:
add_drivers+="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
(Note, the vfio_virqfd module only exists in kernel v4.1+)
Finally, we need to tell dracut to load vfio-pci first. This is done by once again editing our grub config file and adding the option rd.driver.pre=vfio-pci. Note that in this case we no longer use a pci-stub.ids option in grub, since we're replacing it with vfio-pci. Regenerate the dracut initramfs with dracut -f --kver `uname -r` and reboot to see the effect (the --regenerate-all dracut option is also sometimes useful).
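Putting the pieces together, on an Intel system using this vfio-pci approach the grub line might look something like this sketch (again, anything besides the IOMMU and vfio-pci options is just placeholder):
GRUB_CMDLINE_LINUX="rhgb quiet intel_iommu=on iommu=pt rd.driver.pre=vfio-pci"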
Another issue that users encounter when sequestering devices is what to do when there are multiple devices with the same vendor:device ID and some are intended to be used for the host. Some users have found the xen-pciback module to be a suitable stand-in for pci-stub with the additional feature that the "hide" option for this module takes device addresses rather than device IDs. I can't load this module on Fedora, so here's my solution that I like a bit better.
Create a small script; I've named mine /sbin/vfio-pci-override-vga.sh. It contains:
#!/bin/sh

for i in $(find /sys/devices/pci* -name boot_vga); do
    if [ $(cat $i) -eq 0 ]; then
        GPU=$(dirname $i)
        AUDIO=$(echo $GPU | sed -e "s/0$/1/")
        echo "vfio-pci" > $GPU/driver_override
        if [ -d $AUDIO ]; then
            echo "vfio-pci" > $AUDIO/driver_override
        fi
    fi
done

modprobe -i vfio-pci
This script will find every non-boot VGA device in the system and, using the driver_override feature introduced in kernel v3.16, make vfio-pci the exclusive driver for that device. If there's a companion audio device at function 1, it also gets a driver override. We then modprobe the vfio-pci module, which will automatically bind to the devices we've specified. Don't forget to make the script executable with chmod 755. Now, in place of the options line in our modprobe.d file, we use the following:
install vfio-pci /sbin/vfio-pci-override-vga.sh
So we specify that to install the vfio-pci module, modprobe should run the script we just wrote, which sets up our driver overrides and then loads the module itself, using the ignore-install option (-i) to prevent a loop. Finally, we need to tell dracut to include this script in the initramfs, so in addition to the add_drivers+= that we added above, add the following to /etc/dracut.conf.d/local.conf:
install_items+="/sbin/vfio-pci-override-vga.sh /usr/bin/find /usr/bin/dirname"
Note that the additional utilities required were found using lsinitrd and iteratively added to make the script work. Regenerate the initramfs with dracut again and you should now have all the non-boot VGA devices and their companion audio functions bound to vfio-pci after reboot. The primary graphics should load with the native host driver normally. This method should work for any kernel version, and I think I'm going to switch my setup to use it since I wrote it up here.
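A quick way to confirm the result is to check the bound driver for each device after reboot; on my example system the second GPU lives on bus 2, so something like this should report vfio-pci for both of its functions while the boot GPU keeps its native driver:
$ lspci -nnk -s 2: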
Obviously a simpler script can be used to pick specific devices. Here's an example that achieves the same result on my system:
#!/bin/sh

DEVS="0000:01:00.0 0000:01:00.1 0000:02:00.0 0000:02:00.1"

for DEV in $DEVS; do
    echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done

modprobe -i vfio-pci
(In this case the find and dirname binaries don't need to be included in the initramfs.)
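The script itself still needs to be available in the initramfs since modprobe.d calls it from there, so a line like the following in /etc/dracut.conf.d/local.conf should be all that's required; the path is just whatever you named your copy of the script:
install_items+="/sbin/vfio-pci-override.sh"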
A couple of other bonuses for v4.1 and newer kernels: by binding devices statically to vfio-pci, they will be placed into a low power state when not in use. Before you get your hopes too high, this generally only saves a few watts and does not stop the fan. v4.1 users with exclusively OVMF guests can also add an "options vfio-pci disable_vga=1" line to their modprobe.d, which will cause vfio-pci to opt devices out of VGA arbitration if possible. This prevents VGA arbitration from interfering with host devices, even in configurations like mine with multiple assigned GPUs.
If you're in the unfortunate situation of needing to use legacy VGA BIOS support for your assigned graphics cards and you have Intel host graphics using the i915 driver, this is also the point where you need to patch your host kernel for the i915 VGA arbitration fix. Don't forget that to enable this patch you also need to pass the enable_hd_vgaarb=1 option to the i915 driver. This is typically done via a modprobe.d options entry as discussed above.
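For example, a minimal entry in the same /etc/modprobe.d/local.conf file would be:
options i915 enable_hd_vgaarb=1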
At this point your system should be ready to use. The IOMMU is enabled, the IOMMU groups have been verified, the VGA and audio functions for assignment have been bound to either vfio-pci or pci-stub for later use by libvirt, and we've enabled proper VGA arbitration support in the i915 driver if needed. In the next part we'll actually install a VM, and maybe even attach a GPU to it. Stay tuned.
Very nice how-to's.
I've read them all so far and I'm now trying to actually make it happen :)
So far so good, but I'm a bit lost after the pci-stub part.
I added pci-stub.ids to the grub and the cards are nicely linked to the pci-stub driver.
After that you mention stuff about kernel 4.1, which I currently don't have. But I can't figure out from your story if the pci-stub part is enough for kernel <4.1 or whether something still needs to be done in order for the next part of your how-to to work.
Yeah, I sort of meandered around there. The point of any of the early binding is to prevent native host drivers from claiming the devices, generally because graphics drivers don't have good support yet for dynamically unbinding when you want to start up the VM. pci-stub is a sufficient workaround for this. libvirt will unbind the device from pci-stub and bind it to vfio-pci when you start up the VM. If you don't have 4.1+, the difference between binding to pci-stub or vfio-pci is largely academic. Once you do have 4.1+, vfio-pci will put unused devices into a low power state, which may save you a couple watts, but the functionality is largely the same.
Just installed Fedora 22, after failing in Linux Mint due to lack of up-to-date packages.
ReplyDelete"After adding this to our grub configuration, using mkgrub2config, and rebooting, lspci -nnk for these devices should list pci-stub for the kernel driver in use."
bash: mkgrub2config: command not found
I assume that, if it worked, it would be pretty self-explanatory?
grub2-mkconfig, as documented below the 4th paragraph. Fixed the erroneous reference to mkgrub2config.
Okay, I THINK I have this stage totally down. Thanks for the write-up.
I am somewhat familiar with using pci-stub to claim the second video card. In that case, I can confirm I've done it right by using dmesg | grep pci-stub to get a message like:
pci-stub 0000:04:00.0: claimed by stub
Using vfio-pci is new to me though. I do see the following:
vfio-pci 0000:04:00.0: enabling device (0100 -> 0103)
I assume that means I'm good? I have no video output on that monitor, in any event... :)
First of all thanks for taking the time to write this all up. Many of the similar articles I've seen are... incomplete at best.
However, I'm a little unclear which parts not to do if not using kernel 4.1+
I do not wish to bind more than the one specific card to vfio or stub, so I left the specific IDs for the card I wish to pass through on pci-stub.ids=. Since you said vfio-pci only supports the ids option in 4.1+, I left it on pci-stub.
Does one still need to add rd.driver.pre=vfio-pci to grub and create one of the scripts in this case? Or should I be doing something else? As is, I cannot launch the VM due to errors about the operation not being permitted, not being able to get the group, and device initialization failing.
I assume this is a result of not having given the device to vfio-pci instead of stub, but I'm not exactly sure how to.
If the devices you want to assign have unique IDs and your distro builds pci-stub directly into the kernel, then the pci-stub.ids= option on the kernel commandline is just fine. Permission denied errors can be a result of using <qemu:arg> options in your libvirt xml, lack of support for interrupt remapping, or platform breakage with reserved memory regions. We'd need to see the error and dmesg to know which it is.
Hi, the messages displayed in virt-manager are as follows:
Error starting domain: internal error: early end of file from monitor: possible problem:
2015-06-04T13:51:39.398074Z qemu-system-x86_64: -device vfio-pci,host=86:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: failed to set iommu for container: Operation not permitted
2015-06-04T13:51:39.398119Z qemu-system-x86_64: -device vfio-pci,host=86:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: failed to setup container for group 25
2015-06-04T13:51:39.398135Z qemu-system-x86_64: -device vfio-pci,host=86:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: failed to get group 25
2015-06-04T13:51:39.398165Z qemu-system-x86_64: -device vfio-pci,host=86:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device initialization failed
2015-06-04T13:51:39.398185Z qemu-system-x86_64: -device vfio-pci,host=86:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device 'vfio-pci' could not be initialized
I have not manually changed anything in the xml file; I created it much as described in your part 4, with the exception of having left it BIOS instead of UEFI. If there are specific logs or files I should add information from, please let me know; I've only recently started experimenting with virtualization and most of my Linux experience is quite dated.
dmesg likely provides the solution; does it say something about using allow_unsafe_interrupts? If so, try "options vfio_iommu_type1 allow_unsafe_interrupts=1" in modprobe.d. You'll need to at least manually unload the module or reboot to have it loaded with the correct option. This would also mean that your hardware isn't protecting you from possible MSI attacks from the guest. If you trust the guest, not an issue.
That was indeed exactly what was preventing it from starting. However, when it runs I get no video output, so time for me to double check that I've set everything up correctly.
Thanks for the help!
Before anything else, thanks for taking the time to set up this guide.
However I must ask, is there a way to make pci-stub work for identical cards (same vendor and device IDs)?
Really? See above, "Another issue that users encounter when sequestering devices is what to do when there are multiple devices with the same vendor:device ID..."
So you're saying the only solution is to upgrade my kernel and use vfio?
No, the only thing 4.1 brings is the "ids" option to vfio-pci, but just like pci-stub, that doesn't help when you have devices with the same IDs, split between host and guest. The driver_override support is in any reasonably new kernel (3.16+).
I kinda misunderstood that part and thought I needed 4.1 to use vfio; that's why I wanted to know if there was a way to make pci-stub work.
I have installed the edk2.git-aarch64 and edk2.git-ovmf-x64 packages from the https://www.kraxel.org/repos/firmware.repo repository. I have also installed the Virtualization Preview Repository.
The virt-manager on my machine does not appear the same as the version in Alex's examples:
1. After pressing the "Finish" button on the "Create Virtual Machine" window I do not get a window showing the overview of the installation. Instead, the install process begins immediately.
2. I am unable to change the firmware setting.
3. In the processor Configuration I do not have the option of host-passthrough
My system is running Fedora 22, virt-manager is 1.2.1, libvirt 1.2.17, and qemu is 2.4. My system is up to date with the latest versions in the Virtualization Preview Repository. What version are you running?
Thank you
Aaron
As noted in part 4, you must select the customize before install box to get to the advanced configuration. There it should be possible to change the firmware. Also as noted in part 4, host-passthrough can be typed into the selection window, it is not a pre-defined selection.
I just want to report that I have successfully got nearly the same configuration working on a Debian GNU/Linux 8.0 system. I've used an ASUS P8Z77-V Deluxe mainboard, Core i7 3770T, ASUS Radeon R5 230 (marked to support UEFI on the official site), and a Zotac nVidia GTX 760 4Gb (not marked to support UEFI, but it looks like it supports it).
After any guest starts, Gnome Shell (which runs on IGD) loses all its effects and animations. I suppose that it is an issue with the VGA arbiter.
I did not use any side repositories. The only thing I had to do was to add the testing and unstable repositories for a few newer packages.
If anybody is interested here are software versions I've used:
1. Linux kernel 4.0.0 from testing repository
2. OVMF from unstable repository
3. libvirt 1.2.9 from stable repository
4. qemu-kvm 2.1 from stable repository
I had to manually write the OVMF arguments for qemu since virt-manager does not support it (did not check whether libvirt does).
I observe strange CPU consumption on the VM with the nVidia card. The guest's top shows ~10% load, but the qemu process on the host consumes ~20%. I cannot clearly understand why this happens.
Additionally I noticed that several games on the Linux guest (Borderlands 2, Fahrenheit) have issues with sound. I tried to enable MSI for the audio device, but it did not help. I am wondering why MSI for the sound card is not enabled by default; it looks like there are no issues with it.
I used the script you've provided in your post (vfio-pci-override-vga.sh); it works, but only if I'm using the open source radeon drivers. If I use the drivers from AMD's site then KVM says the resource is busy. I ran "lspci -vnn" and it says the card is claimed by "vfio", but KVM still says it's busy. I also ran "dmesg" and it says something along the lines of "vfio@4.00.0 vs fglrx@4.00.0". I don't want to use the open source drivers because they don't fully support my cards and I can't run a 144Hz monitor with them. So what can I do to resolve this?
ReplyDeleteI don't understand what you're doing, do you have two AMD cards, one of which you want to assign to the guest and the other to be used by the host? And fglrx is claiming some resources of the guest card even when claimed by vfio-pci, preventing use in KVM? That sounds like an fglrx problem, complain to AMD.
Yup, that's exactly what I want to do. Okay, thanks, I'll go grab my pitchfork and knife.
Thanks for putting this out there. I'm trying to assign a DVB-T card to a VM, but am really struggling with it. I thought it would work when I read your post and added this to my grub command line:
iommu=pt intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 pci-stub.ids=14f1:8802,14f1:8800
I understand the problem is that I have multiple PCI devices in group 11:
# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/2/devices/0000:00:16.0
/sys/kernel/iommu_groups/2/devices/0000:00:16.3
/sys/kernel/iommu_groups/3/devices/0000:00:19.0
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/5/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.4
/sys/kernel/iommu_groups/8/devices/0000:00:1c.6
/sys/kernel/iommu_groups/9/devices/0000:00:1c.7
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1e.0
/sys/kernel/iommu_groups/11/devices/0000:05:00.0
/sys/kernel/iommu_groups/11/devices/0000:05:00.1
/sys/kernel/iommu_groups/11/devices/0000:05:02.0
/sys/kernel/iommu_groups/11/devices/0000:05:02.2
/sys/kernel/iommu_groups/12/devices/0000:00:1f.0
/sys/kernel/iommu_groups/12/devices/0000:00:1f.2
/sys/kernel/iommu_groups/12/devices/0000:00:1f.3
My DVB-T card is 0000:05:02.0 and 0000:05:02.2.
Unfortunately I still can't get the VM to start, and I get this error when booting it, so obviously what I added to the kernel command line didn't work:
Error starting domain: internal error: process exited while connecting to monitor: 2015-09-10T01:15:53.633408Z qemu-kvm: -device vfio-pci,host=05:02.0,id=hostdev0,bus=pci.0,addr=0x9: vfio: error, group 11 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2015-09-10T01:15:53.633447Z qemu-kvm: -device vfio-pci,host=05:02.0,id=hostdev0,bus=pci.0,addr=0x9: vfio: failed to get group 11
2015-09-10T01:15:53.633463Z qemu-kvm: -device vfio-pci,host=05:02.0,id=hostdev0,bus=pci.0,addr=0x9: Device initialization failed.
2015-09-10T01:15:53.633479Z qemu-kvm: -device vfio-pci,host=05:02.0,id=hostdev0,bus=pci.0,addr=0x9: Device 'vfio-pci' could not be initialized
I'm running a CentOS 7.1 hypervisor with kernel 3.10.0-229.el7.x86_64.
I'd really appreciate it if you could point out what I haven't done properly to isolate the PCI card from group 11.
You haven't given any lspci info, so all I can do is point at this:
http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
Sorry about not providing any lspci info. It's as follows. I've only provided PCI info for the problematic group 11 due to the posting character limit.
# lspci -nnk
...
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a4)
...
05:00.0 Ethernet controller [0200]: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) [8086:1010] (rev 01)
Subsystem: Compaq Computer Corporation NC7170 Gigabit Server Adapter [0e11:00db]
Kernel driver in use: e1000
05:00.1 Ethernet controller [0200]: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) [8086:1010] (rev 01)
Subsystem: Compaq Computer Corporation NC7170 Gigabit Server Adapter [0e11:00db]
Kernel driver in use: e1000
05:02.0 Multimedia video controller [0400]: Conexant Systems, Inc. CX23880/1/2/3 PCI Video and Audio Decoder [14f1:8800] (rev 05)
Subsystem: KWorld Computer Co. Ltd. KWorld/VStream XPert DVB-T [17de:08a6]
05:02.2 Multimedia controller [0480]: Conexant Systems, Inc. CX23880/1/2/3 PCI Video and Audio Decoder [MPEG Port] [14f1:8802] (rev 05)
Subsystem: KWorld Computer Co. Ltd. KWorld/VStream XPert DVB-T [17de:08a6]
I've answered this on the vfio-users list: https://www.redhat.com/archives/vfio-users/2015-September/msg00179.html
I have set my system up as per the vfio-pci-override-vga.sh script method and it pretty much works flawlessly on a ClearOS (CentOS) 7 system. I compiled qemu, libvirt and virt-manager from source as the ClearOS packages are too old for the setup to work this way. The only issue I have is if I try to run a second GPU for another Windows machine (two separate machines running with two separate cards). The machine with the card in the second PCIe slot will boot with a white screen instead of the EFI boot screen, function for a little bit (usually with white lines and artifacts) and then lock up the entire host machine.
ReplyDeleteThe cards and slots are fine as I can take either one out and run them from either slot and everything performs as expected. I have also installed windows on the host and was able to run both cards as a dual monitor setup.
lspci is as follows
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.2 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d0)
00:1c.4 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 5 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family H97 Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
03:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03)
05:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2)
05:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
and
the iommu groups are
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.0
/sys/kernel/iommu_groups/5/devices/0000:00:16.0
/sys/kernel/iommu_groups/6/devices/0000:00:19.0
/sys/kernel/iommu_groups/7/devices/0000:00:1a.0
/sys/kernel/iommu_groups/8/devices/0000:00:1b.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.2
/sys/kernel/iommu_groups/9/devices/0000:00:1c.4
/sys/kernel/iommu_groups/9/devices/0000:03:00.0
/sys/kernel/iommu_groups/9/devices/0000:05:00.0
/sys/kernel/iommu_groups/9/devices/0000:05:00.1
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
Just wondering if you might have any ideas on where to look. 05:00.0 is the issue.
Hi,
I hope someone is still reading the comments^^
I had a setup like this running on Fedora21 but I switched to Debian 8 lately and nothing works.
From what I figured out, pci-stub is configured as a module and loaded way too late when I look at my boot log (after modules like e1000e and xhci_hcd have already done their thing).
I blacklisted radeon so my GPU can be claimed by pci-stub, but the PCIe USB controller, ethernet and GPU HDMI audio can't be claimed at this point.
I think I could blacklist e1000e, too but not xhci_hcd because there are multiple devices.
My mouse and keyboard are connected to a kvm switch which is connected to onboard usb and some pcie x1 card I assigned to my virtual machine, so I can easily switch mouse and keyboard between host and virtual system.
Any idea what I could do to solve this without changing my system?
I think recompiling the kernel would solve it, but I'd like to keep this system as simple as possible.
Having a similar issue with module load order/priority and devices being grabbed by the wrong module. Haven't figured it out completely but added 'vfio-pci' to the top of the list in /etc/initramfs-tools/modules and it solved 99% of my problems. Think the issue is that the kernel parameter Alex notes (rd.driver.pre=vfio-pci) does work in Debian based distros...I use Ubuntu so guessing you've tried this also and failed?
Delete*doesn't work in Debian based distros
Oh and remember the 'update-initramfs -u' if you try amending the file I mention.
Yeah, I tried both vfio and pci_stub in multiple places, like initramfs/modules, modprobe.d, kernel params etc.
I'm still not satisfied (because it opens a whole new can of worms) but I got it working like this:
root@dom0:/etc/apt# cat /etc/modprobe.d/pci-stub.conf
install radeon /sbin/modprobe pci-stub; /sbin/modprobe --ignore-install radeon
install e1000e /sbin/modprobe pci-stub; /sbin/modprobe --ignore-install e1000e
install xhci_hcd /sbin/modprobe pci-stub; /sbin/modprobe --ignore-install xhci_hcd
# Dependency:
install ptp /sbin/modprobe pci-stub; /sbin/modprobe --ignore-install ptp
install pps_core /sbin/modprobe pci-stub; /sbin/modprobe --ignore-install pps_core
install usbcore /sbin/modprobe pci-stub; /sbin/modprobe --ignore-install usbcore
install usb_common /sbin/modprobe pci-stub; /sbin/modprobe --ignore-install usb_common
options pci-stub ids=1002:6819,1002:aab0,1b73:1100,8086:153b
Is there a way to accomplish this on a Debian-based OS? I have two identical Nvidia cards and would like to pass one to a VM.
dude, you literally skipped over it
How do you select just one card when the pci-stub IDs are the same for identical cards?
Did you even read this article? "Another issue that users encounter when sequestering devices is what to do when there are multiple devices with the same vendor:device ID..."
DeleteHow do you select just one card when pci-stub ids are the same for identical cards
If I understood correctly, the vfio option has risks if under the VGA PCI ID there is hardware like HDMI audio with a different driver loaded than the VGA driver.
Correct, or did I understand it wrong?
I keep receiving this error whenever I try to boot two guests at once.
If A is started first, then it pops up an "already claimed" error for the GPU belonging to guest domain A when I attempt to start B.
Likewise in reverse for A on guest domain B's GPU.
These are two separate discrete GPUs.
Error starting domain: Requested operation is not valid: PCI device 0000:01:00.0 is in use by driver QEMU, domain A
Traceback (most recent call last):
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 90, in cb_wrapper
callback(asyncjob, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 126, in tmpcb
callback(*args, **kwargs)
File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 83, in newfn
ret = fn(self, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/domain.py", line 1402, in startup
self._backend.create()
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1035, in create
if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Requested operation is not valid: PCI device 0000:01:00.0 is in use by driver QEMU, domain A
Hardware setup is as follows:
intel 3770 (has vt-d)
16 GB 1600mhz ram
EVGA nvidia gtx770 (vfio driver)
Radeon HD 5450 (vfio driver)
motherboard: gigabyte GA-z77mx-d3h-th
fedora 23
Is this something solved by the ACS override patch?
Thanks for the tutorial. I am trying to follow multiple sources to set up GPU passthrough on Fedora 23 with kernel 4.3.5, where the host uses the Intel IGP (Z170 + i6-6600) and the guest Windows 10 uses an AMD 7950.
However, after everything is set (IOMMU OK, vfio-bind is done as checked using lspci -nnk), and installing UEFI for QEMU,
whenever I attached the two PCI devices (graphics + audio), the VM simply froze, the CPU usage was constant, and the screen was blank (no video output). I have spent a few days on it but really can't figure it out; much appreciated if you wouldn't mind pointing me to some possible fixes.
One point I think may be the problem is the IOMMU group. I saw it to be group 1, which included 3 devices: graphics, audio and the PCI bridge.
Is there anything I should do for the bridge? Or should I actually change the card from slot 1 to slot 2?
I just built a nice X99 system for this very purpose, but I made the mistake of getting two identical GPUs. You said, "This script will find every non-boot VGA device in the system, use the driver_override feature introduced in kernel v3.16, and make vfio-pci the exclusive driver for that device."
The problem is that when I check:
find /sys/devices/pci* -name boot_vga
it indicates that both GPUs are marked as boot_vga. I've scoured the internet and can't find a way to set the 2nd gpu to NOT be boot_vga. Do I set the contents of /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/boot_vga to 1 (it currently has the single character of zero in it)?
Wait - the primary GPU has a 1 in the boot_vga file, so I think that the 2nd GPU is actually NOT set to boot_vga. I'm going to push forward with your instruction and just see if there actually isn't a problem.
I was able to get up to the pci-stub piece. I have verified that my HDMI audio and video are the only items in an IOMMU group. When I attempt to move them from pci-stub to vfio, the HDMI audio will bind, but the HDMI video will not.
07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290] [1002:67b1]
Subsystem: XFX Pine Group Inc. Device [1682:9295]
Kernel modules: radeon, fglrx
07:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aac8]
Subsystem: XFX Pine Group Inc. Device [1682:aac8]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
dmesg shows this:
vfio-pci: probe of 0000:07:00.0 failed with error -16
Any suggestions?
Me too!
[ 4.530998] snd_hda_intel 0000:02:00.1: enabling device (0000 -> 0002)
[ 4.531010] snd_hda_intel 0000:02:00.1: Handle vga_switcheroo audio client
[ 4.531012] snd_hda_intel 0000:02:00.1: Force to non-snoop mode
I want to add that to get the fix for multiple devices with same IDs working on Ubuntu, do everything as described except for the part where you modify the dracut config to copy the /sbin/vfio-pci-override-vga.sh script. Ubuntu uses initramfs-tools. Specifically, add a file to /etc/initramfs-tools/hooks/vfio-pci-override-vga with the following contents:
#!/bin/sh -e
PREREQS=""
case $1 in
prereqs) echo "${PREREQS}"; exit 0;;
esac
. /usr/share/initramfs-tools/hook-functions
copy_exec /sbin/vfio-pci-override-vga.sh /sbin
Rebuild initramfs: sudo update-initramfs -u
Then, check that the file is there: lsinitramfs -l /boot/initrd.img-3.8.0-4-generic | grep vga
>Another couple other bonuses for v4.1 and newer kernels is that by binding devices statically to vfio-pci, they will be placed into a low power state when not in use. Before you get your hopes too high, this generally only saves a few watts and does not stop the fan.
Maybe this is common knowledge by now, but using libvirt 2.0.0 on Linux 4.6 with a VFIO-configured graphics card, the fan may turn off completely if the graphics card has a "Zero RPM" mode. In my case I'm using an EVGA GTX 970 SSC, and the RPM control seems to work as if native on Windows.