tag:blogger.com,1999:blog-86943037814531332232024-03-18T21:27:05.382-06:00VFIO tips and tricksAlex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.comBlogger30125tag:blogger.com,1999:blog-8694303781453133223.post-18630295443589980882016-10-13T16:08:00.000-06:002016-10-13T16:09:52.925-06:00How to improve performance in Windows 7A contribution from <a href="https://www.redhat.com/archives/vfio-users/2016-October/msg00065.html">Thomas Lindroth</a> on the <a href="https://www.redhat.com/mailman/listinfo/vfio-users">vfio-users</a> mailing list:<br />
<blockquote cite="https://www.redhat.com/archives/vfio-users/2016-October/msg00065.html">I thought I'd share a trick for improving the performance on win7 guests. The<br />
tl;dr version is add <feature policy='disable' name='hypervisor'/> to the<br />
<cpu> section of your libvirt xml like so:<br />
<br />
<cpu mode='host-passthrough'><br />
<topology sockets='1' cores='3' threads='1'/><br />
<feature policy='disable' name='hypervisor'/><br />
</cpu><br />
<br />
The long story is that according to Microsoft's documentation "On systems<br />
where the TSC is not suitable for timekeeping, Windows automatically selects<br />
a platform counter (either the HPET timer or the ACPI PM timer) as the basis<br />
for QPC." QPC = QueryPerformanceCounter() which is a windows api for getting<br />
timing info. Some redhat documentation say: "Windows 7 do not use the TSC as<br />
a time source if the hypervisor-present bit is set". Instead it falls back on<br />
acpi_pm or hpet if hpet is enabled in the xml.<br />
<br />
The hypervisor present bit is a fake cpuid flag qemu and other hypervisors<br />
injects to show the guest it's running under a hypervisor. This is different<br />
from the KVM signature that can be hidden with <kvm><hidden state='on'>.<br />
With the hypervisor flag disabled in libvirt xml windows 7 started using TSC<br />
as timing source for me. <br />
<br />
Nvidia has a "Timer Function Performance" benchmark on their web page to<br />
measure overhead from timers. With acpi_pm the timer query took 3,605ns on<br />
average and with TSC 12.52ns. Passmark's CPU floating point performance<br />
benchmark, which query timers 265,000 times/sec, went from 3952 points with<br />
acpi_pm to 5594 points with TSC. The reason TSC is so much faster is because<br />
both acpi_pm and hpet are emulated by qemu in userspace and TSC is handled by<br />
KVM in kernel space.<br />
<br />
All games I've tested use the timer at least 25,000 times/sec. I'm guessing<br />
it's the graphics drivers doing that. Some games like Outlast query the timer<br />
~275,000 times/sec. The performance for those games are basically limited by<br />
how fast the host can do context switches. I expect the performance<br />
improvement with TSC is great in those games. Unfortunately 3dmark's fire<br />
strike benchmark still do 25,000 queries/sec to the acpi_pm even with the<br />
hypervisor flag hidden. There must be some other windows api for using the<br />
"platform counter" as Microsoft calls it but most games don't use it.<br />
<br />
Unless you are using windows 7 you'll probably not benefit from this. Windows<br />
10 is probably using the hypervclock instead. That redhat documentation<br />
talking about the hypervisor bit was actually a guide for how to turn off TSC<br />
to "resolve guest timing issues". I don't experience any problems myself but<br />
if you got one of those "clocksource tsc unstable" systems this might not<br />
work so well.<br />
</blockquote><br />
Google says <a href="http://www.nvidia.com/object/timer_function_performance.html">this</a> is the NVIDIA timer benchmark.<br />
<br />
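For a rough sense of scale, here's a back-of-the-envelope sketch in Python (my arithmetic, not part of Thomas's post) that multiplies the per-query costs he measured by the query rates he observed. The function name is made up for illustration, and it assumes every query pays the average cost and that the queries are serialized on one CPU, neither of which is guaranteed.<br />
<br />
```python
# Average per-query costs reported in the quoted post, in nanoseconds.
ACPI_PM_NS = 3605.0
TSC_NS = 12.52


def timer_cpu_fraction(queries_per_sec, ns_per_query):
    """Fraction of one CPU-second spent purely on timer queries.

    Hypothetical helper: assumes each query costs the average and that
    queries do not overlap across CPUs.
    """
    return queries_per_sec * ns_per_query * 1e-9


if __name__ == "__main__":
    # Query rates mentioned in the post: typical games, Passmark FP, Outlast.
    for label, rate in (("typical game", 25_000),
                        ("Passmark FP", 265_000),
                        ("Outlast", 275_000)):
        acpi = timer_cpu_fraction(rate, ACPI_PM_NS)
        tsc = timer_cpu_fraction(rate, TSC_NS)
        print(f"{label}: acpi_pm {acpi:.1%} of a CPU, TSC {tsc:.3%}")
```
<br />
Under those assumptions, 265,000 acpi_pm queries per second would eat most of one CPU-second every second, while the same rate against the TSC stays well under one percent.<br />
<br />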
It would be interesting to see how the hyper-v extensions compare and whether tests like Fire Strike actually make use of them. Thanks Thomas!<br />
<br />
(copied with permission)Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com2tag:blogger.com,1999:blog-8694303781453133223.post-71403298607825519122016-09-26T17:00:00.001-06:002016-09-26T17:00:21.525-06:00Passing QEMU command line options through libvirt<span style="font-family: Arial, Helvetica, sans-serif;">This one comes from <a href="http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html">Stefan's blog</a> with some duct tape and baling wire courtesy of Laine. I've talked previously about using wrapper scripts to launch QEMU, which typically use <i>sed</i> to insert options that libvirt doesn't know about. This is far better than defining full vfio-pci devices using <qemu:arg> options, as many guides suggest, because that approach hides the devices from libvirt and causes all sorts of problems with device permissions, locked memory, etc. But there's a nice compromise, as Stefan shows in his last example at the link above. Say we only want to add x-vga=on to one of our hostdev entries in the libvirt domain XML. We can do something like this:</span><div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><qemu:commandline></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"> <qemu:arg value='-set'/></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"> <qemu:arg value='device.hostdev0.x-vga=on'/></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"></qemu:commandline></span></div>
<div style="font-family: Arial, Helvetica, sans-serif;">
<br /></div>
<div style="font-family: Arial, Helvetica, sans-serif;">
The effect is that we add the option x-vga=on to the hostdev0 device, which is defined via a normal <hostdev> section in the XML and gets all the device permission and locked memory love from libvirt. So which device is hostdev0? Well, things get a little mushy there. libvirt invents the names based on the order of the hostdev entries in the XML, so you can simply count (starting from zero) to pick the entry for the additional option. It's a bit cleaner overall than needing to manage a wrapper script separately from the VM. Also, don't forget that to use <qemu:commandline> you need to first enable the QEMU namespace in the XML by updating the first line in the domain XML to:</div>
</div>
<div style="font-family: Arial, Helvetica, sans-serif;">
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'></span></div>
<div>
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Otherwise libvirt will promptly discard the extra options when you save the domain.</span></div>
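<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If counting entries by hand sounds error prone, here's a small Python sketch of the counting rule described above. It assumes the aliases really do follow the order of the hostdev entries in the XML, as this post says; the function name and the sample XML are made up for illustration, so double check the result against the alias shown in your running domain's XML before relying on it.</span></div>

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML fragment with two assigned PCI functions.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""


def guess_hostdev_aliases(domain_xml):
    """Pair each hostdev entry with the alias implied by its position.

    Assumption: aliases are hostdev0, hostdev1, ... in document order.
    Returns a list of (alias, host PCI address) tuples.
    """
    root = ET.fromstring(domain_xml)
    pairs = []
    for index, hostdev in enumerate(root.iter("hostdev")):
        addr = hostdev.find("./source/address")
        host = "{:04x}:{:02x}:{:02x}.{:x}".format(
            int(addr.get("domain"), 16),
            int(addr.get("bus"), 16),
            int(addr.get("slot"), 16),
            int(addr.get("function"), 16),
        )
        pairs.append((f"hostdev{index}", host))
    return pairs


if __name__ == "__main__":
    for alias, host in guess_hostdev_aliases(DOMAIN_XML):
        print(f"{alias} -> {host}  (option would be device.{alias}.x-vga=on)")
```

<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The point is only to make the zero-based counting explicit; the -set option itself still goes into the domain XML by hand as shown above.</span></div>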
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com1tag:blogger.com,1999:blog-8694303781453133223.post-77537275914723688122016-09-26T16:15:00.000-06:002016-09-26T16:15:24.328-06:00"Intel-IOMMU: enabled": It doesn't mean what you think it means<span style="font-family: Arial, Helvetica, sans-serif;">A quick post just because I keep seeing this in practically every how-to guide I come across. The instructions grep dmesg for "IOMMU" and come up with either "</span><span style="color: #111111; font-size: inherit; white-space: pre-wrap;"><span style="font-family: Arial, Helvetica, sans-serif;">Intel-IOMMU: enabled"</span><span style="font-family: Source Code Pro, monospace, sans-serif;"> </span></span><span style="color: #111111; font-size: inherit; white-space: pre-wrap;"><span style="font-family: Arial, Helvetica, sans-serif;">or "</span></span><span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;">DMAR: IOMMU enabled". Clearly that means it's enabled, right? Wrong. That line comes from a __setup() function that parses the options for "intel_iommu=". Nothing has been done at that point, not even a check to see if VT-d hardware is present. Pass intel_iommu=on as a boot option to an AMD system and you'll see this line. Yes, this is clearly not a very intuitive message. So for the record, the mouthful that you should be looking for is this line:</span></span><br />
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;">DMAR: Intel(R) Virtualization Technology for Directed I/O</span></span><br />
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;">or on older kernels the prefix is different:</span></span><br />
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;">PCI-DMA: Intel(R) Virtualization Technology for Directed I/O</span></span><br />
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;"><br /></span></span>
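<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;">If you'd rather script the check than eyeball it, here's a minimal Python sketch that looks for either of the two lines above and ignores the misleading "enabled" message. The function name and the sample log lines in the comments are made up for illustration; the only strings it relies on are the ones quoted in this post.</span></span><br />

```python
import re

# Matches the two prefixes quoted above: "DMAR:" (newer kernels) and
# "PCI-DMA:" (older kernels), followed by the VT-d initialization banner.
VTD_READY = re.compile(
    r"(DMAR|PCI-DMA): Intel\(R\) Virtualization Technology for Directed I/O"
)


def vtd_initialized(dmesg_text):
    """Return True if the kernel log shows VT-d actually initialized.

    Lines such as "DMAR: IOMMU enabled" or "Intel-IOMMU: enabled" only mean
    the intel_iommu= option was parsed, so they are deliberately not matched.
    """
    return any(VTD_READY.search(line) for line in dmesg_text.splitlines())
```

<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;">Feed it the text of your kernel log however you normally capture it, for instance the saved output of dmesg.</span></span><br />
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;"><br /></span></span>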
<span style="color: #111111; font-family: Arial, Helvetica, sans-serif;"><span style="white-space: pre-wrap;">When you see one of these lines in dmesg, you're pretty much past all the failure points of initializing VT-d. FWIW, the "DMAR" flavors of the above appeared in v4.2, so on a more recent kernel, that's the one to look for.</span></span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com1tag:blogger.com,1999:blog-8694303781453133223.post-64737912094059153272016-09-01T13:07:00.000-06:002016-09-01T13:07:25.050-06:00And now you're an expert<span style="font-family: Arial, Helvetica, sans-serif;">Video from my KVM Forum 2016 talk:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/WFkdTFTOTpA/0.jpg" src="https://www.youtube.com/embed/WFkdTFTOTpA?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com0tag:blogger.com,1999:blog-8694303781453133223.post-14665580218394308462016-08-24T14:46:00.002-06:002016-08-24T14:46:45.487-06:00KVM Forum 2016 - An Introduction to PCI Device Assignment with VFIO<span style="font-family: Arial, Helvetica, sans-serif;">Slides available here:</span><div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="http://awilliam.github.io/presentations/KVM-Forum-2016">http://awilliam.github.io/presentations/KVM-Forum-2016</a></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Video to come</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com0tag:blogger.com,1999:blog-8694303781453133223.post-58376178494919393382016-07-15T16:34:00.001-06:002016-07-15T16:34:46.249-06:00Intel Graphics assignment<span style="font-family: Arial, Helvetica, sans-serif;">Hey folks, it feels like it's time to mention that assignment of Intel graphics devices (IGD) is currently available in qemu.git and will be part of the upcoming QEMU 2.7 release. There's already pretty thorough documentation of the modes available in the <a href="http://git.qemu.org/?p=qemu.git;a=blob;f=docs/igd-assign.txt">source tree</a>; please give it a read. There are two modes described there, "legacy" and "Universal Passthrough" (UPT), each with its own pros and cons. Which ones are available to you depends on your hardware. UPT mode is only available for Broadwell and newer processors while legacy mode is available all the way back through SandyBridge. If you have a processor older than SandyBridge, stop now, this is not going to work for you. If you don't know what any of these strange names mean, head to <a href="https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures">Wikipedia</a> and <a href="http://ark.intel.com/">Ark</a> to figure it out.</span><div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The high-level overview is that "legacy" mode is much like our GeForce support: the IGD is meant to be the primary and exclusive graphics in the VM. Additionally, the IGD address in the VM must be at PCI 00:02.0, only SeaBIOS is currently supported, only the 440FX chipset model is supported (no Q35), the IGD device must be the primary host graphics device, and the host needs to be running kernel v4.6 or newer. Clearly assigning the host primary graphics is a bit of an about-face for our GPU assignment strategy, but we depend on running the IGD video ROM, which depends on VGA and imposes most of the above requirements as well (oh, and add CONFIG_VFIO_PCI_VGA to the requirements list). I have yet to see an IGD ROM with UEFI support, which is why OVMF is not yet supported, but it seems possible to support with a CSM and some additional code in OVMF.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Legacy mode should work with both Linux and Windows guests (and hopefully others if you're so inclined). The i915 driver does suffer from the typical video driver problem that sometimes the whole system explodes (not literally) when unbinding or re-binding the IGD to the driver. Personally I avoid this by blacklisting the i915 driver. Of course as some have found out trying to do this with discrete GPUs, there are plenty of other drivers ready to jump on the device to keep the console working. The primary ones I've seen are </span><span style="font-family: Courier New, Courier, monospace;">vesafb</span><span style="font-family: Arial, Helvetica, sans-serif;"> and </span><span style="font-family: Courier New, Courier, monospace;">efifb</span><span style="font-family: Arial, Helvetica, sans-serif;">, which one is used on your system depends on your host firmware settings, legacy BIOS vs UEFI respectively. To disable these, simply add </span><span style="font-family: Courier New, Courier, monospace;">video=vesafb:off</span><span style="font-family: Arial, Helvetica, sans-serif;"> or </span><span style="font-family: Courier New, Courier, monospace;">video=efifb:off</span><span style="font-family: Arial, Helvetica, sans-serif;"> to the kernel command line (not sure which to use? try both, </span><span style="font-family: Courier New, Courier, monospace;">video=vesafb:off,efifb:off</span><span style="font-family: Arial, Helvetica, sans-serif;">). The first thing you'll notice when you boot an IGD system with i915 blacklisted and the more basic framebuffer drivers disabled is that you don't get anything on the graphics head after grub. Plan for this. I use a serial console, but perhaps you're more comfortable running blind and willing to hope the system boots and you can ssh into it remotely.</span></div>
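<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Since you'll be booting blind, it's worth confirming from a remote shell that the options actually made it onto the kernel command line. Here's a small Python sketch that parses a command line string (for example the contents of /proc/cmdline) for the video= syntax shown above. The function name and the sample command line are made up for illustration, and it only checks what was requested, not what the kernel ended up doing.</span></div>

```python
def disabled_framebuffers(cmdline):
    """Return the set of framebuffer drivers turned off via video=<name>:off.

    Handles both separate options (video=vesafb:off video=efifb:off) and the
    combined form (video=vesafb:off,efifb:off) mentioned in the text.
    """
    disabled = set()
    for token in cmdline.split():
        if not token.startswith("video="):
            continue
        for setting in token[len("video="):].split(","):
            name, _, value = setting.partition(":")
            if value == "off":
                disabled.add(name)
    return disabled


if __name__ == "__main__":
    # Hypothetical command line; on a real host read /proc/cmdline instead.
    sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 video=vesafb:off,efifb:off"
    print(sorted(disabled_framebuffers(sample)))
```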
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If you've followed along with this procedure, you should be able to simply create a </span><span style="font-family: Courier New, Courier, monospace;"><hostdev> </span><span style="font-family: Arial, Helvetica, sans-serif;">entry in your libvirt XML, which ought to look something like this:</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <hostdev mode='subsystem' type='pci' managed='yes'></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <driver name='vfio'/></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <source></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <address domain='0x0000' bus='0x00' slot='0x02' function='0x0'/></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </source></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <alias name='hostdev0'/></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </hostdev></span></div>
<div style="font-family: Arial, Helvetica, sans-serif;">
<br /></div>
</div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Again, assigning the IGD device (which is always 00:02.0) to address 00:02.0 in the VM is required. Delete the </span><span style="font-family: Courier New, Courier, monospace;"><video></span><span style="font-family: Arial, Helvetica, sans-serif;"> and </span><span style="font-family: Courier New, Courier, monospace;"><graphics></span><span style="font-family: Arial, Helvetica, sans-serif;"> sections and everything </span><i style="font-family: Arial, Helvetica, sans-serif;">should</i><span style="font-family: Arial, Helvetica, sans-serif;"> just magically work. Caveat emptor, my newest CPU is Broadwell, I've been told this works with Skylake, but IGD is hardly standardized and each new implementation seems to tweak things just a bit.</span></div>
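<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The address requirement is easy to get wrong when editing XML by hand, so here's a minimal Python sketch that checks a hostdev entry like the one above for it. It only encodes the rule stated in this post (host 00:02.0 assigned to guest 00:02.0); the function names are made up for illustration, and treating a missing attribute as zero is my assumption.</span></div>

```python
import xml.etree.ElementTree as ET

# The hostdev entry from the text above.
HOSTDEV_XML = """<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</hostdev>"""

# IGD lives at 0000:00:02.0 on the host and must appear there in the guest.
IGD_ADDRESS = (0, 0, 2, 0)


def pci_tuple(address):
    """Convert an <address> element into (domain, bus, slot, function)."""
    keys = ("domain", "bus", "slot", "function")
    # Assumption: an omitted attribute is treated as zero.
    return tuple(int(address.get(key, "0"), 16) for key in keys)


def igd_legacy_addresses_ok(hostdev_xml):
    """True if both the host source and guest address are 00:02.0."""
    hostdev = ET.fromstring(hostdev_xml)
    host = hostdev.find("./source/address")
    guest = hostdev.find("./address")  # direct child only, not the source
    if host is None or guest is None:
        return False
    return pci_tuple(host) == IGD_ADDRESS and pci_tuple(guest) == IGD_ADDRESS
```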
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Some of you are probably also curious why this doesn't work on Q35, which leads into the discussion of UPT mode; IGD clearly is not a discrete GPU, but "integrated" not only means that the GPU is embedded in the system, in this case it means that the GPU is kind of smeared across the system. This is why IGD assignment hasn't "just worked" and why you need a host kernel with support for exposing certain regions through vfio and a BIOS that's aware of IGD, and it needs to be at a specific address, etc, etc, etc. One of those requirements is that the video ROM actually also cares about a few properties of the device at PCI address 00:1f.0, the ISA/LPC bridge. Q35 includes its own bridge at that location and we cannot simply modify the IDs of that bridge for compatibility reasons. Therefore that bridge being an implicit part of Q35 means that IGD assignment doesn't work on Q35. This also means that PCI address 00:1f.0 is not available for use in a 440FX machine.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Ok, so UPT. Intel has known for a while that the sprawl of IGD has made it difficult to deal with for device assignment. To combat this, both software and hardware changes have been made that help to consolidate IGD to be more assignment-friendly. Great news, right? Well, sort of. First off, in UPT mode the IGD is meant to be a <i>secondary</i> graphics device in the VM; there's no VGA mode support (oh, BTW, x-vga=on is automatically added by QEMU in legacy mode). In fact, um, there's no output support of any kind by default in UPT mode. How's this useful, you ask? Well, between the emulated graphics and IGD you can set up mirroring so you actually have a remote-capable, hardware accelerated graphics VM. Plus, if you add the option </span><span style="font-family: Courier New, Courier, monospace;">x-igd-opregion=on</span><span style="font-family: Arial, Helvetica, sans-serif;"> to the vfio-pci device, you can get output to a physical display, but there again you're going to need the host running kernel v4.6 or newer and the upcoming QEMU 2.7 support, while no-output UPT has probably actually worked for quite a while. UPT mode has no requirements for the IGD PCI address, but note that most VM firmware, SeaBIOS or OVMF, will define the primary graphics as the one having the lowest PCI address. Usually not a problem, but some of you create some crazy configs. You'll also still need to do all the blacklisting and video disabling above, or just risk binding and unbinding i915 from the host, gambling each time whether it'll explode.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">So UPT sounds great except why is this opregion thing optional? Well, it turns out that if you want to do that cool mirroring thing I mention above and a physical output is enabled with the opregion, you actually need to have a monitor attached to the device or else your apps don't get any hardware acceleration love. Whereas if IGD doesn't know about any outputs, it's happy to apply hardware acceleration regardless of what's physically connected. Sucks, but readers here should already know how to create wrapper scripts to add this extra option if they want it (similar to <a href="http://vfio.blogspot.com/2015/05/vfio-gpu-how-to-series-part-5-vga-mode.html">x-vga=on</a>). I don't think Intel really wants to support this hacky hybrid mode either, thus the experimental x- option prefix tag.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Oh, one more gotcha for UPT mode, Intel seems to expect otherwise, but I've had zero success trying to run Linux guests with UPT. Just go ahead and assume this is for your Windows guests only at this point.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">What else... laptop displays should work, I believe switching outputs even works, but working on laptops is rather inconvenient since you're unlikely to have a serial console available. Also note that while you can use input-linux to attach a laptop keyboard and mouse (not trackpad IME), I don't know how to make the hotkeys work, so that's a bummer. Some IGD devices will generate DMAR error spew on the host when assigned, particularly the first time per host boot. Don't be too alarmed by this, especially if it stops before the display is initialized. This seems to be caused by resetting the IGD in an IOMMU context where it can't access the page tables set up by the BIOS/host. Unless you have an ongoing spew of these, they can probably be ignored. If you have something older than SandyBridge that you wish you could use this with and continued reading even after being told to stop, sorry, there was a hardware change at SandyBridge and I don't have anything older to test with and don't really want to support additional code for such outdated hardware. Besides, those are pretty old and you need an excuse for an upgrade anyway.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">With this support I've switched my desktop system so that the host actually runs from a USB stick and the previous bare-metal Fedora install is virtualized with IGD, running alongside my existing GeForce VM. Give it a try and good luck.</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com8tag:blogger.com,1999:blog-8694303781453133223.post-17813281769246663242016-01-03T15:11:00.000-07:002016-01-03T15:11:06.569-07:00Comments on the 7 Gamers, 1 CPU video<span style="font-family: Arial, Helvetica, sans-serif;">In case you've seen this video:</span><div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/LXOaCkbt4lI/0.jpg" src="https://www.youtube.com/embed/LXOaCkbt4lI?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">And you're thinking to yourself that the R9 Nano they used looks like a great choice for your own GPU assignment build, think again. This GPU is known to have reset issues, so while it's impressive to see a build with this degree of consolidation, you should be suspicious about the problems Linus alludes to and the limited functionality of the overall system shown in the video. For instance, does rebooting a VM require a host reboot, or perhaps a manual soft eject of the GPU from the VM? We see this a lot with newer AMD cards, well beyond the partial workarounds we have for Bonaire and Hawaii based GPUs.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Personally I would have preferred to see an NVIDIA based solution, but due to the scale of this build, and unique slot and power restrictions, the compatibility with GPU assignment was mostly an afterthought. NVIDIA is not without issues, but for the time being we understand those issues and have workarounds for them, and even a path for supported configurations with Quadro cards.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">On the plus side, yes, this is using KVM and VFIO and it's an impressive example of what this technology can do. However, when you're spec'ing your own build, do your own research and don't rely on videos like this to choose your components. My 2 cents...</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com26tag:blogger.com,1999:blog-8694303781453133223.post-18663067501683481512015-10-23T09:20:00.001-06:002016-10-15T12:21:07.464-06:00Intel processors with ACS support<span style="font-family: "arial" , "helvetica" , sans-serif;">If you've been keeping up with this blog then you understand a bit about IOMMU groups and device isolation. In my howto series I describe the limitations of the Xeon E3 processor that I use in my example system and recommend Xeon E5 or higher processors to provide the best case device isolation for those looking to build a system. Well, thanks to the vfio-users mailing list, it has come to my attention that there are in fact Core i7 processors with PCIe Access Control Services (ACS) support on the processor root ports.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />
</span> <span style="font-family: "arial" , "helvetica" , sans-serif;">Intel lists these processors as <a href="http://ark.intel.com/products/family/79318/Intel-High-End-Desktop-Processors">High End Desktop Processors</a>. They include <a href="http://www.intel.com/content/www/us/en/processors/core/core-i7-lga2011-3-datasheet-vol-2.html">Socket 2011-v3 </a><a href="http://ark.intel.com/products/codename/79427/Haswell-E">Haswell E</a> processors, <a href="http://www.intel.com/content/www/us/en/processors/core/4th-gen-core-i7-lga2011-datasheet-vol-2.html">Socket 2011</a> <a href="http://ark.intel.com/products/codename/67456/Ivy-Bridge-E">Ivy Bridge E</a> processors, and <a href="http://www.intel.com/content/www/us/en/processors/core/core-i7-lga-2011-datasheet-vol-2.html">Socket 2011</a> <a href="http://ark.intel.com/products/codename/63378/Sandy-Bridge-E">Sandy Bridge E</a> processors. The linked datasheets for each family clearly list ACS register capabilities. Current listings for these processors include:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />
</span> <span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Haswell-E (LGA2011-v3)</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-5960X (8-core, 3/3.5GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-5930K (6-core, 3.2/3.8GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-5820K (6-core, 3.3/3.6GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />
</span> <span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Ivy Bridge-E (LGA2011)</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-4960X (6-core, 3.6/4GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-4930K (6-core, 3.4/3.6GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-4820K (4-core, 3.7/3.9GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />
</span> <span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Sandy Bridge-E (LGA2011)</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-3960X (6-core, 3.3/3.9GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-3970X (6-core, 3.5/4GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-3930K (6-core, 3.2/3.8GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-3820 (4-core, 3.6/3.8GHz)</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">These also appear to be the only Intel Core processors compatible with Socket 2011 and 2011-v3 found on X79 and X99 motherboards, so basing your platform around these chipsets will hopefully lead to success. My recommendation is based only on published specs, not first hand experience though, so your mileage may vary.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />
</span> <span style="font-family: "arial" , "helvetica" , sans-serif;">Unfortunately there are not yet any Skylake based "High End Desktop Processors" and from what we've seen on the mailing list, Skylake does not implement ACS on processor root ports, nor do we have quirks to enable isolation on the Z170 PCH root ports (which include a read-only ACS capability, effectively confirming lack of isolation), and integrated I/O devices on the motherboard exhibit poor grouping with system management components (aside from the onboard I219 LOM, which we do have quirked). This makes the currently available Skylake platforms a really bad choice for doing device assignment.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />
</span> <span style="font-family: "arial" , "helvetica" , sans-serif;">Based on this new data, I'll revise my recommendation for Intel platforms to include Xeon E5 and higher processors or Core i7 High End Desktop Processors (as listed by Intel). Of course there are combinations where regular Core i5, i7 and Xeon E3 processors will work well, we simply need to be aware of their limitations and factor that into our system design.</span><br />
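<span style="font-family: Arial, Helvetica, sans-serif;">Whatever processor you end up with, the grouping the kernel actually produced is what matters. Here's a short Python sketch that lists IOMMU groups and their devices from sysfs, assuming the usual /sys/kernel/iommu_groups/&lt;group&gt;/devices/ layout. The function name is made up for illustration, and an empty result simply means no groups were found (for example because the IOMMU isn't enabled).</span><br />

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group name to a sorted list of its device addresses.

    Returns an empty dict if the directory does not exist.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for group in os.listdir(root):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[group] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    found = list_iommu_groups()
    # Group names are numeric strings, so sort them numerically for display.
    for group in sorted(found, key=int):
        print(f"group {group}: {' '.join(found[group])}")
```

<span style="font-family: Arial, Helvetica, sans-serif;">A device you want to assign that shares a group with unrelated devices is the limitation to factor into your design.</span><br />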
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />
</span> <span style="font-family: "arial" , "helvetica" , sans-serif;">EDIT (Oct 30 04:18 UTC 2015)</span><span style="font-family: "arial" , "helvetica" , sans-serif;">: A more subtle feature also found in these E series processors is support for IOMMU super pages. The datasheets for the E5 Xeons and these High End Desktop Processors indicate support for 2MB and 1GB IOMMU pages while the standard Core i5 and i7 only support 4KB pages. This means less space wasted for the IOMMU page tables, more efficient table walks by the hardware, and less thrashing of the I/O TLB under I/O load, which otherwise results in I/O stalls. Will you notice it? Maybe. VFIO will take advantage of IOMMU super pages any time we find a sufficiently sized range of contiguous pages. To help ensure this happens, make use of hugepages in the VM.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
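<span style="font-family: "arial" , "helvetica" , sans-serif;">As a rough sketch of the sizing arithmetic behind backing a VM with hugepages: the helper below, whose name <code>hugepages_needed</code> is made up for illustration, computes how many hugepages cover a given amount of guest RAM. It assumes a 2MB hugepage size unless told otherwise; check Hugepagesize in /proc/meminfo on your own host.</span><br />

```shell
#!/bin/sh
# hugepages_needed is a hypothetical helper, not part of any tool.
# $1 = guest RAM in MiB, $2 = hugepage size in MiB (assumed default: 2).
hugepages_needed() {
    mem_mib="$1"
    page_mib="${2:-2}"
    # Round up so the whole guest is covered.
    echo $(( (mem_mib + page_mib - 1) / page_mib ))
}

# Example: a 4096 MiB guest backed by 2 MiB pages.
hugepages_needed 4096
```

<span style="font-family: "arial" , "helvetica" , sans-serif;">The resulting count is the sort of value you would reserve on the host (for example through /proc/sys/vm/nr_hugepages) before asking libvirt to back the guest with hugepages.</span><br />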
<span style="font-family: "arial" , "helvetica" , sans-serif;">EDIT (Oct 15 18:20 UTC 2016): Intel Broadwell-E processors have been out for some time and as we'd expect, the datasheets do indicate that ACS is supported. So add to the list above:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Broadwell-E (LGA2011-v3)</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-6950X (10-core, 3.0/3.5GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-6900K (8-core, 3.2/3.7GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-6850K (6-core, 3.6/3.8GHz)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">i7-6800K (6-core, 3.4/3.6GHz)</span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com19tag:blogger.com,1999:blog-8694303781453133223.post-8684431194902100032015-09-02T09:50:00.000-06:002015-09-02T09:50:08.680-06:00libvirt 1.2.19 - session mode device assignment<span style="font-family: Arial, Helvetica, sans-serif;">Just a quick note to point out a feature, actually bug fix, in the new libvirt 1.2.19 <a href="https://libvirt.org/news.html" target="_blank">release</a>:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Courier New, Courier, monospace;">hostdev: skip ACS check when using VFIO for device assignment (Laine Stump)</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Why is this noteworthy? Well, that ACS checking required access to PCI config space beyond the standard header, which is privileged. That means that session (ie. user) mode libvirt couldn't do it and failed trying to support </span><span style="font-family: Courier New, Courier, monospace;"><hostdev></span><span style="font-family: Arial, Helvetica, sans-serif;"> entries. Now that libvirt recognizes that vfio enforces device isolation and a userspace ACS test is unnecessary, session mode libvirt can support device assignment! Thanks Laine!</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Note that a user still can't just pluck a device from the host and start using it, that's still privileged. There's also the problem that a VM making use of device assignment needs to lock all of the VM memory into RAM, which is typically quite a lot more than the standard user locked memory limit of 64kB. But these can be resolved by enabling the (trusted) user to lock memory sufficient for their VM and preparing the device for the user. The keys to doing this are:</span><br />
<ol>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Use </span><span style="font-family: Courier New, Courier, monospace;">/etc/security/limits.conf</span><span style="font-family: Arial, Helvetica, sans-serif;"> to increase </span><span style="font-family: Courier New, Courier, monospace;">memlock</span><span style="font-family: Arial, Helvetica, sans-serif;"> for the desired user</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Pre-bind the desired device to vfio-pci, either by the various mechanisms provided in other posts or simply using </span><span style="font-family: Courier New, Courier, monospace;">virsh nodedev-detach</span><span style="font-family: Arial, Helvetica, sans-serif;">.</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Change the ownership of the vfio group to that of the (trusted) user. To determine the group, follow the links in sysfs or use </span><span style="font-family: Courier New, Courier, monospace;">virsh nodedev-dumpxml</span><span style="font-family: Arial, Helvetica, sans-serif;">, for example:</span></li>
</ol>
<pre><code> $ virsh nodedev-dumpxml pci_0000_00_19_0
<device>
<name>pci_0000_00_19_0</name>
<path>/sys/devices/pci0000:00/0000:00:19.0</path>
<parent>computer</parent>
<driver>
<name>e1000e</name>
</driver>
<capability type='pci'>
<domain>0</domain>
<bus>0</bus>
<slot>25</slot>
<function>0</function>
<product id='0x1502'>82579LM Gigabit Network Connection</product>
<vendor id='0x8086'>Intel Corporation</vendor>
<iommuGroup number='4'>
<address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
</iommuGroup>
</capability>
</device>
</code></pre>
<blockquote class="tr_bq">
<code><span style="font-family: Arial, Helvetica, sans-serif;">The </span><span style="font-family: Courier New, Courier, monospace;">iommuGroup</span><span style="font-family: Arial, Helvetica, sans-serif;"> section tells us that this is group number 4, so permissions need to be set on </span><span style="font-family: Courier New, Courier, monospace;">/dev/vfio/4</span><span style="font-family: Arial, Helvetica, sans-serif;">. As always, also note the set of devices within this group and ensure that all endpoints listed are bound to either </span><span style="font-family: Courier New, Courier, monospace;">vfio-pci</span><span style="font-family: Arial, Helvetica, sans-serif;"> or </span><span style="font-family: Courier New, Courier, monospace;">pci-stub</span><span style="font-family: Arial, Helvetica, sans-serif;">. The former allows the user access to the device; the latter allows the group to be usable without explicitly allowing the user access.</span></code></blockquote>
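<span style="font-family: Arial, Helvetica, sans-serif;">For those who prefer following the links in sysfs, here is a minimal sketch of that lookup. The function name <code>vfio_group_node</code> and the <code>SYSFS_ROOT</code> override are both assumptions made for illustration (the override only exists so the lookup can be tried against a scratch directory); the user name in the comment is a placeholder.</span><br />

```shell
#!/bin/sh
# vfio_group_node is a hypothetical helper for illustration.
# SYSFS_ROOT is an assumed override so the lookup can be tried on a scratch tree.
vfio_group_node() {
    dev="$1"
    root="${SYSFS_ROOT:-/sys}"
    link=$(readlink "$root/bus/pci/devices/$dev/iommu_group") || return 1
    # The link target ends in the group number, e.g. .../iommu_groups/4
    echo "/dev/vfio/${link##*/}"
}

# Typical use, as root, for the device from the example above
# ("alice" is a placeholder user name):
#   chown alice "$(vfio_group_node 0000:00:19.0)"
```

<span style="font-family: Arial, Helvetica, sans-serif;">The group file only appears under /dev/vfio once a device in the group is bound to vfio-pci, so run the chown after the pre-bind step.</span><br />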
<span style="font-family: Arial, Helvetica, sans-serif;">Enjoy! </span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com9tag:blogger.com,1999:blog-8694303781453133223.post-58870277897241529432015-08-11T10:39:00.003-06:002015-08-11T10:39:50.710-06:00vfio-users mailing list<span style="font-family: Arial, Helvetica, sans-serif;">The Arch Linux VGA thread was recently closed leaving a number of users looking for a place to continue the conversation. To facilitate that, I've created a vfio-users mailing list for discussion of all topics related to vfio. That includes QEMU device assignment use cases, such as VGA/GPU, as well as userspace drivers. Though hosted by Red Hat, this is a distribution independent forum intended to further vfio and its use cases as a technology. Please be respectful of each other and keep topics relevant to the forum. Sign up <a href="https://www.redhat.com/mailman/listinfo/vfio-users" target="_blank">here</a></span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com12tag:blogger.com,1999:blog-8694303781453133223.post-7898668417785912742015-05-05T19:02:00.001-06:002015-05-05T19:02:14.733-06:00VFIO GPU How To series, part 5 - A VGA-mode, SeaBIOS VM<span style="font-family: Arial, Helvetica, sans-serif;">For this example I'll show how to set up a VGA-mode VM using a Windows 7 guest and SeaBIOS. In this case we are not dependent on the guest or the graphics card supporting UEFI, but with Intel host graphics, we do need to work around the Linux i915 driver's broken participation in VGA arbitration. If you intend to use IGD for the host, the typical solution for this is to apply the i915 patch to the host kernel and use the </span><span style="font-family: Courier New, Courier, monospace;">enable_hd_vgaarb=1</span><span style="font-family: Arial, Helvetica, sans-serif;"> option for the i915 driver to make it correctly participate in VGA arbitration. 
The side-effect of doing this is the loss of DRI support in the host Xorg server.</span><br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">However, since this is just a demo, I'm going to be lazy and modify my script from part 4 of this series to comment out the test for boot_vga being non-zero such that vfio-pci will be set as the driver override for all VGA devices in the system. This limits my system to text-mode on the primary head for this example. For a long term solution, I would either be looking for a UEFI/OVMF setup as outlined in part 4, or disabling IGD and using discrete graphics for the host. Continuing to patch kernels for i915 VGA arbitration support is of course an option too, if you enjoy that sort of thing. If you're not using IGD on the host, or like my example, avoiding the i915 driver, you should be ok.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Windows 7 VM installation is largely the same as Windows 8 installation in part 4. The difference is that we're not going to modify the VM firmware to use UEFI, we'll leave that set to BIOS. The other difference is that BIOS won't drop to a shell and allow us to manually boot from the install CD after we've broken the boot order by adding the virtio drivers. My brute force solution to this is to attempt to boot the VM, it will error and say no OS found, force the VM off, then modify the boot options to push the Windows CDROM above the virtio disk and start the VM again. I expect that you could also enable the boot menu on the first pass and try to catch the F12 prompt to select the correct boot device.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">After installation, updates, and installing TightVNC, as in part 4, shut down the VM, then prune and tune it using </span><span style="font-family: Courier New, Courier, monospace;">virsh edit</span><span style="font-family: Arial, Helvetica, sans-serif;">, just as we did in the previous setup. If you're using Nvidia, also add the KVM hidden section to features, remove the Hyper-V section, and disable the Hyper-V clock source. Then return to virt-manager and add the GPU and audio function to the VM. All of this is documented in part 4.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Before you start the VM with the GPU, we first need to make a wrapper script around our qemu-kvm binary to insert the x-vga=on option. To do this, create a file named </span><span style="font-family: Courier New, Courier, monospace;">/usr/libexec/qemu-kvm.vga</span><span style="font-family: Arial, Helvetica, sans-serif;"> with the following:</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">#!/bin/sh</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">exec /usr/libexec/qemu-kvm \</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>`echo "\$@" | \</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>sed 's|01:00.0|01:00.0,x-vga=on|g' | \</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>sed 's|02:00.0|02:00.0,x-vga=on|g'`</span></div>
<div style="font-family: Arial, Helvetica, sans-serif;">
<br /></div>
</div>
<div style="font-family: Arial, Helvetica, sans-serif;">
We'll use this as the executable libvirt uses in place of qemu-kvm directly. Any time the qemu-kvm command line contains 01:00.0 or 02:00.0, which are the PCI addresses of the graphics cards in my system, we'll add the option ",x-vga=on". This makes it transparent to libvirt that this is happening. Be sure to chmod 755 the file before proceeding.</div>
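<div style="font-family: Arial, Helvetica, sans-serif;">
To see what the substitution does before wiring it into libvirt, here is a small sketch that applies the same two sed expressions to a sample string. The function name <code>add_xvga</code> and the sample command line fragment are made up for illustration, and the sketch uses printf rather than echo to pass the string through.</div>

```shell
#!/bin/sh
# add_xvga mirrors the substitution in the wrapper script above.
# The addresses are the ones from this post; adjust for your system.
add_xvga() {
    printf '%s\n' "$1" | \
        sed 's|01:00.0|01:00.0,x-vga=on|g' | \
        sed 's|02:00.0|02:00.0,x-vga=on|g'
}

# A made-up fragment of a QEMU command line for illustration.
add_xvga "-device vfio-pci,host=01:00.0,id=hostdev0"
# prints: -device vfio-pci,host=01:00.0,x-vga=on,id=hostdev0
```

<div style="font-family: Arial, Helvetica, sans-serif;">
Note that an address such as 01:00.1, the audio function of the same card, does not match either pattern and passes through unchanged.</div>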
<div style="font-family: Arial, Helvetica, sans-serif;">
<br /></div>
<div style="font-family: Arial, Helvetica, sans-serif;">
If your system is using selinux, libvirt will get an audit error trying to execute this script, so we'll need to add a new selinux module to allow for this. Red Hat documentation <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Technical_Notes/virt.html" target="_blank">here</a> provides instructions for doing this. In summary:</div>
<div style="font-family: Arial, Helvetica, sans-serif;">
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Set the selinux permissions for the file:</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"># restorecon /usr/libexec/qemu-kvm.vga</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Create an selinux module</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"># cat > qemukvmvga.te << EOF</span></div>
<div>
<div style="font-family: 'Courier New', Courier, monospace;">
policy_module(qemukvmvga, 1.0)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
<br /></div>
<div style="font-family: 'Courier New', Courier, monospace;">
gen_require(\`</div>
<div style="font-family: 'Courier New', Courier, monospace;">
attribute virt_domain;</div>
<div style="font-family: 'Courier New', Courier, monospace;">
type qemu_exec_t;</div>
<div style="font-family: 'Courier New', Courier, monospace;">
')</div>
<div style="font-family: 'Courier New', Courier, monospace;">
<br /></div>
<div style="font-family: 'Courier New', Courier, monospace;">
can_exec(virt_domain, qemu_exec_t)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
EOF</div>
<div style="font-family: 'Courier New', Courier, monospace;">
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Build the selinux module</span></div>
<div style="font-family: 'Courier New', Courier, monospace;">
# make -f /usr/share/selinux/devel/Makefile</div>
<div style="font-family: 'Courier New', Courier, monospace;">
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Install selinux module</span></div>
<div style="font-family: 'Courier New', Courier, monospace;">
# semodule -i qemukvmvga.pp</div>
</div>
<div style="font-family: 'Courier New', Courier, monospace;">
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">With selinux happy, we next need to run </span><span style="font-family: Courier New, Courier, monospace;">virsh edit</span><span style="font-family: Arial, Helvetica, sans-serif;"> on the domain again. Find the </span><span style="font-family: Courier New, Courier, monospace;"><emulator></span><span style="font-family: Arial, Helvetica, sans-serif;"> tag and update the executable to point to our new script. Save and exit the configuration and you should now be able to start the VM from virt-manager with VGA-mode enabled. The same driver installation guidelines from part 4 apply for the GeForce/Catalyst drivers. Enjoy.</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com12tag:blogger.com,1999:blog-8694303781453133223.post-80319101939956651352015-05-05T17:16:00.000-06:002015-05-05T17:16:09.668-06:00VFIO GPU How To series, part 4 - Our first VM<span style="font-family: Arial, Helvetica, sans-serif;">At this point in the series you should have a system capable of device assignment and properly configured to sequester at least the GPU from the host for assignment to a guest. You should already have your distribution packages installed for QEMU/KVM and libvirt, including virt-manager. In this article we'll cover installation and configuration of the guest for a UEFI-capable graphics card and a UEFI-capable guest. This is the configuration I'd recommend for most users as it's the most directly supported. Unless you absolutely cannot upgrade your graphics card or guest OS, this is the configuration to aim for.</span><br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">I'll be using the hardware configuration discussed in part 1 of this series along with Windows 8.1 (64 bit) as my guest operating system. Both my EVGA GTX 750 and Radeon HD8570 OEM support UEFI as determined <a href="http://vfio.blogspot.com/2014/08/does-my-graphics-card-rom-support-efi.html" target="_blank">here</a>, and I'll cover the details unique to each. Each GPU is connected via HDMI to a small LED TV, which will also be my primary target for audio output. Perhaps we'll discuss in future articles the modifications necessary for host-based audio.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The first step is to start virt-manager, which you should be able to do from your host desktop. If your host doesn't have a desktop, virt-manager can connect remotely over ssh. We'll mostly be using virt-manager for installation and setup, and maybe future maintenance. After installation you can set the VM to auto start or start and stop it with the command line virsh tools. If you've never run virt-manager before, you probably want to start with configuring your storage and networking, either via Edit->Connection Details or by right-clicking the connection and selecting Details in the drop-down. I have a mounted filesystem set up to store all of my VM images and NFS mounts for ISO storage. You can also configure full physical devices, volume groups, iSCSI, etc. Volume groups are another favorite of mine to be able to have full flexibility and performance. libvirt's default will be to create VM images in /var/lib/libvirt/images, so it's also fine if you simply want to mount space there.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Next you'll want to click over to the Network Interfaces tab. If you want the host and guest to be able to communicate over the network, which is necessary if you want to use tools like Synergy to share a mouse and keyboard, then you'll want to create a bridge device and add your primary host network to the bridge. If host-guest communication is not important for you, you can simply use a macvtap connection when configuring the VM networking. This may be perfectly acceptable if you're creating a multi-seat system and providing mouse and keyboard via some other means (USB passthrough for host controller assignment).</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Now we can create our new VM:</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3FWroMoWsb7rUg4UOqw4HBGJlIrztrxVvWZRhbeK6cpXxWGKk96DZ9aMiog7WBI8seajHRS94gnHp8CahyWjQnOpEd-1wy5TxYcAuJEewtUmC0zn2CY_a7sPDbt4ajfdt-K51GXiEZqwV/s1600/new-vm1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3FWroMoWsb7rUg4UOqw4HBGJlIrztrxVvWZRhbeK6cpXxWGKk96DZ9aMiog7WBI8seajHRS94gnHp8CahyWjQnOpEd-1wy5TxYcAuJEewtUmC0zn2CY_a7sPDbt4ajfdt-K51GXiEZqwV/s320/new-vm1.jpg" height="320" width="309" /></a></div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">My ISOs are stored on an NFS mount that I already configured, so I use the Local install media option. Clicking Forward brings us to the next dialog where I select my ISO image location and select the OS type and version:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrDnfiQG9gMPYRAjc78oG9i227aE7T65mkh2EL5OktMrNd6bUeeg5MWyUtq3qiim8kGfoKZdmr4uog-lMVkL97q4ycGRt0_Mg3mtEQ6X0Bx-mpHw5diY1GuIVsIBcTqUUpBmEmNC3zmoqY/s1600/new-vm2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrDnfiQG9gMPYRAjc78oG9i227aE7T65mkh2EL5OktMrNd6bUeeg5MWyUtq3qiim8kGfoKZdmr4uog-lMVkL97q4ycGRt0_Mg3mtEQ6X0Bx-mpHw5diY1GuIVsIBcTqUUpBmEmNC3zmoqY/s320/new-vm2.jpg" height="320" width="308" /></a></div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">This allows libvirt to pre-configure some defaults for the VM, some of which we'll change as we move along. Stepping forward again, we can configure the VM RAM size and number of vCPUS:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5t26Yaz36lPizMZOi9A7Kfm9JUtgiR3q7Gdc4K_ejMAlWK52xpOuAFs2ZW9h9sXr9p9ryIE-60yGZ4ZiWXbbNpb-0wdG8mdUUq1hNGm4p-MuM1SjpyNsmuEaCmDAK2kL1eaQUsRZAxvMS/s1600/new-vm3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5t26Yaz36lPizMZOi9A7Kfm9JUtgiR3q7Gdc4K_ejMAlWK52xpOuAFs2ZW9h9sXr9p9ryIE-60yGZ4ZiWXbbNpb-0wdG8mdUUq1hNGm4p-MuM1SjpyNsmuEaCmDAK2kL1eaQUsRZAxvMS/s320/new-vm3.jpg" height="320" width="302" /></a></div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">For my example VM I'll use the defaults. These can be changed later if desired. Next we need to create a disk for the new VM:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgROHw6guLy7_hv5S2UJ39euj5rbrDTM-8t06BQKSS8uqr0DAktdR3L2wwT5MS332nbf7rgW-jpBfMuw48pKE39gNJ8GEITUDgxailb-l7sg1Jzhi6kr-BOLkgqEkIkoppkFTg4lqboYyBH/s1600/new-vm5.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgROHw6guLy7_hv5S2UJ39euj5rbrDTM-8t06BQKSS8uqr0DAktdR3L2wwT5MS332nbf7rgW-jpBfMuw48pKE39gNJ8GEITUDgxailb-l7sg1Jzhi6kr-BOLkgqEkIkoppkFTg4lqboYyBH/s320/new-vm5.jpg" height="320" width="302" /></a></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">The first radio button will create the disk with an automatic name in the default storage location, while the second radio button allows you to name the image and specify the type. I therefore generally select the second option. For my VM, I've created a disk with these parameters:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiw7p6fQQKWju_JmkLnEtLoj5493hwsNl8_5pTvDCnDr3p0gnSsYiLJy4iyfpkkH6eR0fK_4jZj3Gw6g9NcDYfoNh_zkPVZI6Limo61wlX7LvQgdGM5cT0UQaZPj6CihurQuPCc1COqDUR/s1600/new-vm4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiw7p6fQQKWju_JmkLnEtLoj5493hwsNl8_5pTvDCnDr3p0gnSsYiLJy4iyfpkkH6eR0fK_4jZj3Gw6g9NcDYfoNh_zkPVZI6Limo61wlX7LvQgdGM5cT0UQaZPj6CihurQuPCc1COqDUR/s320/new-vm4.jpg" height="320" width="312" /></a></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">In this case I've created a 50GB, sparse, raw image file. Obviously you'll need to size the disk image based on your needs. 50GB certainly doesn't leave much room for games. You can also choose whether to allocate the entire image now or let it fault in on demand. There's a little extra overhead in using a sparse image, so if space saving isn't a concern, allocate the entire disk. I would also generally only recommend a qcow format if you're looking for the space saving or snapshotting features provided by qcow. Otherwise raw provides better performance for a disk image based VM.</span></div>
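<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">To make the sparse versus allocated distinction concrete, here is a small sketch outside of virt-manager. The helper name <code>make_sparse_raw</code> is made up for illustration, and the example uses a tiny 1MB file in a scratch directory rather than a real 50GB image. It shows that a sparse raw file reports its full apparent size while consuming few or no blocks until data is written.</span></div>

```shell
#!/bin/sh
# make_sparse_raw is a hypothetical helper; truncate(1) sets the apparent
# size without allocating blocks, giving a sparse raw image.
# $1 = image path, $2 = size (e.g. 1M, 50G)
make_sparse_raw() {
    truncate -s "$2" "$1"
}

# Example with a tiny 1 MiB image in a scratch directory.
img="$(mktemp -d)/demo.img"
make_sparse_raw "$img" 1M
wc -c < "$img"     # apparent size in bytes
du -k "$img"       # space actually allocated, in KiB
```

<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">For a fully allocated image, tools such as fallocate or qemu-img with a preallocation option are the usual route; virt-manager's allocation setting in the dialog above covers the same choice.</span></div>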
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">On the final step of the setup, we get to name our VM and check the option to customize the VM before install, which I generally use and which we must use for OVMF:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFzAt7mSBE59VlVcwkZ2hdxtdE0VpIq0ozgn5y18nOFLQGWQl09lOFBuj9KYvFRrefmU1oZyIkd4royosI2gUbng2JsJ4hykNIaG7Jh4S8AHXQzYUGlrTH-koMrDbQebdDFoe9zpW-L1hyphenhyphen/s1600/new-vm6.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFzAt7mSBE59VlVcwkZ2hdxtdE0VpIq0ozgn5y18nOFLQGWQl09lOFBuj9KYvFRrefmU1oZyIkd4royosI2gUbng2JsJ4hykNIaG7Jh4S8AHXQzYUGlrTH-koMrDbQebdDFoe9zpW-L1hyphenhyphen/s320/new-vm6.jpg" height="320" width="290" /></a></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">Clicking Finish here brings up a new dialog, where in the overview we need to change our Firmware option from BIOS to UEFI:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh13alUVWqZ1PgT2rIVw8p2I3OYdasDi1EXwhFTHE5-d7Ib-sGu201AUwsB2JTB0e33MgwHHTiBnBbYzlhCsvRSCpMakUBO7Q-ZIK7nFubYk9Ij9hDvk_gJ18Z25djcN9-f0LFsyqu65uEN/s1600/new-vm7.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh13alUVWqZ1PgT2rIVw8p2I3OYdasDi1EXwhFTHE5-d7Ib-sGu201AUwsB2JTB0e33MgwHHTiBnBbYzlhCsvRSCpMakUBO7Q-ZIK7nFubYk9Ij9hDvk_gJ18Z25djcN9-f0LFsyqu65uEN/s640/new-vm7.jpg" height="490" width="640" /></a></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">If UEFI is not available, your libvirt and virt-manager tools may be too old. Note that I'm using the default i440FX machine type, which I recommend for all Windows guests. If you absolutely must use Q35, select it here and complete the VM installation, but you'll later need to edit the XML and use a wrapper script around qemu-kvm to get a proper configuration until libvirt support for Q35 improves. Select Apply and move down to the Processor selection:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS5UT_Xv9o4LVI5mh36do8OwIxkb7w7v9pHiNhVYy1GXXUa2rBe_JSpobpWpUHRCIqVNd4rFVNJN1Y_IAarsADh8_Eu_lvAmNJ7uUXPygM-TPa350vlsS85R6ATFwMdGee5JFWjf60Tjqb/s1600/new-vm8.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS5UT_Xv9o4LVI5mh36do8OwIxkb7w7v9pHiNhVYy1GXXUa2rBe_JSpobpWpUHRCIqVNd4rFVNJN1Y_IAarsADh8_Eu_lvAmNJ7uUXPygM-TPa350vlsS85R6ATFwMdGee5JFWjf60Tjqb/s640/new-vm8.jpg" height="512" width="640" /></a></span></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">Here we can change the number of vCPUs for the guest if we've had second thoughts since our previous selection. We can also change the CPU type exposed to the guest. Often for PCI device assignment and optimal performance we'll want to use host-passthrough here, which is not available in the drop-down and needs to be typed manually. This is also our opportunity to change the socket/core configuration for the VM. I'll change my configuration here to expose 4 vCPUs as a single socket, dual-core with threads, so that I can later show vCPU pinning with a thread example. There is pinning configuration available here, but I tend to configure this by editing the XML directly, which I'll show later. We can finish the install before we worry about that.</span></div>
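<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">The topology has to multiply out to the vCPU count, which is easy to sanity check. A trivial sketch, with a made-up helper name <code>vcpu_count</code>, for the single socket, dual-core, two threads per core layout chosen here:</span></div>

```shell
#!/bin/sh
# vcpu_count is a hypothetical helper: vCPUs = sockets x cores x threads.
vcpu_count() {
    echo $(( $1 * $2 * $3 ))
}

# One socket, two cores, two threads per core, as chosen above.
vcpu_count 1 2 2
# prints: 4
```

<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">In the libvirt XML this layout would correspond to a topology element with sockets='1' cores='2' threads='2' inside the cpu section.</span></div>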
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">The next important option for me is the Disk configuration:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQmDKkw4qcKKLrrK3E0J2eu59VxdSIRc9JhSPiocx96n0QzNYOPraBiIv4bUgE3gi3VVjvwwQwrg0l6sYrZljGxSH9oPPpcNeonHHHdhYMEtW1Wq0riyQr6RYVlOlQgHCcHBAMQp3aw6pI/s1600/new-vm9.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQmDKkw4qcKKLrrK3E0J2eu59VxdSIRc9JhSPiocx96n0QzNYOPraBiIv4bUgE3gi3VVjvwwQwrg0l6sYrZljGxSH9oPPpcNeonHHHdhYMEtW1Wq0riyQr6RYVlOlQgHCcHBAMQp3aw6pI/s640/new-vm9.jpg" height="512" width="640" /></a></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">As shown here, I've changed my Disk bus from the default to VirtIO. VirtIO is a paravirtualized disk interface, which means that it's designed to be high performance for virtual machines. (EDIT: for further optimization using virtio-scsi rather than virtio-blk, see the comments below) Unfortunately Windows guests do not support VirtIO without additional drivers, so we'll need to configure the VM to provide those drivers during installation. For that, click the Add Hardware button on the bottom left of the screen. Select Storage and, just as we did with the installation media, locate the ISO image for the virtio drivers on your system. The latest virtio drivers can be found <a href="https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/latest-virtio/" target="_blank">here</a>. The dialog should look something like this:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgF9-1Z-_cTRUUPTdAHytGQcUwqjqRlbAuFyjEJxgRxYTh8CXkojVDrdN-BjvOfF_fvH10MSxn0eyJ3kBtxhKwiT9Xc-NFcCAJS-YBpRMJkR8fYzNP82aGMA3tdfdY8DhyphenhypheneVJ7fPYsbBjfK/s1600/new-vm10.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgF9-1Z-_cTRUUPTdAHytGQcUwqjqRlbAuFyjEJxgRxYTh8CXkojVDrdN-BjvOfF_fvH10MSxn0eyJ3kBtxhKwiT9Xc-NFcCAJS-YBpRMJkR8fYzNP82aGMA3tdfdY8DhyphenhypheneVJ7fPYsbBjfK/s640/new-vm10.jpg" height="640" width="552" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">Click Finish to add the CDROM. Since we're adding virtio drivers anyway, let's also optimize our VM NIC by changing it to virtio as well:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkQ3letK96tPTqTRar54iSCpDEZRk72oKMQ9JEYzGHSFh65ShT4RDDCRkhfgun6JltrhVUFjvWpv5-NhYakeMHjRdXM9ZMlLoUP96uELNAfx6EBidiQVLPMpMpoAB-LeoNRR_B2i8tsxIz/s1600/new-vm11.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkQ3letK96tPTqTRar54iSCpDEZRk72oKMQ9JEYzGHSFh65ShT4RDDCRkhfgun6JltrhVUFjvWpv5-NhYakeMHjRdXM9ZMlLoUP96uELNAfx6EBidiQVLPMpMpoAB-LeoNRR_B2i8tsxIz/s640/new-vm11.jpg" height="512" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">The remaining configuration is for your personal preference. If you're using a remote system and do not have ssh authorized keys configured, I'd advise changing the Display to use VNC rather than Spice, otherwise you'll need to enter your password half a dozen times. Select Begin Installation to... begin installation.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">In this configuration OVMF will drop to an EFI Shell from which you can navigate to the boot executable:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinNa0N7966fA_6o4aXqPMJETttN3qhDVxIiwaYy8bWY0K-uejNVTpDgKNmIQtmCnjRI95dfIDCoLWngmwvVcFGffLXayA4u-ZUYq-tggoPtpbkSsYRbj15IHTr-_Wko_SEO0ArG6hqha1T/s1600/Screenshot_win8.1-demo_2015-05-04_18:49:37.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinNa0N7966fA_6o4aXqPMJETttN3qhDVxIiwaYy8bWY0K-uejNVTpDgKNmIQtmCnjRI95dfIDCoLWngmwvVcFGffLXayA4u-ZUYq-tggoPtpbkSsYRbj15IHTr-_Wko_SEO0ArG6hqha1T/s640/Screenshot_win8.1-demo_2015-05-04_18:49:37.png" height="480" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">Type bootx64 and press enter to continue installation and press a key when prompted to boot from the CDROM. I expect this would normally happen automatically if we hadn't added a second CDROM for virtio drivers.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">At this point the guest OS installation should proceed normally. When prompted for where to install Windows, select Load Driver, navigate to your CDROM, select Win8, AMD64 (for 64 bit drivers). You'll be given a list of several drivers to choose from. Select all of the drivers using shift-click and click Next. You should now see your disk and can proceed with installation. Generally after installation, the first thing I do is apply at least the recommended updates and reboot the VM.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">Before we shut down to reconfigure, I like to install <a href="http://www.tightvnc.com/" target="_blank">TightVNC</a> server so I can have remote access even if something goes wrong with the display. I generally do a custom install of TightVNC, disabling client support, and setting security appropriate to the environment. Make note of the IP address of the guest and verify the VNC connection works. Shut down the VM and we'll trim it down, tune it further, and add a GPU.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">Start with the machine details view in virt-manager and remove everything we no longer need. That includes the CDROM devices, the tablet, the display, sound, serial, spice channel, video, virtio serial, and USB redirectors. We can also remove the IDE controller altogether now. That leaves us with a minimal config that looks something like this:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9xljsDT8UbctskwW3b5l8QHEr97gFRU3Q19k82WgUloyiuAW03H6d__1Itvfc1ZHm_ehPC0hlNpWQQYxHMMpvM-qMWRERZU7uRxrXfhqhhvMzPBCPUkJDkeX9qbe9im6SVBcI3X1hfT_B/s1600/new-vm12.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9xljsDT8UbctskwW3b5l8QHEr97gFRU3Q19k82WgUloyiuAW03H6d__1Itvfc1ZHm_ehPC0hlNpWQQYxHMMpvM-qMWRERZU7uRxrXfhqhhvMzPBCPUkJDkeX9qbe9im6SVBcI3X1hfT_B/s640/new-vm12.jpg" height="524" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">We can also take this opportunity to do a little further tuning by directly editing the XML. On the host, run </span><span style="font-family: Courier New, Courier, monospace;">virsh edit <domain></span><span style="font-family: Arial, Helvetica, sans-serif;"> as root or via sudo. Before the <os> tag, you can optionally add something like the following:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> <memoryBacking></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> <hugepages/></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> </memoryBacking></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> <cputune></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> <vcpupin vcpu='0' cpuset='2'/></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> <vcpupin vcpu='1' cpuset='3'/></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> <vcpupin vcpu='2' cpuset='6'/></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> <vcpupin vcpu='3' cpuset='7'/></span></div>
<div class="separator" style="clear: both;">
</div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace;"> </cputune></span></div>
</div>
<div>
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The <memoryBacking> tag allows us to specify huge pages for the guest. This helps improve the VM efficiency by skipping page table levels when doing address translations. In order for this to work, you must create sufficient huge pages on the host system. Personally I like to do this via the kernel commandline by adding something like </span><span style="font-family: Courier New, Courier, monospace;">hugepages=2048</span><span style="font-family: Arial, Helvetica, sans-serif;"> in </span><span style="font-family: Courier New, Courier, monospace;">/etc/sysconfig/grub</span><span style="font-family: Arial, Helvetica, sans-serif;"> and regenerating the grub configuration with grub2-mkconfig as we did in the previous installment of this series. Most processors will only support 2MB hugepages, so by reserving 2048, we're reserving 4096MB worth of memory, which is enough for the 4GB guest I've configured here. Note that transparent huge pages are not effective for VMs making use of device assignment because all of the VM memory needs to be allocated and pinned for the IOMMU before the VM runs. This means the consolidation passes used by transparent huge pages will not be able to combine pages later.</span></div>
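<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">To size the reservation for a different guest, the arithmetic above can be sketched as a small shell helper. The function name </span><span style="font-family: Courier New, Courier, monospace;">hugepages_needed</span><span style="font-family: Arial, Helvetica, sans-serif;"> is my own invention for illustration, not part of any tool, and it assumes the huge page size reported in </span><span style="font-family: Courier New, Courier, monospace;">/proc/meminfo</span><span style="font-family: Arial, Helvetica, sans-serif;"> unless you pass one explicitly:</span></div>

```shell
# Print how many huge pages are needed to back a guest of a given size.
#   $1 = guest memory in MiB
#   $2 = huge page size in KiB (optional; read from /proc/meminfo if omitted)
# The name hugepages_needed is hypothetical, chosen for this example.
hugepages_needed() (
    guest_mib=$1
    page_kib=${2:-$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo)}
    guest_kib=$((guest_mib * 1024))
    # Round up so a guest that is not a multiple of the page size still fits.
    echo $(( (guest_kib + page_kib - 1) / page_kib ))
)

# Example: hugepages_needed 4096 2048    prints 2048
```

<div>
<span style="font-family: Arial, Helvetica, sans-serif;">After rebooting with the new commandline, </span><span style="font-family: Courier New, Courier, monospace;">grep Huge /proc/meminfo</span><span style="font-family: Arial, Helvetica, sans-serif;"> should report a HugePages_Total at least as large as the number printed here.</span></div>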
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">As advertised, we're also configuring CPU pinning. Since I've advertised a single socket, dual-core, threaded processor to my guest, I pin to the same configuration on the host. Processors 2 & 3 on the host are the 2nd and 3rd cores on my quad-core processor and processors 6 & 7 are the threads corresponding to those cores. We can determine this by looking at the "core id" line in </span><span style="font-family: Courier New, Courier, monospace;">/proc/cpuinfo</span><span style="font-family: Arial, Helvetica, sans-serif;"> on the host. It generally does not make sense to expose threads to the guest unless it matches the physical host configuration, as we've configured here. Save the XML and if you've chosen to use hugepages remember to reboot the host to make the new commandline take effect or else libvirt will error starting the VM.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">I'll start out assigning my Radeon HD8570 to the guest, so we don't yet need to hide any hypervisor features to make things work. Return to virt-manager and select Add Hardware. Select PCI Host Device and find your graphics card GPU function for assignment in the list. Repeat this process for the graphics card audio function. My VM details now look like this:</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipfwSdq3ttqRGeGgJwd3aj9hn9A-EewfbpU9Kwn6YcwhfEfN0t50yhcAkSm0Wfu3DU7xJXMOEty511mPq2ihQDfsqGw6civdh6_GT-LrwnJSy-YR2msEiELQ2rk8qJr-pzUtrHnaEvLFAE/s1600/new-vm13.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipfwSdq3ttqRGeGgJwd3aj9hn9A-EewfbpU9Kwn6YcwhfEfN0t50yhcAkSm0Wfu3DU7xJXMOEty511mPq2ihQDfsqGw6civdh6_GT-LrwnJSy-YR2msEiELQ2rk8qJr-pzUtrHnaEvLFAE/s640/new-vm13.jpg" height="484" width="640" /></a></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Start the VM. This time there will not be any console available via virt-manager. Instead, the display should initialize and you should see the TianoCore boot splash, followed by the Windows startup, on the physical monitor connected to the graphics card. Once the guest is booted, you can reconnect to it using the guest-based VNC server.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">At this point we can use the browser to go to amd.com and download driver software for our device. For AMD software it's recommended to specify the driver for your device using the drop down menus on the website rather than allowing the tools to select a driver for you. This holds true for updates via the runtime Catalyst interface later as well. Allowing driver detect often results in blue screens.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">At some point during the download, Windows will probably figure out that its hardware changed and switch to its built-in drivers for the device, and the screen resolution will increase. This is a good sign that things are working. Run the Catalyst installation program. I generally use an Express Installation, which hopefully implies that Custom Installations will also work. After installation completes, reboot the VM and you should now have a fully functional, fully graphics accelerated VM.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The GeForce card is nearly as easy, but we first need to work around some of the roadblocks Nvidia has put in place to prevent you from using the hardware you've purchased in the way that you desire (and by my reading conforms to the EULA for their software, but IANAL). For this step we again need to run </span><span style="font-family: Courier New, Courier, monospace;">virsh edit</span><span style="font-family: Arial, Helvetica, sans-serif;"> on the VM. Within the </span><span style="font-family: Courier New, Courier, monospace;"><features></span><span style="font-family: Arial, Helvetica, sans-serif;"> section, remove everything between the </span><span style="font-family: Courier New, Courier, monospace;"><hyperv></span><span style="font-family: Arial, Helvetica, sans-serif;"> tags, including the tags themselves. In their place add the following tags:</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> <kvm></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> <hidden state='on'/></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> </kvm></span></div>
<div style="font-family: Arial, Helvetica, sans-serif;">
<br /></div>
</div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Additionally, within the </span><span style="font-family: Courier New, Courier, monospace;"><clock></span><span style="font-family: Arial, Helvetica, sans-serif;"> tag, find the timer named </span><span style="font-family: Courier New, Courier, monospace;">hypervclock</span><span style="font-family: Arial, Helvetica, sans-serif;"> and remove the line containing this tag completely. Save and exit the edit session.</span></div>
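<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If you'd like to double check the edit, something along these lines can scan a saved copy of the domain XML for leftovers. This is only a sketch using plain text matching rather than a real XML parser, and the function name </span><span style="font-family: Courier New, Courier, monospace;">check_nvidia_xml</span><span style="font-family: Arial, Helvetica, sans-serif;"> and the file name </span><span style="font-family: Courier New, Courier, monospace;">vm.xml</span><span style="font-family: Arial, Helvetica, sans-serif;"> are placeholders of mine:</span></div>

```shell
# Report leftovers that the steps above say to remove, and confirm that the
# <kvm><hidden state='on'/> element is present.
#   $1 = file holding the domain XML, e.g. from: virsh dumpxml <domain> > vm.xml
# The name check_nvidia_xml is hypothetical, chosen for this example.
check_nvidia_xml() (
    xml=$1
    status=0
    if grep -q '<hyperv' "$xml"; then
        echo "found <hyperv> section"; status=1
    fi
    if grep -q 'hypervclock' "$xml"; then
        echo "found hypervclock timer"; status=1
    fi
    if ! grep -q "<hidden state=['\"]on['\"]" "$xml"; then
        echo "missing <hidden state='on'/>"; status=1
    fi
    [ "$status" -eq 0 ] && echo "ok"
    # The function body runs in a subshell, so exit only leaves the function.
    exit "$status"
)
```

<div>
<span style="font-family: Arial, Helvetica, sans-serif;">A non-zero exit status means at least one of the three checks failed, and the messages say which.</span></div>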
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">We can now follow the same procedure used in the above Radeon example, add the GPU and audio function to the VM, boot the VM and download drivers from nvidia.com. As with AMD, I typically use the express installation. Restart the VM and you should now have a fully accelerated Nvidia VM.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">For either GPU type I highly suggest continuing with following the instructions in my <a href="http://vfio.blogspot.com/2014/09/vfio-interrupts-and-how-to-coax-windows.html" target="_blank">article</a> on configuring the audio device to use Message Signaled Interrupts (MSI) to improve the efficiency and avoid glitchy audio. MSI for the GPU function is typically enabled by default for AMD and not necessarily a performance gain on Nvidia due to an extra trap that Nvidia takes through QEMU to re-enable the MSI via a PCI config write.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Hopefully you now have a working VM with a GPU assigned. If you don't, please comment on what variants of the above setup you'd like to see and I'll work on future articles. I'll re-iterate that the above is my preferred and recommended setup, but VGA-mode assignment with SeaBIOS can also be quite viable provided you're not using Intel IGD graphics for the host (or you're willing to suffer through patching your host kernel for the foreseeable future). Currently on my ToDo list for this series is possibly a UEFI install of Windows 7 (if that's possible), a VGA-mode example by disabling IGD on my host, using host GTX750 and assigning HD8570. That will require a simple qemu-kvm wrapper script to insert the x-vga=on option for vfio-pci. After that I'll likely do a Q35 example with a more complicated wrapper, unless libvirt beats me to adding better native support for Q35. I will not be doing SLI/Crossfire examples as I don't have the hardware for it, there's too much proprietary black magic in SLI, and I really don't see the point of it given the performance of single card solutions today. Stay tuned for future articles and please suggest or up-vote what you'd like to see next.</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com78tag:blogger.com,1999:blog-8694303781453133223.post-30235191575624643202015-05-05T17:00:00.000-06:002015-11-01T07:31:46.494-07:00VFIO GPU How To series, part 3 - Host configuration<span style="font-family: Arial, Helvetica, sans-serif;">For my setup I'm using a Fedora 21 system with the <a href="https://fedoraproject.org/wiki/Virtualization_Preview_Repository" target="_blank">virt-preview yum repos</a> to get the latest QEMU and libvirt support along with Gerd Hoffmann's <a href="https://www.kraxel.org/repos/" target="_blank">firmware repo</a> for the latest EDK2 OVMF builds. I hope though that the majority of the setup throughout this howto series is mostly distribution agnostic, just make sure you're running a newer distribution with current kernels and tools. Feel free to add comments for other distributions if something is markedly different. </span><br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The first thing we need to do on the host is enable the IOMMU. To do this, verify that IOMMU support is enabled in the host BIOS. How to do this will be specific to your hardware/BIOS vendor. If you can't find an option, don't fret, it may be tied to processor virtualization support. If you're using an Intel processor, check <a href="http://ark.intel.com/" target="_blank">http://ark.intel.com</a> to verify that your processor supports VT-d before going any further.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Next we need to modify the kernel commandline to allow the kernel to enable IOMMU support. This will be similar between distributions, but not identical. On Fedora we need to edit </span><span style="font-family: Courier New, Courier, monospace;">/etc/sysconfig/grub</span><span style="font-family: Arial, Helvetica, sans-serif;">. Find the </span><span style="font-family: Courier New, Courier, monospace;">GRUB_CMDLINE_LINUX</span><span style="font-family: Arial, Helvetica, sans-serif;"> line and within the quotes add either </span><span style="font-family: Courier New, Courier, monospace;">intel_iommu=on</span><span style="font-family: Arial, Helvetica, sans-serif;"> or </span><span style="font-family: Courier New, Courier, monospace;">amd_iommu=on</span><span style="font-family: Arial, Helvetica, sans-serif;">, depending on whether your platform is Intel or AMD. You may also want to add the option </span><span style="font-family: Courier New, Courier, monospace;">iommu=pt</span><span style="font-family: Arial, Helvetica, sans-serif;">, which sets the IOMMU into passthrough mode for host devices. This reduces the overhead of the IOMMU for host owned devices, but also removes any protection the IOMMU may have provided against errant DMA from devices. If you weren't using the IOMMU before, there's nothing lost. Regardless of passthrough mode, the IOMMU will provide the same degree of isolation for assigned devices.</span></div>
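<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If you're scripting this across several hosts, the Intel versus AMD choice can be made from the </span><span style="font-family: Courier New, Courier, monospace;">vendor_id</span><span style="font-family: Arial, Helvetica, sans-serif;"> line in </span><span style="font-family: Courier New, Courier, monospace;">/proc/cpuinfo</span><span style="font-family: Arial, Helvetica, sans-serif;">. A minimal sketch, where the function name </span><span style="font-family: Courier New, Courier, monospace;">iommu_option</span><span style="font-family: Arial, Helvetica, sans-serif;"> is hypothetical and only the two x86 vendor strings are handled:</span></div>

```shell
# Print the kernel commandline option that enables the IOMMU for this host.
#   $1 = cpuinfo-format file (optional; defaults to /proc/cpuinfo)
# The name iommu_option is hypothetical, chosen for this example.
iommu_option() (
    vendor=$(awk -F': *' '/^vendor_id/ {print $2; exit}' "${1:-/proc/cpuinfo}")
    case "$vendor" in
        GenuineIntel) echo "intel_iommu=on" ;;
        AuthenticAMD) echo "amd_iommu=on" ;;
        *) echo "unrecognized CPU vendor: $vendor" >&2; exit 1 ;;
    esac
)
```

<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The printed option still has to be added to </span><span style="font-family: Courier New, Courier, monospace;">GRUB_CMDLINE_LINUX</span><span style="font-family: Arial, Helvetica, sans-serif;"> by hand or by your own tooling; the helper deliberately does not edit any files.</span></div>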
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Save the system grub configuration file and use your distribution-provided update scripts to apply this configuration to the boot-time grub config file. On Fedora, the command is:</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"># grub2-mkconfig -o /etc/grub2.cfg</span></div>
<div>
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If your host system boots via UEFI, the correct target file is </span><span style="font-family: Courier New, Courier, monospace;">/etc/grub2-efi.cfg</span><span style="font-family: Arial, Helvetica, sans-serif;">.</span></div>
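<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">One way to pick between the two files automatically is to test for </span><span style="font-family: Courier New, Courier, monospace;">/sys/firmware/efi</span><span style="font-family: Arial, Helvetica, sans-serif;">, which the kernel only creates when the host was booted via UEFI. The sketch below assumes the Fedora paths named above, and the function name </span><span style="font-family: Courier New, Courier, monospace;">grub_cfg_target</span><span style="font-family: Arial, Helvetica, sans-serif;"> is my own:</span></div>

```shell
# Print the grub config file that grub2-mkconfig should write on Fedora.
#   $1 = directory that exists only on UEFI boots
#        (optional; defaults to /sys/firmware/efi)
# The name grub_cfg_target is hypothetical, chosen for this example.
grub_cfg_target() (
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo /etc/grub2-efi.cfg
    else
        echo /etc/grub2.cfg
    fi
)

# Usage, as root:
#   grub2-mkconfig -o "$(grub_cfg_target)"
```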
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">With these changes, reboot the system and verify that the IOMMU is enabled. To do this, first verify that the kernel booted with the desired updates to the commandline. We can check this using:</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"># cat /proc/cmdline</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If the changes are not there, verify that you've booted the correct kernel or double check instructions specific to your distribution. If they are there, then we next need to check that the IOMMU is actually functional. The easiest way to do this is to check for IOMMU groups, which are set up by the IOMMU and will be used by VFIO for assignment. To do this, run the following:</span></div>
<div>
<br /></div>
<pre><code># find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/5/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.5
/sys/kernel/iommu_groups/8/devices/0000:00:1c.6
/sys/kernel/iommu_groups/9/devices/0000:00:1c.7
/sys/kernel/iommu_groups/9/devices/0000:05:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
/sys/kernel/iommu_groups/12/devices/0000:02:00.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.1
/sys/kernel/iommu_groups/13/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
</code></pre>
<br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If you get output like above, then the IOMMU is working. If you do not get a list of devices, then something is wrong with the IOMMU configuration on your system, either not properly enabled or not supported by the hardware and you'll need to figure out the problem before moving forward.</span><br />
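<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">On a system with many devices, the flat list is easier to digest if it's reduced to the groups that hold more than one device, since those are the ones that need a closer look. Here is a rough sketch; the function name </span><span style="font-family: Courier New, Courier, monospace;">shared_iommu_groups</span><span style="font-family: Arial, Helvetica, sans-serif;"> is hypothetical, and groups are printed in shell glob order rather than numeric order:</span></div>

```shell
# Print each IOMMU group that holds more than one device, with its members.
#   $1 = groups directory (optional; defaults to /sys/kernel/iommu_groups)
# The name shared_iommu_groups is hypothetical, chosen for this example.
shared_iommu_groups() (
    root=${1:-/sys/kernel/iommu_groups}
    for group in "$root"/*; do
        [ -d "$group/devices" ] || continue
        # PCI addresses contain no whitespace, so word splitting is safe here.
        set -- $(ls "$group/devices")
        if [ "$#" -gt 1 ]; then
            echo "group ${group##*/}: $*"
        fi
    done
)
```

<div>
<span style="font-family: Arial, Helvetica, sans-serif;">No output at all means either every device sits alone in its group or the IOMMU is not enabled, so run the find command above first to tell the two cases apart.</span></div>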
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">This is also a good time to verify that we have the desired isolation via the IOMMU groups. In the above example, there's a separate group per device except for the following groups: 1, 9, 11, and 12. Group 1 includes:</span></div>
<div>
<pre><span style="word-wrap: normal;"><code>
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)</code></span></pre>
</div>
<div>
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">This includes the processor root port and my GeForce card. This is a case where the processor root port does not provide isolation and is therefore included in the IOMMU group. The host driver for the root port should remain in place, with only the two endpoint devices, the GPU itself and its companion audio function bound to vfio-pci.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Group 9 has a similar constraint, though in this case device </span><span style="font-family: Courier New, Courier, monospace;">0000:00:1c.7</span><span style="font-family: Arial, Helvetica, sans-serif;"> is not a root port, but a PCI bridge. Since this is conventional PCI, the bridge and all of the devices behind it are grouped together. Device </span><span style="font-family: Courier New, Courier, monospace;">0000:05:00.0</span><span style="font-family: Arial, Helvetica, sans-serif;"> is another bridge, so there's nothing assignable in the IOMMU group anyway.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Group 11 is composed of internal components, an ISA bridge, SATA controller, and SMBus device. These are grouped because there's no ACS between the devices and therefore no isolation. I don't plan to assign any of these devices anyway, so it's not an issue.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Group 12 includes only the functions of my second graphics card, so the grouping here is also reasonable and perfectly usable for our purposes.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">If your grouping is not reasonable, or usable, you may be able to "fix" this by using the ACS override patch, but carefully consider the implications of doing this. There is a potential for putting your data at risk. Read my <a href="http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html" target="_blank">IOMMU groups article</a> again to make sure you understand the issue.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Next we need to handle the fact that we only intend to use the discrete GPUs for guests, so we do not want host drivers attaching to them. This avoids issues with the host driver unbinding and re-binding to the device. Generally this is only necessary for graphics cards, though I also throw in the companion audio function to keep the host desktop from getting confused about which audio device to use. We have a couple of options for doing this. The most common option is to use the pci-stub driver to claim these devices before native host drivers have the opportunity. Fedora builds the pci-stub driver statically into the kernel, giving it loading priority over any loadable modules, simplifying this even further. If your distro doesn't, keep reading; we'll cover a similar scenario with vfio-pci.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">The first step is to determine the PCI vendor and device IDs we need to bind to pci-stub. For this we use lspci:</span><br />
<br />
<div>
<pre><span style="word-wrap: normal;"><code>$ lspci -n -s 1:
01:00.0 0300: 10de:1381 (rev a2)
01:00.1 0403: 10de:0fbc (rev a1)
$ lspci -n -s 2:
02:00.0 0300: 1002:6611
02:00.1 0403: 1002:aab0
</code></span></pre>
</div>
</div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">The Vendor:Device IDs for my GPUs and audio functions are therefore 10de:1381, 10de:0fbc, 1002:6611, and 1002:aab0. From this, we can craft a new option to add to our kernel commandline using the same procedure as above for the IOMMU. In this case the commandline addition looks like this:</span><br />
<div>
<pre><span style="word-wrap: normal;"><code>
pci-stub.ids=10de:1381,10de:0fbc,1002:6611,1002:aab0
</code></span></pre>
</div>
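<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Rather than copying the IDs by hand, the list can be assembled from the lspci output itself. A small sketch, where </span><span style="font-family: Courier New, Courier, monospace;">ids_from_lspci</span><span style="font-family: Arial, Helvetica, sans-serif;"> is a name I've made up and the input is assumed to be in the </span><span style="font-family: Courier New, Courier, monospace;">lspci -n</span><span style="font-family: Arial, Helvetica, sans-serif;"> format shown above:</span></div>

```shell
# Build a comma separated vendor:device list from `lspci -n` output on stdin.
# Each input line looks like: 01:00.0 0300: 10de:1381 (rev a2)
# The name ids_from_lspci is hypothetical, chosen for this example.
ids_from_lspci() (
    # The third whitespace separated field is the vendor:device pair.
    awk '{ print $3 }' | paste -sd, -
)

# Usage:
#   echo "pci-stub.ids=$( { lspci -n -s 1:; lspci -n -s 2:; } | ids_from_lspci )"
```

<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Either way, the result is the same single commandline option.</span></div>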
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">After adding this to our grub configuration, using grub2-mkconfig, and rebooting, </span><span style="font-family: Courier New, Courier, monospace;">lspci -nnk</span><span style="font-family: Arial, Helvetica, sans-serif;"> for these devices should list pci-stub for the kernel driver in use.</span><br />
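<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The same information is available from sysfs, which is handy for scripts. Each PCI device directory has a </span><span style="font-family: Courier New, Courier, monospace;">driver</span><span style="font-family: Arial, Helvetica, sans-serif;"> symlink when a driver is bound. A minimal sketch, with </span><span style="font-family: Courier New, Courier, monospace;">pci_driver</span><span style="font-family: Arial, Helvetica, sans-serif;"> being a hypothetical helper name:</span></div>

```shell
# Print the name of the kernel driver bound to a PCI device, or "none".
#   $1 = full PCI address, e.g. 0000:01:00.0
#   $2 = PCI devices directory (optional; defaults to /sys/bus/pci/devices)
# The name pci_driver is hypothetical, chosen for this example.
pci_driver() (
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver's directory; its last path
        # component is the driver name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
)

# Usage:
#   pci_driver 0000:01:00.0
```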
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">A further trick we can use is to craft an ids list using the advanced parsing of PCI vendor and class attributes to create an option list that will claim any Nvidia or AMD GPU or audio device:</span><br />
<br />
<div>
<pre><span style="word-wrap: normal;"><code>pci-stub.ids=1002:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,1002:ffffffff:ffffffff:ffffffff:00040300:ffffffff,10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,10de:ffffffff:ffffffff:ffffffff:00040300:ffffffff
</code></span></pre>
</div>
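<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Each entry in that list has the form vendor:device:subvendor:subdevice:class:class_mask, where ffffffff in an ID field means "match anything". The kernel treats a device as matching when its class code and the entry's class agree on every bit set in class_mask. To see which devices an entry would claim, that comparison can be sketched in shell. The function name </span><span style="font-family: Courier New, Courier, monospace;">class_matches</span><span style="font-family: Arial, Helvetica, sans-serif;"> is mine, and the device class is assumed to come from the </span><span style="font-family: Courier New, Courier, monospace;">class</span><span style="font-family: Arial, Helvetica, sans-serif;"> file in the device's sysfs directory:</span></div>

```shell
# Succeed if a device's class code matches an ids entry's class/class_mask pair.
#   $1 = device class code, e.g. the contents of
#        /sys/bus/pci/devices/0000:01:00.0/class
#   $2 = class field of the ids entry
#   $3 = class_mask field of the ids entry
# All three are hexadecimal values written with a leading 0x.
# The name class_matches is hypothetical, chosen for this example.
class_matches() {
    # Bits that differ between device and entry, limited to the masked bits.
    [ $(( ($1 ^ $2) & $3 )) -eq 0 ]
}

# Usage:
#   class_matches 0x030000 0x00030000 0xffff00ff && echo "claimed"
```

<div>
<span style="font-family: Arial, Helvetica, sans-serif;">With the mask ffff00ff the sub-class byte is ignored, so the 00030000 entries cover any display controller sub-class, while the 00040300 entries with mask ffffffff match only the exact audio device class.</span></div>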
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">If you're using kernel v4.1 or newer, the vfio-pci driver supports the same ids option so you can directly attach devices to vfio-pci and skip pci-stub. vfio-pci is not generally built statically into the kernel, so we need to force it to be loaded early. To do this on Fedora we need to set up the module options we want to use with modprobe.d. I typically use a file named </span><span style="font-family: Courier New, Courier, monospace;">/etc/modprobe.d/local.conf </span><span style="font-family: Arial, Helvetica, sans-serif;">for local, i.e. system specific, configuration. In this case, that file would include:</span><br />
<div>
<pre><span style="word-wrap: normal;"><code>
options vfio-pci ids=1002:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,1002:ffffffff:ffffffff:ffffffff:00040300:ffffffff,10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,10de:ffffffff:ffffffff:ffffffff:00040300:ffffffff
</code></span></pre>
</div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <br />
<span style="font-family: Arial, Helvetica, sans-serif;">Next we need to ensure that dracut includes the necessary modules to load vfio-pci. I therefore create </span><span style="font-family: Courier New, Courier, monospace;">/etc/dracut.conf.d/local.conf</span><span style="font-family: Arial, Helvetica, sans-serif;"> with the following:</span><br />
<div>
<pre><span style="word-wrap: normal;"><code>
add_drivers+="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
</code></span></pre>
</div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">(Note, the vfio_virqfd module only exists in kernel v4.1+)</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span><span style="font-family: Arial, Helvetica, sans-serif;">Finally, we need to tell dracut to load vfio-pci first. This is done by once again editing our grub config file and adding the option: </span><span style="font-family: Courier New, Courier, monospace;">rd.driver.pre=vfio-pci</span><span style="font-family: Arial, Helvetica, sans-serif;"> Note that in this case we no longer use a pci-stub.ids option from grub, since we're replacing it with vfio-pci. Regenerate the dracut initramfs with </span><span style="font-family: Courier New, Courier, monospace;">dracut -f --kver `uname -r`</span><span style="font-family: Arial, Helvetica, sans-serif;"> and reboot to see the effect (The </span><span style="font-family: Courier New, Courier, monospace;">--regenerate-all</span><span style="font-family: Arial, Helvetica, sans-serif;"> dracut option is also sometimes useful).</span><br />
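<span style="font-family: Arial, Helvetica, sans-serif;">Since this setup touches several files, here is a rough sketch that stages the two config files under a scratch directory so they can be reviewed before being copied into /etc. The </span><span style="font-family: Courier New, Courier, monospace;">stage_vfio_config</span><span style="font-family: Arial, Helvetica, sans-serif;"> helper and the scratch directory are assumptions for illustration only; the file contents are the ones described above.</span><br />

```shell
#!/bin/sh
# Hypothetical helper: stage the two config files described above under a
# scratch directory so they can be reviewed before copying into /etc.
# usage: stage_vfio_config DIR IDS
stage_vfio_config() {
    mkdir -p "$1/etc/modprobe.d" "$1/etc/dracut.conf.d"
    echo "options vfio-pci ids=$2" > "$1/etc/modprobe.d/local.conf"
    echo 'add_drivers+="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"' \
        > "$1/etc/dracut.conf.d/local.conf"
}

# Still to do by hand: add rd.driver.pre=vfio-pci to the kernel command
# line, then regenerate the initramfs with: dracut -f --kver `uname -r`
```

<span style="font-family: Arial, Helvetica, sans-serif;">For example, </span><span style="font-family: Courier New, Courier, monospace;">stage_vfio_config /tmp/vfio-stage 10de:1381,10de:0fbc</span><span style="font-family: Arial, Helvetica, sans-serif;"> leaves both files under /tmp/vfio-stage/etc for inspection.</span><br />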
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Another issue that users encounter when sequestering devices is what to do when there are multiple devices with the same vendor:device ID and some are intended to be used for the host. Some users have found the xen-pciback module to be a suitable stand-in for pci-stub with the additional feature that the "hide" option for this module takes device addresses rather than device IDs. I can't load this module on Fedora, so here's my solution that I like a bit better.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Create a small script; I've named mine </span><span style="font-family: Courier New, Courier, monospace;">/sbin/vfio-pci-override-vga.sh</span><span style="font-family: Arial, Helvetica, sans-serif;">. It contains:</span><br />
<div>
<pre><span style="word-wrap: normal;"><code>
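#!/bin/sh
# Override the driver for every non-boot VGA device and its audio function.
for i in $(find /sys/devices/pci* -name boot_vga); do
    # boot_vga reads 0 for any GPU that is not the primary (boot) display
    if [ "$(cat "$i")" -eq 0 ]; then
        GPU=$(dirname "$i")
        # The companion audio device is function 1 of the same slot
        AUDIO=$(echo "$GPU" | sed -e "s/0$/1/")
        echo "vfio-pci" > "$GPU/driver_override"
        if [ -d "$AUDIO" ]; then
            echo "vfio-pci" > "$AUDIO/driver_override"
        fi
    fi
done
# -i ignores the install rule for vfio-pci so this script is not re-run
modprobe -i vfio-pci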
</code></span></pre>
</div>
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">This script will find every non-boot VGA device in the system, use the driver_override feature introduced in kernel v3.16, and make vfio-pci the exclusive driver for that device. If there's a companion audio device at function 1, it also gets a driver override. We then modprobe the vfio-pci module, which will automatically bind to the devices we've specified. Don't forget to make the script executable with chmod 755. Now, in place of the </span><span style="font-family: Courier New, Courier, monospace;">options</span><span style="font-family: Arial, Helvetica, sans-serif;"> line in our modprobe.d file, we use the following:</span><br />
<br />
<div>
<pre><span style="word-wrap: normal;"><code>install vfio-pci /sbin/vfio-pci-override-vga.sh
</code></span></pre>
</div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><span style="font-family: Arial, Helvetica, sans-serif;">So we specify that to install the vfio-pci module, run the script we just wrote, which sets up our driver overrides and then loads the module, ignoring the install option (-i) to prevent a loop. Finally, we need to tell dracut to include this script in the initramfs, so in addition to the </span><span style="font-family: Courier New, Courier, monospace;">add_drivers+=</span><span style="font-family: Arial, Helvetica, sans-serif;"> that we added above, add the following to </span><span style="font-family: Courier New, Courier, monospace;">/etc/dracut.conf.d/local.conf</span><span style="font-family: Arial, Helvetica, sans-serif;">: </span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <br />
<div>
<pre><span style="word-wrap: normal;"><code>install_items+="/sbin/vfio-pci-override-vga.sh /usr/bin/find /usr/bin/dirname"
</code></span></pre>
</div>
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">Note that the additional utilities required were found using </span><span style="font-family: Courier New, Courier, monospace;">lsinitrd</span><span style="font-family: Arial, Helvetica, sans-serif;"> and iteratively added to make the script work. Regenerate the initramfs with dracut again and you should now have all the non-boot VGA devices and their companion audio functions bound to vfio-pci after reboot. The primary graphics should load with the native host driver normally. This method should work for any kernel version that supports driver_override (v3.16 or newer), and I think I'm going to switch my setup to use it since I wrote it up here.</span><br />
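<span style="font-family: Arial, Helvetica, sans-serif;">Before committing the override script to the initramfs, it can help to preview which devices it would claim. The following is a dry-run sketch using the same boot_vga logic; it only prints paths and writes nothing. The </span><span style="font-family: Courier New, Courier, monospace;">list_override_targets</span><span style="font-family: Arial, Helvetica, sans-serif;"> name and the optional directory argument are my own additions for illustration.</span><br />

```shell
#!/bin/sh
# Dry run of the override script: print the sysfs directories that would
# receive a vfio-pci driver_override, without writing anything.
# usage: list_override_targets [DEVICES_DIR]   (default: /sys/devices)
list_override_targets() {
    find "${1:-/sys/devices}" -name boot_vga | while read -r f; do
        if [ "$(cat "$f")" -eq 0 ]; then
            gpu=$(dirname "$f")
            echo "$gpu"
            audio=$(echo "$gpu" | sed -e 's/0$/1/')
            if [ -d "$audio" ]; then
                echo "$audio"
            fi
        fi
    done
}
```

<span style="font-family: Arial, Helvetica, sans-serif;">Running </span><span style="font-family: Courier New, Courier, monospace;">list_override_targets</span><span style="font-family: Arial, Helvetica, sans-serif;"> with no arguments on the host should list each non-boot GPU followed by its audio function, if one exists.</span><br />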
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Obviously a simpler script can be used to pick specific devices. Here's an example that achieves the same result on my system:</span><br />
<div>
<pre><span style="word-wrap: normal;"><code>
#!/bin/sh
DEVS="0000:01:00.0 0000:01:00.1 0000:02:00.0 0000:02:00.1"
for DEV in $DEVS; do
echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done
modprobe -i vfio-pci
</code></span></pre>
</div>
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">(In this case the find and dirname binaries don't need to be included in the initramfs)</span><br />
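<span style="font-family: Arial, Helvetica, sans-serif;">After rebooting, one way to confirm the result is to look at the driver symlink that sysfs exposes for each bound device. This is only a sketch; the </span><span style="font-family: Courier New, Courier, monospace;">bound_driver</span><span style="font-family: Arial, Helvetica, sans-serif;"> helper and its optional directory argument are hypothetical names chosen for illustration.</span><br />

```shell
#!/bin/sh
# Hypothetical helper: print the name of the driver bound to a PCI device,
# or "none" if no driver is bound.
# usage: bound_driver ADDRESS [DEVICES_DIR]   (default: /sys/bus/pci/devices)
bound_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Example (on a real host):
#   for DEV in 0000:01:00.0 0000:01:00.1; do bound_driver "$DEV"; done
```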
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">A couple of other bonuses for v4.1 and newer kernels: by binding devices statically to vfio-pci, they will be placed into a low power state when not in use. Before you get your hopes too high, this generally only saves a few watts and does not stop the fan. v4.1 users with exclusively OVMF guests can also add an "</span><span style="font-family: Courier New, Courier, monospace;">options vfio-pci disable_vga=1</span><span style="font-family: Arial, Helvetica, sans-serif;">" line to their modprobe.d which will cause vfio-pci to opt devices out of VGA arbitration if possible. This prevents VGA arbitration from interfering with host devices, even in configurations like mine with multiple assigned GPUs.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">If you're in the unfortunate situation of needing to use legacy VGA BIOS support for your assigned graphics cards and you have Intel host graphics using the i915 driver, this is also the point where you need to patch your host kernel for the <a href="https://gist.github.com/cspicer/b5e3b3d21b635ee4105a#file-i915_317-patch" target="_blank">i915 VGA arbitration fix</a>. Don't forget that to enable this patch you also need to pass the </span><span style="font-family: Courier New, Courier, monospace;">enable_hd_vgaarb=1</span><span style="font-family: Arial, Helvetica, sans-serif;"> option to the i915 driver. This is typically done via a modprobe.d </span><span style="font-family: Courier New, Courier, monospace;">options</span><span style="font-family: Arial, Helvetica, sans-serif;"> entry as discussed above.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">At this point your system should be ready to use. The IOMMU is enabled, the IOMMU groups have been verified, the VGA and audio functions for assignment have been bound to either vfio-pci or pci-stub for later use by libvirt, and we've enabled proper VGA arbitration support in the i915 driver if needed. In the next part we'll actually install a VM, and maybe even attach a GPU to it. Stay tuned.</span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com47tag:blogger.com,1999:blog-8694303781453133223.post-50990994189372083652015-05-05T14:37:00.000-06:002015-05-05T15:23:28.684-06:00VFIO GPU How To series, part 2 - Expectations<span style="font-family: Arial, Helvetica, sans-serif;">From part 1 we learned some basic guidelines for the hardware necessary to support GPU assignment, but let's take a moment to talk about what we're actually trying to accomplish. There's no point in proceeding further if the solution doesn't meet your goals and expectations.</span><br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">First things first, PCI device assignment is a means to exclusively assign a device to a VM. Devices cannot be shared among multiple guests and cannot be shared between host and guest. The solution we're discussing here is not vGPU, VGX, or any other means of multiplexing a single GPU among multiple users. Furthermore, VFIO works on an isolation unit known as an IOMMU Group. Endpoints within an IOMMU group follow the same rule; they're either owned by a single guest or the host, not both. As referenced in part 1, a previous <a href="http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html" target="_blank">article</a> on IOMMU groups attempts to explain this relationship.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Next, be aware that Linux graphics drivers, both open source and proprietary, are rather poor at dynamically binding and unbinding devices. This means that hot-unplugging a graphics adapter from the host configuration, assigning it to a guest for some task, and then re-plugging it back to the host desktop is not really achievable just yet. This is something that could happen, but graphics drivers need to get to the same level of robustness around binding and unbinding devices as NIC drivers before this is really practical.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Probably the primary incorrect expectation that users have around GPU assignment is the idea that the out-of-band VM graphics display, via Spice or VNC, will still be available and will somehow just be accelerated with an assigned GPU. The misconception is reinforced by <a href="https://www.youtube.com/watch?v=VGdqRuHBY64" target="_blank">YouTube</a> <a href="https://www.youtube.com/watch?v=Qi1LdFkRzIs" target="_blank">videos</a> that show accelerated graphics running within a window on the host system. In both of these examples, the remote display is actually accomplished using TightVNC running within the guest. Don't get me wrong, TightVNC is a great solution for some use cases, and local software bridges for virtio networking provide extreme amounts of bandwidth to make this a bit more practical than going across a physical wire, but it's not a full replacement for a console screen within virt-manager or other VM management tools. TightVNC is a server running within the guest, it's only available once the guest is booted, it's rather CPU intensive in the guest, and it's only moderately OK for the remote display of 3D graphics. When using GPU assignment, the only fully accelerated guest output is through the monitor connector to the physical graphics card itself. We currently have no ability to scrape the framebuffer from the physical device, driven by proprietary drivers, and feed those images into the QEMU remote graphics protocols. There are commercial solutions to this problem: <a href="https://www.nice-software.com/products/dcv" target="_blank">NICE DCV</a> and <a href="http://www8.hp.com/us/en/campaigns/workstations/remote-graphics-software.html" target="_blank">HP RGS</a> are both software solutions to provide better remote 3D capabilities. 
It's even possible to co-assign <a href="http://www.teradici.com/pcoip-technology" target="_blank">PCoIP</a> cards to the VM to achieve high performance remote 3D. In my use case, I enable TightVNC for 2D interaction with the VM and use the local monitor or software stacks like <a href="http://store.steampowered.com/streaming" target="_blank">Steam in-home streaming</a> for remote use of the VM. Tools like <a href="http://synergy-project.org/" target="_blank">Synergy</a> are useful for local monitors to seamlessly combine mouse and keyboard for multiple desktops.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Another frequent misconception is that integrated graphics devices, like Intel IGD graphics, are just another graphics device and should work with GPU assignment. Unfortunately IGD is not only non-discrete from the aspect of being integrated into the processor, but it's also non-discrete in that the drivers for IGD depend on various registers, operation regions, and firmware tables spread across the chipset. Thus, while we can attach the IGD GPU to the guest, it doesn't work due to driver requirements. This issue is being worked on and will hopefully have a solution in the near future. As I understand it, AMD APUs are much more similar to their discrete counterparts, but the device topology still makes them difficult to work with. We rely fairly heavily on PCI bus resets in order to put GPUs into a clean state for the guest and between guest reboots, but this is not possible on IGD or APUs because they reside on the host PCIe root complex. Resetting the host bus is not only non-standard, but it would reset every PCI device in the system. We really need Function Level Reset (FLR) support in the device to make this work. </span><span style="font-family: Arial, Helvetica, sans-serif;">For now, IGD assignment doesn't work, and I have no experience with APU assignment, but expect it to work poorly due to lack of reset.</span><br />
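<span style="font-family: Arial, Helvetica, sans-serif;">If you're curious whether a given device advertises FLR, the PCIe Device Capabilities shown by </span><span style="font-family: Courier New, Courier, monospace;">lspci -vv</span><span style="font-family: Arial, Helvetica, sans-serif;"> include an FLReset flag. The sketch below only checks for that flag; the </span><span style="font-family: Courier New, Courier, monospace;">flr_status</span><span style="font-family: Arial, Helvetica, sans-serif;"> helper is a hypothetical name, and it assumes lspci is run with enough privilege to display the capability list.</span><br />

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -vv` output for one device on stdin and
# reports whether its PCIe Device Capabilities advertise Function Level Reset.
flr_status() {
    if grep -q 'FLReset+'; then
        echo "FLR supported"
    else
        echo "FLR not advertised"
    fi
}

# Example (as root, on a real host): lspci -vv -s 00:02.0 | flr_status
```

<span style="font-family: Arial, Helvetica, sans-serif;">Keep in mind that an advertised FLR doesn't guarantee the reset leaves the device in a usable state; this only tells you whether the capability is claimed.</span><br />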
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Ok, I think that tackles the top misconceptions regarding GPU assignment. Hopefully there are still plenty of interesting use cases for your application. In the next part we'll start to configure our host system for device assignment. Stay tuned.</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com0tag:blogger.com,1999:blog-8694303781453133223.post-59705586156546149532015-05-05T14:18:00.002-06:002015-05-05T15:23:21.117-06:00 VFIO GPU How To series, part 1 - The hardware<span style="font-family: Arial, Helvetica, sans-serif;">This is an attempt to make a definitive howto guide for GPU assignment with QEMU/KVM and VFIO. It should also be relevant for general PCI device assignment with VFIO. For part 1 I'll simply cover the hardware that I use, its features and drawbacks for this application, and what I might do differently in designing a system specifically for GPU assignment. In later parts we'll get into installing VMs and configuring GPU assignment.</span><br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The system I'm using is nothing particularly new or exciting, it's simply a desktop based on an <a href="http://asus%20p8h67-m%20pro/CSM" target="_blank">Asus P8H67-M PRO/CSM</a>:</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuGkS3mtqvWCqi0qRv3F0X86eEfG-reSv2aL7xTGiZiKVk0spFZkNb8igTkCz9B-3Jd2uDAJHUXF0C8FzmcHiAE4fsIjMOl-NjrqxbidBZGqlDFt8TuT69YIYqCU4HnqnXwwQLG-GF8Keo/s1600/20111213_152026.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuGkS3mtqvWCqi0qRv3F0X86eEfG-reSv2aL7xTGiZiKVk0spFZkNb8igTkCz9B-3Jd2uDAJHUXF0C8FzmcHiAE4fsIjMOl-NjrqxbidBZGqlDFt8TuT69YIYqCU4HnqnXwwQLG-GF8Keo/s1600/20111213_152026.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">I'm using a <a href="http://ark.intel.com/products/65729/Intel-Xeon-Processor-E3-1245-v2-8M-Cache-3_40-GHz" target="_blank">Xeon E3-1245 v2</a>, Ivy Bridge processor. I wouldn't necessarily recommend this particular setup (it's probably only available on ebay anymore anyway), but I'll point out a few interesting points about it to help you pick your own system. First, the motherboard uses an H67 chipset, which is covered by the <a href="http://lxr.free-electrons.com/ident?i=pci_quirk_intel_pch_acs_ids" target="_blank">Intel PCH ACS quirk</a>. This means that devices connected via the PCH root ports will be isolated from each other. That includes anything plugged into the black PCIe 2.0 x16 (electrically x4) slot shown on the top of the picture above, as well as builtin devices hanging off the PCH root ports internally. The blue x16 slot is the higher performance PCIe 3.0 (3.0 for my processor) slot driven from the processor root ports. The motherboard manual will sometimes have block diagrams indicating which ports are derived from which component, this particular board doesn't.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">A convenient "feature" of this board is that there's only a single processor-based root port slot. That's not exactly a feature, but processors for this socket (1155) that also support Intel VT-d include Core i5, i7, and Xeon E3-1200 series. None of these processors support PCIe ACS between the root ports (see <a href="http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html" target="_blank">here</a> for why that's important), which means multiple root ports would not be isolated from each other. If I had more than one processor-based root port and made use of them, I might need the ACS override patch to fake isolation that may or may not exist.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">There are also a couple conventional PCI slots on this board that are largely useless. Not only because conventional PCI is not a good choice for device assignment, and not only because they're blocked by graphics card heatsinks, but because they're driven by an ASMedia ASM1083 (rev 1) PCIe-to-PCI bridge, which has all sorts of interrupt issues, even on bare metal. This spawns my personal dislike and distrust for anything made by ASMedia.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">The onboard NIC is a Realtek RTL8111, which is not particularly interesting either. Realtek NICs are also a poor choice for doing direct device assignment to a guest; they do <a href="http://git.qemu.org/?p=qemu.git;a=commitdiff;h=4cb47d281a995cb49e4652cb26bafb3ab2d9bd28" target="_blank">strange and non-standard things</a>. On my system, I use it with a software bridge and virtio network devices for the VMs. This provides plenty of performance for my use cases (including Steam in-home streaming) as well as local connectivity for Synergy.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">A final note on the base platform: I'm using the processor integrated Intel HD Graphics P4000 for the host graphics. This particular motherboard only allows BIOS selection between IGD, PCIe, and PCI for the primary graphics devices. There is no way to specify a particular PCIe slot for primary graphics as other vendors, like Gigabyte, tend to provide. This motherboard is therefore not a good choice for discrete host graphics since we only have one fixed configuration for selecting the primary graphics device between plugin cards.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">For reference, lspci on this system:</span></div>
<div>
<br />
<pre><span style="font-size: x-small; word-wrap: normal;"><code>-[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller
+-01.0-[01]--+-00.0 NVIDIA Corporation GM107 [GeForce GTX 750]
| \-00.1 NVIDIA Corporation Device 0fbc
+-02.0 Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller
+-16.0 Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1
+-1a.0 Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
+-1b.0 Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller
+-1c.0-[02]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240 OEM]
| \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
+-1c.5-[03]----00.0 ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
+-1c.6-[04]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
+-1c.7-[05-06]----00.0-[06]--
+-1d.0 Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
+-1f.0 Intel Corporation H67 Express Chipset Family LPC Controller
+-1f.2 Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller
\-1f.3 Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller
</code></span></pre>
</div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
My IOMMU groups look like this:</span></div>
<div>
<pre><span style="font-size: x-small; word-wrap: normal;"><code>
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/5/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.5
/sys/kernel/iommu_groups/8/devices/0000:00:1c.6
/sys/kernel/iommu_groups/9/devices/0000:00:1c.7
/sys/kernel/iommu_groups/9/devices/0000:05:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
/sys/kernel/iommu_groups/12/devices/0000:02:00.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.1
/sys/kernel/iommu_groups/13/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
</code></span></pre>
</div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><span style="font-family: Arial, Helvetica, sans-serif;">Note that even though I only have a single PCH root port slot, there are multiple internal root ports connecting the built-in I/O, including the USB 3.0 controller and Ethernet. Without native ACS support or the ACS quirk for these PCH root ports, all of the 1c.* root ports and devices behind them would be grouped together.</span><br />
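<span style="font-family: Arial, Helvetica, sans-serif;">A listing like the one above can be produced by walking the iommu_groups directory in sysfs. Here's a small sketch that prints one line per device, ordered by group number; the </span><span style="font-family: Courier New, Courier, monospace;">list_iommu_groups</span><span style="font-family: Arial, Helvetica, sans-serif;"> name and the optional directory argument are my own choices for illustration.</span><br />

```shell
#!/bin/sh
# Print "group N: ADDRESS" for every device, ordered by group number.
# usage: list_iommu_groups [GROUPS_DIR]   (default: /sys/kernel/iommu_groups)
list_iommu_groups() {
    for g in "${1:-/sys/kernel/iommu_groups}"/*; do
        for d in "$g"/devices/*; do
            # Skip the unexpanded glob when a directory is empty or missing
            [ -e "$d" ] || continue
            echo "group ${g##*/}: ${d##*/}"
        done
    done | sort -k2n
}
```

<span style="font-family: Arial, Helvetica, sans-serif;">If this prints nothing on your host, the IOMMU is most likely not enabled.</span><br />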
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">Let's move on to the graphics cards. This is an always-on desktop system, so noise and power (and having representative devices) are more important to me than ultimate performance. The cards I'm using are therefore an <a href="https://www.techpowerup.com/gpudb/b2780/evga-gtx-750-superclocked.html" target="_blank">EVGA GTX 750 Superclocked</a>:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.techpowerup.com/gpudb/images/b2780.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="204" src="https://www.techpowerup.com/gpudb/images/b2780.jpg" width="320" /></a></div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">And an <a href="https://www.techpowerup.com/gpudb/1974/radeon-hd-8570-oem.html" target="_blank">AMD HD 8570 OEM</a>:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.techpowerup.com/gpudb/images/1974.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="204" src="https://www.techpowerup.com/gpudb/images/1974.jpg" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<span style="font-family: Arial, Helvetica, sans-serif;">The GTX750 is based on Maxwell, giving it an excellent power-performance ratio and the 8570 is based on Oland, making it one of the newer generation of GCN chips from AMD. The 8570 is by no means a performance card, but I don't have room for a double-wide graphics card in my PCH root port slot and it's only running x4 electrically anyway. OEM cards seem to be a good way to find cheap cards on ebay, but their cooling solutions leave something to be desired. I actually replaced the heatsink fan on the 8570 with a Cooler Master CoolViva Z1. I'll also mention that before upgrading to the GTX750 I successfully ran a <a href="https://www.techpowerup.com/gpudb/2041/geforce-gt-635-oem.html" target="_blank">GT 635 OEM</a>, which is fairly comparable in specs and price to the 8570.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">This system was not designed or purchased with this particular use case in mind. In fact, it only gained VT-d capabilities after upgrading from the Core i3 processor that I originally had installed. </span><span style="font-family: Arial, Helvetica, sans-serif;">So what would an ideal system be for this purpose? First, IOMMU support via Intel VT-d or AMD-Vi is required. This is not negotiable. If we stay with Intel Core i5/i7 (no VT-d support in i3) or Xeon E3-12xx series processors then we need to be well aware of the lack of ACS support on processor root ports. In an application like above, I'm more limited by physical slots so this is not a problem. If I wanted more processor-based root port slots, the trade-off in using these processors is the lack of isolation. The ACS override patch will not go upstream and is not recommended for downstreams or users due to the potential risk in assuming isolation where none may exist. The alternative is to set our sights on Xeon E5 or higher processors. This is potentially a higher price point, but I see plenty of users spending many hundreds of dollars on high-end graphics cards, yet skimping on the processor and complaining about needing to patch their kernel. Personally I'd rather put more towards the platform to avoid that hassle.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">There are also users that prefer AMD platforms. Personally I don't find them that compelling. The latest non-APU chipset is several years old and the processors are too power hungry for my taste. A-series processors aren't much better and their chipsets are largely unknown with respect to both isolation and IOMMU support. Besides the processor and chipset technologies, Intel has a huge advantage with <a href="http://ark.intel.com/" target="_blank">http://ark.intel.com/</a> in being able to quickly and easily research the capabilities of a given product.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">But what about the graphics cards? If you're looking for a solution supported by the graphics card vendor, you're limited to Nvidia Quadro K-series, model 2000 or better (or GRID or Tesla, but those are not terribly relevant in this context). Nvidia supports running these cards in QEMU/KVM virtual environments using VFIO in a secondary display configuration. In other words, pre-boot (BIOS), OS boot, and initial installation and maintenance is done using an emulated graphics device and the GPU is only activated once the proprietary graphics drivers are enabled in the guest. This mode of operation does not depend on the graphics ROM (legacy vs UEFI) and works with current Windows and Linux guests.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">When choosing between GeForce and Radeon, there's no clear advantage of one versus the other that's necessarily sufficient to trump personal preference. AMD cards are known to experience occasional blue screens for Windows guests and a couple of the more recent GPUs have <a href="https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg03128.html" target="_blank">known reset issues</a>. On the other hand, evidence suggests that Nvidia may be actively trying to subvert VM usage of GeForce graphics cards. As noted in the <a href="http://vfio.blogspot.com/2014/08/vfiovga-faq.html" target="_blank">FAQ</a> we need to both hide KVM as the hypervisor as well as disable KVM's support for Microsoft Hyper-V extensions in order for Nvidia's driver to work in the VM. The statement from Nvidia has always been that they are not intentionally trying to block this use, but that it's not supported and won't be fixed. Personally, that's a hard explanation to swallow.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">My observation is that AMD appears more interested in supporting VM use cases, but they're not doing anything in particular to enable it or make it work better. Nvidia generally works better, but each new driver upgrade is an opportunity for Nvidia to introduce new barriers or shut down this usage model entirely. If Nvidia were to make a gesture of support by fixing the current "bugs" in hypervisor detection, the choice would be clear IMHO.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Users also often like to assign additional hardware to VMs to make them feel more like separate systems. In my use case, I can run multiple guests simultaneously and have a monitor for each graphics card, but they are only used by a single user, me. I'm therefore perfectly happy using <a href="http://synergy-project.org/" target="_blank">Synergy</a> to share my mouse and keyboard, and virtio disk and network for the VMs work well. For an actual multi-seat use case, being able to connect an individual mouse and keyboard per seat is obviously useful. USB "passthrough", which is different from "assignment", works on individual USB endpoints and is one solution to this problem, but generally doesn't work well with hotplug (in my experience). Using multiple USB host controllers, with a host controller assigned per VM, is another option for providing a more native solution. This however means that we start increasing our slot requirements and therefore our concerns about isolation between those slots.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Assigning physical NICs to a system is also an option, though for a desktop setup it's generally unnecessary. In the 1Gbps realm, virtio can easily handle the bandwidth, so the advantage of assignment is lower latency and more aggregate throughput among VMs and host. If 10Gbps is a concern, assignment becomes more practical. If you do decide to assign NICs, I find Intel NICs to be a good choice for assignment.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">I generally discourage users from assigning disk controllers directly to a VM. Often the builtin controllers store their boot ROM in the system firmware, making it difficult to actually boot the guest from an assigned HBA. Beyond that, the additional latency imposed by a paravirtual disk controller is typically substantially less than the latency of the disk itself, so the return on investment and additional complication of an assigned storage controller is generally not worthwhile for average configurations.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Audio is also sometimes an issue for users. In my configuration I use HDMI audio from the graphics card for both VMs. This works well, so long as we <a href="http://vfio.blogspot.com/2014/09/vfio-interrupts-and-how-to-coax-windows.html" target="_blank">make sure the guest is using MSI interrupts</a> for the audio devices. Other users prefer USB audio devices, often in a passthrough configuration. Connecting separate VMs to the host user's pulse-audio session is generally difficult, but not impossible. A contributing problem to this is that assigned devices, such as the graphics card, need to be used in libvirt's "system" mode, while connecting to the user's pulseaudio daemon would probably be easier in a libvirt user session.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Finally, the number of processor cores and total memory size play an important role for the guests you'll eventually be running. When using PCI device assignment, VM memory cannot be over-committed. The assigned device is capable of DMA through the IOMMU, which means that all of guest memory needs to be not only allocated in advance, but also pinned in memory to keep the guest physical to host physical mappings static for the VM. Total memory therefore needs to accommodate all assigned device VMs that might be run simultaneously, plus memory for whatever other applications the host is running. vCPUs can be over-committed, regardless of an assigned device, but doing so breaks down some of the performance isolation we gain by device assignment. If a "native" feel is desired, we'll want enough cores that we don't need to share between VMs and a few left over for applications and overhead on the host.</span><br />
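<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">As a back-of-the-envelope illustration of the memory point, here is a small bash sketch that adds up a host reserve and the memory of every assigned-device VM that might run at once. The function name and the numbers are made up for the example; substitute your own sizes.</span><br />
<pre><code>
# Sum a host reserve plus the (pinned) memory of each guest, all in GiB.
# First argument: memory to leave for the host; remaining arguments: one per VM.
required_mem_gib() {
    local total=$1 vm
    shift
    for vm in "$@"; do
        total=$(( total + vm ))
    done
    echo "$total"
}

# Example: 4 GiB kept for the host, two guests with 8 GiB each.
required_mem_gib 4 8 8
</code></pre>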
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span> <span style="font-family: Arial, Helvetica, sans-serif;">Hopefully that helps give you an idea of where to start with choosing hardware. In the next segment we'll cover what to expect in a GPU assignment VM as users often have misconceptions around where the display is output and how to interact with the VM.</span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com5tag:blogger.com,1999:blog-8694303781453133223.post-77397093424572766252015-04-24T15:30:00.002-06:002015-04-24T15:30:58.235-06:00Progress on the AMD frontWith the help of Alex Deucher, I've been able to make progress on the long standing reset issue we've had with some Radeon GPUs. This issue typically manifests itself as a BSOD on VM restart. Some users attempt to do things like hot-unplug the GPU from the VM before reboot/shutdown or even suspend/resume the host in an attempt to work around this. Rebooting the host is also an option, but even more undesirable.<br />
<br />
The problem is believed to be limited to Bonaire and Hawaii GPUs. It's a hardware bug and should be fixed on more recent ASICs. In my experience, it appears that the GPU is sufficiently disconnected from the PCI bus that the PCI bus reset we rely on for most graphics cards has no effect on these particular GPUs. This leaves the internal SMC engine running microcode loaded from the guest driver, interfering with the driver re-load on the next boot.<br />
<br />
What we've found is that there are some ASIC specific reset mechanisms we can use on these cards that get us to a sufficiently fresh state for the card to be used repeatedly, most of the time. I add that qualifier because I do still see occasional failures, but they manifest as if the card never wakes up rather than as a BSOD. The solution for this is still a host reboot, but it should be a relatively rare occurrence.<br />
<br />
I had originally hoped this could be implemented as a device specific reset in the kernel, allowing a kernel update to transparently enable this for users, but given the nature of the workaround, I now feel more comfortable implementing it in the QEMU vfio driver. This will go in after the QEMU v2.3 release.<br />
<br />
If you're affected by this problem, I'd encourage you to give <a href="http://www.spinics.net/lists/kvm/msg116277.html" target="_blank">this patch</a> a try.Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com11tag:blogger.com,1999:blog-8694303781453133223.post-75730982113242380242014-10-14T08:34:00.000-06:002014-10-29T11:22:43.409-06:00KVM Forum 2014 - VFIO, OVMF, GPU, and You<span style="font-family: Arial, Helvetica, sans-serif;">Slides from my talk:</span><br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="http://awilliam.github.io/presentations/KVM-Forum-2014/#/1">http://awilliam.github.io/presentations/KVM-Forum-2014/#/</a></span><br />
<br />
EDIT: Slides now include link to youtube video of talk</div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com3tag:blogger.com,1999:blog-8694303781453133223.post-37383379923933578352014-09-22T13:25:00.000-06:002014-09-22T13:25:09.597-06:00VFIO interrupts and how to coax Windows guests to use MSI<span style="font-family: Arial, Helvetica, sans-serif;">Interrupts are used by devices for signaling attention. In the case of a NIC, there might be an interrupt indicating a packet received or that a transmit queue is empty. As with everything else, how we signal interrupts has evolved over time.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">In the PCI space, we started with just four physical interrupt lines, INT{A,B,C,D}. These are known as INTx, where x may be any one of the four lines. The configuration space for each PCI function indicates which interrupt line is used by that function. A common configuration is that function 0 may use INTA, while function 1 uses INTB, which helps to distribute devices evenly among the interrupt lines. PCI bridges also incorporate a standard swizzle to remap interrupt lines between primary and secondary interfaces, so that we don't over-use some of the interrupt lines. Each slot may also have different mappings, so INTA on one slot doesn't actually pull the same line as INTA on another slot.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">These interrupt lines are wired to be active-low, meaning that when an interrupt is not being signaled, the physical wire floats to a high value (ex. 5 volts), with a low current. If a device wants to signal an interrupt, it pulls the line to ground. For electrical reasons, this makes it possible for multiple devices to share the same interrupt line. Any one of the devices may pull the interrupt line low, then it's the task of the operating system to poll each of the devices using that particular line to determine which require service.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">One of the issues for device assignment may quickly become apparent here. When the OS polls each device to determine which device requires service, it typically does so via device specific drivers. Drivers like vfio-pci don't know how to determine which device pulled the interrupt line without some extra information. That extra information is provided by a feature introduced in the 2.3 version of the PCI spec which provides two important bits in PCI configuration space. The first is the Interrupt Status bit of the Device Status register, which tells us when the device is signaling an interrupt. This gives us a standard way to determine information that was previously device specific. The second bit is the Interrupt Disable bit in the Device Command register. This allows us to mask the interrupt at the device such that it ceases to pull the interrupt line low.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">These two important features allow us to assign a device making use of INTx signalling to a guest, because we can now identify when it is our assigned device signaling the interrupt and also prevent the device from continuing to signal in the host while it is being serviced by the guest. This latter feature means that the guest cannot saturate the host with interrupts by failing to service the interrupt.</span><br />
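<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">To make the two bits a little more concrete, here is a rough bash sketch that decodes them from raw register values. It assumes the values were read on the host with setpci from pciutils (for example <code>setpci -s 01:00.0 COMMAND</code> and <code>setpci -s 01:00.0 STATUS</code>); the function names and the sample values are invented for this example.</span><br />
<pre><code>
# Interrupt Disable is bit 10 (0x400) of the PCI Command register.
intx_disabled() {
    (( 0x$1 & 0x400 ))
}

# Interrupt Status is bit 3 (0x008) of the PCI Status register.
intx_pending() {
    (( 0x$1 & 0x008 ))
}

# Made-up values in the form setpci prints them (hex, no 0x prefix).
cmd=0407    # in practice: cmd=$(setpci -s 01:00.0 COMMAND)
sts=0018    # in practice: sts=$(setpci -s 01:00.0 STATUS)

intx_disabled "$cmd" && echo "INTx is masked at the device"
intx_pending "$sts" && echo "device is signaling INTx"
</code></pre>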
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">VFIO does also have support for non-PCI-2.3 compliant devices, but it requires a much more restricted configuration. We still need to identify when the assigned device is signaling an interrupt, which we can only do in a non-device specific way by requiring only a single device per interrupt line. Also when this is the case, we can mask the interrupt at the system APIC rather than at the device itself. Therefore we can achieve the same results, but we require an exclusive interrupt line for the device, which can often be an insurmountable configuration restriction.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">An astute reader may note that in either case, we forward the interrupt signal to the guest with either the device or the APIC configured to mask further interrupts. This implies that we need some sort of acknowledgement from the guest in order to unmask the device and allow subsequent interrupts. I won't go into the details of KVM IRQFD resamplers, but suffice to say we need a return signal from the hypervisor for this unmask. All of this masking and unmasking adds to the interrupt latency and makes INTx less desirable from a throughput and overhead perspective when assigning a device. Note that I say less desirable rather than undesirable, because in many cases the interrupt rate and latency requirements for the device are more than satisfied using this mechanism. However, if the device supports a more efficient mechanism, why not use it?</span><br />
<br />
Message Signaled Interrupts (MSI) provide that more efficient mechanism. An MSI is simply a DMA write by the device of specific message data to a specific address. This improves two things from a virtualization perspective. First, we have a much larger address space of interrupts, which generally means that each interrupt source will have an exclusive interrupt, removing the problem of determining the source when the interrupt is shared. Second, message signaled interrupts are interpreted as edge-triggered, eliminating the need for masking and thus the need for an unmask path. Therefore, to handle a device making use of MSI, we have no hardware interaction with the device upon receiving the interrupt (no device or APIC masking) and no return path from the hypervisor to re-enable subsequent interrupts.<br />
<br />
In most cases devices that can use MSI interrupts will be automatically configured to do so. You can verify this by looking in /proc/interrupts on a Linux guest or looking at the device resources in Device Manager for a Windows guest. In the Linux case the device interrupts will be listed as MSI rather than APIC; in the Windows case a negative number for the interrupt indicates MSI while a positive number indicates standard INTx. Another way to tell is by looking at /proc/interrupts on the host. Both the interrupt type and VFIO name of the interrupt will indicate the signaling method being used.<br />
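<br />
For the host-side check, something along these lines can pull out the relevant entries. This is only a sketch: it assumes the vfio-pci interrupt names look like vfio-intx(0000:01:00.0), vfio-msi[0](0000:01:00.0) or vfio-msix[0](0000:01:00.0), which is my understanding of how the driver names them, and the function name is made up.<br />
<pre><code>
# Read /proc/interrupts-style text on stdin and print the signaling method
# (vfio-intx, vfio-msi or vfio-msix) used by the device at the given address.
vfio_irq_mode() {
    grep -F "($1)" | grep -o 'vfio-[a-z]*' | sort -u
}

# On the host, for an assigned device at 0000:01:00.0:
# cat /proc/interrupts | vfio_irq_mode 0000:01:00.0
</code></pre>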
<br />
To determine whether a device supports MSI, lspci on the host can be used to look at the capabilities of the device. For example:<br />
<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$ sudo lspci -v -s 1:00.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2) (prog-if 00 [VGA controller])</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Subsystem: eVga.com. Corp. Device 2753</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Flags: bus master, fast devsel, latency 0, IRQ 53</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Memory at f6000000 (32-bit, non-prefetchable) [size=16M]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Memory at c0000000 (64-bit, prefetchable) [size=256M]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Memory at d0000000 (64-bit, prefetchable) [size=32M]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>I/O ports at e000 [size=128]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Expansion ROM at f7000000 [disabled] [size=512K]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [60] Power Management version 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [78] Express Legacy Endpoint, MSI 00</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [100] Virtual Channel</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [250] Latency Tolerance Reporting</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [258] L1 PM Substates</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [128] Power Budgeting <?></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Capabilities: [900] #19</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Kernel driver in use: vfio-pci</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Kernel modules: nouveau</span><br />
<div>
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Here we can see the capability at [68] is an MSI capability, which is currently enabled. The device may also report MSI-X, which is a further extension of MSI, providing some additional flexibility beyond the scope of our discussion here. Reporting either MSI or MSI-X indicates support for Message Signaled Interrupts.</span></div>
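<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If you'd rather not read through the whole listing, a small filter like the following can pick out the relevant lines. It is a sketch that assumes the lspci -v output format shown above; the function names are made up.</span></div>
<pre><code>
# Print only the MSI / MSI-X capability lines from `lspci -v` output on stdin.
msi_caps() {
    grep -E 'Capabilities: \[[0-9a-f]+\] MSI(-X)?:'
}

# Succeed if an MSI or MSI-X capability is present and currently enabled.
msi_enabled() {
    msi_caps | grep -q 'Enable+'
}

# On the host (root is needed to see the full capability list):
# sudo lspci -v -s 1:00.0 | msi_caps
</code></pre>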
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If you find that your device supports MSI but it's not being enabled, and your guest is Windows, you can follow the steps found <a href="http://forums.guru3d.com/showthread.php?t=378044" target="_blank">here</a> to attempt to enable it. Please note the part about making backups, not guaranteed to work, etc. If it doesn't work, you may find your VM unbootable and need to restore from backup. Being an assigned device, there's also a good chance you can remove the device, undo the settings, and re-add the device.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">The summary of the procedure is to identify the Device Instance Path from the Details tab of the device in Device Manager. Run regedit and find the same path under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\. After following down the tree using the Device Instance Path information, continue to follow down through "Device Parameters" and "Interrupt Management". Here you will find that you either have or need to create a new key named "MessageSignaledInterruptProperties". Within that key, find or create a DWORD value named "MSISupported". The value should be 1 to enable MSI or 0 to disable it.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">In my case, the Windows 8.1 VM seems to work well with MSI added and enabled on the GPU and enabled on the audio function. I can't however say that I see a noticeable performance difference although given what we know from above about the overhead of various paths, we can suspect that the hypervisor load is reduced in MSI mode.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">If you're using a Linux guest and find devices that aren't using MSI, use modinfo on the kernel module to see if it has an option to turn it on.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">A couple of points are beyond the scope of this post, but I'll mention them for completeness. First, INTx on PCI Express is no longer based on a physical wire. It's actually more like MSI, using a transaction based mechanism. However, for compatibility the semantics of the interrupts are the same as if it was a physical wire. Second, while INTA-INTD are the PCI standard, chipsets can actually route more interrupt lines to help spread out the interrupt load. The Q35 QEMU model for instance has PIRQ lines A-H, and these are interleaved among devices in chipset specific ways.</span></div>
<div>
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Another important note is that hardware gets interrupts wrong. A lot. If your device doesn't work with MSI enabled, it may be because the hardware is broken and the vendor never intended MSI to be enabled. In the case of Linux, MSI is specifically disabled for some vendors' products based on past transgressions and may or may not work on your hardware. Good luck and please comment on successes or failures, particularly successes that result in a measurable performance improvement. Thanks.</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com12tag:blogger.com,1999:blog-8694303781453133223.post-44994497028284324932014-09-15T12:23:00.000-06:002014-09-15T12:23:38.852-06:00OVMF split image support<a href="https://www.kraxel.org/repos/firmware.repo" target="_blank">Gerd Hoffmann's Fedora OVMF builds</a> have been updated to support installing the split CODE/VARS binaries. Wherever you get your OVMF binaries, the advantage of this is that the EFI variables, e.g. bootloader information, are stored separately from the executable code of the firmware, allowing the firmware to be updated without blasting the variable store. The libvirt update mentioned the other day already supports this quite nicely. Rather than having a loader entry with a single read-write image, we switch that to a read-only entry and add nvram storage. The XML looks like this:<br />
<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><domain type='kvm'></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ...</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <os></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <loader readonly='yes' type='pflash'>/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd</loader></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <nvram template='/usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd'></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ...</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </os></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"></domain></span><br />
<br />
Once the guest is started, a copy of the NVRAM template is made and placed under /var/lib/libvirt/qemu/nvram/$DOMAIN_VARS.fd. This then becomes part of the state of the VM.<br />
<div>
<br /></div>
<div>
On the QEMU commandline, you'll need to manually create a copy of the VARS file for each VM and specify the CODE and VARS as:</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">/usr/libexec/qemu-kvm ... \</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> -drive if=pflash,format=raw,readonly,file=/path/to/OVMF_CODE.fd \</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> -drive if=pflash,format=raw,file=/copy/of/OVMF_VARS.fd</span></div>
</div>
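<div>
<br /></div>
<div>
The manual copy can be scripted so that it only happens the first time a VM is started. This is a minimal bash sketch; the function name and the per-VM path in the usage comment are hypothetical, so adjust them to wherever your OVMF build and VM state live.</div>
<pre><code>
# Copy the VARS template to a per-VM file, but only if the VM does not
# already have one, so existing EFI variables are never overwritten.
ensure_vars() {
    local template=$1 vm_vars=$2
    [ -e "$vm_vars" ] || cp "$template" "$vm_vars"
}

# Example (hypothetical destination path):
# ensure_vars /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd /var/lib/libvirt/images/VM1_VARS.fd
</code></pre>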
<div>
<br /></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">I'm also told that virt-install and virt-manager support for OVMF are coming real soon and the interface will be similar to the XML, allowing selection of both a CODE and template VARS files. The libvirt config file, /etc/libvirt/qemu.conf, also allows a default VARS template image to be specified per code image, so that the <nvram> entry gets filled in automatically based on the file used for the <loader> entry.</span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Finally, how do you tell whether you have a split or unified image for OVMF? Lacking some sort of parser, apparently the best way to tell is by file size. A unified image will be exactly 2MB while the split CODE image will be 2MB-128KB and the VARS image will be 128KB. Unsurprisingly then, you can also create a split image with dd, taking the first 128K as VARS and the rest as CODE.</span></div>
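<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Both the size check and the dd split can be sketched in bash as follows. This assumes GNU stat and dd, and relies only on the sizes quoted above (2MB unified, 128KB VARS, the remainder CODE); the function names are made up for the example.</span></div>
<pre><code>
# Guess what kind of OVMF image a file is, based purely on its size.
ovmf_kind() {
    case "$(stat -c %s "$1")" in
        2097152) echo unified ;;   # exactly 2MB
        1966080) echo code ;;      # 2MB - 128KB
        131072)  echo vars ;;      # 128KB
        *)       echo unknown ;;
    esac
}

# Split a unified image: the first 128KB becomes VARS, the rest becomes CODE.
# Arguments: unified image, VARS output file, CODE output file.
ovmf_split() {
    dd if="$1" of="$2" bs=128K count=1 status=none
    dd if="$1" of="$3" bs=128K skip=1 status=none
}

# Example:
# ovmf_kind OVMF.fd
# ovmf_split OVMF.fd OVMF_VARS.fd OVMF_CODE.fd
</code></pre>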
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;">Good luck.</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com1tag:blogger.com,1999:blog-8694303781453133223.post-78788291858471216742014-09-11T14:02:00.000-06:002014-09-15T09:27:36.601-06:00libvirt now supports OVMF<span style="font-family: Arial, Helvetica, sans-serif;">Thanks to the work of Michal Privoznik and support of Laszlo Ersek and others, libvirt can now manage VMs using OVMF natively. If you're on Fedora and using Gerd's OVMF RPMs, you simply need to create a copy of /usr/share/edk2.git/ovmf-x64/OVMF-pure-efi.fd for each VM (put it somewhere like /var/lib/libvirt/images/), and make it writable (support is still new and it doesn't seem to change file permissions for the VM yet). Then, edit the domain XML to include this:</span><br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><domain type='kvm'></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ...</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <os></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ...</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <b><loader type='pflash'>/var/lib/libvirt/images/VM1-OVMF.fd</loader></b></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </os></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"></domain></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Since the OVMF image we're using is a "unified" image, it contains both the UEFI code itself as well as variable storage space, so the above adds it as writable by the VM. There are also ways to have a split image so you can maintain the UEFI code separate from the variables, but I'll wait for builds from Gerd that support that before I attempt to document it.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">With support for both the kvm=off cpu option and OVMF in libvirt, we're now able to run completely native libvirt VMs with GeForce and Radeon GPU assignment. Support is already underway for virt-manager and virt-install of OVMF.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Also, a VM CPU selection tip, since we don't care about migration with an assigned GPU, there are few reasons left not to want to use the </span><span style="font-family: Courier New, Courier, monospace;">-cpu host</span><span style="font-family: Arial, Helvetica, sans-serif;"> option for QEMU. To enable that through libvirt, change the CPU definition in the XML to this:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><domain type='kvm'></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ...</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> <b><cpu mode='host-passthrough'/></b></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ...</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"></domain></span><br />
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">Automatic vCPU pinning is also available:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><domain type='kvm'></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ...</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><b> <cputune></b></span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><b> <vcpupin vcpu='0' cpuset='0'/></b></span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"><b> <vcpupin vcpu='1' cpuset='1'/></b></span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"><b> </cputune></b></span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"> ...</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"></domain></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">And yes, hugepage support is also available, see <a href="http://libvirt.org/formatdomain.html" target="_blank">libvirt documentation</a> for details. Enjoy.</span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com7tag:blogger.com,1999:blog-8694303781453133223.post-8556530829458909652014-09-01T14:53:00.000-06:002014-09-02T20:02:43.241-06:00KVM Forum 2014<span style="font-family: Arial, Helvetica, sans-serif;"><a href="http://events.linuxfoundation.org/events/kvm-forum/program/schedule" target="_blank">The schedule is up for KVM Forum 2014</a>. It looks like I'll be talking about <a href="http://kvmforum2014.sched.org/event/bb17d420b4b850bafad76defc9c52b89" target="_blank">VFIO GPU assignment with OVMF</a> on the afternoon of Tuesday, October 14th. If you're in or around Düsseldorf, register for the conference and come see. There's also a talk just before mine on <a href="http://kvmforum2014.sched.org/event/536009faa311b825f1571e7982f577d8" target="_blank">KvmGT</a> that should be interesting.</span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com0tag:blogger.com,1999:blog-8694303781453133223.post-28862138866996674702014-08-31T10:59:00.000-06:002014-09-02T20:03:52.916-06:00Does my graphics card ROM support EFI?<span style="font-family: Arial, Helvetica, sans-serif;">If you want to try legacy-free OVMF-based GPU assignment, it might be a good idea to start by testing whether your graphics card has EFI support in the PCI option ROM. I've written a small program to parse the ROM and report some basic info about the contents. Here's example output:</span><br />
<pre><code>
$ ./rom-parser GT635.rom
Valid ROM signature found @0h, PCIR offset 190h
PCIR: type 0, vendor: 10de, device: 1280, class: 030000
PCIR: revision 0, vendor revision: 1
Valid ROM signature found @f400h, PCIR offset 1ch
PCIR: type 3, vendor: 10de, device: 1280, class: 030000
PCIR: revision 3, vendor revision: 0
EFI: Signature Valid
Last image
</code></pre>
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">This is what we typically expect to see: there are two headers. The first is type 0, which is a standard PC BIOS ROM, and the second is type 3, an EFI ROM. If you don't have EFI support in the ROM, the OVMF solution will not work for you. Newer graphics cards will hopefully all have EFI support.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">To get the rom-parser program:</span><br />
<pre><code>
$ git clone https://github.com/awilliam/rom-parser
$ cd rom-parser
$ make
</code></pre>
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">You'll need to copy the ROM to a file first, since the program does not support enabling the ROM through pci-sysfs. To do this from the host:</span><br />
<pre><code>
# cd /sys/bus/pci/devices/0000:01:00.0/
# echo 1 > rom
# cat rom > /tmp/image.rom
# echo 0 > rom
</code></pre>
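<span style="font-family: Arial, Helvetica, sans-serif;">If you'd rather script it, the same steps can be sketched in Python. This is only an illustration: the <code>dump_rom</code> helper is made up for this example, and it needs to run as root just like the shell version. It writes a trailing newline as <code>echo</code> does, and it raises an error when the read comes back empty.</span><br />

```python
from pathlib import Path


def dump_rom(device_dir, out_path):
    """Copy a PCI device's option ROM from sysfs to out_path; return its size."""
    rom = Path(device_dir) / "rom"
    rom.write_bytes(b"1\n")      # same as: echo 1 > rom (enable ROM reads)
    try:
        data = rom.read_bytes()  # same as: cat rom
    finally:
        rom.write_bytes(b"0\n")  # same as: echo 0 > rom (disable again)
    if not data:
        raise RuntimeError("ROM read returned no data; check dmesg for errors")
    Path(out_path).write_bytes(data)
    return len(data)


# Example use, with the device address and output path from the shell commands:
#   size = dump_rom("/sys/bus/pci/devices/0000:01:00.0", "/tmp/image.rom")
#   print("wrote", size, "bytes")
```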
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">If you get a zero-sized file, look for an error in dmesg. The ROM may be readable only right after boot, before any drivers have bound to the device. Use the pci-stub.ids= boot option to attempt to keep the device in a pristine, unused state.</span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com14tag:blogger.com,1999:blog-8694303781453133223.post-78027518464369666712014-08-28T13:04:00.000-06:002014-09-02T20:08:04.896-06:00Upstream updates for August 28th 2014<span style="font-family: Arial, Helvetica, sans-serif;">qemu.git now includes the MTRR fixes that eliminate the long delay in guest reboot when using OVMF with an assigned device on Intel hardware that does not support IOMMU snoop control.</span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com0tag:blogger.com,1999:blog-8694303781453133223.post-64833395157592559702014-08-27T14:39:00.000-06:002014-09-02T20:07:47.636-06:00Fixes for Linux Radeon with 440FX guests<span style="font-family: Arial, Helvetica, sans-serif;">The DRM and Radeon drivers in Linux assume that there's always a parent device to the GPU. We can break this assumption easily with either the 440FX or Q35 QEMU machine models by attaching the GPU to the root bus. This has been one of the problems drawing users to more complicated Q35 models, which more accurately reflect the host hardware. We can also fix the driver to avoid such assumptions:</span><br />
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="https://lkml.org/lkml/2014/8/27/553" target="_blank">[PATCH] drm: Test for PCI root bus to avoid NULL pointer dereference</a></span></div>
<div>
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="https://lkml.org/lkml/2014/8/27/557" target="_blank">[PATCH] radeon: Test for PCI root bus before assuming bus->self</a></span></div>
Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com1tag:blogger.com,1999:blog-8694303781453133223.post-78179268234578533012014-08-26T10:30:00.001-06:002014-09-02T20:07:20.624-06:00Upstream updates for August 26th 2014<span style="font-family: Arial, Helvetica, sans-serif;">A couple updates relevant to Nvidia GeForce assignment:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">QEMU</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="http://git.qemu.org/?p=qemu.git;a=commit;h=fe08275db9b88ecf3a30c7540b894c25aec150c2" target="_blank">fe08275d</a> is now in qemu.git, decoupling the primary Nvidia GPU device quirk from the </span><span style="font-family: Courier New, Courier, monospace;">x-vga=on</span><span style="font-family: Arial, Helvetica, sans-serif;"> option. This means that an Nvidia GPU assigned to a legacy-free OVMF VM will now enable this quirk automatically.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">libvirt</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;"><a href="http://libvirt.org/git/?p=libvirt.git;a=commit;h=d071164272c5750a952f179d32d285e333ee267a" target="_blank">d0711642</a> is now in libvirt.git, enabling libvirt support for the </span><span style="font-family: Courier New, Courier, monospace;">kvm=off</span><span style="font-family: Arial, Helvetica, sans-serif;"> QEMU CPU option. To enable this in your XML, add this to your VM definition:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"><domain type='kvm'...></span><br />
<span style="font-family: Courier New, Courier, monospace;"> <features></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b> <kvm></b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b> <hidden state='on'/></b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b> </kvm></b></span><br />
<span style="font-family: Courier New, Courier, monospace;"> </features></span><br />
<span style="font-family: Courier New, Courier, monospace;"> ...</span><br />
<span style="font-family: Courier New, Courier, monospace;"></domain></span>Alex Williamsonhttp://www.blogger.com/profile/02071923591707250496noreply@blogger.com0