Sunday, January 3, 2016

Comments on the 7 Gamers, 1 CPU video

In case you've seen this video:


And you're thinking to yourself that the R9 Nano they used looks like a great choice for your own GPU assignment build, think again. This GPU is known to have reset issues, so while it's impressive to see a build with this degree of consolidation, you should be suspicious of the problems Linus alludes to and the limited functionality of the overall system shown in the video. For instance, does rebooting a VM require a host reboot, or perhaps a manual soft eject of the GPU from the VM? We see this a lot with newer AMD cards, well beyond the partial workarounds we have for Bonaire- and Hawaii-based GPUs.

Personally, I would have preferred to see an NVIDIA-based solution, but due to the scale of this build and its unique slot and power restrictions, compatibility with GPU assignment was mostly an afterthought. NVIDIA is not without issues, but for the time being we understand those issues, have workarounds for them, and even have a path to supported configurations with Quadro cards.

On the plus side, yes, this is using KVM and VFIO and it's an impressive example of what this technology can do.  However, when you're spec'ing your own build, do your own research and don't rely on videos like this to choose your components.  My 2 cents...

26 comments:

  1. So true. I saw this video on a feed, and I am getting ready to do some VFIO work out here in SF for some special projects. I had hoped they would get into the software a bit more. The video was thin, but at least it was an impressive show of the stuff we can all work with these days.

    1. Linus has another video where he goes into a bit more detail: https://www.youtube.com/watch?v=LuJYMCbIbPk

      However, he uses the Unraid configuration interface rather than doing it the way Alex has shown. That's not a wrong way to go about it, but I personally like knowing what's happening behind the scenes and being able to tweak it thoroughly (I'm not sure what functionality Unraid gives you).

    2. Note that in that video they used GeForce cards and AFAIK, it "just worked". As for unRAID, yes, for GPU assignment they've mostly just included the non-upstream patches (that you hopefully don't need if you've chosen your hardware correctly) and have a nice web GUI that automatically applies some of the various quirks, like the options to avoid Code 43 problems with NVIDIA. They've done a few things with their XML that I disagree with, but hopefully those will be resolved in their next release.

    3. I think the unRAID folks are doing good work by simplifying the whole process.

      They are making money by selling Linux as a service; that's Red Hat's business model too :)

      I hope the next LoL arenas (http://arcadesushi.com/league-of-legends-world-championships-45k-seats-seouled-out/) use computers like the one in this video.

  2. I have done this for a home project with five air-cooled AMD R7 250 cards and a lot of HDMI-over-UTP extenders running all over the house. I use it for Kodi in the living room and bedroom and for some gaming in other rooms, running customised Proxmox software. I haven't experienced any glitches from PCIe bus reset issues; VMs reboot fine without affecting the host or other VMs.

    1. True, there are some Sea Islands GPUs that work well; Oland is one of them. Your R7 250 is an Oland XT. I've got an HD 8570, which is just a plain Oland, and it works wonderfully. Bonaire and Hawaii cards have a partial workaround that seems to satisfy most users, but it will fail occasionally if pushed. We have a number of users complaining about Tonga-based cards, and the R9 Nano is based on Fiji, so personally I would avoid those for now, unless of course you want to help develop some code to make reset work for them.

    2. I am running unRAID and do not have onboard video.

      Nvidia GTX 460 in the PCIe x16 slot.
      nvidia GT 210 in the PCIe x16 slot(x4 mode).

      How do I get unRAID to grab the slower card for itself? Right now unRAID is grabbing the better PCIe slot.
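
A minimal C sketch for finding out which card the firmware chose as the primary display, assuming a Linux host with sysfs mounted at /sys: VGA-class PCI devices expose a boot_vga attribute under /sys/bus/pci/devices/<address>/ that reads 1 for the boot VGA device. The helper names read_boot_vga and find_boot_vga are hypothetical. This only reports the current choice; which card ends up primary is generally decided by slot order or a firmware (BIOS/UEFI) setting.

```c
#include <dirent.h>
#include <stdio.h>

/* Read a boot_vga attribute file. Returns the value read (1 or 0),
 * or -1 if the file is missing or unreadable; non-VGA devices have
 * no boot_vga file at all. */
int read_boot_vga(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: scan devices_dir (normally "/sys/bus/pci/devices")
 * and copy the address of the first device whose boot_vga reads 1 into
 * out. Returns 0 on success, -1 if no such device or the dir is unreadable. */
int find_boot_vga(const char *devices_dir, char *out, size_t out_len)
{
    DIR *dir = opendir(devices_dir);
    if (dir == NULL)
        return -1;

    int rc = -1;
    struct dirent *entry;
    while (rc != 0 && (entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue; /* skip "." and ".." */

        char path[512];
        snprintf(path, sizeof(path), "%s/%s/boot_vga", devices_dir, entry->d_name);
        if (read_boot_vga(path) == 1) {
            snprintf(out, out_len, "%s", entry->d_name);
            rc = 0;
        }
    }
    closedir(dir);
    return rc;
}
```

Called as find_boot_vga("/sys/bus/pci/devices", buf, sizeof buf), it leaves an address in domain:bus:device.function form in buf. Taking the directory as a parameter also makes the helper easy to exercise against a fake tree.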

    3. Put the 210 in the top slot. Running at slower PCIe speeds is a minimal performance hit anyway. In Linus' 2 player 1 PC video, he uses an old PCI Express video card so that he can use a pair of GPUs (980 Ti and Titan) together without sacrificing much performance (that third x16 slot is likely running at x4 or x8 at most). He never really explains it; he just goes through the setup process, what's in the PC, and performance testing. But I imagine that if you use a PCI video card, you can boot the PC with PCI as the "init boot" or "init first" option, which might bind that card to unRAID and leave the two GPUs free to play with. With NVIDIA, you apparently *need* another video card. I did read a post from someone claiming a Quadro card can be passed through easily (and thus flashing your primary card into a Quadro would solve your problems), but it was literally only the one post, with no evidence behind it.

      I want to do this, but I need to find a case with 8 PCI slot holes, as my third PCI-express x16 slot sits at the bottom of the case (or I could just notch the x1 slot above the first x16 slot and slap any old GPU in it, or even just use an x1 GPU...)

  3. I've been using an R9 390X as my passthrough GPU for a few months now and haven't had any issues yet. Do you know whether this card also has the reset issue the R9 Nano has?

    1. I have no specific knowledge of the R9 390X. In fact, the PCI ID database that I usually use to look up model names doesn't even know about it. TechPowerUp says this is a Grenada XT-based card. Wikipedia has Hawaii XT in parentheses next to it and calls it a GCN 1.1 chip, like Bonaire and Hawaii. Fiji is GCN 1.2. So if I had to guess, you're probably already using the existing reset quirk. You can check by looking to see if the PCI device ID for your card is in this list: http://git.qemu.org/?p=qemu.git;a=blob;f=hw/vfio/pci-quirks.c;h=30c68a1e2b63ea0e8a442d460c660619e99bfba5;hb=HEAD#l1185

    2. That paste sucked; let's try pasting it directly:

      /* Hawaii */
      case 0x67A0: /* Hawaii XT GL [FirePro W9100] */
      case 0x67A1: /* Hawaii PRO GL [FirePro W8100] */
      case 0x67A2:
      case 0x67A8:
      case 0x67A9:
      case 0x67AA:
      case 0x67B0: /* Hawaii XT [Radeon R9 290X] */
      case 0x67B1: /* Hawaii PRO [Radeon R9 290] */
      case 0x67B8:
      case 0x67B9:
      case 0x67BA:
      case 0x67BE:
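
As a sketch of the check described above, the function below compares a device ID against the Hawaii IDs in the excerpt. It assumes the vendor ID has already been confirmed as AMD/ATI (0x1002), covers only the pasted Hawaii entries (Bonaire cards also have a reset workaround that this excerpt does not show), and may be out of date relative to current QEMU. The name has_hawaii_reset_quirk is hypothetical; the device ID is the second half of the [vendor:device] pair printed by lspci -nn.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hawaii device IDs copied from the pci-quirks.c excerpt above. */
static const uint16_t hawaii_reset_quirk_ids[] = {
    0x67A0, 0x67A1, 0x67A2, 0x67A8, 0x67A9, 0x67AA,
    0x67B0, 0x67B1, 0x67B8, 0x67B9, 0x67BA, 0x67BE,
};

/* Hypothetical helper: true if device_id (vendor assumed to be 0x1002)
 * appears in the Hawaii list, i.e. the reset quirk should apply. */
bool has_hawaii_reset_quirk(uint16_t device_id)
{
    size_t n = sizeof(hawaii_reset_quirk_ids) / sizeof(hawaii_reset_quirk_ids[0]);
    for (size_t i = 0; i < n; i++) {
        if (hawaii_reset_quirk_ids[i] == device_id)
            return true;
    }
    return false;
}
```

For a card reporting 0x67B0 (listed above as Hawaii XT [Radeon R9 290X]), has_hawaii_reset_quirk(0x67B0) returns true; an ID absent from the list returns false.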

    3. Yeah, the device shows up as a 290X in lspci and has the device ID for Hawaii XT. That's probably because it's a rebranded 290X with more RAM.

  4. It's always nice to see vfio-pci getting some "advertising" of some sort.
    It finally shows off the most awesome use of vfio-pci.
    I'm curious whether they'll ever do a CrossFireX build, because that's where they'll have a lot of "fun" with unRAID.

    With PCIe switches (PLX manufactures some good ones with ACS), 7 is not the limit.

    1. Yes, I have an RDK from PLX, and a 4-port GPU-oriented switch as well. I can pass all of these through to individual VMs.

  5. Alex, is there a list somewhere of graphics cards that "just work"? I would love to pick one that fits all my requirements.

  6. As for the issues with AMD and PCI reset... I have a hint:
    Whenever my Windows 10 guest is in a bad mood, it tries to boot but crashes for some unknown reason one to four times in a row. When that happens, every time I start the VM again, passthrough works fine.
    Then I try Safe Mode, and it reboots without the reset issue.

    I just tried booting straight into Safe Mode and shutting down/booting the VM twice, and it worked (and it crashed when I tried a reboot instead of shutdown/power on). Then I got it running, tried a forced power-off and reboot, and the PCI reset issue showed up...

    I hope this provides a hint toward fixing the issue!?

    I have a bridgeless CrossFire setup (2x R9 380) working fine :)

    As for NVIDIA, I've always used NVIDIA cards in my computers, but they're trying to shut down VGA passthrough, and eventually they'll find a way, so I'll stick with AMD until NVIDIA changes its policy about it...

    - Mauricio

    PS: I don't want to sign up for an OpenID account :p

  7. Hi Alex, it's a shame I forgot your advice to be careful about newer AMD cards. I just bought two Nanos a week ago for a "two gamers workstation", and they are indeed a bit temperamental.

  8. Friendly reminder: there has been a VFIO mailing list since the epic Arch thread was closed last year.
    http://www.redhat.com/mailman/listinfo/vfio-users

    Alex, maybe link to that mailing list on the right-hand side of the blog?

    For reference, the old Arch thread: https://bbs.archlinux.org/viewtopic.php?id=162768

  9. I am trying to pass through an R9 Nano to a VM and am facing some problems. After installing Windows 7, I installed the latest AMD Crimson Edition driver. The first time, the driver works properly. The problem appears when I power off the VM and restart it: the driver doesn't work again until I reboot the host.

    Here is my system:

    Manufacturer: Supermicro
    Product Name: X10DAi
    CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
    memory: 8GB
    linux kernel: 3.19.0-15

    I tried QEMU versions 2.3.1, 2.4.1, and 2.5.1.
    I also added the device ID to the function vfio_setup_resetfn(), but the problem remains.

    I start my virtual machine with the following command:

    qemu-system-x86_64 -enable-kvm \
    -M pc -m 4096 -cpu host \
    -smp 4,sockets=1,cores=4,threads=1 \
    -rtc base=localtime \
    -vnc :0 \
    -vga none \
    -device lsi,id=scsi0,bus=pci.0 \
    -device vfio-pci,host=03:00.0,x-vga=on \
    -device vfio-pci,host=03:00.1 \
    -hda $1 \
    -monitor stdio

    Could this be a reset issue? And do you have any advice for us?

    1. This sounds exactly like the AMD reset problem; you've been duped into basing your build on a card that doesn't work. My only advice is to file a bug at bugs.freedesktop.org (product: DRI, component: DRM/Amdgpu). Alternatively, use a card that's known to work.

    2. As of 26.07.2016, is there any progress?

    3. If anybody is interested, I've found a workaround: suspend the Windows guest to RAM, then kill the QEMU process. After that, restarting the VM was successful.

  10. I am trying to pass through an NVIDIA card, which I bought in addition to my existing one, to a QEMU guest.

    But I think my problem is that the nvidia kernel module is grabbing both graphics cards (I need one for the host), so the driver unbind does not work. After echoing the PCI ID into driver/unbind, I still see "kernel module used: nvidia" for the GPU. For the audio device that ships on the GPU, I see none.

    Is there a way around this issue?
    Thanks
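
For reference, the sysfs unbind file expects the device's PCI address (domain:bus:device.function, as printed by lspci -D), not the vendor:device ID that lspci -nn shows; the kernel rejects a write it cannot match to a device. A minimal C sketch of the unbind step, assuming root privileges and a driver currently bound to the device; the helper names build_unbind_path and unbind_pci_device are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build /sys/bus/pci/devices/<addr>/driver/unbind.
 * Returns the snprintf result so callers can detect truncation. */
int build_unbind_path(char *buf, size_t len, const char *pci_addr)
{
    return snprintf(buf, len, "/sys/bus/pci/devices/%s/driver/unbind", pci_addr);
}

/* Write the device's own address into its driver's unbind file, which
 * detaches whatever driver is currently bound to it. Needs root.
 * Returns 0 on success, -1 on any failure. */
int unbind_pci_device(const char *pci_addr)
{
    char path[256];
    int n = build_unbind_path(path, sizeof(path), pci_addr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    /* Fails if the address is wrong, no driver is bound (so there is no
     * driver/ link), or we lack permission. */
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    size_t len = strlen(pci_addr);
    int ok = fwrite(pci_addr, 1, len, f) == len;
    /* stdio buffers the write, so a kernel-side error surfaces at fclose */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The GPU's audio function has its own address and would need the same treatment if a driver is bound to it. After a successful unbind, the "Kernel driver in use:" line from lspci -k should disappear, although "Kernel modules:" may still list nvidia, since that line names modules capable of driving the device rather than the one bound to it.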

  11. Hi Alex, is it a known problem that we can't blacklist the "first" video card? I have two video cards installed in the 1st and 3rd slots. I need to pass through the 1st one to Windows/OS X and use the 3rd one in Linux, but I found that I can only do it the other way around; otherwise, neither video card has any output at all. I'm using Ubuntu 14.04 and the pci_stub ID list to do the blacklisting. I think the kernel still tries to initialize the 1st card, later finds that the card is blacklisted, and then things get messed up. Could you give some help? Thanks!

Comments are not a support forum. For help with problems, please try the vfio-users mailing list (https://www.redhat.com/mailman/listinfo/vfio-users)