Tuesday, August 26, 2014

VFIO+VGA FAQ

Question 1:

I get the following error when attempting to start the guest:
vfio: error, group $GROUP is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
Answer:

There are more devices in the IOMMU group than you're assigning; they all need to be bound to the vfio bus driver (vfio-pci) or to pci-stub for the group to be viable.  See my previous post about IOMMU groups for more information.  To reduce the size of the IOMMU group, install the device into a different slot, try a platform with better isolation support, or (at your own risk) bypass ACS using the ACS override patch.
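As a rough sketch of what that looks like via sysfs (the 0000:01:00.x addresses and the 10de 0e0a vendor/device ID below are placeholders, substitute your own from lspci -nn):

    # list every device sharing the GPU's IOMMU group
    ls /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/

    # unbind a companion device (e.g. the GPU's HDMI audio function) from its host driver
    echo 0000:01:00.1 > /sys/bus/pci/devices/0000:01:00.1/driver/unbind

    # tell vfio-pci to claim devices with this vendor/device ID
    echo 10de 0e0a > /sys/bus/pci/drivers/vfio-pci/new_id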

Question 2: 

I've applied the ACS override patch, but it doesn't work.  The IOMMU group is the same regardless of the patch.

Answer:

The ACS override patch needs to be enabled with kernel command line options.  The patch file adds the following documentation:


pcie_acs_override =
        [PCIE] Override missing PCIe ACS support for:
    downstream
        All downstream ports - full ACS capabilities
    multifunction
        All multifunction devices - multifunction ACS subset
    id:nnnn:nnnn
        Specific device - full ACS capabilities
        Specified as vid:did (vendor/device ID) in hex
The option pcie_acs_override=downstream is usually sufficient to split IOMMU grouping caused by lack of ACS at a PCIe root port.  Also see my post discussing IOMMU groups, ACS, and why use of this patch is potentially dangerous.
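As a minimal sketch, assuming a GRUB-based distribution (the file and the regeneration command vary by distro):

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="... pcie_acs_override=downstream"

    # then regenerate the config, e.g. grub2-mkconfig -o /boot/grub2/grub.cfg or update-grub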

Question 3:

I have Intel host graphics; when I start the VM I don't get any output on the assigned VGA monitor and my host graphics are corrupted.  I also see errors in dmesg indicating unexpected drm interrupts.

Answer:

You're doing VGA assignment with IGD and have failed to apply or enable the i915 VGA arbiter patch.  The patch needs to be enabled with i915.enable_hd_vgaarb=1 on the kernel commandline.  See also my previous post about VGA arbitration and my previous post about using OVMF as an alternative to VGA assignment.
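A minimal sketch of enabling it, assuming the patched i915 driver (the option works on the kernel command line whether i915 is built in or a module; a modprobe.d entry also works for the module case):

    # kernel command line
    i915.enable_hd_vgaarb=1

    # or, for i915 as a module, e.g. /etc/modprobe.d/i915.conf:
    options i915 enable_hd_vgaarb=1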

Question 4:

I have non-Intel host graphics and have a problem similar to Question 3.

Answer:

You need the other VGA arbiter patch.  This one simply fixes a bug in the VGA arbiter logic; there are no kernel command line options to enable it.

Question 5:

I have Intel host graphics; I applied and enabled the i915 patch and now I don't have DRI support on the host.  How can I fix this?

Answer:

See my previous post about VGA arbitration to understand why this happens.  This is a known side effect of enabling VGA arbitration on the i915 driver.  The only solution is to use a host graphics device that can properly opt out of VGA arbitration or avoid VGA altogether by using a legacy-free guest.

Question 6:

How can I prevent host drivers from attaching to my assigned devices?

Answer:

The easiest option is to use the pci-stub.ids= option on the kernel commandline.  This parameter takes a comma-separated list of PCI vendor:device IDs (found via lspci -n) for devices to be claimed by pci-stub during boot.  Note that if vfio-pci is built statically into the kernel, vfio-pci.ids= can be used instead.  There is currently no way to select only a single device if there are multiple matches for the vendor:device ID.
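For example (the 10de:13c2 and 10de:0fbb IDs below are placeholders; substitute the values lspci -n reports for your own devices):

    # find the vendor:device IDs of the GPU and its audio function
    lspci -n -s 01:00.0
    lspci -n -s 01:00.1

    # kernel command line, claiming both functions with pci-stub at boot
    pci-stub.ids=10de:13c2,10de:0fbb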

Question 7:

Do I need the NoSnoop patch?

Answer:

No, it was deprecated long ago.

Question 8:

Do I need vfio_iommu_type1.allow_unsafe_interrupts=1?

Answer:

Probably not.  Try vfio-based device assignment without it; if it fails, look in dmesg for this:
No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
If, and only if, you see that error message do you need the module option.  Also note that this means you're opting in to running vfio device assignment on a platform that does not protect against MSI-based interrupt injection attacks by guests.  Only trusted guests should be run in this configuration.  (Actually I just wish this was a frequently asked question; common practice seems to be to blindly use the option without question)
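If you do see it and choose to accept that risk, a sketch of enabling the option (either form works, depending on whether vfio_iommu_type1 is built in or a module):

    # kernel command line
    vfio_iommu_type1.allow_unsafe_interrupts=1

    # or a modprobe.d conf file, e.g. /etc/modprobe.d/vfio_iommu_type1.conf:
    options vfio_iommu_type1 allow_unsafe_interrupts=1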

Question 9:

I use the nvidia driver in the host.  When I start my VM nothing happens.  What's wrong?

Answer:

The nvidia driver locks the VGA arbiter and does not release it, causing the VM to hang on its first access to VGA resources.  If this is not yet fixed in the nvidia driver release, user contributed patches can be found to avoid this problem.

Question 10:

I'm assigning an Nvidia card to a Windows guest and get a Code 43 error in device manager.

Answer:

The Nvidia driver, starting with 337.88, identifies the hypervisor and disables itself when KVM is found.  Nvidia claims this is an unintentional bug, but has no plans to fix it.  To work around the problem, we can hide the hypervisor by adding kvm=off to the list of cpu options provided (QEMU 2.1+ required).  libvirt support for this option is currently upstream.

Note that -cpu kvm=off is not a valid incantation of the cpu parameter on its own; a CPU model such as host or SandyBridge must also be provided, ex: -cpu host,kvm=off.
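For reference, a rough sketch of the equivalent libvirt domain XML, assuming a libvirt build new enough to include the upstream support:

    <features>
      ...
      <kvm>
        <hidden state='on'/>
      </kvm>
    </features>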

Update: The above workaround is sufficient for drivers 337.88 and 340.52.  With 344.11, and presumably later, the Hyper-V CPUID extensions supported by KVM also trigger the Code 43 error.  Disabling these extensions appears to be sufficient to allow the 344.11 driver to work.  This means removing all of the hv_* options from -cpu; in libvirt, that includes:

    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>

and

  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
  </clock>

Unfortunately removing these options will impose a performance penalty as these paravirtual interfaces are designed to improve the efficiency of virtual machines.

37 comments:

  1. (One of?) the user contributed patches mentioned in the answer to Question 9: https://lists.gnu.org/archive/html/qemu-devel/2013-05/txtmFIq9_I13p.txt

    You can also solve this problem by booting in UEFI mode, e.g. via gummiboot/efistub. It's not supported either, but it's been working for me for over a year, I think.

  2. Under Windows (10) Technical Preview, I'm able to use the latest (v347.09) Nvidia drivers with the hv_* and kvm CPU features enabled. They only need to be disabled during the driver installation. Thanks Nvidia!

    Replies
    1. Be sure to try rebooting the guest a few times; even with a Win8.1 guest I can re-enable hyperv features and unhide kvm and get it to boot with the Nvidia drivers, but it doesn't last.

    2. This comment has been removed by the author.

    3. Ok, I have now had a few "Code 43" errors. It seems (after limited testing) under WTP the check that triggers it is only performed when the driver is being enabled or when performing a system restart but *not* with a regular shutdown. So after configuring Windows not to automatically update and restart, apart from when I restart manually/on-purpose, I think I'm "Code 43" free.

  3. Latest (347.xx) Nvidia drivers seem to have some very aggressive virtualization checks. I keep getting "Code 43" errors no matter what vfio pass-through platform I build - tried it on Windows 7 and 8.1, with and without Q35. My latest attempt was to install Windows 8.1 with the OVMF boot loader:
    qemu-system-x86_64 -nographic -vga none \
    -enable-kvm -rtc base=localtime -m 4096 -smp sockets=1,cores=2 \
    -cpu host,kvm=off,hv-time=off,hv-relaxed=off,hv-vapic=off \
    -monitor stdio -serial none -parallel none -nodefconfig \
    -drive file=OVMF-pure-efi.fd,if=pflash,format=raw,readonly \
    -drive file=win81.qcow2,if=ide,media=disk,format=qcow2 \
    -device vfio-pci,host=01:00.0,multifunction=on,x-vga=on,romfile=Gigabyte.GTX970.rom \
    -device vfio-pci,host=01:00.1

    and I am still getting "Code 43" errors from the driver. Any suggestions?
    Besides the last desperate option of disassembling the Nvidia driver and NOPing out the virtualization check...

    Replies
    1. I'm running the latest 347.88 just fine. I expect your problem is:

      hv-time=off,hv-relaxed=off,hv-vapic=off

      Typically the way to disable these options is simply not to specify them. Try using "-cpu host,kvm=off"

    2. Yeah, I tried this simpler option "host,kvm=off" too - without any luck :(
      Would there be any other reason for the driver to throw this error 43? If there were any resource or IRQ conflicts, then the error would have been different (10 or 12 I think).
      By the way, the display port output on the pass-through card remains blank the whole time - should I get any output on it while the OVMF or Windows boots?
      So I am running the VM headless, logging in through remote desktop to see the error in Device Manager.

    3. What kernel & QEMU versions on the host?

    4. Latest 2.3.0-rc2 (as of yesterday) QEMU, running on a ~1½-month-old (latest at that time) Gentoo kernel with all the patches & kernel cmdline options from your archlinux thread.

  4. I use the latest Proxmox 4.2.6 kernel and applied the ACS override patch successfully, but my GPU and SAS controller are still in the same IOMMU group. I used the pcie_acs_override=downstream parameter and also tried multifunction as the value, but both devices remain in the same group.

    I have a Supermicro X10SL7-F and a Xeon E3-1230v3; someone on the Arch forum had the exact same problem with this board, but after he applied the ACS override patch the issue was gone. Many people on the Proxmox forum also complain about IOMMU grouping issues; could it be an issue with the 4.2 kernel, since everyone says it works fine with 3.16?

    Replies
    1. It also doesn't work with kernels 3.16 and 3.10 with the ACS override patch applied. I'm starting to think it's a problem with the motherboard? But then I can't believe the guy in the Arch forums got it working properly with his 3.15 kernel... Can somebody help me please?

    2. HELL YEAH the newest Proxmox kernel fixed the issue and my LSI controller finally gets separated from the GPU! Oh my god I'm so happy right now!

  5. Question 10:

    I'm assigning an Nvidia card to a Windows guest and get a Code 43 error in device manager.

    Answer:

    Starting with QEMU 2.5.0 there's a new feature that enables you to spoof the hv_vendor_id. https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg00662.html

    Using this method I could re-enable all hv_ feature flags and pass through a GTX 980 in Windows 10 with the latest hotfix Nvidia driver, 361.60.

    If you are using libvirt, you need to add the xml namespace at the top and add the following lines to the end of your XML:

    [.. YOUR custom XML ... ]
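    A rough sketch of what that override typically looks like, assuming libvirt's qemu command-line passthrough namespace (the -cpu string here is illustrative, not the commenter's original XML):

    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
      ...
      <qemu:commandline>
        <qemu:arg value='-cpu'/>
        <qemu:arg value='host,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=Nvidia43FIX'/>
      </qemu:commandline>
    </domain>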

    Replies
    1. -cpu host,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=Nvidia43FIX

    2. Just when I was about to give up, you post this.
      Looks like there might still be hope.

    3. Is there any way to get past a Code 28? The error is as follows:
      "The drivers for this device are not installed. (Code 28)
      There are no compatible drivers for this device."

      Installing nvidia drivers also throws this error:
      "This graphics driver could not find compatible graphics hardware."

      Device Manager does recognize the correct vendor:device ID, as it matches what my host reports. The card is an 860M; I'm not trying to use an external monitor (laptop), just trying to get this working via VNC. The GPU appears to be passed through and Windows installed with no other errors, so my hardware meets the requirements. I just cannot get past Code 28, and no one seems to cover it or talk about it.

      Do you think this technique could help with my issue? I have full details of my setup posted here: https://www.reddit.com/r/VFIO/comments/41ohss/laptop_with_nvidia_gpu_passed_through_but_cannot/

    4. This is a typical result for laptop GPUs; there is far more integration and BIOS support required for laptop GPUs than for discrete plug-in desktop GPUs.

  6. Where do I put this -cpu host,kvm=off option? I could not find the CPU options.

    Replies
    1. I apologize if this is a noob question, I am trying to learn. But I cannot seem to figure out where the cpu options are.

    2. -cpu is a QEMU command line option. I'd very much recommend following the how-to series on this blog, which documents exactly how to configure this.

    3. So I read the 5 part VFIO how-to, but I do not remember seeing this. I did see the part where you use the hidden state, and this worked for a Windows 10 OVMF setup, but I need to use Windows 7 and legacy boot. And I read about how this kvm=off option does not get automatically applied if I am using legacy.

      So I guess you're telling me QEMU needs to be started manually, or I need to start the VM guest via command line with this KVM option off? I have been doing VM management for the most part through the GUI, so I have not had to start a VM with command line options before. I'll try and figure it out.

    4. Ok, so I just checked my output in ps aux with the VM running, and it's already running with -cpu SandyBridge,kvm=off. I suppose the other thing I am doing wrong is using vfio-pci with a legacy VM boot; when I read the section about applying the arbitration patch, I did not understand how to apply it when I clicked the link for it. I'll keep trying to read and understand your blog.

  7. I'm also getting the Code 43 error for an nVidia card in the device manager.

    I've tried lots of combinations of -cpu=* settings, but none of them helped. The parameters suggested by Paul Idstein didn't fix it either.

    I wonder if someone could give me ideas on how to troubleshoot/investigate it further, since I think I've tried all the combinations I was able to find.

    Here's my configuration:
    * Intel 6700k, nVidia 660 GTX.
    * Host is running Debian Jessie with some packages from Sid, particularly:
    ** Custom-built 4.4 kernel with the VGA arbitration patch.
    ** QEMU 2.5 which contains the fix mentioned by Paul. (Verified by setting vendor ID to more than 12 characters and seeing a message)
    * Guest video driver version 361.75, which was downgraded during experiments to 341.44.

    VM configuration:
    qemu-system-x86_64 \
    -enable-kvm \
    -cpu host,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=Nvidia43FIX \
    -smp 8 \
    -m 1536 \
    -hda win8.qcow2 \
    -device vfio-pci,host=01:00.0 \

    Notable -cpu flag values that didn't fix the issue:
    -cpu host,kvm=off
    -cpu host,kvm=off,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time
    -cpu host,hv_time,hv_vendor_id=KeenlyKVM

    Any help would be very appreciated.

    Replies
    1. Specifying -bios /usr/share/ovmf/OVMF.fd doesn't help either.

    2. You need to pass through the complete IOMMU group. If your GTX 660 also has an audio "device" (this is often a multifunction PCI device, on 01:00.1 in your case), it must also be attached to the very same VM.

      Here's my qemu runtime (I'm using ZFS ZVOL as storage => don't do that if you don't know what you are doing ;) ) and virsh is generating this command for me so some statements appear doubled.

      /usr/bin/qemu-system-x86_64
      -name windows
      -S
      -machine pc-q35-2.5,accel=kvm,usb=off // q35 as machine
      -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on //uefi boot
      -drive file=/var/lib/libvirt/qemu/nvram/windows_VARS.fd,if=pflash,format=raw,unit=1 // uefi settings
      -m 16192 // memory
      -realtime mlock=off
      -smp 8,sockets=1,cores=4,threads=2
      -uuid 0fb2be0b-9a3c-42d6-8768-e86c5b710c71
      -nographic
      -no-user-config
      -nodefaults
      -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-windows/monitor.sock,server,nowait
      -mon chardev=charmonitor,id=monitor,mode=control
      -rtc base=localtime
      -no-shutdown
      -global ICH9-LPC.disable_s3=0
      -global ICH9-LPC.disable_s4=1
      -boot menu=off,strict=on
      -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e
      -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1
      -device ioh3420,port=0xe0,chassis=3,id=pci.3,bus=pcie.0,multifunction=on,addr=0x1c
      -device ioh3420,port=0x18,chassis=4,id=pci.4,bus=pcie.0,addr=0x3
      -device ioh3420,port=0x20,chassis=5,id=pci.5,bus=pcie.0,addr=0x4
      -device ich9-usb-ehci1,id=usb,bus=pcie.0,multifunction=on,addr=0x1d.0x7
      -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x1d
      -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,multifunction=on,addr=0x1d.0x1
      -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,multifunction=on,addr=0x1d.0x2
      -drive file=/dev/harbour/windows,format=raw,if=none,id=drive-virtio-disk0,cache=directsync,aio=native
      -device virtio-blk-pci,scsi=off,bus=pcie.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
      -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
      -device vfio-pci,host=01:00.0,id=hostdev0,bus=pci.3,multifunction=on,addr=0x0 // gpu passthrough
      -device vfio-pci,host=01:00.1,id=hostdev1,bus=pci.3,addr=0x0.0x1 // gpu-audio hdmi passthrough
      -device vfio-pci,host=03:00.0,id=hostdev2,bus=pci.4,addr=0x0 // ethernet adapter
      -device vfio-pci,host=05:00.0,id=hostdev3,bus=pci.5,addr=0x0 // wifi adapter
      -device usb-host,hostbus=1,hostaddr=5,id=hostdev4 // host usb mouse
      -device usb-host,hostbus=1,hostaddr=4,id=hostdev5 // host usb keyboard
      -device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x3
      -vga none // disable any integrated emulated 1st graphics device
      -cpu host,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=Nvidia43Fix // disable KVM flag and enable hv windows flags with custom hv vendor
      -msg timestamp=on

  8. I am having two issues; I'd appreciate it if you don't mind providing some ideas.

    1. If I check the ROM_BAR checkbox in the PCI devices (graphics and audio) of my AMD 7950, the virtual machine gets stuck right at start with no visual, and CPU usage stuck at 25% of a quad-core config.

    2. If I uncheck the box, I can boot, and I can see the AMD graphics IF I am lucky. I made it work once; for some reason I deleted the domain and used a new one. Then on booting, it kept showing a BSoD with SYSTEM_EXCEPTION_NOT_HANDLED (atixxxx.xxx).

    Would you have any ideas?

    Replies
    1. Try changing your cpu configuration. I had an AMD proc; a setup with cpu=host shows this BSOD, while cpu=phenom solves it...
      For an Intel proc, try using a different Intel arch such as core2duo.

      Maybe it's too late, but might help someone...

  9. I'm also getting Error 43 with my GTX 650 on Windows 10 w/ qemu 2.5.0. Is there anything immediately wrong with my config?

    qemu-system-x86_64
    -enable-kvm
    -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=off,hv_vendor_id=Nvidia43FIX
    -m 4G
    -smp 8
    -boot order=c
    -drive file=image_file,format=raw
    -device=vfio-pci,host=01:00.0,multifunction=on
    -device=vfio-pci,host=01:00.1

    Replies
    1. I have a similar setup to you (same qemu arguments, also Windows 10 but with a GTX 670) and I am also getting Error 43.

    2. Hm, that's very similar indeed... Has this worked for you previously? This is my first attempt at a VFIO setup. Might ping the mailing list otherwise.

    3. It's also my first attempt. I might try an older version of Windows with older nvidia drivers (i.e. ones that don't check for KVM) just to make sure that it's not something else causing the error.

    4. I was considering doing that as well; might be a few days since I am doing this on a spare drive on my dev machine and have to actually get some work done at some point here.

      Please update if you get anywhere though (I'll do the same). I'm very interested to see if this is workable. I also considered using virt-manager, but didn't get around to it yet.

  10. This comment has been removed by the author.

  11. Same problem: nvidia error 43 and bsod

    My configuration:
    * Intel i7-6700k, nVidia GTX 950
    * debian sid
    * win10

    Qemu command:
    qemu-system-x86_64 \
    -enable-kvm -M q35,accel=kvm -m 4096 -cpu host,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=Nvidia43FIX -smp 4,sockets=1,cores=2,threads=2 \
    -bios /usr/share/seabios/bios.bin -vga std \
    -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
    -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on \
    -device vfio-pci,host=01:00.1,bus=root.1,addr=00.1 \
    -usb -usbdevice host:046d:c52b \
    -drive file=/mnt/virt/snapshot.qcow2,id=disk,format=qcow2,if=virtio \
    -net nic,macaddr=00:00:00:00:00:46,model=virtio -net bridge \
    -rtc base=localtime \
    -boot menu=on

    I can install the latest nvidia driver, but:
    - without kvm=off I got error 43
    - with kvm=off I got a BSOD

    any idea?

    thanks in advance

  12. This comment has been removed by a blog administrator.

    Replies
    1. This comment has been removed by a blog administrator.


Comments are not a support forum. For help with problems, please try the vfio-users mailing list (https://www.redhat.com/mailman/listinfo/vfio-users)