Over the course of the last month, I’ve been engaged with a client on a Horizon View Plan & Design. One of the main business drivers and use cases is full-scale workstation replacement for CAD users (approx 8).
Therefore, I’ve spent quite some time researching the technology and getting hands-on with the graphics acceleration offerings in Horizon View 5.3, namely vSGA and vDGA. You’ll find plenty of resources already out there, going over the differences, advantages and disadvantages of each. An excellent write up can be found here
The primary resource I used for this deployment is the VMware whitepaper Graphics Acceleration in VMware Horizon View Virtual Desktops
Let’s step through a couple of deployment stages:-
Use Case and Requirements
- Provide full scale workstation replacement for CAD users (approx 8)
- Performance doesn’t necessarily have to match or exceed that of a physical workstation.
- Performance should be suitable for CAD users compared to existing physical workstations. The ability to rotate, zoom and interact with models with no excessive lag or jitter would be considered a success.
The above may be slightly vague, however the pilot or POC is there to validate whether the solution can replace existing physical desktops going forward. The customer is aware performance may not match physical, as those dedicated workstations have up to 24GB RAM and x2 CPU (6 cores each). Of course, vSphere 5.5 can handle virtual machines of that size, if required.
Environment and Hardware
The hardware for the project had already been procured before the engagement started. These are the resources that were available (design constraints).
- x2 Dell PowerEdge R720 – 128GB RAM, x2 CPU (12 cores each), core speed 2.3GHz and local SSD storage
- ESXi 5.5 and vCenter 5.5 (vCSA)
- x2 Nvidia K1 GRID cards per host, offering 8 GPUs per host (4 per card)
- Shared storage – NetApp array
- Dell Wyse P25 Zero Client, with latest v4.2 firmware applied.
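As a quick sanity check on the hardware list above, here is a minimal sketch of the GPU headroom available for vDGA, where each VM takes one physical GPU. The figures come straight from the list; the function and variable names are purely illustrative.

```python
# Figures taken from the hardware list above; names are illustrative only.
HOSTS = 2
CARDS_PER_HOST = 2
GPUS_PER_K1_CARD = 4
CAD_USERS = 8


def vdga_capacity(hosts, cards_per_host, gpus_per_card):
    """Return (GPUs per host, GPUs in total). With vDGA, one GPU serves one VM."""
    per_host = cards_per_host * gpus_per_card
    return per_host, hosts * per_host


per_host, total = vdga_capacity(HOSTS, CARDS_PER_HOST, GPUS_PER_K1_CARD)
spare = total - CAD_USERS
print(f"{per_host} GPUs per host, {total} in total, {spare} spare after {CAD_USERS} CAD users")
```

In other words, either host on its own has enough GPUs for all of the CAD users, which leaves room to spread the heavier workloads across both hosts.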
There are a number of design decisions to make. Some are typical of any vSphere and Horizon View design, but each requires thought, along with its impact on other parts of the design.
- Virtual Machine resources (RAM\CPU) – Small, medium or large VMs? If considering large VMs for CAD users, with maybe 4 vCPU and 8GB RAM, how does this affect CPU co-scheduling, HA, host density and other virtual machines?
- Storage – Will fast shared storage be used? Does this provide the required I\O, latency and throughput? Can local SSD storage be used? Consider the impact of local storage, in terms of availability, manageability and maintenance operations
- Pool settings (video RAM) – 64MB to 512MB? If choosing 512MB, consider the additional overhead and available resources of the cluster.
- vSphere features – Impact from using Direct I\O passthrough device to virtual machines – No HA, DRS and vMotion.
- View Pool type – Automated or Manual? Because vDGA relies on a passthrough device, it can only be used with a Manual pool. Automated pools and Linked Clones are limited to software rendering and vSGA.
- Balance CAD workloads (if heavy and demanding) across K1 hardware and ESXi hosts
- GPU – How do you carve up your GPU selection across your available slots?
- Network – LAN v WAN? Expected latency of connection as higher latency will impact the CAD session.
- As a result of networking, how do you tune PCoIP accordingly? Max\Min Image quality, Image Caching, FPS & Max Session bandwidth?
- Endpoints – Are proposed client devices powerful enough to handle the workload of 3D? Tera1 devices can support up to 30fps, Tera2 devices up to 60fps.
Some of the above can be clarified further by a desktop assessment of those existing physical workstations used by CAD users.
Otherwise, use your pilot or POC to validate the above, and adjust accordingly. Test, test and test!
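To put some numbers behind the VM sizing question, here is a rough sketch of how a "large VM" choice loads a single host. It assumes the R720 spec listed earlier and the 4 vCPU / 8GB example from the list above. It deliberately ignores hypervisor overhead, video RAM overhead and any non-CAD virtual machines, so treat it as a starting point rather than a sizing tool.

```python
HOST_CORES = 2 * 12   # x2 CPU, 12 cores each (from the hardware list)
HOST_RAM_GB = 128


def host_fit(vm_count, vcpu_per_vm, ram_gb_per_vm,
             host_cores=HOST_CORES, host_ram_gb=HOST_RAM_GB):
    """Summarise vCPU overcommit and RAM consumption for one host."""
    total_vcpu = vm_count * vcpu_per_vm
    total_ram = vm_count * ram_gb_per_vm
    return {
        "vcpu_per_core": total_vcpu / host_cores,
        "ram_used_fraction": total_ram / host_ram_gb,
        "fits_in_ram": total_ram <= host_ram_gb,
    }


# Worst case for this design: all 8 CAD VMs on one host, sized as "large".
large = host_fit(vm_count=8, vcpu_per_vm=4, ram_gb_per_vm=8)
print(large)
```

Running the same function with 2 vCPU and 4GB per VM shows how much headroom the smaller sizing leaves for other virtual machines on the host.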
Prepare ESXi hosts
Two display adapters – If the high-end NVIDIA card is set as the primary adapter, Xorg will not be able to use the GPU for rendering.
If you have two GPUs installed, the server BIOS may give you the option to select which GPU should be the Primary and which should be the Secondary. If this option is available, make sure the standard GPU is set as Primary and the high-end GPU is set as Secondary.
After the Nvidia hardware has been installed, you’ll need to install the drivers into ESXi 5.5 using the following:-
- # vim-cmd hostsvc/maintenance_mode_enter
- # localcli software vib install --no-sig-check -v /<path-to-vib>/NVIDIA-VMware-319.65-1OEM.550.0.0.1331820.x86_64.vib
- # vim-cmd hostsvc/maintenance_mode_exit
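If you have more than one host to prepare, it can help to generate the command sequence rather than retype it. The sketch below only builds the three commands as strings, in the same order as above, and does not run anything. The VIB path in the example is a hypothetical placeholder, and the helper name is my own.

```python
import shlex


def driver_install_commands(vib_path):
    """Build the ESXi shell commands for the driver install, in order."""
    return [
        "vim-cmd hostsvc/maintenance_mode_enter",
        # shlex.quote protects paths that contain spaces or shell characters
        f"localcli software vib install --no-sig-check -v {shlex.quote(vib_path)}",
        "vim-cmd hostsvc/maintenance_mode_exit",
    ]


# Hypothetical path, substitute the location of your own VIB file.
for cmd in driver_install_commands("/vmfs/volumes/datastore1/nvidia-driver.vib"):
    print(cmd)
```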
Following the driver install, configure the GPUs for passthrough using ESXi>Advanced Settings>DirectPath I\O Configuration
Note: These drivers are only required in vSGA mode. With vDGA the GPU is passed directly to the virtual machine, and the Nvidia drivers installed in the guest operating system are used. Most deployments utilise both vSGA and vDGA, so I would recommend installing the drivers into ESXi at the beginning, to ensure all the relevant pieces are in place for different scenarios.
Full details can be found here Graphics Acceleration in VMware Horizon View Virtual Desktops
- Ensure Intel VT-d is enabled in the BIOS, or check using esxcfg-module -l | grep vtddmar
- Check PCI Passthrough (green flag) of K1 GRID devices via ESXi>Advanced Settings
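The VT-d check above is just a search for the vtddmar module in the module listing. If you capture that listing to a file, a small sketch like the one below performs the same check offline. The sample listing is made up for illustration, and the real column layout on your host may differ.

```python
def module_loaded(module_listing, name="vtddmar"):
    """Return True if `name` is the first field on any line of the listing."""
    for line in module_listing.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


# Hypothetical captured output of `esxcfg-module -l`; values are invented.
sample = """Name                     Used Size (kb)
vmkernel                 47   12341
vtddmar                  0    48
"""
print(module_loaded(sample))
```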
Prepare Parent Image
After installation of relevant applications such as Solidworks, it was time to fine tune the image. Below is a checklist of things to cover off:-
- VMware HW version 9 or 10 (only 128MB Video RAM available with v8).
- VMware HW – Video card (Auto detect settings)
- Minimum 4GB RAM and 2 vCPU
- Configure PCI passthrough (Nvidia K1 GRID GPU) device.
- VMXNET3 adapter
- Install latest Nvidia drivers – 332.76 into Windows (reboot)
- Check Windows Device Manager for Nvidia device
- Install Horizon View Agent 5.3 (reboot)
- Customise Windows – Enable Windows Aero, Themes service, Let Windows choose and Enable Transparency
- Run the VMware OS optimise tool. Be careful not to disable settings required for the 3D experience, see above.
- Enable the proprietary NVIDIA capture APIs by running "C:\Program Files\Common Files\VMware\Teradici PCoIP Server\MontereyEnable.exe" -enable
- Reboot virtual machine
- Registry changes, if required (see Performance Tips below)
- Activate the Nvidia display adapter. Connect to the VM for the first time via PCoIP in full screen (use a Manual Pool) from an endpoint at native resolution, otherwise the VM will use the Soft 3D display adapter. vDGA does not work through vSphere console sessions.
- After connecting via PCoIP, run dxdiag.exe and check Display tab for Nvidia GPU and driver
Note: After initial testing had been performed, I removed the PCI device from the above Parent Image and then cloned off another x virtual machines. If you forget to remove the PCI device, you won’t be able to clone the virtual machine. After the virtual machines had been cloned and joined to the domain, a unique PCI device was assigned to each VM.
| VM | ESXi Host | PCI Device | User | Usage |
|---|---|---|---|---|
| CAD-PC-01 | ESXi 1 | 06:00:0 K1 GRID | 1 | High |
| CAD-PC-02 | ESXi 1 | 07:00:0 K1 GRID | 2 | Low |
| CAD-PC-03 | ESXi 1 | 08:00:0 K1 GRID | 3 | Low |
| CAD-PC-04 | ESXi 1 | 09:00:0 K1 GRID | 4 | High |
| CAD-PC-05 | ESXi 2 | 06:00:0 K1 GRID | 5 | High |
| CAD-PC-06 | ESXi 2 | 07:00:0 K1 GRID | 6 | High |
| CAD-PC-07 | ESXi 2 | 08:00:0 K1 GRID | 7 | Low |
| CAD-PC-08 | ESXi 2 | 09:00:0 K1 GRID | 8 | Low |
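With passthrough, two things matter in a mapping like the one above: no PCI device may be assigned twice on the same host, and the heavy users should be spread evenly. Here is a minimal sketch that checks both, using the values from the table. The function names are my own.

```python
from collections import Counter

# (VM, ESXi host, PCI device, usage) copied from the table above.
ASSIGNMENTS = [
    ("CAD-PC-01", "ESXi 1", "06:00:0", "High"),
    ("CAD-PC-02", "ESXi 1", "07:00:0", "Low"),
    ("CAD-PC-03", "ESXi 1", "08:00:0", "Low"),
    ("CAD-PC-04", "ESXi 1", "09:00:0", "High"),
    ("CAD-PC-05", "ESXi 2", "06:00:0", "High"),
    ("CAD-PC-06", "ESXi 2", "07:00:0", "High"),
    ("CAD-PC-07", "ESXi 2", "08:00:0", "Low"),
    ("CAD-PC-08", "ESXi 2", "09:00:0", "Low"),
]


def duplicate_devices(assignments):
    """Return (host, PCI device) pairs that are assigned to more than one VM."""
    counts = Counter((host, pci) for _, host, pci, _ in assignments)
    return [key for key, n in counts.items() if n > 1]


def heavy_users_per_host(assignments):
    """Count the 'High' usage VMs on each host."""
    return Counter(host for _, host, _, usage in assignments if usage == "High")


print(duplicate_devices(ASSIGNMENTS))
print(dict(heavy_users_per_host(ASSIGNMENTS)))
```

The same PCI address appearing on both hosts is fine, as the check is per host.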
Horizon View Pool Configuration
- Manual Pool & Dedicated assignment
- PCoIP and 2 monitors (max allowed)
- Users cannot choose protocol
- 3D rendering – Hardware
- Video RAM – 512MB
After VMs have been configured or re-configured in vCenter, you must power off, and on, existing virtual machines for the 3D Renderer setting to take effect. Restarting or rebooting a virtual machine does not cause the setting to take effect.
A couple of cool benchmark tools you can use are as follows:-
Due to restrictions with software downloads, I was only able to run the performance benchmark tool provided by Solidworks.
Performance Tips
- Virtual Machine – Minimum 4GB RAM and 2 vCPU, plus VMXNET3
- PCoIP FPS – Default is 30. If the application requires more, increase to 60-120fps
- Tera2 Zero client devices only support up to 60fps
- Enable PCoIP Image Caching, as Tera2 Zero client devices running firmware v4.1 can take advantage of it (View 5.2 onwards)
Note: The above setting is geared more towards bandwidth savings, which may not be required on a local LAN. However, the whitepaper VMware View 3D Graphics Performance Study, which includes Solidworks, provides the following point of interest:-
In initial performance testing, it was quickly discovered that the sophisticated image caching techniques in View 5.2 ensured that any repetitive interaction with the CAD applications was rapidly cached such that, in some cases for the remainder of the test, View was able to source up to 90% of the total remoted pixels from the image cache. Accordingly, simple model rotations or model animations are not suitable operations for examining the real-world performance of the system.
Real world usage and interaction of CAD may result in the Image Cache being less effective, however I personally see no harm in enabling the feature. Every little bit of help goes a long way!
- Enable the Disable Build to Lossless setting, reducing the amount of PCoIP traffic and the load on the VM and endpoint device.
- Enable relative mouse (if app cursor control is uncontrollable) – Only supported through software client
- Solidworks – Tools>Options>Performance Toggle between hardware\software rendering (the default is hardware).
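The frame rate figures in this post interact: the PCoIP frame rate setting, the endpoint limit (Tera1 30fps, Tera2 60fps) and the MaxAppFrameRate registry value covered in this post (default 30) all cap what the user sees. The sketch below is a deliberately simplified model in which the lowest limit wins. Treating a MaxAppFrameRate of 0 as "no cap" is my assumption, based on the value given in the whitepaper.

```python
ENDPOINT_MAX_FPS = {"Tera1": 30, "Tera2": 60}   # from the endpoint notes in this post


def effective_fps(pcoip_max_fps, endpoint, app_frame_cap=30):
    """Simplified model: the lowest limit in the chain wins.

    app_frame_cap mirrors MaxAppFrameRate; 0 is assumed to mean 'no cap'.
    """
    limits = [pcoip_max_fps, ENDPOINT_MAX_FPS[endpoint]]
    if app_frame_cap:
        limits.append(app_frame_cap)
    return min(limits)


# Raising PCoIP to 120fps alone changes nothing while the application cap is 30.
print(effective_fps(120, "Tera2"))
# With the application cap removed, the Tera2 endpoint becomes the limit.
print(effective_fps(120, "Tera2", app_frame_cap=0))
```

The point of the model is simply that raising one limit on its own may have no visible effect.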
1. CAD and CATIA related. Occasionally, when working with CAD models (when turning and spinning), you may find that objects move irregularly and with a delay. However, the objects themselves are displayed clearly, without blurring.
- HKLM\SOFTWARE\VMware, Inc.\VMware SVGA DevTap\
Value Name: MaxAppFrameRate=dword:00000000
If this registry value does not exist, it defaults to 30. Alternatively, it could possibly be set to match the configured PCoIP frame rate (i.e. 60-120).
This change can negatively affect other applications. Use with caution and only if you are experiencing the symptoms mentioned above.
Note: I did not apply the above setting. As advised in the VMware whitepaper, I was only prepared to make this change, if I noticed the above behaviour. For my particular deployment, the CAD application performance was more than acceptable for the end user.
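If you do decide to apply the MaxAppFrameRate change, a .reg file is one way to keep it repeatable across VMs. The sketch below only renders the file contents as text, using the key and value name quoted above. The helper name is my own, and I have not imported the result in this deployment.

```python
def reg_dword(key, name, value):
    """Render a single DWORD value in Windows .reg file syntax."""
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[{key}]\n"
        f'"{name}"=dword:{value:08x}\n'
    )


KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap"
print(reg_dword(KEY, "MaxAppFrameRate", 0))
```

Save the output with a .reg extension, review it, and import it inside the guest. As noted in this post, the change needs a reboot of Windows.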
2. VMs using VMXNET3 (improve video playback performance, if required)
Value Name: FastSendDatagramThreshold
Data Type: REG_DWORD
Note: Both registry changes require a reboot of Windows.
To monitor performance within Windows:-
- Use the Nvidia Control Panel provided as part of the install of Nvidia software.
- Use the command nvidia-smi to monitor the usage of your GPU. This can be found in C:\Program Files\Nvidia Corporation\NVSMI
The most useful nvidia-smi metric is located at the right of the middle section, showing the % utilisation of each GPU’s cores at that point in time.
The nvidia-smi command should be run within Windows and not the ESXi host, as the hardware device (GPU) is being passed directly to the operating system with vDGA.
- PCoIP Log Viewer tool to monitor session bandwidth, latency, image encoding (FPS).
- Excellent blog about trouble-shooting PCoIP
- FRAPS Benchmark FPS
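For logging GPU usage over a test run, nvidia-smi output can be parsed rather than read by eye. The sketch below assumes the output was captured inside the guest with `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits`. Whether the nvidia-smi shipped with your driver version supports those options is something to verify first, and the sample reading is invented.

```python
def parse_gpu_utilisation(csv_text):
    """Map GPU index -> utilisation % from 'index, utilisation' CSV lines."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        usage[int(index)] = int(util)
    return usage


# Hypothetical reading; with vDGA the guest should only see its own GPU.
sample = "0, 37\n"
print(parse_gpu_utilisation(sample))
```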
Solidworks provides a performance benchmark tool which was run twice. The results were compared to a previous benchmark run against a physical system.
- VM – x2 vCPU, 4GB, dedicated Nvidia K1 GRID GPU and NetApp storage
- Physical – x2 CPU (6 cores each), 12GB RAM, Quadro 4000 PCIe graphics and local SATA storage
| Attribute | VDI – Test 1 | VDI – Test 2 | Physical |
|---|---|---|---|
| Real View Performance | 10 | 111.3 | 165.2 |
Note: The differences between VDI Test 1 and Test 2 in terms of configuration were minimal. The second test was performed with the virtual machine domain joined and able to access CAD files\databases based on the network.
Unfortunately, I’ve been unable to validate the above results further or identify the reasons for the differences. You would have thought that, given the spec of the virtual machine compared to the physical workstation, the results would have been different. My access at the customer site is extremely limited, with no access to the physical workstation for further inspection or analysis.
The main point here, is the VDI solution more than matches up to the physical workstation. Based off these results and the positive feedback from the customer, the 3D solution gets the thumbs up!
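To put a number on the gap, here is a small sketch comparing the VDI Test 2 figure against the physical workstation, using the two values from the results table. It only computes the relative difference. How to read the sign depends on how the Solidworks tool scores the test, which I was not able to confirm on site.

```python
def relative_difference(vdi, physical):
    """Fraction by which the VDI figure differs from physical (negative = lower)."""
    return (vdi - physical) / physical


VDI_TEST_2 = 111.3   # Real View Performance, VDI Test 2
PHYSICAL = 165.2     # Real View Performance, physical workstation

diff = relative_difference(VDI_TEST_2, PHYSICAL)
direction = "lower" if diff < 0 else "higher"
print(f"VDI Test 2 is {abs(diff):.1%} {direction} than physical")
```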
A couple of further tweaks can be implemented to drive performance higher and reduce the above benchmark ratings, if required.
- The existing configuration is limited to 30fps (Windows limit). Increase this if the application requires a higher frame rate.
- Registry – MaxAppFrameRate (see above) or Graphics Acceleration in VMware Horizon View Virtual Desktops
- Consider increasing the VM vCPU count from 2 to 4, if additional rendering performance is required. From my research via the Solidworks forums, the biggest hitter when it comes to improving rendering is CPU.
- Consider running the VMware OS Optimisation tool if further savings are required.
- Consider placing the CAD virtual machines on fast storage or local SSD, to improve I\O performance.
The above configuration resulted in excellent initial performance. The end user reported that the performance and capabilities of the CAD application were comparable to the physical PC. I was concerned the differences in the hardware may have disappointed the end user, however the Nvidia and Horizon View technology stood up well and exceeded expectations. It would have been nice to have further tested and tweaked the configuration using other benchmark tools, and to have further investigated the differences between the physical and virtual platforms, however with limited access onsite, this was not possible.
For further reading, check out my 3D Resources page