Enabling Scalable User Experiences with Modern Workloads on Windows Virtual Desktop

If you’re supporting the recent surge in remote work, you’ve probably noticed that business applications are more graphics-heavy than ever before. Applications such as Microsoft Office, Google Chrome, and PDF readers now offer graphics-rich features that require more power. In addition, 4K and multiple high-resolution monitors, as well as multimedia streaming, are becoming the new normal in the digital workplace. Whether users are at the office or at home, you need to deliver these critical applications without compromising on performance and user experience.

NVIDIA Virtual GPU (vGPU) technology delivers responsive remote desktops by virtualizing the physical GPU and offloading graphics workloads to virtual GPUs. With graphics workloads offloaded to the GPU, the CPU is freed up for non-graphics tasks, so more users can be supported with a better user experience.

Microsoft Azure provides N-series instances that are enabled with GPUs to support the needs of compute-intensive, graphics-intensive, and visualization workloads. The NV-series instances are optimized and designed for knowledge workers, as well as creative and technical professionals.

The ways to access NVIDIA GPU-accelerated virtual desktops and workstations from the Azure cloud are as follows:

N-Series Virtual Machines for Windows Virtual Desktop
NVIDIA Quadro Virtual Workstation in the Microsoft Azure Marketplace
Bring Your Own NVIDIA vGPU License (BYOL)

The following table lists the NVIDIA-powered VM sizes that Azure offers.

NVIDIA-powered Azure VM size	Use case	GPU
NC_v3 Series (NC6s_v3, NC12s_v3, NC24s_v3, NC24rs_v3)	Accelerating machine learning training and high performance computing (HPC)	NVIDIA V100
ND_v2 Series (ND40s_v3)	Accelerating machine learning training and HPC	NVIDIA V100
ND Series (ND6s, ND12s, ND24s, ND24rs)	Deep learning and inferencing, and remote visualization workloads	NVIDIA P40
NC_v2 Series (NC6s_v2, NC12s_v2, NC24s_v2, NC24rs_v2)	Accelerating machine learning training and HPC	NVIDIA P100
NV and NV_v3 Series (NV6, NV12, NV24, NV12s_v3, NV24s_v3, NV48s_v3)	Remote visualization workloads and other graphics-intensive applications	NVIDIA M60

Table 1. VM sizes powered by NVIDIA that are offered by Microsoft Azure.

Windows Virtual Desktop is a comprehensive desktop and app virtualization service running on Azure. Windows Virtual Desktop delivers simplified management, Windows 10 multi-session, optimizations for Office 365 ProPlus, and support for Remote Desktop Services (RDS) environments. Windows Virtual Desktop use cases fall into the following categories:

Light: Ideal for lightweight use cases such as data entry and call center apps.
Medium: Ideal for basic Microsoft Office apps such as Word and Excel, as well as database apps.
Heavy: Ideal for more intensive workloads such as development or engineering.
Heavy Graphics: Ideal for graphics-intensive apps such as 3D CAD/CAE, and photo/video editing tools.

Of these categories, Heavy Graphics workloads are widely understood to require GPUs. Can GPU-enabled VMs benefit customers running Medium or Heavy workloads too? To find out, NVIDIA and Microsoft ran performance tests on vGPU and CPU-only Windows Virtual Desktop instances for the Medium and Heavy user profiles, using the NVIDIA nVector benchmarking tool.

NVIDIA developed the nVector benchmarking tool to simulate the end-user workflows of different user profiles and to measure the quality of the user experience across the following metrics:

End-User Latency: Measures the end-user responsiveness or how interactive the session is (amount of lag) at the endpoint.
Frame Rate: Measures the fluidity of the session, that is, the number of frames per second that are sent to the end user.
Image Quality: Measures how much the image is impacted by the remote protocol.
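
This post does not describe how nVector computes these metrics internally. As a generic illustration only, the following minimal sketch shows how an average frame rate and an average end-user latency could be derived from timestamps. The function names and the sample values are hypothetical and are not taken from nVector or from the tests in this post.

```python
from statistics import mean


def average_frame_rate(frame_times_s: list[float]) -> float:
    """Average frames per second across a capture window (hypothetical helper)."""
    if len(frame_times_s) < 2:
        raise ValueError("need at least two frame timestamps")
    duration_s = frame_times_s[-1] - frame_times_s[0]
    if duration_s <= 0:
        raise ValueError("last timestamp must be later than the first")
    # N timestamps span N - 1 frame intervals.
    return (len(frame_times_s) - 1) / duration_s


def average_latency_ms(input_times_s: list[float],
                       response_times_s: list[float]) -> float:
    """Mean delay between each input event and its visible response."""
    if not input_times_s or len(input_times_s) != len(response_times_s):
        raise ValueError("need matching, non-empty input and response lists")
    return mean((shown - sent) * 1000.0
                for sent, shown in zip(input_times_s, response_times_s))


# Made-up sample data for illustration only, not measurements from this post.
frame_times = [0.0, 0.25, 0.5, 0.75, 1.0]  # five frames over one second
inputs = [1.0, 2.0]                         # when the simulated user acted
responses = [1.125, 2.125]                  # when the result appeared on screen

fps = average_frame_rate(frame_times)
latency = average_latency_ms(inputs, responses)
print(f"frame rate: {fps:.1f} fps, end-user latency: {latency:.0f} ms")
```

A higher frame rate and a lower latency both indicate a more fluid, more interactive session, which is how the results later in this post should be read.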

The NVIDIA nVector benchmarking tool was used to compare the vGPU (NV12s_v3) instance with the CPU-only (D8s_v3) instance. The target workflow for comparison included a knowledge-worker workload that simulated Office 365, web browsing (Chrome), video playback, and PDF viewing. The tests also covered collaboration tools such as Microsoft Teams and Cisco Webex, as well as WebGL-based SaaS apps, on a single HD/4K display.

Test results

The following observations from the test results show a high-quality user experience even as each VM supports a higher number of sessions.

Knowledge worker workload

Scale: 8/16 VMs | Full HD | Number of Displays: 1 | Protocol: RDP | Workload Duration: 40 min

Based on the nVector benchmark, we show that twice the number of users can be supported when using the NV12s_v3/GPU instance (16 sessions) compared to the D8s_v3/CPU instance (8 sessions) while maintaining the same user experience. With 16 sessions running on a vGPU NV12s_v3 instance, there is an overall 18% savings in CPU utilization compared to eight sessions running on a CPU-only D8s_v3 instance. Figure 1 shows the difference in CPU utilization.

The user-experience (UX) metrics also led us to conclude that while still maintaining low end-user latency, we can obtain 2x the number of frames in a vGPU instance compared to a CPU-only instance. Figure 2 shows the frame rate and end-user latency difference between a GPU-accelerated instance with twice as many sessions as a CPU-only instance.

WebGL

Webpage: https://hiq.fi/en/ | Full HD | Number of Displays: 1 | Workload Duration: 5 min

In a vGPU NV12s_v3 instance, the single-session WebGL test results indicate a better user experience. Average CPU utilization in a D8s_v3/CPU instance was 51% compared to 7% in a vGPU NV12s_v3 instance. Figure 3 shows the benefit in CPU utilization. The vGPU instance also delivers 19 more frames per second at the same end-user latency as the CPU-only D8s_v3 instance.

When the test was run for multiple sessions, we observed that with only two sessions, the average CPU utilization in a D8s_v3 instance goes up to 93%. With a vGPU NV12s_v3 instance, we can comfortably scale up to nine sessions with the same average frame rate and better end-user latency.

Scaling beyond two sessions becomes impossible in a CPU-only instance. In a vGPU instance, however, we were able to scale all the way up to 16 sessions. At that point the CPU utilization approaches 92%, so a more comfortable recommended scale would be somewhere between nine and 16 sessions. Figure 4 shows the average frame rate, end-user latency, and CPU utilization for multi-session test runs.
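
To put the session densities reported so far into capacity-planning terms, the following minimal sketch estimates how many VMs a user population would need at each density. The sessions-per-VM figures come from the knowledge-worker and WebGL tests in this post. The population of 100 users and the helper name `vms_needed` are hypothetical, chosen only for illustration, and the sketch ignores cost, failover headroom, and mixed workloads.

```python
def vms_needed(users: int, sessions_per_vm: int) -> int:
    """Smallest number of VMs that can host all users (hypothetical helper)."""
    if users < 0:
        raise ValueError("users must not be negative")
    if sessions_per_vm <= 0:
        raise ValueError("sessions_per_vm must be positive")
    # Integer ceiling division: a partly filled VM still counts as one VM.
    return (users + sessions_per_vm - 1) // sessions_per_vm


# Sessions per VM as reported in this post.
DENSITY = {
    ("knowledge worker", "D8s_v3"): 8,
    ("knowledge worker", "NV12s_v3"): 16,
    ("WebGL", "D8s_v3"): 2,
    ("WebGL", "NV12s_v3, comfortable"): 9,
    ("WebGL", "NV12s_v3, maximum tested"): 16,
}

USERS = 100  # hypothetical user population, not from the tests

plan = {key: vms_needed(USERS, density) for key, density in DENSITY.items()}
for (workload, instance), count in plan.items():
    print(f"{workload:<17}{instance:<26}{count:>3} VMs")
```

Under these assumptions, the same 100 users need 13 CPU-only VMs or 7 vGPU VMs for the knowledge-worker workload, and 50 CPU-only VMs or between 7 and 12 vGPU VMs for the WebGL workload.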

YouTube 4K Video

Workload: YouTube 4K video | UHD | Number of Displays: 1 | Number of Sessions: 4

Running a YouTube 4K video in four parallel sessions on a CPU-only D8s_v3 instance and on a vGPU NV12s_v3 instance clearly demonstrates the importance of vGPU in the data center.

CPU utilization peaks at 100% in the CPU-only instance, whereas in the vGPU instance it stays comfortably under 50% with a better frame rate and tolerable end-user latency. The end-user latency of 447 ms in the CPU-only instance indicates lag that is clearly noticeable and leads to a poor user experience.

Collaboration tools: Microsoft Teams and Cisco Webex

Workloads: Microsoft Teams and Cisco Webex | Number of Displays: 1 | Number of Sessions: 2

When collaboration tools such as Microsoft Teams and Cisco Webex run in two sessions, with one session sharing its screen while playing a YouTube video, the D8s_v3 instance shows high CPU utilization that contributes to an unsatisfactory end-user experience.

Performance evaluation summary

Based on the NVIDIA nVector benchmarking tool, the user-experience metrics in this post demonstrate how a GPU-enabled Windows Virtual Desktop instance running Windows 10 multi-session provides a better user experience than a CPU-only instance.

The key drivers are collaboration tools, including Microsoft Teams and Cisco Webex, along with WebGL workloads. The tests also showed that a GPU-enabled VM delivers a great user experience while supporting more sessions than a CPU-only VM.