NVIDIA CUDA-Q (formerly NVIDIA CUDA Quantum) is an open-source programming model for building quantum accelerated supercomputing applications that take full advantage of CPU, GPU, and QPU computing capabilities. Developing these applications today is challenging and requires an easy-to-use coding environment coupled with powerful quantum simulation capabilities to efficiently evaluate and improve the performance of new algorithms.
CUDA-Q includes many new features that significantly improve performance, enabling users to push the limits of what can be simulated on classical supercomputers. This post demonstrates the performance enhancement of CUDA-Q for quantum simulation and provides a brief explanation of the improvements.
Improving performance
Computing expectation values is the primary quantum task in a Variational Quantum Eigensolver (VQE) application. You can easily compute these values in CUDA-Q using the observe function. The performance of the three most recent CUDA-Q releases was tested using 24 and 28 qubit VQE problems aimed at determining the ground state energy of two small molecules (C2H2 and C2H4). The experiments used the standard UCCSD ansatz and were written in Python.
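To make the pattern concrete, the following is a minimal sketch of how an expectation value is computed with a single observe call. It uses a toy two-qubit ansatz and Hamiltonian (the small deuteron-style example from the CUDA-Q documentation) rather than the full UCCSD-VQE setup described above:

```python
import cudaq
from cudaq import spin

# Minimal sketch (not the UCCSD ansatz from the experiments): a
# one-parameter, two-qubit ansatz built with the kernel builder API.
kernel, theta = cudaq.make_kernel(float)
qubits = kernel.qalloc(2)
kernel.x(qubits[0])
kernel.ry(theta, qubits[1])
kernel.cx(qubits[1], qubits[0])

# A small example Hamiltonian expressed as a sum of Pauli terms.
hamiltonian = 5.907 - 2.1433 * spin.x(0) * spin.x(1) \
              - 2.1433 * spin.y(0) * spin.y(1) \
              + 0.21829 * spin.z(0) - 6.125 * spin.z(1)

# observe runs the circuit on the selected backend and returns an
# ObserveResult holding <H> at the given parameter value.
result = cudaq.observe(kernel, hamiltonian, 0.59)
print(result.expectation())  # expectation_z() in older releases
```

In a VQE loop, a classical optimizer calls observe repeatedly while updating the ansatz parameters, which is why the per-call overheads discussed below matter.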
For each version (v0.6, v0.7, and v0.7.1), three state vector simulator backends were tested: nvidia (single precision), nvidia-fp64 (double precision), and nvidia-mgpu (nvidia-fp64 with gate fusion). The number following nvidia-mgpu designates the gate fusion level, which was previously hard coded to 6 but is a tunable parameter as of v0.7.1.
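Each backend is selected with cudaq.set_target before kernels are run. A minimal sketch, assuming a CUDA-Q installation with the GPU backends available:

```python
import cudaq

# Choose one of the GPU state vector simulators referenced above.
cudaq.set_target("nvidia")         # single-precision state vector
# cudaq.set_target("nvidia-fp64")  # double-precision state vector
# cudaq.set_target("nvidia-mgpu")  # double precision with gate fusion
```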
Gate fusion is an optimization technique in which consecutive quantum gates are combined into a single gate to reduce the overall computational cost and improve circuit efficiency. The number of gates combined (the gate fusion level) can significantly affect simulation performance and needs to be optimized for every application. You can now adjust the CUDAQ_MGPU_FUSE environment variable to specify custom gate fusion levels different from the v0.7.1 default of 4.
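For example, a fusion level of 6 could be requested as follows. This is a sketch that assumes the variable is read when the nvidia-mgpu target is initialized; alternatively, export it in the shell before launching the script:

```python
import os

# Request a custom gate fusion level; assumed to be read when the
# nvidia-mgpu target is initialized.
os.environ["CUDAQ_MGPU_FUSE"] = "6"  # override the v0.7.1 default of 4

import cudaq
cudaq.set_target("nvidia-mgpu")
```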

Figure 1. observe calls in 24 and 28 qubit UCCSD-VQE experiments
Figure 1 presents the runtime for each simulator and CUDA-Q version using NVIDIA H100 GPUs. The two simulators without gate fusion experienced at least a 1.7x speedup from v0.6 to v0.7.1.
The nvidia-mgpu-6 v0.7.1 simulator results were 2.4x and 2.9x faster than the v0.6 results for the 24 and 28 qubit experiments, respectively. Tuning the gate fusion level improved the performance by an additional 10x and 1.3x, respectively, indicating how important and system-dependent this parameter can be.
The nvidia-mgpu simulator will be the new default starting in v0.8 (yet to be released), offering the best overall performance and enabling immediate utilization of multiple GPUs for many-qubit simulations.
Note that the original v0.7.1 timing results were updated on July 1, 2024. An LLVM issue initially produced incorrect UCCSD results; the revised timings were collected after a bug fix that ensures correct results.
Accelerating the code
CUDA-Q v0.7 includes a number of enhancements that improve compilation and reduce the time required to make successive observe calls (Figure 2).
First, the just-in-time (JIT) compilation path was improved to compile kernels more efficiently. Previously, this procedure scaled quadratically with the number of gates in the circuit; it now scales linearly.

Figure 2. Time to make successive observe calls
Second, improvements to the hashing for JIT change-detection checks reduce the time required to check whether any code needs to be recompiled due to environment changes. This virtually eliminates the time required for these checks for each observe call.
Finally, v0.6 would perform all log processing for every call, regardless of the specified log level. This was changed in v0.7 to only perform the necessary processing for the specified log level.
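For reference, a sketch of how the log level is typically specified, assuming it is controlled by the CUDAQ_LOG_LEVEL environment variable (an assumption; check the CUDA-Q documentation for your release):

```python
import os

# Assumption: the runtime log level is read from CUDAQ_LOG_LEVEL.
# Leaving it unset keeps log processing overhead out of tight observe
# loops; set it only when diagnostics are needed.
os.environ["CUDAQ_LOG_LEVEL"] = "info"

import cudaq
```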
In addition to gate fusion, v0.7.1 introduced automatic Hamiltonian batching (Figure 3), which further reduces the runtime of observe calls by enabling batched Hamiltonian evaluations on a single GPU.
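Because the batching is automatic, no changes to the observe call itself should be required: a single call over a multi-term spin operator benefits directly. A minimal sketch, using a toy Ising-style Hamiltonian rather than the molecular Hamiltonians above:

```python
import cudaq
from cudaq import spin

cudaq.set_target("nvidia-mgpu")

# A simple many-qubit state preparation circuit.
num_qubits = 10
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(num_qubits)
for i in range(num_qubits):
    kernel.h(qubits[i])

# A Hamiltonian with many Pauli terms; the terms passed in a single
# observe call are evaluated in batches on one GPU automatically.
hamiltonian = spin.z(0) * spin.z(1)
for i in range(1, num_qubits - 1):
    hamiltonian += spin.z(i) * spin.z(i + 1)

print(cudaq.observe(kernel, hamiltonian).expectation())
```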

To further improve performance, future releases will include more enhancements to state preparation, handling of Pauli operators, and unitary synthesis.
Get started with CUDA-Q
The current and anticipated CUDA-Q improvements provide developers with a more performant platform to build quantum accelerated supercomputing applications. Not only is development today accelerated, but applications constructed on CUDA-Q are also positioned to deploy in the hybrid CPU, GPU, and QPU environments necessary for practical quantum computing.
The CUDA-Q Quick Start guide will help you to quickly set up your environment, while the Basics section will guide you through writing your first CUDA-Q application. Explore the code examples and applications to get inspiration for your own quantum application development. To provide feedback and suggestions, visit the NVIDIA/cuda-quantum GitHub repo.