# cuQuantum

## Accelerate Quantum Information Science

Quantum computing has the potential to offer giant leaps in computational capabilities—and the ability of scientists, developers, and researchers to simulate quantum circuits on classical computers is vital to get us there. **NVIDIA cuQuantum** is an SDK of optimized libraries and tools for accelerating quantum computing workflows. Developers can use cuQuantum to speed up quantum circuit simulations based on state vector and tensor network methods by orders of magnitude.

## GPU-Accelerated Quantum Circuit Simulations

Researchers across academia, laboratories, and industry are using simulators to help design and verify algorithms to run on quantum computers. These simulators capture the properties of superposition and entanglement and are built on quantum circuit simulation frameworks, including Cirq, Qiskit, and others.

Using NVIDIA GPU Tensor Cores, cuQuantum is the foundation for accelerating any quantum circuit simulation framework.

## Features and Benefits

### Flexible

Choose the best approach for your work from algorithm-agnostic, accelerated quantum circuit simulation methods.

The **state vector method** features include optimized memory management and math kernels, efficient index bit swaps, gate application kernels, and probability array calculations for qubit sets.

The **tensor network method** features include accelerated tensor and tensor network contraction, order optimization, approximate contractions, and multi-GPU contractions.

### Scalable

Leverage the power of multi-node, multi-GPU clusters using the latest-generation GPUs, either on premises or in the cloud.

The **low-level C++ APIs** provide increased control and flexibility for single-GPU and single-node multi-GPU execution. The **high-level Python API** supports drop-in multi-node execution.

### Fast

Simulate bigger problems faster and get more work done sooner.

Compared with CPU implementations, an NVIDIA A100 Tensor Core GPU delivers orders-of-magnitude speedups on key quantum problems, including **quantum circuits, Shor’s algorithm,** and the **variational quantum eigensolver**.

Leveraging the NVIDIA Selene supercomputer, cuQuantum generated a sample from a **full-circuit simulation** of the **Google Sycamore processor** in less than 10 minutes.

## Performance

### State Vector Method

### Quantum Fourier Transform

Scales to Tens of Qubits

State vector: quantum Fourier transform up to 32 qubits, complex 64 | CPU: qsim on dual AMD EPYC 7742 | GPU: qsim+cuStateVec on DGX A100

State vector simulation tracks the entire state of the system in time through each gate operation.
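To make the idea concrete, here is a minimal pure-Python sketch of how a state vector simulator updates the full state for each gate. It is an illustration only, not cuStateVec's implementation: the function names are hypothetical, and the convention that qubit 0 is the least significant bit of the basis-state index is an assumption of this sketch.

```python
import math

def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector (list of complex).

    Assumption of this sketch: qubit 0 is the least significant bit of the
    basis-state index.
    """
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:
            # i has the target bit cleared; j is its partner with the bit set.
            j = i | stride
            a0, a1 = state[i], state[j]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state

def probabilities(state):
    """Measurement probability of each basis state."""
    return [abs(amplitude) ** 2 for amplitude in state]

# Hadamard gate.
S = 1.0 / math.sqrt(2.0)
H = [[S, S], [S, -S]]

# Start in |000> and apply H to every qubit: the whole 2^n-entry vector
# is read and rewritten once per gate.
n_qubits = 3
state = [0j] * (1 << n_qubits)
state[0] = 1 + 0j
for qubit in range(n_qubits):
    state = apply_single_qubit_gate(state, H, qubit)

probs = probabilities(state)
```

Every gate touches all 2^n amplitudes, which is why both the memory footprint and the per-gate work grow exponentially with the qubit count.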

The memory requirement for state vector simulations grows exponentially with the number of qubits ($2^n$ amplitudes for $n$ qubits). The memory requirement creates a practical limit of roughly 50 qubits on today’s largest classical supercomputers.

An NVIDIA DGX™ A100 with eight NVIDIA A100 80GB Tensor Core GPUs can simulate up to 36 qubits, delivering an orders-of-magnitude speedup over a dual-socket CPU server on leading state vector simulations.
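The arithmetic behind these limits can be sketched in a few lines. The example below assumes 8 bytes per amplitude (reading the "complex 64" in the benchmark caption as single-precision complex) and counts only the state vector itself, ignoring any workspace a real simulator needs. The function names are hypothetical.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=8):
    """Bytes needed to store 2^n amplitudes (8 bytes each is an assumption)."""
    return (1 << n_qubits) * bytes_per_amplitude

def max_qubits(memory_bytes, bytes_per_amplitude=8):
    """Largest qubit count whose state vector fits in `memory_bytes`."""
    n = 0
    while state_vector_bytes(n + 1, bytes_per_amplitude) <= memory_bytes:
        n += 1
    return n

GIB = 1 << 30

# Eight GPUs with 80 GB each, as in the DGX A100 configuration above.
dgx_a100_bytes = 8 * 80 * 10**9

for n in (30, 32, 36, 50):
    print(f"{n} qubits: {state_vector_bytes(n) / GIB:,.0f} GiB")

print("Qubits that fit in 640 GB:", max_qubits(dgx_a100_bytes))
```

Under these assumptions a 36-qubit state needs 512 GiB, which fits in 640 GB, while each additional qubit doubles the requirement.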

cuStateVec is already being adopted by the leading publicly available simulators. cuStateVec has been integrated into Google’s qsim simulator, part of the Cirq framework, as well as IBM’s Aer simulator within the Qiskit framework. Users can download cuQuantum and start using it today wherever they use Cirq or Qiskit.

### Tensor Network Method

Tensor network methods are rapidly gaining popularity as a way to simulate hundreds or thousands of qubits for near-term quantum algorithms. Tensor networks scale with the number of quantum gates rather than the number of qubits. This makes it possible to simulate very large qubit counts with smaller gate counts on large supercomputers.

Tensor contractions dramatically reduce the memory requirement for running a circuit on a tensor network simulator. The research community is investing heavily in improving methods for quickly finding near-optimal tensor contraction orders before running the simulation.
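Why the contraction order matters can be seen on the simplest possible tensor network, a chain of matrices. The sketch below compares a naive left-to-right contraction with the best order found by the classic matrix-chain dynamic program. It is a toy illustration with hypothetical function names, not the path-finding method cuQuantum uses; general tensor networks need heuristic optimizers.

```python
def left_to_right_cost(dims):
    """Scalar multiplications to contract a matrix chain left to right.

    dims = [d0, d1, ..., dN] describes matrices of shape d0 x d1, d1 x d2, ...
    """
    cost = 0
    rows = dims[0]
    for i in range(1, len(dims) - 1):
        cost += rows * dims[i] * dims[i + 1]
    return cost

def optimal_cost(dims):
    """Minimum scalar multiplications over all contraction orders of the chain."""
    n = len(dims) - 1  # number of matrices
    best = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best[i][j] = min(
                best[i][k] + best[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return best[0][n - 1]

# Three matrices: 1024 x 2, 2 x 1024, 1024 x 2.
dims = [1024, 2, 1024, 2]
naive = left_to_right_cost(dims)
best = optimal_cost(dims)
print(f"left to right: {naive:,} multiplications")
print(f"best order:    {best:,} multiplications")
```

In this example the naive order builds a 1024 x 1024 intermediate, while the better order contracts the last two matrices first and only ever holds a 2 x 2 intermediate, so a good order reduces both the work and the peak memory.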

Using cuQuantum, NVIDIA researchers were able to simulate a variational quantum algorithm for solving the MaxCut optimization problem using 1,688 qubits to encode 3,375 vertices on an NVIDIA DGX SuperPOD™, a 16X improvement over the previous largest simulation and multiple orders of magnitude larger than the largest problem run on quantum hardware.

### MaxCut Optimization Problem—Vertex Count

Scales to Thousands of Qubits

Gray: Argonne TN Simulator on Theta supercomputer. Light green: cuQuantum on a single GPU. Dark green: cuQuantum on a DGX SuperPOD

### NVIDIA cuQuantum DGX Appliance

To help developers get started, the simulation software will be available in a container optimized to run on the latest NVIDIA GPUs in NVIDIA DGX systems, a cuQuantum DGX appliance.

It will include Google’s Cirq framework and qsim simulator, along with cuQuantum and the NVIDIA HPC SDK.

The appliance software achieved best-in-class performance on key problems in quantum computing such as Shor’s algorithm, random quantum circuits, and the variational quantum eigensolver.

The software will be available early next year in the NVIDIA NGC™ catalog.

### Sycamore Supremacy Circuit

The quantum appliance speeds up simulation of quantum supremacy circuits by 70X over a CPU implementation. Gray: dual AMD EPYC 7742 CPU. Green: NVIDIA cuQuantum DGX appliance with DGX A100 640GB. 36 qubits, depth m=14.

## Resources

- Documentation
- Watch the GTC21Fall Keynote
- GTC Session: Introducing cuQuantum: Accelerating State Vector and Tensor Network-Based Quantum Circuit Simulation [A31093]
- Read the NVIDIA Blog: NVIDIA Teams With Google Quantum AI, IBM and Other Leaders to Speed Research in Quantum Computing
- Read the NVIDIA Blog: NVIDIA Sets World Record for Quantum Computing Simulation With cuQuantum Running on DGX SuperPOD
- Read the NVIDIA Blog: What Is Quantum Computing?
- Watch the GTC Session: A Deep Dive on the Latest HPC Software
- Watch the GTC Session: Benchmarking GPU Clusters with Universal Quantum Computing Simulations
- Explore the HPC SDK
- Read the paper: Density Matrix Quantum Circuit Simulation
- Watch the SC20 Talk: Density Matrix Quantum Circuit Simulation via the BSP Machine on Modern GPU Clusters