Accelerate Quantum Information Science

Quantum computing has the potential to offer giant leaps in computational capabilities—and the ability of scientists, developers, and researchers to simulate quantum circuits on classical computers is vital to get us there.

NVIDIA cuQuantum is an SDK of optimized libraries and tools for accelerating quantum computing workflows. Developers can use cuQuantum to speed up quantum circuit simulations based on state vector and tensor network methods by orders of magnitude.


GPU-Accelerated Quantum Circuit Simulations

The research community across academia, laboratories, and industry is using simulators to help design and verify algorithms to run on quantum computers. These simulators capture the properties of superposition and entanglement and are built on quantum circuit simulation frameworks, including Cirq, Qiskit, and others.

Quantum Circuit Simulation Frameworks

Using NVIDIA GPU Tensor Cores, cuQuantum is the foundation
for accelerating any quantum circuit simulation framework.

Features and Benefits


Choose the best approach for your work from algorithm-agnostic, accelerated quantum circuit simulation methods.

The state vector method features include optimized memory management and math kernels, efficient index bit swaps, gate application kernels, and probability array calculations for qubit sets.

The tensor network method features include accelerated tensor and tensor network contraction, order optimization, approximate contractions, and multi-GPU contractions.


Leverage the power of multi-node, multi-GPU clusters using the latest-generation GPUs, either on premises or in the cloud.

Low-level C++ APIs provide increased control and flexibility for a single GPU and single-node multi-GPU.

High-level Python API supports drop-in multi-node execution.


Simulate bigger problems faster and get more work done sooner.

Using an NVIDIA A100 Tensor Core GPU over CPU implementations delivers orders-of-magnitude speedups on key quantum problems, including quantum circuits, Shor’s algorithm, and the variational quantum eigensolver.

Leveraging the NVIDIA Selene supercomputer, cuQuantum generated a sample from a full-circuit simulation of the Google Sycamore processor in less than 10 minutes.


State Vector Method

Quantum Fourier Transform

Scales to Tens of Qubits


State vector: quantum Fourier transform up to 32 qubits, complex 64 | CPU: qsim on dual AMD EPYC 7742 | GPU: qsim+cuStateVec on DGX A100

State vector simulation tracks the entire state of the system in time through each gate operation.
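To make "tracking the entire state through each gate" concrete, here is a minimal CPU sketch in NumPy. It is an illustration of the state vector method only, not the cuStateVec API: the function names `apply_single_qubit_gate` and `apply_cnot` are hypothetical, and the convention that qubit 0 is the most significant bit is an assumption of this sketch.

```python
import numpy as np

def apply_single_qubit_gate(state, gate, target, n_qubits):
    """Apply a 2x2 gate to qubit `target` of an n-qubit state vector."""
    # View the flat vector as a tensor with one axis of size 2 per qubit.
    psi = state.reshape([2] * n_qubits)
    # Contract the gate's input index with the target qubit's axis.
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    # tensordot places the gate's output axis first; move it back.
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)

def apply_cnot(state, control, target, n_qubits):
    """Flip qubit `target` on the half of the state where `control` is 1."""
    psi = state.reshape([2] * n_qubits).copy()
    index = [slice(None)] * n_qubits
    index[control] = 1
    sub = psi[tuple(index)]
    # Indexing removed the control axis, so later axes shift down by one.
    target_axis = target if target < control else target - 1
    psi[tuple(index)] = np.flip(sub, axis=target_axis).copy()
    return psi.reshape(-1)

# Hadamard gate in single precision complex (8 bytes per amplitude).
H = (np.array([[1, 1], [1, -1]]) / np.sqrt(2)).astype(np.complex64)

# Start in |00>, then apply H on qubit 0 and CNOT(0 -> 1) to build a Bell state.
n_qubits = 2
state = np.zeros(2 ** n_qubits, dtype=np.complex64)
state[0] = 1.0
state = apply_single_qubit_gate(state, H, target=0, n_qubits=n_qubits)
state = apply_cnot(state, control=0, target=1, n_qubits=n_qubits)

# Probability of each basis state |00>, |01>, |10>, |11>.
probabilities = np.abs(state) ** 2
```

Every gate touches all 2^n amplitudes, which is why optimized gate application kernels and memory management matter at larger qubit counts.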

The memory requirement for state vector simulations grows exponentially with the number of qubits (2^n). The memory requirement creates a practical limit of roughly 50 qubits on today’s largest classical supercomputers.
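The exponential growth is easy to check with a few lines of arithmetic. The sketch below assumes one amplitude per basis state stored as a single-precision complex number (8 bytes, as in NumPy's `complex64`). The helper name `state_vector_bytes` is hypothetical, and the estimate ignores any workspace a real simulator needs on top of the state itself.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=8):
    """Bytes needed to hold all 2**n_qubits amplitudes of a state vector."""
    return (2 ** n_qubits) * bytes_per_amplitude

GIB = 2 ** 30  # one gibibyte in bytes

for n in (32, 36, 50):
    size_gib = state_vector_bytes(n) / GIB
    print(f"{n} qubits: {size_gib:,.0f} GiB")
```

Under these assumptions, 32 qubits need 32 GiB, 36 qubits need 512 GiB, and 50 qubits need 8 PiB. Each additional qubit doubles the requirement, and double-precision amplitudes (16 bytes each) double it again.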

An NVIDIA DGX™ A100 with eight NVIDIA A100 80GB Tensor Core GPUs can simulate up to 36 qubits, delivering an orders-of-magnitude speedup over a dual-socket CPU server on leading state vector simulations.

cuStateVec is already being adopted by the leading publicly available simulators. cuStateVec has been integrated into Google’s qsim simulator, part of the Cirq framework, as well as IBM’s Aer simulator within the Qiskit framework. Users can download cuQuantum and start using it today wherever they use Cirq or Qiskit.

Tensor Network Method

Tensor network methods are rapidly gaining popularity as a way to simulate hundreds or thousands of qubits for near-term quantum algorithms. Tensor networks scale with the number of quantum gates rather than the number of qubits. This makes it possible to simulate very large qubit counts with smaller gate counts on large supercomputers.

Tensor contractions dramatically reduce the memory requirement for running a circuit on a tensor network simulator. The research community is investing heavily in improving methods for quickly finding near-optimal tensor contraction orders before running the simulation.
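As a small illustration of the idea, the sketch below writes a three-qubit circuit (H on qubit 0, then CNOT 0→1 and CNOT 1→2) as a network of small tensors and contracts it to a single output amplitude. NumPy's `einsum` and `einsum_path` stand in here for a GPU contraction library, and the `greedy` path search stands in for the more sophisticated order optimizers used in practice. The function name `ghz_amplitude` and the index labels are choices made for this example.

```python
import numpy as np

# Gate tensors. CNOT is reshaped to indices [out_control, out_target, in_control, in_target].
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
CNOT = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]], dtype=float
).reshape(2, 2, 2, 2)

ZERO = np.array([1.0, 0.0])  # the |0> input state of each qubit
BASIS = np.eye(2)            # BASIS[b] projects an output qubit onto |b>

# Wires: a, b, c are the inputs; H maps a -> d; the first CNOT maps (d, b) -> (e, f);
# the second CNOT maps (f, c) -> (g, h); e, g, h are the three outputs.
SUBSCRIPTS = "a,b,c,da,efdb,ghfc,e,g,h->"

def ghz_amplitude(bits, optimize="greedy"):
    """Amplitude of output bitstring `bits` for the three-qubit circuit above."""
    operands = [ZERO, ZERO, ZERO, H, CNOT, CNOT,
                BASIS[bits[0]], BASIS[bits[1]], BASIS[bits[2]]]
    # Find a pairwise contraction order first, then contract along it.
    path, _ = np.einsum_path(SUBSCRIPTS, *operands, optimize=optimize)
    return float(np.einsum(SUBSCRIPTS, *operands, optimize=path))

amp_111 = ghz_amplitude((1, 1, 1))
amp_010 = ghz_amplitude((0, 1, 0))
```

The number of tensors in the network grows with the number of gates, and the full 2^n state is never stored. The cost of a contraction depends heavily on the order in which pairs of tensors are combined, which is why path finding is a separate step.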

Using cuQuantum, NVIDIA researchers were able to simulate a variational quantum algorithm for solving the MaxCut optimization problem using 1,688 qubits to encode 3,375 vertices on an NVIDIA DGX SuperPOD™, a 16X improvement over the previous largest simulation and multiple orders of magnitude larger than the largest problem run on quantum hardware.

MaxCut Optimization Problem—Vertex Count

Scales to Thousands of Qubits


Gray: Argonne TN Simulator on Theta supercomputer. Light green: cuQuantum on a single GPU. Dark green: cuQuantum on a DGX SuperPOD

NVIDIA cuQuantum DGX Appliance

To help developers get started, the simulation software will be available in a container optimized to run on the latest NVIDIA GPUs in NVIDIA DGX systems, a cuQuantum DGX appliance.

It will include Google’s Cirq framework and qsim simulator, along with cuQuantum and the NVIDIA HPC SDK.

The appliance software achieved best-in-class performance on key problems in quantum computing such as Shor’s algorithm, random quantum circuits, and the variational quantum eigensolver.

The software will be available early next year in the NVIDIA NGC™ catalog.

Sycamore Supremacy Circuit

DGX cuQuantum Appliance
State Vector Simulator on Dual AMD CPU

The quantum appliance speeds up simulation of quantum supremacy circuits by 70X over a CPU implementation. Gray: dual AMD EPYC 7742 CPU. Green: NVIDIA cuQuantum DGX appliance with DGX A100 640GB. 36 qubits, depth m=14.

Download cuQuantum public beta now.