Accelerate Quantum Computing Research

Quantum computing has the potential to offer giant leaps in computational capabilities—and the ability of scientists, developers, and researchers to simulate quantum circuits on classical computers is vital to get us there.

The research community across academia, laboratories, and industry is using simulators to help design and verify algorithms to run on quantum computers. These simulators capture the properties of superposition and entanglement and are built on quantum circuit simulation frameworks.

NVIDIA cuQuantum is an SDK of optimized libraries and tools for accelerating quantum computing workflows. Using NVIDIA Tensor Core GPUs, developers can use cuQuantum to speed up quantum circuit simulations based on state vector and tensor network methods by orders of magnitude.


Quick Links

cuQuantum DGX Appliance

A full simulation stack based on cuQuantum in a ready-to-deploy container.


Documentation for cuQuantum, as well as the DGX cuQuantum Appliance.


The cuQuantum public repository, including the cuQuantum Python bindings and examples.

NVIDIA cuQuantum DGX Appliance

To help developers get started, the simulation software is available as the cuQuantum DGX Appliance, a container optimized to run on the latest NVIDIA GPUs in NVIDIA DGX systems.

It includes Google’s Cirq framework and qsim simulator, along with NVIDIA cuQuantum.

The appliance software achieved best-in-class performance on key problems in quantum computing such as Shor’s algorithm, random quantum circuits, and the variational quantum eigensolver.

The software is available now in the NVIDIA® NGC™ catalog.

Sycamore Supremacy Circuit: DGX cuQuantum Appliance vs. State Vector Simulator on Dual AMD CPU

The quantum appliance speeds up simulation of quantum supremacy circuits by 70X over a CPU implementation. Gray: dual AMD EPYC 7742 CPU. Green: NVIDIA cuQuantum DGX appliance with DGX A100 640GB. 36 qubits, depth m=14.

Features and Benefits


Choose the best approach for your work from algorithm-agnostic, accelerated quantum circuit simulation methods.

The state vector method features include optimized memory management and math kernels, efficient index bit swaps, gate application kernels, and probability array calculations for qubit sets.

The tensor network method features include accelerated tensor and tensor network contraction, order optimization, approximate contractions, and multi-GPU contractions.


Leverage the power of multi-node, multi-GPU clusters using the latest-generation GPUs, either on premises or in the cloud.

Low-level C++ APIs provide increased control and flexibility for single-GPU and single-node multi-GPU execution.

High-level Python API supports drop-in multi-node execution.


Simulate bigger problems faster and get more work done sooner.

Using an NVIDIA A100 Tensor Core GPU over CPU implementations delivers orders-of-magnitude speedups on key quantum problems, including random quantum circuits, Shor’s algorithm, and the variational quantum eigensolver.

Leveraging the NVIDIA Selene supercomputer, cuQuantum generated a sample from a full-circuit simulation of the Google Sycamore processor in less than 10 minutes.

Framework Integrations

cuQuantum is integrated with leading quantum circuit simulation frameworks. Download cuQuantum and get dramatically accelerated performance from your framework of choice, with zero code changes.


State Vector Method

Quantum Fourier Transform: CPU vs. Single GPU

State vector: quantum Fourier transform up to 32 qubits, complex 64 | CPU: qsim on dual AMD EPYC 7742 | GPU: qsim+cuStateVec on DGX A100

State vector simulation tracks the entire state of the system in time through each gate operation. It’s an excellent tool for simulating deep and/or highly entangled quantum circuits and for simulating noisy qubits.
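To make the idea concrete, here is a minimal NumPy sketch of state vector simulation: it stores all 2^n amplitudes, updates them gate by gate, and then reads out probabilities for a chosen set of qubits. It is a CPU illustration only and does not use cuStateVec. The function names (apply_gate, marginal_probabilities, ghz_state) and the convention that qubit 0 is the most significant bit are assumptions made for this example.

```python
import numpy as np

def apply_gate(state, gate, targets, n_qubits):
    """Apply a k-qubit gate (a 2^k x 2^k matrix) to the target qubits.

    Assumed convention for this sketch: qubit 0 is the most significant bit.
    """
    k = len(targets)
    psi = state.reshape([2] * n_qubits)
    gate_t = gate.reshape([2] * (2 * k))  # axes: k outputs, then k inputs
    # Contract the gate's input axes with the target axes of the state.
    psi = np.tensordot(gate_t, psi, axes=(list(range(k, 2 * k)), targets))
    # tensordot places the gate's output axes first; move them back in place.
    psi = np.moveaxis(psi, list(range(k)), targets)
    return psi.reshape(-1)

def marginal_probabilities(state, qubits, n_qubits):
    """Probabilities of each bit string on `qubits` (ascending qubit order)."""
    probs = np.abs(state.reshape([2] * n_qubits)) ** 2
    others = tuple(q for q in range(n_qubits) if q not in qubits)
    return probs.sum(axis=others).reshape(-1)

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def ghz_state(n_qubits):
    """Prepare (|00...0> + |11...1>) / sqrt(2) with one H and a CNOT chain."""
    state = np.zeros(2 ** n_qubits, dtype=complex)
    state[0] = 1.0
    state = apply_gate(state, H, [0], n_qubits)
    for q in range(n_qubits - 1):
        state = apply_gate(state, CNOT, [q, q + 1], n_qubits)
    return state

state = ghz_state(3)
print(marginal_probabilities(state, [0, 2], 3))
```

Because every gate touches the full amplitude array, the work per gate grows with 2^n regardless of circuit depth or entanglement, which is why this method suits deep, highly entangled circuits but is bounded by memory.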

An NVIDIA DGX™ A100 with eight NVIDIA A100 80GB Tensor Core GPUs can simulate up to 36 qubits, delivering an orders-of-magnitude speedup over a dual-socket CPU server on leading state vector simulations.
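One way to see where a 36-qubit limit could come from is simple memory arithmetic. The sketch below assumes single-precision complex amplitudes (8 bytes each, matching the "complex 64" benchmark setting), counts 80 GB as 80 x 10^9 bytes, and ignores any workspace or communication buffers a real simulator needs. The helper names are hypothetical, and this is a back-of-the-envelope reading rather than NVIDIA's stated sizing method.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=8):
    """Memory needed to hold all 2^n amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * bytes_per_amplitude

def max_qubits(memory_bytes, bytes_per_amplitude=8):
    """Largest n whose full state vector fits in the given memory."""
    n = 0
    while state_vector_bytes(n + 1, bytes_per_amplitude) <= memory_bytes:
        n += 1
    return n

# Eight A100 80GB GPUs, taking 1 GB = 10^9 bytes (an assumption of this sketch).
dgx_a100_bytes = 8 * 80 * 10**9

print(state_vector_bytes(36))          # bytes for 36 qubits at complex64
print(max_qubits(dgx_a100_bytes))      # qubits that fit at 8 bytes/amplitude
print(max_qubits(dgx_a100_bytes, 16))  # qubits that fit at 16 bytes/amplitude
```

Each added qubit doubles the memory requirement, so moving from single to double precision costs exactly one qubit at a fixed memory budget.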

cuStateVec is already being adopted by the leading publicly available simulators. It has been integrated into Google’s qsim simulator, part of the Cirq framework; IBM’s Aer simulator within the Qiskit framework; and Xanadu’s PennyLane simulator. Users can download cuQuantum and start using it today wherever they use Cirq, Qiskit, or PennyLane. For more details, check out our cuStateVec blog.

Tensor Network Method

Tensor network methods are rapidly gaining popularity as a way to simulate hundreds or thousands of qubits for near-term quantum algorithms. Tensor networks scale with the number of quantum gates rather than the number of qubits. This makes it possible to simulate very large qubit counts with smaller gate counts on large supercomputers.

Tensor contractions dramatically reduce the memory requirement for running a circuit on a tensor network simulator. The research community is investing heavily in improving pathfinding: methods for quickly finding near-optimal tensor contraction orders before running the simulation.
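The effect of contraction order is easy to see on a tiny example. The sketch below uses NumPy's built-in einsum_path as a stand-in pathfinder (it is not cuTensorNet) to choose the order for a three-tensor chain, then contracts along that path. The tensor shapes are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 2))
b = rng.standard_normal((2, 1000))
c = rng.standard_normal((1000, 2))

expr = "ij,jk,kl->il"

# Two possible orders for this chain:
#   (a.b).c : 1000*2*1000 + 1000*1000*2 = 4,000,000 multiply-adds,
#             with a 1,000,000-element intermediate
#   a.(b.c) : 2*1000*2 + 1000*2*2 = 8,000 multiply-adds,
#             with a 4-element intermediate
# Pathfinding: search for a cheap order before doing any contraction.
path, report = np.einsum_path(expr, a, b, c, optimize="optimal")

# Contraction: execute the network along the chosen path.
result = np.einsum(expr, a, b, c, optimize=path)

print(path)    # the pairwise contraction order that was selected
print(report)  # NumPy's summary of estimated cost and intermediate sizes
```

Even with three tensors, the two orders differ by a factor of 500 in arithmetic and far more in intermediate memory. For circuits with thousands of gates the number of possible orders is enormous, which is why fast, high-quality pathfinding matters.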

cuTensorNet provides state-of-the-art performance for both the pathfinding and contraction stages of tensor network simulation. Check out our cuTensorNet blog for more details.

Using cuQuantum, NVIDIA researchers were able to simulate a variational quantum algorithm for solving the MaxCut optimization problem using 1,688 qubits to encode 3,375 vertices on an NVIDIA DGX SuperPOD™, a 16X improvement over the previous largest simulation and multiple orders of magnitude larger than the largest problem run on quantum hardware.

Pathfinding Performance—cuTensorNet

State-of-the-Art Performance for Path Quality and Time to Solution

M10, M12, M14, M20 = random quantum circuits of depth 10, 12, 14, 20, from Arute et al., Quantum Supremacy Using a Programmable Superconducting Processor. Black and gray = opt-einsum. Yellow = Cotengra (Gray & Kourtis, Hyper-optimized Tensor Network Contraction, 2021).

Download cuQuantum now.