cuQuantum SDK

Accelerate Quantum Information Science

Quantum computing has the potential to offer giant leaps in computational capabilities. Until it becomes a reality, scientists, developers, and researchers are simulating quantum circuits on classical computers. NVIDIA cuQuantum is an SDK of optimized libraries and tools for accelerating quantum computing workflows. Developers can use cuQuantum to speed up quantum circuit simulations based on state vector, density matrix, and tensor network methods by orders of magnitude.

Join our early interest list to stay informed about the latest updates.



GPU-Accelerated Quantum Circuit Simulations

The research community across academia, laboratories, and industry is using simulators to help design and verify algorithms to run on quantum computers. These simulators capture the properties of superposition and entanglement and are built on quantum circuit simulation frameworks, including Qiskit, Cirq, ProjectQ, Q#, and others.

Quantum Circuit Simulation Frameworks

Using NVIDIA GPU Tensor Cores, cuQuantum is the foundation for accelerating any quantum circuit simulation framework.

Features and Benefits


Choose the best approach for your work from algorithm-agnostic, accelerated quantum circuit simulation methods.

The state vector method features include optimized memory management and math kernels, efficient index bit swaps, gate application kernels, and probability array calculations for qubit sets.
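To make two of these operations concrete, here is a minimal pure-Python sketch of applying a single-qubit gate and computing a probability array for a set of qubits. It does not use the cuQuantum API. The function names `apply_single_qubit_gate` and `probabilities`, and the convention that qubit 0 is the least significant bit of the amplitude index, are assumptions made for illustration only.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector.

    `state` is a list of 2**n complex amplitudes. Qubit 0 is taken to be
    the least significant bit of the amplitude index (assumed convention).
    """
    stride = 1 << target
    new_state = list(state)
    for i in range(len(state)):
        if (i & stride) == 0:       # visit each amplitude pair exactly once
            j = i | stride          # partner index with the target bit set
            a0, a1 = state[i], state[j]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


def probabilities(state, qubits):
    """Return the marginal probability of each bit string on `qubits`."""
    probs = [0.0] * (1 << len(qubits))
    for i, amp in enumerate(state):
        key = 0
        for pos, q in enumerate(qubits):
            key |= ((i >> q) & 1) << pos   # gather the selected index bits
        probs[key] += abs(amp) ** 2
    return probs


# Hadamard gate, used here only as an example input.
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

# Three qubits initialized to |000>, then a Hadamard on qubit 1.
state = [0j] * 8
state[0] = 1 + 0j
state = apply_single_qubit_gate(state, H, target=1)

p = probabilities(state, [1])
print(p)
```

Each gate touches every amplitude once, in independent pairs, which is why this kind of loop maps well onto massively parallel hardware.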

The tensor network method features include accelerated tensor and tensor network contraction, order optimization, approximate contractions, and multi-GPU contractions.


Leverage the power of multi-node, multi-GPU clusters using the latest-generation GPUs, either on premises or in the cloud.

Low-level C++ APIs provide increased control and flexibility for a single GPU and single-node multi-GPU.

High-level Python API supports drop-in multi-node execution.


Simulate bigger problems faster and get more work done sooner.

On the NVIDIA Selene supercomputer, NVIDIA used cuQuantum and A100 Tensor Core GPUs to generate a sample from a full-circuit simulation of the Google Sycamore processor in 10 minutes. In late 2019, this task was thought to require days on millions of CPU cores.


State Vector Method

State Vector Simulation

Scales to 10s of Qubits

State vector: 1,000 circuits, 36 qubits, depth m=10, complex 64 | CPU: Qiskit on dual AMD EPYC 7742 | GPU: Qgate on DGX A100

State vector methods were the first quantum circuit simulators to reach widespread use. They’re known for high-fidelity results.

The memory requirement for state vector simulations grows exponentially with the number of qubits: a circuit with n qubits needs 2^n complex amplitudes, so each added qubit doubles the memory. This creates a practical limit of roughly 50 qubits on today’s largest classical supercomputers.
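The exponential growth is easy to check with a short calculation. The sketch below assumes 8 bytes per amplitude (single-precision complex, two 32-bit floats) and counts only the state vector itself, not any workspace a real simulator would need. The function name `state_vector_bytes` is hypothetical.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude=8):
    """Bytes needed to store 2**num_qubits complex amplitudes.

    The default of 8 bytes assumes single-precision complex values;
    pass 16 for double-precision complex.
    """
    return (1 << num_qubits) * bytes_per_amplitude


for n in (30, 36, 40, 50):
    gib = state_vector_bytes(n) / 2**30
    print(f"{n} qubits: {gib:,.0f} GiB")
```

Under these assumptions, 36 qubits need 512 GiB, which fits in the 640 GB of aggregate memory on eight 80 GB GPUs, while 50 qubits need about 8 PiB.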

An NVIDIA DGX™ A100 with eight NVIDIA A100 80GB Tensor Core GPUs can simulate up to 36 qubits, delivering orders of magnitude speedup over a dual-socket CPU server on leading state vector simulations.

State vector simulators and driving organizations:

  • JUQCS-G (Julich)
  • Qgate (NVAITC)
  • Qiskit-AER (IBM)
  • QuEST (Oxford)
  • SV1 (Amazon Web Services)
  • Vulcan (QCWare)

Tensor Network Method

The tensor network method trades memory footprint for runtime, allowing much larger quantum circuits to be simulated, albeit at slightly reduced fidelity compared to the state vector approach.

Tensor networks scale with the number of quantum gates rather than the number of qubits. This makes very large qubit counts with smaller gate counts feasible on large supercomputers.

Tensor contractions dramatically reduce the memory requirement for running a circuit on a tensor network simulator. The research community is investing heavily in improving methods for quickly finding near-optimal tensor contraction orders before running the simulation.
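Why the contraction order matters can be seen in the simplest possible tensor network, a chain of matrices. The sketch below uses the classic matrix-chain dynamic program as a stand-in for a contraction order optimizer. It is not the algorithm cuQuantum uses, and the names `best_chain_order` and `left_to_right_cost` as well as the example dimensions are hypothetical.

```python
from functools import lru_cache


def left_to_right_cost(dims):
    """Scalar multiplications when contracting the chain strictly left to right.

    Matrix i has shape dims[i] x dims[i + 1].
    """
    total = 0
    for k in range(1, len(dims) - 1):
        total += dims[0] * dims[k] * dims[k + 1]
    return total


def best_chain_order(dims):
    """Return (min_multiplications, parenthesization) for a matrix chain."""
    dims = tuple(dims)
    n = len(dims) - 1  # number of matrices in the chain

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == j:
            return 0, f"T{i}"
        best = None
        for k in range(i, j):            # split the chain between k and k + 1
            left_cost, left_expr = solve(i, k)
            right_cost, right_expr = solve(k + 1, j)
            cost = left_cost + right_cost + dims[i] * dims[k + 1] * dims[j + 1]
            if best is None or cost < best[0]:
                best = (cost, f"({left_expr} {right_expr})")
        return best

    return solve(0, n - 1)


# Two 1000 x 1000 matrices followed by a 1000 x 2 matrix.
dims = (1000, 1000, 1000, 2)
print(left_to_right_cost(dims))
print(best_chain_order(dims))
```

For these dimensions the left-to-right order first forms a 1000 x 1000 intermediate, while the optimized order only ever holds 1000 x 2 intermediates, so both the operation count and the peak memory drop sharply. General tensor networks have far more possible orders than a chain, which is why heuristics for near-optimal orders are an active research area.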

Tensor network simulators and driving organizations:

  • AC-QDP (Alibaba)
  • QTensor (Argonne National Lab)
  • QuiMB (Caltech)
  • TN QVM (Oak Ridge National Lab)
  • TN1 (Amazon Web Services)

Using cuQuantum to run a tensor network method on an NVIDIA A100 for the Sycamore circuit simulation delivers orders of magnitude speedups over a dual-socket CPU server.

Tensor Network Simulation

Scales to 1000s of Qubits

Tensor network: 53 qubits, depth m=20 | CPU: Quimb with Caltech Research on dual AMD EPYC 7742 - estimated | GPU: Quimb on DGX A100
