As quantum processing unit (QPU) builders and algorithm developers work to create large-scale, commercially viable quantum supercomputers, they are increasingly concentrating on quantum error correction (QEC), which represents both the greatest opportunity and the biggest challenge in current quantum computing research.
CUDA-Q QEC aims to speed up researchers’ QEC experiments through the rapid creation of fully accelerated, end-to-end workflows: from defining and simulating novel codes with circuit-level noise models, to configuring realistic decoders and deploying them alongside physical QPUs. Each component in this workflow is designed to be user-definable through a comprehensive API. We built out key parts of this workflow in the CUDA-QX 0.4 release.
In this post, we walk you through the biggest new features. For the complete release notes, see GitHub, where you can also keep track of ongoing development, provide feedback, and contribute.
Generating a detector error model (DEM) from a memory circuit
The first step in a QEC workflow is defining a QEC code with an associated noise model.
QEC codes are ultimately implemented through stabilizer measurements, which are themselves noisy quantum circuits. The effective decoding of many stabilizer rounds requires knowledge of these circuits, the mapping of each measurement to a stabilizer (detector), and a prior estimate of the probability of every physical error that can occur in each circuit. The detector error model (DEM), originally developed as part of Stim (Quantum, 2021) and described in the paper Designing fault-tolerant circuits using detector error models (arXiv, 2024), provides a useful way to describe this setup.

As of the CUDA-QX 0.4 release, you can automatically generate the DEM from a specified QEC circuit and noise model. The DEM can then be used for both circuit sampling in simulation and decoding the resulting syndromes using the standard CUDA-Q QEC decoder interface. For memory circuits, all necessary logic is already provided behind the CUDA-Q QEC API.
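In Python, the end-to-end flow looks roughly like the sketch below. The specific entry points shown here, such as z_dem_from_memory_circuit, qec.operation.prep0, and the detector_error_matrix attribute, as well as the noise-model setup, are assumptions based on the CUDA-Q QEC Python API and may differ in detail from the released interface; consult the documentation linked below for the exact signatures.

```python
# Illustrative sketch only: the DEM-related names below are assumptions and may
# differ from the released CUDA-Q QEC API; see the API docs for exact signatures.
import cudaq
import cudaq_qec as qec

distance = 3
num_rounds = distance

# A QEC code from the built-in code library.
code = qec.get_code("surface_code", distance=distance)

# A simple circuit-level noise model: depolarizing noise after every X gate.
noise = cudaq.NoiseModel()
noise.add_all_qubit_channel("x", cudaq.DepolarizationChannel(0.001))

# Automatically generate the detector error model for a Z-basis memory circuit.
dem = qec.z_dem_from_memory_circuit(code, qec.operation.prep0, num_rounds, noise)

# The DEM carries the detector parity-check structure, observable flips, and
# per-mechanism error priors, which plug into the standard decoder interface.
decoder = qec.get_decoder("single_error_lut", dem.detector_error_matrix)
```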
For more information on DEMs in CUDA-Q QEC, see the C++ API and Python API documentation and examples.
Tensor networks to enable exact maximum likelihood decoding
The use of tensor networks for QEC decoding offers several advantages in research. Relative to other algorithmic and AI decoders, tensor-network decoders are easy to understand: the tensor network for a code is based on its Tanner graph and can be contracted to compute the probability that a logical observable has flipped, given a syndrome. Their accuracy is well controlled, and with exact contraction they perform maximum likelihood decoding; they also don’t require training (though they can benefit from it). Yet, while they are often used as benchmarks in research, there is currently no open-access, go-to Python implementation that researchers can use as a standard for tensor network decoding.
CUDA-QX 0.4 introduces a tensor network decoder with support for Python 3.11 onward. The decoder provides:
- Flexibility: The only inputs required are a parity check matrix, a logical observable, and a noise model, which lets users decode different codes with circuit-level noise (see the sketch after this list).
- Accuracy: The tensor networks are contracted exactly, so the decoder achieves the theoretically optimal decoding accuracy (see Figure 2 below).
- Performance: By exploiting the GPU-accelerated cuQuantum libraries, users can push the performance of contractions and path optimizations beyond what was previously possible.
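As a rough illustration of that interface, the sketch below constructs a decoder from nothing more than a parity check matrix, a logical observable, and per-mechanism error priors. The decoder name ("tensor_network_decoder") and the keyword arguments shown are assumptions about the Python API, and the matrices are a toy example rather than a real code; see the documentation linked below for the exact interface.

```python
# Sketch only (requires Python 3.11+): the decoder name and keyword arguments
# are assumptions and may differ from the released Python API.
import numpy as np
import cudaq_qec as qec

# Toy inputs: parity check matrix H (detectors x error mechanisms), a single
# logical observable row, and per-mechanism error priors.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)
logical_obs = np.array([[1, 0, 0, 1, 0, 0, 1]], dtype=np.uint8)
priors = [0.01] * H.shape[1]

decoder = qec.get_decoder("tensor_network_decoder", H,
                          logical_obs=logical_obs,
                          noise_model=priors)

# Exact contraction yields the probability that the logical observable has
# flipped, conditioned on the measured syndrome.
syndrome = [1, 0, 1]
result = decoder.decode(syndrome)
print(result.result)
```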
In Figure 2 below, we plot the logical error rate (LER) of the CUDA-Q QEC tensor network decoder using exact contraction on the open source dataset from the paper Suppressing quantum errors by scaling a surface code logical qubit (Nature, 2023). All reference lines (Ref in the figure) quote data from the paper Learning high-accuracy error decoding for quantum processors (Nature, 2024). The open-source, GPU-accelerated implementation achieves LER parity with Google’s tensor network decoder.
For more information on tensor network decoding in CUDA-Q QEC, see the Python API documentation and examples.

Improvements to the BP+OSD decoder
CUDA-QX 0.4 introduces several improvements to its GPU-accelerated Belief Propagation + Ordered Statistics Decoding (BP+OSD) implementation, which provide enhanced flexibility and monitoring capabilities:
Adaptive convergence monitoring
iter_per_check introduces configurable BP convergence-checking intervals. Set to one iteration by default, this parameter can be increased up to the user-specified maximum iteration limit to reduce overhead in scenarios where frequent convergence checks aren’t necessary.
Message clipping for numerical stability
clip_value addresses potential numerical instabilities in BP by implementing message clipping. This feature allows users to set a non-negative threshold to prevent message values from growing excessively large, which can lead to overflow or precision issues. When set to 0.0 (default), clipping is disabled, maintaining backward compatibility. Note that clipping aggressively could potentially impact BP’s performance.
BP algorithm selection
bp_method provides users with a choice between two BP algorithms: sum-product is the traditional approach, offering robust performance for most scenarios, while min-sum is a computationally efficient alternative that can provide faster convergence in certain cases.
Dynamic scaling for min-sum optimization
scale_factor enhances the min-sum algorithm with adaptive scaling capabilities. Users can specify a fixed scale factor (defaults to 1.0) or enable dynamic computation by setting it to 0.0, in which case the factor is automatically determined based on the iteration count.
Result monitoring
opt_results with bp_llr_history introduces logging capabilities that allow researchers and developers to track the evolution of log-likelihood ratios (LLRs) throughout BP’s decoding process. Users can configure the history depth from 0 to the maximum iteration count.
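Putting these options together, a decoder configuration might look like the following sketch. The new parameter names come from this release, but the decoder identifier ("nv-qldpc-decoder"), the other keyword arguments, and the value encodings (for example, how bp_method selects sum-product versus min-sum) are assumptions; check the API reference below for exact names and types.

```python
# Sketch only: parameter names other than the new ones described above, the
# decoder identifier, and the value encodings are assumptions.
import numpy as np
import cudaq_qec as qec

# Toy parity check matrix (3-bit repetition code) and per-bit error priors.
H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)
priors = [0.001] * H.shape[1]

decoder = qec.get_decoder(
    "nv-qldpc-decoder", H,
    error_rate_vec=priors,
    max_iterations=50,
    use_osd=True,
    iter_per_check=5,        # check BP convergence every 5 iterations
    clip_value=20.0,         # clip message magnitudes; 0.0 (default) disables clipping
    bp_method="min-sum",     # or "sum-product" (assumed string encoding)
    scale_factor=0.0,        # 0.0 requests dynamic, iteration-dependent scaling
    opt_results={"bp_llr_history": 10},  # keep the last 10 iterations of LLRs
)

result = decoder.decode([1, 0])   # decode a single measured syndrome
```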
For complete information on CUDA-Q QEC’s BP+OSD decoder, see the latest Python API or C++ API documentation and a full example.
A Generative Quantum Eigensolver (GQE) for AI-driven quantum circuit design
CUDA-QX 0.4 adds an out-of-the-box implementation of the Generative Quantum Eigensolver (GQE) to the Solvers library. This algorithm is the subject of ongoing research, especially with regard to the loss function; the current example provides a cost function suitable for small-scale simulation.
GQE is a novel hybrid algorithm for finding eigenstates (especially ground states) of quantum Hamiltonians using generative AI models. In contrast to the Variational Quantum Eigensolver (VQE), where the quantum program has a fixed parameterization, GQE shifts all program design into a classical AI model. This has the potential to alleviate convergence issues with traditional VQE approaches, such as barren plateaus.
Our implementation uses a transformer model following The generative quantum eigensolver (GQE) and its application for ground state search (arXiv, 2024) and has been described in further detail in a previous NVIDIA technical blog, Advancing Quantum Algorithm Design with GPT.
The GQE algorithm performs the following steps (sketched in code after this list):
- Initialize or load a pre-trained generative model.
- Generate candidate quantum circuits.
- Evaluate circuit performance on the target Hamiltonian.
- Update the generative model based on the results.
- Repeat generation and optimization until convergence.
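To make the loop concrete, here is a deliberately simplified, self-contained toy version. The "generative model" is just a softmax distribution over a small operator pool and the update rule is a crude heuristic, not the transformer and loss function used by the Solvers implementation, and the Hamiltonian and gate pool are invented for illustration.

```python
# Toy sketch of the GQE loop; the softmax "model" and heuristic update stand in
# for the transformer and loss function used by the actual Solvers implementation.
import numpy as np
import cudaq
from cudaq import spin

# Illustrative 2-qubit target Hamiltonian (not from the post).
hamiltonian = 0.5 * spin.z(0) * spin.z(1) - spin.x(0) - spin.x(1)

POOL_SIZE, SEQ_LEN, BATCH = 4, 4, 16

@cudaq.kernel
def circuit(tokens: list[int]):
    # Each token index selects one fixed gate from the operator pool.
    q = cudaq.qvector(2)
    for t in tokens:
        if t == 0:
            ry(0.4, q[0])
        elif t == 1:
            ry(0.4, q[1])
        elif t == 2:
            x.ctrl(q[0], q[1])
        else:
            rx(0.4, q[0])

logits = np.zeros(POOL_SIZE)                      # 1. initialize the "model"
best = float("inf")

for step in range(20):
    probs = np.exp(logits) / np.exp(logits).sum()
    batch = [np.random.choice(POOL_SIZE, SEQ_LEN, p=probs).tolist()
             for _ in range(BATCH)]               # 2. generate candidate circuits
    energies = [cudaq.observe(circuit, hamiltonian, seq).expectation()
                for seq in batch]                 # 3. evaluate <H> for each circuit
    baseline = float(np.mean(energies))
    for seq, e in zip(batch, energies):           # 4. favor tokens from low-energy circuits
        for t in seq:
            logits[t] += 0.05 * (baseline - e)
    best = min(best, min(energies))               # 5. repeat until converged

print("best sampled energy:", best)
```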
For complete details on the Solvers implementation of GQE, see the Python API documentation and examples.
Conclusion
The CUDA-QX 0.4 release includes a variety of new features in both the Solvers and QEC libraries, including a new Generative Quantum Eigensolver (GQE) implementation, a new tensor network decoder, and a new API for auto-generating detector error models from noisy CUDA-Q memory circuits.
See the GitHub repository and the documentation for all the details.