As quantum processing unit (QPU) builders and algorithm developers work to create large-scale, commercially viable quantum supercomputers, they are increasingly concentrating on quantum error correction (QEC), which represents both the greatest opportunity and the biggest challenge in current quantum computing research.
CUDA-Q QEC aims to speed up researchers’ QEC experiments through the rapid creation of fully accelerated, end-to-end workflows: from defining and simulating novel codes with circuit-level noise models, to configuring realistic decoders and deploying them alongside physical QPUs. Each component in this workflow is designed to be user-definable through a comprehensive API. We built out key parts of this workflow in the CUDA-QX 0.4 release.
In this post, we walk you through the biggest new features. For the complete release notes, see GitHub, where you can also keep track of ongoing development, provide feedback, and contribute.
Generating a detector error model (DEM) from a memory circuit
The first step in a QEC workflow is defining a QEC code with an associated noise model.
QEC codes are ultimately implemented through stabilizer measurements, which are themselves noisy quantum circuits. The effective decoding of many stabilizer rounds requires knowledge of these circuits, the mapping of each measurement to a stabilizer (detector), and a prior estimate of the probability of every physical error that can occur in each circuit. The detector error model (DEM), originally developed as part of Stim (Quantum, 2021) and described in the paper Designing fault-tolerant circuits using detector error models (arXiv, 2024), provides a useful way to describe this setup.

As of the CUDA-QX 0.4 release, you can automatically generate the DEM from a specified QEC circuit and noise model. The DEM can then be used for both circuit sampling in simulation and decoding the resulting syndromes using the standard CUDA-Q QEC decoder interface. For memory circuits, all necessary logic is already provided behind the CUDA-Q QEC API.
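In Python, the end-to-end flow looks roughly like the sketch below. The specific entry points shown here, such as z_dem_from_memory_circuit, qec.operation.prep0, and the detector_error_matrix attribute, as well as the noise-model setup, are assumptions based on the CUDA-Q QEC Python API and may differ in detail from the released interface; consult the documentation linked below for the exact signatures.

```python
# Illustrative sketch only: the DEM-related names below are assumptions and may
# differ from the released CUDA-Q QEC API; see the API docs for exact signatures.
import cudaq
import cudaq_qec as qec

distance = 3
num_rounds = distance

# A QEC code from the built-in code library.
code = qec.get_code("surface_code", distance=distance)

# A simple circuit-level noise model: depolarizing noise after every X gate.
noise = cudaq.NoiseModel()
noise.add_all_qubit_channel("x", cudaq.DepolarizationChannel(0.001))

# Automatically generate the detector error model for a Z-basis memory circuit.
dem = qec.z_dem_from_memory_circuit(code, qec.operation.prep0, num_rounds, noise)

# The DEM carries the detector parity-check structure, observable flips, and
# per-mechanism error priors, which plug into the standard decoder interface.
decoder = qec.get_decoder("single_error_lut", dem.detector_error_matrix)
```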
For more information on DEMs in CUDA-Q QEC, see the C++ API and Python API documentation and examples.
Tensor networks to enable exact maximum likelihood decoding
The use of tensor networks for QEC decoding offers several advantages in research. Relative to other algorithmic and AI decoders, tensor-network decoders are easy to understand: the tensor network for a code is based on its Tanner graph and can be contracted to compute the probability that a logical observable has flipped, given a syndrome. Their accuracy is well controlled, and with exact contraction they perform maximum likelihood decoding; they also don’t require training (though they can benefit from it). Yet, while they are often used as benchmarks in research, there is currently no open-access, go-to Python implementation that researchers can use as a standard for tensor network decoding.
CUDA-QX 0.4 introduces a tensor network decoder with support for Python 3.11 onward. The decoder provides:
- Flexibility: The only inputs required are a parity check matrix, a logical observable, and a noise model, which lets users decode different codes with circuit-level noise (see the sketch after this list).
- Accuracy: The tensor networks are contracted exactly, so the decoder achieves the theoretically optimal decoding accuracy (see Figure 2 below).
- Performance: By exploiting the GPU-accelerated cuQuantum libraries, users can push the performance of contractions and path optimizations beyond what was previously possible.
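As a rough illustration of that interface, the sketch below constructs a decoder from nothing more than a parity check matrix, a logical observable, and per-mechanism error priors. The decoder name ("tensor_network_decoder") and the keyword arguments shown are assumptions about the Python API, and the matrices are a toy example rather than a real code; see the documentation linked below for the exact interface.

```python
# Sketch only (requires Python 3.11+): the decoder name and keyword arguments
# are assumptions and may differ from the released Python API.
import numpy as np
import cudaq_qec as qec

# Toy inputs: parity check matrix H (detectors x error mechanisms), a single
# logical observable row, and per-mechanism error priors.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)
logical_obs = np.array([[1, 0, 0, 1, 0, 0, 1]], dtype=np.uint8)
priors = [0.01] * H.shape[1]

decoder = qec.get_decoder("tensor_network_decoder", H,
                          logical_obs=logical_obs,
                          noise_model=priors)

# Exact contraction yields the probability that the logical observable has
# flipped, conditioned on the measured syndrome.
syndrome = [1, 0, 1]
result = decoder.decode(syndrome)
print(result.result)
```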
In Figure 2 below, we plot the logical error rate (LER) of the CUDA-Q QEC tensor network decoder using exact contraction on the open source dataset from the paper Suppressing quantum errors by scaling a surface code logical qubit (Nature, 2023). All reference lines (Ref in the figure) quote data from the paper Learning high-accuracy error decoding for quantum processors (Nature, 2024). The open-source, GPU-accelerated implementation achieves LER parity with Google’s tensor network decoder.
For more information on tensor network decoding in CUDA-Q QEC, see the Python API documentation and examples.

Improvements to the BP+OSD decoder
CUDA-QX 0.4 introduces several improvements to its GPU-accelerated Belief Propagation + Ordered Statistics Decoding (BP+OSD) implementation, which provide enhanced flexibility and monitoring capabilities:
Adaptive convergence monitoring
iter_per_check introduces configurable BP convergence-checking intervals. Set to one iteration by default, this parameter can be increased up to the user-specified maximum iteration limit to reduce overhead in scenarios where frequent convergence checks aren’t necessary.
Message clipping for numerical stability
clip_value addresses potential numerical instabilities in BP by implementing message clipping. This feature allows users to set a non-negative threshold to prevent message values from growing excessively large, which can lead to overflow or precision issues. When set to 0.0 (default), clipping is disabled, maintaining backward compatibility. Note that clipping aggressively could potentially impact BP’s performance.
BP algorithm selection
bp_method provides users with a choice between two BP algorithms: sum-product is the traditional approach, offering robust performance for most scenarios, while min-sum is a computationally efficient alternative that can provide faster convergence in certain cases.
Dynamic scaling for min-sum optimization
scale_factor enhances the min-sum algorithm with adaptive scaling capabilities. Users can specify a fixed scale factor (defaults to 1.0) or enable dynamic computation by setting it to 0.0, in which case the factor is automatically determined based on the iteration count.
Result monitoring
opt_results with bp_llr_history introduces logging capabilities that allow researchers and developers to track the evolution of log-likelihood ratios (LLRs) throughout BP’s decoding process. Users can configure the history depth from 0 to the maximum iteration count.
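Putting these options together, a decoder configuration might look like the following sketch. The new parameter names come from this release, but the decoder identifier ("nv-qldpc-decoder"), the other keyword arguments, and the value encodings (for example, how bp_method selects sum-product versus min-sum) are assumptions; check the API reference below for exact names and types.

```python
# Sketch only: parameter names other than the new ones described above, the
# decoder identifier, and the value encodings are assumptions.
import numpy as np
import cudaq_qec as qec

# Toy parity check matrix (3-bit repetition code) and per-bit error priors.
H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)
priors = [0.001] * H.shape[1]

decoder = qec.get_decoder(
    "nv-qldpc-decoder", H,
    error_rate_vec=priors,
    max_iterations=50,
    use_osd=True,
    iter_per_check=5,        # check BP convergence every 5 iterations
    clip_value=20.0,         # clip message magnitudes; 0.0 (default) disables clipping
    bp_method="min-sum",     # or "sum-product" (assumed string encoding)
    scale_factor=0.0,        # 0.0 requests dynamic, iteration-dependent scaling
    opt_results={"bp_llr_history": 10},  # keep the last 10 iterations of LLRs
)

result = decoder.decode([1, 0])   # decode a single measured syndrome
```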
For complete information on CUDA-Q QEC’s BP+OSD decoder, see the latest Python API or C++ API documentation and a full example.
A Generative Quantum Eigensolver (GQE) for AI-driven quantum circuit design
CUDA-QX 0.4 adds an out-of-the-box implementation of the Generative Quantum Eigensolver (GQE) to the Solvers library. This algorithm is the subject of ongoing research, especially with regard to the loss function; the current example provides a cost function suitable for small-scale simulation.
GQE is a novel hybrid algorithm for finding eigenstates (especially ground states) of quantum Hamiltonians using generative AI models. In contrast to the Variational Quantum Eigensolver (VQE), where the quantum program has a fixed parameterization, GQE shifts all program design into a classical AI model. This has the potential to alleviate convergence issues with traditional VQE approaches, such as barren plateaus.
Our implementation uses a transformer model following The generative quantum eigensolver (GQE) and its application for ground state search (arXiv, 2024) and has been described in further detail in a previous NVIDIA technical blog, Advancing Quantum Algorithm Design with GPT.
The GQE algorithm performs the following steps (sketched in code after this list):
- Initialize or load a pre-trained generative model.
- Generate candidate quantum circuits.
- Evaluate circuit performance on the target Hamiltonian.
- Update the generative model based on the results.
- Repeat generation and optimization until convergence.
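To make the loop concrete, here is a deliberately simplified, self-contained toy version. The "generative model" is just a softmax distribution over a small operator pool and the update rule is a crude heuristic, not the transformer and loss function used by the Solvers implementation, and the Hamiltonian and gate pool are invented for illustration.

```python
# Toy sketch of the GQE loop; the softmax "model" and heuristic update stand in
# for the transformer and loss function used by the actual Solvers implementation.
import numpy as np
import cudaq
from cudaq import spin

# Illustrative 2-qubit target Hamiltonian (not from the post).
hamiltonian = 0.5 * spin.z(0) * spin.z(1) - spin.x(0) - spin.x(1)

POOL_SIZE, SEQ_LEN, BATCH = 4, 4, 16

@cudaq.kernel
def circuit(tokens: list[int]):
    # Each token index selects one fixed gate from the operator pool.
    q = cudaq.qvector(2)
    for t in tokens:
        if t == 0:
            ry(0.4, q[0])
        elif t == 1:
            ry(0.4, q[1])
        elif t == 2:
            x.ctrl(q[0], q[1])
        else:
            rx(0.4, q[0])

logits = np.zeros(POOL_SIZE)                      # 1. initialize the "model"
best = float("inf")

for step in range(20):
    probs = np.exp(logits) / np.exp(logits).sum()
    batch = [np.random.choice(POOL_SIZE, SEQ_LEN, p=probs).tolist()
             for _ in range(BATCH)]               # 2. generate candidate circuits
    energies = [cudaq.observe(circuit, hamiltonian, seq).expectation()
                for seq in batch]                 # 3. evaluate <H> for each circuit
    baseline = float(np.mean(energies))
    for seq, e in zip(batch, energies):           # 4. favor tokens from low-energy circuits
        for t in seq:
            logits[t] += 0.05 * (baseline - e)
    best = min(best, min(energies))               # 5. repeat until converged

print("best sampled energy:", best)
```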
For complete details on the Solvers implementation of GQE, see the Python API documentation and examples.
Conclusion
The CUDA-QX 0.4 release includes a variety of new features in both the Solvers and QEC libraries, including a new Generative Quantum Eigensolver (GQE) implementation, a new tensor network decoder, and a new API for auto-generating detector error models from noisy CUDA-Q memory circuits.
See the GitHub repository and the documentation for all the details.