
Real-Time Neural Receivers Drive AI-RAN Innovation

Today’s 5G New Radio (5G NR) wireless communication systems rely on highly optimized signal processing algorithms to reconstruct transmitted messages from noisy channel observations in mere microseconds. This remarkable achievement is the result of decades of relentless effort by telecommunications engineers and researchers, who have continuously improved signal processing algorithms to meet the demanding real-time constraints of wireless communications.

Some algorithms were even largely forgotten because their complexity was prohibitive at the time of their discovery. The low-density parity-check (LDPC) codes introduced by Gallager in the 1960s are a notable example. Rediscovered by David MacKay in the 1990s, they have since become a backbone of 5G NR. This case illustrates that even the best algorithms are impractical unless they meet the stringent computational and latency requirements of telecommunications.

AI for wireless communications has received a lot of attention from researchers in academia and industry, as discussed in An Introduction to Deep Learning for the Physical Layer and An Overview of the 3GPP Study on Artificial Intelligence for 5G New Radio. It is increasingly acknowledged that AI has the potential to offer better reliability and accuracy than many traditional physical layer algorithms, which inspires the concept of an AI radio access network (AI-RAN). So far, however, most studies are simulation-based, and little is known about the implications of real-time inference latency for the proposed solutions.

The latency and throughput requirements of wireless communication systems impose strict constraints on the neural network (NN) design, effectively limiting their size and depth. It is thus an open and interesting challenge to deploy and validate AI components in the physical layer of an actual cellular system under realistic latency restrictions.

This post discusses the opportunities and challenges associated with deploying NN-based receiver components in the physical layer of the future AI-RAN. We present an optimized neural network architecture and the necessary toolchain to enable real-time inference. Additionally, we discuss the potential for site-specific training and the concept of pilotless communications through end-to-end learning, offering insights into possible research directions for 6G.

NVIDIA opens its research lab 

NVIDIA has developed a research prototype of a neural network-based wireless receiver that replaces parts of the physical layer signal processing by learned components. Special emphasis has been placed on the ability of the neural network architecture to perform real-time inference. For details, see A Neural Receiver for 5G NR Multi-user MIMO.

To empower AI-RAN researchers and engineers, NVIDIA has released the research code, which provides the entire toolchain required to design, train, and evaluate NN-based receivers. Real-time inference is enabled through NVIDIA TensorRT on GPU-accelerated hardware platforms. As such, NVIDIA offers a unique software and hardware stack for a seamless transition from conceptual prototyping in NVIDIA Sionna, through early field evaluations using TensorRT, to commercial-grade deployment in NVIDIA Aerial.

Parts of the project have already been showcased, including hardware-in-the-loop verification of neural receivers, site-specific training, and end-to-end learning.

From handcrafted signal processing blocks to neural receivers

Neural receivers (NRX) are based on the idea of training a single NN to jointly perform channel estimation, equalization, and demapping (Figure 1). The NN is trained to estimate the transmitted bits from the channel observations and can be used as a drop-in replacement for existing signal processing algorithms. For more details and performance evaluations of the NRX concept, see Towards Environment-Specific Base Stations: AI/ML-driven Neural 5G NR Multi-user MIMO Receiver.

Figure 1. Sending and receiving bits of information: a classical receiver chain compared with a neural receiver that replaces channel estimation, equalization, and demapping with a single neural network
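
The following is a minimal, illustrative TensorFlow/Keras sketch of this idea: a small convolutional network that maps the received resource grid directly to per-bit log-likelihood ratios (LLRs). All shapes, layer choices, and names are assumptions made for illustration; the released NRX architecture is considerably more elaborate.

```python
import tensorflow as tf

def build_toy_neural_receiver(num_symbols=14, num_subcarriers=76,
                              num_rx_ant=2, num_bits_per_symbol=4,
                              depth=4, num_filters=64):
    """Toy stand-in for a neural receiver: maps the received resource grid
    directly to per-bit LLRs. Illustrative only; the released NRX
    architecture is more elaborate."""
    # Received resource grid with real/imag parts stacked along the channel axis
    y = tf.keras.Input(shape=(num_symbols, num_subcarriers, 2 * num_rx_ant), name="rx_grid")

    z = tf.keras.layers.Conv2D(num_filters, 3, padding="same", activation="relu")(y)
    for _ in range(depth):  # depth controls complexity and, hence, inference latency
        skip = z
        z = tf.keras.layers.SeparableConv2D(num_filters, 3, padding="same", activation="relu")(z)
        z = tf.keras.layers.SeparableConv2D(num_filters, 3, padding="same")(z)
        z = tf.keras.layers.Add()([z, skip])  # residual connection

    # One LLR per transmitted bit on every resource element
    llr = tf.keras.layers.Conv2D(num_bits_per_symbol, 1, name="llr")(z)
    return tf.keras.Model(y, llr, name="toy_nrx")

model = build_toy_neural_receiver()
model.summary()
```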

From an algorithmic point of view, the NRX is primarily defined by tensor operations, including matrix multiplications and convolutions. As with many AI applications, these operations can be significantly accelerated using NVIDIA hardware. Further, the extensive NVIDIA ecosystem of profiling and optimization tools enables refining the NRX architecture, effectively eliminating performance bottlenecks. The resulting NRX architecture achieves an inference latency of less than 1 ms on an NVIDIA A100 GPU using the NVIDIA TensorRT inference library.

5G NR standard compliance and reconfiguration

Although the NRX concept is rather simple, its integration into the 5G NR standard comes with several engineering challenges that need to be addressed (Figure 2). As the network configuration in a practical setup may change dynamically within milliseconds, the proposed NRX architecture is adaptive and capable of supporting different modulation and coding schemes (MCS) without the need for any retraining and without introducing any additional inference complexity.

Furthermore, arbitrary numbers of subcarriers are supported, and multi-user MIMO with a varying number of active users is possible. Another important aspect for practical deployment is the capability to deal with 5G NR-compliant reference signals.

Figure 2. Key capabilities of the neural receiver architecture: 5G NR compliance, dynamic MCS reconfiguration, adaptive multi-user MIMO, and flexible PRB allocations without any retraining, plus site-specific fine-tuning to further optimize performance after deployment
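
Purely as an illustration of how MCS flexibility can be obtained without retraining (the linked paper describes the exact mechanism used by the NRX), one option is to always predict LLRs for the highest supported modulation order and keep only the bits that the active MCS actually uses:

```python
import tensorflow as tf

MAX_BITS_PER_SYMBOL = 8  # up to 256-QAM

def select_llrs(llr_full, num_bits_per_symbol):
    """Keep only the LLRs required by the active modulation order.

    llr_full: [..., MAX_BITS_PER_SYMBOL] LLRs predicted once by the network.
    num_bits_per_symbol: 2 for QPSK, 4 for 16-QAM, 6 for 64-QAM, ...
    Switching the MCS then requires neither retraining nor a new network.
    """
    return llr_full[..., :num_bits_per_symbol]

# The same network output can serve users with different MCS
llr_full = tf.random.normal([14, 76, MAX_BITS_PER_SYMBOL])
llr_qpsk = select_llrs(llr_full, 2)   # shape [14, 76, 2]
llr_64qam = select_llrs(llr_full, 6)  # shape [14, 76, 6]
```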

To ensure resilience of the NRX to unseen channel conditions, training is conducted in the urban microcell (UMi) scenario from 3GPP 38.901 with randomized macro-parameters such as the signal-to-noise ratio (SNR), Doppler spreads, and the number of active users. This allows for pre-training a robust and universal NRX that generalizes to a wide variety of radio environments.
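
The randomized training procedure can be pictured roughly as follows. This is a hedged sketch: `simulate_umi_transmission` and `neural_receiver` are hypothetical placeholders for the Sionna-based link simulation and the receiver model in the released code, and the parameter ranges are examples only.

```python
import numpy as np
import tensorflow as tf

# Hypothetical placeholders for the Sionna-based link simulation and the NRX model
from my_link_sim import simulate_umi_transmission  # returns (rx_grid, bits) for one batch
from my_nrx import neural_receiver                 # Keras model producing per-bit LLRs

optimizer = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

for step in range(100_000):
    # Randomize the macro-parameters of the 3GPP 38.901 UMi scenario for every batch
    ebno_db = np.random.uniform(-2.0, 15.0)   # SNR
    speed = np.random.uniform(0.0, 30.0)      # user speed in m/s, which sets the Doppler spread
    num_users = np.random.randint(1, 5)       # number of active MU-MIMO users

    rx_grid, bits = simulate_umi_transmission(ebno_db, speed, num_users)
    with tf.GradientTape() as tape:
        llr = neural_receiver(rx_grid, training=True)
        loss = bce(bits, llr)                 # binary cross-entropy on the transmitted bits
    grads = tape.gradient(loss, neural_receiver.trainable_variables)
    optimizer.apply_gradients(zip(grads, neural_receiver.trainable_variables))
```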

As the NRX is software-defined, site-specific fine-tuning unlocks continuous improvements of the receiver even after deployment. A subsequent section of this post provides a detailed fine-tuning example using channel data obtained by ray tracing the radio environment, that is, a digital twin of the deployment site. For more technical details, see the jumpstart tutorial and the neural receiver architecture overview notebook.

Performance evaluation under real-time constraints

As discussed previously, deploying AI algorithms comes with strict real-time constraints, and even robust NRX architectures may become impractical unless they operate within the required latency. In other words, the optimal network for deployment is not necessarily the one with the best error-rate performance, but rather the one that delivers the best accuracy within a defined computing latency budget.

Estimating the inference latency of a given neural network architecture is a complex task, as the results depend heavily on the targeted hardware platform, the specific software stack, and the extent of code optimization. Therefore, metrics like the number of floating-point operations (FLOPs), weights, or layers are often used as proxies for a model’s computational complexity. However, these metrics may be misleading due to the high degree of parallelism and potential memory bottlenecks during inference. Hence, we deploy the NRX using the TensorRT inference library on the targeted NVIDIA A100 GPU. This ensures realistic latency measurements, and the profiler helps eliminate bottlenecks on the critical path. 

After training in TensorFlow, we export the trained model as an ONNX file and build a TensorRT inference engine. TensorRT automatically optimizes the inference of the neural network for the target platform and, if required, provides detailed profiling outputs. An example is provided in the real-time tutorial notebook.
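
A sketch of this export and engine-building step is shown below, assuming the tf2onnx converter and the TensorRT 8.x-style Python API; the model checkpoint, file names, and input shape are placeholders, and API details may differ between TensorRT versions.

```python
import tensorflow as tf
import tf2onnx
import tensorrt as trt

# 1) Export the trained Keras receiver to ONNX (checkpoint path and input shape are placeholders)
model = tf.keras.models.load_model("nrx_trained")
spec = (tf.TensorSpec((1, 14, 76, 4), tf.float32, name="rx_grid"),)
tf2onnx.convert.from_keras(model, input_signature=spec, output_path="nrx.onnx")

# 2) Build a TensorRT engine from the ONNX file
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("nrx.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # mixed precision typically reduces latency
engine = builder.build_serialized_network(network, config)
with open("nrx.plan", "wb") as f:
    f.write(engine)

# For quick latency profiling, the trtexec command-line tool can also be used, e.g.,
#   trtexec --onnx=nrx.onnx --fp16
```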

As expected, the computational complexity is heavily influenced by the 5G system configuration, including parameters like the number of allocated subcarriers and active users. The NRX architecture is designed and trained with a configurable network depth, enabling control of the computational latency after training. With this flexibility, the NRX can be easily reconfigured once the targeted hardware platform or system parameters change. 
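
In pseudocode, selecting a deployment configuration under a latency budget could look like the snippet below; `build_and_deploy_nrx` and `measure_latency_ms` are hypothetical helpers standing in for the engine-building and profiling steps described above.

```python
LATENCY_BUDGET_MS = 1.0  # real-time budget on the target GPU

best_depth = None
for depth in range(2, 12, 2):                 # candidate network depths
    engine = build_and_deploy_nrx(depth)      # hypothetical: export and build as above
    latency = measure_latency_ms(engine)      # hypothetical: TensorRT profiling on the A100
    if latency <= LATENCY_BUDGET_MS:
        best_depth = depth                    # deeper networks are more accurate, so keep
                                              # the deepest one that still fits the budget

print(f"Selected depth: {best_depth}")
```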

Figure 3 shows the performance evaluation of the NRX executed on an NVIDIA A100 GPU using TensorRT. The performance under real-time constraints differs from that of the computationally unrestricted version of the network. However, we’d like to emphasize that even under real-time constraints, the NRX remains competitive with, and in many cases outperforms, classical receiver algorithms.

Figure 3. Performance evaluation of the NRX for varying network depth and, consequently, inference latency; meeting the sub-1 ms real-time requirement incurs a degradation of approximately 0.7 dB compared to the unconstrained receiver

Beyond classical algorithms: site-specific fine-tuning

An intriguing feature of AI-RAN components is their ability to undergo site-specific fine-tuning, which enables the refinement of neural network weights even after deployment. This fine-tuning relies on two key enablers: 

  • AI-based algorithms such as the NRX
  • Software-defined RANs that facilitate the extraction of training data while the system is actively in use 

Once the data is collected, the training can be conducted either locally or offline in the cloud. 

To demonstrate site-specific fine-tuning of the neural receiver, we sampled a training dataset of 1,000 random user positions and velocities across the entire scene using the Sionna ray tracer. Figure 4 shows the user positions used for the performance evaluation of the fine-tuned receiver. The red dot indicates the position of the base station; the gray line represents the user trajectories used for evaluation. New scenes can be loaded directly from OpenStreetMap.

Figure 4. Munich environment used for site-specific fine-tuning and evaluation of the NRX, with a coverage-map overlay showing the received signal strength for each user position
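
The dataset generation can be sketched as follows with the Sionna ray tracer. This assumes the Sionna 0.x RT API (`load_scene`, `compute_paths`); antenna configurations, positions, and the sampling region are illustrative and differ from the released code.

```python
import numpy as np
import sionna
from sionna.rt import load_scene, PlanarArray, Transmitter, Receiver

# Load the built-in Munich scene (new scenes can be created from OpenStreetMap data)
scene = load_scene(sionna.rt.scene.munich)
scene.tx_array = PlanarArray(num_rows=1, num_cols=1, vertical_spacing=0.5,
                             horizontal_spacing=0.5, pattern="tr38901", polarization="V")
scene.rx_array = PlanarArray(num_rows=1, num_cols=1, vertical_spacing=0.5,
                             horizontal_spacing=0.5, pattern="iso", polarization="V")
scene.add(Transmitter(name="tx", position=[8.5, 21.0, 27.0]))  # base station (illustrative position)

cirs = []
for i in range(1000):                                    # 1,000 random user drops
    position = [np.random.uniform(-100.0, 100.0),        # illustrative sampling region
                np.random.uniform(-100.0, 100.0), 1.5]
    scene.add(Receiver(name="rx", position=position))
    paths = scene.compute_paths(max_depth=5)             # ray tracing
    # User velocities can additionally be accounted for via paths.apply_doppler(...)
    a, tau = paths.cir()                                  # channel impulse response
    cirs.append((a.numpy(), tau.numpy()))
    scene.remove("rx")
```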

As the fine-tuning starts from the pre-trained receiver network weights, it only takes a small number of training steps and moderate computing resources. Note that the NRX architecture itself remains unchanged. Figure 5 shows that just one minute of fine-tuning on a single GPU already substantially improves the error-rate performance in the specific radio environment. Site-specific training adapts a smaller NRX to its radio environment so that it performs at the level of a 4x larger, universally pre-trained NRX. This saves a significant amount of compute during inference while maintaining superior error-rate performance.

Figure 5. SNR performance improvement of up to 2.2 dB through site-specific receiver fine-tuning using a fixed dataset of only 1,000 samples
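
A sketch of such a fine-tuning run is shown below, again with hypothetical names: `site_dataset` stands for the ray-traced training data described above, and the checkpoint paths and learning rate are placeholders.

```python
import tensorflow as tf

nrx = tf.keras.models.load_model("nrx_pretrained")    # start from the universal weights (placeholder path)
optimizer = tf.keras.optimizers.Adam(1e-4)            # small learning rate for fine-tuning
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# site_dataset is a hypothetical tf.data.Dataset of (rx_grid, bits) pairs
# generated with the ray tracer for the specific deployment site
for rx_grid, bits in site_dataset.repeat().take(1_000):
    with tf.GradientTape() as tape:
        loss = bce(bits, nrx(rx_grid, training=True))
    grads = tape.gradient(loss, nrx.trainable_variables)
    optimizer.apply_gradients(zip(grads, nrx.trainable_variables))

nrx.save("nrx_site_specific")  # the architecture is unchanged, only the weights are refined
```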

It is a unique capability of the AI-enabled RAN to continuously adapt to the actual RF environment. As such, we envision fully software-defined and AI-driven next generation base stations that improve even after deployment. 

Moving from 5G compliance to 6G research

Finally, we’d like to emphasize that neural receivers are not only a powerful replacement for existing receiver algorithms. They are a key enabler for a host of novel features such as pilotless communications using end-to-end learning and site-specific retraining after deployment. 

Figure 6 illustrates the end-to-end learning approach where the NRX is extended by a trainable custom constellation that can be used instead of the traditional quadrature amplitude modulation (QAM). 

Figure 6. End-to-end learning of a pilotless communication scheme by extending the NRX with a trainable custom constellation

The combination of a trainable custom constellation with a pilot-free slot structure forces the NRX to learn the signal reconstruction without relying on any reference signals. Intuitively, the NRX learns new constellations that implicitly contain some form of superimposed piloting scheme, which can be exploited for joint channel estimation and equalization. After training, the resulting scheme shows a similar error-rate performance compared to the classical 5G system, but benefits from a higher data rate as the pilot overhead is completely removed. Further details can be found in the end-to-end learning notebook.
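
The snippet below sketches the core of such an end-to-end training loop, assuming the Sionna 0.x `Constellation`/`Mapper` API; `channel` and `neural_receiver` are hypothetical placeholders for the pilot-free link simulation and the NRX, and all hyperparameters are examples.

```python
import tensorflow as tf
from sionna.mapping import Constellation, Mapper

# Trainable constellation, initialized as 16-QAM (Sionna 0.x API)
constellation = Constellation("qam", num_bits_per_symbol=4, trainable=True)
mapper = Mapper(constellation=constellation)

optimizer = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
batch_size, num_bits = 128, 1024

for step in range(10_000):
    bits = tf.cast(tf.random.uniform([batch_size, num_bits], maxval=2, dtype=tf.int32), tf.float32)
    with tf.GradientTape() as tape:
        x = mapper(bits)           # trainable constellation instead of fixed QAM
        y = channel(x)             # hypothetical pilot-free link simulation
        llr = neural_receiver(y)   # hypothetical NRX producing per-bit LLRs
        loss = bce(bits, llr)
    # Gradients flow through both the receiver weights and the constellation points
    weights = tape.watched_variables()
    grads = tape.gradient(loss, weights)
    optimizer.apply_gradients(zip(grads, weights))
```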

Although the resulting constellations are not compliant with the 5G NR standard, they are indicators of how AI may enable novel 6G features for higher reliability and increased throughput. To learn more, visit NVlabs/neural_rx on GitHub.

Acknowledgments

This work has received financial support from the European Union under Grant Agreement 101096379 (CENTRIC). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission (granting authority). Neither the European Union nor the granting authority can be held responsible for them.

