
Real-Time Neural Receivers Drive AI-RAN Innovation

Today’s 5G New Radio (5G NR) wireless communication systems rely on highly optimized signal processing algorithms to reconstruct transmitted messages from noisy channel observations in mere microseconds. This remarkable achievement is the result of decades of relentless effort by telecommunications engineers and researchers, who have continuously improved signal processing algorithms to meet the demanding real-time constraints of wireless communications.

Some algorithms were even largely forgotten because their complexity was prohibitive at the time of their discovery. The low-density parity-check (LDPC) codes introduced by Gallager in the 1960s are a notable example. Rediscovered by David MacKay in the 1990s, they have since become a backbone of 5G NR. This case illustrates that even the best algorithms are impractical unless they meet the stringent computational and latency requirements of telecommunications.

AI for wireless communications has received a lot of attention from researchers in academia and industry, as discussed in An Introduction to Deep Learning for the Physical Layer and An Overview of the 3GPP Study on Artificial Intelligence for 5G New Radio. It is increasingly acknowledged that AI has the potential to offer better reliability and accuracy than many traditional physical layer algorithms, which inspires the concept of an AI radio access network (AI-RAN). So far, however, most studies are simulation-based, and little is known about the implications of real-time inference latency for the proposed solutions.

The latency and throughput requirements of wireless communication systems impose strict constraints on the neural network (NN) design, effectively limiting their size and depth. It is thus an open and interesting challenge to deploy and validate AI components in the physical layer of an actual cellular system under realistic latency restrictions.

This post discusses the opportunities and challenges associated with deploying NN-based receiver components in the physical layer of the future AI-RAN. We present an optimized neural network architecture and the necessary toolchain to enable real-time inference. Additionally, we discuss the potential for site-specific training and the concept of pilotless communications through end-to-end learning, offering insights into possible research directions for 6G.

NVIDIA opens its research lab 

NVIDIA has developed a research prototype of a neural network-based wireless receiver that replaces parts of the physical layer signal processing by learned components. Special emphasis has been placed on the ability of the neural network architecture to perform real-time inference. For details, see A Neural Receiver for 5G NR Multi-user MIMO.

To empower AI-RAN researchers and engineers, NVIDIA has released the research code, which provides the entire toolchain required to design, train, and evaluate NN-based receivers. Real-time inference is enabled through NVIDIA TensorRT on GPU-accelerated hardware platforms. As such, NVIDIA offers a unique software and hardware stack for a seamless transition from conceptual prototyping in NVIDIA Sionna, through early field evaluations using TensorRT, to commercial-grade deployment in NVIDIA Aerial.

Parts of the project have already been showcased, including hardware-in-the-loop verification of neural receivers, site-specific training, and end-to-end learning.

From handcrafted signal processing blocks to neural receivers

Neural receivers (NRX) are based on the idea of training a single NN to jointly perform channel estimation, equalization, and demapping (Figure 1). The NN is trained to estimate the transmitted bits from the channel observations and can be used as a drop-in replacement for existing signal processing algorithms. For more details and performance evaluations of the NRX concept, see Towards Environment-Specific Base Stations: AI/ML-driven Neural 5G NR Multi-user MIMO Receiver.

Figure 1. Sending and receiving bits of information: a classical receiver chain compared with a neural receiver that replaces channel estimation, equalization, and demapping with a single neural network
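
The following is a minimal, illustrative TensorFlow/Keras sketch of this idea: a small convolutional network that maps the received resource grid directly to per-bit log-likelihood ratios (LLRs). All shapes, layer choices, and names are assumptions made for illustration; the released NRX architecture is considerably more elaborate.

```python
import tensorflow as tf

def build_toy_neural_receiver(num_symbols=14, num_subcarriers=76,
                              num_rx_ant=2, num_bits_per_symbol=4,
                              depth=4, num_filters=64):
    """Toy stand-in for a neural receiver: maps the received resource grid
    directly to per-bit LLRs. Illustrative only; the released NRX
    architecture is more elaborate."""
    # Received resource grid with real/imag parts stacked along the channel axis
    y = tf.keras.Input(shape=(num_symbols, num_subcarriers, 2 * num_rx_ant), name="rx_grid")

    z = tf.keras.layers.Conv2D(num_filters, 3, padding="same", activation="relu")(y)
    for _ in range(depth):  # depth controls complexity and, hence, inference latency
        skip = z
        z = tf.keras.layers.SeparableConv2D(num_filters, 3, padding="same", activation="relu")(z)
        z = tf.keras.layers.SeparableConv2D(num_filters, 3, padding="same")(z)
        z = tf.keras.layers.Add()([z, skip])  # residual connection

    # One LLR per transmitted bit on every resource element
    llr = tf.keras.layers.Conv2D(num_bits_per_symbol, 1, name="llr")(z)
    return tf.keras.Model(y, llr, name="toy_nrx")

model = build_toy_neural_receiver()
model.summary()
```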

From an algorithmic point of view, the NRX is primarily defined by tensor operations, including matrix multiplications and convolutions. As with many AI applications, these operations can be significantly accelerated using NVIDIA hardware. Further, the extensive NVIDIA ecosystem of profiling and optimization tools enables refining the NRX architecture, effectively eliminating performance bottlenecks. The resulting NRX architecture achieves an inference latency of less than 1 ms on an NVIDIA A100 GPU using the NVIDIA TensorRT inference library.

5G NR standard compliance and reconfiguration

Although the NRX concept is rather simple, its integration into the 5G NR standard comes with several engineering challenges that need to be addressed (Figure 2). As the network configuration in a practical setup may change dynamically within milliseconds, the proposed NRX architecture is adaptive and capable of supporting different modulation and coding schemes (MCS) without the need for any retraining and without introducing any additional inference complexity.

Furthermore, arbitrary numbers of subcarriers are supported, and multi-user MIMO with a varying number of active users is possible. Another important aspect for practical deployment is the capability to deal with 5G NR-compliant reference signals.

Figure 2. Key capabilities of the neural receiver architecture: 5G NR compliance, dynamic MCS reconfiguration, adaptive multi-user MIMO, and flexible PRB allocations without any retraining, plus site-specific fine-tuning to further optimize performance after deployment
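
Purely as an illustration of how MCS flexibility can be obtained without retraining (the linked paper describes the exact mechanism used by the NRX), one option is to always predict LLRs for the highest supported modulation order and keep only the bits that the active MCS actually uses:

```python
import tensorflow as tf

MAX_BITS_PER_SYMBOL = 8  # up to 256-QAM

def select_llrs(llr_full, num_bits_per_symbol):
    """Keep only the LLRs required by the active modulation order.

    llr_full: [..., MAX_BITS_PER_SYMBOL] LLRs predicted once by the network.
    num_bits_per_symbol: 2 for QPSK, 4 for 16-QAM, 6 for 64-QAM, ...
    Switching the MCS then requires neither retraining nor a new network.
    """
    return llr_full[..., :num_bits_per_symbol]

# The same network output can serve users with different MCS
llr_full = tf.random.normal([14, 76, MAX_BITS_PER_SYMBOL])
llr_qpsk = select_llrs(llr_full, 2)   # shape [14, 76, 2]
llr_64qam = select_llrs(llr_full, 6)  # shape [14, 76, 6]
```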

To ensure resilience of the NRX to unseen channel conditions, training is conducted in the urban microcell (UMi) scenario from 3GPP 38.901 with randomized macro-parameters such as the signal-to-noise ratio (SNR), Doppler spreads, and the number of active users. This allows for pre-training a robust and universal NRX that generalizes to a wide variety of radio environments.
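
The randomized training procedure can be pictured roughly as follows. This is a hedged sketch: `simulate_umi_transmission` and `neural_receiver` are hypothetical placeholders for the Sionna-based link simulation and the receiver model in the released code, and the parameter ranges are examples only.

```python
import numpy as np
import tensorflow as tf

# Hypothetical placeholders for the Sionna-based link simulation and the NRX model
from my_link_sim import simulate_umi_transmission  # returns (rx_grid, bits) for one batch
from my_nrx import neural_receiver                 # Keras model producing per-bit LLRs

optimizer = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

for step in range(100_000):
    # Randomize the macro-parameters of the 3GPP 38.901 UMi scenario for every batch
    ebno_db = np.random.uniform(-2.0, 15.0)   # SNR
    speed = np.random.uniform(0.0, 30.0)      # user speed in m/s, which sets the Doppler spread
    num_users = np.random.randint(1, 5)       # number of active MU-MIMO users

    rx_grid, bits = simulate_umi_transmission(ebno_db, speed, num_users)
    with tf.GradientTape() as tape:
        llr = neural_receiver(rx_grid, training=True)
        loss = bce(bits, llr)                 # binary cross-entropy on the transmitted bits
    grads = tape.gradient(loss, neural_receiver.trainable_variables)
    optimizer.apply_gradients(zip(grads, neural_receiver.trainable_variables))
```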

As the NRX is software-defined, site-specific fine-tuning unlocks continuous improvements of the receiver even after deployment. A subsequent section of this post provides a detailed fine-tuning example using channel data obtained by ray tracing the radio environment, that is, a digital twin of the deployment site. For more technical details, see the jumpstart tutorial and the neural receiver architecture overview notebook.

Performance evaluation under real-time constraints

As discussed previously, deploying AI algorithms comes with strict real-time constraints, and even robust NRX architectures may become impractical unless they operate within the required latency. In other words, the optimal network for deployment is not necessarily the one with the best error-rate performance, but rather the one that delivers the best accuracy within a defined computing latency budget.

Estimating the inference latency of a given neural network architecture is a complex task, as the results depend heavily on the targeted hardware platform, the specific software stack, and the extent of code optimization. Therefore, metrics like the number of floating-point operations (FLOPs), weights, or layers are often used as proxies for a model’s computational complexity. However, these metrics may be misleading due to the high degree of parallelism and potential memory bottlenecks during inference. Hence, we deploy the NRX using the TensorRT inference library on the targeted NVIDIA A100 GPU. This ensures realistic latency measurements, and the profiler helps eliminate bottlenecks on the critical path. 

After training in TensorFlow, we export the trained model as an ONNX file and build a TensorRT inference engine. TensorRT automatically optimizes the inference of the neural network for the target platform and, if required, provides detailed profiling outputs. An example is provided in the real-time tutorial notebook.
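
A sketch of this export and engine-building step is shown below, assuming the tf2onnx converter and the TensorRT 8.x-style Python API; the model checkpoint, file names, and input shape are placeholders, and API details may differ between TensorRT versions.

```python
import tensorflow as tf
import tf2onnx
import tensorrt as trt

# 1) Export the trained Keras receiver to ONNX (checkpoint path and input shape are placeholders)
model = tf.keras.models.load_model("nrx_trained")
spec = (tf.TensorSpec((1, 14, 76, 4), tf.float32, name="rx_grid"),)
tf2onnx.convert.from_keras(model, input_signature=spec, output_path="nrx.onnx")

# 2) Build a TensorRT engine from the ONNX file
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("nrx.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # mixed precision typically reduces latency
engine = builder.build_serialized_network(network, config)
with open("nrx.plan", "wb") as f:
    f.write(engine)

# For quick latency profiling, the trtexec command-line tool can also be used, e.g.,
#   trtexec --onnx=nrx.onnx --fp16
```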

As expected, the computational complexity is heavily influenced by the 5G system configuration, including parameters like the number of allocated subcarriers and active users. The NRX architecture is designed and trained with a configurable network depth, enabling control of the computational latency after training. With this flexibility, the NRX can be easily reconfigured once the targeted hardware platform or system parameters change. 
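
In pseudocode, selecting a deployment configuration under a latency budget could look like the snippet below; `build_and_deploy_nrx` and `measure_latency_ms` are hypothetical helpers standing in for the engine-building and profiling steps described above.

```python
LATENCY_BUDGET_MS = 1.0  # real-time budget on the target GPU

best_depth = None
for depth in range(2, 12, 2):                 # candidate network depths
    engine = build_and_deploy_nrx(depth)      # hypothetical: export and build as above
    latency = measure_latency_ms(engine)      # hypothetical: TensorRT profiling on the A100
    if latency <= LATENCY_BUDGET_MS:
        best_depth = depth                    # deeper networks are more accurate, so keep
                                              # the deepest one that still fits the budget

print(f"Selected depth: {best_depth}")
```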

Figure 3 shows the performance evaluation of the NRX executed on an NVIDIA A100 GPU using TensorRT. The performance under real-time constraints differs from that of the computationally unrestricted version of the network. However, we’d like to emphasize that even under real-time constraints, the NRX remains competitive with, and in many cases outperforms, classical receiver algorithms.

Figure 3. Performance evaluation of the NRX for varying network depth and, consequently, inference latency; meeting the sub-1 ms real-time requirement incurs a degradation of approximately 0.7 dB compared to the unconstrained receiver

Beyond classical algorithms: site-specific fine-tuning

An intriguing feature of AI-RAN components is their ability to undergo site-specific fine-tuning, which enables the refinement of neural network weights even after deployment. This fine-tuning relies on two key enablers: 

  • AI-based algorithms such as the NRX
  • Software-defined RANs that facilitate the extraction of training data while the system is actively in use 

Once the data is collected, the training can be conducted either locally or offline in the cloud. 

To demonstrate site-specific fine-tuning of the neural receiver, we sampled a training dataset of 1,000 random user positions and velocities across the entire scene using the Sionna ray tracer. Figure 4 shows the user positions used for the performance evaluation of the fine-tuned receiver. The red dot indicates the position of the base station; the gray line represents the user trajectories used for evaluation. New scenes can be loaded directly from OpenStreetMap.

Figure 4. Munich environment used for site-specific fine-tuning and evaluation of the NRX, with a coverage-map overlay showing the received signal strength for each user position
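
The dataset generation can be sketched as follows with the Sionna ray tracer. This assumes the Sionna 0.x RT API (`load_scene`, `compute_paths`); antenna configurations, positions, and the sampling region are illustrative and differ from the released code.

```python
import numpy as np
import sionna
from sionna.rt import load_scene, PlanarArray, Transmitter, Receiver

# Load the built-in Munich scene (new scenes can be created from OpenStreetMap data)
scene = load_scene(sionna.rt.scene.munich)
scene.tx_array = PlanarArray(num_rows=1, num_cols=1, vertical_spacing=0.5,
                             horizontal_spacing=0.5, pattern="tr38901", polarization="V")
scene.rx_array = PlanarArray(num_rows=1, num_cols=1, vertical_spacing=0.5,
                             horizontal_spacing=0.5, pattern="iso", polarization="V")
scene.add(Transmitter(name="tx", position=[8.5, 21.0, 27.0]))  # base station (illustrative position)

cirs = []
for i in range(1000):                                    # 1,000 random user drops
    position = [np.random.uniform(-100.0, 100.0),        # illustrative sampling region
                np.random.uniform(-100.0, 100.0), 1.5]
    scene.add(Receiver(name="rx", position=position))
    paths = scene.compute_paths(max_depth=5)             # ray tracing
    # User velocities can additionally be accounted for via paths.apply_doppler(...)
    a, tau = paths.cir()                                  # channel impulse response
    cirs.append((a.numpy(), tau.numpy()))
    scene.remove("rx")
```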

As the fine-tuning starts from the pre-trained receiver network weights, it only takes a small number of training steps and moderate computing resources. Note that the NRX architecture itself remains unchanged. Figure 5 shows that just one minute of fine-tuning on a single GPU already substantially improves the error-rate performance in the specific radio environment. Site-specific training adapts a smaller NRX to its radio environment so that it performs at the level of a 4x larger, universally pre-trained NRX. This saves a significant amount of compute during inference while maintaining superior error-rate performance.

Figure 5. SNR performance improvement of up to 2.2 dB through site-specific receiver fine-tuning using a fixed dataset of only 1,000 samples
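
A sketch of such a fine-tuning run is shown below, again with hypothetical names: `site_dataset` stands for the ray-traced training data described above, and the checkpoint paths and learning rate are placeholders.

```python
import tensorflow as tf

nrx = tf.keras.models.load_model("nrx_pretrained")    # start from the universal weights (placeholder path)
optimizer = tf.keras.optimizers.Adam(1e-4)            # small learning rate for fine-tuning
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# site_dataset is a hypothetical tf.data.Dataset of (rx_grid, bits) pairs
# generated with the ray tracer for the specific deployment site
for rx_grid, bits in site_dataset.repeat().take(1_000):
    with tf.GradientTape() as tape:
        loss = bce(bits, nrx(rx_grid, training=True))
    grads = tape.gradient(loss, nrx.trainable_variables)
    optimizer.apply_gradients(zip(grads, nrx.trainable_variables))

nrx.save("nrx_site_specific")  # the architecture is unchanged, only the weights are refined
```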

It is a unique capability of the AI-enabled RAN to continuously adapt to the actual RF environment. As such, we envision fully software-defined and AI-driven next generation base stations that improve even after deployment. 

Moving from 5G compliance to 6G research

Finally, we’d like to emphasize that neural receivers are not only a powerful replacement for existing receiver algorithms. They are a key enabler for a host of novel features such as pilotless communications using end-to-end learning and site-specific retraining after deployment. 

Figure 6 illustrates the end-to-end learning approach where the NRX is extended by a trainable custom constellation that can be used instead of the traditional quadrature amplitude modulation (QAM). 

Figure 6. End-to-end learning of a pilotless communication scheme by extending the NRX with a trainable custom constellation

The combination of a trainable custom constellation with a pilot-free slot structure forces the NRX to learn the signal reconstruction without relying on any reference signals. Intuitively, the NRX learns new constellations that implicitly contain some form of superimposed piloting scheme, which can be exploited for joint channel estimation and equalization. After training, the resulting scheme shows a similar error-rate performance compared to the classical 5G system, but benefits from a higher data rate as the pilot overhead is completely removed. Further details can be found in the end-to-end learning notebook.
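
The snippet below sketches the core of such an end-to-end training loop, assuming the Sionna 0.x `Constellation`/`Mapper` API; `channel` and `neural_receiver` are hypothetical placeholders for the pilot-free link simulation and the NRX, and all hyperparameters are examples.

```python
import tensorflow as tf
from sionna.mapping import Constellation, Mapper

# Trainable constellation, initialized as 16-QAM (Sionna 0.x API)
constellation = Constellation("qam", num_bits_per_symbol=4, trainable=True)
mapper = Mapper(constellation=constellation)

optimizer = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
batch_size, num_bits = 128, 1024

for step in range(10_000):
    bits = tf.cast(tf.random.uniform([batch_size, num_bits], maxval=2, dtype=tf.int32), tf.float32)
    with tf.GradientTape() as tape:
        x = mapper(bits)           # trainable constellation instead of fixed QAM
        y = channel(x)             # hypothetical pilot-free link simulation
        llr = neural_receiver(y)   # hypothetical NRX producing per-bit LLRs
        loss = bce(bits, llr)
    # Gradients flow through both the receiver weights and the constellation points
    weights = tape.watched_variables()
    grads = tape.gradient(loss, weights)
    optimizer.apply_gradients(zip(grads, weights))
```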

Although the resulting constellations are not compliant with the 5G NR standard, they are indicators of how AI may enable novel 6G features for higher reliability and increased throughput. To learn more, visit NVlabs/neural_rx on GitHub.

Acknowledgments

This work has received financial support from the European Union under Grant Agreement 101096379 (CENTRIC). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission (granting authority). Neither the European Union nor the granting authority can be held responsible for them.

