The telecommunications industry is innovating rapidly toward 6G for both AI-native Radio Access Networks (AI-RAN) and AI-Core. The distributed User Plane Function (dUPF) brings compute closer to the network edge through decentralized packet processing and routing, enabling ultra-low latency, high throughput, and the seamless integration of distributed AI workloads. dUPF is becoming a crucial component in the evolution of mobile networks into foundational AI infrastructure.
This post explores the architectural advantages of dUPF at the telecom edge for enabling agentic AI applications. It features a reference implementation of a dUPF user plane application built with NVIDIA DOCA Flow to leverage hardware-accelerated packet steering and processing. The demonstration highlights how the NVIDIA accelerated compute platform enables energy-efficient, low-latency user plane operations, reinforcing the essential role of dUPF in the 6G AI-Native Wireless Networks Initiative (AI-WIN) full-stack architecture.
What is dUPF?
dUPF is a 3GPP 5G core network function that handles user plane packet processing at distributed locations, as defined in Section 6.2.5 of the 3GPP 5G core architecture and Section 4.2 of the 3GPP 5G Mobile Edge Computing (MEC) architecture. dUPF moves user data processing closer to users and radio nodes. Unlike traditional centralized UPFs, which add latency due to long backhaul routes, dUPF handles traffic at the network edge, enabling real-time applications and local breakout of AI traffic through AI-specific local data networks (AI-DN), as shown in Figure 1.

How does dUPF work in the 6G AI-centric network?
6G aims to transform telecom operators into critical AI infrastructure, hosting AI factories and distributing AI inference as an AI grid. dUPF is a crucial part of this vision, enabling 6G distributed edge agentic AI and local breakout (LBO).
Next-generation applications like video search and summarization (VSS), XR, gaming, and industrial automation demand real-time, autonomous intelligence at the network edge, which traditional centralized wireless core architectures cannot provide.
Bringing user plane processing this close to users and radio nodes offers several benefits:
- Ultra-low latency: Enables immediate responsiveness for mission-critical 6G use cases.
- Efficient data handling: Processes local data at the source, reducing latency and optimizing network resources.
- Enhanced data privacy and security: Localized processing minimizes sensitive data exposure, fostering trust.
- Decentralized compute for resilient AI: Distributes AI workloads, creating a robust, resilient infrastructure and eliminating single points of failure.
What are the benefits of dUPF on NVIDIA accelerated edge infrastructure?
NVIDIA AI Aerial is a suite of accelerated computing platforms, software, and services for designing, simulating, and operating wireless networks. The benefits of dUPF on AI Aerial edge infrastructure include:
- Ultra-low latency: Latency is as low as 25 microseconds with zero packet loss, improving user experience for edge AI inferencing.
- Cost reduction: Lower backhaul and transport costs and reduced OPEX through distributed processing and optimized resource utilization.
- Energy efficiency: NVIDIA DOCA Flow-enabled hardware acceleration reduces CPU usage, freeing cores for AI applications on shared hardware and lowering power consumption.
- New revenue models: Enables AI-native services and applications requiring real-time edge data processing.
- Enhanced network performance: Improved scalability, jitter minimization, and deterministic behavior for AI and RAN traffic.

The key value propositions of dUPF are fully aligned with the 6G AI-WIN initiative, making dUPF an integral part of the AI-WIN full stack. This initiative brings together T-Mobile, MITRE, Cisco, ODC, and Booz Allen Hamilton to develop an AI-native network stack for 6G, built on NVIDIA AI Aerial.
dUPF use cases
Key use cases for dUPF include:
Ultra-low-latency applications: By hosting dUPF functions at the edge, operators can process and route data locally, eliminating backhaul delays. This is critical for:
- AR/VR and real-time conversations with an AI agent
- VSS
- Autonomous vehicle and robot communications (V2X)
- Remote surgery and real-time industrial automation
AI and data-intensive workloads at the edge: Integration of dUPF with AI-native platforms (such as NVIDIA Grace Hopper) enables real-time edge inferencing for applications like distributed AI RAN, agentic AI, and localized autonomous control.
Figure 3 illustrates a VSS data ingestion and processing pipeline, where camera streams are handled at the edge alongside the dUPF deployed for local breakout. By shifting inference tasks to the edge server, operators deliver low-latency services while significantly reducing the data load on their backbone networks.

dUPF user plane reference implementation
The dUPF user plane reference implementation is based on a decomposed architecture, as illustrated in Figure 4, which comprises two key components, dUPF-UP and dUPF-CP:
dUPF-UP: This component is responsible for user plane packet processing, accelerated using DOCA Flow APIs, and handles the essential UPF user plane functionalities (modeled in the sketch after this list):
- Packet Detection Rule (PDR)
- QoS Enforcement Rule (QER)
- Usage Report Rule (URR)
- Forwarding Action Rule (FAR)
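To make the relationship between these rule types concrete, here is a minimal Python sketch that models them as plain data structures. The field names are illustrative only; they do not reproduce the 3GPP PFCP information elements or the DOCA Flow structures used in the reference implementation.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network
from typing import Optional

@dataclass
class PDR:
    """Packet Detection Rule: matches packets to a session and QoS flow."""
    precedence: int
    ue_ip: str                        # UE address (uplink source / downlink destination)
    sdf_subnet: IPv4Network           # IP-subnet-based SDF filter (for example, the AI-DN subnet)
    teid: Optional[int] = None        # GTP-U tunnel ID matched on N3 uplink
    qfi: Optional[int] = None         # QoS Flow Identifier

@dataclass
class QER:
    """QoS Enforcement Rule: rate limits applied by the policer pipes."""
    flow_mbr_bps: int                 # per-QoS-flow Maximum Bit Rate
    session_mbr_bps: int              # per-session Maximum Bit Rate (AMBR)

@dataclass
class URR:
    """Usage Reporting Rule: counters read back for usage reporting and quotas."""
    packets: int = 0
    bytes: int = 0

@dataclass
class FAR:
    """Forwarding Action Rule: what to do with matched packets."""
    action: str = "FORWARD"           # FORWARD, DROP, BUFFER, ...
    outer_teid: Optional[int] = None  # GTP-U encapsulation for downlink toward N3
    dscp: Optional[int] = None        # DSCP re-marking
```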
dUPF-CP: This component communicates with the SMF over the 3GPP N4 interface and with dUPF-UP through an internal messaging interface (gRPC) over the CNI to facilitate user plane packet processing.

The dUPF-UP is deployed on the NVIDIA accelerated Supermicro 1U Grace Hopper MGX System server platform with an NVIDIA Grace CPU and an NVIDIA BlueField-3 (BF3) DPU. AI-DN traffic is handled by dUPF-UP at the edge, and other user traffic (such as Internet traffic) is delivered to the centralized UPF through the transport network.
dUPF-UP acceleration architecture and data flows
The NVIDIA Grace CPU Superchip and NVIDIA BlueField-3 (BF3) SuperNIC are key hardware for co-hosted RAN and dUPF-UP. Figure 5 illustrates dUPF-UP packet processing.

The Grace CPU Superchip, with 72 Arm Neoverse V2 cores, uses the NVIDIA Scalable Coherency Fabric (SCF) to achieve a 3.2 TB/s bandwidth. This boosts dUPF user plane packet processing performance and energy efficiency. The BF3 SuperNIC accelerates dUPF data plane functions through DOCA Flow pipelines, including:
- Packet classification (5-tuples, DSCP/VLAN, GTP TEID/QFI)
- GTP encapsulation/decapsulation
- Metering (AMBR/MBR)
- Counting (URR usage/quotas)
- Forwarding (fast path for direct forwarding, slow path for exception packets)
- Mirroring for host CPU processing (Lawful Intercept, for example)
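The metering stage (AMBR/MBR) in the list above behaves logically like a token-bucket rate limiter realized in NIC hardware. The following Python sketch is a software model of that behavior only; the rate and burst values are made up for illustration and this is not the DOCA Flow meter configuration.

```python
import time

class TokenBucketMeter:
    """Software model of an MBR/AMBR meter: a packet that exceeds the
    configured rate is treated as non-conforming (typically dropped)."""

    def __init__(self, rate_bps: int, burst_bytes: int):
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.burst = float(burst_bytes)     # bucket depth in bytes
        self.tokens = float(burst_bytes)
        self.last = time.monotonic()

    def conforms(self, pkt_len: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return True                     # within MBR: pass to the next pipe
        return False                        # exceeds MBR: drop or re-mark

# Example: a 100 Mbps QoS-flow MBR with a 64 KB burst allowance
flow_meter = TokenBucketMeter(rate_bps=100_000_000, burst_bytes=64_000)
print(flow_meter.conforms(1500))            # True for the first packets
```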
dUPF-UP reference implementation with DOCA Flow
The dUPF-UP reference implementation accelerates AI traffic LBO through DOCA Flow, leveraging IP subnet-based Service Data Flow (SDF) classification and simplifying AI-DN deployment. Key simplifications include:
- Differentiating edge AI applications using IP subnet SDF
- Avoiding IP segmentation/reassembly by aligning MTUs
- Simplifying QoS and charging with PDR-based assurance
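For example, the IP subnet SDF differentiation in the first item above reduces to a single subnet-membership decision per packet. The sketch below illustrates that decision in Python; the subnet value and function name are hypothetical, not taken from the reference implementation.

```python
from ipaddress import ip_address, ip_network

# Hypothetical example: AI-DN services hosted at the edge live in one subnet
AI_DN_SUBNETS = [ip_network("10.10.0.0/16")]

def classify_uplink(inner_dst_ip: str) -> str:
    """Decide local breakout for an uplink packet based on its inner destination IP."""
    dst = ip_address(inner_dst_ip)
    if any(dst in subnet for subnet in AI_DN_SUBNETS):
        return "LOCAL_BREAKOUT"   # terminated by dUPF-UP at the edge (N6 to the AI-DN)
    return "CENTRAL_UPF"          # carried over the transport network to the central UPF

print(classify_uplink("10.10.3.7"))     # LOCAL_BREAKOUT
print(classify_uplink("203.0.113.10"))  # CENTRAL_UPF
```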
dUPF-UP DOCA Flow pipelines are designed for N3 and N6 interfaces.
N3 interface DOCA Flow pipeline design
N3 interface uplink pipelines contain the following pipes, as shown in Figure 6:
- GTP decap: Performs GTP header decapsulation
- Counter: Counts received packets for URR reporting
- Policer QoS flow MBR: Performs QER enforcement for QoS flow-level MBR
- Policer QoS Session MBR: Performs QER enforcement for session-level MBR
- Counter: Counts packets after QER metering for URR reporting
- FAR (DSCP Marking): Performs DSCP marking and other FAR handling
- Forward: Forwards packet to N6 interface
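Put together, the uplink pipe chain above can be modeled in a few lines of Python. This is a behavioral sketch of the pipeline order only, with simplified types and hypothetical names; in the reference implementation each stage is a DOCA Flow pipe executed in BlueField-3 hardware.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Packet:
    teid: Optional[int]   # outer GTP-U tunnel ID (present on N3 uplink ingress)
    length: int
    dscp: int = 0

class UplinkPipeline:
    """Behavioral model of the N3 uplink chain: GTP decap -> counter ->
    QoS-flow MBR policer -> session MBR policer -> counter -> FAR (DSCP) -> forward."""

    def __init__(self, flow_meter: Callable[[int], bool],
                 session_meter: Callable[[int], bool], dscp_mark: int):
        self.rx_packets = self.rx_bytes = 0   # URR counter before QER metering
        self.tx_packets = self.tx_bytes = 0   # URR counter after QER metering
        self.flow_meter = flow_meter
        self.session_meter = session_meter
        self.dscp_mark = dscp_mark

    def process(self, pkt: Packet):
        pkt.teid = None                                      # GTP decap pipe
        self.rx_packets += 1; self.rx_bytes += pkt.length    # first counter pipe
        if not (self.flow_meter(pkt.length) and self.session_meter(pkt.length)):
            return None                                      # dropped by QER policer
        self.tx_packets += 1; self.tx_bytes += pkt.length    # second counter pipe
        pkt.dscp = self.dscp_mark                            # FAR: DSCP marking
        return ("N6", pkt)                                   # forward pipe

# Example with permissive meters (always conforming)
pipeline = UplinkPipeline(lambda n: True, lambda n: True, dscp_mark=46)
print(pipeline.process(Packet(teid=0x1234, length=1400)))
```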

N6 interface DOCA Flow pipeline design
N6 interface downlink pipelines contain the following pipes, as shown in Figure 7:
- GTP Decap: Performs GTP header decapsulation
- Counter: Counts received packets for URR reporting
- Policer QoS Flow MBR: Performs QER enforcement for QoS flow-level MBR
- Policer QoS Session MBR: Performs QER enforcement for session-level MBR
- Counter: Counts packets after QER metering for URR reporting
- GTP Encap: Performs GTP header encapsulation
- FAR (DSCP Marking): Performs DSCP marking and other FAR handling
- Forward: Forwards packets to the N3 interface
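The GTP Encap and GTP Decap pipes above manipulate the mandatory 8-byte GTP-U header (version 1, message type 0xFF for a G-PDU). The Python sketch below shows that header layout as a software reference only; the function names are illustrative, and the reference implementation performs the equivalent operation in BlueField-3 hardware through DOCA Flow.

```python
import struct

def gtpu_encap(inner_ip_packet: bytes, teid: int) -> bytes:
    """Prepend a minimal GTP-U header: flags (version 1, PT=1, no optional
    fields), message type 0xFF (G-PDU), payload length, and the 32-bit TEID."""
    flags = 0x30                       # version=1, protocol type=GTP, E/S/PN clear
    msg_type = 0xFF                    # G-PDU: encapsulated user data
    length = len(inner_ip_packet)      # bytes following the mandatory 8-byte header
    return struct.pack("!BBHI", flags, msg_type, length, teid) + inner_ip_packet

def gtpu_decap(gtpu_packet: bytes) -> bytes:
    """Strip the mandatory 8-byte GTP-U header (the uplink GTP decap pipe)."""
    return gtpu_packet[8:]

inner = bytes(20)                      # stand-in for an inner IP packet
outer = gtpu_encap(inner, teid=0x0000ABCD)
assert gtpu_decap(outer) == inner
```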

To learn more about how to program the Counter, Policer, GTP Encap, GTP Decap, FAR, and Forward pipes, see the DOCA Flow Program Guide and the DOCA Flow Example Application Guide.
dUPF-UP example implementation lab validation
dUPF-UP was tested on a Supermicro 1U Grace Hopper MGX System server, using two dedicated CPU cores (Core-0 and Core-1). Core-0 managed control procedures for AI-DN session setup, while Core-1 handled slow path exception packets in Poll Mode Driver (PMD) mode. The dUPF-CP simulator initiated 60,000 UE sessions at 1,000 sessions/second. After setup, user plane packets were sent over dual 100G links from a TRex traffic generator.
Observations include:
- Core-0 averaged under 7% CPU usage for control procedures
- Core-1 showed 100% CPU usage due to PMD polling mode, but no exception packets were delivered to it as all user plane packets were handled by BF3
- BF3 NIC hardware accelerated all user plane packets, achieving 100 Gbps throughput with zero packet loss
Lab performance testing summary
In lab performance testing, the dUPF-UP example implementation on Grace plus BF3 achieved 100 Gbps throughput (the line rate of the test setup's 100G links) with zero packet loss. This demonstrates full hardware acceleration of user plane packet processing for AI traffic using an IP subnet SDF-based pipeline design, accomplished with only two Grace CPU cores. The functionality and performance achieved in lab testing validate the value propositions of dUPF-UP on the AI Aerial platform.
dUPF ecosystem adoption
Cisco embraces the dUPF architecture, accelerated by the NVIDIA AI Aerial platform and the NVIDIA DOCA framework, as a cornerstone for 6G AI-centric networks. When combined with an AI-ready data center architecture, this enables telecom operators to deploy high-performance, energy-efficient dUPF with security built in and AI inference closely integrated at the network edge, opening the door to applications such as VSS, agentic AI, XR, and ultra-responsive AI-driven services.
“Software-defined DPU and GPU-accelerated edge infrastructure enable efficient deployment of Wireless RAN, Core, and AI applications, delivering superior user experiences and new monetization opportunities for service providers,” said Darin Kaufman, Head of Product, Cisco Mobility. “Together, Cisco and NVIDIA are building intelligent, secure, and energy-efficient edge networks that power the next generation of wireless connectivity.”
Get started building and deploying AI-native networks
dUPF is a critical component for the 6G AI-centric network. By strategically deploying high-performance, ultra-low-latency, and energy-efficient dUPF accelerated on the NVIDIA AI Aerial platform with integrated AI inference at the network edge, operators can enable a new era of services. This dramatically lowers operational expenditures and ensures that the network infrastructure is agile and scalable enough to handle the immense demands of future AI-centric applications within a 6G network.
To get started, contact telco@nvidia.com to learn more about DOCA Flow hardware acceleration and the benefits of dUPF deployment on AI Aerial.