Deep packet inspection (DPI) is a critical network security technology that inspects and analyzes data packets as they travel across a network. By examining packet contents, DPI can identify security threats such as malware, viruses, and other malicious traffic, and prevent them from infiltrating the network. However, DPI comes at a significant cost in network performance.
Using NVIDIA BlueField DPUs reduces the cost and performance impact of performing deep packet inspection.
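To make payload inspection concrete, here is a minimal Python sketch of the signature-matching idea behind DPI. The signature names and byte patterns are hypothetical placeholders. Real engines such as Suricata use much richer rule languages and multi-pattern matching algorithms. The sketch only shows that DPI looks inside the payload, not just at packet headers.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical signature set: name -> byte pattern searched for in the payload.
SIGNATURES: Dict[str, bytes] = {
    "example-malware-marker": b"EVIL_PAYLOAD",
    "example-shell-command": b"/bin/sh -c",
}


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: bytes


def inspect(packet: Packet) -> List[str]:
    """Return the names of all signatures found anywhere in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in packet.payload]


def allow(packet: Packet) -> bool:
    """Block (return False) any packet whose payload matches a signature."""
    return not inspect(packet)


clean = Packet("10.0.0.1", "10.0.0.2", b"GET /index.html HTTP/1.1\r\n")
bad = Packet("192.0.2.3", "10.0.0.2", b"...EVIL_PAYLOAD...")
print(allow(clean), allow(bad), inspect(bad))  # True False ['example-malware-marker']
```

Because every byte of every payload must be scanned, the cost of this step grows with traffic volume, which is where the performance impact comes from.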
Suricata overview
Suricata is a high-performance, open-source network analysis and threat detection application used by private and public organizations, and embedded by major vendors, to protect their assets. Inspecting high-throughput traffic with Suricata (or any other intrusion detection and prevention system, IDS/IPS) is CPU-intensive, so CPU availability can become a bottleneck.
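A quick calculation shows why CPU availability becomes the bottleneck: at high line rates, the time available to process each packet shrinks to a few nanoseconds. This Python sketch assumes standard Ethernet framing, with 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter, and inter-frame gap), and a single processing pipeline that must keep up with the link. The function names are illustrative.

```python
# Per-frame overhead on the wire: 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(line_rate_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second at line rate; frame_bytes includes the Ethernet header and FCS."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_gbps * 1e9 / bits_on_wire


def ns_per_packet(line_rate_gbps: float, frame_bytes: int) -> float:
    """Time budget per packet, in nanoseconds, for one pipeline running at line rate."""
    return 1e9 / packets_per_second(line_rate_gbps, frame_bytes)


for rate in (100, 400):
    for size in (64, 1518):
        print(f"{rate} Gb/s, {size}-byte frames: "
              f"{packets_per_second(rate, size) / 1e6:.1f} Mpps, "
              f"{ns_per_packet(rate, size):.2f} ns/packet")
```

For example, at 400 Gb/s with minimum-size 64-byte frames, this works out to roughly 595 Mpps, or about 1.7 ns per packet: only a handful of CPU cycles at typical clock speeds.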
Traffic inspection in a data center can be centralized or distributed:
- Centralized appliance: Uses one or more powerful servers to inspect all traffic entering and leaving the data center.
- Distributed on all nodes: Each node in the data center is responsible for inspecting its ingress and egress traffic using a small portion of its own compute power.
Each approach has its advantages and disadvantages. Distributed inspection is more complex because it requires deploying and managing inspection on every node. However, it can offer a higher level of security by enabling east-west traffic inspection and by tailoring inspection rules to the specific traffic that each node processes.
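The difference in visibility between the two approaches can be sketched with a toy Python model. It assumes, purely for illustration, that all data center hosts live in 10.0.0.0/8. A perimeter appliance only sees flows that cross that boundary (north-south), while per-node inspection also sees traffic between internal nodes (east-west).

```python
from ipaddress import ip_address, ip_network

DATA_CENTER = ip_network("10.0.0.0/8")  # assumption for illustration only


def is_internal(addr: str) -> bool:
    return ip_address(addr) in DATA_CENTER


def seen_by_centralized(src: str, dst: str) -> bool:
    """Perimeter appliance: inspects only traffic entering or leaving the data center."""
    return is_internal(src) != is_internal(dst)


def seen_by_distributed(src: str, dst: str) -> bool:
    """Per-node inspection: any flow with an internal endpoint is inspected by that node."""
    return is_internal(src) or is_internal(dst)


flows = [
    ("203.0.113.7", "10.1.0.5"),  # north-south: internet -> web server
    ("10.1.0.5", "10.2.0.9"),     # east-west: web server -> database
]

for src, dst in flows:
    print(f"{src} -> {dst}: centralized={seen_by_centralized(src, dst)}, "
          f"distributed={seen_by_distributed(src, dst)}")
```

In this model, the east-west flow is invisible to the perimeter appliance but is inspected by the nodes at either end.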
The BlueField DPU can accelerate both centralized and distributed inspection. This reduces Suricata's compute resource utilization and enables higher network throughput while freeing host resources.
For more information about how to use the BlueField DPU for a distributed solution in a zero-trust environment, see NVIDIA Creates Zero-Trust Cybersecurity Platform.
Offloading a Suricata bypass with BlueField and NVIDIA DOCA
Suricata v3.2, released in 2016, introduced the bypass feature, which enables Suricata to stop inspecting specific flows under certain conditions. Suricata supports the following types of bypassed flows:
- Elephant flow: Flows that reach a preconfigured traffic limit.
- Encrypted flow: Flows that can’t be inspected or can only be partially inspected.
- Bypass rule: Flows that match a rule in a preconfigured rule set that marks them to be bypassed.
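The three bypass triggers can be summarized as a single decision function. This Python sketch is a hypothetical model, not Suricata's implementation: the 1 MiB elephant-flow threshold, the port-based stand-in for a bypass rule, and the field names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ELEPHANT_BYTES = 1 << 20  # hypothetical per-flow traffic limit (1 MiB)
BYPASS_PORTS = {9000}     # hypothetical stand-in for a bypass rule: match on destination port


@dataclass
class Flow:
    dst_port: int
    bytes_seen: int = 0
    encrypted: bool = False  # hypothetical flag: set once the payload can no longer be inspected


def bypass_reason(flow: Flow) -> Optional[str]:
    """Return why a flow should stop being inspected, or None to keep inspecting it."""
    if flow.dst_port in BYPASS_PORTS:
        return "bypass rule"
    if flow.encrypted:
        return "encrypted flow"
    if flow.bytes_seen >= ELEPHANT_BYTES:
        return "elephant flow"
    return None


flow = Flow(dst_port=443)
print(bypass_reason(flow))  # None: keep inspecting
flow.bytes_seen += 2_000_000
print(bypass_reason(flow))  # elephant flow
```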
Suricata implements bypass in software using the kernel datapath. This improves throughput, but bypassed packets are still handled by software that consumes CPU cycles to forward them, even though the Suricata engine no longer inspects them.
The BlueField DPU offers a line-rate steering module in its SmartNIC subsystem that can be configured using the NVIDIA DOCA Flow API. DOCA Flow is an API for building generic packet-processing pipes in hardware. It can redirect ingress traffic to the Arm subsystem or directly to the host, and egress traffic to the Arm subsystem or directly to the external uplink port.
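Conceptually, a hardware steering pipe is a match-action table: packets that match an entry are forwarded to that entry's target, and everything else goes to a default target. The Python model below is a simplified sketch of that idea only. `SteeringPipe`, `add_entry`, and `forward` are hypothetical names and are not DOCA Flow API calls.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Port(Enum):
    ARM = "arm"        # BlueField Arm subsystem (for example, where Suricata runs)
    HOST = "host"      # x86 host
    UPLINK = "uplink"  # external network port


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int  # IP protocol number, e.g. 6 for TCP


class SteeringPipe:
    """Exact-match entries plus a default target for packets that miss every entry."""

    def __init__(self, miss_target: Port) -> None:
        self.entries: Dict[FiveTuple, Port] = {}
        self.miss_target = miss_target

    def add_entry(self, match: FiveTuple, target: Port) -> None:
        self.entries[match] = target

    def forward(self, pkt: FiveTuple) -> Port:
        return self.entries.get(pkt, self.miss_target)


# Ingress from the uplink: by default, send traffic to the Arm subsystem for inspection.
ingress = SteeringPipe(miss_target=Port.ARM)
known = FiveTuple("198.51.100.10", "10.0.0.2", 40000, 443, 6)
ingress.add_entry(known, Port.HOST)  # this specific flow goes straight to the host

other = FiveTuple("198.51.100.11", "10.0.0.2", 40001, 443, 6)
print(ingress.forward(known).name, ingress.forward(other).name)  # HOST ARM
```

An egress pipe would follow the same pattern, with `Port.ARM` or `Port.UPLINK` as its targets.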
Suricata can use DOCA Flow to configure the hardware so that bypassed flows are redirected directly between the host and the external uplink. This lets these flows be forwarded at line rate in both centralized and distributed inspection.
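The effect of offloading bypassed flows can be modeled as follows: packets go to Suricata on the Arm subsystem until a bypass verdict, after which hardware entries for both directions forward the rest of the session directly. This Python sketch is hypothetical. The three-packet bypass threshold and the set-based table are illustrative assumptions, not how Suricata or DOCA Flow represent flows.

```python
from collections import Counter
from typing import Set, Tuple

Flow = Tuple[str, str, int, int, int]  # (src_ip, dst_ip, src_port, dst_port, proto)

BYPASS_AFTER_PACKETS = 3  # hypothetical: pretend Suricata issues a bypass verdict here

hw_bypass: Set[Flow] = set()      # flows forwarded host <-> uplink in hardware
inspected: Counter = Counter()    # packets per flow that reached Suricata on Arm
path_counts: Counter = Counter()  # how many packets took each path


def reverse(flow: Flow) -> Flow:
    src, dst, sport, dport, proto = flow
    return (dst, src, dport, sport, proto)


def on_packet(flow: Flow) -> str:
    """Return which path handled the packet: 'hardware' or 'arm'."""
    if flow in hw_bypass:
        path_counts["hardware"] += 1
        return "hardware"
    path_counts["arm"] += 1
    inspected[flow] += 1
    if inspected[flow] >= BYPASS_AFTER_PACKETS:
        # Bypass verdict: install entries for both directions of the session.
        hw_bypass.add(flow)
        hw_bypass.add(reverse(flow))
    return "arm"


session = ("10.0.0.2", "198.51.100.10", 443, 40000, 6)
for _ in range(10):
    on_packet(session)
on_packet(reverse(session))
print(dict(path_counts))  # {'arm': 3, 'hardware': 8}
```

Installing both directions at once matters: without the reverse entry, return traffic would keep reaching the Arm subsystem even after the verdict.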
In addition, BlueField-3 DPUs include an Arm subsystem with 16 Arm Cortex-A78 cores. Running Suricata on this onboard Arm subsystem offloads the inspection workload from the host CPU, reducing host CPU utilization. It also enables you to use the BlueField-3 DPU to inspect VM-to-VM traffic on the same host.
To showcase the value of the BlueField hardware-accelerated bypass in Suricata, NVIDIA built a proof of concept of a distributed inspection scenario. Suricata was deployed on the BlueField Arm subsystem, and the Suricata engine was updated to use the DOCA Flow API for bypassed flows instead of a kernel bypass. We achieved the device's 400G bidirectional line rate for bypassed flows on the BlueField-3 DPU, along with several Gbps of inspected flows, with no CPU load on the x86 host server.
Figure 2 compares the network performance and x86 CPU utilization of the traditional host-based software solution with those of the DPU-accelerated, potentially distributed solution.
* The actual throughput for real traffic depends on the traffic type and profile and on the inspection rule set, so performance may vary.
Summary
This work can also be applied to accelerate other traffic inspection solutions, such as Snort or a web application firewall (WAF), using the same principles as the Suricata acceleration.
The BlueField DPU can also be used to accelerate the following:
- Inline IPsec and TLS acceleration: To support the inspection of encrypted traffic at line rate.
- Fast pattern match acceleration: Using the built-in RegEx accelerator in BlueField-3.
- Integrating with the user-space datapath: To achieve a ~10–20% performance boost.
- Receive side scaling (RSS): To make better use of the 8 (BlueField-2) or 16 (BlueField-3) cores of the Arm subsystem.
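RSS only helps an IDS if both directions of a flow reach the same core, because stateful inspection expects to see every packet of a flow in the same worker thread. A common way to achieve this is the Toeplitz hash with a key made of the repeating 16-bit pattern 0x6d5a, which makes the hash symmetric under swapping source and destination. The Python sketch below illustrates that technique. The 16-core count and the plain modulo (real NICs use an indirection table) are simplifying assumptions.

```python
import ipaddress
import struct

# 40-byte RSS key built from the repeating 16-bit pattern 0x6d5a (symmetric RSS).
SYMMETRIC_KEY = bytes([0x6D, 0x5A] * 20)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash; the key must be at least len(data) + 4 bytes long."""
    if len(key) < len(data) + 4:
        raise ValueError("RSS key too short for input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # XOR in the 32-bit window of the key that starts at bit i (MSB first).
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_input(src: str, dst: str, sport: int, dport: int) -> bytes:
    """IPv4/TCP RSS input: source IP, destination IP, source port, destination port."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!HH", sport, dport))


def core_for(src: str, dst: str, sport: int, dport: int, num_cores: int = 16) -> int:
    """Map a flow to a core; a plain modulo stands in for the NIC's indirection table."""
    return toeplitz_hash(SYMMETRIC_KEY, rss_input(src, dst, sport, dport)) % num_cores


forward = core_for("10.0.0.2", "198.51.100.10", 443, 40000)
backward = core_for("198.51.100.10", "10.0.0.2", 40000, 443)
print(forward == backward)  # True: both directions map to the same core
```

The symmetry follows from the key's 16-bit period: swapping the IPv4 addresses shifts their bits by 32 positions and swapping the ports shifts them by 16, so each set bit still selects an identical key window.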
For more information about how BlueField DPUs can accelerate security applications, see the following resources:
- Developers Design Innovative Network Security Solutions at the NVIDIA Cybersecurity Hackathon
- Detecting Threats Faster with AI-Based Cybersecurity
- Enabling Enterprise Cybersecurity Protection with a DPU-Accelerated, Next-Generation Firewall
- Stop Modern Security Attacks in Real Time with ARIA Cybersecurity and NVIDIA