Detecting Out-of-Band Malware with NVIDIA BlueField DPU

In an era where cyberthreats are around every corner and with increasing attacks on data centers, security has become an essential element to include in every machine guarding user data. However, many security offerings are defenseless in the presence of malware. Furthermore, software-based security consumes compute and memory resources that should be allocated to users.

The NVIDIA BlueField SmartNIC is an advanced, programmable, Ethernet SmartNIC equipped with an array of Arm processor cores and an integrated NVIDIA ConnectX-5 network controller. It solves the problem of securing data centers while simultaneously enabling users to enjoy promised computation resources.

The BlueField SoC at the heart of the SmartNIC runs out-of-band security software in a trusted domain that is different and isolated from potential malware. As the security software runs on the SmartNIC Arm cores, all server compute resources are made available to users. Using an isolated environment, the SmartNIC can securely access application data for introspection, while simultaneously avoiding data tampering by malware and without leaving a footprint of when and what data was accessed. This innovative design makes BlueField the best solution for malware detection and forensics investigation.

Malware is vicious and stealthy. It can employ hiding techniques to avoid detection by traditional software security offerings. That’s due to an inherent problem in the data normally used to detect malware. Typically, a security solution has a data collection phase where the data is used to learn the activity of the malware. In the traditional approach, the data collection phase is based on software that runs on the same machine being inspected. That software may fail to detect an intrusion if the malware tampers with the data being collected. The ability to hide from an observer, for example, detection software that is looking for indicators of compromise (IOC), is referred to as anti-forensics. Malware can employ the same techniques to avoid detection by both intrusion detection systems (IDS) and intrusion prevention systems (IPS).

The failure to detect a malicious activity may occur at any step of the process. A critical element is the data acquisition. If the data used for inspection is unreliable, a detection system may not find any IOC, because any IOC signs were hidden by the malicious software. There are many questions to be asked regarding the data acquisition method to determine the level of trust: How does the security IDS/IPS application acquire the data? Can malware tamper with the data being acquired by the IDS?

There are several techniques for acquiring data, and several types of data used for analysis. In this post, I briefly cover the main methods, the type of data that is relevant to each, and their weaknesses.

Anti-malware scanner

Anti-malware software works on files that are persistently stored, also referred to as data-at-rest. The disk can be analyzed by anti-malware software running on the same machine or externally by another machine that is not compromised. When externally analyzed, and when the disk is not encrypted, it’s possible to build the filesystem tree and scan the disk for known IOC. For example, by scanning the disk for a file, the file can be reconstructed for the purpose of computing a hash value. In turn, various online resources can indicate whether a file is malicious based on its hash value. However, if malware is not stored on the hard disk, it may not have any footprint on the filesystem, and thus the anti-malware scanner technique would fail to detect a compromised system.
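As a rough illustration of the hash-based check described above, the following Python sketch computes a SHA-256 digest of a file and looks it up in a set of known-bad digests. The function names and the `known_bad_hashes` set are hypothetical and exist only for this example; a real scanner would obtain digests from a reputation service or threat feed, which is not shown here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_bad(path, known_bad_hashes):
    """Check a file's digest against a set of known-malicious digests.

    known_bad_hashes is a hypothetical set of hex digests supplied by the
    caller; how it is populated is outside the scope of this sketch.
    """
    return sha256_of_file(path) in known_bad_hashes
```

A lookup like this only helps when the malicious file exists on disk, which is exactly the limitation described above.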

Network intrusion detection system

Most attacks have some footprint over the network. Consider the scenario of stealing secrets from a host machine and sending them to a remote attacker. Detecting such events can tell which IP address might have performed the attack and its goal. Today, most IDS and IPS solutions observe the network for malicious activity. The network data can be collected locally by the same machine or externally, for example, using a SmartNIC or a switch.

Memory analysis

The runtime data provides the best visibility into the system, and there are two approaches for acquiring such data: intrusive and non-intrusive to the operating system. The intrusive option refers to privileged software that hooks into events and triggers through functions in the operating system. For example, an event of opening or closing a file or socket would trigger collecting data about which file or socket is opened or closed and when.

Another example is forking a new process. Detection software can use the trigger raised when a process forks and executes a new process to detect malicious activity. For example, it can help answer two questions: Is the new process malware? Is the running process expected to fork a new process? However, sophisticated malware can manipulate these hooks.

Malware detection

Ideally, you want to collect data that reflects the state of the system and the activity that is happening from three main sources: disk, network, and memory.

Most detection techniques use the network or the disk approach for detecting IOC. Unfortunately, this is insufficient to tackle the challenges of modern malware. Researchers have shown that modern malware has many tricks up its sleeve, and the bar keeps getting higher.

For example, some malware can attack the system without yielding any footprint on the disk, hiding its presence and malicious activity from detection techniques that are disk-based. Malware using the network to operate cannot hide completely. However, while the network traffic may contain many signs of compromise, in many cases the traffic is stateless, random, and complex, and its volume is too large. Even if an IOC is found, it’s not possible to analyze the behavior of the malware involved. To understand the behavior of malware and make sense of the network traffic, you need a closer look at the runtime environment.

An x-ray view of the malware activity requires acquiring data during execution time. Runtime data provides better visibility into events and actions, for example, which processes are running, the network connections, and the different primitives offered by the OS. Runtime data allows for better understanding of malware behavior; thus, detection software can more accurately identify malicious activity.

Acquiring such data is challenging. A software-based solution yields an observer effect, as both the malware and the IDS run in the same domain and share the same resources. Malware can manipulate the hooks and functions that an IDS uses to acquire data, resulting in unreliable and compromised data. Rather than using hooks and functions subject to alteration by malware, the preferred alternative is to use a secure method to obtain raw data from the host’s physical memory, the arena for runtime execution of the system. Assuming a tamper-resilient method exists to acquire the raw data from the host’s physical memory, it’s possible to reconstruct the state of the system. This includes the kernel memory and code and the user space environments.

The data built from the raw memory dump provides an abstraction to examine and detect an attack. If an attack were to happen—whether through injecting code, manipulating process memory, forking a new process, or opening a new network connection to a remote attacker—all would manifest as a change in physical memory. The greater the impact, the more artifacts it would leave in memory.

Most forensic investigations combine data from the network with data from the host’s physical memory. The combination allows for building an accurate copy of the system’s state. This post discusses a novel proposal that allows for the reliable data acquisition of host physical memory.

Out-of-band malware detection

To detect and analyze malware, the out-of-band device acquires data without providing an indication of when the access is occurring. The hardware-based approach to acquiring data is considered the most reliable and trusted method for malware detection, thanks to modern computer architecture and how PCI Express (PCIe) devices access host physical memory. In most cases, using the PCIe protocol, peripheral devices have direct memory access (DMA) capabilities and can read from and write to host physical memory without yielding side effects to any software running on a host machine, including malware. An add-in card with a PCIe interface can issue memory-read and memory-write transactions to host physical memory at raw rates of about 8 Gbps (Gen3) or 16 Gbps (Gen4) per lane.

Figure 1. Intrusion detection system using PCIe interface to read data from the host physical memory.

The host physical memory is divided into multiple regions that are mapped during boot time and include system RAM, IO space, and ROM. For the most part, the data and areas of the malware attack reside in the system RAM, where the kernel and malware live. For data acquisition, the acquisition device issues a memory-read transaction to acquire the physical pages of the RAM region. Figure 2 shows a memory map of a machine running Ubuntu Linux 16.04.

Figure 2. The memory map of an Ubuntu host machine used by an IDS.
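Linux exposes its physical memory map as text in /proc/iomem, so selecting the RAM regions to acquire can be sketched as a small parsing step. The following Python example is a minimal sketch under that assumption; the sample text and its address ranges are made up for illustration and are not the contents of Figure 2, and `system_ram_regions` is a hypothetical helper name.

```python
import re

# Illustrative sample in /proc/iomem format; the address ranges are made up.
SAMPLE_IOMEM = """\
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
000a0000-000bffff : PCI Bus 0000:00
00100000-bffdffff : System RAM
  01000000-01c031d0 : Kernel code
100000000-23fffffff : System RAM
"""

LINE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def system_ram_regions(iomem_text):
    """Return (start, end) address pairs for top-level 'System RAM' entries."""
    regions = []
    for line in iomem_text.splitlines():
        if line.startswith(" "):
            continue  # indented lines are sub-regions of a parent entry
        match = LINE.match(line)
        if match and match.group(3) == "System RAM":
            regions.append((int(match.group(1), 16), int(match.group(2), 16)))
    return regions


regions = system_ram_regions(SAMPLE_IOMEM)
# End addresses are inclusive, hence the +1 when summing region sizes.
total_bytes = sum(end - start + 1 for start, end in regions)
```

An acquisition tool would then issue reads only for the returned ranges and skip IO space and ROM.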

The transaction travels from the PCIe add-in card through the PCIe link to the memory controller, which in turn provides access to the physical memory. That doesn’t involve software running on the host, as depicted in Figure 1. Instead, it follows a path that is hidden from the malware. Unlike software-based solutions, such a solution does not violate forensics requirements by running any new software on a host machine under investigation.

The next thing to ensure is that the acquired data is consistent, so that it can be analyzed. For example, consider the case when accessing two pages in host physical memory with one page pointing to the other. If the page being pointed to changes its physical address, then the data acquisition tool reads the wrong page in memory. This is a risk that exists for any tool acquiring the physical memory, regardless of whether it’s a hardware-based or software-based acquisition tool. The longer it takes to acquire the memory, the higher the likelihood of inconsistencies. The shorter the acquisition time, the fewer changes might occur, increasing the likelihood of reliable data.
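One simple way to see this risk in practice is to read a set of pages, read them again, and flag any page whose contents changed in between. The sketch below illustrates that idea only; it is not a method described in this post. `read_page` is a hypothetical callable that returns the bytes of a page, and `FakeMemory` simulates a host that rewrites a page during acquisition.

```python
def acquire_with_check(read_page, page_numbers):
    """Read each page once, then re-read and report pages that changed.

    read_page is a hypothetical callable: page number -> bytes.
    """
    snapshot = {n: read_page(n) for n in page_numbers}
    changed = [n for n in page_numbers if read_page(n) != snapshot[n]]
    return snapshot, changed


class FakeMemory:
    """Simulated memory in which page 3 refers to page 7."""

    def __init__(self):
        self.pages = {3: b"points to page 7", 7: b"old contents"}

    def read_page(self, number):
        data = self.pages[number]
        if number == 7:
            # Simulate the host rewriting page 7 right after the first read.
            self.pages[7] = b"new contents"
        return data


snapshot, changed = acquire_with_check(FakeMemory().read_page, [3, 7])
```

Here `changed` contains page 7, which tells the analyst that the snapshot of the page referenced by page 3 may be stale. The second pass doubles the read volume, so a check like this is only practical when acquisition is fast.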

Here again, hardware-based approaches outperform their software counterparts due to their superior speed and improved efficiency. For instance, acquiring 64 GB of RAM can take several minutes using software tools. When using a PCIe add-in card operating at Gen4 rates, the data acquisition happens at 16 Gbps per lane. A device with 16 PCIe lanes connected to a host machine allows for data acquisition at up to 32 GB/s when using Gen4.
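The arithmetic behind these figures can be sketched in a few lines of Python. This is a back-of-the-envelope calculation that uses the raw per-lane rates quoted in this post and ignores encoding and protocol overhead, so the results are upper bounds on throughput and lower bounds on acquisition time. The function names are hypothetical.

```python
# Raw per-lane rates in Gbps, as quoted in this post.
GBPS_PER_LANE = {"gen3": 8, "gen4": 16}


def raw_bandwidth_gbytes_per_s(generation, lanes):
    """Raw link bandwidth in GB/s (bits divided by 8, no protocol overhead)."""
    return GBPS_PER_LANE[generation] * lanes / 8


def acquisition_seconds(ram_gbytes, generation, lanes):
    """Lower bound on the time needed to read ram_gbytes over the link."""
    return ram_gbytes / raw_bandwidth_gbytes_per_s(generation, lanes)


print(raw_bandwidth_gbytes_per_s("gen4", 16))  # 32.0 GB/s
print(acquisition_seconds(64, "gen4", 16))     # 2.0 seconds for 64 GB
```

Even allowing for overhead, reading 64 GB in seconds rather than minutes narrows the window in which memory can change during acquisition.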

BlueField SmartNIC for malware detection

My team investigated the suitability of the BlueField SmartNIC for live-memory forensics. For the investigation, we used a variation of the BlueField SmartNIC with eight lanes. We extended the Volatility memory forensics framework to support live-memory forensics from the BlueField SmartNIC. Volatility is a well-known open source framework used by malware researchers, forensics investigators, and incident response personnel, and it works with memory image files.

Volatility uses a Python application to extract information such as the process list, network connections, and kernel modules that assist forensics investigators in understanding the footprint of the malware and its behavior. The framework allows developers and investigators to analyze host machines by looking at a dump of the memory. The new extension developed by my team enables using the Volatility framework running on BlueField SmartNIC Arm cores to analyze malware in host physical memory. That allows live-memory analysis by acquiring segments of the physical memory on demand. The normal mode of Volatility works with memory files that can sometimes reach 64 GB or 128 GB. The extension allows acquiring only the data needed for a specific purpose, like building the process list.
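To make the on-demand idea concrete, here is a minimal Python sketch of a reader that fetches and caches only the pages an analysis touches instead of a full dump. It does not use the Volatility API or the BlueField memory access SDK. `OnDemandMemory` and `fetch_page` are hypothetical names, and the host memory is simulated with an in-process byte string so the example runs without hardware.

```python
PAGE_SIZE = 4096


class OnDemandMemory:
    """Fetch and cache only the pages that an analysis actually touches.

    fetch_page is a hypothetical callable (page number -> PAGE_SIZE bytes)
    standing in for whatever transport reads host physical memory.
    """

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page
        self._cache = {}

    def read(self, address, length):
        """Return `length` bytes starting at physical `address`."""
        out = bytearray()
        end = address + length
        while address < end:
            page, offset = divmod(address, PAGE_SIZE)
            if page not in self._cache:
                self._cache[page] = self._fetch_page(page)
            # Copy up to the end of this page or the end of the request.
            take = min(PAGE_SIZE - offset, end - address)
            out += self._cache[page][offset:offset + take]
            address += take
        return bytes(out)

    @property
    def bytes_fetched(self):
        """Total bytes pulled from the host so far."""
        return len(self._cache) * PAGE_SIZE


# Simulated 1 MiB of "host memory" so the sketch runs without hardware.
fake_host = bytes(range(256)) * 4096
memory = OnDemandMemory(lambda n: fake_host[n * PAGE_SIZE:(n + 1) * PAGE_SIZE])
data = memory.read(4090, 16)  # this read spans pages 0 and 1
```

After this read, `memory.bytes_fetched` is two pages rather than the full megabyte, which is the kind of saving that matters when the alternative is a 64 GB image.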

The new Volatility plugin connects to a memory access SDK that allows using BlueField DMA capabilities. The SDK provides different flavors of memory access to allow fast access and lower latency when acquiring data. BlueField SmartNIC on-board memory allows copying the data from host physical memory and analyzing it locally using the Arm cores without the fear that it’s going to be modified by the host. The following video demonstrates the Volatility framework running on BlueField Arm cores.



Attacks are getting stealthier and more complex, while current detection and prevention techniques lag far behind. Hardware-assisted data acquisition is considered the most reliable and trusted method to acquire data for analysis. BlueField enables hardware-assisted memory acquisition for securing today’s servers. It enables out-of-band intrusion detection and forensics investigation. When authorized, it allows speedy access to host physical memory while guarding security applications, such as an IDS, in an isolated environment. BlueField facilitates forensics investigations, incident response, malware detection, and intrusion detection systems.
