Analyzing Baseboard Management Controllers to Secure Data Center Infrastructure

Modern data centers depend on Baseboard Management Controllers (BMCs) for remote management. These embedded processors enable administrators to reconfigure servers, monitor hardware health, and push firmware updates—even when systems are powered off. 

But that same capability also expands the attack surface. A compromised BMC can give an attacker persistent, stealthy access to an entire fleet of devices.

The NVIDIA Offensive Security Research (OSR) team recently analyzed BMC firmware used in data center environments. The team identified 18 vulnerabilities and developed nine working exploits. Vulnerabilities such as these, in often overlooked components of modern infrastructure, can lead to significant security gaps that place enterprises at risk. 

This post walks through how BMCs work, the vulnerabilities OSR uncovered, and what enterprises should do to protect themselves. For more details, see Breaking BMC: The Forgotten Key to the Kingdom.

What makes BMCs powerful and risky 

BMCs are embedded service processors that enable secure, remote management at scale. In hyperscale or physically inaccessible environments, they enable administrators to monitor hardware, reconfigure system firmware, and recover machines. Some of these actions can even be performed without powering on the host. 

With keyboard-video-mouse (KVM) access, BMCs can modify BIOS settings, apply firmware updates, and control boot behavior. They collect detailed telemetry like temperature, power draw, and fan speed, and they operate independently of the host OS. While this level of access is essential for modern infrastructure, it also introduces potential security risks. 

BMCs often run outside the reach of traditional security monitoring systems, are exposed through dedicated management interfaces, and are backed by third-party firmware stacks. If compromised, they become a platform for persistence across every system they control, often in ways that are difficult or even impossible to detect.

NVIDIA systems such as NVIDIA DGX H100 rely on BMCs for secure, scalable operations. For this reason, their security is rigorously evaluated, even when components are sourced from external vendors. 

BMCs aren’t just a control plane—they’re a potential compromise plane, which makes securing them essential. 

Figure 1. The BMC dashboard on an NVIDIA DGX H100 system provides full visibility into system health, firmware, and remote access 

Inside the BMC: From side channels to root access 

During evaluation, the NVIDIA OSR team analyzed a BMC firmware package commonly used in modern data center servers. Without access to source code, the team reverse engineered the firmware directly from a device image and uncovered 18 vulnerabilities, ranging from credential handling flaws to memory corruption bugs. The team then developed nine working exploits to assess real-world impact. Having working exploits also lets the team verify whether NVIDIA products are susceptible to these attacks and confirm that patches are effective.

Leaking credentials with side channels 

The team began by investigating the IPMI authentication process and verified that it remains susceptible to a hash-leak vulnerability first identified in 2013 and assigned CVE-2013-4786. This vulnerability can only be exploited if the attacker knows a valid username on the BMC. The BMC response timing exposed a side channel that allowed the team to identify valid usernames. With a valid username in hand, we could then brute force the password offline by using the leaked hash and standard word lists.

The username leak stemmed from the BMC’s use of memcmp to compare usernames during authentication. Because memcmp exits on the first mismatch, the response time leaked how many leading characters of the username were correct.
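
To make the leak concrete, here is a minimal Python sketch of the idea. It is not the firmware's code: `leaky_equal` is a hypothetical stand-in for an early-exit comparison such as memcmp, and the number of bytes examined stands in for response time. `hmac.compare_digest` from the Python standard library is shown as one way to compare without an early exit.

```python
import hmac
from typing import Tuple


def leaky_equal(candidate: bytes, stored: bytes) -> Tuple[bool, int]:
    """Early-exit comparison (hypothetical stand-in for memcmp).

    Returns the result plus the number of bytes examined, which is a
    proxy for how long the comparison takes.
    """
    examined = 0
    for x, y in zip(candidate, stored):
        examined += 1
        if x != y:
            # Stops at the first mismatch: work done depends on the secret.
            return False, examined
    return len(candidate) == len(stored), examined


def constant_time_equal(candidate: bytes, stored: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(candidate, stored)
```

With a stored username of `b"admin"`, `leaky_equal(b"xxxxx", b"admin")` examines one byte while `leaky_equal(b"adxxx", b"admin")` examines three. An attacker who can measure that difference can recover a username one character at a time.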

Figure 2. Timing side channel through memcmp enabled remote username extraction  
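
The offline guessing step mentioned earlier in this section can be sketched as follows. This is a simplified illustration, assuming the leaked value is an HMAC-SHA1 keyed with the user's password over session data known to the attacker, which is the general shape of CVE-2013-4786. The exact IPMI message layout is not reproduced here: `session_blob`, `rakp_style_hash`, and `offline_guess` are hypothetical names.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def rakp_style_hash(password: str, session_blob: bytes) -> bytes:
    """HMAC-SHA1 keyed with the password (simplified, hypothetical layout)."""
    return hmac.new(password.encode(), session_blob, hashlib.sha1).digest()


def offline_guess(leaked: bytes, session_blob: bytes,
                  wordlist: Iterable[str]) -> Optional[str]:
    """Try each candidate password locally; no further BMC traffic is needed."""
    for candidate in wordlist:
        if hmac.compare_digest(rakp_style_hash(candidate, session_blob), leaked):
            return candidate
    return None
```

Because every guess is checked locally, account lockout and rate limiting on the BMC do not slow the attacker down. Only password strength does.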

Full remote access through insecure APIs 

The firmware image indicated that the user database is managed by Redis. Passwords are encrypted, but the decryption keys are stored alongside them. The team discovered an API through which this Redis database can be queried. We identified the locations where passwords and usernames are stored, accessed them through the API, decrypted the passwords, and successfully obtained the complete user database.
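
Encryption with a key stored next to the ciphertext is reversible by anyone who can read the database. One common alternative is to store only a salted, deliberately slow hash. The following is a minimal sketch of that pattern using Python's standard library. It assumes the interface can verify a password without recovering it, which is not true of every protocol (IPMI 2.0 authentication needs access to the password itself). The function names and iteration count are illustrative, not taken from the firmware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def make_record(password: str) -> dict:
    """Create a storable record that cannot be decrypted back to the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return {"salt": salt, "hash": digest, "iterations": ITERATIONS}


def verify(password: str, record: dict) -> bool:
    """Recompute the hash from the supplied password and compare."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), record["salt"], record["iterations"]
    )
    return hmac.compare_digest(digest, record["hash"])
```

With this layout, dumping the database yields salts and hashes rather than recoverable passwords, so an attacker still has to guess each password.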

Another API allows read/write access to virtual memory within the IPMI server process using BMC credentials, with no region or size checks. We also discovered that the server module loads at a consistent base address. This indicates a lack of Address Space Layout Randomization (ASLR), a standard mitigation against a range of attacks such as heap spraying, return-oriented programming exploits, and direct memory manipulation.
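
For illustration, the kind of region and size check that was missing might look like the sketch below. It assumes a debug-style API that should only ever touch a small allow-listed window. The `Region` type, the `ALLOWED_REGIONS` values, and `access_allowed` are hypothetical and are not part of the firmware.

```python
from typing import NamedTuple, Sequence


class Region(NamedTuple):
    start: int
    size: int


# Hypothetical allow-list: the only window the API may read or write.
ALLOWED_REGIONS = (Region(start=0x1000, size=0x100),)


def access_allowed(addr: int, length: int,
                   regions: Sequence[Region] = ALLOWED_REGIONS) -> bool:
    """Return True only if [addr, addr + length) lies fully inside one region."""
    if addr < 0 or length <= 0:
        return False
    end = addr + length
    return any(r.start <= addr and end <= r.start + r.size for r in regions)
```

A request that starts inside the window but runs past its end is rejected, as is any request outside it.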

Figure 3. An undocumented API enabled arbitrary memory access and shellcode injection

This lack of ASLR enabled the team to identify the location of a hidden configuration flag and toggle it directly through the API. This allowed us to enable a file download feature not normally exposed. By chaining this with a separate path traversal vulnerability, we were able to retrieve sensitive files from the BMC, including /etc/shadow.
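
A path traversal bug of this kind is typically closed by resolving the requested path and confirming it stays inside the intended directory. The sketch below shows one way to do that in Python. The `BASE_DIR` value and the `resolve_download` name are assumptions made for the example.

```python
import os

BASE_DIR = "/var/bmc/downloads"  # hypothetical download root


def resolve_download(requested: str, base_dir: str = BASE_DIR) -> str:
    """Resolve a client-supplied path and refuse anything outside base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks before the check.
    target = os.path.realpath(os.path.join(base, requested))
    if os.path.commonpath([base, target]) != base:
        raise PermissionError(f"path escapes download root: {requested!r}")
    return target
```

Both `../../etc/shadow` and an absolute `/etc/shadow` resolve outside the download root and are rejected, while a path such as `logs/sel.txt` is accepted.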

Figure 4. Chained bugs allowed downloading sensitive files such as /etc/shadow 

These capabilities were enabled by default and were reachable post-authentication. Together, they gave us deep access to the system and a clear path to persistence. 

Pivoting to the host system 

With full access to the BMC, the team began exploring ways to interact with the host system itself. Using the BMC KVM functionality, we modified bootloader parameters and gained shell access to the host operating system—without needing any user credentials. Secure Boot was not enabled in this instance. However, if it were enabled, Unified Extensible Firmware Interface (UEFI) settings could be adjusted through the BMC. This would allow Secure Boot to be disabled, provided that UEFI is not password protected. 

Once inside, the team found the host’s disk was unencrypted and contained leftover firmware update images. This allowed us to retrieve and reverse engineer the original BMC firmware to understand the system in greater detail. 

We also identified an exposed API that allowed the BMC to read and write directly to the host’s SPI flash. Using this, we were able to modify NVRAM entries and, for example, disable Secure Boot. This wasn’t a theoretical path; it was tested and confirmed. The implications for persistence and host compromise are serious.

Figure 5. SPI flash write allowed modification of NVRAM and disabling Secure Boot

Classic memory exploits, no modern mitigations 

While reviewing the authentication functionality, the team found code that calls into a shared telemetry library for logging purposes. In that code, we discovered a classic pre-authentication stack-based buffer overflow. The BMC firmware used strcpy to copy unvalidated input into a fixed-size buffer, immediately followed by a function pointer call. This provided a direct path to code execution during login attempts.
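
The pattern can be illustrated with a toy model. The Python sketch below simulates a stack frame as a byte array, with a 16-byte buffer followed by a 4-byte function pointer. It is a simulation only, not the firmware's code or memory layout. `BUF_SIZE`, `make_frame`, `unbounded_copy`, and `bounded_copy` are hypothetical names.

```python
BUF_SIZE = 16
PTR_SIZE = 4


def make_frame() -> bytearray:
    """Toy frame: [16-byte buffer][4-byte function pointer][4 bytes of padding]."""
    frame = bytearray(BUF_SIZE + PTR_SIZE + 4)
    frame[BUF_SIZE:BUF_SIZE + PTR_SIZE] = (0x00010000).to_bytes(PTR_SIZE, "little")
    return frame


def pointer(frame: bytearray) -> int:
    """Read the simulated function pointer that sits right after the buffer."""
    return int.from_bytes(frame[BUF_SIZE:BUF_SIZE + PTR_SIZE], "little")


def _until_nul(data: bytes) -> bytes:
    end = data.find(b"\x00")
    return data if end == -1 else data[:end]


def unbounded_copy(frame: bytearray, data: bytes) -> None:
    """Mimics strcpy: copies up to the first NUL and ignores the buffer size."""
    payload = _until_nul(data)
    for i, byte in enumerate(payload):
        frame[i] = byte
    frame[len(payload)] = 0  # terminating NUL, as strcpy would write


def bounded_copy(frame: bytearray, data: bytes) -> None:
    """Rejects input that does not fit in the buffer, terminator included."""
    payload = _until_nul(data)
    if len(payload) >= BUF_SIZE:
        raise ValueError("input longer than destination buffer")
    frame[:len(payload)] = payload
    frame[len(payload)] = 0
```

Passing 16 filler bytes followed by four attacker-chosen bytes to `unbounded_copy` replaces the simulated pointer with the attacker's value, while `bounded_copy` rejects the same input and leaves the pointer intact.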

What really stood out was the lack of standard modern mitigations against stack and memory corruption vulnerabilities, including:

  • Data execution prevention for the stack
  • Stack cookies
  • ASLR
  • Control flow integrity (CFI)
  • Sandboxing 

These are baseline mitigations in modern systems, and their absence made exploitation significantly easier. 
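
Some of these properties can be spot-checked on binaries extracted from a firmware image. The following Python sketch reads ELF headers to report whether a binary is position independent (a prerequisite for ASLR of the main image) and whether it requests a non-executable stack. The stack cookie check is only a heuristic that looks for the `__stack_chk_fail` symbol name. The `elf_hardening` function is a hypothetical helper, not a tool used in the research, and it cannot tell whether the running kernel actually enables ASLR.

```python
import struct

ET_DYN = 3                # ELF type used by position-independent executables
PT_GNU_STACK = 0x6474E551
PF_X = 0x1                # "executable" segment flag


def elf_hardening(image: bytes) -> dict:
    """Report PIE, stack executability, and a stack-cookie hint for an ELF file."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is64 = image[4] == 2                       # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if image[5] == 1 else ">"        # EI_DATA: 1 = little endian
    e_type = struct.unpack_from(end + "H", image, 16)[0]
    if is64:
        e_phoff = struct.unpack_from(end + "Q", image, 32)[0]
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", image, 54)
        flags_offset = 4                       # p_flags position in Elf64_Phdr
    else:
        e_phoff = struct.unpack_from(end + "I", image, 28)[0]
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", image, 42)
        flags_offset = 24                      # p_flags position in Elf32_Phdr

    nx_stack = None                            # None: no PT_GNU_STACK header found
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        p_type = struct.unpack_from(end + "I", image, base)[0]
        if p_type == PT_GNU_STACK:
            p_flags = struct.unpack_from(end + "I", image, base + flags_offset)[0]
            nx_stack = not (p_flags & PF_X)

    return {
        "pie": e_type == ET_DYN,
        "nx_stack": nx_stack,
        # Heuristic only: the symbol name appears when stack protectors are linked in.
        "stack_cookie_hint": b"__stack_chk_fail" in image,
    }
```

Running a check like this over every executable and shared library in an unpacked image gives a quick view of which baseline mitigations a vendor build enables.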

The team developed an exploit for this pre-authentication vulnerability that fully hijacked the control flow, allowing execution of injected shellcode. Because these defensive layers were missing, exploitation was straightforward.

Driving fixes across the ecosystem 

After validating the vulnerabilities, OSR worked closely with American Megatrends Inc. (AMI), the vendor responsible for the affected BMC firmware. We provided detailed technical reports, enabling AMI to patch the issues and coordinate fixes across their customer base. 

Because this firmware is widely deployed across the industry (not just in NVIDIA systems), the team issued its own CVEs in parallel with the vendor to accelerate awareness and remediation. This helped to ensure that affected NVIDIA customers could take action quickly while formal vendor CVEs followed their standard process.

The impact of this work extends beyond NVIDIA products. Identifying and disclosing vulnerabilities in a commonly used BMC platform helps to raise the bar for BMC security across the entire ecosystem. 

What security teams should do now 

More than background infrastructure, BMCs are privileged systems with deep control over your hardware. If they’re not part of your security model, they should be. To get started:

  • Restrict access: Place BMC interfaces on isolated management networks and never expose them to the Internet. 
  • Patch aggressively: Work with your vendors to ensure BMC firmware is updated and CVEs are tracked. 
  • Monitor activity: Treat BMC events as part of your logging and detection strategy. Watch for changes in firmware, config, and login behavior. 
  • Review your supply chain: Ask vendors how BMC firmware is built, tested, and maintained. Validate where possible. 
  • Push for hardening: Require basic mitigations like ASLR, stack protection, and nonexecutable memory in embedded systems. These protections should be the baseline. 
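
As a starting point for the first item, an inventory of BMC addresses can be checked against the networks designated for management. The sketch below uses Python's standard `ipaddress` module. The subnet in `MANAGEMENT_NETWORKS` and the `outside_management_networks` function are assumptions made for the example, and passing this check does not by itself prove that a management network is isolated.

```python
import ipaddress
from typing import Iterable, List

# Hypothetical management subnets; replace with your own.
MANAGEMENT_NETWORKS = ("10.20.0.0/16",)


def outside_management_networks(
    addresses: Iterable[str],
    networks: Iterable[str] = MANAGEMENT_NETWORKS,
) -> List[str]:
    """Return the BMC addresses that are not inside any management subnet."""
    nets = [ipaddress.ip_network(n) for n in networks]
    flagged = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in nets):
            flagged.append(addr)
    return flagged
```

Any address the function returns is a BMC interface that sits outside the expected management range and deserves a closer look.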

BMCs are a gateway to full system control. When compromised, they provide attackers with a persistent, low-level foothold. Securing them is essential to protecting modern infrastructure. 

Strengthening infrastructure security  

By proactively identifying vulnerabilities, working closely with vendors, and sharing insights with the broader community, NVIDIA is helping to drive stronger defenses across the entire data center ecosystem. 

This BMC research is one example of how deep technical security work can reveal hidden risks and deliver real impact. The team will continue to challenge assumptions, investigate overlooked components, and elevate infrastructure security. Securing the stack means securing every layer of it. To learn more, read the full research paper, Breaking BMC: The Forgotten Key to the Kingdom.

Want to learn more about securing other layers of your stack? Browse NVIDIA GTC conference sessions on the latest agentic AI advancements. 
