NVIDIA has issued urgent security advisories addressing multiple vulnerabilities in its Hopper HGX 8-GPU high-performance computing (HPC) platforms, including a high-severity flaw in the HGX Management Controller (CVE-2024-0114, CVSS 8.1) that permits unauthorized code execution, privilege escalation, and systemic data compromise.
A secondary medium-severity vulnerability (CVE-2024-0141, CVSS 6.8) in the GPU vBIOS layer exposes systems to denial-of-service attacks through unsupported registry writes.
These vulnerabilities impact critical infrastructure components in AI/ML clusters, supercomputing environments, and enterprise data centers leveraging NVIDIA’s HGX architecture.
CVE-2024-0114: HMC Privilege Escalation
The HGX Management Controller (HMC), which orchestrates GPU resource allocation and firmware updates across multi-GPU nodes, contains an authentication bypass flaw.
Attackers with administrative access to the Baseboard Management Controller (BMC), which is typically exposed via IPMI or Redfish interfaces, can escalate privileges to the HMC administrator level. This grants full control over:
- Code Execution: Deploy malicious payloads through HMC’s firmware update mechanisms, compromising all attached GPUs.
- Data Tampering: Modify GPU compute workloads or training datasets in AI pipelines.
- Lateral Movement: Exploit HMC’s intra-node communication (NVLink/NVSwitch) to propagate across GPU clusters.
NVIDIA’s advisory confirms exploit chains could persist across reboots due to HMC’s persistence layer, which stores configuration data in non-volatile flash memory.
CVE-2024-0141: vBIOS Registry Corruption
The GPU vBIOS vulnerability allows tenants with GPU access, such as cloud users or containerized workloads, to write to restricted hardware registers.
This destabilizes the GPU’s power management subsystem (PSTATE transitions) and memory controllers (ECC handlers), triggering systemic faults.
Successful exploits force GPUs into a non-responsive state, requiring physical reseating or BMC-level hard resets to recover.
Affected Firmware Versions and Mitigation
Component | Vulnerable Firmware Versions | Patched Version |
HMC Controller | HGX-22.10-1-rc67 (1.5.0) | 1.6.0+ |
HMC Controller | HGX-22.10-1-rc63 (1.4.0) | 1.6.0+ |
HMC Controller | HGX-22.10-1-rc59 (1.3.2) | 1.6.0+ |
GPU vBIOS | All versions prior to 1.6.0 | 1.6.0+ |
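As a rough illustration of how the version table above could be applied in an inventory script, the following Python sketch compares a reported firmware version against the patched 1.6.0 release. The function names `parse_version` and `is_vulnerable` are hypothetical, and the sketch assumes the version is already available as a plain dotted string such as `1.5.0`; how that string is obtained from a given system is not covered here.

```python
# Minimal sketch: flag firmware versions older than the patched release.
# PATCHED comes from the table above; the helper names are illustrative only.
PATCHED = (1, 6, 0)


def parse_version(text):
    """Turn a dotted version string like '1.5.0' into a tuple of three ints."""
    parts = [int(part) for part in text.strip().split(".")]
    # Pad short versions such as '1.6' so they compare as (1, 6, 0).
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_vulnerable(version_text):
    """Return True when the version is older than the patched 1.6.0 release."""
    return parse_version(version_text) < PATCHED


if __name__ == "__main__":
    for version in ("1.3.2", "1.4.0", "1.5.0", "1.6.0"):
        status = "vulnerable" if is_vulnerable(version) else "patched"
        print(f"{version}: {status}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `1.10.0` as older than `1.6.0`.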
Administrators must:
- Isolate BMC Interfaces: Enforce strict network segmentation for IPMI/Redfish endpoints and implement certificate-based authentication.
- Apply Firmware Updates: Use NVIDIA’s nvfwupd utility to deploy HMC firmware 1.6.0 or later.
- Audit Tenant Permissions: Restrict GPU passthrough privileges in virtualized environments to prevent vBIOS exploits.
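The segmentation step in the list above can be spot-checked with a short script. The Python sketch below takes a list of BMC addresses and reports any that sit outside a designated management subnet. The function name `bmc_outside_management_net`, the example addresses, and the `10.20.0.0/24` subnet are all assumptions made for illustration; a real audit would pull addresses from the site's own inventory and would also need to verify firewall rules, which this does not do.

```python
import ipaddress


def bmc_outside_management_net(bmc_addresses, management_cidr):
    """Return the BMC addresses that are not inside the management subnet."""
    network = ipaddress.ip_network(management_cidr)
    return [
        address
        for address in bmc_addresses
        if ipaddress.ip_address(address) not in network
    ]


if __name__ == "__main__":
    # Hypothetical inventory and management subnet, for illustration only.
    inventory = ["10.20.0.5", "10.20.0.6", "192.0.2.17"]
    for address in bmc_outside_management_net(inventory, "10.20.0.0/24"):
        print(f"BMC {address} is outside the management network")
```

Any address this reports is a candidate for relocation behind the management network boundary before firmware updates are rolled out.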
These vulnerabilities underscore systemic risks in computational acceleration platforms where hardware controllers (HMC/BMC) and firmware layers operate with elevated privileges.
The HMC’s role in managing NVIDIA’s NVSwitch Fabric, which interconnects up to 256 GPUs in exascale configurations, makes it a high-value target for APTs seeking to compromise distributed training jobs or exfiltrate proprietary models.
NVIDIA’s response aligns with its Secure Firmware Update framework, which cryptographically signs firmware images using ED25519 keys to prevent tampering.
However, the delayed disclosure timeline (CVE-2024-0114 was cataloged by MITRE on 2023-12-02 but only patched in 2025) highlights the challenges of securing complex hardware/software integration points.
As enterprises increasingly rely on GPU clusters for critical workloads, continuous firmware monitoring and hardware-rooted Zero Trust architectures will be essential to mitigate supply chain risks.