5/2/2024
HPE Apollo 6500 Gen10 Plus System
No affected releases provided.
No fixed releases provided.
On any HPE ProLiant XL645d Gen10 Plus server running System ROM version A48 3.00_01-26-2024 (or later) and configured with AMD MI210 GPUs or NVIDIA HGX A100 SXM4 40GB/80GB GPUs, only one GPU is detected when a driver query is performed with rocm-smi or nvidia-smi. As a result, rocm-smi or nvidia-smi cannot monitor or manage the remaining GPUs. The BIOS/Platform Configuration (RBSU) and the operating system still detect all GPUs.

[Example illustrating this issue when configured with AMD MI210 GPUs]

[Example illustrating this issue when configured with NVIDIA HGX A100 SXM4 40GB/80GB GPUs]

The NVIDIA System Management Interface (nvidia-smi) is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices. It allows administrators to query the GPU device state and, with the appropriate privileges, to modify it.

AMD ROCm is a stack composed primarily of open-source software designed for GPU computation. It consists of a collection of drivers, development tools, and APIs that enable GPU programming from the low-level kernel up to end-user applications. The AMD ROCm System Management Interface (rocm-smi) provides clock and temperature management for ROCm-enabled systems.
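One way to confirm the symptom on a Linux host is to compare the number of GPUs visible on the PCI bus with the number the driver tool reports. The following is a minimal sketch for the NVIDIA case only, not an HPE-provided procedure. It assumes `lspci -nn` and `nvidia-smi -L` are installed, and that GPUs appear with a PCI display-class code (03xx); the function names are hypothetical. The AMD case would need an equivalent parse of rocm-smi output, which is not shown here.

```python
import re
import subprocess

# PCI vendor IDs from the public PCI ID registry: NVIDIA is 10de, AMD is 1002.
PCI_VENDOR_IDS = {"nvidia": "10de", "amd": "1002"}


def count_pci_gpus(lspci_nn_output, vendor):
    """Count display-class (03xx) PCI functions of one vendor in `lspci -nn` text."""
    vendor_id = PCI_VENDOR_IDS[vendor]
    # Example of a matching line fragment: "[0302]: ... [10de:20b0]"
    pattern = re.compile(
        r"\[03[0-9a-f]{2}\]:.*\[" + vendor_id + r":[0-9a-f]{4}\]",
        re.IGNORECASE,
    )
    return sum(1 for line in lspci_nn_output.splitlines() if pattern.search(line))


def count_nvidia_smi_gpus(nvidia_smi_list_output):
    """Count the 'GPU <n>:' lines printed by `nvidia-smi -L`."""
    return len(re.findall(r"^GPU \d+:", nvidia_smi_list_output, re.MULTILINE))


def run(cmd):
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def main():
    pci_count = count_pci_gpus(run(["lspci", "-nn"]), "nvidia")
    smi_count = count_nvidia_smi_gpus(run(["nvidia-smi", "-L"]))
    print(f"PCI bus: {pci_count} NVIDIA GPU(s); nvidia-smi: {smi_count} GPU(s)")
    if smi_count < pci_count:
        print("Mismatch: the driver tool reports fewer GPUs than the PCI bus.")
```

Calling `main()` on a suspect server prints both counts. A driver count lower than the PCI count is consistent with the behavior described in this advisory, although other faults can produce the same mismatch.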
This issue affects any HPE ProLiant XL645d Gen10 Plus server running System ROM version A48 3.00_01-26-2024 (or later) and configured with AMD MI210 GPUs or NVIDIA HGX A100 SXM4 40GB/80GB GPUs.
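The version condition in the scope above can be checked mechanically when auditing a fleet. The sketch below assumes the ROM version is available as a string in the same form this advisory uses (for example `A48 3.00_01-26-2024`, optionally with a suffix such as `(B)`); how that string is obtained from a server is left out. The names `parse_rom_version` and `is_affected` are hypothetical. The check treats every A48 build at or after 3.00_01-26-2024 as affected, which matches the "(or later)" wording but may stop being true once a fix is published.

```python
import re
from datetime import date
from typing import NamedTuple, Tuple


class RomVersion(NamedTuple):
    family: str               # ROM family, e.g. "A48"
    version: Tuple[int, int]  # (major, minor), e.g. (3, 0)
    released: date            # build date embedded in the version string


_ROM_PATTERN = re.compile(
    r"^(?P<family>[A-Z]\d{2})\s+"
    r"(?P<major>\d+)\.(?P<minor>\d+)_"
    r"(?P<month>\d{2})-(?P<day>\d{2})-(?P<year>\d{4})"
    r"(?:\([A-Z]\))?$"  # optional respin suffix such as "(B)"
)


def parse_rom_version(text):
    """Parse a string such as 'A48 3.00_01-26-2024' into a RomVersion."""
    match = _ROM_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized System ROM version: {text!r}")
    return RomVersion(
        family=match["family"],
        version=(int(match["major"]), int(match["minor"])),
        released=date(int(match["year"]), int(match["month"]), int(match["day"])),
    )


# First affected build named in the advisory.
FIRST_AFFECTED = parse_rom_version("A48 3.00_01-26-2024")


def is_affected(text):
    """True if the ROM is the same family and not older than FIRST_AFFECTED."""
    rom = parse_rom_version(text)
    return rom.family == FIRST_AFFECTED.family and (
        (rom.version, rom.released)
        >= (FIRST_AFFECTED.version, FIRST_AFFECTED.released)
    )
```

With these definitions, `is_affected("A48 2.90_10-27-2023")` is false for the downgrade target named in this advisory, while both 3.00_01-26-2024 builds are flagged.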
This issue is under investigation. This advisory will be updated when additional information becomes available.

If this issue has already occurred, downgrade the System ROM to version A48 2.90_10-27-2023.

Note: System ROM versions 3.00_01-26-2024 and 3.00_01-26-2024(B) are considered Recommended. For a list of System ROM fixes, refer to the above link.