BugZero found this defect 45 days ago.
4/5/2024
HPE Cray Supercomputing EX
HPE Cray supercomputers
HPE Slingshot for HPC Clusters
No affected releases provided.
No fixed releases provided.
An issue exists where links between an Nvidia ConnectX-6 NIC and the Slingshot switch do not come up when using Hisense Active Optical Cables (AOC) after upgrading ClusterStor from 4.x to 6.x with Nvidia ConnectX-6 firmware 20.32.1010. The cable vendor confirmed a batch of incorrectly programmed cables. These AOC cables were programmed as copper (Cu), so HPE developed a script to reprogram them correctly as AOC.
This advisory applies to all HPE Cray EX Supercomputer and HPE Cray Supercomputer systems with Nvidia ConnectX-6 NIC firmware version 20.32.1010.
HPE has developed a script that can be used on site. The script checks for incorrectly programmed cables and fixes the issue by programming the Extended Specification Compliance code (SFF specification SFF-8024), byte 192, to 0x80 with code 0x33, which identifies an Active Optical Cable with 50GAUI, 100GAUI-2, or 200GAUI-4 C2M. Currently, byte 192 on this batch of cables is set to 0x40, which is the value for Cu cables. hisense_qsfp_update_b192.sh detects these cables and reprograms the fields specified above so that they identify as AOC.

https://downloads.hpe.com/pub/softlib2/software1/cd/p1078951391/v246275/hisense_qsfp_update_b192.sh

FAQ:

1. Can this be run in a production environment or does it need a maintenance cycle?
A maintenance cycle is needed. However, because the link is not coming up, it is not part of the fabric, so we do not expect production to be affected.

2. Are there any impacts from running this script?
The script updates a byte field in the cable headshell EEPROM, so there is no impact other than setting the cable EEPROM correctly to allow the CX6 card to recognize the cable.

3. What is the best mechanism to reset the link?
We recommend rebooting the server.

Run the hisense_qsfp_update_b192.sh bash script to detect and reprogram the cable as AOC, then reboot the server. This resolves the issue and the link comes up.
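
The detection step described above can be illustrated with a short Python sketch. This is not the HPE script and makes no claim about how hisense_qsfp_update_b192.sh works internally. It assumes a cable EEPROM dump is already available as a flat byte sequence in which index 192 corresponds to byte 192, and it models only the two values named in this advisory (0x40 for Cu, 0x33 for AOC). The function names classify_cable, needs_reprogramming, and make_dump are hypothetical, and the sketch only reads the value; it does not write to any cable.

```python
# Illustrative only: classifies a QSFP EEPROM dump by byte 192.
# Assumption: the dump is a flat image where index 192 is byte 192.
# The two code values are the ones quoted in the advisory text.
EXT_SPEC_COMPLIANCE_BYTE = 192
CODE_CU = 0x40   # value found on the incorrectly programmed batch (Cu)
CODE_AOC = 0x33  # AOC with 50GAUI, 100GAUI-2, or 200GAUI-4 C2M


def classify_cable(eeprom: bytes) -> str:
    """Return 'aoc', 'cu', or 'other' based on byte 192 of the dump."""
    if len(eeprom) <= EXT_SPEC_COMPLIANCE_BYTE:
        raise ValueError("EEPROM dump too short to contain byte 192")
    code = eeprom[EXT_SPEC_COMPLIANCE_BYTE]
    if code == CODE_AOC:
        return "aoc"
    if code == CODE_CU:
        return "cu"
    return "other"


def needs_reprogramming(eeprom: bytes) -> bool:
    """True when byte 192 carries the Cu value described in the advisory."""
    return classify_cable(eeprom) == "cu"


def make_dump(code: int, size: int = 256) -> bytes:
    """Build a zero-filled synthetic dump with byte 192 set (for testing)."""
    dump = bytearray(size)
    dump[EXT_SPEC_COMPLIANCE_BYTE] = code
    return bytes(dump)


if __name__ == "__main__":
    for value in (CODE_CU, CODE_AOC):
        dump = make_dump(value)
        print(hex(value), classify_cable(dump), needs_reprogramming(dump))
```

Treating any value other than 0x40 as "no action" keeps the sketch conservative: only cables that match the mis-programmed pattern described in this advisory are flagged.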