Hewlett Packard Enterprise | a00119184en_us

Advisory: HPE B-series Switches - Following Best Practice Recommendations on the HPE B-series SN8000B Could Prevent an ASIC on One Blade Causing a Fabric-wide Impact of Traffic

Last update date:

2/28/2024

Affected products:

HPE Storage SAN Director Switch

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

Internal memory failures within an ASIC are rare, but they can be impactful if the ASIC is allowed to continue operating in a degraded state for a length of time. When repeated memory failures are encountered on the HPE SN8000B platforms, FOS detects them and raises C3-1006 and C3-1010 log messages. With this class of repeated memory failure, a single failing ASIC can impact traffic across the fabric and disrupt multiple hosts and targets. Performance may degrade to a significant slowdown or even a stoppage of traffic between certain host and target pairs. Frames that flow through the failing ASIC can be impacted, while all other traffic flows continue without issue. This can make the error difficult for host fail-over management software to detect, so traffic is not re-routed to alternate paths. This condition is often referred to as a "sick-but-not-dead" condition.
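The two message IDs named above are the practical signal to watch for. The following is a minimal sketch, not an HPE or Brocade tool, of counting them in exported log text. The function name and the sample lines are hypothetical, and the layout of a real FOS log line is an assumption; only the IDs C3-1006 and C3-1010 come from the advisory.

```python
import re
from collections import Counter

# Message IDs named in the advisory. The surrounding log line layout is an assumption.
MESSAGE_IDS = ("C3-1006", "C3-1010")
_PATTERN = re.compile(r"\b(C3-1006|C3-1010)\b")


def count_memory_error_messages(log_lines):
    """Count occurrences of each advisory message ID in an iterable of log lines."""
    counts = Counter({msg_id: 0 for msg_id in MESSAGE_IDS})
    for line in log_lines:
        for msg_id in _PATTERN.findall(line):
            counts[msg_id] += 1
    return counts


# Hypothetical sample lines for illustration only, not real FOS output.
sample = [
    "[C3-1006], SLOT 3, hypothetical message text",
    "[C3-1010], SLOT 3, hypothetical message text",
    "[ZZ-0000], SLOT 3, unrelated hypothetical message",
]
counts = count_memory_error_messages(sample)
print(dict(counts))
```

Repeated hits for either ID on the same blade would be the cue to review the Resolution section; the threshold for "repeated" is left to the operator, since the advisory does not state one.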

Scope

The HPE SN8000B 8-Slot Director Switch and HPE SN8000B 4-Slot Director Switch are impacted by this issue and could experience this "sick-but-not-dead" condition. Newer generations of HPE B-series switches (Gen 6 and Gen 7) have enhanced detection and automatic recovery logic for internal ASIC memory failures. Uncorrected, impactful memory failures observed on these later generations of hardware are extremely rare.

Resolution

Use the configureChassis CLI command to set system.blade.bladeFaultOnHwErrMsk to 0x1 in the HPE SN8000B chassis configuration when the switch operates within a fabric that has been designed with multiple alternate paths, has a redundant alternate fabric, and accesses storage targets through multipath I/O software/drivers.

When this field is set to 0x1, any nonfatal hardware ASIC data parity error causes the problem blade to be powered off. This hard fault of the port blade lets the multipath I/O software easily detect the failing condition and direct all host-target traffic to alternate paths and/or alternate fabrics. The default value is 0x0. Setting this value is non-disruptive and does not require a switch reboot to activate.

NOTE: Changing this setting to a non-default value is not recommended if your fabric does not have a redundant alternate fabric with storage target access using multipath I/O software/drivers.

Example:

sw0:FID128:admin> configureChassis

Configure...

cfgload attributes (yes, y, no, n): [no]
ssl attributes (yes, y, no, n): [no]
webtools attributes (yes, y, no, n): [no]
Custom attributes (yes, y, no, n): [no]
system attributes (yes, y, no, n): [no] y

system.blade.bladeFaultOnHwErrMsk: (0x0..0x7fffffff) [0x1]
system.cpuLoad: (10..121) [121]
system.i2cTurboCnfg: (0..2) [1]
system.Enable.bladeAutoRecovery (yes, y, no, n): [no]
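As a rough illustration of auditing this setting, the sketch below reads the bracketed value from a captured configureChassis prompt line and encodes the three preconditions from the Resolution as a simple check. Both function names are hypothetical, and it is an assumption that the bracketed value is the one currently shown by the prompt; the prompt text itself is copied from the example in this advisory.

```python
import re

# Matches a prompt line such as:
#   system.blade.bladeFaultOnHwErrMsk: (0x0..0x7fffffff) [0x1]
_PROMPT = re.compile(
    r"system\.blade\.bladeFaultOnHwErrMsk:\s*\([^)]*\)\s*\[(0x[0-9a-fA-F]+)\]"
)


def read_blade_fault_mask(transcript):
    """Return the bracketed mask value as an int, or None if the prompt is absent."""
    match = _PROMPT.search(transcript)
    if match is None:
        return None
    return int(match.group(1), 16)


def fabric_meets_preconditions(alternate_paths, redundant_fabric, multipath_io):
    """True only when all three conditions listed in the Resolution hold."""
    return bool(alternate_paths and redundant_fabric and multipath_io)


# Prompt line taken from the advisory's example.
captured = "system.blade.bladeFaultOnHwErrMsk: (0x0..0x7fffffff) [0x1]"
mask = read_blade_fault_mask(captured)
ok_to_enable = fabric_meets_preconditions(True, True, True)
print(mask, ok_to_enable)
```

Keeping the precondition check separate from the parsing reflects the NOTE above: the non-default value is only advisable when the fabric design supports failover, regardless of what the chassis currently reports.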
