...
Document Version   Release Date        Details
3                  July 31, 2023       UMCE output values added to Description.
2                  August 23, 2022     Updated scope, mlx5_core 5.0-1 Red Hat Enterprise Linux 7.5 through Red Hat Enterprise Linux 7.9 inbox driver.
1                  February 25, 2022   Original Document Release.

HPE Engineering has observed a "Bank 6" Unexpected Machine Check Exception (UMCE) for a network adapter with the mlx5_core 5.0-1 Red Hat Enterprise Linux 7.5 through Red Hat Enterprise Linux 7.9 inbox driver. This UMCE has been observed when network adapter bonding/teaming is configured.

When this occurs, UMCE output with the following values is displayed:

Uncorrectable Machine Check Exception Bank 0x00000006, Status 0xBB800000'00000E0B

The following Integrated Management Log (IML) entries are examples of the reported events.

Red Hat Enterprise Linux 7.9 - HPE ProLiant DL380 Gen10 server with HPE Ethernet 10/25Gb 2-port SFP28 MCX4121A-ACUT Adapter Mellanox mlx5_core UMCE (Bank 0x00000006, Status 0xB3800000'00000E0B)

Critical,1452,5286,0x0005,CPU,0x0003,Hardware,04/19/2022 07:30:34,897: Uncorrectable Machine Check Exception (Processor 1, APIC ID 0x00000000, Bank 0x00000006, Status 0xB3800000'00000E0B, Address 0x00000000'00000000, Misc 0x00000000'10280000). ACTION: Update the system firmware. If the issue persists, contact support.
Critical,1452,5296,0x0014,System Error,0x0005,Hardware,04/19/2022 07:30:36,898: Unrecoverable I/O Error has occurred. System Firmware will log additional details in a separate IML message entry if possible.

Red Hat Enterprise Linux 7.9 - HPE ProLiant DL380 Gen10 server with 640FLR-SFP28, HPE Ethernet 10/25Gb 2-port SFP28 MCX4121A-ACUT Adapter Mellanox mlx5_core - UMCE (Bank 0x00000006, Status 0xFB800000'00000E0B); PCI Express Error Detected. Embedded Flexible LOM

Critical,06/18/2022 17:35:51,Uncorrectable PCI Express Error Detected. Slot 2 (Segment 0x0, Bus 0x60, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x100000 ACTION: Update the firmware of the failing device. If the issue persists, replace the device.
Critical,06/18/2022 17:36:48,Uncorrectable Machine Check Exception (Processor 1, APIC ID 0x00000000, Bank 0x00000006, Status 0xFB800000'00000E0B, Address 0x00000000'00000000, Misc 0x00000000'60000000). ACTION: Update the system firmware. If the issue persists, contact support.

Red Hat Enterprise Linux 7.9 - HPE ProLiant XL420 Gen10 server with HPE Ethernet 10/25Gb 2-port SFP28 MCX4121A-ACUT Adapter - Mellanox mlx5_core UMCE (Bank 0x00000006, Status 0xFB800000'00000E0B); Uncorrectable PCI Express Error Status: 0x100000

Critical,368,109711,0x0008,PCI Bus,0x0002,Hardware,06/24/2022 14:04:26,342: Uncorrectable PCI Express Error Detected. Slot 1 (Segment 0x0, Bus 0x5B, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x100000 ACTION: Update the firmware of the failing device. If the issue persists, replace the device.
Critical,368,109700,0x0005,CPU,0x0003,Hardware,06/24/2022 14:04:26,340: Uncorrectable Machine Check Exception (Processor 1, APIC ID 0x00000000, Bank 0x00000006, Status 0xBB800000'00000E0B, Address 0x00000000'00000000, Misc 0x00000000'5B000000). ACTION: Update the system firmware. If the issue persists, contact support.

Red Hat Enterprise Linux 7.7 - HPE Apollo 4200 Gen10 server - Uncorrectable PCI Express Error Detected. Slot 7 (Segment 0x0, Bus 0xAE, Device 0x0, Function 0x0)

Critical,102,12838,0x0008,PCI Bus,0x0002,Hardware,09/16/2021 05:57:33,263: Uncorrectable PCI Express Error Detected. Slot 7 (Segment 0x0, Bus 0xAE, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x100000 ACTION: Update the firmware of the failing device. If the issue persists, replace the device.
Critical,102,12834,0x0014,System Error,0x0005,Hardware,09/16/2021 05:57:33,262: Unrecoverable I/O Error has occurred. System Firmware will log additional details in a separate IML message entry if possible.
Critical,102,12825,0x0005,CPU,0x0003,Hardware,09/16/2021 05:57:29,261: Uncorrectable Machine Check Exception (Processor 2, APIC ID 0x00000020, Bank 0x00000006, Status 0xBB800000'00000E0B, Address 0x00000000'00000000, Misc 0x00000000'AE000000). ACTION: Update the system firmware. If the issue persists, contact support.
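
When triaging exported IML text, it can help to extract the Bank and Status values and compare them with the ones listed above. The following Python sketch is illustrative only and is not an HPE tool. The function names parse_umce and matches_advisory are hypothetical. The flag bit positions follow the architectural IA32_MCi_STATUS layout from the Intel Software Developer's Manual, and the sketch assumes the Status value in the IML entry is that register reported unmodified. The match rule (Bank 6 with a low 16-bit code of 0x0E0B) reflects only what the sample entries in this advisory have in common.

```python
import re

# Flag bits of the architectural IA32_MCi_STATUS register (Intel SDM layout).
# Assumption: the IML "Status" value is this register, reported unmodified.
STATUS_FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # Misc field is valid
    "ADDRV": 58,  # Address field is valid
    "PCC": 57,    # processor context corrupt
}

# Matches e.g. "Bank 0x00000006, Status 0xBB800000'00000E0B".
UMCE_PATTERN = re.compile(
    r"Bank (0x[0-9A-Fa-f]+), Status (0x[0-9A-Fa-f]+)'([0-9A-Fa-f]+)"
)


def parse_umce(entry):
    """Return bank, status, set flags and low 16-bit code, or None."""
    match = UMCE_PATTERN.search(entry)
    if match is None:
        return None
    bank = int(match.group(1), 16)
    # The IML prints the 64-bit value as two halves separated by an apostrophe.
    status = int(match.group(2) + match.group(3), 16)
    flags = [name for name, bit in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "bank": bank,
        "status": status,
        "flags": flags,
        "mca_code": status & 0xFFFF,
    }


def matches_advisory(entry):
    """True if the entry shares Bank 6 and code 0x0E0B with the samples."""
    parsed = parse_umce(entry)
    return (
        parsed is not None
        and parsed["bank"] == 0x6
        and parsed["mca_code"] == 0x0E0B
    )


if __name__ == "__main__":
    sample = (
        "Uncorrectable Machine Check Exception (Processor 1, "
        "APIC ID 0x00000000, Bank 0x00000006, "
        "Status 0xBB800000'00000E0B, Address 0x00000000'00000000, "
        "Misc 0x00000000'5B000000)."
    )
    print(parse_umce(sample))
    print(matches_advisory(sample))
```

A match on these two values alone does not confirm the cause. The driver version and the bonding/teaming configuration described in this advisory still need to be checked.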
This issue affects any HPE system on which network adapter bonding/teaming is configured with the mlx5_core 5.0-1 inbox driver from Red Hat Enterprise Linux 7.5 through Red Hat Enterprise Linux 7.9.
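
As a rough aid for checking whether a host falls within this scope, the following Python sketch reads the loaded mlx5_core driver version and the list of bonding interfaces from sysfs. It is a minimal sketch under stated assumptions, not an HPE-provided procedure. It assumes the module exposes its version at /sys/module/mlx5_core/version and that the string begins with "5.0-1" as the driver is named in this advisory. It detects only interfaces created by the kernel bonding driver (listed in /sys/class/net/bonding_masters), so teams managed by teamd are not detected. The function names and the sys_root parameter are hypothetical.

```python
from pathlib import Path


def read_driver_version(sys_root="/sys"):
    """Return the loaded mlx5_core version string, or None if unavailable."""
    version_file = Path(sys_root) / "module" / "mlx5_core" / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        # Module not loaded, or it does not expose a version attribute.
        return None


def list_bonds(sys_root="/sys"):
    """Return the names of kernel bonding interfaces (empty if none)."""
    masters = Path(sys_root) / "class" / "net" / "bonding_masters"
    try:
        return masters.read_text().split()
    except OSError:
        # The bonding module is not loaded.
        return []


def in_advisory_scope(sys_root="/sys"):
    """True if the driver version starts with 5.0-1 and a bond exists."""
    version = read_driver_version(sys_root)
    # Assumption: the sysfs version string uses the same "5.0-1" form
    # that the advisory uses to name the inbox driver.
    return (
        version is not None
        and version.startswith("5.0-1")
        and bool(list_bonds(sys_root))
    )


if __name__ == "__main__":
    print("mlx5_core version:", read_driver_version())
    print("bonding interfaces:", list_bonds())
    print("possibly in scope:", in_advisory_scope())
```

The sys_root parameter exists only so the functions can be exercised against a copy of the sysfs tree. On a live system the defaults apply. The check does not verify the Red Hat Enterprise Linux release or whether the bonded ports belong to the affected adapter.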
HPE Engineering has confirmed that this issue does not occur when using any of the following drivers provided on HPE.com:

    HPE Mellanox RoCE (RDMA over Converged Ethernet) Driver for Mellanox ConnectX-4, ConnectX-5 and ConnectX-6 Adapters for Red Hat Enterprise Linux 7 Update 7 (x86_64)
    Mellanox ConnectX4/ConnectX5/ConnectX6 Ethernet Driver for Linux Operating system SDR mlx5_core (10-200 G / IB)
    Mellanox ConnectX4/ConnectX5/ConnectX6 Ethernet Driver for Linux Operating system
    Mellanox InfiniBand and Ethernet Driver [ConnectX-4 and above] for Red Hat Enterprise Linux 7 Update 9