...
BugZero updated this defect 38 days ago.
Due to a flaw in Solid State Drive (SSD) firmware, the SSD internal to the 4100 or 9300 security appliance will no longer respond after approximately 3.2 years of cumulative operation. The 4100 or 9300 chassis may no longer pass network traffic. The management console may be responsive, but users with valid credentials may not be able to log in. Previously logged-in sessions may continue with reduced functionality.

Additionally, one or more of the following symptoms may be observed:

1. SSH/HTTPS connections to the chassis fail.

2. The output of the show pmon state command in the local-mgmt command shell shows the failed svc_sam_dme process without a core file:

    FPR4115-1# connect local-mgmt
    KSEC-FPR4115-1(local-mgmt)# show pmon state

    SERVICE NAME        STATE    RETRY(MAX)  EXITCODE  SIGNAL  CORE
    ------------        -----    ----------  --------  ------  ----
    svc_sam_controller  running  0(4)        0         0       no
    smConLogger         running  0(4)        0         0       no
    svc_sam_dme         failed   5(4)        1         0       no    <--- The state is failed, and no core file is generated.

3. Changing the command scopes in the FirePOWER eXtensible Operating System (FXOS) command-line interface (CLI) fails with the following error:

    FPR4115-1# scope ssa
    Software Error: Exception during execution: [Error: Timed out communicating with DME] <---

4. If the svc_sam_dme process has failed and the application instance is restarted (blade or FTD container), all provisioned interfaces become unassociated from the chassis after the restart:

    firepower# show int ip brief
    Interface            IP-Address     OK?           Method  Status      Protocol
    Internal-Control0/0  unassigned     YES           unset   up          up
    Internal-Data0/0     unassigned     YES           unset   up          up
    Internal-Data0/1     unassigned     YES           unset   up          up
    Internal-Data0/2     169.254.1.1    YES           unset   up          up
    Internal-Data0/3     unassigned     YES           unset   up          up
    Internal-Data0/4     unassigned     YES           unset   down        up
    Port-channel1        192.0.2.1      unassociated  CONFIG  down        down  <-------
    Port-channel2        unassigned     unassociated  CONFIG  down        down  <-------
    Port-channel3        unassigned     unassociated  unset   down        down  <-------
    Port-channel4        unassigned     unassociated  unset   admin down  down  <-------
    Ethernet1/1          203.0.113.130  unassociated  unset   down        down  <-------

    firepower# show int po1
    Interface Port-channel1 "inside", is down, line protocol is down (not associated with Supervisor) <------

   The unassociated interface results in impact.
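Symptom 2 can be checked mechanically once the show pmon state output has been captured as text. The following is a minimal sketch, not a Cisco tool: the function name and the sample string are hypothetical, and it assumes the column layout shown above (service name, state, retry, exit code, signal, core).

```python
# Hypothetical helper: scan captured "show pmon state" text for services
# whose STATE is "failed" and whose CORE column is "no".
# Assumes the six-column layout shown in the symptom description.

SAMPLE_OUTPUT = """\
SERVICE NAME        STATE    RETRY(MAX)  EXITCODE  SIGNAL  CORE
------------        -----    ----------  --------  ------  ----
svc_sam_controller  running  0(4)        0         0       no
smConLogger         running  0(4)        0         0       no
svc_sam_dme         failed   5(4)        1         0       no
"""


def failed_services_without_core(pmon_output: str) -> list[str]:
    """Return names of services reported as failed with no core file."""
    failed = []
    for line in pmon_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or truncated line
        name, state, core = fields[0], fields[1], fields[5]
        # The header and separator rows never have "failed" in column 2,
        # so they fall through this check without special handling.
        if state == "failed" and core == "no":
            failed.append(name)
    return failed


if __name__ == "__main__":
    print(failed_services_without_core(SAMPLE_OUTPUT))
```

A result containing svc_sam_dme matches the symptom described above. It does not by itself confirm the SSD firmware flaw, since other faults could also leave the process in a failed state.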
After 28,224 hours (approximately 3.2 years) of accumulated Power On Hours (POH), a memory buffer overrun condition occurs which triggers the firmware event in the SSD. This causes the chassis to become unresponsive until it is power-cycled. No data loss will occur when the memory buffer overrun firmware event occurs. A power-cycle of the chassis restores normal operation of the drive. The drive continues to operate normally for 1008 additional accumulated POH (six weeks), at which time the drive will become unresponsive again. Power-cycling again will re-initiate the 1008 hour window.
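The timing described above reduces to simple arithmetic on the accumulated POH. The sketch below illustrates it under stated assumptions: the function and parameter names are hypothetical, the POH values are assumed to be read from the drive by some other means, and only the two thresholds (28,224 and 1008 hours) come from the text.

```python
# Thresholds taken from the description above; everything else is illustrative.
FIRST_EVENT_POH = 28_224          # POH at which the firmware event first occurs
RECURRENCE_WINDOW_HOURS = 1_008   # hours of operation gained by each power-cycle


def hours_until_next_event(current_poh, poh_at_last_power_cycle=None):
    """Estimate the operating hours left before the drive stops responding.

    current_poh: accumulated Power On Hours of the SSD.
    poh_at_last_power_cycle: POH value at the most recent recovery
        power-cycle, if one was performed after the first event.
    """
    if current_poh < FIRST_EVENT_POH:
        # The first event has not happened yet.
        return FIRST_EVENT_POH - current_poh
    if poh_at_last_power_cycle is None or poh_at_last_power_cycle < FIRST_EVENT_POH:
        # Past the threshold with no recovery power-cycle recorded.
        return 0
    next_event = poh_at_last_power_cycle + RECURRENCE_WINDOW_HOURS
    return max(next_event - current_poh, 0)


if __name__ == "__main__":
    print(hours_until_next_event(20_000))
    print(hours_until_next_event(28_500, poh_at_last_power_cycle=28_224))
```

For reference, 28,224 hours divided by 8,760 hours per year is roughly 3.2 years, and 1008 hours divided by 24 is 42 days, which is the six-week window mentioned above.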
A power-cycle of the 4100/9300 chassis is required to recover from this issue, but the recovery is temporary: the failure reappears after another 1008 hours of operation. To prevent the issue and the resulting disruption to the network and operations, Cisco recommends proactively upgrading the SSD firmware before the accumulated uptime reaches 28,224 hours.
The field notice for this issue, with further details and validations, can be found here: https://www.cisco.com/c/en/us/support/docs/field-notices/720/fn72077.html

PSIRT Evaluation: The Cisco PSIRT has evaluated this issue and determined it does not meet the criteria for PSIRT ownership or involvement. This issue will be addressed via normal resolution channels. If you believe that there is new information that would cause a change in the severity of this issue, please contact psirt@cisco.com for another evaluation.

Additional information on Cisco's security vulnerability policy can be found at the following URL: http://www.cisco.com/en/US/products/products_security_vulnerability_policy.html
7.5  ISE Evaluate OpenSSH CVE-2024-6387 "regreSSHion"
7.5  Auth Step latency for policy evaluation due to Garbage Collection activity.
7.5  Cisco 2800, 3800 and 1560 series APs fail to pass traffic
7.5  M500IT Model Solid State Drives on 4100/9300 may go unresponsive after 3.2 Years in service
7.5  Access Points stuck in bootloop due to image checksum verification failed