Symptom
The following kernel panic logs are seen on an N3K-C3548P-XL switch.
VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed
VDC-1 %$ %KERN-0-SYSTEM_MSG: [11269926.144849] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0] - kernel
VDC-1 %$ %KERN-0-SYSTEM_MSG: [11269926.144869] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0] - kernel
VDC-1 %$ %KERN-0-SYSTEM_MSG: [11269926.148346] nxos_panic: Kernel panic - not syncing: softlockup: hung tasks - kernel
VDC-1 %$ %KERN-0-SYSTEM_MSG: [11269926.148415] ttyS console device is disabled - kernel
VDC-1 %$ %KERN-0-SYSTEM_MSG: [11269926.150327] END: PANIC REPORT GENERATED AT 1689769096 - kernel
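The messages above can be condensed into a short summary when comparing several occurrences. The following is a minimal sketch, not a Cisco tool: the function name `summarize_panic` and both regular expressions are hypothetical and derived only from the log lines shown in this report. It also assumes that the number in the PANIC REPORT line is a Unix epoch timestamp in seconds.

```python
import re
from datetime import datetime, timezone

# Patterns derived from the sample log lines in this report (assumption:
# other releases may format these messages differently).
SOFT_LOCKUP = re.compile(
    r"\[(?P<uptime>\d+\.\d+)\] NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
)
PANIC_REPORT = re.compile(r"PANIC REPORT GENERATED AT (?P<epoch>\d+)")


def summarize_panic(log_text):
    """Return (sorted unique soft lockups, panic report time or None).

    Each lockup is a tuple (cpu, stuck_seconds, kernel_uptime_string).
    Repeated copies of the same kernel message are collapsed.
    """
    lockups = set()
    panic_time = None
    for line in log_text.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            lockups.add(
                (int(match["cpu"]), int(match["secs"]), match["uptime"])
            )
            continue
        match = PANIC_REPORT.search(line)
        if match:
            # Assumption: the value is Unix epoch seconds, shown here in UTC.
            panic_time = datetime.fromtimestamp(
                int(match["epoch"]), tz=timezone.utc
            )
    return sorted(lockups), panic_time
```

Passing the captured syslog text to `summarize_panic` gives the list of affected CPUs with their stuck duration and, if the epoch assumption holds, the time at which the panic report was generated.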
Conditions
The issue is seen only on the N3K-C3548P-XL platform.
The kernel panic occurs after a vPC peer keep-alive message fails to be received from the peer.
The kernel panic is caused by a soft lockup.
Further Problem Description
Collect the following logs, which help in root cause analysis (RCA) of the issue:
show logging onboard stack-trace
show logging onboard kernel-trace
Once the traces are collected, run the following command to clean up the logs so that the plog partition does not fill up:
clear logging onboard stack-trace
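The order of the steps above matters: the clear command should run only after both traces have been saved. The following is a minimal sketch of that ordering, assuming a caller-supplied function `run` that executes one CLI command on the switch and returns its output as text. Both `run` and `collect_then_clear` are hypothetical names, not part of NX-OS. Only the three commands listed in this report are used.

```python
# Commands taken from this report.
COLLECT_COMMANDS = [
    "show logging onboard stack-trace",
    "show logging onboard kernel-trace",
]
CLEAR_COMMAND = "clear logging onboard stack-trace"


def collect_then_clear(run):
    """Collect both traces, then clear the stack-trace log.

    `run` is a hypothetical callable: it takes a command string and
    returns the command output as a string. If any collection step
    raises an exception, the clear command is never issued, so no
    trace is lost before it has been captured.
    """
    traces = {}
    for command in COLLECT_COMMANDS:
        traces[command] = run(command)
    # Reached only when every trace was collected without error.
    run(CLEAR_COMMAND)
    return traces
```

The returned dictionary maps each show command to its output, which can then be written to a file and attached to the support case.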