...
A CAT3650 / CAT3850 switch may experience a kernel-level memory leak in the "platform_mgr" process. Leading up to this, there may be platform-level memory "warning" or "critical" events, depending on how severe the memory depletion is. E.g.:

%PLATFORM-4-ELEMENT_WARNING:Switch 1 R0/0: smand: 1/RP/0: Used Memory value 91% exceeds warning level 90%
%PLATFORM-3-ELEMENT_CRITICAL:Switch 1 R0/0: smand: 1/RP/0: Used Memory value 96% exceeds critical level 95%

=== The "platform_mgr" memory leak can be verified as follows: ===

1) Collect the output below periodically and check whether the "Size" counter increases over time for the "platform_mgr" process. E.g.:

switch# show platform software process list switch active R0 sort memory

2) Enable memory allocation tracking to look for signs of a "callsite" (function) leaking memory, as follows:

a) Enable the memory allocation tracking debug on the active switch.

switch# debug platform software memory platform-mgr switch active R0 alloc callsite start

b) Wait approximately 15 minutes, then run the following command.

switch# show platform software memory platform-mgr switch active R0 alloc callsite brief

c) When done, disable the memory allocation tracking debug.

switch# debug platform software memory platform-mgr switch active R0 alloc callsite stop

In the step 2b output, check the "diff_call" column for any increases over time. The longer this debug runs, the more accurate it is, so it may be necessary to run it for up to an hour or so. If there are large changes in "diff_call", you may be experiencing a memory leak in functions within the "platform-mgr" process, as represented by the "callsite" column, and this bug may apply.
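The periodic "Size" check in step 1 can be automated once the command output has been captured as text. The following is a minimal sketch, not a Cisco tool. It assumes each sample is the raw text of "show platform software process list switch active R0 sort memory", with data rows laid out as Name, Pid, PPid, Group Id, Status, Priority, Size. The function names and the "at least two positive deltas" rule are illustrative assumptions. How the samples are collected (SSH, console logging, and so on) is left out.

```python
from typing import List, Optional


def process_size(output: str, name: str = "platform_mgr") -> Optional[int]:
    """Return the Size column (last field) for the named process, or None."""
    for line in output.splitlines():
        fields = line.split()
        # Assumed data row layout: Name Pid PPid GroupId Status Priority Size
        if len(fields) == 7 and fields[0] == name and fields[-1].isdigit():
            return int(fields[-1])
    return None


def size_growth(samples: List[str], name: str = "platform_mgr") -> List[int]:
    """Return the Size delta between each pair of consecutive samples."""
    sizes = [s for s in (process_size(o, name) for o in samples) if s is not None]
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def looks_like_leak(samples: List[str], name: str = "platform_mgr") -> bool:
    """Heuristic (an assumption): Size grew between every pair of samples."""
    deltas = size_growth(samples, name)
    return len(deltas) >= 2 and all(d > 0 for d in deltas)
```

A steadily growing "Size" is only a hint. The callsite tracking in step 2 is still needed to confirm which function is leaking.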
=== Optional steps, for TAC debugging if opening a TAC case: ===

3) Record the full backtrace of each "callsite" / function leaking memory, as follows:

a) For any "callsite" value that is rapidly increasing in memory usage in step 2b, configure:

switch# debug platform software memory platform-mgr switch active alloc backtrace start depth 10

b) Wait approximately 15 minutes, then run the following command:

switch# show platform software memory platform-mgr switch active alloc backtrace

c) Disable allocator backtrace recording.

switch# debug platform software memory platform-mgr switch active alloc backtrace stop

=== Example outputs: ===

switch# show platform software process list switch active R0 sort memory
Name              Pid     PPid    Group Id  Status  Priority  Size
------------------------------------------------------------------------------
platform_mgr      13221   11799   13221     S       20        1410670592

switch# show platform software memory platform-mgr switch active R0 alloc callsite brief
callsite      thread    diff_byte    diff_call
----------------------------------------------------------
276283394     20137     1309         12
270278657     20137     57384        1
539985923     20137     4896         9306    <-- leak, after several hours of tracking
270278658     20137     11808        5
539985922     20137     4896         9306    <-- leak, after several hours of tracking
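The "alloc callsite brief" output shown above can be screened for suspicious callsites with a short script. This is a rough sketch under stated assumptions: the four-column layout (callsite, thread, diff_byte, diff_call) is taken from the example output, while the helper names and the default threshold of 1000 calls are hypothetical and should be tuned to how long tracking has been running.

```python
from typing import List, NamedTuple


class CallsiteRow(NamedTuple):
    callsite: int
    thread: int
    diff_byte: int
    diff_call: int


def parse_callsite_brief(output: str) -> List[CallsiteRow]:
    """Parse data rows of the callsite table, skipping headers and rules."""
    rows = []
    for line in output.splitlines():
        fields = line.split()[:4]  # drop any trailing annotation text
        if len(fields) != 4:
            continue
        try:
            rows.append(CallsiteRow(*(int(f) for f in fields)))
        except ValueError:
            continue  # header, separator, or CLI prompt line
    return rows


def leak_candidates(output: str, min_diff_call: int = 1000) -> List[CallsiteRow]:
    """Return rows whose diff_call meets the (assumed) threshold, largest first."""
    suspects = [r for r in parse_callsite_brief(output) if r.diff_call >= min_diff_call]
    return sorted(suspects, key=lambda r: r.diff_call, reverse=True)
```

Run against the example output above, only the two rows marked as leaks have a "diff_call" at or above the assumed threshold. Comparing several captures taken minutes apart gives a more reliable signal than any single snapshot.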
Catalyst 3850/3650 switch running an affected software version. This issue does NOT affect switches running 16.5.x, 16.6.x, and later releases.
Reload the affected switch to purge allocated memory:

3850#reload slot ?
  Slot number of RP or line card
The leak is a buffer leak tied to a specific queue whose buffers were not freed after the message was sent to IOSd.