...
Memory leak at kernel level with pubd process triggering unexpected reloads on C9800 wireless controllers.
Memory leak observed on 17.09.02, 17.09.03 IOS-XE:

show platform software status control brief

Load Average
 Slot   Status    1-Min   5-Min   15-Min
1-RP0   Healthy    0.82    0.68     0.65
2-RP0   Healthy    0.20    0.46     0.38

Memory (kB)
 Slot   Status       Total       Used (Pct)       Free (Pct)   Committed (Pct)
1-RP0   Critical  32356512   31981540 (99%)     374972 ( 1%)   34712256 (107%)
2-RP0   Healthy   32356512    5099088 (16%)   27257424 (84%)    7525456 (23%)

show platform software process memory chassis active r0 all sorted

  Pid        RSS        PSS       Heap    Shared    Private   Name
--------------------------------------------------------------------------
24871   26252896   25930318   25803252    395988   25856908   pubd   <<<<<<< RSS value is always increasing.
 4499    1256916    1123343        464    177872    1079044   linux_iosd-imag
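One way to confirm that the pubd RSS value keeps increasing is to capture the process memory output several times and compare the RSS column across captures. The following is a minimal Python sketch, not a Cisco tool: the helper names `process_rss_kb` and `is_steadily_growing` are hypothetical, and the parser assumes the column order shown above (Pid, RSS, PSS, Heap, Shared, Private, Name).

```python
from typing import List, Optional

# One data row copied from the output above.
SAMPLE = """\
24871 26252896 25930318 25803252 395988 25856908 pubd
4499 1256916 1123343 464 177872 1079044 linux_iosd-imag
"""


def process_rss_kb(output: str, name: str = "pubd") -> Optional[int]:
    """Return the RSS (kB) of the named process, or None if it is not listed.

    Assumes rows of the form: Pid RSS PSS Heap Shared Private Name.
    Header, separator, and annotation text are ignored.
    """
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[0].isdigit() and fields[6] == name:
            return int(fields[1])
    return None


def is_steadily_growing(rss_samples: List[int]) -> bool:
    """True when there are at least two samples and each exceeds the previous one."""
    return len(rss_samples) >= 2 and all(
        later > earlier for earlier, later in zip(rss_samples, rss_samples[1:])
    )


print(process_rss_kb(SAMPLE))  # 26252896
```

Feeding the RSS values from successive captures into `is_steadily_growing` gives a quick yes/no on whether the counter only ever moves upward. How many captures to take, and how far apart, is left to the operator.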
Memory usage for the pubd process increases continuously, and an unexpected reload occurs once memory usage reaches 99% because of the leak. The leak is suspected to be caused by a periodic telemetry-related command being applied remotely.
- Reload or switch over the active controller in a planned maintenance window to recover memory.
Or
- Remove the gRPC subscription. This only stops the leak; it does not recover memory already consumed.
Or
- Unconfigure and reconfigure netconf-yang to restart the pubd process.

It may also help to increase the subscription interval, e.g. from "update-policy periodic 1000" to "update-policy periodic 3000".
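The effect of the interval change can be estimated with simple arithmetic. The Python sketch below assumes the period argument of "update-policy periodic" is expressed in centiseconds, as in IOS-XE model-driven telemetry, and that the leak grows with the number of updates sent. The second point is an unconfirmed assumption, since this report only suspects the periodic telemetry as the trigger. The function name `updates_per_hour` is hypothetical.

```python
def updates_per_hour(period_centiseconds: int) -> float:
    """Number of periodic updates sent per hour for one subscription.

    Assumes the period is given in centiseconds (1/100 of a second).
    """
    if period_centiseconds <= 0:
        raise ValueError("period must be a positive number of centiseconds")
    return 3600 * 100 / period_centiseconds


print(updates_per_hour(1000))  # 360.0
print(updates_per_hour(3000))  # 120.0
```

Under those assumptions, moving from 1000 to 3000 cuts the update rate to one third. That would slow the leak rather than stop it.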
The leak is related to the telemetry configuration. Sets of subscriptions can be observed in the tech reports.

+ Commands to monitor memory and track whether the pubd process is leaking memory:

show platform software status control brief
show platform resources
show process memory platform sorted
show process memory platform accounting
show platform software process memory chassis active r0 all sorted
show platform software process memory chassis standby r0 all sorted
set platform software trace shell-manager chassis active r0 smand verbose
sh logging process sman internal

+ If the RSS memory usage of pubd is found to be constantly increasing, clear the counters:

debug platform software memory mdt-pubd chassis active r0 alloc callsite stop
debug platform software memory mdt-pubd chassis active r0 alloc callsite clear
debug platform software memory mdt-pubd chassis active r0 alloc callsite start

+ Collect many samples with the following command and determine which call site has diff_call increasing most significantly:

show platform software memory mdt-pubd chassis active r0 alloc callsite brief

callsite            thread    diff_byte    diff_call
----------------------------------------------------------------
0AFFF6C300FCC006    27519     781824       509
9423E051484E0000    27519     524992       26
0AFFF6C300FCC002    27519     524809       15279   <<<<<<<<<<<<<<<<<
108CA0D9C0F0007D    27519     371488       453

+ Use the callsite ID with the following command:

debug platform software memory mdt-pubd chassis active r0 alloc backtrace start 0AFFF6C300FCC002 depth 10

+ And share the output of the following command with Cisco TAC:

show platform software memory mdt-pubd chassis active r0 alloc backtrace
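Comparing diff_call across many samples by eye is error-prone, so the comparison can be scripted. The following is a minimal Python sketch, not a Cisco tool: the helper names `parse_callsite_brief` and `rank_by_diff_call_growth` are hypothetical, and the parser assumes the four-column layout shown above (callsite, thread, diff_byte, diff_call).

```python
from typing import Dict, List, Tuple

# Rows copied from the "alloc callsite brief" sample above.
SAMPLE = """\
callsite thread diff_byte diff_call
----------------------------------------------------------------
0AFFF6C300FCC006 27519 781824 509
9423E051484E0000 27519 524992 26
0AFFF6C300FCC002 27519 524809 15279
108CA0D9C0F0007D 27519 371488 453
"""


def parse_callsite_brief(output: str) -> Dict[str, Tuple[int, int]]:
    """Map each callsite ID to (diff_byte, diff_call).

    Header and separator lines are skipped because their columns are not numeric.
    """
    rows: Dict[str, Tuple[int, int]] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and all(f.isdigit() for f in fields[1:4]):
            rows[fields[0]] = (int(fields[2]), int(fields[3]))
    return rows


def rank_by_diff_call_growth(
    first: Dict[str, Tuple[int, int]], last: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, int]]:
    """Sort callsites by how much diff_call grew between two samples, largest first.

    A callsite missing from the first sample is treated as starting at zero.
    """
    growth = {cs: last[cs][1] - first.get(cs, (0, 0))[1] for cs in last}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


latest = parse_callsite_brief(SAMPLE)
print(rank_by_diff_call_growth({}, latest)[0])  # ('0AFFF6C300FCC002', 15279)
```

In practice `first` and `last` would be parsed from the earliest and latest captures. The top-ranked callsite is the candidate to pass to the backtrace command.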