...
N540-24Z8Q2C-M devices running IOS-XR 7.4.1, all configured with identical ADT/MDT configuration (configuration snippet below), which regularly see the emsd process crash and restart, roughly every 8 to 10 hours. The following log messages are seen each time it crashes:

RP/0/RP0/CPU0:2021 Oct 31 19:12:06.826 CDT: dumper[67755]: %OS-SYSLOG-6-LOG_INFO : Dumping core /misc/scratch/core/emsd_18568.by.6.20211031-191206.node0_RP0_CPU0.7aa2f.core.gz
RP/0/RP0/CPU0:2021 Oct 31 19:12:32.676 CDT: processmgr[51]: emsd(1) (jid 1106) (pid 18568) (fail_count 2) abnormally terminated, restart scheduled
RP/0/RP0/CPU0:2021 Oct 31 19:12:33.509 CDT: emsd[1106]: %MGBL-MDT-5-SUB_PAUSED : Subscription logstash-01 has been paused
RP/0/RP0/CPU0:2021 Oct 31 19:12:34.150 CDT: syslog_dev[114]: emsd[1106] PID-2422: Logged into TAM server, Token is 5577006791947779410
RP/0/RP0/CPU0:2021 Oct 31 19:12:34.150 CDT: syslog_dev[114]: emsd[1106] PID-2422: Init:KevlarListCount
RP/0/RP0/CPU0:2021 Oct 31 19:12:34.333 CDT: emsd[1106]: %MGBL-EMS-6-EMSD_SERVICE_START : emsd service start
RP/0/RP0/CPU0:2021 Oct 31 19:12:34.333 CDT: emsd[1106]: gRPC is secure with TLS
RP/0/RP0/CPU0:2021 Oct 31 19:13:34.723 CDT: dumper[159]: %OS-COREHELPER-6-CORE_COPIED : Copied core emsd_18568.by.6.20211031-191206.node0_RP0_CPU0.7aa2f.core.gz to 0/RP0/CPU0:harddisk:
RP/0/RP0/CPU0:2021 Oct 31 19:13:36.019 CDT: dumper[159]: %OS-COREHELPER-6-DELETE_CORE : Deleted core file emsd_18568.by.6.20211031-191206.node0_RP0_CPU0.7aa2f.core.gz.

[8.5 hours pass]

RP/0/RP0/CPU0:2021 Nov 1 03:49:24.855 CDT: dumper[66407]: %OS-SYSLOG-6-LOG_INFO : Dumping core /misc/scratch/core/emsd_2367.by.6.20211101-034924.node0_RP0_CPU0.7aa2f.core.gz
RP/0/RP0/CPU0:2021 Nov 1 03:49:50.036 CDT: processmgr[51]: emsd(1) (jid 1106) (pid 2367) (fail_count 2) abnormally terminated, restart scheduled
RP/0/RP0/CPU0:2021 Nov 1 03:49:50.681 CDT: emsd[1106]: %MGBL-MDT-5-SUB_PAUSED : Subscription logstash-01 has been paused
RP/0/RP0/CPU0:2021 Nov 1 03:49:51.231 CDT: syslog_dev[114]: emsd[1106] PID-17442: Logged into TAM server, Token is 5577006791947779410
RP/0/RP0/CPU0:2021 Nov 1 03:49:51.231 CDT: syslog_dev[114]: emsd[1106] PID-17442: Init:KevlarListCount
RP/0/RP0/CPU0:2021 Nov 1 03:49:51.371 CDT: emsd[1106]: %MGBL-EMS-6-EMSD_SERVICE_START : emsd service start
RP/0/RP0/CPU0:2021 Nov 1 03:49:51.372 CDT: emsd[1106]: gRPC is secure with TLS
RP/0/RP0/CPU0:2021 Nov 1 03:50:49.209 CDT: dumper[159]: %OS-COREHELPER-6-CORE_COPIED : Copied core emsd_2367.by.6.20211101-034924.node0_RP0_CPU0.7aa2f.core.gz to 0/RP0/CPU0:harddisk:
RP/0/RP0/CPU0:2021 Nov 1 03:50:50.035 CDT: dumper[159]: %OS-COREHELPER-6-DELETE_CORE : Deleted core file emsd_2367.by.6.20211101-034924.node0_RP0_CPU0.7aa2f.core.gz.
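
The gap between the two emsd core dumps quoted above can be checked directly from the syslog timestamps. The following Python sketch is only an illustration: the function names, the regular expression, and the `SAMPLE` string are hypothetical helpers written for this note, not part of IOS-XR or any Cisco tool, and the pattern assumes one log message per line in the format shown in this report.

```python
import re
from datetime import datetime

# Hypothetical pattern, modeled on the "Dumping core" lines quoted in this report.
# It captures the timestamp of each emsd core dump announced by the dumper process.
DUMP_RE = re.compile(
    r"RP/\d+/RP\d+/CPU\d+:(\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+: "
    r"dumper\[\d+\]: .*Dumping core \S*emsd_"
)

def crash_times(log_text):
    """Return the timestamps of all emsd core dumps found in log_text."""
    times = []
    for line in log_text.splitlines():
        match = DUMP_RE.search(line)
        if match:
            # Collapse repeated spaces (e.g. padded day-of-month) before parsing.
            stamp = " ".join(match.group(1).split())
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f"))
    return times

def crash_intervals_hours(log_text):
    """Return the time between consecutive emsd crashes, in hours."""
    times = crash_times(log_text)
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]

# The two "Dumping core" lines from the logs above, one message per line.
SAMPLE = (
    "RP/0/RP0/CPU0:2021 Oct 31 19:12:06.826 CDT: dumper[67755]: "
    "%OS-SYSLOG-6-LOG_INFO : Dumping core /misc/scratch/core/"
    "emsd_18568.by.6.20211031-191206.node0_RP0_CPU0.7aa2f.core.gz\n"
    "RP/0/RP0/CPU0:2021 Nov 1 03:49:24.855 CDT: dumper[66407]: "
    "%OS-SYSLOG-6-LOG_INFO : Dumping core /misc/scratch/core/"
    "emsd_2367.by.6.20211101-034924.node0_RP0_CPU0.7aa2f.core.gz\n"
)

if __name__ == "__main__":
    print([round(h, 2) for h in crash_intervals_hours(SAMPLE)])  # prints [8.62]
```

For the two crashes shown, the interval is about 8.6 hours, consistent with the reported 8 to 10 hour window. The timestamps are parsed as naive local times, so the result would be off by an hour across a daylight-saving change.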
The issue is seen when the telemetry sensor-path Cisco-IOS-XR-telemetry-model-driven-oper:telemetry-model-driven/subscriptions/subscription is streamed for more than 8 to 10 hours on NCS540L devices recently upgraded to version 7.4.1, with the following ADT/MDT configuration:

!
adt enable
!
telemetry model-driven
 strict-timer
 destination-group logstash-01
  address-family ipv4 10.10.x.y port 57100
   encoding self-describing-gpb
   protocol tcp
  !
 !
 destination-group telemetry-01
  address-family ipv4 10.10.x.z port 57100
   encoding self-describing-gpb
   protocol tcp
  !
 !
 sensor-group adt-events
  sensor-path Cisco-IOS-XR-adt-oper:adt/adt-output
 !
 sensor-group telemetry-01

The following logs are also seen:

RP/0/RP0/CPU0:2021 Oct 13 00:22:59.140 CDT: emsd[1106]: %MGBL-MDT-5-SUB_PAUSED : Subscription logstash-01 has been paused
RP/0/RP0/CPU0:2021 Oct 13 00:22:59.416 CDT: syslog_dev[114]: emsd[1106] PID-8566: Logged into TAM server, Token is 5577006791947779410
RP/0/RP0/CPU0:2021 Oct 13 00:22:59.416 CDT: syslog_dev[114]: emsd[1106] PID-8566: Init:KevlarListCount
RP/0/RP0/CPU0:2021 Oct 13 00:22:59.669 CDT: emsd[1106]: %MGBL-EMS-6-EMSD_SERVICE_START : emsd service start
NA
The crash is caused by stack corruption, which results from the use of stale data while iterating through the loop that generates the data.