Overview
Fixed an issue where internal path monitoring failed because `sysdagent` was not responding.
Impact
sysdagent crash / system unresponsive
Root cause
Stale httpd worker processes pile up over time, which causes sysdagent to stall in nanosleep.
Workaround
No permanent workaround. As a short-term solution, restart websrvr-backend when either of the following queues is high or growing (it takes roughly 4 weeks to several months for the queue to build up).

Check the queue on port 28260 for httpd:

> show netstat numeric yes programs yes | match 28260.*httpd

Alternatively, look for netstat or netstat_detail entries in mp-monitor.log, for example:

tcp 1993189 0 127.0.0.1:48524 127.0.0.1:28260 ESTABLISHED 19359/httpd

The second column is the Recv-Q. If it stays above 100000 across multiple subsequent outputs, restart web-backend.

Check the queue on port 28260 for sysd:

> show netstat numeric yes programs yes | match 28260.*sysd

In HA, the sysd-to-sysd peer connection will also show high rx, tx, or both, for example:

tcp 770066 446064 198.18.31.237:56307 198.18.31.238:28260 ESTABLISHED 11761/sysd

Here the second and third columns (Recv-Q and Send-Q) both show values above 100k. These values may be growing or stuck.

To restart the process:

> debug software restart process web-backend
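
The queue check above can be sketched as a small offline script. This is a minimal illustration, not a supported tool. It assumes the netstat output (from the CLI or from mp-monitor.log) has already been saved as text and that the columns appear in the order shown in the sample lines (protocol, Recv-Q, Send-Q, local address, remote address, state, PID/program). The names `parse_line`, `is_backed_up`, and `persistently_backed_up` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import List, NamedTuple, Optional

SYSD_PORT = 28260          # port named in the workaround
QUEUE_THRESHOLD = 100_000  # "more than 100000" per the workaround


class Conn(NamedTuple):
    recv_q: int
    send_q: int
    local: str
    remote: str
    state: str
    program: str


# Assumed column order: proto, Recv-Q, Send-Q, local, remote, state, PID/program.
_LINE = re.compile(
    r"^\s*tcp\S*\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+\d+/(\S+)"
)


def parse_line(line: str) -> Optional[Conn]:
    """Parse one netstat TCP line; return None if it does not match."""
    m = _LINE.match(line)
    if not m:
        return None
    recv_q, send_q, local, remote, state, program = m.groups()
    return Conn(int(recv_q), int(send_q), local, remote, state, program)


def is_backed_up(conn: Conn, port: int = SYSD_PORT,
                 threshold: int = QUEUE_THRESHOLD) -> bool:
    """True if the connection uses the port and Recv-Q or Send-Q exceeds the threshold."""
    suffix = f":{port}"
    on_port = conn.local.endswith(suffix) or conn.remote.endswith(suffix)
    return on_port and max(conn.recv_q, conn.send_q) > threshold


def persistently_backed_up(samples: List[str], program: str) -> bool:
    """True if every captured sample contains a backed-up connection owned by program."""
    if not samples:
        return False
    for sample in samples:
        conns = (parse_line(line) for line in sample.splitlines())
        if not any(c is not None and c.program == program and is_backed_up(c)
                   for c in conns):
            return False
    return True
```

Passing several captures taken some time apart to `persistently_backed_up(samples, "httpd")` mirrors the advice to confirm the high queue across multiple subsequent outputs before restarting the process. The same call with `"sysd"` covers the HA peer connection case.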