...
Under rare conditions during device bootup, IPC communication between FMAN-FP and FED is disrupted because the FED side does not acknowledge the messages it receives. The problem can manifest itself in several ways and causes general platform programming issues, for example pending objects in FMAN, missing ports in the HW STP table, and general reachability issues.

Examples of symptoms:

switch#show platform software ipc stream-based forwarding-manager f0 connection summary
BIPC Summary(Local Location: F0:0):
CN - Channel Number, FD - File descriptor, LocPrt - Local Port Num, Status - Status of IPC connection
Count - x/y, x=accepted connection, y=closed connection, Txmsgs - Txmsgs count, Rxmsgs- Rxmsgs count, RemLoc - Remote Location
CltPrt - Client Port Num, Type - IPC types, ServiceName - IPC Service Name, LocalPortName - Local Port Name

ServiceName LocalPortName CN FD LocPrt Status Count Txmsgs Rxmsgs RemLoc CltPrt Type
===============================================================================================================================
fman-ui-fp__ripc 4 17 3008 listen 14/13 remote IPC
------------------------------------------------- 95 55 accept / 0 5 R0 41807 remote IPC
sman-ui-serv__ripc 32 16 2028 connect 80/61 154769 6 R0 34095 remote IPC
iosd-fman-stats__ripc 54 18 2036 connect 80/61 0 0 R0 51417 remote IPC
iosd-fman-crypto__ripc 55 19 2040 connect 80/61 0 0 R0 45067 remote IPC
iosd-gold__ripc 56 20 2048 connect 80/61 103147 51573 R0 53559 remote IPC
smd-fmfp-stats__ripc 67 21 2504 connect 80/61 0 0 R0 59065 remote IPC
smd-fmfp-data__ripc 68 22 2500 connect 80/61 0 0 R0 46293 remote IPC
fman__ripc 69 23 2004 connect 80/61 120 5244 R0 47371 remote IPC
fman-fmrp__ripc 70 24 2012 connect 80/61 0 0 R0 53407 remote IPC
fman-iosd__ripc 71 25 2016 connect 80/61 58 1 R0 60919 remote IPC
fman-smd__ripc 72 26 2508 connect 80/61 0 0 R0 33991 remote IPC
fman-wncmgrd__ripc 73 27 2307 connect 80/61 0 0 R0 33535 remote IPC
fman-wncd_0__ripc 74 28 2311 connect 80/61 0 0 R0 34549 remote IPC
fman-mobilityd__ripc 75 29 3716 connect 80/61 0 0 R0 39785 remote IPC
fman-fed-main-fp__lipc 76 49 0 connect 80/61 0 0 0 local IPC
fman-fed-main-fp__lipc 77 50 0 connect 80/61 445 199 0 local IPC << 445 sent, but only 199 acknowledged
fman-fed-aclmo-fp__lipc 78 51 0 connect 80/61 0 0 0 local IPC
fman-fed-aclmo-fp__lipc 79 52 0 connect 80/61 1 1 0 local IPC
fman-fed-fnf-fp__lipc 80 53 0 connect 80/61 0 0 0 local IPC
fman-fed-fnf-fp__lipc 81 54 0 connect 80/61 1 1 0 local IPC

switch#show platform software ipc stream-based fed active connection summary
BIPC Summary(Local Location: F0:0):
CN - Channel Number, FD - File descriptor, LocPrt - Local Port Num, Status - Status of IPC connection
Count - x/y, x=accepted connection, y=closed connection, Txmsgs - Txmsgs count, Rxmsgs- Rxmsgs count, RemLoc - Remote Location
CltPrt - Client Port Num, Type - IPC types, ServiceName - IPC Service Name, LocalPortName - Local Port Name

ServiceName LocalPortName CN FD LocPrt Status Count Txmsgs Rxmsgs RemLoc CltPrt Type
===============================================================================================================================
fman-fed-main-fp__lipc fed_main_thr_socket 3 43 0 listen 2/0 local IPC
------------------------------------------------- 8 49 accept / 0 0 0 local IPC
------------------------------------------------- 9 53 accept / 199 199 0 local IPC << FED sees only the first 199 messages (instead of 445)
fman-fed-aclmo-fp__lipc fed_aclmo_thr_socket 4 44 0 listen 2/0 local IPC
------------------------------------------------- 10 54 accept / 0 0 0 local IPC
------------------------------------------------- 11 55 accept / 1 1 0 local IPC
fman-fed-fnf-fp__lipc fed_fnf_thr_socket 5 47 0 listen 2/0 local IPC
------------------------------------------------- 12 56 accept / 0 0 0 local IPC
------------------------------------------------- 13 57 accept / 1 1 0 local IPC
fed-fed-fp__ripc 6 50 6020 listen 0/0 remote IPC
fed-ui-fp__ripc 7 51 2110 listen 177/176 remote IPC
------------------------------------------------- 191 58 accept / 0 5 R0 59853 remote IPC
iosd-fed__ripc 1 45 6012 connect 3/0 213144 11 R0 47853 remote IPC
iosd-gold__ripc 2 46 2048 connect 3/0 110051 110050 R0 49499 remote IPC
sman-ui-serv__ripc 14 60 2028 connect 3/0 154775 4 R0 58611 remote IPC

switch#show platform software object-manager f0 statistics
Forwarding Manager Asynchronous Object Manager Statistics
Object update: Pending-issue: 1009, Pending-acknowledgement: 1085 <<-- the number of FMAN objects pending acknowledgement keeps increasing
Batch begin: Pending-issue: 0, Pending-acknowledgement: 0
Batch end: Pending-issue: 1, Pending-acknowledgement: 0
Command: Pending-acknowledgement: 176
Total-objects: 2173
Stale-objects: 0
Resolve-objects: 0
Childless-delete-objects: 0
Error-objects: 0
Paused-types: 139

Typical syslog messages indicating the issue:

*Aug 19 12:03:09.415: %FMFP-3-OBJ_DWNLD_TO_DP_STUCK: R0/0: fman_fp_image: AOM download to Data Plane is stuck for more than 1800 seconds for obj[430] type[28] pending-issue Req-create Issued-none 'Tx Channel CPP_Null, handle 16777228, hw handle 16777228, flag 0x0, dirty hw: 0x1000000000 0x100000 TCP,IPV6 TCP, dirty aom NONE'
*Aug 19 12:33:09.415: %FMFP-3-OBJ_ACK_FROM_DP_STUCK: R0/0: fman_fp_image: AOM ack download to Data Plane is stuck for more than 1800 seconds for obj[395] type[26] pending-ack Req-none Issued-create 'intf HundredGigE1/0/26/4, handle 143, hw handle 143, HW dirty: 0x1000000000 0x100000 TCP,IPV6 TCP, AOM dirty NONE'

Examples of btrace logs:

2021/09/01 10:19:34.964745 fman_fp [cef] [21849]: UUID: 0, ra: 0, TID: 0 (ERR): Doppler: Unable to send stats query message. rc: Resource temporarily unavailable
2021/09/01 10:19:35.105601 fman_fp [fman_fp] [21849]: UUID: 0, ra: 0, TID: 0 (ERR): fman-fed-main channel is throttled. Feature needs to register backpressure notification
2021/09/01 10:19:35.106068 fman_fp [cef] [21849]: UUID: 0, ra: 0, TID: 0 (ERR): Doppler MCAST mroute: Fail to send mroute stats query message. rc: Resource temporarily unavailable
2021/09/01 10:19:35.106470 fman_fp [cef] [21849]: UUID: 0, ra: 0, TID: 0 (ERR): Doppler MPLS: Fail to send mpls_lsp stats query message. rc: Resource temporarily unavailable (Error (0): [Success]
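The two indicators called out in the outputs (a local fman-fed channel on the FMAN-FP side whose Txmsgs stays above Rxmsgs, and an Object update Pending-acknowledgement count that keeps increasing) can be checked from saved CLI output. The following is a minimal Python sketch, not a Cisco tool: the function names and regular expressions are hypothetical and assume one table row per line, in the column order given by the legend. It also assumes that on a healthy fman-fed local channel Rxmsgs keeps pace with Txmsgs, as the annotated output suggests. A single snapshot with Txmsgs > Rxmsgs may be transient, so comparing several samples taken some time apart is likely more reliable.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical pattern for an FMAN-side local IPC client row, assuming one row
# per line in the legend's column order:
#   ServiceName CN FD LocPrt connect Count Txmsgs Rxmsgs ... local IPC
LIPC_ROW = re.compile(
    r"^(?P<service>\S+__lipc)\s+(?P<cn>\d+)\s+\d+\s+\d+\s+connect\s+\d+/\d+\s+"
    r"(?P<tx>\d+)\s+(?P<rx>\d+)\b.*local IPC"
)

# Hypothetical pattern for the "Object update" line of the object-manager statistics.
OBJ_UPDATE = re.compile(
    r"Object update:\s*Pending-issue:\s*\d+,\s*Pending-acknowledgement:\s*(\d+)"
)


@dataclass
class Channel:
    service: str
    cn: int
    tx: int
    rx: int

    @property
    def unacked(self) -> int:
        # Messages sent by FMAN-FP that have not (yet) been acknowledged.
        return self.tx - self.rx


def find_unacked(output: str, threshold: int = 0) -> list[Channel]:
    """Return local IPC channels whose Txmsgs exceed Rxmsgs by more than threshold."""
    hits = []
    for line in output.splitlines():
        m = LIPC_ROW.match(line.strip())
        if m:
            ch = Channel(m["service"], int(m["cn"]), int(m["tx"]), int(m["rx"]))
            if ch.unacked > threshold:
                hits.append(ch)
    return hits


def pending_ack(stats_output: str) -> int | None:
    """Extract the Object update Pending-acknowledgement count, if present."""
    m = OBJ_UPDATE.search(stats_output)
    return int(m.group(1)) if m else None


def is_growing(samples: list[int]) -> bool:
    """True if each sample is strictly larger than the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Rows copied from the FMAN-FP connection summary shown above.
    fman_output = """\
fman-fed-main-fp__lipc 76 49 0 connect 80/61 0 0 0 local IPC
fman-fed-main-fp__lipc 77 50 0 connect 80/61 445 199 0 local IPC
fman-fed-aclmo-fp__lipc 79 52 0 connect 80/61 1 1 0 local IPC
"""
    for ch in find_unacked(fman_output):
        # prints: fman-fed-main-fp__lipc CN 77: 246 unacknowledged
        print(f"{ch.service} CN {ch.cn}: {ch.unacked} unacknowledged")
```

To use the growth check, collect `pending_ack()` values from repeated runs of `show platform software object-manager f0 statistics` and pass them to `is_growing()`; a count that only ever increases matches the symptom described above.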
The issue has been seen on DopplerE-based platforms and is triggered by a reload. The conditions are rare: reproducing the issue in the lab required roughly 350 reloads.
Reload the device. After the reload, the chance of the issue reappearing is very slim.
NA