...
Wireless client AAA-based authentications fail because the C9800 is unable to send RADIUS traffic to the RADIUS server when accounting is enabled. The problem can occur in any one of the WNCDs or in all of them.

How to identify the signature:

i) Run "show aaa servers" and look for "Platform State from WNCD (X) : current DEAD" for all the working servers, where X denotes the WNCD number. This alone is not enough, because a legitimate AAA timeout can also mark a server DEAD temporarily; the status returns to UP once the AAA server responds.
ii) If ALL the servers are marked DEAD in one or more WNCDs, this is the condition in which 100% of client authentications will fail. When this bug is hit, the DEAD status does not recover and persists until the 9800 is reloaded.
iii) If all the servers are marked DEAD in a subset of the WNCDs (for example, WNCD-1 and WNCD-2), then WNCD-1 and WNCD-2 cannot be used for any further client authentications.
iv) Very important: if ONLY ONE of the servers is marked DEAD in any or all of the WNCDs, this does not indicate that the problem has been hit.

The following error is seen once the leak is complete and the bug has been hit:

2021/08/09 17:58:58.200761 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Crete New Socket Data : Dynamic socket pool limit is still at Maximmum 96 after clean up

In summary: if any WNCD has reached the problematic state, ALL the servers will be marked DEAD in that WNCD. The supporting CLIs for checking accounting timeouts, retransmits, and DEAD state are "show aaa servers" and "show radius statistics".

The following output indicates that this bug has been encountered:

From WLC logs:

*Aug 9 14:14:03.063: %SESSION_MGR-5-FAIL: Chassis 1 R0/0: wncd: Authorization failed or unapplied for client (66f9.e6b4.041c) on Interface capwap_900001cc AuditSessionID 0A88250A0001E6952B436836. Failure reason: Authc fail. Authc failure reason: AAA Server Down.

From WNCD traces:

2021/08/09 17:58:58.200755 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Get Socket_Fd and Free Identifier : Reset SD_current_index to
2021/08/09 17:58:58.200756 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Get Free Identifier : All Identifiers are used up | going for next socket
2021/08/09 17:58:58.200757 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Crete New Socket Data : Dynamic socket pool limit reaced Max : 96
2021/08/09 17:58:58.200757 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Crete New Socket Data Clean up Idle sockets Now
2021/08/09 17:58:58.200759 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Delete Idle sockets in a Socket Pool : Input Validation Failed
2021/08/09 17:58:58.200760 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Delete Idle sockets in a Socket Pool : Input Validation Failed
2021/08/09 17:58:58.200760 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Delete Idle sockets in a Socket Pool : Input Validation Failed
2021/08/09 17:58:58.200761 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Delete Idle sockets in a Socket Pool : Input Validation Failed
2021/08/09 17:58:58.200761 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Crete New Socket Data : Dynamic socket pool limit is still at Maximmum 96 after clean up
2021/08/09 17:58:58.200762 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): $$$$ RSPE- Crete New Socket Data : Worst case scenario Reached $$$$
2021/08/09 17:58:58.200762 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RSPE- Get Socket_Fd and Free Identifier : Failed to get Free socket and Free Identifier
2021/08/09 17:58:58.200802 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): could not set the l3_packet info in the socket
2021/08/09 17:58:58.200803 {wncd_x_R0-0}{1}: [radius] [17783]: (ERR): RADIUS(00000000): Sending a IPv4 Radius Packet failed
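
The "all servers DEAD in a WNCD" check described in the signature steps can be sketched as a small script run against saved "show aaa servers" output. This is a minimal sketch, not a Cisco tool: the only string taken from this note is "Platform State from WNCD (X) : current DEAD"; the sample text, the assumption that each server block starts with a "RADIUS: id N" line, and the function names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical, abbreviated sample. Only the "Platform State from WNCD (X)"
# line format comes from the bug note; the rest is assumed for illustration.
SAMPLE = """\
RADIUS: id 1, priority 1, host 10.0.0.1, auth-port 1812, acct-port 1813
     Platform State from WNCD (1) : current DEAD
     Platform State from WNCD (2) : current UP
RADIUS: id 2, priority 2, host 10.0.0.2, auth-port 1812, acct-port 1813
     Platform State from WNCD (1) : current DEAD
     Platform State from WNCD (2) : current DEAD
"""

SERVER_RE = re.compile(r"^\s*RADIUS: id (\d+)")  # assumed start of a server block
STATE_RE = re.compile(r"Platform State from WNCD \((\d+)\)\s*:\s*current (\w+)")


def wncd_states(text):
    """Return {wncd_number: {server_id: state}} parsed from the CLI text."""
    states = defaultdict(dict)
    server = None
    for line in text.splitlines():
        match = SERVER_RE.match(line)
        if match:
            server = match.group(1)
            continue
        match = STATE_RE.search(line)
        if match and server is not None:
            states[int(match.group(1))][server] = match.group(2)
    return dict(states)


def suspect_wncds(text):
    """WNCDs in which every server is DEAD (signature items ii and iii)."""
    return sorted(
        wncd
        for wncd, servers in wncd_states(text).items()
        if servers and all(state == "DEAD" for state in servers.values())
    )


if __name__ == "__main__":
    print(suspect_wncds(SAMPLE))
```

In the sample, WNCD 1 is flagged because both servers are DEAD there, while WNCD 2 is not, since a single DEAD server does not match the signature (item iv). Because a legitimate timeout can also mark servers DEAD for a while, a flagged WNCD is only a suspect until the state is seen to persist across repeated captures.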
This problem occurs when AAA accounting is enabled for wireless client sessions and accounting requests time out and are retransmitted because of a slow server.
Before the problem is reached, or when it is only partially reached (meaning none or only some of the WNCDs are in the problematic state):
i) Disable accounting, but only if the customer does not see any significant impact from doing so.

Once the problem is fully reached (meaning all the WNCDs are in the problematic state), the only possible solution is a reload:
ii) Reload the WLC.
Due to a bug in the code, the identifiers (255 IDs per socket in total) were not released during accounting timeouts and retransmissions. New sockets are therefore created until the limit of 128 sockets per WNCD is reached, after which no further messages can be sent because the limit has been reached and no identifiers are free.

This problem is a regression caused by the commit that added support for mixed IPv4/IPv6 servers with "radius-server source-port extended" in effect (CSCvx50397). The bugfix CSCvz30708 is a partial backout of CSCvx50397. The real fix for this bug, including support for IPv4+IPv6 servers, is committed via CSCvz55484.
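
The leak mechanism described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the C9800 code: the class, its methods, and the helper function are hypothetical, and only the figures (255 identifiers per socket, 128 sockets per WNCD) come from this note.

```python
class ToyRadiusSocketPool:
    """Toy model of a per-WNCD RADIUS socket pool (illustration only)."""

    def __init__(self, max_sockets=128, ids_per_socket=255, leak_on_timeout=True):
        self.max_sockets = max_sockets
        self.ids_per_socket = ids_per_socket
        self.leak_on_timeout = leak_on_timeout
        self.free_ids = []  # one set of free identifiers per open socket

    def send(self):
        """Reserve (socket_index, identifier); return None when exhausted."""
        for index, free in enumerate(self.free_ids):
            if free:
                return index, free.pop()
        if len(self.free_ids) >= self.max_sockets:
            return None  # pool limit reached and no identifier is free
        self.free_ids.append(set(range(self.ids_per_socket)))
        index = len(self.free_ids) - 1
        return index, self.free_ids[index].pop()

    def timeout(self, slot):
        """Accounting request timed out; the buggy path never frees the ID."""
        index, identifier = slot
        if not self.leak_on_timeout:
            self.free_ids[index].add(identifier)


def requests_until_failure(pool, attempts):
    """Count timed-out requests sent before send() first fails, else None."""
    for sent in range(attempts):
        slot = pool.send()
        if slot is None:
            return sent
        pool.timeout(slot)
    return None


if __name__ == "__main__":
    leaky = ToyRadiusSocketPool(leak_on_timeout=True)
    print(requests_until_failure(leaky, 40000))
    fixed = ToyRadiusSocketPool(leak_on_timeout=False)
    print(requests_until_failure(fixed, 40000))
```

In this model the leaking pool stops sending after 128 x 255 = 32,640 timed-out requests, while the pool that releases identifiers on timeout keeps reusing its first socket and never fails. The real pool presumably also frees identifiers when responses arrive, so the time to exhaustion in the field depends on how often accounting requests actually time out.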