...
Congestion is observed in Traffic Class 1 (TC_1) of the NPU's VOQ 24.

#show controllers npu stats voq base 24 instance 2 location 0/4/CPU0
Asic Instance = 2
VOQ Base = 24
       ReceivedPkts    ReceivedBytes   DroppedPkts     DroppedBytes
-------------------------------------------------------------------
TC_0 = 10              1068            0               0
TC_1 = 4704064         319887818       5296306         360155914
TC_2 = 99866           10086462        0               0
TC_3 = 0               0               0               0
TC_4 = 74              13066           0               0
TC_5 = 0               0               0               0
TC_6 = 0               0               0               0
TC_7 = 0               0               0               0

As a consequence, low-priority packets punted to the LC CPU see intermittent drops. The following command shows which flows are classified as low priority and destined to the local LC.

RP/0/RP0/CPU0:ios#sh lpts pifib hardware entry brief location 0/1/CPU0 | i "Local LC LOW"

Sample output:

IPv4 any any any 0 0 any 0 5 Fragment Local LC LOW 0 0
IPv4 any any any 0 1 UNREACH 0 0 ICMP-default Local LC LOW 0 0 >>>>>>> Trace route replies
IPv4 any any any 0 1 SRCQUENCH 0 0 ICMP-default Local LC LOW 0 0
IPv4 any any any 0 1 REDIRECT 0 0 ICMP-default Local LC LOW 0 0
IPv4 any any any 0 1 TIMXCEED 0 0 ICMP-default Local LC LOW 0 0 >>>>>Trace route replies
IPv4 any any any 0 0 any 0 0 Raw-default Local LC LOW 0 0
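The VOQ statistics above can also be checked with a script when the output is collected from many line cards. The following is a minimal sketch, not a Cisco tool: it assumes the command output has already been captured as text, and the names parse_voq_stats and classes_with_drops are hypothetical helpers introduced only for this illustration. It reads the per-traffic-class rows and lists the classes whose DroppedPkts counter is nonzero.

```python
import re

# Rows copied from the sample output above. In practice this text would be
# captured from the router; how it is collected is outside this sketch.
SAMPLE = """\
TC_0 = 10 1068 0 0
TC_1 = 4704064 319887818 5296306 360155914
TC_2 = 99866 10086462 0 0
TC_3 = 0 0 0 0
TC_4 = 74 13066 0 0
TC_5 = 0 0 0 0
TC_6 = 0 0 0 0
TC_7 = 0 0 0 0
"""

# One row: traffic class name, then the four counters in column order.
ROW = re.compile(r"^\s*(TC_\d+)\s*=\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_voq_stats(text):
    """Return {traffic_class: {counter_name: value}} for every TC_n row."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip headers, separators, and prompt lines
        rx_pkts, rx_bytes, drop_pkts, drop_bytes = (int(v) for v in match.groups()[1:])
        stats[match.group(1)] = {
            "received_pkts": rx_pkts,
            "received_bytes": rx_bytes,
            "dropped_pkts": drop_pkts,
            "dropped_bytes": drop_bytes,
        }
    return stats


def classes_with_drops(stats):
    """Return the traffic classes whose DroppedPkts counter is nonzero."""
    return sorted(tc for tc, row in stats.items() if row["dropped_pkts"] > 0)


stats = parse_voq_stats(SAMPLE)
for tc in classes_with_drops(stats):
    print(tc, "dropped", stats[tc]["dropped_pkts"], "packets")
```

A single nonzero value only shows that drops occurred at some point; comparing DroppedPkts between two captures taken some time apart shows whether the congestion is ongoing.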
Observed on NCS5500 series routers, on an NPU that meets both of the following conditions:
1. An egress IPv6 ACL is configured on the interfaces.
2. IPv6 transit traffic egresses through the same NPU.
No workaround
On Jericho line cards, whenever an egress IPv6 ACL is enabled, packets are recycled through a recycle port so that they can use the ingress TCAM functionality before egressing from the router. MAC learning is enabled by default on the recycle ports. As a result, hardware-generated MAC learning packets were punted to CPU queue 24. The ACL feature does not use these packets, so the fix is to disable MAC learning on the recycle ports for the ACL feature.

Note: This fix does not cover the "l2vpn bridge domain" use case.

The following symptoms can be used to check whether a router is hitting this bug.

1. Run the command below several times. If the DroppedPkts counter for TC_1 keeps increasing, TC_1 of VOQ 24 is congested.

#show controllers npu stats voq base 24 instance 2 location 0/4/CPU0
Asic Instance = 2
VOQ Base = 24
       ReceivedPkts    ReceivedBytes   DroppedPkts     DroppedBytes
-------------------------------------------------------------------
TC_0 = 10              1068            0               0
TC_1 = 4704064         319887818       5296306         360155914
TC_2 = 99866           10086462        0               0
TC_3 = 0               0               0               0
TC_4 = 74              13066           0               0
TC_5 = 0               0               0               0
TC_6 = 0               0               0               0
TC_7 = 0               0               0               0

2. Run the command below several times. If OLP_PACKET_COUNTER shows a large value every time, a large number of MAC learning packets are being punted to the CPU.

RP/0/RP0/CPU0:ios#sh controller fia diag 1 " diag counter g cdsp" loc 0/6/cpu0 | inc OLP
Tue Nov 17 00:35:26.431 UTC
| OLP_PACKET_COUNTER = 705,333 | |
EPE_DSCRD_PKT_CNT = 0 | EPE_DSCRD_PKT_CNT = 0 |

3. Run the command below several times. If RCY_PACKET_CNT_0_0 shows a large value every time, a large number of packets are being recycled.

RP/0/RP0/CPU0:ios#sh controller fia diag 1 " diag counter g cdsp" loc 0/6/cpu0 | inc RCY
Tue Nov 17 00:35:43.855 UTC
| RCY_PACKET_CNT_0_0 = 761,788 | RCY_PACKET_CNT_1_0 = 0 |
| |
| RCY_PACKET_CNT_0_1 = 0 | RCY_PACKET_CNT_1_1 = 0 |
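Checks 2 and 3 above both ask whether a counter is large on every run of the command. A minimal sketch of that comparison follows, assuming the command output has already been captured as text once per run. The helper names parse_counters and persistently_high are hypothetical, and the THRESHOLD value is an illustrative assumption: the description above only says the counter should show a "big value" and does not give a number.

```python
import re

# Text copied from the two sample outputs above.
OLP_SAMPLE = """\
Tue Nov 17 00:35:26.431 UTC
| OLP_PACKET_COUNTER = 705,333 | | EPE_DSCRD_PKT_CNT = 0 | EPE_DSCRD_PKT_CNT = 0 |
"""
RCY_SAMPLE = """\
Tue Nov 17 00:35:43.855 UTC
| RCY_PACKET_CNT_0_0 = 761,788 | RCY_PACKET_CNT_1_0 = 0 | | | RCY_PACKET_CNT_0_1 = 0 | RCY_PACKET_CNT_1_1 = 0 |
"""

# Assumed cut-off for a "big value"; not taken from the bug description.
THRESHOLD = 100_000

# Matches "NAME = 705,333": an upper-case counter name and a value that may
# contain thousands separators.
COUNTER = re.compile(r"([A-Z][A-Z0-9_]*)\s*=\s*([\d,]+)")


def parse_counters(text):
    """Return {counter_name: value} for one captured run of the command."""
    counters = {}
    for name, value in COUNTER.findall(text):
        # A name can appear more than once in the output; keep the first value.
        counters.setdefault(name, int(value.replace(",", "")))
    return counters


def persistently_high(samples, name, threshold):
    """True if the counter is at or above the threshold in every sample."""
    return bool(samples) and all(s.get(name, 0) >= threshold for s in samples)


olp_runs = [parse_counters(OLP_SAMPLE)]  # append one entry per additional run
rcy_runs = [parse_counters(RCY_SAMPLE)]
print("MAC learning punts high:", persistently_high(olp_runs, "OLP_PACKET_COUNTER", THRESHOLD))
print("Recycled packets high:", persistently_high(rcy_runs, "RCY_PACKET_CNT_0_0", THRESHOLD))
```

Each list here holds a single run because only one capture per command is available in this note; in practice each list would hold one entry per run so that the "every time" condition is actually tested.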