...
+ Traffic injected by the RSP (in this case TACACS, SNMP and ping traffic) may get dropped. Some source/destination IP address and port combinations are dropped while others are not.
++ SNMP drops (whether a request is dropped depends on its SNMP source port):
    F340.05.11-7600-3#snmp get-next v2c 192.168.1.5 cisco retry 0 timeout 2 oid 1.3.6.1.2.1.2.2.1.2
    *Feb 27 19:02:15.338 UTC: SNMP: Get-next request, reqid 25, errstat 0, erridx 0
     ifDescr = NULL TYPE/VALUE
    *Feb 27 19:02:15.342 UTC: SNMP: Packet sent via UDP to 192.168.1.5.161
    Timeout
++ When pinging from the ASR9K to the same destination while varying the source IP/interface, some pings fail and others succeed:
    RP/0/RSP0/CPU0:ASR9006-E#! WORKING:
    RP/0/RSP0/CPU0:ASR9006-E#ping 192.168.50.38 source 192.168.10.1 timeout 0 repeat 1
    Sending 1, 100-byte ICMP Echos to 192.168.50.38, timeout is 0 seconds:
    !
    Success rate is 100 percent (1/1), round-trip min/avg/max = 1/1/1 ms
    RP/0/RSP0/CPU0:ASR9006-E#! NOT WORKING
    RP/0/RSP0/CPU0:ASR9006-E#ping 192.168.50.38 source 192.168.10.17 timeout 0 repeat 1
    Sending 1, 100-byte ICMP Echos to 192.168.50.38, timeout is 0 seconds:
    .
    Success rate is 0 percent (0/1)
    RP/0/RSP0/CPU0:ASR9006-E#ping 192.168.50.38 source 192.168.10.37 timeout 0 repeat 1
    Sending 1, 100-byte ICMP Echos to 192.168.50.38, timeout is 0 seconds:
    .
+ When tracking down the packet loss, we see that the dropped packets hit one of two NP counters:
    RSV_DROP_IPV4_TXADJ_NO_MATCH
    RSV_DROP_IPV4_NRLDI_NOT_LOCAL   <<< the traffic is reaching the wrong NP: the RLDIM, and therefore the resulting egress interface, is not on this NP.
+ We also observe that the output of "show cef exact-route" is inconsistent between the RSP and the LCs:
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.1 192.168.50.38 | i via
      via GigabitEthernet0/3/1/0.19
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.1 192.168.50.38 location 0/2/cpu0 | i via
      via GigabitEthernet0/3/1/1.4
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.1 192.168.50.38 location 0/2/cpu0 | i via
      via GigabitEthernet0/3/1/1.4
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.17 192.168.50.38 | i via
      via TenGigE0/2/1/1
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.17 192.168.50.38 location 0/2/cpu0 | i via
      via GigabitEthernet0/3/1/1.100
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.17 192.168.50.38 location 0/3/cpu0 | i via
      via GigabitEthernet0/3/1/1.100
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 189.43.33.26 192.168.50.38 | i via
      via TenGigE0/2/1/1
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 189.43.33.26 192.168.50.38 location 0/2/cpu0 | i via
      via TenGigE0/2/1/1
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 189.43.33.26 192.168.50.38 location 0/3/cpu0 | i via
      via TenGigE0/2/1/1
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.37 192.168.50.38 | i via
      via GigabitEthernet0/3/1/0.19
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.37 192.168.50.38 location 0/2/cpu0 | i via
      via GigabitEthernet0/3/1/1.100
    RP/0/RSP0/CPU0:ASR9006-E#show cef exact-route 192.168.10.37 192.168.50.38 location 0/3/cpu0 | i via
      via GigabitEthernet0/3/1/1.100
When there are many ECMP paths and the number of uncompressed paths exceeds the maximum allowed by the RSP, the RSP truncates the "Load Sharing interface" list and can end up selecting a different interface than the LC. RSP-injected packets may then be dropped, because they are sent to the wrong NP. The following set of paths to the same destination is known to have caused this issue:
    RP/0/RSP0/CPU0:ASR9006-E#show route 192.168.50.38
    Routing entry for 0.0.0.0/0   <<< default route
      Known via "bgp 65000",
      Routing Descriptor Blocks
        192.168.10.2, from 192.168.10.2, BGP external, BGP multi path
          Route metric is 0
        192.168.10.42, from 192.168.10.42, BGP external, BGP multi path
          Route metric is 0
        192.168.10.6, from 192.168.10.6, BGP external, BGP multi path
          Route metric is 0
        192.168.1.6, from 192.168.1.6, BGP external, BGP multi path   <<< the only next hop that is not directly connected; it is resolved through the static routes below
          Route metric is 0
        192.168.10.14, from 192.168.10.14, BGP external, BGP multi path
          Route metric is 0
        192.168.10.18, from 192.168.10.18, BGP external, BGP multi path
          Route metric is 0
      No advertising protos.
    RP/0/RSP0/CPU0:ASR9006-E#show route 192.168.1.6
      Routing Descriptor Blocks
        192.168.10.22
          Route metric is 0
        192.168.10.26
          Route metric is 0
        192.168.10.30
          Route metric is 0
        192.168.10.38
          Route metric is 0
        192.168.10.34
          Route metric is 0
        192.168.10.10
          Route metric is 0
      No advertising protos.
++ Uncompressed, these routes expand to 36 paths (6 static paths + (5 BGP paths * 6)), which exceeds the RSP limit of 32. This can be checked with "show cef 192.168.50.38 det | b Hash OK Interface Address".
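The path arithmetic above can be reproduced with a small Python sketch. The expansion rule is an assumption inferred from the Load distribution lists shown in this note (each of the 5 directly connected BGP paths appears once per static path, and the recursive next hop 192.168.1.6 contributes one entry per static path); it is not Cisco's documented CEF algorithm, and the function names are hypothetical.

```python
# Hypothetical model of how the load-distribution list in this note expands.
# It reproduces the bucket counts seen in the outputs, but it is NOT Cisco's
# CEF algorithm; the 32-path RSP limit is the value quoted in the text.

RSP_MAX_PATHS = 32


def uncompressed_paths(direct_bgp_paths: int, static_paths: int) -> int:
    """Each directly connected BGP path is repeated once per static path, and
    the recursive BGP next hop contributes one entry per static path."""
    return direct_bgp_paths * static_paths + static_paths


def fits_on_rsp(direct_bgp_paths: int, static_paths: int) -> bool:
    """True if the modelled path count stays within the RSP limit."""
    return uncompressed_paths(direct_bgp_paths, static_paths) <= RSP_MAX_PATHS


if __name__ == "__main__":
    for statics in (6, 5):
        total = uncompressed_paths(5, statics)
        print(f"{statics} static paths -> {total} paths, fits on RSP: {fits_on_rsp(5, statics)}")
```

With 6 static paths the model gives 36 entries, over the limit of 32; with 5 static paths (as after workaround 1 below) it gives 30, which matches the buckets:30 value reported on both RSP and LC.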
+ 1) Remove routes until the RSP no longer reaches the maximum number of paths and the "truncated" keyword no longer appears.
++ In this particular case, removing one static route is enough:
    RP/0/RSP0/CPU0:ASR9006-E(config)#router static
    RP/0/RSP0/CPU0:ASR9006-E(config-static)# address-family ipv4 unicast
    RP/0/RSP0/CPU0:ASR9006-E(config-static-afi)#no 192.168.1.6/32 192.168.10.22 description MH_01988_2035_2036_2037/2036
    RP/0/RSP0/CPU0:ASR9006-E#show cef 192.168.50.38 internal
    IPv4:default:0xe0000000[0x7149e468]:rib:192.168.50.36/30[ref:1 proto:ipv4 flags:0x5000001 f2:0x0 src:rib][0x71e1e92c]
     loadinfo:[ref:1 pl:75f40bb8 proto:ipv4 type:ip lvl:2 buckets:30 slots:30 fixup:0 flags:[owner locked, recursive, collapsed, added to pl, extension, pd shared]][0x76011ed0]
     slots:
    RP/0/RSP0/CPU0:ASR9006-E#show cef 192.168.50.38 internal location 0/2/cPU0
    IPv4:default:0xe0000000[0x8785d274]:rib:192.168.50.36/30[ref:1 proto:ipv4 flags:0x5000001 f2:0x0 src:rib][0x88439678]
     loadinfo:[ref:1 pl:8f8f08e0 proto:ipv4 type:ip lvl:2 buckets:30 slots:30 fixup:0 flags:[owner locked, recursive, collapsed, added to pl, extension, pd shared]][0x90db2a2c]
     slots:
+++ <<< The "truncated" keyword is no longer present, and the number of paths (buckets:30) is now the same on the LC and the RSP.
+ 2) Install static routes for the specific destinations that are failing, via one of the available next hops.
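The verification step of workaround 1 could be scripted against saved CLI output. This is a minimal sketch under assumptions: the helper names are hypothetical, the input is the raw text of "show cef <prefix> internal" taken from the RSP and from the LC, and the "truncated" keyword is assumed to appear somewhere in that text whenever truncation occurs, as this note implies. The sample strings are the loadinfo lines from the outputs above.

```python
import re

# Hypothetical helpers: compare the "buckets:N" field of two
# "show cef <prefix> internal" outputs (RSP and LC) and make sure neither
# output contains the "truncated" keyword.

BUCKETS_RE = re.compile(r"buckets:(\d+)")


def bucket_count(cef_internal_output: str) -> int:
    """Return N from the first 'buckets:N' field in the pasted output."""
    match = BUCKETS_RE.search(cef_internal_output)
    if match is None:
        raise ValueError("no 'buckets:' field found in output")
    return int(match.group(1))


def rsp_lc_consistent(rsp_output: str, lc_output: str) -> bool:
    """True if neither output is truncated and both report the same bucket count."""
    no_truncation = "truncated" not in rsp_output and "truncated" not in lc_output
    return no_truncation and bucket_count(rsp_output) == bucket_count(lc_output)


RSP_OUT = (
    "loadinfo:[ref:1 pl:75f40bb8 proto:ipv4 type:ip lvl:2 buckets:30 slots:30 fixup:0 "
    "flags:[owner locked, recursive, collapsed, added to pl, extension, pd shared]]"
)
LC_OUT = (
    "loadinfo:[ref:1 pl:8f8f08e0 proto:ipv4 type:ip lvl:2 buckets:30 slots:30 fixup:0 "
    "flags:[owner locked, recursive, collapsed, added to pl, extension, pd shared]]"
)

if __name__ == "__main__":
    print(bucket_count(RSP_OUT), bucket_count(LC_OUT), rsp_lc_consistent(RSP_OUT, LC_OUT))
```

The check is deliberately textual: it only compares what the CLI already reports and makes no assumption about how the RSP chooses which paths to drop.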
++ To trace the NP drops, capture the packets hitting RSV_DROP_IPV4_TXADJ_NO_MATCH or RSV_DROP_IPV4_NRLDI_NOT_LOCAL with "monitor np counter" and decode the output:
    RP/0/RSP0/CPU0:ASR9006-E#monitor np counter RSV_DROP_IPV4_NRLDI_NOT_LOCAL np2 count 5 location 0/3/CPU0
    From Egress Fabric1: 164 byte packet, bytes[0-3] invalid!
    0000: c0 24 40 22 02 00 00 00 0f 0e 00 00 00 00 00 00   @$@"............
    0010: 00 00 00 00 ba 06 11 b6 51 11 e5 4a 00 00 17 67   ....:..6Q.eJ...g
    0020: 00 00 00 00 07 03 00 02 00 00 00 01 00 00 80 00   ................
    0030: 64 00 00 00 00 ff ff ff fe 0a 00 08 40 00 00 00   d.......~...@...
    0040: 45 00 00 64 00 00 00 00 ff 01 10 23 bd 2b 21 1e   E..d.......#=+!.
    0050: c9 06 04 26 08 00 9d 54 31 bf 00 00 ab cd ab cd   I..&...T1?..+M+M
    0060: ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd   +M+M+M+M+M+M+M+M
    0070: ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd   +M+M+M+M+M+M+M+M
    0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
    0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
    00a0: 00 00 00 00                                       ....
+++ Below is the decode of the dropped packet:
    Decoding as big-endian (network order)
    DC3 HEADER:
    -----------
    FGID/VQI: 438   VQI valid: 1   <<<
    PUNT-INJECT EXTENSION HEADER:
    -----------------------------
    raw_hash: 0x514a0000   <<<
    (n)rldim: 0x17   <<<
    PUNT-INJECT HEADER:
    -------------------
    output_ifhandle: 0xa000840   np_port: 0   <<<
+++ The packet is being injected to VQI 438. The RLDI hash selector is 0x17 (23), and the output ifhandle is 0xa000840, which is Gi0/3/1/0.19.
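The IPv4 portion of the captured packet can be checked with a short Python sketch. It assumes the IPv4 header begins at offset 0x40 (inferred from the 0x45 byte there; the bytes before it are the internal DC3/punt-inject headers, which this sketch does not decode) and it drops the ASCII column of the dump. The function names are hypothetical.

```python
import struct

# Hex columns of the "monitor np counter" capture above (ASCII column removed).
DUMP = """\
0000: c0 24 40 22 02 00 00 00 0f 0e 00 00 00 00 00 00
0010: 00 00 00 00 ba 06 11 b6 51 11 e5 4a 00 00 17 67
0020: 00 00 00 00 07 03 00 02 00 00 00 01 00 00 80 00
0030: 64 00 00 00 00 ff ff ff fe 0a 00 08 40 00 00 00
0040: 45 00 00 64 00 00 00 00 ff 01 10 23 bd 2b 21 1e
0050: c9 06 04 26 08 00 9d 54 31 bf 00 00 ab cd ab cd
0060: ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd
0070: ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00a0: 00 00 00 00
"""

IPV4_OFFSET = 0x40  # ASSUMPTION: inferred from the 0x45 byte at this offset


def parse_dump(text: str) -> bytes:
    """Convert 'OFFSET: xx xx ...' lines into bytes, checking offsets are contiguous."""
    data = bytearray()
    for line in text.strip().splitlines():
        offset, hex_bytes = line.split(":", 1)
        if int(offset, 16) != len(data):
            raise ValueError(f"unexpected offset {offset}")
        data.extend(bytes.fromhex(hex_bytes))
    return bytes(data)


def internet_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words; 0 means the IPv4 header checksum is valid."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decode_ipv4(packet: bytes, offset: int = IPV4_OFFSET) -> dict:
    """Decode the fixed IPv4 header fields (addresses deliberately omitted)."""
    ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum = struct.unpack(
        "!BBHHHBBH", packet[offset:offset + 12]
    )
    ihl = (ver_ihl & 0x0F) * 4
    header = packet[offset:offset + ihl]
    return {
        "version": ver_ihl >> 4,
        "header_length": ihl,
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,  # 1 = ICMP
        "checksum_ok": internet_checksum(header) == 0,
        "icmp_type": packet[offset + ihl] if proto == 1 else None,  # 8 = Echo Request
    }


if __name__ == "__main__":
    print(decode_ipv4(parse_dump(DUMP)))
```

Decoded this way, the bytes form a valid IPv4 header (the checksum verifies) carrying a 100-byte datagram with protocol 1 and ICMP type 8, which is consistent with an RSP-originated 100-byte ICMP Echo such as the failing pings above.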
Since the calculated RLDI hash selector is 23, the output IFH is correct according to the software forwarding chain on the RSP:
    RP/0/RSP0/CPU0:ASR9006-E#show cef 192.168.50.38 det
    Load distribution: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ (refcount 1)
    Hash  OK  Interface                    Address
    -     Y   GigabitEthernet0/3/1/0.5     remote
    -     Y   GigabitEthernet0/3/1/0.5     remote
    -     Y   GigabitEthernet0/3/1/0.5     remote
    -     Y   GigabitEthernet0/3/1/0.5     remote
    -     Y   GigabitEthernet0/3/1/0.5     remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   GigabitEthernet0/3/1/0.7     remote
    -     Y   GigabitEthernet0/3/1/0.7     remote
    -     Y   GigabitEthernet0/3/1/0.7     remote
    -     Y   GigabitEthernet0/3/1/0.7     remote
    -     Y   GigabitEthernet0/3/1/0.7     remote
    -     Y   GigabitEthernet0/3/1/1.10    remote
    -     Y   GigabitEthernet0/3/1/0.18    remote
    -     Y   GigabitEthernet0/3/1/1.2     remote
    -     Y   GigabitEthernet0/3/1/1.3     remote
    -     Y   GigabitEthernet0/3/1/1.4     remote
    -     Y   GigabitEthernet0/3/1/0.19    remote
    -     Y   GigabitEthernet0/3/1/0.19    remote
    -     Y   GigabitEthernet0/3/1/0.19    remote
    -     Y   GigabitEthernet0/3/1/0.19    remote   <<< index 23
    -     Y   GigabitEthernet0/3/1/0.19    remote
    -     Y   GigabitEthernet0/3/1/1.100   remote
    -     Y   GigabitEthernet0/3/1/1.100   remote
    -     Y   GigabitEthernet0/3/1/1.100   remote
    -     Y   GigabitEthernet0/3/1/1.100   remote
    -     Y   GigabitEthernet0/3/1/1.100   remote
+++ The VQI value 438 is also correct for the given output IFH:
    RP/0/RSP0/CPU0:ASR9006-E# sh controllers pm vqi loc 0/3/cpu0
    GigabitEthernet0_3_1_0 | 0xa000180 | 438 | 2
+++ The packet is injected to NP2 because that is where the interface resides:
    RP/0/RSP0/CPU0:ASR9006-E#sh controllers np ports all loc 0/3/cpu0
    Thu Jul 20 13:59:03.904 EDT
                    Node: 0/3/CPU0:
    ----------------------------------------------------------------
    NP Bridge Fia                       Ports
    -- ------ --- ---------------------------------------------------
    0  --     0   FortyGigE0/3/0/0
    1  --     1   FortyGigE0/3/0/1
    2  --     2   GigabitEthernet0/3/1/0 - GigabitEthernet0/3/1/9
    3  --     3   GigabitEthernet0/3/1/10 - GigabitEthernet0/3/1/19
+++ The problem is that when the packet arrives at the LC, RLDI hash selector 23 points to a different interface:
    RP/0/RSP0/CPU0:ASR9006-E#show cef 192.168.50.38 det loc 0/3/cpu0
    Load distribution: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ (refcount 1)
    Hash  OK  Interface                    Address
    -     Y   GigabitEthernet0/3/1/0.5     192.168.10.2
    -     Y   GigabitEthernet0/3/1/0.5     192.168.10.2
    -     Y   GigabitEthernet0/3/1/0.5     192.168.10.2
    -     Y   GigabitEthernet0/3/1/0.5     192.168.10.2
    -     Y   GigabitEthernet0/3/1/0.5     192.168.10.2
    -     Y   GigabitEthernet0/3/1/0.5     192.168.10.2
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   TenGigE0/2/1/1               remote
    -     Y   GigabitEthernet0/3/1/0.7     192.168.10.6
    -     Y   GigabitEthernet0/3/1/0.7     192.168.10.6
    -     Y   GigabitEthernet0/3/1/0.7     192.168.10.6
    -     Y   GigabitEthernet0/3/1/0.7     192.168.10.6
    -     Y   GigabitEthernet0/3/1/0.7     192.168.10.6
    -     Y   GigabitEthernet0/3/1/0.7     192.168.10.6
    -     Y   GigabitEthernet0/3/1/1.10    192.168.10.34
    -     Y   GigabitEthernet0/3/1/0.18    192.168.10.10
    -     Y   GigabitEthernet0/3/1/1.2     192.168.10.22
    -     Y   GigabitEthernet0/3/1/1.3     192.168.10.26
    -     Y   GigabitEthernet0/3/1/1.4     192.168.10.30
    -     Y   GigabitEthernet0/3/1/14      192.168.10.38   <<< index 23
    -     Y   GigabitEthernet0/3/1/0.19    192.168.10.14
    -     Y   GigabitEthernet0/3/1/0.19    192.168.10.14
    -     Y   GigabitEthernet0/3/1/0.19    192.168.10.14
    -     Y   GigabitEthernet0/3/1/0.19    192.168.10.14
    -     Y   GigabitEthernet0/3/1/0.19    192.168.10.14
    -     Y   GigabitEthernet0/3/1/0.19    192.168.10.14
    -     Y   GigabitEthernet0/3/1/1.100   192.168.10.18
    -     Y   GigabitEthernet0/3/1/1.100   192.168.10.18
    -     Y   GigabitEthernet0/3/1/1.100   192.168.10.18
    -     Y   GigabitEthernet0/3/1/1.100   192.168.10.18
    -     Y   GigabitEthernet0/3/1/1.100   192.168.10.18
    -     Y   GigabitEthernet0/3/1/1.100   192.168.10.18
+++ The selected interface is actually on NP3:
    RP/0/RSP0/CPU0:ASR9006-E#sh controllers np ports all loc 0/3/cpu0
    Thu Jul 20 13:59:03.904 EDT
                    Node: 0/3/CPU0:
    ----------------------------------------------------------------
    NP Bridge Fia                       Ports
    -- ------ --- ---------------------------------------------------
    0  --     0   FortyGigE0/3/0/0
    1  --     1   FortyGigE0/3/0/1
    2  --     2   GigabitEthernet0/3/1/0 - GigabitEthernet0/3/1/9
    3  --     3   GigabitEthernet0/3/1/10 - GigabitEthernet0/3/1/19
+++ This is confirmed by the hardware chain for path 23 on LC 0/3/CPU0:
    fast_switch_entry
      tx_adj_ptr: 0xce3300 (LE)
      np_bit_map: 8   <<<
      rx_adj_field: 1bc
      if_handle: 0xa000500
      nhindex: 250000
      prefix_length: 32
+++ An np_bit_map of 8 means that the TX adjacency provided is for NP3, but the packet actually arrives at NP2. That is why it is dropped.
+ The above demonstrates a mismatch in the paths (the Load Distribution list in "show cef detail") between the RSP and the LC.
+ The mismatch is caused by the difference in the maximum number of paths supported by the RSP and the LC, which leads the RSP to truncate its path list (the Load Distribution list in "show cef detail").
+ Ultimately, the mismatch causes the RSP to send traffic to the wrong NP, resulting in traffic drops.
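The lookup mismatch can be reproduced with the bucket lists transcribed from the two "show cef 192.168.50.38 det" outputs above. This Python sketch assumes, as described in this note, that the LC indexes its own list with the RLDI selector chosen by the RSP. The NP mapping comes from "show controllers np ports all" on 0/3/CPU0, and the np_bit_map interpretation (bit n set means NP n) is inferred from the statement that 8 means NP3. Helper names are hypothetical.

```python
from typing import List, Optional

# Bucket lists transcribed from "show cef 192.168.50.38 det":
# RSP (30 entries, truncated) and LC 0/3/CPU0 (36 entries, full).
STATICS_RSP = [
    "GigabitEthernet0/3/1/1.10", "GigabitEthernet0/3/1/0.18", "GigabitEthernet0/3/1/1.2",
    "GigabitEthernet0/3/1/1.3", "GigabitEthernet0/3/1/1.4",
]
STATICS_LC = STATICS_RSP + ["GigabitEthernet0/3/1/14"]  # path missing on the RSP

RSP_BUCKETS = (
    ["GigabitEthernet0/3/1/0.5"] * 5 + ["TenGigE0/2/1/1"] * 5
    + ["GigabitEthernet0/3/1/0.7"] * 5 + STATICS_RSP
    + ["GigabitEthernet0/3/1/0.19"] * 5 + ["GigabitEthernet0/3/1/1.100"] * 5
)
LC_BUCKETS = (
    ["GigabitEthernet0/3/1/0.5"] * 6 + ["TenGigE0/2/1/1"] * 6
    + ["GigabitEthernet0/3/1/0.7"] * 6 + STATICS_LC
    + ["GigabitEthernet0/3/1/0.19"] * 6 + ["GigabitEthernet0/3/1/1.100"] * 6
)

GI_PREFIX = "GigabitEthernet0/3/1/"


def np_for_interface(name: str) -> Optional[int]:
    """NP on LC 0/3 per the np ports table: ports 0-9 -> NP2, 10-19 -> NP3.
    Returns None for interfaces outside Gi0/3/1/0-19 (e.g. TenGigE0/2/1/1)."""
    if not name.startswith(GI_PREFIX):
        return None
    port = int(name[len(GI_PREFIX):].split(".")[0])  # strip the sub-interface number
    if 0 <= port <= 9:
        return 2
    if 10 <= port <= 19:
        return 3
    return None


def nps_from_bitmap(np_bit_map: int) -> List[int]:
    """ASSUMPTION: bit n set means NP n (consistent with 8 -> NP3 in the text)."""
    return [n for n in range(np_bit_map.bit_length()) if (np_bit_map >> n) & 1]


if __name__ == "__main__":
    rldi = 0x17  # (n)rldim from the decoded punt-inject header
    for where, buckets in (("RSP", RSP_BUCKETS), ("LC 0/3", LC_BUCKETS)):
        intf = buckets[rldi]
        print(f"{where}: index {rldi} -> {intf} (NP{np_for_interface(intf)})")
    print("np_bit_map 8 ->", nps_from_bitmap(8))
```

Index 23 resolves to Gi0/3/1/0.19 (NP2) on the RSP but to Gi0/3/1/14 (NP3) on the LC, while the packet was injected to NP2; this is the drop explained above.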