Symptom
A Catalyst 9k or Catalyst 3k (3850/3650) switch may crash intermittently on 16.12.5
Conditions
This issue can be seen on cat3k with 16.12.5, and on cat9k with the 16.12.5 and 17.3.1 releases.
The trigger for this issue is mac instability/churn. As a result, activities that cause large-scale updates to the mac table have an increased chance of triggering this problem.
Examples include, but are not limited to:
1 User-triggered clearing of the mac-address table
2 Physical port bounce that causes the macs associated with the port to be deleted and re-learned.
3 Spanning-tree topology changes
- Topology change notifications (TCNs) can trigger rapid aging of existing mac entries, which may then be cleared and re-learned, or can cause an L2 reconvergence around a different interface, triggering multiple moves/updates for existing mac addresses.
4 Normal aging out of MAC entries due to inactivity.
Workaround
This issue is intermittent and will not be seen every time there is a change to the mac table. You can reduce the chances of hitting the issue by reducing the amount of mac churn on the switch.
The following may help reduce how often the issue is seen:
1. Ensure spanning-tree portfast is properly applied and the L2 network is stable to reduce the number of topology changes on the network.
- TCNs can be traced using "show span de | in exec|occur|from" to see the last port to receive/trigger them
2. Manually increase the mac aging time on the switch from the default (5 minutes) to a higher value, such as 30 minutes or longer.
- Silent hosts that are not expected to communicate often can have a mac aging time configured just above the ARP timer, so that the gateway always refreshes the ARP entry before the mac entry expires, which in turn resets the age of the mac entry.
3. Do not manually clear the mac-table.
4. Where possible, avoid physical link changes that would result in mac add/delete activity. Port-channels can be used on redundant interfaces to protect against any one member link triggering a larger change.
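The configuration below is a minimal sketch of how items 1, 2 and 4 might look on a switch. The interface names, VLAN number, channel-group number and timer values are hypothetical and must be adapted to the actual network. The per-VLAN example assumes the gateway uses the IOS default ARP timeout of 14400 seconds (4 hours).

```
! Illustrative values only; interface names, VLAN 20 and channel-group 1 are assumptions
!
! Item 2: raise the global mac aging time from 300 seconds to 30 minutes
mac address-table aging-time 1800
!
! Item 2 (silent hosts): age macs in VLAN 20 just above the assumed 14400 s ARP timeout
mac address-table aging-time 14460 vlan 20
!
! Item 1: host-facing access port; with portfast, link flaps here do not generate TCNs
interface GigabitEthernet1/0/10
 switchport mode access
 spanning-tree portfast
!
! Item 4: bundle redundant uplinks so a single member flap does not move macs
interface range TenGigabitEthernet1/1/1 - 2
 channel-group 1 mode active
```

The resulting aging values can be checked with "show mac address-table aging-time". Portfast should be applied only to ports that connect to end hosts, not to ports that connect to other switches.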