Scenario

Like most IT professionals, I usually configure network devices in a lab environment before the actual installation at the customer site.

I try to limit the installation as much as possible to a simple box moving process, spending most of the change window in a previously defined validation process.

In this particular case I am dealing with a data center core network that includes 8 Nexus 9k switches configured in 4 vPC pairs, plus a bunch of links between them.

The core backbone alone includes more than 50 cables.

The challenge

Design and configuration aside, this kind of installation has a few main physical challenges related to cabling:

  • how to validate the cabling is correct during the lab configuration
  • how to validate the cabling is correct during and after the redundancy/resiliency tests*
  • how to validate the cabling is correct after on-site installation

*Redundancy/resiliency tests include removing cables, introducing errors, creating loops etc., Chaos Monkey style.

The solution

If you read my blog or my tweets you already know I try to automate all the manual and repetitive (read: boring and error-prone) tasks with some scripting.

So how can we check that the cables are connected to the right ports? With CDP, of course!

So I wrote a short (fewer than 100 lines) Python script that:

  • reads current CDP neighbors on all devices
  • compares the current neighbors to a valid neighbor list
  • shows any difference

In the first release of the script I used Netmiko with its TextFSM integration, which returns the CDP neighbors as structured data in Python. There are plenty of templates available in the repository, so take a look.
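As a rough illustration of that first approach, the sketch below reads CDP neighbors from a single switch with Netmiko and asks for TextFSM parsing. The helper names `build_device` and `read_cdp_neighbors` are hypothetical and not taken from the script, and the Netmiko import is deferred so the module loads even where Netmiko is not installed.

```python
def build_device(host, username, password, port=22):
    """Build the connection dictionary that Netmiko's ConnectHandler expects."""
    return {
        "device_type": "cisco_nxos",
        "host": host,
        "username": username,
        "password": password,
        "port": port,
    }


def read_cdp_neighbors(device):
    """Return the CDP neighbors of one device (hypothetical helper).

    With use_textfsm=True Netmiko returns a list of dictionaries when a
    matching TextFSM template is found, and the raw string output otherwise.
    """
    from netmiko import ConnectHandler  # lazy import: requires Netmiko

    with ConnectHandler(**device) as conn:
        return conn.send_command("show cdp neighbors", use_textfsm=True)
```

Calling `read_cdp_neighbors(build_device("10.255.255.3", "admin", "sup3rs3cr3t"))` needs a reachable switch, so it is best tried in the lab first.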

The script worked fine but it was quite slow, so in the second release I switched to Nornir. Now the results arrive in a few seconds.
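A likely reason for the speedup is that Nornir runs its tasks against many hosts concurrently rather than one after another. The sketch below illustrates that idea with the standard library's `ThreadPoolExecutor`. It is a stand-in and not Nornir's API, and `fetch_neighbors` is a hypothetical placeholder for the real per-device read.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_neighbors(host):
    """Hypothetical placeholder: the real version would query the device over SSH."""
    return [{"local_interface": "Eth1/54", "neighbor": f"peer-of-{host}"}]


def fetch_all(hosts, max_workers=20):
    """Read every host in parallel and return {host: neighbors}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input hosts
        results = pool.map(fetch_neighbors, hosts)
        return dict(zip(hosts, results))


if __name__ == "__main__":
    print(fetch_all(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

Since each read spends most of its time waiting on the network, threads are usually enough here and no multiprocessing is needed.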

The output

Let's do it backwards, starting with the output of the script.

If the neighbor is found on the correct interface

10.0.0.1 EXPECTED NEIGHBOR SWITCH_02 FOUND ON LOCAL INTERFACE Eth1/54 REMOTE INTERFACE Eth1/54

If the expected neighbor is not found

10.0.0.2 MISSING  NEIGHBOR - EXPECTED SWITCH_03 ON LOCAL INTERFACE Eth1/46 REMOTE INTERFACE Eth1/46

If a neighbor is found but it’s not what was expected

10.0.0.3 CHANGED NEIGHBOR - EXPECTED SWITCH_04 ON LOCAL INTERFACE Eth1/54 REMOTE INTERFACE Eth1/54 BUT FOUND SWITCH_05 ON REMOTE PORT Eth1/54

If the device is missing from the baseline

***** SWITCH_05 MISSING OR INACESSIBLE DEVICE *****

Cool, huh? Running the script gives us an overview of the actual cabling and highlights any errors.
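One way to produce the three neighbor messages above is to index the current read by local interface and walk through the expected entries. This is a minimal sketch rather than the script's actual code: the function name `compare_neighbors` is hypothetical, and the dictionary keys are assumed to be the `neighbor`, `local_interface` and `neighbor_interface` fields that the script stores in its YAML files.

```python
def compare_neighbors(host, expected, current):
    """Compare two lists of CDP neighbor dicts for one host and return messages."""
    # Index the current read by local interface for direct lookup
    found = {n["local_interface"]: n for n in current}
    messages = []
    for exp in expected:
        local = exp["local_interface"]
        where = (f"ON LOCAL INTERFACE {local} "
                 f"REMOTE INTERFACE {exp['neighbor_interface']}")
        cur = found.get(local)
        if cur is None:
            # Nothing is seen on the interface where a neighbor was expected
            messages.append(
                f"{host} MISSING NEIGHBOR - EXPECTED {exp['neighbor']} {where}")
        elif (cur["neighbor"], cur["neighbor_interface"]) == (
                exp["neighbor"], exp["neighbor_interface"]):
            messages.append(
                f"{host} EXPECTED NEIGHBOR {exp['neighbor']} FOUND {where}")
        else:
            # A neighbor is present, but it is not the one in the baseline
            messages.append(
                f"{host} CHANGED NEIGHBOR - EXPECTED {exp['neighbor']} {where} "
                f"BUT FOUND {cur['neighbor']} "
                f"ON REMOTE PORT {cur['neighbor_interface']}")
    return messages
```

The sketch assumes one neighbor per local interface and only reports on interfaces listed in the baseline, so a neighbor that appears on a previously unused port would go unnoticed.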

Script details

Let’s start with the script logic and the files.

The script itself

check_topology.py

Nornir inventory files:

hosts.yaml

---
10.255.255.3:
  nornir_host: 10.255.255.3
  groups:
    - cisco_nxos

groups.yaml

---
defaults:
  nornir_username: admin
  nornir_password: sup3rs3cr3t
  nornir_ssh_port: 22
  excludeIface:
    - mgmt0
  excludeCapa:
    - VMware

cisco_nxos:
  nornir_nos: cisco_nxos

This file contains the expected CDP neighbors

expected.yml

This file contains the CDP neighbors obtained from the last read

current.yml

A sample of current.yml

---
10.255.255.3:
- capability: S I
  local_interface: Gig 9/18
  neighbor: SW_CDS
  neighbor_interface: Gig 0/11
  platform: WS-C3560C
- capability: S I
  local_interface: Gig 10/20
  neighbor: SW_FDN
  neighbor_interface: Gig 0/9
  platform: WS-C3560C
- capability: S I
  local_interface: Gig 2/21
  neighbor: SW_CDA
  neighbor_interface: Gig 1/0/24
  platform: WS-C3850-

This file contains all the output logs

nornir.log

First run

On the first run the script reads all the CDP neighbors and checks whether the file expected.yml already exists. If it doesn't, the current read is written to it in YAML format. This file is then used as the baseline to compare against in the following executions.

The second run reads the CDP neighbors of the hosts in hosts.yaml again and performs the actual comparison.
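This two-run behaviour boils down to a single check on whether the baseline file exists. The sketch below is an assumption about the flow and not the script's code: `run_check` and `read_all_neighbors` are hypothetical names, and JSON is swapped in for YAML only so the example needs nothing beyond the standard library.

```python
import json
from pathlib import Path


def run_check(read_all_neighbors, baseline_path):
    """Create the baseline on the first run; later return (expected, current).

    read_all_neighbors is a hypothetical callable returning
    {host: [neighbor_dict, ...]}.
    """
    baseline = Path(baseline_path)
    current = read_all_neighbors()
    if not baseline.exists():
        # First run: nothing to compare against, so save the read as baseline
        baseline.write_text(json.dumps(current, indent=2))
        return None
    # Later runs: load the baseline and hand both reads to the comparison
    expected = json.loads(baseline.read_text())
    return expected, current
```

To re-baseline after an intentional cabling change under this scheme, deleting the baseline file and running the check once more is enough.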

In the groups.yaml file I put some additional variables:

  excludeIface:
    - mgmt0
  excludeCapa:
    - VMware

These are used to exclude some interfaces (mgmt0) and some host capabilities (VMware) from the comparison. Add or change entries to suit your use case.
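A filter along these lines could apply the two exclusion lists before the comparison. It is a sketch under assumptions: `filter_neighbors` is a hypothetical name, interfaces are matched exactly, and each `excludeCapa` entry is treated as a substring of the neighbor's `capability` field, which may differ from how the script matches them.

```python
def filter_neighbors(neighbors, exclude_ifaces=(), exclude_capas=()):
    """Drop neighbors on excluded local interfaces or with excluded capabilities."""
    kept = []
    for n in neighbors:
        if n.get("local_interface") in exclude_ifaces:
            continue  # e.g. mgmt0
        capability = n.get("capability", "")
        if any(word in capability for word in exclude_capas):
            continue  # e.g. VMware
        kept.append(n)
    return kept
```

For example, `filter_neighbors(neighbors, ["mgmt0"], ["VMware"])` mirrors the values set in groups.yaml.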

Requirements

The script requires Netmiko with the TextFSM templates installed. Follow the ntc-templates instructions to install and configure them.

Put the export in the .profile file in your home folder to avoid having to set the variable every time.

export NET_TEXTFSM=/path/to/ntc-templates/templates/ 
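A quick sanity check at startup can catch a missing or mistyped NET_TEXTFSM before any device is contacted. The helper below is a hypothetical addition and not part of the script; it only verifies that the variable is set and points at an existing directory.

```python
import os


def textfsm_template_dir(environ=os.environ):
    """Return the NET_TEXTFSM directory, or raise if it is unset or missing."""
    path = environ.get("NET_TEXTFSM")
    if not path:
        raise RuntimeError("NET_TEXTFSM is not set")
    if not os.path.isdir(path):
        raise RuntimeError(f"NET_TEXTFSM points to a missing directory: {path}")
    return path
```

The `environ` parameter defaults to the real environment and exists only so the check can be exercised with a plain dictionary.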

See it in action

The workflow is:

  1. cable all the switches
  2. edit hosts.yaml with credentials and IP addresses of all the switches under analysis
  3. edit groups.yaml to set username, password and create the groups
  4. run the script to create a baseline
  5. run the script again anytime you need to verify the topology matches the baseline

Sounds better than checking CDP by hand every time, or blindly trusting that the physical connections are right because we never make mistakes.

What else?

The script is available on my GitHub account. Feel free to use it, but remember I take no responsibility for any damage caused by it. Use it at your own risk.

Performance is great: without any tuning it can verify hundreds of interfaces in a few seconds.

If you use it, please share your experience and give credit to the author (or Bitcoins ;-) ).

Further reading

My GitHub account

An Introduction to Nornir by Kirk Byers

Netmiko and TextFSM by Kirk Byers

Nornir on GitHub

Exploring Nornir, the Python Automation Framework by Patrick Ogenstad

Network to Code Slack channel