War stories - VLAN renumbering

For $reasons, I have needed to renumber a VLAN in a campus or DC network a few times.

The first time, the customer had to remove all the clients from VLAN 1 to comply with the company's network design standards after a merger with a large international group. In other cases, customers had assigned the same VLAN ID to two sites and eventually needed to link them with an L2 dark fiber link.

Whatever the $reason was, the request was the same: how to move all the hosts from one VLAN to the other with the least possible impact?

During the process, the IP addressing of the hosts and of the VLAN gateway does not change. In this post I assume the gateway of the VLAN is configured on an SVI.

In all these cases, I used a straightforward method that I'll describe in this post.

Plan the change window

Although only a brief interruption of network traffic is expected, I strongly suggest planning a change window and preparing for some dropped packets. Better safe than sorry.

Also be ready to clear the MAC address table on some switches, due to bugs or undocumented features.

Merge L2

The first step is to merge the two broadcast domains:

  • create the new target VLAN
  • add the new VLAN to all the necessary trunks between the switches
  • configure two access ports (or two port-channels for redundancy), one assigned to the current VLAN, another to the new VLAN
  • disable STP on the ports, enable BPDU filter, or do whatever is necessary to prevent the ports from moving to a blocking state
  • connect a patch to the ports to create the bridge

graph LR; A[port A - access vlan 1] -->|patch/DAC| B[port B - access vlan 2]
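
The steps above can be sketched as a small config generator. This is a minimal sketch, assuming Cisco IOS-style syntax and per-port BPDU filter; the function names and the interface names in the example are hypothetical, and the exact commands depend on the platform.

```python
def bridge_port_config(interface: str, vlan_id: int) -> list[str]:
    """Return IOS-style config lines for one side of the L2 bridge (assumed syntax)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid VLAN ID: {vlan_id}")
    return [
        f"interface {interface}",
        " switchport mode access",
        f" switchport access vlan {vlan_id}",
        # Stop sending/processing BPDUs so STP does not block the patched port.
        " spanning-tree bpdufilter enable",
        " no shutdown",
    ]


def bridge_config(old_vlan: int, new_vlan: int, port_a: str, port_b: str) -> str:
    """Create the target VLAN and configure both ends of the patch."""
    lines = [f"vlan {new_vlan}"]
    lines += bridge_port_config(port_a, old_vlan)  # port A stays in the current VLAN
    lines += bridge_port_config(port_b, new_vlan)  # port B goes in the new VLAN
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical interface names, matching the VLAN 1 -> VLAN 2 diagram.
    print(bridge_config(1, 2, "TenGigabitEthernet1/0/47", "TenGigabitEthernet1/0/48"))
```

Generating the text rather than typing it makes it easy to review the change before the window and to paste the same block on a second pair of ports if redundancy is needed.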

We start with all the hosts in the old VLAN and none in the new VLAN. As hosts are reassigned to the new VLAN, all the traffic between hosts in the old VLAN and hosts in the new VLAN flows through the patch used for the bridge. Traffic between the gateway and the hosts on the other side of the bridge uses the same link.

Warning

That patch is therefore a potential bottleneck and a single point of failure (SPOF). In my case I used a 10G link or a 40G DAC cable; YMMV. Just be aware of the risk.

Move the hosts

The next step is to move the hosts to the target VLAN.

I usually prepare a list of all the access ports assigned to the original VLAN in advance. A quick update of the port configuration is all we need to reassign the hosts to the new VLAN.
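
As an illustration, the port list can be turned into the configuration change with a few lines of Python. This is a minimal sketch, assuming the port-to-VLAN mapping has already been collected (for example from the switch's interface status output) and assuming IOS-style commands; `reassign_ports` and the interface names are hypothetical.

```python
def reassign_ports(port_vlans: dict[str, int], old_vlan: int, new_vlan: int) -> list[str]:
    """Build config lines that move every access port in old_vlan to new_vlan.

    port_vlans maps an interface name to its current access VLAN.
    Ports in other VLANs are left untouched.
    """
    lines: list[str] = []
    for interface, vlan in sorted(port_vlans.items()):
        if vlan != old_vlan:
            continue
        lines += [f"interface {interface}", f" switchport access vlan {new_vlan}"]
    return lines


if __name__ == "__main__":
    # Hypothetical inventory of access ports and their current VLAN.
    ports = {"Gi1/0/1": 1, "Gi1/0/2": 10, "Gi1/0/3": 1}
    print("\n".join(reassign_ports(ports, old_vlan=1, new_vlan=2)))
```

Running the same function with the VLAN IDs swapped against the post-change inventory gives the rollback configuration.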

For VMs, it is just a matter of updating the VLAN ID in the port group on ESXi. Again, this is a small change that causes a brief interruption of the traffic flow.

Warning

Verify that the new VLAN is allowed on the trunk ports to the ESXi hosts.
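
This check is easy to script. The sketch below assumes the allowed list is available as a string in the comma-and-range form used by IOS-style trunk configuration (for example `1,10-20,30`); `vlan_allowed` is a hypothetical helper, and collecting the string from the switch is left out.

```python
def vlan_allowed(allowed: str, vlan_id: int) -> bool:
    """Return True if vlan_id is covered by an allowed-VLAN list such as '1,10-20,30'."""
    text = allowed.strip().lower()
    if text == "all":
        return True
    for part in text.split(","):
        part = part.strip()
        if not part or part == "none":
            continue
        # A part is either a single VLAN ("30") or an inclusive range ("10-20").
        low, _, high = part.partition("-")
        if int(low) <= vlan_id <= int(high or low):
            return True
    return False


if __name__ == "__main__":
    print(vlan_allowed("1,10-20,30", 15))
    print(vlan_allowed("1,10-20,30", 2))
```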

For firewalls, NLB, and other equipment, it depends: in some cases changing the VLAN ID is very easy, in others it requires more effort and some significant reconfiguration.

The order in which the hosts are moved should be planned to minimize service disruption and risk. I usually start with some test/dev hosts to validate the L2 merge and the reachability of the gateway, and then run some iperf tests to measure performance.

I suggest using a tool like PRTG to monitor the utilization of the bridge ports between the two VLANs, so you notice if the link approaches its capacity.
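
A monitoring tool does this for you, but the arithmetic behind the graph is simple. The sketch below assumes two samples of an interface octet counter (such as SNMP `ifHCInOctets`) taken a known number of seconds apart; `link_utilization` is a hypothetical helper, not part of any tool.

```python
def link_utilization(
    octets_t0: int,
    octets_t1: int,
    interval_s: float,
    speed_bps: float,
    counter_bits: int = 64,
) -> float:
    """Return the average link utilization in percent between two counter samples."""
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval_s and speed_bps must be positive")
    delta = octets_t1 - octets_t0
    if delta < 0:
        # The counter wrapped around between the two samples.
        delta += 2 ** counter_bits
    # Octets to bits, divided by what the link could carry in the interval.
    return delta * 8 / (interval_s * speed_bps) * 100


if __name__ == "__main__":
    # 37.5 GB transferred in 60 s on a 10G bridge link is half of its capacity.
    print(link_utilization(0, 37_500_000_000, 60, 10e9))  # → 50.0
```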

If some hosts exchange a lot of data with each other, for DB sync for example, they should be moved together.
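
One way to plan this is to group the hosts so that any two heavy talkers end up in the same batch. The sketch below uses a small union-find over a hand-made list of host pairs; the function name, the host names, and the idea of feeding it from a list of pairs are all assumptions for illustration.

```python
def move_groups(hosts: list[str], heavy_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Group hosts so that every pair in heavy_pairs lands in the same move batch."""
    parent = {h: h for h in hosts}
    for a, b in heavy_pairs:
        # Hosts that only appear in a pair are added to the plan as well.
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(h: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for a, b in heavy_pairs:
        parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for h in parent:
        groups.setdefault(find(h), []).append(h)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    hosts = ["db1", "db2", "web1", "app1"]
    pairs = [("db1", "db2"), ("app1", "db2")]
    print(move_groups(hosts, pairs))  # → [['app1', 'db1', 'db2'], ['web1']]
```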

Move the gateway

Moving the gateway from the old SVI to the new one is a critical step that must be carefully planned.

Technically we can move the SVI before, during or after reassigning the hosts to the target VLAN.

Info

Most switches use the same MAC address for all SVIs, so there's no need to refresh the ARP cache on the clients.

I usually advise against moving the gateway before the hosts are reassigned, unless there's a valid reason to do so.

Personally, I prefer to move the gateway when around 50% of the hosts are in the new VLAN, to avoid the bottleneck of the L2 bridge and to verify network reachability.

Moving the gateway as the last step may be a greater risk: if a rollback is necessary, it means reassigning all the hosts back to the original VLAN.

Info

If some hosts send or receive a lot of traffic to/from the gateway, move them together.

Warning

Verify in advance whether any redistribution or routing protocol configuration must be updated to advertise the prefix, which is now associated with a different SVI.

Verify

At the end of the process, we need to confirm that all the hosts have moved to the new VLAN. We can do that by checking the MAC address table of the switch with the bridge: in the old VLAN, all the MAC addresses should be learned from the bridge port.

If the campus has many switches, verify them one by one to confirm that no access ports are left in the old VLAN and that all its MAC addresses arrive from the uplinks (💡 this task can be automated with pyATS or Nornir).
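
The check itself is a simple filter once the MAC address table has been parsed. The sketch below assumes the entries were already collected and parsed into (VLAN, MAC, port) records by whatever tool you use; `MacEntry`, `stragglers`, and the sample data are hypothetical.

```python
from typing import Iterable, NamedTuple


class MacEntry(NamedTuple):
    vlan: int
    mac: str
    port: str


def stragglers(
    entries: Iterable[MacEntry], old_vlan: int, allowed_ports: Iterable[str]
) -> list[MacEntry]:
    """Return old-VLAN entries learned on a port other than the bridge or the uplinks."""
    allowed = set(allowed_ports)
    return [e for e in entries if e.vlan == old_vlan and e.port not in allowed]


if __name__ == "__main__":
    # Hypothetical table from the switch with the bridge; Te1/0/47 is the bridge port.
    table = [
        MacEntry(1, "0011.2233.4455", "Te1/0/47"),
        MacEntry(1, "0011.2233.4466", "Gi1/0/5"),  # host still in the old VLAN
        MacEntry(2, "0011.2233.4477", "Gi1/0/6"),
    ]
    for entry in stragglers(table, old_vlan=1, allowed_ports=["Te1/0/47"]):
        print(f"{entry.mac} still in VLAN {entry.vlan} on {entry.port}")
```

An empty result on every switch means the old VLAN only sees hosts through the bridge or the uplinks.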

Wrap up

Sometimes we need to find creative ways to accomplish a task.

The method I described in this post may not be the most elegant, but it has served me well multiple times. I've managed to reassign hundreds of ports in a short time with the help of a good plan and some scripts.


Links

Warstories post series