How to Replace a Faulty NSX-T Edge Node in an Edge Cluster

NSX Edge nodes can be virtual appliances or bare metal instances. NSX-T Edge nodes are service appliances with pools of capacity, dedicated to running network services that cannot be distributed to the hypervisors.
The NSX-T Edge appliance provides routing services and North-South connectivity to networks that are external to the NSX-T environment. You use an NSX-T Edge to establish external connectivity from the NSX-T domain through a Tier-0 router using BGP or static routing. An NSX Edge is also required if you want to deploy a Tier-0 router, or a Tier-1 router with stateful services such as NAT, DHCP server, or Edge Firewall.


Recently I came across a situation in one of my production environments. The customer has a Tier-0 gateway with an edge cluster associated with it, and the Tier-0 gateway runs in Active-Standby (active-passive) HA mode. We noticed that all TCP traffic failed whenever it passed through one of the edge nodes.

To isolate the issue, we manually powered off the active edge node of the Tier-0 gateway. To find out which edge node is active, take a look at my article How to Identify the Active Edge Node of NSX-T Tier-0/Tier-1 Gateway.

The other edge node in the edge cluster became active, and everything started working with no issues. The problem appeared only when the faulty edge node was active. We involved VMware Support, and they also recommended replacing the faulty edge node with a new one.

In this article, I explain the step-by-step procedure to replace a faulty NSX-T edge node in an edge cluster.

Note: If the NSX Edge node to be replaced is not running, the new NSX Edge node can have the same name, management IP address, and TEP IP address. If the NSX Edge node to be replaced is running, the new NSX Edge node must have a different name, management IP address and TEP IP address.
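The rule in the note can be expressed as a small pre-deployment check. The following is a minimal Python sketch, not an NSX-T tool: the `EdgeNodeSpec` class and `identity_conflicts` function are hypothetical helpers, and the IP addresses in the example are placeholders from the documentation address ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeNodeSpec:
    """Hypothetical model of the identity fields named in the note."""
    name: str
    mgmt_ip: str
    tep_ips: frozenset  # TEP IPs assigned to the node (one or more)


def identity_conflicts(old: EdgeNodeSpec, new: EdgeNodeSpec, old_running: bool) -> list:
    """List the identity fields the new node must NOT share with the old one.

    If the old node is not running, the new node may reuse its identity,
    so nothing is reported. If it is running, the new node needs a
    different name, management IP and TEP IP.
    """
    if not old_running:
        return []
    conflicts = []
    if old.name == new.name:
        conflicts.append("name")
    if old.mgmt_ip == new.mgmt_ip:
        conflicts.append("mgmt_ip")
    if old.tep_ips & new.tep_ips:  # any overlapping TEP IP is a conflict
        conflicts.append("tep_ip")
    return conflicts


# Placeholder values: edgenode-02a is still running, as in my case.
old = EdgeNodeSpec("edgenode-02a", "192.0.2.12", frozenset({"198.51.100.12"}))
new = EdgeNodeSpec("edgenode-03a", "192.0.2.13", frozenset({"198.51.100.13"}))
print(identity_conflicts(old, new, old_running=True))  # prints []
```

An empty list means the planned node satisfies the note; any returned field name tells you what still has to change.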

In my case, the faulty edge node is edgenode-02a, which is still up and running. So I deployed edgenode-03a with the same configuration as edgenode-02a, but with a different name, management IP, and TEP IP (auto-assigned from the IP pool).

You can verify which edge nodes are part of the edge cluster under System -> Fabric -> Nodes -> Edge Transport Nodes. In my case, edgenode-01a and edgenode-02a are part of the edge cluster “EdgeCluster-01a”.


You can also confirm this from the edge cluster view: go to System -> Fabric -> Edge Clusters and click the cluster name. It shows the transport nodes (edge nodes) that are members of this edge cluster.
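The same membership is available from the NSX-T Manager API. As far as I know, `GET /api/v1/edge-clusters/<edge-cluster-id>` returns a `members` list in which each entry carries a `member_index` and a `transport_node_id`; please confirm the exact schema in the API guide for your NSX-T version. The Python sketch below only parses a trimmed, hypothetical response body (the IDs are made up) and maps transport node IDs to the names you see in the UI.

```python
import json

# Trimmed, hypothetical response body; real responses contain more fields
# and use UUIDs for the transport node IDs.
SAMPLE_CLUSTER = """
{
  "id": "ec-uuid-01a",
  "display_name": "EdgeCluster-01a",
  "members": [
    {"member_index": 1, "transport_node_id": "tn-uuid-02a"},
    {"member_index": 0, "transport_node_id": "tn-uuid-01a"}
  ]
}
"""


def cluster_members(body: str, names: dict) -> list:
    """Return (member_index, node name) pairs sorted by member_index.

    Falls back to the raw transport node ID when no name is known.
    """
    cluster = json.loads(body)
    members = sorted(cluster.get("members", []), key=lambda m: m["member_index"])
    return [
        (m["member_index"], names.get(m["transport_node_id"], m["transport_node_id"]))
        for m in members
    ]


# Hypothetical ID-to-name mapping, e.g. collected from the transport node objects.
names = {"tn-uuid-01a": "edgenode-01a", "tn-uuid-02a": "edgenode-02a"}
print(cluster_members(SAMPLE_CLUSTER, names))
# prints [(0, 'edgenode-01a'), (1, 'edgenode-02a')]
```

Keeping the `member_index` next to each name is useful later, because it identifies which member slot the faulty node occupies.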


Before replacing the faulty edge node with the new node, place the faulty edge node “edgenode-02a” into NSX Maintenance Mode.

To place the edge node into maintenance mode, select the edge node under Edge Transport Nodes, then select “Enter NSX Maintenance Mode” from the Actions menu.


Click YES to confirm placing the edge node “edgenode-02a” into NSX Maintenance Mode.
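If you need to do this step from a script, the NSX-T Manager API has, to my knowledge, a `POST /api/v1/transport-nodes/<transport-node-id>?action=enter_maintenance_mode` call for transport nodes. Treat that path as an assumption to check against the API guide for your version. The Python sketch below only builds the request with the standard library and does not send it; the manager hostname, credentials, and node ID are placeholders.

```python
import base64
import urllib.request


def enter_maintenance_request(manager: str, transport_node_id: str,
                              user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the enter-maintenance-mode request.

    The endpoint path is an assumption based on the NSX-T Manager API;
    verify it for your NSX-T version before use.
    """
    url = (f"https://{manager}/api/v1/transport-nodes/"
           f"{transport_node_id}?action=enter_maintenance_mode")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {token}"},  # HTTP basic auth
    )


# Placeholder values; send with urllib.request.urlopen(req) once the manager
# certificate is trusted by your Python environment.
req = enter_maintenance_request("nsxmgr.example.com", "tn-uuid-02a", "admin", "secret")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to review the exact URL before anything touches a production manager.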


The faulty edge node edgenode-02a is now in NSX Maintenance Mode, and its node status shows as “Down”.


To replace the faulty edge node, go to System -> Fabric -> Edge Clusters and select the edge cluster “EdgeCluster-01a”.


Select Replace Edge Cluster Member under Actions.


In the Replace drop-down, select the faulty NSX-T edge node. In my case, that is edgenode-02a.


In the With drop-down, select the newly deployed edge node (edgenode-03a in my case), and click Save.
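The same replacement can be scripted. To my knowledge, the NSX-T Manager API exposes `POST /api/v1/edge-clusters/<edge-cluster-id>?action=replace_transport_node`, with a body containing the `member_index` of the member being replaced and the `transport_node_id` of the new node; verify the details for your NSX-T version before relying on it. The Python sketch below looks up the faulty node's member index in a hypothetical edge cluster object and builds the request without sending it. The hostname and IDs are placeholders.

```python
import json
import urllib.request


def build_replace_request(manager: str, cluster: dict,
                          faulty_tn_id: str, new_tn_id: str) -> urllib.request.Request:
    """Build (but do not send) a replace_transport_node request.

    The faulty node's member_index is looked up in the cluster object,
    because the body identifies the member to replace by that index.
    """
    indexes = [m["member_index"] for m in cluster.get("members", [])
               if m["transport_node_id"] == faulty_tn_id]
    if not indexes:
        raise ValueError(f"{faulty_tn_id} is not a member of {cluster.get('display_name')}")
    body = {"member_index": indexes[0], "transport_node_id": new_tn_id}
    url = (f"https://{manager}/api/v1/edge-clusters/"
           f"{cluster['id']}?action=replace_transport_node")
    # Add an Authorization header before sending, as for any NSX-T API call.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical cluster object with placeholder IDs.
cluster = {
    "id": "ec-uuid-01a",
    "display_name": "EdgeCluster-01a",
    "members": [
        {"member_index": 0, "transport_node_id": "tn-uuid-01a"},
        {"member_index": 1, "transport_node_id": "tn-uuid-02a"},
    ],
}
req = build_replace_request("nsxmgr.example.com", cluster, "tn-uuid-02a", "tn-uuid-03a")
print(req.full_url)
print(req.data.decode())  # prints {"member_index": 1, "transport_node_id": "tn-uuid-03a"}
```

Raising an error when the faulty node is not found guards against replacing the wrong member if the IDs were copied incorrectly.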


After the replacement, edgenode-01a and edgenode-03a are the members of the edge cluster “EdgeCluster-01a”, the tunnels are up, and the node status has changed to Up.


You can also validate this from the edge cluster view, which now lists edgenode-01a and edgenode-03a as the edge cluster members.
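If you compare the member list before and after the replacement (for example, from two reads of the edge cluster object through the API), a quick sanity check could look like the following. It is a minimal sketch over plain dictionaries with placeholder IDs; `check_replacement` is a hypothetical helper, and it does not substitute for checking tunnel status and traffic.

```python
def check_replacement(before: list, after: list,
                      faulty_tn_id: str, new_tn_id: str) -> list:
    """Return a list of problems; an empty list means the membership looks right."""
    after_ids = {m["transport_node_id"] for m in after}
    problems = []
    if faulty_tn_id in after_ids:
        problems.append(f"{faulty_tn_id} is still a cluster member")
    if new_tn_id not in after_ids:
        problems.append(f"{new_tn_id} is not a cluster member")
    if len(after) != len(before):
        problems.append(f"member count changed from {len(before)} to {len(after)}")
    return problems


# Placeholder member lists for EdgeCluster-01a before and after the swap.
before = [{"member_index": 0, "transport_node_id": "tn-uuid-01a"},
          {"member_index": 1, "transport_node_id": "tn-uuid-02a"}]
after = [{"member_index": 0, "transport_node_id": "tn-uuid-01a"},
         {"member_index": 1, "transport_node_id": "tn-uuid-03a"}]
print(check_replacement(before, after, "tn-uuid-02a", "tn-uuid-03a"))  # prints []
```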


After replacing the faulty edge node, validate the traffic flow and the Tier-0 gateway status. In my case, everything turned out healthy. That’s it. We are done replacing the faulty NSX-T edge node in the edge cluster.

You can also watch the detailed step-by-step video on How to Replace the Faulty NSX-T Edge node in the NSX-T edge cluster from my YouTube channel.

I hope this is informative for you. Thanks for reading! If you feel it is worth sharing, please share it on social media.