How to Fix VSAN Cluster Partition in Nested VSAN LAB

Nested ESXi and nested vSphere labs are a flexible way to test product features without a large investment in lab hardware. For lab purposes, most admins prefer to clone an already configured nested ESXi VM, which simplifies the deployment of additional nested ESXi instances. The same applies when setting up a nested VSAN lab. A VSAN host needs disks added and other disk configuration, such as marking an HDD as flash if you are setting up an all-flash VSAN cluster, so cloning an already configured ESXi host saves effort and time. However, you need to take care of a few important tasks to avoid network partition issues in the VSAN cluster. In this article, I will explain how to fix a VSAN cluster partition that occurs in a nested VSAN lab.

What is vSAN Cluster Partition?

If the ESXi hosts in a vSAN cluster cannot all communicate with each other, the cluster splits into multiple network partitions (sub-groups of ESXi hosts that can talk to each other, but not to other sub-groups). When this occurs, vSAN objects may become unavailable until the network misconfiguration is resolved. For smooth operation of a production vSAN cluster, it is very important to have a stable network in which all hosts belong to a single partition.

How to Fix VSAN Cluster Partition in Nested VSAN Lab

Ideally, you would complete all the preparatory steps before cloning the ESXi host for a nested VSAN lab. William Lam wrote a great post on this, “How to properly Clone nested ESXi”. If you did not follow that article and you now see a VSAN cluster partition on your VSAN cluster, this article is for you.

To run the VSAN health check, log in to vCenter Server using the vSphere Web Client, select the VSAN cluster, open Monitor, and click “Retest” to initiate the VSAN health check. If the Network section reports an error, expand it and check whether vSAN Cluster Partition is marked red.

Click vSAN Cluster Partition to list the partition number of every ESXi host in the VSAN cluster. In a healthy environment, all the ESXi hosts are part of the same partition.

When setting up a nested VSAN lab, if you deployed the ESXi hosts by cloning an existing host, the cloned ESXi VM may have the same UUID as the source host, and that can cause a VSAN cluster partition. My VSAN cluster has 4 hosts (esxic-lab-1 to esxic-lab-4), but the screenshot shows duplicate hostnames with duplicate UUIDs.

VSAN network Partition

To confirm the duplicate UUID issue, log in to any of the ESXi hosts that are part of the VSAN cluster and execute the command:

esxcli vsan cluster get

Notice that the Local Node UUID and one of the Sub-Cluster Member UUIDs are the same UUID. That is the reason for the VSAN cluster partition: each ESXi host in the VSAN cluster must have a unique UUID. Because we cloned the ESXi host to set up the nested VSAN lab, the clones inherited the same UUID.
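If you have more than a couple of hosts, comparing UUIDs by eye gets tedious. The sketch below is one way to spot duplicates; it is not part of the original procedure. The function name `find_duplicate_uuids` is my own, and the commented usage assumes SSH is enabled on each host and that the output contains a `Local Node UUID: <uuid>` line.

```shell
#!/bin/sh
# Reads lines of "<hostname> <local-node-uuid>" on stdin and prints every UUID
# reported by more than one host, followed by the hosts that share it.
# (find_duplicate_uuids is a hypothetical helper name, not an ESXi command.)
find_duplicate_uuids() {
    awk '
        NF >= 2 {
            count[$2]++
            hosts[$2] = (hosts[$2] == "" ? $1 : hosts[$2] "," $1)
        }
        END {
            for (u in count)
                if (count[u] > 1)
                    print u, hosts[u]
        }
    ' | sort
}

# Hypothetical usage from a management machine (assumes SSH access as root):
# for h in esxic-lab-1 esxic-lab-2 esxic-lab-3 esxic-lab-4; do
#     uuid=$(ssh "root@$h" esxcli vsan cluster get |
#            awk -F': ' '/Local Node UUID/ {print $2}')
#     echo "$h $uuid"
# done | find_duplicate_uuids
```

An empty result means every host reported a different Local Node UUID.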

VSAN network Partition

You can also execute the following command to check the IP address and node UUID of the ESXi hosts in the VSAN cluster:

esxcli vsan cluster unicastagent list

In the screenshot below, each host in the cluster has a different IP address, but two of the hosts share the same UUID.
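The same check can be scripted against the unicast agent list. This is only a sketch under assumptions: I assume the first column of the output is the NodeUuid and that each data row contains one IPv4 address; the exact column layout varies between vSAN versions, so the IP is located by its shape rather than by position. The function name `find_shared_node_uuids` is hypothetical.

```shell
#!/bin/sh
# Reads `esxcli vsan cluster unicastagent list` output on stdin and prints
# each NodeUuid that appears in more than one row, with the IPs of those rows.
# Assumption: column 1 is the NodeUuid; the IPv4 address is found by pattern.
find_shared_node_uuids() {
    awk '
        # Skip the header row and the dashed separator row.
        $1 == "NodeUuid" || $1 ~ /^-+$/ { next }
        NF >= 2 {
            ip = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) ip = $i
            if (ip == "") next
            seen[$1]++
            ips[$1] = (ips[$1] == "" ? ip : ips[$1] "," ip)
        }
        END {
            for (u in seen)
                if (seen[u] > 1) print u, ips[u]
        }
    ' | sort
}

# On an ESXi host in the cluster:
# esxcli vsan cluster unicastagent list | find_shared_node_uuids
```

Any line it prints is a UUID claimed by more than one IP address, which is exactly the situation described above.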

VSAN network Partition

To fix the duplicate UUID issue, we need to remove the existing UUID from the affected ESXi host so that a new one is generated on the next boot.

Log in to one of the ESXi hosts that has the duplicate UUID, open /etc/vmware/esx.conf, and delete the entire /system/uuid line, as seen in the screenshot below.

VSAN network Partition

Alternatively, you can execute the command below to remove the UUID without opening the file in vi:

sed -i 's#/system/uuid.*##' /etc/vmware/esx.conf
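If you prefer a safety net, the variant below keeps a copy of the file before editing it and deletes the whole line instead of leaving it blank. It is a sketch, not the command from William Lam's post; the function name `remove_system_uuid` and the `.bak` suffix are my own choices, and it assumes the `sed` on the host supports `-i`, as the command above already does.

```shell
#!/bin/sh
# Back up the given esx.conf-style file, then delete the /system/uuid line.
# (remove_system_uuid and the .bak suffix are hypothetical choices.)
remove_system_uuid() {
    conf="$1"
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    cp "$conf" "$conf.bak" || return 1
    # '\#...#' uses '#' as the regex delimiter so the slashes need no escaping;
    # 'd' deletes every line that starts with /system/uuid.
    sed -i '\#^/system/uuid#d' "$conf"
}

# On the affected host (path from the article):
# remove_system_uuid /etc/vmware/esx.conf
```

If anything goes wrong, the untouched copy is in the `.bak` file next to the original.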

Once the existing system UUID has been removed, reboot the ESXi host. After the host has rebooted and reconnected to vCenter, re-run the VSAN health check. All the health checks should now be green, including “VSAN Cluster Partition”, and all the hosts should be in the same partition, number “1”. We are good to deploy workloads on the VSAN cluster.
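You can also confirm the result from the command line. The sketch below assumes the output of `esxcli vsan cluster get` contains a `Sub-Cluster Member Count` line; the helper name `check_member_count` is hypothetical, and the expected count of 4 matches the four hosts in my lab.

```shell
#!/bin/sh
# Reads `esxcli vsan cluster get` output on stdin and succeeds only if the
# reported Sub-Cluster Member Count equals the expected number of hosts.
# (check_member_count is a hypothetical helper name.)
check_member_count() {
    expected="$1"
    actual=$(awk -F': *' '
        /Sub-Cluster Member Count/ {
            gsub(/[ \t\r]/, "", $2)   # strip stray whitespace around the number
            print $2
            exit
        }')
    echo "members: ${actual:-unknown} (expected $expected)"
    [ "$actual" = "$expected" ]
}

# On any host in the cluster, after the reboot:
# esxcli system uuid get                       # shows the host's new UUID
# esxcli vsan cluster get | check_member_count 4
```

A non-zero exit status means the host still sees fewer (or more) members than expected, so the partition is probably not resolved yet.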

VSAN network Partition

That’s it. We are done fixing the VSAN cluster partition issue in the nested VSAN lab. I hope this was informative for you. Thanks for reading! If you found it worth sharing, please share it on social media.