Troubleshooting VMware HA - Cannot Complete the HA Configuration

What are the basic troubleshooting steps when the HA agent install fails on hosts in an HA cluster?


If you are facing issues with hosts in an HA cluster, I recommend following the 10 basic troubleshooting steps below. Most of the time, they will resolve the issue.


The error message will be similar to the one in the title: “Cannot complete the HA configuration”.

1. Check your environment for any temporary network problems.

2. Check that DNS is configured properly.
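
For step 2, here is a minimal sketch of a name resolution check that you could run from the service console. It assumes the getent utility is available there; the function names and the example host names are hypothetical, so substitute your own ESX hosts and vCenter server.

```shell
#!/bin/sh
# Resolve one name through the system resolver (DNS plus /etc/hosts).
# Assumption: getent is installed on the machine running the check.
resolve_name() {
    getent hosts "$1" >/dev/null 2>&1
}

# Print OK or FAIL for every name given; return 1 if any lookup failed.
check_resolves() {
    check_rc=0
    for check_name in "$@"; do
        if resolve_name "$check_name"; then
            echo "OK   $check_name"
        else
            echo "FAIL $check_name"
            check_rc=1
        fi
    done
    return $check_rc
}

# Example (hypothetical names), checking short and fully qualified forms:
#   check_resolves esx01 esx01.example.com vcenter.example.com
```

Run it on every host in the cluster, because a name that resolves on one host can still fail on another.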

3. Check the VMware HA agent status on the ESX host using the command below:

  service vmware-aam status

4. Check that the ESX networks are properly configured and named exactly as on the other hosts in the cluster. Otherwise, you will get errors while installing or reconfiguring the HA agent.
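
For step 4, a rough sketch of how the names could be compared. It assumes you have already saved the network names of a healthy host and of the affected host into two text files, one name per line (for example copied by hand from the output of esxcfg-vswitch -l). The function name and the file names are hypothetical.

```shell
#!/bin/sh
# Compare two files that each hold one network name per line.
# The comparison is byte-exact, so "VMotion" and "vmotion" count as different.
# Returns 0 when both lists match, 1 when they differ, 2 on a setup error.
compare_network_names() {
    ref_sorted=$(mktemp) && host_sorted=$(mktemp) || return 2
    LC_ALL=C sort -u "$1" >"$ref_sorted"
    LC_ALL=C sort -u "$2" >"$host_sorted"
    # comm -23 keeps lines found only in the first file, comm -13 only in the second.
    LC_ALL=C comm -23 "$ref_sorted" "$host_sorted" | sed 's/^/only in reference: /'
    LC_ALL=C comm -13 "$ref_sorted" "$host_sorted" | sed 's/^/only on affected host: /'
    if cmp -s "$ref_sorted" "$host_sorted"; then result=0; else result=1; fi
    rm -f "$ref_sorted" "$host_sorted"
    return $result
}

# Example (hypothetical file names):
#   compare_network_names good-host-networks.txt bad-host-networks.txt
```

No output and a zero exit status mean the two hosts use the same names.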

5. Check that the HA-related ports are open in the firewall to allow communication:
   
     Incoming port: TCP/UDP 8042-8045
     Outgoing port: TCP/UDP 2050-2250
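
When you audit a long list of firewall rules, a small helper like the one below can tell you whether a given port belongs to the HA ranges listed above. This is only a sketch: the function name is hypothetical, and it checks numbers, not the firewall itself.

```shell
#!/bin/sh
# Return 0 if PORT lies inside the HA range for DIRECTION (in or out),
# 1 if it lies outside, and 2 on a usage error.
is_ha_port() {
    case "$1" in
        in)  range_lo=8042; range_hi=8045 ;;
        out) range_lo=2050; range_hi=2250 ;;
        *)   echo "usage: is_ha_port in|out PORT" >&2; return 2 ;;
    esac
    [ "$2" -ge "$range_lo" ] && [ "$2" -le "$range_hi" ]
}

# Example:
#   is_ha_port in 8043 && echo "8043 is an incoming HA port"
```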

6. Try to restart (or stop and then start) the VMware HA agent on the affected host using the commands below.
In addition, you can also try restarting vpxa and the management agent on the host.

service vmware-aam restart

service vmware-aam stop

service vmware-aam start
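
Step 6 can also be wrapped in a small script. This is a sketch that assumes the classic ESX service console, where the management agent and vpxa run as the mgmt-vmware and vmware-vpxa services. Those two service names are not taken from the steps above, so verify them on your build. The function names and the DRY_RUN switch are my own additions.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Restart the HA agent, then the management agent and vpxa,
# and finish by printing the HA agent status.
# Assumption: mgmt-vmware and vmware-vpxa are the service names on this host.
restart_ha_agents() {
    run service vmware-aam restart || return 1
    run service mgmt-vmware restart || return 1
    run service vmware-vpxa restart || return 1
    run service vmware-aam status
}

# Example: preview first, then run for real.
#   DRY_RUN=1 restart_ha_agents
#   restart_ha_agents
```

Previewing with DRY_RUN=1 first is a cheap way to confirm what will be restarted before you touch a production host.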

7. Right-click the affected host and click “Reconfigure for VMware HA” to re-install the HA agent on that particular host.

8. Remove the affected host from the cluster. Removing an ESX host from the cluster is not allowed until that host is put into maintenance mode.

9. An alternative to step 8 is to go to the cluster settings, uncheck VMware HA to turn off HA in that cluster, and then re-enable VMware HA so that the agent is installed from scratch.

10. For further troubleshooting, review the HA logs under the /var/log/vmware/aam directory.
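
To get a quick first look at those logs, something like the sketch below can help. The search pattern (error or fail, in any letter case) is just my guess at useful keywords, not an official message format, and the function name is hypothetical.

```shell
#!/bin/sh
# Print every line mentioning "error" or "fail" (any letter case) from the
# files in the HA log directory, prefixed with file name and line number.
# The directory defaults to /var/log/vmware/aam and can be overridden.
scan_aam_logs() {
    log_dir=${1:-/var/log/vmware/aam}
    [ -d "$log_dir" ] || { echo "no such directory: $log_dir" >&2; return 2; }
    # /dev/null is passed as an extra file so grep always prints file names.
    grep -inE 'error|fail' "$log_dir"/* /dev/null 2>/dev/null
}

# Example:
#   scan_aam_logs | tail -n 20
```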

Thanks For Reading!!!!!