Troubleshoot unexpected reboots of multiple VM servers - HA triggered the VM restart

sicnarflatosa
Apr 17, 2022
Updated: Apr 21, 2022

To determine whether your VMware vSphere HA cluster has experienced a host failure:

Steps:

- Collect one of the ESXi VMkernel coredumps

- Use Notepad++ to open the fdm log and search for "dead"

Output:

Line 507: 2022-04-12T02:47:10.049Z verbose fdm[2815311] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] Waited 15 seconds for disk heartbeat for host host-2075951 - declaring dead

Line 669: 2022-04-12T02:47:10.053Z info fdm[2815305] [Originator@6876 sub=Invt opID=SWI-7f9b698e] Host host-2075951 changed state: Dead

Line 752: --> host-2075951: Dead

Line 827: --> host-2075951 state=Dead reservedMem=398505017344 reservedCpu=115750 unreservedMem=197486706688 unreservedCpu=17840 inMaintenanceMode=false

Line 3545: --> state=Dead electionId=1507429880550

Line 7400: --> host-2075951: Dead

Line 7475: --> host-2075951 state=Dead reservedMem=398505017344 reservedCpu=115750 unreservedMem=197486706688 unreservedCpu=17840 inMaintenanceMode=false

Line 10139: --> state=Dead electionId=1507429880550

Line 10572: 2022-04-12T02:53:50.548Z verbose fdm[2815311] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] Dead slave host-2075951 responded to icmp ping - going to FDMUnreachable
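The same search can also be scripted instead of done by hand in Notepad++. The following is only a minimal sketch: the function name `find_dead_events` and the regular expression are illustrative assumptions, not part of any VMware tool, and the sample lines are copied from the output above without the "Line N:" prefixes that Notepad++ adds.

```python
import re

# Assumption: each fdm log entry starts with an ISO-8601 UTC timestamp,
# and the affected host appears later in the line as "host-<digits>".
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s.*?(?P<host>host-\d+)"
)

def find_dead_events(lines):
    """Return (timestamp, host) pairs for timestamped lines that mention 'dead'."""
    events = []
    for line in lines:
        if "dead" not in line.lower():
            continue
        match = ENTRY.match(line.strip())
        if match:  # continuation lines ("--> ...") carry no timestamp and are skipped
            events.append((match.group("ts"), match.group("host")))
    return events

# Sample lines taken from the search output above.
SAMPLE = [
    "2022-04-12T02:47:10.049Z verbose fdm[2815311] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] Waited 15 seconds for disk heartbeat for host host-2075951 - declaring dead",
    "2022-04-12T02:47:10.053Z info fdm[2815305] [Originator@6876 sub=Invt opID=SWI-7f9b698e] Host host-2075951 changed state: Dead",
    "--> host-2075951: Dead",
    "2022-04-12T02:53:50.548Z verbose fdm[2815311] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] Dead slave host-2075951 responded to icmp ping - going to FDMUnreachable",
]

events = find_dead_events(SAMPLE)
for timestamp, host in events:
    print(timestamp, host)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `find_dead_events(open("fdm.log", errors="replace"))`. The file path here is an assumption and depends on where the log was extracted.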

Note: The log analysis shows that one of the hosts in the cluster was declared dead. This happens when a host fails, or when it is manually pulled out of the chassis or its power cable is disconnected.

To confirm this, compare the timeframe with the logs of the suspected host. (Based on the log analysis, the dead host caused HA to restart the VM servers.)

- To verify this, check the vmksummary log and the sysboot log

Output of the vmksummary log: the host finished booting at 2022-04-12T02:55:43Z, as shown by the entry "2022-04-12T02:55:43Z bootstop: Host has booted"
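The timeframe comparison can be done with a few lines of Python. This is a rough sketch using only the three timestamps quoted in this post. The helper `parse_utc` and the variable names are illustrative assumptions.

```python
from datetime import datetime, timezone

def parse_utc(ts):
    """Parse timestamps such as 2022-04-12T02:47:10.049Z or 2022-04-12T02:55:43Z."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in ts else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)

# Timestamps copied from the fdm log and the vmksummary log.
declared_dead = parse_utc("2022-04-12T02:47:10.049Z")  # fdm: "declaring dead"
ping_reply = parse_utc("2022-04-12T02:53:50.548Z")     # fdm: "responded to icmp ping"
booted = parse_utc("2022-04-12T02:55:43Z")             # vmksummary: "Host has booted"

# Time between the host being declared dead and the boot completing.
dead_to_boot = (booted - declared_dead).total_seconds()
# Time between the first ping reply and the boot completing.
ping_to_boot = (booted - ping_reply).total_seconds()

print(f"declared dead -> booted: {dead_to_boot:.3f} s")
print(f"ping reply    -> booted: {ping_to_boot:.3f} s")
```

The events occur in the order declared dead, ping reply, boot complete, which is consistent with the host going down and then restarting.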