We ran into an issue where an ESXi host suddenly lost network connectivity. The host was still up at the console, but the management network ping test failed.
In this situation, HA triggers a restart of the affected VMs on another available ESXi host (check the production servers).
Steps to troubleshoot.
- Initial troubleshooting: try to ping the ESXi host, and try to log in to the ESXi Web UI or over SSH. If none of these work, move on to the next step.
- Log in to the blade server console and check the screen. If there is a PSOD (purple screen of death), reboot the ESXi host. If there is no PSOD, log in as root and run Test Management Network. If the ping test fails, reboot the ESXi host.
- In our case, we had to reboot the ESXi host.
- After the reboot, the host returned to a normal state.
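The initial reachability checks in the first step can be scripted. The following is a minimal Python sketch, not a VMware tool. It only tests whether the TCP ports normally used by the ESXi Web UI (443) and SSH (22) accept a connection. The function names are illustrative assumptions.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_esxi_host(host: str) -> dict:
    """Check the ports normally used by the ESXi Web UI (443) and SSH (22)."""
    return {"https": tcp_port_open(host, 443), "ssh": tcp_port_open(host, 22)}
```

Calling check_esxi_host with the management address of the host returns a dictionary such as {"https": False, "ssh": False} when neither service answers. That matches the situation described above, where the next step is the blade server console.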
Steps for Log Analysis.
- Log in to the ESXi Web UI and generate the log bundle.
- After downloading the logs, extract the files.
- The logs are normally located under this path: C:\esx-hostname-2022-11-25--15.26-2103189\var\run\log
- First check the activity in vmkernel.log. (Note: log times are in UTC, so convert them to your own time zone.)
- In our case, the logs pointed to a storage or network issue. Also check the current NIC driver version to see whether there is a known bug that has since been fixed, or whether a newer release is available and it is time to upgrade the driver.
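The UTC conversion mentioned in the vmkernel.log step can be done with a few lines of Python. This is a sketch under two assumptions: log lines start with an ISO 8601 timestamp ending in Z, and the reader's zone is UTC+8. The sample timestamp is an illustrative value, not taken from the actual logs, and log_time_to_local is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout at the start of each vmkernel.log line,
# for example "2022-11-25T15:26:03.123Z" (ISO 8601, UTC).
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def log_time_to_local(stamp: str, utc_offset_hours: float) -> datetime:
    """Convert a UTC log timestamp to the zone at the given UTC offset."""
    utc_time = datetime.strptime(stamp, LOG_TIME_FORMAT).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Example: view a log entry in a UTC+8 zone (the offset is an assumption).
local = log_time_to_local("2022-11-25T15:26:03.123Z", 8)
print(local.isoformat())  # 2022-11-25T23:26:03.123000+08:00
```

A fixed offset keeps the example self-contained. For zones with daylight saving time, a named zone from the standard zoneinfo module would be the more robust choice.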
Note: This behavior normally indicates a hardware issue. A problem within the OS would usually produce some warning in the lead-up to the event, whereas all logging stopping dead at the same moment points to a problem with either storage or networking.
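One way to look for the moment where logging stops dead is to find the longest silence between consecutive timestamped lines in vmkernel.log. The Python sketch below assumes each relevant line begins with a UTC timestamp such as 2022-11-25T15:26:03.123Z followed by whitespace. The helper names parse_stamp and largest_gap are hypothetical, and lines without a leading timestamp are skipped.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Assumed layout of the leading timestamp on each log line (UTC).
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_stamp(line: str) -> Optional[datetime]:
    """Parse the leading timestamp of a log line, or return None."""
    parts = line.split(maxsplit=1)
    if not parts:
        return None
    try:
        return datetime.strptime(parts[0], LOG_TIME_FORMAT)
    except ValueError:
        return None


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[timedelta, datetime, datetime]]:
    """Return (gap, last_time_before, first_time_after) for the longest silence."""
    best = None
    previous = None
    for line in lines:
        current = parse_stamp(line)
        if current is None:
            continue  # continuation lines or lines in another format
        if previous is not None:
            gap = current - previous
            if best is None or gap > best[0]:
                best = (gap, previous, current)
        previous = current
    return best
```

To use it, open vmkernel.log from the extracted bundle and pass the file object to largest_gap. A long gap that ends with the first lines written after the reboot suggests where the host stopped logging. The returned times are still in UTC.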