Testing Bad RAM to Diagnose Server Crashes

Home, Bangkok, Thailand, 2015-03-21 13:15 +0700

#infrastructure

The new ESXi server that I built some months back has been crashing suddenly and unexpectedly. At first I suspected that the system disk where the hypervisor is installed (which is quite old) might be bad, so I swapped it out, but the crashes continued. On a hunch that it might be a RAM issue, I flashed a MemTest86 USB key and tested the RAM, and sure enough, it turned up errors.

With more tests I narrowed down which DIMM was faulty and pulled it along with its neighbour (on this machine DIMMs must be installed in pairs). The system has now been running for a week without crashes.