Backups: we all know we should do them, and we all forget some every now and again.
Now, for my really important stuff I rely on Time Machine, which hasn’t let me down (touch wood) and is nice and hassle-free. On top of that, for my photographs, Aperture vaults, including one on an iSCSI target on an Openfiler-based NAS, give me peace of mind.
This NAS is also the primary storage for my VMware ESX cluster. This cluster runs, among other things, a Linux firewall, infrastructure hosts (DHCP, DNS, LDAP etc.) and the ESX VirtualCenter management server.
Without the NAS, therefore, everything goes dark, including my internet connection, which goes through a virtual machine firewall.
The other day everything went dark…
Now, these things happen. The boot disk in the NAS was very old and had died, leaving all the other components running on ESX effectively without their system disks.
In a testament to Linux, quite a lot of stuff on the infrastructure hosts (DHCP, named etc.) carried on working for some time even though the hosts had lost their local disks. Eventually, however, they too died, and only then did the situation present itself to the wider world.
This didn’t seem like a big deal, as I keep a copy of these critical virtual machines on the local disk of the ESX servers themselves, so I could start one up. At this point, however, ESX VirtualCenter had been down for some time, so long in fact that the ESX hosts had lost their licenses. Hmm, chicken, egg.
Now, I know that in theory running VirtualCenter and the license server on the ESX infrastructure is supported. I’m just not quite sure how you are supposed to bootstrap things, and without internet access I couldn’t find out.
This led to a very boring process in which I rebuilt the NAS system disk, using the installer disk I had thankfully left near it, and got it to a state where I could copy things off it. I then had to copy a 9GB VM over to my Mac and run VirtualCenter in VMware Fusion, which thankfully worked first time.
This allowed me to start the local-disk versions of the infrastructure hosts and begin the long job of rebuilding the NAS config, which wasn’t backed up, to resurrect the rest of ESX.
So what does this sorry tale tell us? One, back up all your configs; two, plan for the worst; and three, check your plans. OK, that last one doesn’t flow from this story, but it’s good advice anyway!
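On that first point, something as small as the sketch below, run from cron and pointed at a disk that is not the NAS itself, would probably have saved me the config rebuild. It is only a rough illustration: the function name `backup_configs` and the example paths are made up here, not what actually runs on my systems.

```python
import tarfile
import time
from pathlib import Path


def backup_configs(sources, dest_dir):
    """Bundle every existing path in `sources` into one dated .tar.gz in dest_dir.

    Returns the path of the archive that was written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / time.strftime("configs-%Y%m%d-%H%M%S.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for src in map(Path, sources):
            if src.exists():  # quietly skip paths this host doesn't have
                tar.add(src, arcname=src.name)
    return archive


# Example call (hypothetical paths, adjust to wherever your configs live):
# backup_configs(["/etc"], "/mnt/other-disk/config-backups")
```

The important design choice is the destination: the archive has to land somewhere that survives the death of the machine being backed up.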