[volunteergrid2-l] Gratch is down -- hard, and probably for two weeks

Shawn Willden shawn at willden.org
Sun Mar 13 21:47:15 PDT 2011


So, I feel really bad about this, especially since I was driving the
requirement to maintain good uptime, but I don't see a way around it.

Yesterday I began upgrading my file server from Debian Lenny to Squeeze.
I don't actually know whether that had anything to do with what happened
or whether it was coincidental, but the upgrade finished this evening, so
I decided to reboot.  I knew this was a little risky because I'm flying to
Colorado in
the morning, but I figured I had several hours to deal with any breakage,
and I have over a decade of Debian upgrades under my belt, so I was
confident I could handle it.

On boot, my BIOS didn't recognize any of my seven SATA drives, only the
lone IDE drive, which wasn't configured as a boot drive.  I reset the BIOS
config to "failsafe defaults" and restarted, and it saw the SATA drives.
When I tried to boot, it found GRUB (with the newly installed GRUB2 stuff)
but got an error trying to read the partition table of the drive with the
root file system.  I had the machine configured so it could boot off any
one of four drives (each has a partition that is part of a mirrored array
containing the root file system), so by editing the boot parameters I was
able to find a drive it could boot from.

When it came up, though, all of the RAID5 and RAID6 arrays failed to start
because they were unable to find enough drives.

It was able to get to a command line, though, and when I started looking,
the source of the problem quickly became apparent:  FOUR of my seven SATA
drives claim to have no partition table.  Apparently the partition tables
have gotten corrupted somehow.  Hopefully that's all that got lost.  My
most crucial data is backed up onto multiple laptop drives, but there's a
LOT of less critical stuff that exists nowhere else.

Anyway, I decided to shut the machine off and walk away while I think about
what I need to do, and research the tools that are available to recover lost
partition tables.  My disks all have identical partition tables, so I think
I can just rebuild the tables based on the known config which I can get from
the others.  But I want to proceed very cautiously, and think everything
through THOROUGHLY before I start making any changes.
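
As an illustration only (nothing here has been run against these disks):
one way to reuse a known-good layout is to take a text dump of a healthy
drive's partition table, rewrite the device name for a damaged drive, and
review the result by hand before anything is written.  The Python sketch
below assumes an sfdisk-style dump (as produced by "sfdisk -d"); the
function name retarget_dump, the device names, and the sector numbers in
the sample are all hypothetical.

```python
import re


def retarget_dump(dump_text: str, source_dev: str, target_dev: str) -> str:
    """Return a copy of a partition-table dump that names target_dev.

    Only whole device names are rewritten: "/dev/sda" and "/dev/sda1"
    match, but "/dev/sdaa" does not. Nothing is written to any disk.
    """
    # Match source_dev unless it is immediately followed by another letter.
    pattern = re.compile(re.escape(source_dev) + r"(?![A-Za-z])")
    return pattern.sub(target_dev, dump_text)


# Hypothetical dump from a healthy drive (made-up sector numbers).
SAMPLE_DUMP = (
    "# partition table of /dev/sda\n"
    "unit: sectors\n"
    "\n"
    "/dev/sda1 : start=       63, size=   208782, Id=fd, bootable\n"
    "/dev/sda2 : start=   208845, size=976559220, Id=fd\n"
)

if __name__ == "__main__":
    # Print the rewritten dump so it can be inspected before any use.
    print(retarget_dump(SAMPLE_DUMP, "/dev/sda", "/dev/sdc"))
```

The function only produces text.  Applying it would be a separate,
deliberate step (for example, feeding the reviewed dump back to sfdisk for
the target drive), ideally after imaging or at least checksumming the
affected disks.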

Which leads back to:  I'm flying to Colorado tomorrow, and won't be back
home for two full weeks.  Eventually I plan to take this machine to
Colorado, but I can't do it now.  So I see no way to avoid a two-week
downtime.

-- 
Shawn