[volunteergrid2-l] Gratch going down, briefly
Shawn Willden
shawn at willden.org
Mon Sep 12 18:53:05 PDT 2011
Okay, it's back up.
It turned out to be even smoother than I'd thought. LVM even started the
volume group with the bad PV; it just didn't start the LV that used the bad
PV. And then xfs_repair didn't find anything wrong.
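For the archives, the recovery on my end boiled down to roughly the following
(the VG, LV, and mount point names here are placeholders, not the actual ones
on gratch):

```shell
# Activate the volume group; LVM brings up whatever LVs it can and
# simply skips any LV whose extents sit on a missing/damaged PV.
vgchange -ay vg2

# Check the XFS filesystem in no-modify mode first (-n only reports,
# it never writes), then run the real repair if anything turns up.
xfs_repair -n /dev/vg2/tahoe
xfs_repair /dev/vg2/tahoe

# Remount and bring the Tahoe node back up.
mount /dev/vg2/tahoe /srv/tahoe
```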
Also, I notice that the first of slush's two nodes is now on-line! One of
Caner's is off-line, though (not that I have any room to complain).
On Mon, Sep 12, 2011 at 8:56 AM, Shawn Willden <shawn at willden.org> wrote:
> Sigh.
>
> Gratch is down again. One of the disks in my newly-expanded RAID array
> crapped out before the sync was complete, taking the array offline
> completely and taking out the logical volume containing my VG2 Tahoe node.
> It appears that it was a transient error, but it's the same model of disk
> as the last one that failed. I'll replace it after I get things running
> again. Really, though, it's my fault for adding the array to the volume
> group and extending the logical volume before the sync was complete. RAID-5
> is intended to add reliability, but an unsynced array is in danger of
> failing entirely if any one of the disks has even a transient error, a fact
> I'm well aware of but stupidly ignored.
>
> I think I can fix it by forcing some things, but I have to get the array
> offline first, which requires taking the volume group offline, which
> requires unmounting all logical volumes... and the system won't let me
> unmount the problem child. I tried to reboot, but it's hung up trying to
> unmount so I'm going to have to hit the Big Red Switch... when I get home.
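>
> Concretely, the teardown I'm trying to get through is roughly this (device
> and VG names are placeholders):
>
> ```shell
> # Unmount every LV in the volume group -- this is the step that hangs
> # on the problem child.
> umount /dev/vg2/tahoe
>
> # Deactivate the volume group so the kernel releases the PVs...
> vgchange -an vg2
>
> # ...and only then can the underlying MD array be stopped.
> mdadm --stop /dev/md1
> ```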
>
> In case there's someone here with more knowledge of LVM, MD and XFS than I
> have, here's what I'm thinking:
>
> First, here's the way it's set up:
>
> There are a few RAID-5 arrays which are all marked as physical volumes and
> added to one volume group. There are a few logical volumes in that volume
> group, none that are essential to system operation. One is the Tahoe
> volume, which is an XFS file system.
>
> My thought is that the XFS file system should be fine after running
> xfs_repair. The new storage was just added last night and most likely
> hasn't been used to store any files, so nothing should be lost. To be able
> to repair it, I have to get the logical volume back in operational shape,
> but I'm thinking it doesn't matter so much what is on the damaged portion of
> the LV.
>
> So, what I think I'm going to do is to forcibly re-create the RAID array
> with "--assume-clean" and the disks in the same order (that's important!). This
> should mean that the portion of the array that was synced will be back like
> it was before the failure. The unsynced portion of the array will contain
> random garbage, but LVM won't care in the slightest, because the PV label
> should be present and correct. Once the logical volume is back up, I can
> run xfs_repair. The random garbage will annoy xfs_repair, but not fatally,
> I think.
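>
> In mdadm terms, the forced restart would look something like this (device
> names, level, and disk count are illustrative; the disk order has to match
> the original creation order exactly):
>
> ```shell
> # Re-create the array in place. --assume-clean skips the resync, so
> # data in the already-synced region is left untouched. The disk order
> # MUST match the original array or the stripes come back scrambled.
> mdadm --create /dev/md1 --level=5 --raid-devices=3 \
>     --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1
>
> # Reactivate the LV, then let xfs_repair deal with the garbage region.
> vgchange -ay vg2
> xfs_repair /dev/vg2/tahoe
> ```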
>
> Any comments/suggestions are welcome.
>
> Worst case, of course, is that the shares I was holding are lost and y'all
> are going to have to run a Tahoe repair. But I'm hopeful it won't come to
> that.
>
>
> On Sun, Sep 11, 2011 at 7:26 PM, Shawn Willden <shawn at willden.org> wrote:
>
>> And now gratch's Tahoe storage is up to 1 TB of RAID-5 storage.
>>
>> Speaking of storage, my survey idea was apparently a complete non-starter,
>> since I didn't get a single response.
>>
>>
>> On Sun, Sep 11, 2011 at 6:38 PM, Shawn Willden <shawn at willden.org> wrote:
>>
>>> Gratch is back up. Was actually up 15 minutes ago.
>>>
>>>
>>> On Sun, Sep 11, 2011 at 6:09 PM, Shawn Willden <shawn at willden.org>wrote:
>>>
>>>> Had a drive fail a couple weeks ago and I'm finally getting around to
>>>> installing the replacement. Shouldn't be down more than a few minutes.
>>>>
>>>> --
>>>> Shawn
>>>>
>>>
>>>
>>>
>>> --
>>> Shawn
>>>
>>
>>
>>
>> --
>> Shawn
>>
>
>
>
> --
> Shawn
>
--
Shawn