[volunteergrid2-l] Failure Analysis
Shawn Willden
shawn at willden.org
Wed Feb 2 17:14:43 PST 2011
On Wed, Feb 2, 2011 at 9:26 AM, Jody Harris <jharris at harrisdev.com> wrote:
> What things can take a node offline?
>
> - Node crash (Tahoe falls down)
> - Computer crash
> - ISP outage
> - Power failure - house, neighborhood, city, regional
> - ISP upstream outage (my biggest off-line cause)
>
>
Some others:
- Administrative error (e.g. rm -rf)
- Router failure
- Localized catastrophe (e.g. building burns down)
- Large-scale catastrophe (e.g. major earthquake)
You could argue that the last two are just forms of ISP/power outages, and
perhaps router failure could be lumped in with ISP outage. For that matter,
"Computer crash" can be broken down into failure sub-modes, primarily
failures of different components.
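Here is a rough illustration of how these modes combine for a single node. Suppose each mode is treated as independent and assigned the fraction of time it keeps the node offline. The node is then down whenever at least one mode is active, so its unavailability is 1 minus the product of (1 - u_i) over all modes. The example below is a minimal Python sketch under that assumption. The function name `node_unavailability` and all the per-mode numbers are hypothetical, not measurements.

```python
from math import prod


def node_unavailability(mode_unavailability: dict[str, float]) -> float:
    """Probability that at least one failure mode has the node offline.

    Each value is the assumed fraction of time that mode alone keeps the
    node down. Modes are assumed to be independent of one another.
    """
    for name, u in mode_unavailability.items():
        if not 0.0 <= u <= 1.0:
            raise ValueError(f"{name}: probability must be in [0, 1], got {u}")
    # The node is up only if every mode is simultaneously "not failed".
    return 1.0 - prod(1.0 - u for u in mode_unavailability.values())


# Hypothetical per-mode figures, for illustration only.
modes = {
    "tahoe_crash": 0.01,
    "computer_crash": 0.005,
    "isp_outage": 0.01,
    "power_failure": 0.002,
    "router_failure": 0.001,
}
print(f"{node_unavailability(modes):.4f}")
```

Independence is the weak point. Regional power failures, ISP upstream outages and large-scale catastrophes can take several nodes down at once. A grid-level model would need to treat those as shared causes rather than as per-node terms.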
Disk failures are sufficiently common that it might be useful to break them
out, especially for systems with storage architectures that make data loss
more or less likely. For example, my Tahoe node storage currently resides
on a RAID-5 array, but I'm planning to migrate it to a non-redundant LVM
pool (similar to RAID-0), so I'll be going from a storage architecture where
data loss requires near-simultaneous failures of two of six disks to an
architecture where data will be lost if any one of four disks fails.
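Here are some rough numbers for that migration. Assume each disk fails independently with probability p over the period of interest. The RAID-5 array loses data when two or more of its six disks fail. The non-redundant pool loses data when any of its four disks fails. The example below is a minimal Python sketch with a hypothetical p = 0.01 and a hypothetical helper `array_loss_probability`.

```python
from math import comb


def array_loss_probability(n_disks: int, tolerated: int, p: float) -> float:
    """P(more than `tolerated` of `n_disks` fail), each failing independently with prob p."""
    # Sum the binomial probabilities of every failure count the array cannot survive.
    return sum(
        comb(n_disks, i) * p**i * (1 - p) ** (n_disks - i)
        for i in range(tolerated + 1, n_disks + 1)
    )


p = 0.01  # hypothetical per-disk failure probability over the period of interest
raid5 = array_loss_probability(6, 1, p)     # six disks, survives any single failure
lvm_pool = array_loss_probability(4, 0, p)  # four disks, no redundancy
print(f"RAID-5 (6 disks):   {raid5:.6f}")
print(f"LVM pool (4 disks): {lvm_pool:.6f}")
```

Treating the whole period as one window overstates the RAID-5 risk. In practice, the second failure has to land inside the rebuild window of the first. The model also ignores correlated failures. Even so, the comparison at p = 0.01 (about 0.0015 versus about 0.039) shows the order-of-magnitude change the migration implies.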
If you really want to model the failure modes of the volunteergrid2, I think
taking into account each node's storage architecture is important.
Also, there should probably be two models, one that focuses on permanent
failures that cause data loss, and one that focuses on transient failures
that affect data availability (and perhaps another that focuses on write
availability).
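The two models can share the same arithmetic. Assume Tahoe's default 3-of-10 encoding, with each share on a distinct node and nodes behaving independently. Under the transient model, a file is readable while at least 3 share-holding nodes are reachable. Under the permanent model, it is lost once fewer than 3 shares survive. The example below is a minimal Python sketch. The function names and probabilities are hypothetical.

```python
from math import comb

K, N = 3, 10  # Tahoe-LAFS default encoding: any 3 of 10 shares recover the file


def at_least(k: int, n: int, p: float) -> float:
    """P(at least k successes among n independent trials, each with probability p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def read_availability(node_up: float) -> float:
    """Transient model: probability that at least K share-holding nodes are reachable."""
    return at_least(K, N, node_up)


def loss_probability(share_lost: float) -> float:
    """Permanent model: probability that fewer than K shares survive."""
    return 1.0 - at_least(K, N, 1.0 - share_lost)


print(f"read availability at 95% node uptime: {read_availability(0.95):.9f}")
print(f"loss probability at 5% share loss:    {loss_probability(0.05):.3e}")
```

Write availability could be modeled roughly the same way. Instead of K, it would use the number of reachable servers required by the upload's servers-of-happiness setting (`shares.happy`, default 7).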
If everyone wants to pitch in and help with defining the structure and
content of the models, I've worked out some nice mathematical tools for
translating those models into comprehensive probability estimates.
--
Shawn.