[tahoe-dev] Grid Design Feedback

Mon Jun 27 12:17:27 PDT 2011

On Mon, Jun 27, 2011 at 10:44 AM, Nathan Eisenberg
<nathan at atlasnetworks.us> wrote:

> Expansion factor - again, I may be stuck in regular-filesystem-land, but
> simple replication (for files) -feels- better to me.
>

This is your design, so what feels right to you is what matters, but simple
replication gets you a lot less increase in reliability than broader
distribution.
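
To put rough numbers on that comparison, here is a minimal sketch. It assumes
each server is up independently with probability 0.9 (an illustrative figure,
not a measurement from any real grid) and compares 3 plain copies (K=1, N=3)
against 3-of-10 encoding, which costs only slightly more storage. The function
name and parameters are made up for this example.

```python
from math import comb

def p_file_unavailable(k, n, p_up):
    """Probability that fewer than k of a file's n shares are reachable,
    assuming each share sits on its own server and each server is up
    independently with probability p_up (a simplifying assumption)."""
    p_down = 1.0 - p_up
    return sum(comb(n, i) * p_up**i * p_down**(n - i) for i in range(k))

P_UP = 0.9  # assumed per-server availability, for illustration only

replicated = p_file_unavailable(1, 3, P_UP)   # 3 full copies, expansion 3.0
dispersed = p_file_unavailable(3, 10, P_UP)   # 3-of-10, expansion ~3.33

print(f"3 copies: P(unavailable) = {replicated:.3e}")
print(f"3-of-10:  P(unavailable) = {dispersed:.3e}")
```

Under these assumptions the 3-of-10 file is unavailable several orders of
magnitude less often than the replicated one, for about 11% more storage.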

> One thing I have been thinking about in relation to the AMD issue.  Since
> directories are so important, why not handle them differently than files?
>  For example, would it really be too expensive to store all the directory
> shares on all the storage nodes in a K=1, N=$gridsize$ manner?  It just
> seems that this is basic filesystem metadata that should be MORE resilient
> than the files themselves.  They're tiny, so who -really- cares if they're
> stored more diversely than files?

I don't disagree, and I think there's an outstanding bug in Trac to do
something like that, but I think the issue is a little deeper than just
dirnodes.

Suppose allmydata had always replicated every dirnode to every storage node.
That would have ensured that the directories were available, but allmydata
would still have had the *same* problem with individual files that were
distributed in the normal way.  If eight of 200 servers are down, and there
are enough files stored in the grid, there will be some files that have
shares on those eight servers, and those files will be unavailable.

From a statistical perspective, the issue is that if you've assured 99.999%
reliability for any given file, but you have 100,000 files, the probability
that ALL of them are available is only about 37%.  If you have a million
files or more, the probability is basically zero.  Note that those
calculations assume that files live or die independently, which isn't true in
a small grid, but when you scale up to hundreds of nodes it becomes close
enough.
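
The arithmetic behind those figures can be checked with a short sketch. It
uses the same independence assumption as above, so the probability that every
file is available is just the per-file reliability raised to the number of
files. The function name is made up for this example.

```python
def p_all_available(per_file_reliability, num_files):
    """Probability that every one of num_files files is available,
    assuming files succeed or fail independently of each other."""
    return per_file_reliability ** num_files

PER_FILE = 0.99999  # the 99.999% per-file reliability from the text

for num_files in (100_000, 1_000_000):
    p = p_all_available(PER_FILE, num_files)
    print(f"{num_files:>9,} files: P(all available) = {p:.6f}")
```

For 100,000 files this comes out close to 1/e, which is where the 37% figure
comes from.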

The fact that dirnodes are so crucial amplified the problem at allmydata
dramatically, but the fundamental problem would have existed regardless.
Some users' files would have been unavailable.

Without getting into the details, I think the real solution is more
dispersion, putting shares on as many nodes as possible.

-- 
Shawn