<div class="gmail_quote">On Mon, Jun 27, 2011 at 10:44 AM, Nathan Eisenberg <span dir="ltr"><<a href="mailto:nathan@atlasnetworks.us">nathan@atlasnetworks.us</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im">Expansion factor - again, I may be stuck in regular-filesystem-land, but simple replication (for files) -feels- better to me.</div></blockquote><div><br></div><div>This is your design, so what feels right to you is what matters, but for a given expansion factor, simple replication buys you much less reliability than broader distribution does.</div>
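<div><br></div><div>A rough sketch of that comparison, under assumptions that are not from this thread: every share sits on a distinct server, servers fail independently, and each is down with the same probability (0.1 here, picked only for illustration). The function name and the 1-of-3 versus 3-of-9 parameters are hypothetical; both encodings have an expansion factor of 3.</div>

```python
from math import comb

def file_loss_probability(k, n, p_server_down):
    """Probability that fewer than k of n shares are reachable.

    Assumes one share per server and independent server failures,
    each with probability p_server_down (a simplifying assumption).
    """
    p_up = 1.0 - p_server_down
    # Sum the binomial probabilities of having 0 .. k-1 shares alive.
    return sum(
        comb(n, alive) * p_up**alive * p_server_down**(n - alive)
        for alive in range(k)
    )

# Same 3x expansion factor, different dispersion (illustrative values).
replication = file_loss_probability(k=1, n=3, p_server_down=0.1)
dispersed = file_loss_probability(k=3, n=9, p_server_down=0.1)

print(f"1-of-3 replication: {replication:.2e}")
print(f"3-of-9 dispersion:  {dispersed:.2e}")
```

<div>With these made-up numbers the replicated file is lost with probability 1e-3, while the dispersed one is lost with probability of roughly 3e-6, for the same storage cost.</div>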

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">One thing I have been thinking about in relation to the AMD issue.  Since directories are so important, why not handle them differently than files?  For example, would it really be too expensive to store all the directory shares on all the storage nodes in a K=1, N=$gridsize$ manner?  It just seems that this is basic filesystem metadata that should be MORE resilient than the files themselves.  They're tiny, so who -really- cares if they're stored more diversely than files?</blockquote>

<div><br></div><div>I don't disagree, and I think there's an outstanding bug in Trac to do something like that, but I think the issue is a little deeper than just dirnodes.</div><div><br></div><div>Suppose allmydata had always replicated every dirnode to every storage node.  That would have ensured that the directories were available, but allmydata would still have had the *same* problem with individual files that were distributed in the normal way.  If eight of 200 servers are down, and there are enough files stored in the grid, there will be some files that have too many of their shares on those eight servers, and those files will be unavailable.</div>

<div><br></div><div>From a statistical perspective, the issue is that if you've assured 99.999% reliability for any given file, but you have 100,000 files, the probability that ALL of them are available is only about 37% (0.99999^100000 is roughly 1/e).  If you have a million files or more, the probability is basically zero.  Note that those calculations assume that files live or die independently, which isn't true in a small grid, but when you scale up to hundreds of nodes it becomes close enough.</div>
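<div><br></div><div>The arithmetic above is easy to check. This is a minimal sketch that takes the independence assumption at face value; the function name is made up for illustration.</div>

```python
def p_all_available(per_file_reliability, n_files):
    """Probability that every one of n_files is available at once,
    assuming each file lives or dies independently of the others."""
    return per_file_reliability ** n_files

p_100k = p_all_available(0.99999, 100_000)
p_1m = p_all_available(0.99999, 1_000_000)

print(f"100,000 files:   {p_100k:.4f}")
print(f"1,000,000 files: {p_1m:.2e}")
```

<div>The first value comes out near 0.37 and the second below one in ten thousand, which is why per-file reliability that looks excellent still leaves some files unavailable once the grid holds enough of them.</div>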

<div><br></div><div>The fact that dirnodes are so crucial amplified the problem at allmydata dramatically, but the fundamental problem would have existed regardless.  Some users' files would have been unavailable.</div>

<div><br></div><div>Without getting into the details, I think the real solution is more dispersion, putting shares on as many nodes as possible.</div><div><br></div></div>-- <br>Shawn<br>