[tahoe-dev] [tahoe-lafs] #1252: use different encoding parameters for dirnodes than for files

Thu Jan 6 19:17:32 UTC 2011

#1252: use different encoding parameters for dirnodes than for files
-------------------------------+--------------------------------------------
     Reporter:  davidsarah     |       Owner:  davidsarah                                        
         Type:  defect         |      Status:  assigned                                          
     Priority:  major          |   Milestone:  1.9.0                                             
    Component:  code-frontend  |     Version:  1.8.0                                             
   Resolution:                 |    Keywords:  preservation availability dirnodes anti-censorship
Launchpad Bug:                 |  
-------------------------------+--------------------------------------------

Comment (by warner):

 I guess I'm +0 on the general idea of making dirnodes more robust than the
 default, and -0 about the implementation/configuration complexity
 involved.

 If you have a deep directory tree, and the only path from a rootcap to a
 filenode is through 10 subdirectories, then your chances of recovering the
 file are P(recover_dirnode)^10^ * P(recover_filenode). We provision things
 to make sure that P(recover_node) is extremely high, but that x^10^ is a
 big factor, so making P(recover_dirnode) even higher isn't a bad idea.
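
 As a rough illustration of that formula, here is a minimal sketch that
 assumes every node on the path fails independently. The function name and
 the probabilities plugged in are hypothetical, not measurements from any
 grid.

```python
def path_recovery_probability(p_dirnode, p_filenode, depth):
    """Chance of reaching a file through `depth` dirnodes.

    Assumes (hypothetically) that each node is recovered independently
    with the given per-node probability.
    """
    return (p_dirnode ** depth) * p_filenode


if __name__ == "__main__":
    # Made-up per-node probabilities, only to show how depth compounds.
    for p_dir in (0.999, 0.9999, 0.999999):
        p = path_recovery_probability(p_dir, p_filenode=0.999, depth=10)
        print(f"P(recover_dirnode)={p_dir}: P(recover file)={p:.6f}")
```

 Raising only the dirnode probability moves the result toward
 P(recover_filenode), which is the most this change could buy.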

 But I agree that it's a pretty vague heuristic, and it'd be nicer to have
 something less uncertain, or at least some data to work from. I'd bet that
 most people retain a small number of rootcaps and use them to access a
 much larger number of files, and that making dirnodes more reliable (at
 the cost of more storage space) would be a good thing for 95% of the use
 cases. (Note that folks who keep track of individual filecaps directly,
 like a big database or something, would not see more storage space
 consumed by this change.)

 On the "data to work from" front, it might be interesting if
 {{{tahoe deep-stats}}} built a histogram of node-depth (i.e. number of
 dirnodes traversed, from the root, for each file). With the exception of
 multiply-linked nodes and additional external rootcaps, this might give us
 a better notion of how much dirnode reliability affects filenode
 reachability.
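
 The kind of histogram meant here could be sketched as follows. This is
 not how {{{tahoe deep-stats}}} works: it walks a hypothetical in-memory
 tree in which a dict stands for a dirnode and any other value stands for a
 filenode. Depth is assumed to count the root dirnode, so a file directly
 under the root has depth 1. A node linked from several places is counted
 once per path.

```python
from collections import Counter


def depth_histogram(tree):
    """Map depth -> number of files found at that depth.

    `tree` is a nested dict: dict values are directories, anything else
    is a file. Depth is the number of directories traversed from the
    root (inclusive) to reach the file.
    """
    hist = Counter()
    stack = [(tree, 1)]  # (directory, depth of files directly inside it)
    while stack:
        directory, depth = stack.pop()
        for child in directory.values():
            if isinstance(child, dict):
                stack.append((child, depth + 1))
            else:
                hist[depth] += 1
    return hist


if __name__ == "__main__":
    # Hypothetical directory tree, for illustration only.
    example = {
        "notes.txt": None,
        "photos": {"a.jpg": None, "b.jpg": None, "2010": {"c.jpg": None}},
    }
    for depth, count in sorted(depth_histogram(example).items()):
        print(f"depth {depth}: {count} file(s)")
```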

 I'll also throw in a +0 for Zooko's deeper message, which perhaps he
 didn't state explicitly this particular time, which is that our
 P(recover_node) probability is already above the
 it-makes-sense-to-think-about-it-further threshold: the notion that
 unmodeled real-world failures are way more likely than the
 nice-clean-(artificial) modeled
 all-servers-randomly-independently-fail-simultaneously failures. Once your
 P(failure) drops below 10^-5^ or something, any further modeling is just
 an act of self-indulgent mathematics.
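
 For a sense of what the clean model produces, here is a minimal sketch
 under stated assumptions: k-of-N erasure coding (Tahoe's default is
 3-of-10), one share per server, and each share surviving independently
 with the same probability. The per-share survival figures are invented,
 and the independence assumption is exactly the artificial part.

```python
from math import comb


def p_recover(k, n, p_share):
    """P(at least k of n shares survive), shares independent.

    This is the binomial tail: sum over i = k..n of
    C(n, i) * p_share**i * (1 - p_share)**(n - i).
    """
    return sum(
        comb(n, i) * p_share ** i * (1 - p_share) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-share survival probabilities.
    for p_share in (0.8, 0.9, 0.99):
        p_fail = 1 - p_recover(3, 10, p_share)
        print(f"p_share={p_share}: P(failure)={p_fail:.3e}")
```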

 I go back and forth on this: it feels like a good exercise to do the math
 and build a system with a theoretical failure probability low enough that
 we don't need to worry about it, and to keep paying attention to that
 theoretical number when we make design changes (e.g. we use segmentation
 instead of chunking because the math says that chunking is highly likely
 to fail). It's nice to be able to say that, if you have 20 servers with
 Poisson failure rates X and repair with frequency Y, then your files will
 have Poisson durability Z (where Z is really good). But it's also
 important to remind the listener that you'll never really achieve Z
 because something outside the model will happen first: somebody will pour
 coffee into your only copy of ~/.tahoe/private/aliases, put a backhoe into
 the DSL line that connects you to the whole grid, or introduce a software
 bug into all your storage servers at the same time.

 (incidentally, this is one of the big reasons I'd like to move us to a
 simpler storage protocol: it would allow multiple implementations of the
 storage server, in different languages, improving diversity and reducing
 the chance of simultaneous non-independent failures).

 So anyways, yeah, I still think reinforcing dirnodes might be a good idea,
 but I have no idea how good, or how much extra expansion is appropriate,
 so I'm content to put it off for a while yet. Maybe 1.9.0, but I'd
 prioritize it lower than most of the other 1.9.0-milestone projects I can
 think of.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1252#comment:5>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage