[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

Wed Aug 12 08:22:00 PDT 2009

#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
 Reporter:  zooko               |           Owner:           
     Type:  defect              |          Status:  new      
 Priority:  critical            |       Milestone:  undecided
Component:  code-peerselection  |         Version:  1.4.1    
 Keywords:  reliability         |   Launchpad_bug:           
--------------------------------+-------------------------------------------

Comment(by swillden):

 I have an alternative idea which I described on the mailing list:

 http://allmydata.org/pipermail/tahoe-dev/2009-August/002605.html

 In my view, "shares of happiness" and "servers of happiness" are both the
 wrong measure, though servers is less wrong than shares.

 I think the right view is "probability of survival of happiness".  Given
 some assumptions about server reliability, we can calculate for any given
 distribution of shares what the probability of file survival is over a
 given interval.  That means that we can also calculate (perhaps directly,
 or perhaps through a quick numerical search) the number of shares we need
 to distribute in order to achieve a given level of reliability.
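
 To make the calculation concrete, here is a minimal sketch. It assumes
 that servers fail independently, that a server's shares all survive or
 all vanish together, and that each server has a single estimated
 probability of surviving the interval. The function name and the
 (reliability, share_count) placement format are illustrative only; they
 are not existing Tahoe code.

```python
def file_survival_probability(placement, k):
    """Probability that at least k shares survive the interval.

    placement: list of (server_reliability, shares_on_server) pairs.
    Assumes independent server failures (an assumption of this sketch).
    """
    # dist[j] = probability that exactly j shares survive among the
    # servers processed so far.
    dist = [1.0]
    for p, n in placement:
        nxt = [0.0] * (len(dist) + n)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # server lost: none of its n shares survive
            nxt[j + n] += q * p       # server survives: all n shares survive
        dist = nxt
    return sum(dist[k:])


# Ten shares with k = 3, on servers that each survive with probability 0.9:
all_on_one = file_survival_probability([(0.9, 10)], 3)
spread_out = file_survival_probability([(0.9, 1)] * 10, 3)
```

 With all ten shares on one server the file is only as reliable as that
 server (0.9); spread over ten servers the same shares give better than
 0.999999, which is the difference that counting shares alone cannot see.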

 This would fix the original problem of this thread:  If it is impossible
 to distribute enough shares to enough servers to achieve the required
 reliability, then the upload would fail.  Ultimately, I think it's a far
 more useful and flexible approach to the issue.  A future Tahoe
 incarnation that tracks statistics on the availability of peers could
 estimate their reliability individually, and generate and distribute
 additional shares to attain the required file reliability.  Heck, given
 aggregate data from a large enough number of grids, we might be able to
 estimate reliabilities for given hardware/OS configurations to feed into
 the mix. A share on a Debian stable machine with RAID-6 storage and 100
 days of uptime is worth more than a share on a laptop running a copy of
 Vista which was re-installed last week.  Actually implementing the code
 required to gather all of that sort of data and usefully synthesize it
 would be a significant project all on its own, but the point is that it
 would fit within the reliability-based framework, as would anything else
 we could dream up to make reliability estimates more accurate.

 The main reason this is better, though, is that even if the file
 reliability is computed from estimates of somewhat questionable value, it
 still gives the end user a better way to understand and specify the
 reliability they want than "servers/shares of happiness" plus fixed FEC
 parameters.

 If this is of interest, I will be happy to write the code to compute M
 (the number of shares to distribute) given K (the number of shares needed
 to recover the file), r (the desired reliability threshold) and estimates
 of server reliability.
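
 As a rough sketch of what such a computation could look like (not the
 proposed implementation): the version below assumes independent server
 failures, at most one share per server, and that shares go to the most
 reliable servers first. The function name and the choice to return None
 when the target cannot be met are assumptions made for illustration.

```python
def shares_needed(k, r, server_reliabilities):
    """Smallest M such that P(at least k of M shares survive) >= r.

    Places one share per server, most reliable servers first, and returns
    None if even using every server cannot reach r (the case in which the
    upload would fail).
    """
    ranked = sorted(server_reliabilities, reverse=True)
    # dist[j] = probability that exactly j of the shares placed so far survive.
    dist = [1.0]
    for m, p in enumerate(ranked, start=1):
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # this server is lost
            nxt[j + 1] += q * p       # this server survives
        dist = nxt
        if m >= k and sum(dist[k:]) >= r:
            return m
    return None


# K = 3, target reliability 0.999, ten servers each estimated at 0.9:
m = shares_needed(3, 0.999, [0.9] * 10)
# Same target with only four such servers available:
too_few = shares_needed(3, 0.999, [0.9] * 4)
```

 Here m comes out as 7, and too_few is None because four servers at 0.9
 cannot reach 0.999 with K = 3. A linear scan is enough in this sketch
 because adding a share on one more server never lowers the survival
 probability.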

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:5>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid