[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

tahoe-lafs trac at allmydata.org
Sun Aug 16 23:43:40 PDT 2009


#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
 Reporter:  zooko               |           Owner:           
     Type:  defect              |          Status:  new      
 Priority:  critical            |       Milestone:  undecided
Component:  code-peerselection  |         Version:  1.4.1    
 Keywords:  reliability         |   Launchpad_bug:           
--------------------------------+-------------------------------------------

Comment(by kevan):

 A summary of the discussion, for those of you following along at home
 (zooko, feel free to add to this if you think I've missed something):

 ==== The problem ====

 Tahoe-LAFS will store a file on a grid in an unreliable way (specifically,
 at least for this bug report, uploading everything associated with a file
 {{{f}}} to only one storage node) without reporting anything to the user.

 ==== The solution ====

 We will change {{{shares.happy}}} in {{{tahoe.cfg}}} to mean
 {{{servers_of_happiness}}}.

 {{{servers_of_happiness}}} means two things:

   1. If a file upload is successful, then shares for that file have gone to
   at least {{{servers_of_happiness}}} distinct storage nodes.
   1. If a file upload is successful, then the uploaded file can be recovered
   even if only {{{servers_of_happiness}}} of the storage nodes uploaded to
   in the initial upload remain functioning.

 Both of these conditions are necessary to solve metacrob's use case.

 He should be able to tell Tahoe-LAFS that he does not consider an upload
 successful unless shares from that upload were distributed across at least
 {{{n}}} servers ({{{n}}} > 1 for the bug report, but any {{{n}}} in
 general): the first condition addresses this.

 This is not enough to solve metacrob's use case, though -- if he has
 uploaded shares from a file to 5 servers, but cannot recover that file
 unless one particular server is online and working, then he is no better
 off when that server fails than he would be if that server held every
 share of his file. The second condition addresses this.

 If we remove the first condition, then {{{servers_of_happiness}}} is
 satisfied if the file is uploaded entirely to only one server (since 1 <
 "servers of happiness"): clearly, this is undesirable -- indeed, it is the
 exact behavior mentioned in the bug report.
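
 The two conditions can be sketched as checks over a share placement. This
 is only an illustration, not Tahoe-LAFS code: the function names, the
 dictionary layout (server name mapped to the set of share numbers it
 holds) and the example placements are all hypothetical, and the second
 check reads "recoverable" as "every group of {{{servers_of_happiness}}}
 used nodes holds at least {{{k}}} distinct shares".

```python
from itertools import combinations

def first_condition(placement, happy):
    """Shares went to at least `happy` distinct storage nodes."""
    used = [server for server, shares in placement.items() if shares]
    return len(used) >= happy

def second_condition(placement, happy, k):
    """Every group of `happy` used nodes holds at least k distinct shares."""
    used = [shares for shares in placement.values() if shares]
    return all(len(set().union(*group)) >= k
               for group in combinations(used, happy))

k, happy = 3, 3  # illustrative values only

# Everything on one node: the situation from the bug report.
one_node = {"A": {0, 1, 2, 3, 4}}
# Five nodes, but B-E all hold a copy of the same share, so only
# groups that include A can recover the file.
hub = {"A": {0, 1, 2}, "B": {0}, "C": {0}, "D": {0}, "E": {0}}

one_node_result = (first_condition(one_node, happy),
                   second_condition(one_node, happy, k))
hub_result = (first_condition(hub, happy),
              second_condition(hub, happy, k))
print(one_node_result)  # (False, True)
print(hub_result)       # (True, False)
```

 In this reading the single-node upload passes the second check only
 vacuously (there is no group of three used nodes to test), which is why
 the first check is needed as well; the five-node placement shows the
 opposite failure.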

 ==== Implementation issues ====

 Supporting this in Tahoe-LAFS is fairly trivial if
 {{{servers_of_happiness}}} is greater than or equal to {{{k}}}, the number
 of distinct shares generated from a file {{{f}}} necessary to recover
 {{{f}}}: the first condition ({{{servers_of_happiness}}} distinct servers
 having a distinct share of a file {{{f}}}) implies the second
 ({{{servers_of_happiness}}} distinct servers being enough to reconstruct
 {{{f}}}), because no more than {{{servers_of_happiness}}} distinct pieces
 of {{{f}}} are necessary to reconstruct {{{f}}}.

 Supporting {{{servers_of_happiness}}} values less than {{{k}}} is harder
 -- the first condition no longer implies the second. To see why this is,
 consider uploading a file {{{f}}} onto a grid of 10 well-behaved storage
 nodes with encoding parameters ({{{happy=2, k=3, m=10}}}). Suppose that
 each storage node accepts one share. Then each pair of storage nodes has
 only two distinct shares between them -- not enough to reconstruct
 {{{f}}}.
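
 The example above can be checked by brute force. The sketch below is an
 illustration only; the helper name {{{fewest_shares}}} and the node names
 are made up here. It computes the smallest number of distinct shares held
 by any group of nodes of a given size, for the one-share-per-node layout
 just described.

```python
from itertools import combinations

def fewest_shares(placement, group_size):
    """Fewest distinct shares held by any group of `group_size` nodes."""
    return min(len(set().union(*group))
               for group in combinations(placement.values(), group_size))

k, m = 3, 10

# Ten well-behaved nodes, each accepting exactly one of the m shares.
placement = {f"node{i}": {i} for i in range(m)}

worst_pair = fewest_shares(placement, 2)   # happy=2, below k
worst_seven = fewest_shares(placement, 7)  # happy=7, at least k
print(worst_pair, worst_pair >= k)    # 2 False
print(worst_seven, worst_seven >= k)  # 7 True
```

 With one distinct share per node, a group of {{{happy}}} nodes holds
 exactly {{{happy}}} shares, so the check passes precisely when
 {{{happy}}} is at least {{{k}}}.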

 We could support these values if we ensured that some servers had more
 than one share in such a way as to ensure that the cardinality of the
 union of the sets of shares held by any {{{servers_of_happiness}}} servers
 is at least {{{k}}}, but this is tricky, and, for the moment, beyond the
 scope of this ticket. As a stop-gap, Tahoe-LAFS will fail with an error if
 asked to upload a file when a user has {{{servers_of_happiness}}} set to a
 value less than {{{k}}}.
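
 As a rough sketch of both halves of this paragraph: the first part below
 builds one possible multi-share layout (each node holds two neighbouring
 shares) in which any two nodes hold at least three distinct shares
 between them, and the second part shows the kind of up-front check the
 stop-gap implies. The layout, the function name {{{check_happiness}}} and
 the use of {{{ValueError}}} are assumptions made for illustration; the
 ticket does not say how the error is raised or worded.

```python
from itertools import combinations

k, happy, m = 3, 2, 10

# Hypothetical layout: node i holds shares i and i+1 (wrapping around),
# so two different nodes have at most one share in common.
ring = {i: {i, (i + 1) % m} for i in range(m)}
worst = min(len(a | b) for a, b in combinations(ring.values(), happy))
print(worst, worst >= k)  # 3 True

def check_happiness(k, happy):
    """Stop-gap check: refuse to upload when happy is less than k."""
    if happy < k:
        raise ValueError(
            f"servers_of_happiness ({happy}) must be at least k ({k})")

try:
    check_happiness(k, happy)
    rejected = False
except ValueError as err:
    rejected = True
    print("upload refused:", err)
```

 Finding such a layout for arbitrary parameters and grids is the tricky
 part left out of scope here; the stop-gap check sidesteps it by rejecting
 those parameters outright.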

 The proposed default encoding parameters for {{{servers_of_happiness}}}
 are ({{{k=3, happy=7, m=10}}}). One consequence of these defaults and the
 stop-gap described above is that users of small grids (where there are one
 or two storage nodes) will by default not be able to upload files unless
 they lower {{{happy}}}, and therefore {{{k}}}, to 2 or 1. If bug reports
 surface about this decision, we'll revisit it.
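
 For orientation, the proposed defaults might look like this in
 {{{tahoe.cfg}}}. Only {{{shares.happy}}} is named in this ticket; the
 {{{[client]}}} section and the {{{shares.needed}}} and {{{shares.total}}}
 keys for {{{k}}} and {{{m}}} are assumed here, so check them against the
 configuration documentation for your release.

```ini
[client]
# k: number of distinct shares needed to recover a file
shares.needed = 3
# servers_of_happiness, under the meaning proposed in this ticket
shares.happy = 7
# m: total number of shares generated per file
shares.total = 10
```

 On a grid with only one or two storage nodes, both {{{shares.happy}}} and
 {{{shares.needed}}} would have to be lowered to at most the number of
 nodes for uploads to succeed under the stop-gap.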

 (I'm hoping that there aren't any more points to discuss in there -- my
 goal in writing it was to summarize the discussion so that I know what I
 need to do in fixing this ticket)

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:19>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

