[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better
tahoe-lafs
trac at allmydata.org
Sun Aug 16 23:43:40 PDT 2009
#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
Reporter: zooko | Owner:
Type: defect | Status: new
Priority: critical | Milestone: undecided
Component: code-peerselection | Version: 1.4.1
Keywords: reliability | Launchpad_bug:
--------------------------------+-------------------------------------------
Comment(by kevan):
A summary of the discussion, for those of you following along at home
(zooko, feel free to add to this if you think I've missed something):
==== The problem ====
Tahoe-LAFS can store a file on a grid in an unreliable way (in the case
from this bug report, by uploading everything associated with a file
{{{f}}} to only one storage node) without reporting anything to the user.
==== The solution ====
We will change {{{shares.happy}}} in {{{tahoe.cfg}}} to mean
{{{servers_of_happiness}}}.
{{{servers_of_happiness}}} means two things:
1. If a file upload is successful, then shares for that file have gone to
at least {{{servers_of_happiness}}} distinct storage nodes.
1. If a file upload is successful, then the uploaded file can be recovered
even if only {{{servers_of_happiness}}} of the storage nodes uploaded to in
the initial upload remain functioning.
Both of these conditions are necessary to solve metacrob's use case.
He should be able to tell Tahoe-LAFS that he does not consider an upload
successful unless shares from that upload were distributed across at least
{{{n}}} servers ({{{n}}} > 1 for the bug report, but any {{{n}}} in
general): the first condition addresses this.
This is not enough to solve metacrob's use case, though: if he has
uploaded shares of a file to 5 servers, but cannot recover that file unless
one particular server is online and working, then he is no better off when
that server fails than he would be if that server held every share of his
file. The second condition addresses this.
If we remove the first condition, then {{{servers_of_happiness}}} is
satisfied if the file is uploaded entirely to only one server (since
1 < {{{servers_of_happiness}}}). Clearly, this is undesirable; indeed, it
is the exact behavior mentioned in the bug report.
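
As a minimal sketch of the two conditions (this is not Tahoe-LAFS code; the
{{{placement}}} mapping and both function names are hypothetical, and the
second check is one reading of the second condition, namely that any
{{{servers_of_happiness}}} of the servers used must hold at least {{{k}}}
distinct shares), an upload result can be modelled as a map from server id
to the set of share numbers stored there:

```python
from itertools import combinations

def spread_ok(placement, happy):
    """Condition 1: shares went to at least `happy` distinct servers."""
    used = [server for server, shares in placement.items() if shares]
    return len(used) >= happy

def recoverable_ok(placement, k, happy):
    """Condition 2 (one reading): any `happy` of the servers used
    hold at least `k` distinct shares between them."""
    used = [server for server, shares in placement.items() if shares]
    for group in combinations(used, happy):
        distinct = set().union(*(placement[server] for server in group))
        if len(distinct) < k:
            return False
    # If fewer than `happy` servers were used, there are no groups to
    # check and the condition holds vacuously.
    return True

# The situation from the bug report: every share on a single server.
one_server = {"A": set(range(10))}
print(spread_ok(one_server, happy=7))            # False
print(recoverable_ok(one_server, k=3, happy=7))  # True
```

Under these assumptions the single-server upload passes the second check
on its own, which is why the first condition cannot be dropped.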
==== Implementation issues ====
Supporting this in Tahoe-LAFS is fairly trivial if
{{{servers_of_happiness}}} is greater than or equal to {{{k}}}, the number
of distinct shares of a file {{{f}}} needed to recover {{{f}}}. In that
case the first condition ({{{servers_of_happiness}}} distinct servers each
having a distinct share of {{{f}}}) implies the second
({{{servers_of_happiness}}} distinct servers being enough to reconstruct
{{{f}}}), because no more than {{{servers_of_happiness}}} distinct shares
of {{{f}}} are necessary to reconstruct {{{f}}}.
Supporting {{{servers_of_happiness}}} values less than {{{k}}} is harder:
the first condition no longer implies the second. To see why, consider
uploading a file {{{f}}} onto a grid of 10 well-behaved storage nodes with
encoding parameters ({{{happy=2, k=3, m=10}}}). Suppose that each storage
node accepts one share. Then each pair of storage nodes has only two
distinct shares between them, which is not enough to reconstruct {{{f}}}.
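
The counterexample above can be checked directly. This is only an
illustration; the server names and the {{{placement}}} mapping are made up
for the example:

```python
from itertools import combinations

k, happy, m = 3, 2, 10

# One share per storage node: node i holds share number i.
placement = {f"server{i}": {i} for i in range(m)}

# Fewest distinct shares held by any pair (happy = 2) of nodes.
worst = min(
    len(placement[a] | placement[b])
    for a, b in combinations(placement, happy)
)
print(worst, worst >= k)  # 2 False
```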
We could support these values if we ensured that some servers had more
than one share, in such a way that the cardinality of the set union of the
shares held by any {{{servers_of_happiness}}} servers is at least {{{k}}},
but this is tricky and, for the moment, beyond the scope of this ticket.
As a stop-gap, Tahoe-LAFS will fail with an error if asked to upload a file
when a user has {{{servers_of_happiness}}} set to a value less than
{{{k}}}.
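
A rough sketch of both ideas in this paragraph, under stated assumptions:
{{{min_distinct_shares}}} and {{{check_happy}}} are hypothetical names, the
use of {{{ValueError}}} is an assumption (the ticket does not say which
error is raised), and the two-shares-per-server layout is just one
placement that happens to work for ({{{happy=2, k=3, m=10}}}):

```python
from itertools import combinations

def min_distinct_shares(placement, happy):
    """Fewest distinct shares held by any `happy` servers."""
    return min(
        len(set().union(*(placement[server] for server in group)))
        for group in combinations(placement, happy)
    )

def check_happy(k, happy):
    """Stop-gap: refuse to upload when happy < k."""
    if happy < k:
        raise ValueError(
            f"servers_of_happiness ({happy}) must be at least k ({k})"
        )

# Ten shares spread two per server over five servers.
doubled = {f"server{i}": {2 * i, 2 * i + 1} for i in range(5)}
print(min_distinct_shares(doubled, happy=2))  # 4

check_happy(k=3, happy=7)  # passes silently
```

With two shares per server, any two servers hold four distinct shares,
which is enough for {{{k=3}}}. Finding such a layout in general is the
tricky part that the stop-gap avoids.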
The proposed default encoding parameters for {{{servers_of_happiness}}}
are ({{{k=3, happy=7, m=10}}}). One consequence of these defaults and the
stop-gap described above is that users of small grids (where there are one
or two storage nodes) will by default not be able to upload files unless
they lower {{{happy}}} to fit their grid and change their {{{k}}} to 2 or
1. If bug reports surface about this decision, we'll revisit it.
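
To make the small-grid consequence concrete, here is a hedged sketch. The
function {{{usable_on_grid}}} is hypothetical; it only combines the
stop-gap ({{{happy}}} must be at least {{{k}}}) with the first condition
(there must be at least {{{happy}}} storage nodes to spread shares over):

```python
def usable_on_grid(num_servers, k, happy):
    """True if an upload could satisfy both the stop-gap and condition 1."""
    return happy >= k and num_servers >= happy

# Proposed defaults on a two-node grid: upload cannot succeed.
print(usable_on_grid(num_servers=2, k=3, happy=7))  # False

# Lowering happy alone trips the stop-gap (happy < k).
print(usable_on_grid(num_servers=2, k=3, happy=2))  # False

# Lowering both happy and k works.
print(usable_on_grid(num_servers=2, k=2, happy=2))  # True
```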
(I'm hoping that there aren't any more points to discuss in there -- my
goal in writing it was to summarize the discussion so that I know what I
need to do in fixing this ticket)
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:19>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid