[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better
tahoe-lafs
trac at allmydata.org
Thu Sep 10 03:20:09 PDT 2009
#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
Reporter: zooko | Owner: kevan
Type: defect | Status: assigned
Priority: critical | Milestone: 1.5.1
Component: code-peerselection | Version: 1.4.1
Keywords: reliability | Launchpad_bug:
--------------------------------+-------------------------------------------
Comment(by kevan):
(maybe writing up the problem in detail will help me to think of a
solution)
If I understand the code, Tahoe-LAFS (in
[source:src/allmydata/immutable/checker.py@4045#L287 checker.py]) defines
an immutable file node as healthy if all of the shares originally placed
onto the grid ({{{m}}}) are still available (for some definition of
available, depending on the verify flag), unhealthy if fewer than {{{m}}}
but at least {{{k}}} shares are still available, and unrecoverable if
fewer than {{{k}}} shares are still available.
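To make the three states concrete, here is a minimal sketch of that
classification. It is not the checker.py code: the function name
{{{classify_health}}} and the string labels are hypothetical, and it
assumes that a file with exactly {{{k}}} available shares counts as
recoverable (and therefore unhealthy rather than unrecoverable).

```python
def classify_health(available, k, m):
    """Classify an immutable file by its count of available shares.

    available: number of distinct shares currently available
    k: number of shares needed to recover the file
    m: number of shares originally placed on the grid
    (hypothetical helper; names and labels are illustrative only)
    """
    if available >= m:
        return "healthy"
    if available >= k:
        # Still recoverable, but shares are missing: a repair candidate.
        return "unhealthy"
    return "unrecoverable"


if __name__ == "__main__":
    for count in (10, 5, 3, 2):
        print(count, classify_health(count, k=3, m=10))
```

Only the middle state is interesting for repair: a healthy file needs no
repair, and an unrecoverable one can't be repaired.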
In [source:src/allmydata/interfaces.py@4045#L1628], ICheckable defines the
method {{{check_and_repair}}}, which tells the receiving object to check
itself and, if necessary, attempt a repair. This interface and method are
implemented by [source:src/allmydata/immutable/filenode.py@4045#L181
FileNode], which represents an immutable file node on the Tahoe-LAFS grid.
The check and repair process proceeds something like this (again, if I
understand the logic):
1. A Checker is instantiated and started on the verifycap of the
FileNode.
2. If the results of the Checker indicate that the FileNode is in need
of repair, and that the FileNode can be repaired, a Repairer is
instantiated and started.
3. The results of the repair operation are reported back to the caller.
(I know there's a bit of hand waving in there, but hopefully I got the
gist of it)
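The three steps above can be sketched roughly as follows. This is a
simplified, synchronous illustration and not the FileNode code: the
result classes and the {{{run_checker}}} / {{{run_repairer}}} callables
are hypothetical stand-ins for the real Checker and Repairer, and any
asynchrony in the real code is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResults:
    # Hypothetical summary of what a Checker reports.
    healthy: bool
    recoverable: bool


@dataclass
class CheckAndRepairResults:
    # Hypothetical container for what is reported back to the caller.
    pre_repair: CheckResults
    repair_attempted: bool = False
    repair_successful: Optional[bool] = None


def check_and_repair(verifycap, run_checker, run_repairer):
    # Step 1: run a checker against the verifycap.
    pre = run_checker(verifycap)
    results = CheckAndRepairResults(pre_repair=pre)

    # Step 2: repair only if the file needs it and can be repaired.
    if not pre.healthy and pre.recoverable:
        results.repair_attempted = True
        results.repair_successful = run_repairer(verifycap)

    # Step 3: report back to the caller.
    return results
```

For example, calling it with a checker that returns
{{{CheckResults(healthy=False, recoverable=True)}}} triggers the repairer,
while a healthy or an unrecoverable result skips it.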
The repairer ([source:src/allmydata/immutable/repairer.py@4045#L14]) is
pretty simple: it downloads the content associated with the FileNode in
the normal way using a DownUpConnector as a target, and then uploads the
DownUpConnector in the normal way. Since
[source:src/allmydata/immutable/repairer.py@4045#L87 DownUpConnector]
implements [source:src/allmydata/interfaces.py@4045#L1400
IEncryptedUploadable], it is responsible for providing the encoding and
uploading operations with encoding parameters, including
{{{servers_of_happiness}}}.
The problem that this long-winded comment is getting to is here:
[source:src/allmydata/immutable/download.py@4048#L869]. The
CiphertextDownloader sets the {{{happy}}} encoding parameter of its target
to be {{{k}}}. Since {{{k}}} can be bigger than
{{{servers_of_happiness}}}, this isn't good. In most cases, the code
comment accompanying that line is right; in the case of a file repair, it
isn't, because the encoding parameters stored in the DownUpConnector are
used by the encoding + upload process.
I think that the CiphertextDownloader should ideally be following the
user's configured {{{happy}}} value instead of setting it to something
else. Where I'm stuck is in figuring out a way to tell it what {{{happy}}}
should be. Some ideas:
* Parse the configuration file: this is straightforward, but ugly,
because it duplicates the configuration file parsing code, and duplicates
it in a part of the program that doesn't really have anything to do with
parsing the configuration file.
* Pass it as a parameter to the Repairer, which then passes it as a
parameter to CiphertextDownloader, which then uses it: but I don't see
where in FileNode I'd get {{{happy}}}.
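As a rough illustration of the second idea, the sketch below threads
{{{happy}}} from a repairer down to a downloader. The classes
{{{SketchRepairer}}} and {{{SketchDownloader}}} are hypothetical
stand-ins, not the real Repairer and CiphertextDownloader, and their
signatures are assumptions. It also leaves open the question of where
FileNode would get {{{happy}}} in the first place.

```python
class SketchDownloader:
    """Hypothetical stand-in for a downloader that sets encoding parameters."""

    def __init__(self, k, happy=None):
        self.k = k
        # Fall back to k when the caller gives no value, which mirrors
        # the current behaviour described above for ordinary downloads.
        self.happy = k if happy is None else happy

    def encoding_parameters(self):
        return {"k": self.k, "happy": self.happy}


class SketchRepairer:
    """Hypothetical stand-in for a repairer that forwards happy."""

    def __init__(self, k, happy):
        self.k = k
        self.happy = happy

    def start(self):
        # Pass the user's configured happy value through to the downloader
        # instead of letting it default to k.
        downloader = SketchDownloader(self.k, happy=self.happy)
        return downloader.encoding_parameters()


if __name__ == "__main__":
    print(SketchRepairer(k=3, happy=7).start())
```

Making the parameter optional would keep ordinary downloads unchanged;
only the repair path would need to supply a value.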
In any case, that's basically the one stumbling block that I'm aware of in
this ticket.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:45>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid