[tahoe-dev] How Tahoe-LAFS fails to scale up and how to fix it (Re: Starvation amidst plenty)

Zooko O'Whielacronx zooko at zooko.com
Thu Oct 7 07:01:44 UTC 2010


On Fri, Sep 24, 2010 at 5:38 PM, Greg Troxel <gdt at ir.bbn.com> wrote:
>
> There's a semi-related reliability issue, which is that a grid of N
> servers which are each available most of the time should allow a user to
> store, check and repair without a lot of churn.

Isn't the trouble here that repair increments the version number
instead of just regenerating missing shares of the same version
number? And in fact, are you sure that is happening? I haven't checked
the source, manually tested it, or looked for an automated test of
that.
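
To make the distinction concrete, here is a toy sketch (purely
illustrative, not Tahoe-LAFS code; a trivial 1-of-n replication "code"
stands in for the real k-of-n erasure coding, and shares are modelled as
shnum -> (seqnum, data)) of the difference between a minimal-bandwidth
repair and a repair that republishes a whole new version:

    # Illustrative only -- not Tahoe-LAFS code.

    def repair_minimal_bandwidth(shares, n):
        """Regenerate only the missing shares, at the existing seqnum."""
        seqnum, data = next(iter(shares.values()))  # any one share suffices here
        for shnum in set(range(n)) - set(shares.keys()):
            # Surviving shares are left untouched and keep their seqnum;
            # only the holes are filled in.
            shares[shnum] = (seqnum, data)
        return shares

    def repair_by_republishing(shares, n):
        """What the current repairer appears to do: write a new version."""
        seqnum, data = next(iter(shares.values()))
        return dict((shnum, (seqnum + 1, data)) for shnum in range(n))

The first kind of repair uploads only the missing shares and leaves the
rest of the grid alone; the second rewrites every share and bumps the
seqnum, which is exactly the churn Greg is talking about.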

Hm, let's see: test_repairer.py [1] tests only the immutable repairer.
Where are the tests of the mutable repairer? Aha:

http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/test/test_mutable.py?annotate=blame&rev=4657#L1286

Look at this nice comment:

            # TODO: this really shouldn't change anything. When we implement
            # a "minimal-bandwidth" repairer, change this test to assert:
            #self.failUnlessEqual(new_shares, initial_shares)
            # all shares should be in the same place as before
            self.failUnlessEqual(set(initial_shares.keys()),
                                 set(new_shares.keys()))
            # but they should all be at a newer seqnum. The IV will be
            # different, so the roothash will be too.

Okay, so it sounds like someone (Greg) ought to uncomment that
self.failUnlessEqual() and change the following lines so that the test
fails if the seqnum is different. :-)
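
Concretely, something like this (a sketch only, reusing the names from
the quoted test and following its own suggestion; I haven't run it):

            # all shares should be in the same place as before ...
            self.failUnlessEqual(set(initial_shares.keys()),
                                 set(new_shares.keys()))
            # ... and a minimal-bandwidth repairer should leave them
            # untouched, which implies the seqnum (and therefore the IV
            # and roothash) is unchanged:
            self.failUnlessEqual(new_shares, initial_shares)

If the repairer is in fact bumping the seqnum, that last assertion is
the one that will fail.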

Regards,

Zooko

[1] http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/test/test_repairer.py
