[tahoe-dev] How Tahoe-LAFS fails to scale up and how to fix it (Re: Starvation amidst plenty)
Greg Troxel
gdt at ir.bbn.com
Thu Oct 7 12:57:33 UTC 2010
"Zooko O'Whielacronx" <zooko at zooko.com> writes:
> On Fri, Sep 24, 2010 at 5:38 PM, Greg Troxel <gdt at ir.bbn.com> wrote:
>>
>> There's a semi-related reliability issue, which is that a grid of N
>> servers which are each available most of the time should allow a user to
>> store, check and repair without a lot of churn.
>
> Isn't the trouble here that repair increments the version number
> instead of just regenerating missing shares of the same version
> number? And in fact, are you sure that is happening? I haven't checked
> the source, manually tested it, or looked for an automated test of
> that.
I have observed something like this:

  2 shares of N
  9 shares of N+1
  repair
  10 shares of N+2

and then later, some N+1 shares reappear (probably the ones that were on
offline servers during the repair). This happens to me all the time. I
bet you can provoke it on the pubgrid by:

  start client
  put something out
  wait an hour
  repair
  check that it's ok
  restart client
  do a check

The restart/wait will cause my servers to come online and go away due to
a suspected firewall problem.
So yes, I agree that the essential trouble is the repairer generating a
new seqN++ instead of just regenerating missing shares.
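
To make the failure mode concrete, here is a toy model of the two repair
strategies. It is only a sketch under stated assumptions, not Tahoe-LAFS
code: each server holds exactly one share, a share is represented only by
its sequence number, and the names check, repair_bump, repair_minimal and
scenario are all hypothetical. A real minimal repairer would also have to
regenerate share data matching the existing version, which this model
ignores.

```python
from collections import Counter

def check(servers, online):
    """Count visible shares per sequence number on the online servers."""
    return Counter(servers[name] for name in online)

def repair_bump(servers, online):
    """Repair as described in this thread: publish a new version
    (newest visible seqnum + 1) to every online server."""
    new_seq = max(servers[name] for name in online) + 1
    for name in online:
        servers[name] = new_seq

def repair_minimal(servers, online):
    """Hypothetical minimal repair: replace only stale shares,
    reusing the newest seqnum that is already visible."""
    newest = max(servers[name] for name in online)
    for name in online:
        if servers[name] < newest:
            servers[name] = newest

def scenario(repair):
    """Run one repair while server s10 is offline, then check again
    once every server is back."""
    n = 5                                        # arbitrary starting seqnum N
    servers = {"s%d" % i: n + 1 for i in range(11)}
    servers["s0"] = servers["s1"] = n            # two stale shares at N
    everyone = sorted(servers)
    online = [name for name in everyone if name != "s10"]
    repair(servers, online)
    return dict(check(servers, everyone))

print(scenario(repair_bump))
print(scenario(repair_minimal))
```

In this model the bumping repair leaves ten shares at N+2 plus one N+1
share that reappears when s10 returns, so the next check sees mixed
versions again. The minimal repair leaves all eleven shares at N+1.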
> Hm, let's see, test_repairer.py [1] tests only the immutable repairer.
> Where are tests of mutable repairer? Aha:
>
> http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/test/test_mutable.py?annotate=blame&rev=4657#L1286
>
> Look at this nice comment:
>
> # TODO: this really shouldn't change anything. When we implement
> # a "minimal-bandwidth" repairer", change this test to assert:
> #self.failUnlessEqual(new_shares, initial_shares)
> # all shares should be in the same place as before
> self.failUnlessEqual(set(initial_shares.keys()),
> set(new_shares.keys()))
> # but they should all be at a newer seqnum. The IV will be
> # different, so the roothash will be too.
>
> Okay, so it sounds like someone (Greg) ought to comment-in that
> failUnlessEqual() and change the following lines to say that you fail
> if the seqnum is different. :-)
I will try to take a look, but I am already having trouble following
along with my limited available brain cycles.