[tahoe-dev] [tahoe-lafs] #1212: Repairing fails if less than 7 servers available

Thu Sep 30 01:17:35 UTC 2010

#1212: Repairing fails if less than 7 servers available
------------------------------+---------------------------------------------
     Reporter:  eurekafag     |       Owner:          
         Type:  defect        |      Status:  closed  
     Priority:  major         |   Milestone:  soon    
    Component:  code-network  |     Version:  1.8.0   
   Resolution:  fixed         |    Keywords:  reviewed
Launchpad Bug:                |  
------------------------------+---------------------------------------------

Comment (by zooko):

 Replying to [comment:12 kevan]:
 > If we do that, we lose the property that the repairer will always try to
 > place whichever shares are missing onto *some* storage servers, even if
 > the end result isn't optimally distributed.

 Doesn't this mean that {{{H}}} is effectively {{{0}}} for you when you are
 doing this?

 > I can also make my node's repair go for broke with share regeneration by
 > changing the value of happiness in {{{tahoe.cfg}}} to be 0. This is a
 > chore, but it means that people who really want the repairer to try to
 > place new shares regardless of where can still get that behavior.

 Right. If you want this behavior, set {{{H==0}}}. If you want the other
 behavior (abort the repair), set {{{H}}} to something else. With the v1.7.1
 behavior and the current trunk behavior (since
 20100927200102-b8d28-9111a341188a4264e5070f91b52364a2addcb3dc), setting
 {{{H}}} in your {{{tahoe.cfg}}} has no effect on repairer
 behavior: the repairer always acts as though {{{H==0}}}.

 > Maybe the best approach is to fix #614 with this in mind. The repairer
 > could regenerate and try to place all of the missing shares, as it does
 > now, but also tell the caller (in the post repair results) whether the
 > repair was ultimately successful or not based on how the shares are
 > distributed, using the client's configured happiness value for that check.

 Oh, good catch. Yes, if we fix #614 then the repairer would be using {{{H}}}
 (during the check/verify step) to determine whether or not to trigger a
 repair. Once the repair was triggered, it could ''also'' use {{{H}}}
 to determine whether to abort the repair, or it could instead treat
 {{{H}}} as effectively {{{0}}} for the purpose of the repair.

 Now that I've thought about it more and read your comments, Kevan, I think
 I agree that we should have the latter behavior, as long as we fix #614 so
 that the output reported by the repairer can be easily understood by the
 user as indicating "unhealthy" when the servers of happiness is less than
 {{{H}}}.

 Oh, in fact, what I ''really'' want is for repairer to ''proceed'' and to
 do its best even if it knows that it can't reach servers of happiness
 greater than or equal to {{{H}}} (instead of aborting the way uploader
 does), but then to return a failure result saying that it wasn't able to
 repair the file back to health.
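
 As a rough illustration of that "proceed, then report" behavior, here is a
 minimal Python sketch. It is not the actual Tahoe-LAFS repairer: the names
 {{{servers_of_happiness}}}, {{{RepairResult}}} and {{{repair}}} are
 hypothetical, the round-robin placement is a stand-in for real share
 placement, and it assumes that servers of happiness is measured as the size
 of a maximum matching between shares and the servers that hold them.

```python
from dataclasses import dataclass


def servers_of_happiness(share_locations):
    """Size of a maximum matching between shares and servers (assumed metric).

    share_locations maps share number -> set of server ids holding that share.
    """
    matched_server = {}  # server id -> share number currently matched to it

    def try_assign(share, seen):
        for server in share_locations[share]:
            if server in seen:
                continue
            seen.add(server)
            # Take a free server, or re-route the share that currently holds it.
            if server not in matched_server or try_assign(matched_server[server], seen):
                matched_server[server] = share
                return True
        return False

    return sum(1 for share in share_locations if try_assign(share, set()))


@dataclass
class RepairResult:
    happiness: int  # servers of happiness after the repair attempt
    healthy: bool   # True only if happiness >= the configured H


def repair(share_locations, missing_shares, writable_servers, happy):
    """Hypothetical repairer: place every missing share, then judge against H."""
    locations = {share: set(servers) for share, servers in share_locations.items()}
    servers = sorted(writable_servers)
    if servers:
        for i, share in enumerate(sorted(missing_shares)):
            # Round-robin placement; the repair never aborts because of `happy`.
            locations.setdefault(share, set()).add(servers[i % len(servers)])
    happiness = servers_of_happiness(locations)
    return RepairResult(happiness=happiness, healthy=happiness >= happy)
```

 With only two servers and {{{H}}} set to 7, a call such as
 {{{repair({0: {"A"}, 1: {"A"}, 2: {"B"}}, [3, 4], {"A", "B"}, happy=7)}}}
 still places shares 3 and 4, but the returned result has {{{healthy}}} set
 to false, which is the failure report described above.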

 Does that make sense?

 Okay, I'm done changing my mind for the moment. What do you think?

 > Edit: I didn't read Zooko's comment closely enough. Is what I describe
 > in the third paragraph what the repairer already does? If so, what don't
 > you like about that?

 Sorry: I don't understand this question. Hopefully I answered it above.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1212#comment:13>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage