[tahoe-lafs-trac-stream] [tahoe-lafs] #2106: RAIC behaviour different from RAID behaviour

tahoe-lafs trac at tahoe-lafs.org
Thu Nov 14 22:53:44 UTC 2013


#2106: RAIC behaviour different from RAID behaviour
--------------------------+--------------------
     Reporter:  sickness  |      Owner:
         Type:  defect    |     Status:  new
     Priority:  normal    |  Milestone:  1.11.0
    Component:  code      |    Version:  1.10.0
   Resolution:            |   Keywords:
Launchpad Bug:            |
--------------------------+--------------------

Comment (by zooko):

 sickness: thanks for the detailed description of the issue! I agree with
 you that it would be a problem if we got to the end of this story you've
 written and lost a file that way.

 There are several improvements we can make.

 === improvement 1: let repair improve file health (#1382)

 The last chance we have to avoid this fate is in the step where a repair
 is attempted when the placement is already:
 {{{
  SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ]
 }}}

 If we are ever in that state, and a repair (or upload) is attempted, then
 a copy of either share1 or share4 ''must'' be uploaded to SERV4 in order to
 improve the health of the file. The #1382 branch (by Mark Berger;
 currently in review — ''almost'' ready to commit to trunk!) fixes this, so
 that a repair or upload in that case ''would'' upload a share to SERV4.

 Note that this improvement "let repair improve file health" is the same
 whether the state is:
 {{{
  SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ]
 }}}
 or:
 {{{
  SERV1[share1] SERV2[share2] SERV3[share3] SERV4[ ]
 }}}

 In either case, we want to upload a share to SERV4! The #1382 branch does
 this right.
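
 To make "health" concrete: the metric involved here is, roughly, the
 number of distinct servers that can each contribute a distinct share,
 that is, the size of a maximum matching between servers and shares. Here
 is a rough Python sketch of that idea (illustrative only, not the actual
 Tahoe-LAFS code; the names are invented). In both of the states above,
 uploading a share to SERV4 raises the metric from 3 to 4:
 {{{
 def happiness(placement):
     # placement maps a server name to the set of share numbers it holds.
     # Returns the size of a maximum matching between servers and shares,
     # computed with Kuhn's augmenting-path algorithm.
     match = {}  # share number -> server it is currently matched to

     def augment(server, seen):
         # Try to claim a share for `server`, re-matching other servers
         # out of the way if necessary.
         for share in placement[server]:
             if share in seen:
                 continue
             seen.add(share)
             if share not in match or augment(match[share], seen):
                 match[share] = server
                 return True
         return False

     return sum(augment(server, set()) for server in placement)

 before = {"SERV1": {1, 4}, "SERV2": {2}, "SERV3": {3}, "SERV4": set()}
 after  = {"SERV1": {1, 4}, "SERV2": {2}, "SERV3": {3}, "SERV4": {4}}
 print(happiness(before), happiness(after))  # 3 4
 }}}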

 === improvement 2: launch a repair job when needed (#614)

 If a "check" job is running, and it detects a layout like:
 {{{
  SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ]
 }}}
 or:
 {{{
  SERV1[share1] SERV2[share2] SERV3[share3] SERV4[ ]
 }}}
 then what should it do? Trigger a repair job, or leave well enough alone?
 That depends on the user's preferred trade-off between file health and
 bandwidth consumption. If the user has configured the setting that says
 "Try to keep the file spread across at least 4 servers", then it will
 trigger a repair. If the user has configured it to "Try to keep the file
 spread across at least 3 servers", then it will not, because an
 unnecessary repair would just annoy the user by using up their network
 bandwidth.
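
 Here is a sketch of the decision I have in mind, reusing the happiness()
 helper sketched above (should_repair and the wanted_servers knob are made
 up for illustration; they are not an actual tahoe.cfg option or checker
 API):
 {{{
 def should_repair(placement, wanted_servers):
     # Repair only when the file is usefully spread across fewer distinct
     # servers than the user asked for; otherwise save the bandwidth.
     return happiness(placement) < wanted_servers

 layout = {"SERV1": {1, 4}, "SERV2": {2}, "SERV3": {3}, "SERV4": set()}
 print(should_repair(layout, wanted_servers=4))  # True: trigger a repair
 print(should_repair(layout, wanted_servers=3))  # False: leave it alone
 }}}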

 This is the topic of #614. There is a patch from Mark Berger on that
 ticket, but I think there is disagreement or confusion over how it should
 work.

 === possible improvement 3: don't put multiple shares on a server (#2107)

 Another possible change we could make is in the step where an upload-or-
 repair process was running and it saw this state:
 {{{
  SERV1[share1] SERV2[share2] SERV3[share3] SERV4[XXXXXX]
 }}}
 and it decided to send an extra share to SERV1, resulting in this state:
 {{{
  SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[XXXXXX]
 }}}
 I used to think it was a good idea for the uploader/repairer to do this
 (provided we also implemented improvement 1 and improvement 2 above!),
 but I've since changed my mind; I explained my current reasoning on
 #2107. Possible improvement 3 is not provided by the #1382 branch. As far
 as I understand, the #1382 branch will go ahead and upload an extra share
 in this case.
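
 I won't repeat the whole argument from #2107 here, but part of the
 intuition can be illustrated with the same happiness() sketch from above:
 while SERV4 is down, putting a second share on SERV1 does not raise the
 health metric at all, so the extra upload costs bandwidth and storage
 without protecting the file against the loss of another server:
 {{{
 # SERV4 is down, so it does not appear in the placement at all.
 crashed = {"SERV1": {1},    "SERV2": {2}, "SERV3": {3}}
 doubled = {"SERV1": {1, 4}, "SERV2": {2}, "SERV3": {3}}
 print(happiness(crashed), happiness(doubled))  # 3 3
 }}}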

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2106#comment:1>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

