[tahoe-lafs-trac-stream] [tahoe-lafs] #2106: RAIC behaviour different from RAID behaviour
tahoe-lafs
trac at tahoe-lafs.org
Thu Nov 14 22:53:44 UTC 2013
#2106: RAIC behaviour different from RAID behaviour
--------------------------+--------------------
Reporter: sickness | Owner:
Type: defect | Status: new
Priority: normal | Milestone: 1.11.0
Component: code | Version: 1.10.0
Resolution: | Keywords:
Launchpad Bug: |
--------------------------+--------------------
Comment (by zooko):
sickness: thanks for the detailed description of the issue! I agree with
you that it would be a problem if we got to the end of this story you've
written and lost a file that way.
There are several improvements we can make.
=== improvement 1: let repair improve file health (#1382)
The last chance we have to avoid this fate is in the step where a repair
is attempted when the placement is already:
{{{
SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ]
}}}
If we are ever in that state, and a repair (or upload) is attempted, then
a copy of either share1 or share4 ''must'' be uploaded to SERV4 in order to
improve the health of the file. The #1382 branch (by Mark Berger;
currently in review, ''almost'' ready to commit to trunk!) fixes this, so
that a repair or upload in that case ''would'' upload a share to SERV4.
Note that this improvement "let repair improve file health" is the same
whether the state is:
{{{
SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ]
}}}
or:
{{{
SERV1[share1] SERV2[share2] SERV3[share3] SERV4[ ]
}}}
In either case, we want to upload a share to SERV4! The #1382 branch does
this right.
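To make the two cases concrete, here is a minimal Python sketch of the decision. It is only an illustration and is not taken from the #1382 branch; the function name `plan_repair`, the dict-of-sets layout, and the `total_shares` parameter are all hypothetical.

```python
def plan_repair(placement, total_shares):
    """Return a list of (share, server) uploads, one for each empty server.

    placement: dict mapping server name -> set of share numbers it holds.
    total_shares: how many shares the file is encoded into (hypothetical
    parameter for this sketch).
    """
    present = set().union(*placement.values())
    # Shares that exist on no server at all.
    missing = sorted(set(range(1, total_shares + 1)) - present)
    # Shares sitting on a server that holds more than one share.
    doubled = sorted(
        s for shares in placement.values() if len(shares) > 1 for s in shares
    )
    plan = []
    for server in sorted(placement):
        if placement[server]:
            continue  # this server already contributes to file health
        if missing:
            plan.append((missing.pop(0), server))
        elif doubled:
            plan.append((doubled.pop(), server))
    return plan


doubled_up = {"SERV1": {1, 4}, "SERV2": {2}, "SERV3": {3}, "SERV4": set()}
one_each = {"SERV1": {1}, "SERV2": {2}, "SERV3": {3}, "SERV4": set()}

print(plan_repair(doubled_up, 4))  # [(4, 'SERV4')]
print(plan_repair(one_each, 4))    # [(4, 'SERV4')]
```

Both layouts produce the same plan in this sketch: SERV4 ends up with a share either way.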
=== improvement 2: launch a repair job when needed (#614)
If a "check" job is running, and it detects a layout like:
{{{
SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ]
}}}
or:
{{{
SERV1[share1] SERV2[share2] SERV3[share3] SERV4[ ]
}}}
Then what should it do? Trigger a repair job, or leave well enough alone?
That depends on the user's preferred trade-off between file health and
bandwidth-consumption. If the user has configured the setting that says
"Try to keep the file spread across at least 4 servers", then it will
trigger a repair. If the user has configured it to "Try to keep the file
spread across at least 3 servers", then it will not, because a repair
would annoy the user by using up their network bandwidth.
This is the topic of #614. There is a patch from Mark Berger on that
ticket, but I think there is disagreement or confusion over how it should
work.
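As a rough sketch of the trade-off described above (not the patch on #614), the check could compare the number of servers holding at least one share against the user's threshold. The names `needs_repair` and `min_servers` are hypothetical, and counting non-empty servers is a simplification that happens to be enough for the layouts in this comment.

```python
def needs_repair(placement, min_servers):
    """Return True if the file is spread across fewer than min_servers servers.

    placement: dict mapping server name -> set of share numbers it holds.
    min_servers: the user's configured threshold (hypothetical setting name).
    """
    servers_with_shares = sum(1 for shares in placement.values() if shares)
    return servers_with_shares < min_servers


layout = {"SERV1": {1, 4}, "SERV2": {2}, "SERV3": {3}, "SERV4": set()}

print(needs_repair(layout, 4))  # True: only 3 servers hold shares
print(needs_repair(layout, 3))  # False: the user asked for at least 3
```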
=== possible improvement 3: don't put multiple shares on a server (#2107)
Another possible change we could make is in the step where an upload-or-
repair process was running and it saw this state:
{{{
SERV1[share1] SERV2[share2] SERV3[share3] SERV4[XXXXXX]
}}}
and it decided to send an extra share to SERV1, resulting in this state:
{{{
SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[XXXXXX]
}}}
I used to think it was a good idea for the uploader/repairer to do this
(provided we implemented improvement 1 and improvement 2 above!), but now
I've changed my mind. I explained my current reasoning on #2107. Possible
improvement 3 is not provided by the #1382 branch. As far as I understand,
the #1382 branch will go ahead and upload an extra share in this case.
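For illustration only, here is a small Python sketch of what "don't put multiple shares on a server" could look like. It is not code from #2107 or #1382. The function name `place_one_per_server` and the `writable` parameter are hypothetical, and the sketch assumes that `SERV4[XXXXXX]` means SERV4 is not accepting shares.

```python
def place_one_per_server(placement, new_shares, writable):
    """Assign each new share to a writable server that holds no share yet.

    placement: dict mapping server name -> set of share numbers it holds.
    new_shares: list of share numbers waiting to be uploaded.
    writable: set of server names currently accepting shares (assumption).
    Returns (placed, unplaced): a dict share -> server, and a list of shares
    left over because no empty writable server was available.
    """
    free = [s for s in sorted(writable) if not placement.get(s)]
    placed, unplaced = {}, []
    for share in new_shares:
        if free:
            placed[share] = free.pop(0)
        else:
            unplaced.append(share)  # do not double up on a server
    return placed, unplaced


current = {"SERV1": {1}, "SERV2": {2}, "SERV3": {3}}

# SERV4 unavailable: share4 stays unplaced instead of going to SERV1.
print(place_one_per_server(current, [4], {"SERV1", "SERV2", "SERV3"}))

# SERV4 available and empty: share4 goes there.
print(place_one_per_server(current, [4], {"SERV1", "SERV2", "SERV3", "SERV4"}))
```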
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2106#comment:1>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage