Opened at 2009-02-10T06:17:43Z
Last modified at 2013-11-14T22:15:10Z
#614 new defect
redefine "Healthy" to be "Happy" for checker/verifier/repairer — at Version 19
Reported by: | zooko | Owned by: | markberger |
---|---|---|---|
Priority: | major | Milestone: | soon |
Component: | code-encoding | Version: | 1.3.0 |
Keywords: | upload repair verify preservation performance docs unfinished-business servers-of-happiness | Cc: | amontero@…, cl@… |
Launchpad Bug: |
Description (last modified by markberger)
Part of dreid's performance problem (in addition to the major part, #610, and the other consideration, #613) is that his client re-uploads every file he has ever uploaded whenever the checker reports that the file is not "Healthy", which it does when only 9 of the M=10 shares (K=3) are present. Maybe we should redefine "Healthy" to be 7 shares and let numbers of shares greater than 7 be "super extra Healthy".
I chose 7 because that is the current default value of "shares of happiness". "shares of happiness" is a related notion: when you are doing an upload, if some of the attempts to upload shares fail, and you are left with 7 or more shares at the end, then you report to the user that the upload succeeded. If enough uploads fail so that you can't get more than 6 shares uploaded, then you immediately abort and report to the user that the upload failed. Maybe repairer ought to use the same heuristic as uploader does with regard to how many shares are enough to "call it good".
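The upload rule described above, and the proposal to reuse it for the checker, can be sketched as follows. This is a minimal illustration, not Tahoe-LAFS code: the function names and constants are hypothetical, and it assumes the current checker calls a file "Healthy" only when all M shares are found, as the 9-of-10 example suggests.

```python
# Hypothetical sketch; names and constants are illustrative, not Tahoe-LAFS APIs.
M = 10                    # total shares produced by the 3-of-10 encoding
SHARES_OF_HAPPINESS = 7   # default threshold mentioned in this ticket


def upload_succeeded(shares_placed):
    """Uploader rule: report success if at least 7 shares were placed."""
    return shares_placed >= SHARES_OF_HAPPINESS


def healthy_current(shares_found):
    """Assumed current checker rule: every one of the M shares must be found."""
    return shares_found >= M


def healthy_proposed(shares_found):
    """Proposed checker rule: reuse the uploader's happiness threshold."""
    return shares_found >= SHARES_OF_HAPPINESS
```

With 9 shares found, `healthy_current(9)` is false and would trigger a repair, while `healthy_proposed(9)` is true and would leave the file alone.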
Change History (19)
comment:1 Changed at 2009-03-24T20:15:00Z by warner
- Component changed from code-network to code-encoding
comment:2 Changed at 2009-12-29T19:12:11Z by zooko
- Summary changed from redefine "Healthy" to be 7 in 3-of-10 encoding to redefine "Healthy" to be "Happy" in 3-of-10 encoding
comment:3 Changed at 2009-12-29T19:12:35Z by zooko
- Summary changed from redefine "Healthy" to be "Happy" in 3-of-10 encoding to redefine "Healthy" to be "Happy" for checker/verifier/repairer
comment:4 Changed at 2009-12-29T23:48:48Z by davidsarah
- Keywords repair verify preservation performance added
comment:5 Changed at 2009-12-29T23:49:24Z by davidsarah
- Keywords upload added
comment:6 Changed at 2010-01-17T14:41:24Z by davidsarah
- Keywords docs added
When this is fixed, remember to change the webapi.txt doc for POST $URL?t=check.
comment:7 Changed at 2010-05-16T05:17:43Z by zooko
- Milestone changed from undecided to 1.7.0
- Owner set to kevan
#778 ("shares of happiness" is the wrong measure; "servers of happiness" is better) is fixed! Now we should change checker/verifier/repairer to report health (for the first two) and to trigger a repair (for the last one) only if the file in question lacks sufficient servers-of-happiness.
Kevan: if you are interested, you could investigate whether this is such an easy change to make that we could squeeze it into the v1.7 release.
(I doubt it.)
comment:8 Changed at 2010-05-16T18:23:53Z by kevan
The checker/verifier would be straightforward to modify. The repairer works by downloading and re-uploading the file, regardless of how healthy it is (relying on the immutable file upload code to not do more work than it needs to), and doesn't explicitly consider file health in doing that. However, from what I can tell, you can't repair an immutable file without first checking its health, and the immutable filenode code won't repair a file unless the Checker/Verifier says that it is unhealthy, so it is probably enough to just change the definition of health in the Checker/Verifier.
I might have some time later this week to work on this, but getting it implemented correctly and clearly, testing it thoroughly, and having adequate code review by the 23rd might be optimistic.
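The check-then-repair flow described in this comment can be sketched as below. The class and method names are hypothetical stand-ins, not the real immutable filenode API; the point is only that the repair step is gated on the checker's verdict, so swapping the health predicate changes when repairs happen.

```python
# Hypothetical stand-in for an immutable filenode; not Tahoe-LAFS's real API.
class SketchFileNode:
    def __init__(self, is_healthy):
        self.is_healthy = is_healthy  # pluggable definition of "healthy"
        self.repairs_run = 0

    def check(self, shares_found):
        """Ask the checker/verifier for a verdict."""
        return self.is_healthy(shares_found)

    def check_then_repair(self, shares_found):
        """Repair only when the checker says the file is unhealthy."""
        healthy = self.check(shares_found)
        if not healthy:
            # A real repair would download and re-upload the file here.
            self.repairs_run += 1
        return healthy


# Two candidate definitions of health for a 3-of-10 file (assumed thresholds).
strict = SketchFileNode(lambda shares: shares >= 10)
happy = SketchFileNode(lambda shares: shares >= 7)
strict.check_then_repair(9)
happy.check_then_repair(9)
```

In this sketch the strict node runs one repair for a file with 9 shares and the happiness-based node runs none, without any change to the repair path itself.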
comment:9 Changed at 2010-05-16T18:34:33Z by zooko
- Milestone changed from 1.7.0 to 1.8.0
Yeah, let's plan to finish this up in v1.8.0.
comment:10 Changed at 2010-05-17T04:14:23Z by zooko
I'm just wondering if not doing this for v1.7.0 means there is a regression in v1.7.0. I don't think so, but I thought I should write down my notes here to be sure. The notion of whether a file is "healthy" according to checker-verifier and the notion of whether an upload was "successful" according to uploader-repairer differ (also true in earlier releases ever since repairer was originally implemented).
There are two ways the difference could manifest:
- Checker-verifier could say the file is Ok even though uploader-repairer would not be satisfied with it and would either strengthen/rebalance it or report failure.
- Checker-verifier could say that the file is not-Ok even though uploader-repairer would be satisfied with it and would not change it when asked to upload it.
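The two mismatches listed above can be made concrete with a small sketch. It is deliberately simplified and every name in it is hypothetical: the checker is assumed to count distinct shares against N, and the uploader's satisfaction is approximated by counting servers that hold at least one share, which is only a rough proxy for servers-of-happiness.

```python
# Hypothetical sketch; a placement maps a server id to the share numbers it holds.
def checker_ok(placement, n):
    """Assumed checker rule: all n distinct shares exist somewhere."""
    distinct_shares = set().union(*placement.values()) if placement else set()
    return len(distinct_shares) >= n


def uploader_satisfied(placement, happy):
    """Rough proxy for the uploader rule: enough servers hold a share."""
    servers_with_shares = [s for s, shares in placement.items() if shares]
    return len(servers_with_shares) >= happy


def classify(placement, n, happy):
    """Name which of the mismatch cases, if any, a placement falls into."""
    ok = checker_ok(placement, n)
    satisfied = uploader_satisfied(placement, happy)
    if ok and not satisfied:
        return "checker Ok, uploader not satisfied"
    if satisfied and not ok:
        return "checker not-Ok, uploader satisfied"
    return "checker and uploader agree"
```

For example, all 10 shares on a single server falls into the first case, while 9 shares spread over 9 servers falls into the second (with n=10 and happy=7).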
For a while I was worried about the second case as a potential regression in v1.7.0 because the new uploader (and therefore repairer) has more stringent requirements for what constitutes a successful upload than the v1.6 one did. I imagined that maybe every time someone's checker-repairer process ran the checker would report "This is not healthy" and then the repairer would run and do a lot of work but leave the file in such a state that the next time the checker looked at it the checker would still consider it to be unhealthy.
However, I don't believe this is a risk for v1.7, because although the new uploader (repairer) could be satisfied with fewer shares available than the checker would be, it actually goes ahead and uploads all shares if possible, which would be enough to satisfy the v1.7 checker. In fact, this is the same pattern as the old v1.6 uploader, which would be satisfied with only shares-of-happiness shares being available (which is not enough to satisfy a checker/verifier) but goes ahead and uploads all N shares normally.
So, there's no problem, but I thought I should write all this down just in case someone else detects a flaw in my thinking. Also if we ever implement #946 (upload should succeed as soon as the servers-of-happiness criterion is met) then we'll have to revisit this issue!
comment:11 Changed at 2010-08-10T04:13:13Z by davidsarah
- Keywords unfinished-business added
- Milestone changed from 1.8.0 to 1.9.0
comment:12 Changed at 2010-09-11T00:52:36Z by david-sarah@…
In 31f66c5470641890:
comment:13 Changed at 2010-10-13T23:07:36Z by kevan
See also: #1212, where we discuss and ultimately agree (I think?) on how a happiness-aware repairer/checker ought to work.
comment:14 Changed at 2010-12-29T09:11:54Z by zooko
- Keywords servers-of-happiness added
comment:15 Changed at 2011-10-13T17:03:38Z by warner
- Milestone changed from 1.9.0 to 1.10.0
not making it into 1.9
comment:16 Changed at 2012-03-04T18:41:12Z by amontero
- Cc amontero@… added
comment:17 Changed at 2012-03-22T16:26:58Z by cl
- Cc cl@… added
comment:18 Changed at 2013-03-18T17:10:06Z by zooko
There was discussion of this issue on tahoe-dev: pipermail/tahoe-dev/2013-March/008091.html
comment:19 Changed at 2013-07-15T18:31:47Z by markberger
- Description modified (diff)
- Owner changed from kevan to markberger
I have started working on this ticket. You can view my progress here: https://github.com/markberger/tahoe-lafs/tree/614-happy-is-healthy
As of right now I've only written a couple unit tests that should pass before this ticket is closed. If someone could glance over them to make sure they're implemented correctly, I would really appreciate it. The solution to this problem will probably be a little hacky because h is not stored in the verify cap while k and n are. Whoever is working on the new cap design might want to consider storing h in the verify cap if servers-of-happiness is going to be used with the new caps.
Per #778 ("shares of happiness" is the wrong measure; "servers of happiness" is better), the definition of a "well stored" file should be a file for which there are "servers of happiness" distinct servers such that any K of them are sufficient to recover your file. This would be a good definition for "healthy" in checker/verifier/repairer -- don't bother repairing a file if there are already "servers of happiness" for that file.
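One way a checker could evaluate this definition is sketched below. It assumes that the happiness of a placement is measured as the size of a maximum matching between servers and distinct shares, so that each counted server contributes a share no other counted server contributes; that interpretation, and all names here, are assumptions for illustration rather than code from this ticket.

```python
# Hypothetical sketch; a placement maps a server id to the share numbers it holds.
def servers_of_happiness(placement):
    """Return the size of a maximum matching between servers and shares."""
    share_owner = {}  # share number -> server currently matched to it

    def try_assign(server, seen):
        # Augmenting-path search: give this server a share, moving
        # previously matched servers to other shares when necessary.
        for share in placement[server]:
            if share in seen:
                continue
            seen.add(share)
            if share not in share_owner or try_assign(share_owner[share], seen):
                share_owner[share] = server
                return True
        return False

    return sum(try_assign(server, set()) for server in placement)


def is_healthy(placement, happy):
    """Proposed rule: healthy means happiness reaches the threshold h."""
    return servers_of_happiness(placement) >= happy
```

Under this sketch, ten shares stored on one server give a happiness of 1, so the file would be reported unhealthy even though every share exists, while seven servers each holding a different share would be healthy for h=7.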