Opened at 2011-12-17T22:32:42Z
Last modified at 2014-09-11T22:22:43Z
#1640 new defect
the mutable publisher should try harder to place all shares
Reported by: | kevan | Owned by: | nobody |
---|---|---|---|
Priority: | major | Milestone: | soon |
Component: | code-peerselection | Version: | 1.9.0 |
Keywords: | mutable upload | Cc: | zooko |
Launchpad Bug: |
Description (last modified by warner)
If a connection error is encountered while pushing a share to a storage server, the mutable publisher forgets about the writer object associated with that (share, server) placement. This is consistent with the pre-1.9 publisher and, in high-level terms, means that the publisher views the placement as probably invalid, attributing the error to a server failure or something like it. However, the pre-1.9 publisher then attempts to find another home for the share that was placed on the broken server; the current publisher doesn't.
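A minimal sketch of the current behavior described above, under a toy model where each share number maps to exactly one server and any write to a server in `failing_servers` raises `ConnectionError`. `Writer` and `publish_dropping_failures` are hypothetical names for illustration, not the publisher's actual classes:

```python
from dataclasses import dataclass


@dataclass
class Writer:
    """Hypothetical stand-in for the writer bound to one (share, server) placement."""
    shnum: int
    server: str


def publish_dropping_failures(placements, failing_servers):
    """Toy model of the described behavior: on a connection error the writer
    is forgotten and its share is not placed anywhere else."""
    writers = {shnum: Writer(shnum, server) for shnum, server in placements.items()}
    placed = {}
    for shnum, writer in list(writers.items()):
        try:
            if writer.server in failing_servers:
                raise ConnectionError(f"lost connection to {writer.server}")
            placed[shnum] = writer.server  # stand-in for a successful share write
        except ConnectionError:
            # Forget the writer; no attempt is made to find another home.
            del writers[shnum]
    return placed
```

For example, `publish_dropping_failures({0: "A", 1: "B", 2: "C"}, {"B"})` returns `{0: "A", 2: "C"}`: share 1 simply goes unplaced.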
When I first wrote the publisher, I wanted to support streaming upload of mutable files. That made it hard to find a new home for a share placed on a broken storage server, since the parts of the share that were generated and sent before the failure would not necessarily still be available to upload to a new server. We ended up ditching streaming uploads due to other concerns; instead, we write a share all at once, so everything we will write to a storage server is available to us when we write it. Given this, there's no compelling reason that the publisher couldn't attempt to find a new home for shares placed on broken servers. Ensuring that all shares are placed whenever possible makes it more likely that a recoverable version of the mutable file will be available after an update.
In practical terms, the current behavior somewhat increases the chance of data loss, in proportion to the number of servers that fail during a publish operation. If too many storage servers fail during the upload and too much of the initial share placement is lost to those failures, the newly published mutable file might not be recoverable. A fix would involve a way to change the server associated with a writer after the writer is created, and probably some control-flow changes to ensure that write failures result in shares being reassigned.
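The fix outlined above could look roughly like the following sketch. It assumes a hypothetical `Writer.set_server()` method (the capability the ticket says is missing), that each writer holds its complete share (possible now that shares are written all at once), and a simple one-share-per-server policy where a failed share moves to the first server not yet used or known to be broken. None of these names come from the actual codebase:

```python
from collections import deque


class Writer:
    """Hypothetical writer that holds its whole share, so it can be re-sent."""

    def __init__(self, shnum, data, server):
        self.shnum = shnum
        self.data = data
        self.server = server

    def set_server(self, server):
        # The missing piece: rebind an existing writer to a new server.
        self.server = server


def publish_with_reassignment(shares, initial, all_servers, broken):
    """shares: shnum -> bytes; initial: shnum -> server; broken: servers whose
    writes always fail. Returns (placed, unplaced)."""
    failed = set()
    in_use = set(initial.values())
    placed, unplaced = {}, []
    queue = deque(Writer(n, shares[n], initial[n]) for n in sorted(shares))
    while queue:
        w = queue.popleft()
        if w.server not in broken:
            placed[w.shnum] = w.server  # stand-in for writing w.data
            continue
        failed.add(w.server)
        # Assumed policy: only use servers that hold no share and haven't failed.
        spares = [s for s in all_servers if s not in failed and s not in in_use]
        if not spares:
            unplaced.append(w.shnum)  # nowhere left to put this share
            continue
        w.set_server(spares[0])
        in_use.add(spares[0])
        queue.append(w)  # retry the write on the new server
    return placed, unplaced
```

The loop terminates because every reassignment consumes a server that was never used before, and the server list is finite. With `all_servers = ["A", "B", "C", "D", "E"]` and `broken = {"B", "D"}`, share 1 moves from B to D, fails again, and lands on E, so all three shares end up placed.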
Change History (5)
comment:1 Changed at 2011-12-18T00:40:55Z by zooko
- Cc zooko added
comment:2 follow-up: ↓ 4 Changed at 2011-12-18T01:42:39Z by zooko
- Keywords mutable upload added
- Milestone changed from undecided to soon
comment:3 Changed at 2011-12-18T02:46:48Z by kevan
The old publisher won't finish until it has either placed all of its shares somewhere or has tried and failed a certain number of times to place all of its shares somewhere; in the second case, it returns a failure message. The new publisher returns a failure message if it can't place enough shares for the file to be recoverable. So the robustness criterion in the old publisher is whether all shares are placed somewhere, while the robustness criterion in the new publisher is whether enough shares are placed for the file to be recoverable.
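The two success criteria can be stated as predicates over the set of share numbers that were actually placed, where `k` is the number of shares needed to recover the file and `N` is the total number of shares. This is a sketch of the final check only (it ignores the old publisher's retry limit), and the function names are hypothetical:

```python
def old_publisher_succeeded(placed_shnums, N):
    """Pre-1.9 criterion: every one of the N shares has a home."""
    return set(range(N)) <= set(placed_shnums)


def new_publisher_succeeded(placed_shnums, k):
    """1.9 criterion: at least k distinct shares placed, so the file is recoverable."""
    return len(set(placed_shnums)) >= k
```

With 3-of-10 encoding, placing shares 0 through 3 is a failure for the old publisher (`old_publisher_succeeded([0, 1, 2, 3], 10)` is `False`) but a success for the new one (`new_publisher_succeeded([0, 1, 2, 3], 3)` is `True`), even though six shares went unplaced.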
comment:4 in reply to: ↑ 2 Changed at 2011-12-18T18:58:40Z by davidsarah
comment:5 Changed at 2014-09-11T22:22:43Z by warner
- Component changed from unknown to code-peerselection
- Description modified (diff)
I'm not sure this is important enough to warrant trying to fix it in a 1.9.1. That's because a server failing during an upload isn't that common, and if it does happen it isn't that damaging. Or, wait, does mutable upload have a servers-of-happiness-style of check to return a failure message in case the file is not sufficiently robustly stored?