[tahoe-lafs-trac-stream] [Tahoe-LAFS] #1640: the mutable publisher should try harder to place all shares
Tahoe-LAFS
trac at tahoe-lafs.org
Thu Sep 11 22:22:43 UTC 2014
#1640: the mutable publisher should try harder to place all shares
------------------------------------+----------------------------
Reporter: kevan | Owner: nobody
Type: defect | Status: new
Priority: major | Milestone: soon
Component: code-peerselection | Version: 1.9.0
Resolution: | Keywords: mutable upload
Launchpad Bug: |
------------------------------------+----------------------------
Changes (by warner):
* component: unknown => code-peerselection
New description:
If a connection error is encountered while pushing a share to a storage
server, the mutable publisher forgets about the writer object associated
with the (share, server) placement. This is consistent with the pre-1.9
publisher and, in high-level terms, means that the publisher treats that
share placement as probably invalid, attributing the error to a server
failure or a similar problem. However, the pre-1.9 publisher attempts to
find another home for the share placed on the broken server; the current
publisher doesn't.
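As a minimal sketch of the current behavior described above (the names
here are illustrative, not the real publisher's API): on a write failure,
the writer for that placement is simply discarded, so the share is never
retried elsewhere.

```python
# Hypothetical sketch, not actual Tahoe-LAFS code: the publisher keeps a
# mapping from (share number, server) to a writer object, and on a
# connection error it just forgets that entry, so no re-placement happens.

def handle_write_error(writers, shnum, server):
    """Forget the writer for this placement; the share is not retried."""
    writers.pop((shnum, server), None)  # discard; the share may go unplaced
```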
When I first wrote the publisher, I wanted to support streaming upload of
mutable files. That made it hard to find a new home for a share placed on
a broken storage server, since we wouldn't necessarily still have all of
the parts of the share that were generated and placed before the failure,
and so couldn't upload them to a new server. We ended up ditching
streaming uploads due to other concerns; instead, we now write a share
all at once, so everything destined for a storage server is available at
write time. Given this, there's no compelling reason the publisher
couldn't attempt to find a new home for shares placed on broken servers.
Ensuring that all shares are placed whenever possible makes it more
likely that a recoverable version of the mutable file will be available
after an update.
In practical terms, the current behavior increases the chance of data
loss, roughly in proportion to the number of servers that fail during a
publish operation. If too many storage servers fail during the upload and
too much of the initial share placement is lost to those failures, the
newly published mutable file might not be recoverable. A fix would
involve a way to change the server associated with a writer after the
writer is created, plus some control-flow changes to ensure that write
failures result in shares being reassigned.
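The proposed fix can be sketched roughly as follows. This is a hedged
illustration under assumed names (Writer, publish_all, write_share are
hypothetical, not the real Publish code): because the whole share is in
memory at write time, a writer that hits a connection error can simply be
re-targeted at a fresh server and retried.

```python
# Hypothetical sketch of re-placing shares on write failure.

class Writer:
    """Holds everything needed to push one share to one server."""
    def __init__(self, shnum, server, share_data):
        self.shnum = shnum
        self.server = server
        self.share_data = share_data

    def push(self):
        # Assumed server API: raises ConnectionError on a broken server.
        self.server.write_share(self.shnum, self.share_data)


def publish_all(writers, spare_servers):
    """Push every share; on a connection error, retry on a spare server."""
    pending = list(writers)
    while pending:
        w = pending.pop()
        try:
            w.push()
        except ConnectionError:
            if spare_servers:
                # The full share is available, so just re-target the
                # writer at a new server and try again.
                w.server = spare_servers.pop()
                pending.append(w)
            # else: no servers left; this share may go unplaced, and the
            # new version may be unrecoverable if too many shares are lost
```

The loop is the "control flow change" the description mentions: instead
of dropping a failed writer, it goes back on the pending list with a new
server assignment.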
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1640#comment:5>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage
More information about the tahoe-lafs-trac-stream
mailing list