[tahoe-lafs-trac-stream] [tahoe-lafs] #540: inappropriate "uncoordinated write error" after handling a server failure
tahoe-lafs
trac at tahoe-lafs.org
Tue Nov 13 23:27:23 UTC 2012
#540: inappropriate "uncoordinated write error" after handling a server failure
-------------------------+-------------------------------------------------
Reporter: warner         | Owner: kevan
Type: defect             | Status: new
Priority: normal         | Milestone: soon
Component: code-mutable  | Version: 1.2.0
Resolution:              | Keywords: availability upload ucwe test-needed
Launchpad Bug:           |
-------------------------+-------------------------------------------------
Changes (by zooko):
* priority: critical => normal
Old description:
> I noticed the automated "speedtest" failing with an unexpected
> Uncoordinated Write Error for the past few days. There were several
> issues involved, but the one for this ticket is as follows:
>
> * mutable publish assigns shares to servers, sends out requests. Let's
> say that share 1 goes to server A, and share 2 goes to server B.
> * for whatever reason, server A returns an error
> * the publish process must find a new server for share 1, say it picks B
> * the publish process sends a readv-and-testv-and-writev for share 1 to
> server B
> * '''but''', it uses the same test vector that it used for the first
> request (the one that wrote share 2), which includes a clause that says
> "the server should not have any unknown shares". This probably only hits
> when we're first creating the mutable file.
> * server B receives the request for share 2, and accepts it, and
> responds with success
> * server B then receives the request for share 1, looks at the test
> vector, says "hey, but I already have a share (i.e. share 2)", so the
> test vector does not match, so the write is rejected
> * the publish process sees the rejected write and concludes that someone
> else must have written a share at the same time, so it throws
> Uncoordinated Write Error
>
> So really the sole publisher is colliding with themselves.
>
> I think the fix would be to have the publisher keep track of which share
> requests it has sent, perhaps in the servermap (as "pending writes", or
> "proposed writes"). When the second writev request is generated, it
> should build a test vector based upon the pending write (so it includes
> share2).
New description:
I noticed the automated "speedtest" failing with an unexpected
Uncoordinated Write Error for the past few days. There were several issues
involved, but the one for this ticket is as follows:
 * mutable publish assigns shares to servers and sends out requests. Let's
   say that share 1 goes to server A, and share 2 goes to server B.
 * for whatever reason, server A returns an error
 * the publish process must find a new server for share 1; say it picks B
 * the publish process sends a readv-and-testv-and-writev for share 1 to
   server B
 * '''but''' it uses the same test vector that it used for the first
   request (the one that wrote share 2), which includes a clause that says
   "the server should not have any unknown shares". This probably only hits
   when we're first creating the mutable file.
 * server B receives the request for share 2, accepts it, and responds
   with success
 * server B then receives the request for share 1, looks at the test
   vector, and says "hey, but I already have a share (i.e. share 2)", so the
   test vector does not match and the write is rejected
 * the publish process sees the rejected write, concludes that someone
   else must have written a share at the same time, and throws
   Uncoordinated Write Error
So really the sole publisher is colliding with itself.
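The sequence above can be reproduced with a toy model. This is only a sketch: `ToyServer`, `test_and_write`, and `publish_with_stale_test_vector` are hypothetical names, not the Tahoe-LAFS storage API, and the real test vector is reduced here to "the server holds exactly the shares I expect".

```python
# Toy model only: ToyServer is a hypothetical stand-in for a storage server,
# not the real Tahoe-LAFS interface.

class ToyServer:
    def __init__(self):
        self.shares = {}  # share number -> share data

    def test_and_write(self, expected_shnums, shnum, data):
        """Write share `shnum` only if the server currently holds exactly
        the share numbers in `expected_shnums` (a simplified test vector)."""
        if set(self.shares) != set(expected_shnums):
            return False  # test vector mismatch: write rejected
        self.shares[shnum] = data
        return True


def publish_with_stale_test_vector():
    server_b = ToyServer()
    # Creating a new mutable file: the publisher expects no shares anywhere.
    initial_expectation = set()

    # Share 2 was assigned to B from the start, and B accepts it.
    share2_ok = server_b.test_and_write(initial_expectation, 2, b"share 2")

    # Server A failed, so share 1 is reassigned to B. The publisher reuses
    # the original expectation ("no shares here"), which is now stale.
    share1_ok = server_b.test_and_write(initial_expectation, 1, b"share 1")
    return share2_ok, share1_ok
```

In this model the second write is rejected even though no other writer exists, which is the condition the publisher misreads as an uncoordinated write.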
I think the fix would be to have the publisher keep track of which share
requests it has sent, perhaps in the servermap (as "pending writes", or
"proposed writes"). When the second writev request is generated, it should
build a test vector based upon the pending write (so it includes share 2).
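A minimal sketch of that idea, under the same simplifying assumptions as the ticket's scenario: `ToyServer`, `ToyPublisher`, `pending`, and `write_share` are hypothetical names for illustration and do not describe the actual publisher or servermap code. The test vector is again reduced to "the server holds exactly the shares I expect", and requests are modelled as synchronous calls.

```python
# Toy model only: none of these names come from the Tahoe-LAFS codebase.

class ToyServer:
    def __init__(self):
        self.shares = {}  # share number -> share data

    def test_and_write(self, expected_shnums, shnum, data):
        """Write share `shnum` only if the server holds exactly the share
        numbers in `expected_shnums` (a simplified test vector)."""
        if set(self.shares) != set(expected_shnums):
            return False
        self.shares[shnum] = data
        return True


class ToyPublisher:
    def __init__(self):
        # "Pending writes": server id -> share numbers already sent there.
        self.pending = {}

    def write_share(self, server_id, server, shnum, data):
        sent = self.pending.setdefault(server_id, set())
        # Build the expectation from our own earlier writes to this server,
        # instead of reusing the expectation from before any write was sent.
        expected = set(sent)
        sent.add(shnum)
        ok = server.test_and_write(expected, shnum, data)
        if not ok:
            sent.discard(shnum)  # the write did not land; forget it
        return ok


def demo():
    server_b = ToyServer()
    publisher = ToyPublisher()
    first = publisher.write_share("B", server_b, 2, b"share 2")
    second = publisher.write_share("B", server_b, 1, b"share 1")
    # A different writer places a share behind the publisher's back.
    server_b.shares[7] = b"foreign share"
    third = publisher.write_share("B", server_b, 3, b"share 3")
    return first, second, third, sorted(server_b.shares)
```

With the pending writes folded into the expectation, the publisher's own second write to B is accepted, while a share placed by a different writer still causes a rejection, so genuine uncoordinated writes would remain detectable in this model.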
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/540#comment:14>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage