id summary reporter owner description type status priority milestone component version resolution keywords cc launchpad_bug
540 "inappropriate ""uncoordinated write error"" after handling a server failure" warner kevan "I noticed the automated ""speedtest"" failing with an unexpected Uncoordinated Write Error for the past few days. There were several issues involved, but the one for this ticket is as follows:

 * mutable publish assigns shares to servers, sends out requests. Let's say that share 1 goes to server A, and share 2 goes to server B.
 * for whatever reason, server A returns an error
 * the publish process must find a new server for share 1, say it picks B
 * the publish process sends a readv-and-testv-and-writev for share 1 to server B
 * '''but''', it uses the same test vector that it used for the first request (the one that wrote share 2), which includes a clause that says ""the server should not have any unknown shares"". This probably only hits when we're first creating the mutable file.
 * server B receives the request for share 2, and accepts it, and responds with success
 * server B then receives the request for share 1, looks at the test vector, says ""hey, but I already have a share (i.e. share 2)"", so the test vector does not match, so the write is rejected
 * the publish process sees the rejected write and concludes that someone else must have written a share at the same time, so it throws Uncoordinated Write Error

So really the sole publisher is colliding with themselves.

I think the fix would be to have the publisher keep track of which share requests it has sent, perhaps in the servermap (as ""pending writes"", or ""proposed writes""). When the second writev request is generated, it should build a test vector based upon the pending write (so it includes share2).
" defect new normal soon code-mutable 1.2.0 availability upload ucwe test-needed
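
The self-collision described in this ticket, and the proposed "pending writes" fix, can be illustrated with a small toy model. This is only a sketch under loud assumptions: `ToyServer`, `ToyPublisher`, `test_and_write`, and `publish_new_file` are hypothetical names invented for illustration and are not Tahoe-LAFS APIs. The real test vector compares byte ranges of individual shares; here it is reduced to "the set of share numbers the publisher expects the server to hold", and requests are delivered synchronously and in order.

```python
class UncoordinatedWriteError(Exception):
    """Raised when a test-and-set write is rejected by a server."""


class ToyServer:
    """Hypothetical stand-in for a storage server holding one mutable slot."""

    def __init__(self, name):
        self.name = name
        self.shares = {}  # share number -> share data

    def test_and_write(self, expected_shnums, shnum, data):
        # Simplified test vector: accept the write only if the shares we
        # hold are exactly the ones the publisher says it expects.
        if set(self.shares) != set(expected_shnums):
            return False
        self.shares[shnum] = data
        return True


class ToyPublisher:
    """Hypothetical publisher; track_pending=True models the proposed fix."""

    def __init__(self, track_pending):
        self.track_pending = track_pending
        self.pending = {}  # server name -> share numbers already sent there

    def send(self, server, shnum, data):
        if self.track_pending:
            # Proposed fix: build the test vector from our own pending writes.
            expected = set(self.pending.get(server.name, set()))
        else:
            # Behaviour described in the ticket: reuse the original test
            # vector, which says "the server should not have any shares".
            expected = set()
        if not server.test_and_write(expected, shnum, data):
            raise UncoordinatedWriteError(
                "server %s rejected share %d" % (server.name, shnum))
        self.pending.setdefault(server.name, set()).add(shnum)


def publish_new_file(track_pending):
    """Replay the ticket's scenario: server A failed, share 1 moves to B."""
    server_b = ToyServer("B")
    publisher = ToyPublisher(track_pending)
    publisher.send(server_b, 2, b"share 2")  # original assignment to B
    publisher.send(server_b, 1, b"share 1")  # reassigned after A's error
    return sorted(server_b.shares)
```

With `track_pending=False` the second `send` raises `UncoordinatedWriteError` even though there is only one writer; with `track_pending=True` both shares land on server B. A real fix would also need to handle pending writes that are later rejected or never acknowledged, which this sketch ignores.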