id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
474	uncaught exception in mutable-retrieve: UCW between mapupdate and retrieve	warner		"If a mutable file is modified (by some external uncoordinated writer) after
we've done a servermap update but before we do a retrieve, then the exception
raised by mutable.Retrieve._maybe_send_more_queries
(source:src/allmydata/mutable/retrieve.py#L415) is not caught by the Deferred
chain. As a result, the caller (probably in the middle of a filenode.modify
sequence) waits forever for an answer that never arrives.

The exception that shows up in the twisted log looks like this:
{{{

2008-06-22 00:19:31.726Z [-] Unhandled Error
        Traceback (most recent call last):
          File ""/usr/lib/python2.5/site-packages/foolscap/call.py"", line 667, in _done
            self.request.complete(res)
          File ""/usr/lib/python2.5/site-packages/foolscap/call.py"", line 53, in complete
            self.deferred.callback(res)
          File ""/usr/lib/python2.5/site-packages/twisted/internet/defer.py"", line 239, in callback
            self._startRunCallbacks(result)
          File ""/usr/lib/python2.5/site-packages/twisted/internet/defer.py"", line 304, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File ""/usr/lib/python2.5/site-packages/twisted/internet/defer.py"", line 317, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
          File ""/usr/lib/python2.5/site-packages/allmydata/mutable/retrieve.py"", line 325, in _check_for_done
            return self._maybe_send_more_queries(k)
          File ""/usr/lib/python2.5/site-packages/allmydata/mutable/retrieve.py"", line 415, in _maybe_send_more_queries
            raise err
        allmydata.encode.NotEnoughSharesError: ran out of peers: have 0 shares (k=3), 2 queries in flight, need 1 more, found 0 bad shares, last failure: [Failure instance: Traceback: <class 'allmydata.mutable.common.UncoordinatedWriteError'>: someone wrote to the data since we read the servermap: prefix changed
        /usr/lib/python2.5/site-packages/foolscap/call.py:667:_done
        /usr/lib/python2.5/site-packages/foolscap/call.py:53:complete
        /usr/lib/python2.5/site-packages/twisted/internet/defer.py:239:callback
        /usr/lib/python2.5/site-packages/twisted/internet/defer.py:304:_startRunCallbacks
        --- <exception caught here> ---
        /usr/lib/python2.5/site-packages/twisted/internet/defer.py:317:_runCallbacks
        /usr/lib/python2.5/site-packages/allmydata/mutable/retrieve.py:246:_got_results
        /usr/lib/python2.5/site-packages/allmydata/mutable/retrieve.py:268:_got_results_one_share
        ]
}}}


The Deferred chaining needs to be investigated to make sure that this
exception is properly returned to the caller via the errback on their
Deferred.

In addition, the code in filenode.modify needs to be examined to make sure
that this kind of uncoordinated write error is caught and retried. My concern
is that Retrieve is returning a {{{NotEnoughSharesError}}} that wraps a
{{{UncoordinatedWriteError}}}, rather than the UncoordinatedWriteError
itself, and that the f.trap in modify() might not recognize the wrapped
error.
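
One possible shape for such a check, as a sketch under stated assumptions:
the exception classes are local stand-ins, the last_failure attribute and the
helper names is_uncoordinated_write and modify_with_retry are hypothetical,
and plain try/except is used in place of f.trap to keep the example free of
Twisted. In the real code the wrapped value is a Failure rather than a bare
exception.

```python
class UncoordinatedWriteError(Exception):
    # Local stand-in for allmydata.mutable.common.UncoordinatedWriteError.
    pass


class NotEnoughSharesError(Exception):
    # Local stand-in for allmydata.encode.NotEnoughSharesError. The
    # last_failure attribute is an assumption made for this sketch.
    def __init__(self, msg, last_failure=None):
        Exception.__init__(self, msg)
        self.last_failure = last_failure


def is_uncoordinated_write(exc):
    # True if exc is an UncoordinatedWriteError or wraps one at any depth.
    while exc is not None:
        if isinstance(exc, UncoordinatedWriteError):
            return True
        exc = getattr(exc, 'last_failure', None)
    return False


def modify_with_retry(attempt, max_tries=3):
    # Call attempt() until it succeeds; retry only on a (wrapped) UCWE.
    for tries_left in range(max_tries - 1, -1, -1):
        try:
            return attempt()
        except Exception as e:
            if tries_left == 0 or not is_uncoordinated_write(e):
                raise


calls = []


def flaky_attempt():
    # Fails once with a wrapped UCWE, then succeeds.
    calls.append(1)
    if len(calls) == 1:
        ucwe = UncoordinatedWriteError('prefix changed')
        raise NotEnoughSharesError('ran out of peers', last_failure=ucwe)
    return 'new contents'


result = modify_with_retry(flaky_attempt)
```

A check that only tests for UncoordinatedWriteError directly would re-raise
on the first attempt here; unwrapping first lets the retry proceed.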
"	defect	new	major	soon	code-mutable	1.1.0		mutable upload download hang reliability test-needed ucwe