Opened at 2010-12-10T20:48:02Z
Last modified at 2014-12-04T15:04:33Z
#1278 new defect
gateway won't serve any page; variety of interesting error messages in twistd.log — at Version 6
Reported by: | zooko | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | soon |
Component: | code | Version: | 1.8.0 |
Keywords: | reliability availability error hang heisenbug wui anti-censorship test-needed | Cc: | |
Launchpad Bug: |
Description (last modified by zooko)
Some servers on the public test grid are misbehaving -- I can tell because updating my blog takes about ten minutes sometimes. Having updated my blog a few times this morning, my gateway got into a broken state where it will not even load the recent uploads and downloads page...
Oh, wait, now it works again. Hm, I will attach twistd.log, and here are a few of the interesting messages:
2010-12-08 22:48:44-0800 [HTTPChannel,8,127.0.0.1] Unhandled Error Traceback (most recent call last): Failure: allmydata.immutable.downloader.common.DownloadStopped: our Consumer called stopProducing() 2010-12-10 10:21:11-0800 [-] Unhandled Error Traceback (most recent call last): File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response return self._loop() File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 435, in _loop self.contacted_peers2) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 320, in addBoth callbackKeywords=kw, errbackKeywords=kw) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 286, in addCallbacks self._runCallbacks() --- <exception caught here> --- File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response return self._loop() File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 458, in _loop return self._failed(msg) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 561, in _failed raise UploadUnhappinessError(msg) allmydata.interfaces.UploadUnhappinessError: peer selection failed for <Tahoe2PeerSelector for upload kcaaq>: shares could be placed on only 4 server(s) such that any 3 of them have enough share s to recover the file, but we were asked to place shares on at least 5 such servers. (placed 9 shares out of 10 total (1 homeless), want to place shares on at least 5 servers such that any 3 of them hav e enough shares to recover the file, sent 13 queries to 11 peers, 2 queries placed some shares, 11 placed none (of which 4 placed none due to the server being full and 7 placed none due to an error)) (l ast failure (from <PeerTracker for peer nszizgf5 and SI kcaaq>) was: [Failure instance: Traceback: <class 'foolscap.ipb.DeadReferenceError'>: Calling Stale Broker /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:546:_got_response /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:433:_loop /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:119:query /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/referenceable.py:408:callRemote --- <exception caught here> --- /Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:133:maybeDeferred /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/referenceable.py:447:_callRemote /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/broker.py:468:newRequestID ]) 2010-12-10 10:21:12-0800 [-] Exception rendering error page: 2010-12-10 10:21:12-0800 [-] Unhandled Error Traceback (most recent call last): File "/Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/call.py", line 95, in fail self.deferred.errback(why) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 388, in errback self._startRunCallbacks(fail) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 455, in _startRunCallbacks self._runCallbacks() File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) --- <exception caught here> --- File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 82, in processingFailed handler.renderHTTP_exception(ctx, reason) File "/Library/Python/2.6/site-packages/allmydata/web/common.py", line 278, in renderHTTP_exception return super.renderHTTP_exception(self, ctx, f) File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 61, in renderHTTP_exception request.finishRequest( False ) File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 176, in finishRequest server.Request.finish(self) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/web/http.py", line 900, in finish "Request.finish called on a request after its connection was lost; " exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this. 2010-12-10 10:21:12-0800 [-] 'Original exception:'
2010-12-10 11:26:14-0800 [-] Unhandled Error Traceback (most recent call last): File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 286, in addCallbacks self._runCallbacks() File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 694, in <lambda> self._read_encrypted(length, ciphertext, hash_only, d2)) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 709, in _read_encrypted d = defer.maybeDeferred(self.original.read, size) --- <exception caught here> --- File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 133, in maybeDeferred result = f(*args, **kw) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 1362, in read return defer.succeed([self._filehandle.read(length)]) exceptions.ValueError: I/O operation on closed file
There is one flog incident file. I'll attach it.
Change History (8)
Changed at 2010-12-10T20:49:55Z by zooko
Changed at 2010-12-10T20:51:02Z by zooko
comment:1 Changed at 2010-12-16T01:11:02Z by davidsarah
- Keywords anti-censorship added
comment:2 Changed at 2011-02-21T06:00:09Z by davidsarah
- Keywords test-needed added
- Milestone changed from undecided to 1.9.0
comment:3 Changed at 2011-02-21T06:14:31Z by warner
Hm, the #1366 patch would probably address the second weird message. But I don't think it could have been causing the web server to be unavailable forever. So yeah, let's leave this ticket open a bit longer until we understand what was going on.
comment:4 Changed at 2011-10-13T17:05:50Z by warner
- Milestone changed from 1.9.0 to 1.10.0
not making it into 1.9
comment:5 Changed at 2012-01-22T00:17:27Z by nejucomo
comment:6 Changed at 2014-11-28T15:50:09Z by zooko
- Description modified (diff)
I have a consulting client (codename "WAG") who is reporting an issue with very similar symptoms. They say that they have hundreds of thousands of people inconvenienced by this failure, so it is urgent for them to figure out what happened.
Possibly fixed by the patch on #1366. However, to close this ticket we should have a reproducible test.