#1278 new defect

gateway won't serve any page; variety of interesting error messages in twistd.log — at Version 6

Reported by: zooko Owned by: somebody
Priority: major Milestone: soon
Component: code Version: 1.8.0
Keywords: reliability availability error hang heisenbug wui anti-censorship test-needed Cc:
Launchpad Bug:

Description (last modified by zooko)

Some servers on the public test grid are misbehaving -- I can tell because updating my blog takes about ten minutes sometimes. Having updated my blog a few times this morning, my gateway got into a broken state where it will not even load the recent uploads and downloads page...

Oh, wait, now it works again. Hm, I will attach twistd.log, and here are a few of the interesting messages:

2010-12-08 22:48:44-0800 [HTTPChannel,8,127.0.0.1] Unhandled Error
        Traceback (most recent call last):
        Failure: allmydata.immutable.downloader.common.DownloadStopped: our Consumer called stopProducing()
        
2010-12-10 10:21:11-0800 [-] Unhandled Error
        Traceback (most recent call last):
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response
            return self._loop()
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 435, in _loop
            self.contacted_peers2)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 320, in addBoth
            callbackKeywords=kw, errbackKeywords=kw)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 286, in addCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response
            return self._loop()
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 458, in _loop
            return self._failed(msg)
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 561, in _failed
            raise UploadUnhappinessError(msg)
        allmydata.interfaces.UploadUnhappinessError: peer selection failed for <Tahoe2PeerSelector for upload kcaaq>: shares could be placed on only 4 server(s) such that any 3 of them have enough share
s to recover the file, but we were asked to place shares on at least 5 such servers. (placed 9 shares out of 10 total (1 homeless), want to place shares on at least 5 servers such that any 3 of them hav
e enough shares to recover the file, sent 13 queries to 11 peers, 2 queries placed some shares, 11 placed none (of which 4 placed none due to the server being full and 7 placed none due to an error)) (l
ast failure (from <PeerTracker for peer nszizgf5 and SI kcaaq>) was: [Failure instance: Traceback: <class 'foolscap.ipb.DeadReferenceError'>: Calling Stale Broker
        /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:546:_got_response
        /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:433:_loop
        /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:119:query
        /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/referenceable.py:408:callRemote
        --- <exception caught here> ---
        /Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:133:maybeDeferred
        /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/referenceable.py:447:_callRemote
        /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/broker.py:468:newRequestID
        ])
        
2010-12-10 10:21:12-0800 [-] Exception rendering error page:
2010-12-10 10:21:12-0800 [-] Unhandled Error
        Traceback (most recent call last):
          File "/Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/call.py", line 95, in fail
            self.deferred.errback(why)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 388, in errback
            self._startRunCallbacks(fail)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 455, in _startRunCallbacks
            self._runCallbacks()
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
        --- <exception caught here> ---
          File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 82, in processingFailed
            handler.renderHTTP_exception(ctx, reason)
          File "/Library/Python/2.6/site-packages/allmydata/web/common.py", line 278, in renderHTTP_exception
            return super.renderHTTP_exception(self, ctx, f)
          File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 61, in renderHTTP_exception
            request.finishRequest( False )
          File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 176, in finishRequest
            server.Request.finish(self)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/web/http.py", line 900, in finish
            "Request.finish called on a request after its connection was lost; "
        exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.
        
2010-12-10 10:21:12-0800 [-] 'Original exception:'
2010-12-10 11:26:14-0800 [-] Unhandled Error
        Traceback (most recent call last):
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 286, in addCallbacks
            self._runCallbacks()
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 694, in <lambda>
            self._read_encrypted(length, ciphertext, hash_only, d2))
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 709, in _read_encrypted
            d = defer.maybeDeferred(self.original.read, size)
        --- <exception caught here> ---
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 133, in maybeDeferred
            result = f(*args, **kw)
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 1362, in read
            return defer.succeed([self._filehandle.read(length)])
        exceptions.ValueError: I/O operation on closed file
        

There is one flog incident file. I'll attach it.

Change History (8)

Changed at 2010-12-10T20:49:55Z by zooko

Changed at 2010-12-10T20:51:02Z by zooko

comment:1 Changed at 2010-12-16T01:11:02Z by davidsarah

  • Keywords anti-censorship added

comment:2 Changed at 2011-02-21T06:00:09Z by davidsarah

  • Keywords test-needed added
  • Milestone changed from undecided to 1.9.0

Possibly fixed by the patch on #1366. However, to close this ticket we should have a reproducible test.

Last edited at 2011-02-21T06:00:37Z by davidsarah (previous) (diff)

comment:3 Changed at 2011-02-21T06:14:31Z by warner

Hm, the #1366 patch would probably address the second weird message. But I don't think it could have been causing the web server to be unavailable forever. So yeah, let's leave this ticket open a bit longer until we understand what was going on.

comment:4 Changed at 2011-10-13T17:05:50Z by warner

  • Milestone changed from 1.9.0 to 1.10.0

not making it into 1.9

comment:5 Changed at 2012-01-22T00:17:27Z by nejucomo

I've seen similar behavior with respect to the second exception, but suspect my symptoms are distinct from #1366, so I filed #1664.

I have not seen the UploadUnhappinessError. I am using an LAE account where there N = K = 1.

comment:6 Changed at 2014-11-28T15:50:09Z by zooko

  • Description modified (diff)

I have a consulting client (codename "WAG") who is reporting an issue with very similar symptoms. They say that they have hundreds of thousands of people inconvenienced by this failure, so it is urgent for them to figure out what happened.

Note: See TracTickets for help on using tickets.