[tahoe-lafs-trac-stream] [tahoe-lafs] #1768: intermittent test_system failure

tahoe-lafs trac at tahoe-lafs.org
Tue Mar 19 23:54:21 UTC 2013


#1768: intermittent test_system failure
------------------------+-----------------------
     Reporter:  warner  |      Owner:
         Type:  defect  |     Status:  new
     Priority:  normal  |  Milestone:  undecided
    Component:  code    |    Version:  1.9.1
   Resolution:          |   Keywords:  test
Launchpad Bug:          |
------------------------+-----------------------

Old description:

> The windows buildslave reported a failure in test_system today, in
> [b2dcbbb6/git], which went away upon a rebuild.
>
> https://tahoe-lafs.org/buildbot-tahoe-
> lafs/builders/FreeStorm%20WinXP-x86%20py2.6/builds/103
>
> Relevant console output:
> {{{
> allmydata.test.test_system.SystemTest.test_mutable ... [OK]
> (4.703 secs)
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> ... [<twisted.python.failure.Failure <class
> 'foolscap.ipb.DeadReferenceError'>>]
> Traceback (most recent call last):
>   File "c:\python26\lib\site-packages\twisted\internet\defer.py", line
> 134, in maybeDeferred
>     result = f(*args, **kw)
>   File "c:\buildbot_tahoe\freestorm_winxp-
> x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll
>     self.fail("Errors snooped, terminating early")
> twisted.trial.unittest.FailTest: Errors snooped, terminating early
> [FAILURE]Traceback (most recent call last):
> Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was
> unclean.
> DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to
> debug)
> <DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0
> Reconnector._timer_expired()>
> [ERROR]Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new
> connection
> [ERROR]Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
> [ERROR]Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)
> [ERROR]Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during
> method=RIStorageServer.tahoe.allmydata.com:get_version)
> [ERROR]Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during
> method=RIStorageServer.tahoe.allmydata.com:get_version)
> [ERROR]Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during
> method=RIStorageServer.tahoe.allmydata.com:get_version)
> [ERROR]
> (6.500 secs)
> allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
> ... [OK]
> (10.938 secs)
>
> ...
>
> ===============================================================================
> [FAIL]
> Traceback (most recent call last):
>   File "c:\python26\lib\site-packages\twisted\internet\defer.py", line
> 134, in maybeDeferred
>     result = f(*args, **kw)
>   File "c:\buildbot_tahoe\freestorm_winxp-
> x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll
>     self.fail("Errors snooped, terminating early")
> twisted.trial.unittest.FailTest: Errors snooped, terminating early
>
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was
> unclean.
> DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to
> debug)
> <DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0
> Reconnector._timer_expired()>
>
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new
> connection
>
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
>
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)
>
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during
> method=RIStorageServer.tahoe.allmydata.com:get_version)
>
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
> -------------------------------------------------------------------------------
> }}}
>
> and corresponding test.log contents:
>
> {{{
> 2012-06-14 19:38:27.801Z [-] (UDP Port 1279 Closed)
> 2012-06-14 19:38:27.801Z [-] Stopping protocol
> <twisted.internet.protocol.DatagramProtocol instance at 0x035D34E0>
> 2012-06-14 19:38:28.410Z [-] Unhandled Error
>         Traceback (most recent call last):
>         Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a
> new connection
>
> 2012-06-14 19:38:28.441Z [-] Unhandled error in Deferred:
> 2012-06-14 19:38:28.441Z [-] Unhandled Error
>         Traceback (most recent call last):
>         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
>
> 2012-06-14 19:38:28.504Z [-] Unhandled Error
>         Traceback (most recent call last):
>         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)
>
> 2012-06-14 19:38:28.504Z [-] Unhandled Error
>         Traceback (most recent call last):
>         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during
> method=RIStorageServer.tahoe.allmydata.com:get_version)
>
> 2012-06-14 19:38:28.504Z [-] Unhandled Error
>         Traceback (most recent call last):
>         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during
> method=RIStorageServer.tahoe.allmydata.com:get_version)
>
> 2012-06-14 19:38:28.519Z [-] Unhandled Error
>         Traceback (most recent call last):
>         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
> tubid=ncie) (during
> method=RIStorageServer.tahoe.allmydata.com:get_version)
>
> 2012-06-14 19:38:28.519Z [-] (TCP Port 1274 Closed)
> 2012-06-14 19:38:28.519Z [-] Stopping factory <nevow.appserver.NevowSite
> instance at 0x03DAB8F0>
> 2012-06-14 19:38:28.519Z [-] (TCP Port 1190 Closed)
> }}}
>
> The "Unhandled Error" (as opposed to the usual "Unhandled Error in
> Deferred") means that something called twisted's {{{log.err()}}}
> with a Failure or Exception, but nothing else (in particular not a
> why= argument). This is a pretty common (perhaps unfortunate)
> pattern in our code: tahoe has roughly 47 such calls, and Foolscap
> has about 29. So we don't really know what caught the exception
> (and therefore what ought to be handling it differently). The
> exception itself was a !DeadReferenceError, caused by a connection
> being replaced while a remote call was outstanding. And most of the
> remote calls were {{{get_version}}}, which we do immediately after
> a getReference or connectTo completes (with one extra
> {{{RIUploadHelper upload()}}} call). The Dirty Reactor Error
> indicates that something failed to shut down a Tub in the error
> path.
>
> I'm not really sure what's going on here. It feels like the client
> is being bounced while it's in the middle of making some
> connections, as if we're not waiting for the client to finish
> starting up before we shut it down again. It also feels like there
> are some {{{log.err()}}} calls that should be replaced with the
> less-serious {{{log.msg()}}}, since they don't really justify
> flunking the tests. (Any log.err will flunk a test, unless the test
> specifically checks for and discards them).
>
> Throwing a log.err at the end of a Deferred chain when you don't
> really care about it anymore is pretty convenient and safe (you
> won't be losing information about unexpected errors). But it's
> still a sort of "XXX this point is never reached" sort of marker.
> Having them get invoked during tests might suggest that control is
> actually reaching those points.

New description:

 The windows buildslave reported a failure in test_system today, in
 [b2dcbbb6/git], which went away upon a rebuild.

 https://tahoe-lafs.org/buildbot-tahoe-
 lafs/builders/FreeStorm%20WinXP-x86%20py2.6/builds/103

 Relevant console output:
 {{{
 allmydata.test.test_system.SystemTest.test_mutable ... [OK]
 (4.703 secs)
 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 ... [<twisted.python.failure.Failure <class
 'foolscap.ipb.DeadReferenceError'>>]
 Traceback (most recent call last):
   File "c:\python26\lib\site-packages\twisted\internet\defer.py", line
 134, in maybeDeferred
     result = f(*args, **kw)
   File "c:\buildbot_tahoe\freestorm_winxp-
 x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll
     self.fail("Errors snooped, terminating early")
 twisted.trial.unittest.FailTest: Errors snooped, terminating early
 [FAILURE]Traceback (most recent call last):
 Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was
 unclean.
 DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to
 debug)
 <DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0
 Reconnector._timer_expired()>
 [ERROR]Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new
 connection
 [ERROR]Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
 [ERROR]Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)
 [ERROR]Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)
 [ERROR]Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)
 [ERROR]Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)
 [ERROR]
 (6.500 secs)
 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 ... [OK]
 (10.938 secs)

 ...

 ===============================================================================
 [FAIL]
 Traceback (most recent call last):
   File "c:\python26\lib\site-packages\twisted\internet\defer.py", line
 134, in maybeDeferred
     result = f(*args, **kw)
   File "c:\buildbot_tahoe\freestorm_winxp-
 x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll
     self.fail("Errors snooped, terminating early")
 twisted.trial.unittest.FailTest: Errors snooped, terminating early

 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was
 unclean.
 DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to
 debug)
 <DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0
 Reconnector._timer_expired()>

 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new
 connection

 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)

 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)

 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)

 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
 -------------------------------------------------------------------------------
 }}}

 and corresponding test.log contents:

 {{{
 2012-06-14 19:38:27.801Z [-] (UDP Port 1279 Closed)
 2012-06-14 19:38:27.801Z [-] Stopping protocol
 <twisted.internet.protocol.DatagramProtocol instance at 0x035D34E0>
 2012-06-14 19:38:28.410Z [-] Unhandled Error
         Traceback (most recent call last):
         Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new
 connection

 2012-06-14 19:38:28.441Z [-] Unhandled error in Deferred:
 2012-06-14 19:38:28.441Z [-] Unhandled Error
         Traceback (most recent call last):
         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)

 2012-06-14 19:38:28.504Z [-] Unhandled Error
         Traceback (most recent call last):
         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)

 2012-06-14 19:38:28.504Z [-] Unhandled Error
         Traceback (most recent call last):
         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)

 2012-06-14 19:38:28.504Z [-] Unhandled Error
         Traceback (most recent call last):
         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)

 2012-06-14 19:38:28.519Z [-] Unhandled Error
         Traceback (most recent call last):
         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)

 2012-06-14 19:38:28.519Z [-] (TCP Port 1274 Closed)
 2012-06-14 19:38:28.519Z [-] Stopping factory <nevow.appserver.NevowSite
 instance at 0x03DAB8F0>
 2012-06-14 19:38:28.519Z [-] (TCP Port 1190 Closed)
 }}}

 The "Unhandled Error" (as opposed to the usual "Unhandled Error in
 Deferred") means that something called twisted's {{{log.err()}}}
 with a Failure or Exception, but nothing else (in particular not a
 why= argument). This is a pretty common (perhaps unfortunate)
 pattern in our code: tahoe has roughly 47 such calls, and Foolscap
 has about 29. So we don't really know what caught the exception
 (and therefore what ought to be handling it differently). The
 exception itself was a !DeadReferenceError, caused by a connection
 being replaced while a remote call was outstanding. And most of the
 remote calls were {{{get_version}}}, which we do immediately after
 a getReference or connectTo completes (with one extra
 {{{RIUploadHelper upload()}}} call). The Dirty Reactor Error
 indicates that something failed to shut down a Tub in the error
 path.

 I'm not really sure what's going on here. It feels like the client
 is being bounced while it's in the middle of making some
 connections, as if we're not waiting for the client to finish
 starting up before we shut it down again. It also feels like there
 are some {{{log.err()}}} calls that should be replaced with the
 less-serious {{{log.msg()}}}, since they don't really justify
 flunking the tests. (Any log.err will flunk a test, unless the test
 specifically checks for and discards them).

 Throwing a log.err at the end of a Deferred chain when you don't
 really care about it anymore is pretty convenient and safe (you
 won't be losing information about unexpected errors). But it's
 still a sort of "XXX this point is never reached" sort of marker.
 Having them get invoked during tests might suggest that control is
 actually reaching those points.

--

Comment (by davidsarah):

 Another probably instance of this error on the same buildslave (build
 170):
 {{{
 [FAIL]
 Traceback (most recent call last):
   File "c:\python26\lib\site-packages\twisted\internet\defer.py", line
 134, in maybeDeferred
     result = f(*args, **kw)
   File "c:\buildbot_tahoe\freestorm_winxp-
 x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll
     self.fail("Errors snooped, terminating early")
 twisted.trial.unittest.FailTest: Errors snooped, terminating early

 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
   File "c:\python26\lib\site-packages\twisted\internet\defer.py", line
 134, in maybeDeferred
     result = f(*args, **kw)
   File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib
 \site-packages\foolscap-0.6.4-py2.6.egg\foolscap\referenceable.py", line
 455, in _callRemote
     reqID = broker.newRequestID()
   File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib
 \site-packages\foolscap-0.6.4-py2.6.egg\foolscap\broker.py", line 477, in
 newRequestID
     raise DeadReferenceError("Calling Stale Broker")
 foolscap.ipb.DeadReferenceError: Calling Stale Broker

 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)

 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=iwlr) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)

 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=z3g3) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)

 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to
 tubid=ncie) (during
 method=RIStorageServer.tahoe.allmydata.com:get_version)

 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
   File "c:\python26\lib\site-packages\twisted\python\log.py", line 84, in
 callWithLogger
     return callWithContext({"system": lp}, func, *args, **kw)
   File "c:\python26\lib\site-packages\twisted\python\log.py", line 69, in
 callWithContext
     return context.call({ILogContext: newCtx}, func, *args, **kw)
   File "c:\python26\lib\site-packages\twisted\python\context.py", line
 118, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "c:\python26\lib\site-packages\twisted\python\context.py", line 81,
 in callWithContext
     return func(*args,**kw)
   File "c:\python26\lib\site-packages\twisted\internet\selectreactor.py",
 line 155, in _doReadOrWrite
     self._disconnectSelectable(selectable, why, method=="doRead")
   File "c:\python26\lib\site-packages\twisted\internet\posixbase.py", line
 263, in _disconnectSelectable
     selectable.connectionLost(f)
   File "c:\python26\lib\site-packages\twisted\internet\tcp.py", line 433,
 in connectionLost
     Connection.connectionLost(self, reason)
   File "c:\python26\lib\site-packages\twisted\internet\tcp.py", line 277,
 in connectionLost
     protocol.connectionLost(reason)
   File "c:\python26\lib\site-packages\twisted\protocols\tls.py", line 460,
 in connectionLost
     ProtocolWrapper.connectionLost(self, reason)
   File "c:\python26\lib\site-packages\twisted\protocols\policies.py", line
 123, in connectionLost
     self.wrappedProtocol.connectionLost(reason)
   File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib
 \site-packages\foolscap-0.6.4-py2.6.egg\foolscap\broker.py", line 253, in
 connectionLost
     self.tub.connectionDropped(self)
   File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib
 \site-packages\foolscap-0.6.4-py2.6.egg\foolscap\pb.py", line 1070, in
 connectionDropped
     self._allConnectionsAreDisconnected.fire(self)
   File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib
 \site-packages\foolscap-0.6.4-py2.6.egg\foolscap\observer.py", line 42, in
 fire
     assert not self._fired
 exceptions.AssertionError:

 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
 -------------------------------------------------------------------------------
 }}}

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1768#comment:1>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list