﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
1768	intermittent test_system failure	warner		"The windows buildslave reported a failure in test_system today, in
[changeset:b2dcbbb6], which went away upon a rebuild.

https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/FreeStorm%20WinXP-x86%20py2.6/builds/103

Relevant console output:
{{{
allmydata.test.test_system.SystemTest.test_mutable ... [OK]
(4.703 secs)
allmydata.test.test_system.SystemTest.test_upload_and_download_convergent ... [<twisted.python.failure.Failure <class 'foolscap.ipb.DeadReferenceError'>>]
Traceback (most recent call last):
  File ""c:\python26\lib\site-packages\twisted\internet\defer.py"", line 134, in maybeDeferred
    result = f(*args, **kw)
  File ""c:\buildbot_tahoe\freestorm_winxp-x86_py2_6\build\src\allmydata\util\pollmixin.py"", line 49, in _poll
    self.fail(""Errors snooped, terminating early"")
twisted.trial.unittest.FailTest: Errors snooped, terminating early
[FAILURE]Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0 Reconnector._timer_expired()>
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
[ERROR]
(6.500 secs)
allmydata.test.test_system.SystemTest.test_upload_and_download_random_key ... [OK]
(10.938 secs)

...

===============================================================================
[FAIL]
Traceback (most recent call last):
  File ""c:\python26\lib\site-packages\twisted\internet\defer.py"", line 134, in maybeDeferred
    result = f(*args, **kw)
  File ""c:\buildbot_tahoe\freestorm_winxp-x86_py2_6\build\src\allmydata\util\pollmixin.py"", line 49, in _poll
    self.fail(""Errors snooped, terminating early"")
twisted.trial.unittest.FailTest: Errors snooped, terminating early

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0 Reconnector._timer_expired()>

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
-------------------------------------------------------------------------------
}}}

and corresponding test.log contents:

{{{
2012-06-14 19:38:27.801Z [-] (UDP Port 1279 Closed)
2012-06-14 19:38:27.801Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x035D34E0>
2012-06-14 19:38:28.410Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection
	
2012-06-14 19:38:28.441Z [-] Unhandled error in Deferred:
2012-06-14 19:38:28.441Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
	
2012-06-14 19:38:28.504Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)
	
2012-06-14 19:38:28.504Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2012-06-14 19:38:28.504Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2012-06-14 19:38:28.519Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2012-06-14 19:38:28.519Z [-] (TCP Port 1274 Closed)
2012-06-14 19:38:28.519Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x03DAB8F0>
2012-06-14 19:38:28.519Z [-] (TCP Port 1190 Closed)
}}}

The ""Unhandled Error"" (as opposed to the usual ""Unhandled Error in
Deferred"") means that something called twisted's {{{log.err()}}}
with a Failure or Exception, but nothing else (in particular not a
why= argument). This is a pretty common (perhaps unfortunate)
pattern in our code: tahoe has roughly 47 such calls, and Foolscap
has about 29. So we don't really know what caught the exception
(and therefore what ought to be handling it differently). The
exception itself was a !DeadReferenceError, caused by a connection
being replaced while a remote call was outstanding. And most of the
remote calls were {{{get_version}}}, which we do immediately after
a getReference or connectTo completes (with one extra
{{{RIUploadHelper upload()}}} call). The Dirty Reactor Error
indicates that something failed to shut down a Tub in the error
path.

I'm not really sure what's going on here. It feels like the client
is being bounced while it's in the middle of making some
connections, as if we're not waiting for the client to finish
starting up before we shut it down again. It also feels like there
are some {{{log.err()}}} calls that should be replaced with the
less-serious {{{log.msg()}}}, since they don't really justify
flunking the tests. (Any log.err will flunk a test, unless the test
specifically checks for and discards them).

Throwing a log.err at the end of a Deferred chain when you don't
really care about it anymore is pretty convenient and safe (you
won't be losing information about unexpected errors). But it's
still a sort of ""XXX this point is never reached"" sort of marker.
Having them get invoked during tests might suggest that control is
actually reaching those points.
"	defect	new	normal	undecided	code	1.9.1		test windows heisenbug		
