[tahoe-lafs-trac-stream] [tahoe-lafs] #2017: non-deterministic test hang on OpenBSD

tahoe-lafs trac at tahoe-lafs.org
Wed Aug 28 15:51:58 UTC 2013


#2017: non-deterministic test hang on OpenBSD
------------------------+------------------------------------------------
     Reporter:  zooko   |      Owner:  sickness
         Type:  defect  |     Status:  new
     Priority:  normal  |  Milestone:  soon
    Component:  code    |    Version:  1.10.0
   Resolution:          |   Keywords:  iputil heisenbug openbsd test hang
Launchpad Bug:          |
------------------------+------------------------------------------------
Description changed by daira:

Old description:

> sickness's OpenBSD buildslave showed a test timeout:
>
> {{{
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_runner.RunNode testMethod=test_client_no_noise> (test_client_no_noise) still running at 240.0 secs
>
> allmydata.test.test_runner.RunNode.test_client_no_noise
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
> DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
> <DelayedCall 0x816eb82c [0.00169348716736s] called=0 cancelled=0 LoopingCall<0.01>(RunNode._poll, *(<function _node_has_started at 0x7ff29ed4>, 1373030506.664452), **{})()>
>
> allmydata.test.test_runner.RunNode.test_client_no_noise
> -------------------------------------------------------------------------------
> Ran 1139 tests in 1784.336s
>
> FAILED (skips=15, expectedFailures=3, errors=2, successes=1120)
> }}}
>
> (from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/27)
>
> Rerunning the tests with the exact same build (using Buildbot's "force
> rebuild" feature) resulted in success:
>
> https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28
>
> In that run (build number 28), those tests took only a few seconds:
>
> {{{
>  19.917 seconds: allmydata.test.test_runner.RunNode.test_client
> }}}
> {{{
>  13.758 seconds: allmydata.test.test_runner.RunNode.test_client_no_noise
> }}}
>
> (from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28/steps/test/logs/timings)
>
> So there is a non-deterministic bug that exhibits on sickness's
> buildslave which causes those two tests to hang.
>
> Questions:
>
> 1. Does this happen on any other buildslaves?
>
> 2. Did this ever happen before the recent patches which changed the
> behavior of iputil — [b0883807361830c609dff1677c3cb34fd64d3ebb],
> [f97b8e5e1df75284aa9b89dd830f8728040eab67],
> [08590b1f6a880d51751fdcacea6a007ebc568f2e],
> [16b245563db2f6ca71b9332b06debbe3e1d734b4],
> [b31a4f6e870cb56efa40c785a868a944b964e8b9],
> [a493ee0bb641175ecf918e28fce4d25df15994b6],
> [6104950ed8a7a356eed2218f2df958d074022eea],
> [f77ec470d75f4b8fb81b1abca4ee3b73f1ad8b22],
> [8e31d66cd0b0821ccaa2c7c259e7d6f262ad4738],
> [6a445d73bc5253ec4ae0dec70af02e33bc869cf6]?
>
> ~~I suspect those iputil patches of causing this hang.~~
>
> sickness: could you please run the unit tests from the current trunk
> version repeatedly with trial's {{{--until-failure}}} option?
> {{{./bin/tahoe debug trial --until-failure allmydata.test}}} (See
> [wiki:HowToWriteTests] for more options.) If you can reliably reproduce
> the problem, then would you use git to rewind to before those patches and
> see if that makes the problem go away? Thanks!
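
The rerun-until-failure request above can also be scripted. The following is a minimal sketch, assuming Python 3 (for `subprocess.run`) and a hypothetical helper named `run_until_failure`; neither the helper nor the 300-second outer timeout comes from the ticket. It reruns a command until it fails, and reports separately when a run exceeds the outer timeout, which may help tell a hang from an ordinary test failure.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=50, timeout=300.0):
    """Run cmd repeatedly until it fails or max_runs is reached.

    Returns None if every run exited with status 0, otherwise a tuple
    (run_number, reason) describing the first failing run.
    The name, defaults, and return shape are illustrative assumptions.
    """
    for run in range(1, max_runs + 1):
        try:
            # subprocess.run kills the child if the timeout expires.
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            return run, "hang"
        if result.returncode != 0:
            return run, "exit status %d" % result.returncode
    return None


if __name__ == "__main__":
    # Assumption: narrowing the run to allmydata.test.test_runner; the
    # ticket itself asks for the whole allmydata.test suite.
    command = ["./bin/tahoe", "debug", "trial", "allmydata.test.test_runner"]
    outcome = run_until_failure(command)
    if outcome is None:
        print("no failure reproduced")
    else:
        print("run %d failed: %s" % outcome)
    sys.exit(0 if outcome is None else 1)
```

The same loop could be run at a checkout from before the iputil patches to compare how often the failure reproduces.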

New description:

 sickness's OpenBSD buildslave showed a test timeout:

 {{{
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_runner.RunNode testMethod=test_client_no_noise> (test_client_no_noise) still running at 240.0 secs

 allmydata.test.test_runner.RunNode.test_client_no_noise
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
 DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
 <DelayedCall 0x816eb82c [0.00169348716736s] called=0 cancelled=0 LoopingCall<0.01>(RunNode._poll, *(<function _node_has_started at 0x7ff29ed4>, 1373030506.664452), **{})()>

 allmydata.test.test_runner.RunNode.test_client_no_noise
 -------------------------------------------------------------------------------
 Ran 1139 tests in 1784.336s

 FAILED (skips=15, expectedFailures=3, errors=2, successes=1120)
 }}}

 (from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/27)

 Rerunning the tests with the exact same build (using Buildbot's "force
 rebuild" feature) resulted in success:

 https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28

 In that run (build number 28), those tests took only a few seconds:

 {{{
  19.917 seconds: allmydata.test.test_runner.RunNode.test_client
 }}}
 {{{
  13.758 seconds: allmydata.test.test_runner.RunNode.test_client_no_noise
 }}}

 (from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28/steps/test/logs/timings)

 So there is a non-deterministic bug that exhibits on sickness's buildslave
 which causes those two tests to hang.
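
The leftover DelayedCall in the traceback above shows a LoopingCall with a 0.01-second interval invoking `RunNode._poll` with `_node_has_started` and what appears to be a Unix timestamp. The following is a minimal plain-Python sketch of that general pattern, polling a condition until a deadline. It is not Tahoe-LAFS's or Twisted's code; the names `poll` and `PollTimeout` and all defaults are assumptions made for illustration. It shows why a condition that never becomes true keeps the poller alive until some timeout fires.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition did not become true before the deadline."""


def poll(check, interval=0.01, timeout=10.0,
         clock=time.monotonic, sleep=time.sleep):
    """Call check() every `interval` seconds until it returns a true value.

    Returns the number of attempts made. Raises PollTimeout once `timeout`
    seconds have elapsed with the condition still false. The clock and
    sleep functions are injectable so the loop can be tested without
    real waiting.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if clock() >= deadline:
            raise PollTimeout(
                "condition still false after %.1f s" % timeout)
        sleep(interval)
```

If the poller's own deadline is longer than the test runner's per-test timeout (240 seconds in the log above), the runner's timeout fires first and the poller is still scheduled at that point, which would be consistent with the unclean-reactor error reported alongside the TimeoutError.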

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2017#comment:10>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage