[tahoe-lafs-trac-stream] [tahoe-lafs] #2017: non-deterministic test hang on OpenBSD
tahoe-lafs
trac at tahoe-lafs.org
Wed Aug 28 15:51:58 UTC 2013
#2017: non-deterministic test hang on OpenBSD
------------------------+------------------------------------------------
     Reporter:  zooko   |      Owner:  sickness
         Type:  defect  |     Status:  new
     Priority:  normal  |  Milestone:  soon
    Component:  code    |    Version:  1.10.0
   Resolution:          |   Keywords:  iputil heisenbug openbsd test hang
Launchpad Bug:          |
------------------------+------------------------------------------------
Description changed by daira:
Old description:
> sickness's OpenBSD buildslave showed a test timeout:
>
> {{{
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: twisted.internet.defer.TimeoutError:
> <allmydata.test.test_runner.RunNode testMethod=test_client_no_noise>
> (test_client_no_noise) still running at 240.0 secs
>
> allmydata.test.test_runner.RunNode.test_client_no_noise
> ===============================================================================
> [ERROR]
> Traceback (most recent call last):
> Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was
> unclean.
> DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to
> debug)
> <DelayedCall 0x816eb82c [0.00169348716736s] called=0 cancelled=0
> LoopingCall<0.01>(RunNode._poll, *(<function _node_has_started at
> 0x7ff29ed4>, 1373030506.664452), **{})()>
>
> allmydata.test.test_runner.RunNode.test_client_no_noise
> -------------------------------------------------------------------------------
> Ran 1139 tests in 1784.336s
>
> FAILED (skips=15, expectedFailures=3, errors=2, successes=1120)
> }}}
>
> (from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/27)
>
> Rerunning the tests with the exact same build (using Buildbot's "force
> rebuild" feature) resulted in success:
>
> https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28
>
> In that run (build number 28), those tests took only a few seconds:
>
> {{{
> 19.917 seconds: allmydata.test.test_runner.RunNode.test_client
> }}}
> {{{
> 13.758 seconds: allmydata.test.test_runner.RunNode.test_client_no_noise
> }}}
>
> (from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28/steps/test/logs/timings)
>
> So there is a non-deterministic bug that exhibits on sickness's
> buildslave which causes those two tests to hang.
>
> Questions:
>
> 1. Does this happen on any other buildslaves?
>
> 2. Did this ever happen before the recent patches which changed the
> behavior of iputil — [b0883807361830c609dff1677c3cb34fd64d3ebb],
> [f97b8e5e1df75284aa9b89dd830f8728040eab67],
> [08590b1f6a880d51751fdcacea6a007ebc568f2e],
> [16b245563db2f6ca71b9332b06debbe3e1d734b4],
> [b31a4f6e870cb56efa40c785a868a944b964e8b9],
> [a493ee0bb641175ecf918e28fce4d25df15994b6],
> [6104950ed8a7a356eed2218f2df958d074022eea],
> [f77ec470d75f4b8fb81b1abca4ee3b73f1ad8b22],
> [8e31d66cd0b0821ccaa2c7c259e7d6f262ad4738],
> [6a445d73bc5253ec4ae0dec70af02e33bc869cf6]?
>
> ~~I suspect those iputil patches of causing this hang.~~
>
> sickness: could you please run the unit tests from the current trunk
> version repeatedly with trial's {{{--until-failure}}} option?
> {{{./bin/tahoe debug trial --until-failure allmydata.test}}} (See
> [wiki:HowToWriteTests] for more options.) If you can reliably reproduce
> the problem, then would you use git to rewind to before those patches and
> see if that makes the problem go away? Thanks!
New description:
sickness's OpenBSD buildslave showed a test timeout:

{{{
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_runner.RunNode testMethod=test_client_no_noise> (test_client_no_noise) still running at 240.0 secs

allmydata.test.test_runner.RunNode.test_client_no_noise
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x816eb82c [0.00169348716736s] called=0 cancelled=0 LoopingCall<0.01>(RunNode._poll, *(<function _node_has_started at 0x7ff29ed4>, 1373030506.664452), **{})()>

allmydata.test.test_runner.RunNode.test_client_no_noise
-------------------------------------------------------------------------------
Ran 1139 tests in 1784.336s

FAILED (skips=15, expectedFailures=3, errors=2, successes=1120)
}}}

(from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/27)

Rerunning the tests with the exact same build (using Buildbot's "force
rebuild" feature) resulted in success:

https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28

In that run (build number 28), those tests took only a few seconds:

{{{
19.917 seconds: allmydata.test.test_runner.RunNode.test_client
}}}
{{{
13.758 seconds: allmydata.test.test_runner.RunNode.test_client_no_noise
}}}

(from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28/steps/test/logs/timings)

So there is a non-deterministic bug that exhibits on sickness's buildslave
which causes those two tests to hang.
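
Since build 27 hung and build 28 of the same revision passed, one way to look for the hang outside Buildbot is to rerun the affected test in a loop with an outer time limit. The following is only a minimal sketch of that idea, not part of Tahoe-LAFS: the helper name `rerun_until_failure` and its parameters are hypothetical, and the 300-second default is an assumption chosen to sit above trial's own 240-second timeout seen in the log.

```python
import subprocess
import time


def rerun_until_failure(cmd, max_runs=50, timeout_secs=300.0):
    """Run cmd repeatedly; stop at the first failure or hang.

    Hypothetical helper for illustration only.
    Returns (run_number, outcome, elapsed_seconds_of_last_run), where
    outcome is "failed", "hung", or "all-passed".
    """
    elapsed = 0.0
    for run in range(1, max_runs + 1):
        start = time.monotonic()
        try:
            # subprocess.run kills the child if the timeout expires.
            result = subprocess.run(cmd, timeout=timeout_secs)
        except subprocess.TimeoutExpired:
            return run, "hung", time.monotonic() - start
        elapsed = time.monotonic() - start
        if result.returncode != 0:
            return run, "failed", elapsed
    return max_runs, "all-passed", elapsed
```

It could be called as `rerun_until_failure(["./bin/tahoe", "debug", "trial", "allmydata.test.test_runner.RunNode.test_client_no_noise"])`; narrowing the run to that single test name is an assumption made here to keep each iteration short. Only the direct child process is killed on timeout, so a node started by the test may need to be cleaned up by hand.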
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2017#comment:10>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage