#2017 closed defect

non-deterministic test hang on OpenBSD — at Initial Version

Reported by: zooko Owned by: sickness
Priority: normal Milestone: soon
Component: code Version: 1.10.0
Keywords: iputil heisenbug openbsd test hang Cc:
Launchpad Bug:

Description

sickness's OpenBSD buildslave showed a test timeout:

===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_runner.RunNode testMethod=test_client_no_noise> (test_client_no_noise) still running at 240.0 secs

allmydata.test.test_runner.RunNode.test_client_no_noise
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x816eb82c [0.00169348716736s] called=0 cancelled=0 LoopingCall<0.01>(RunNode._poll, *(<function _node_has_started at 0x7ff29ed4>, 1373030506.664452), **{})()>

allmydata.test.test_runner.RunNode.test_client_no_noise
-------------------------------------------------------------------------------
Ran 1139 tests in 1784.336s

FAILED (skips=15, expectedFailures=3, errors=2, successes=1120)

(from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/27)
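
The leftover DelayedCall in that log appears to be a polling loop (a LoopingCall with a 0.01 s interval) that was still waiting for _node_has_started to become true when trial gave up at 240 secs. As a rough illustration of that pattern only, here is a minimal sketch of a poller with an explicit deadline. This is not the code in allmydata.test.test_runner: poll_until, PollTimeout, and their parameters are hypothetical names, and the sketch uses a blocking time.sleep instead of Twisted's LoopingCall.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition does not become true before the deadline."""


def poll_until(check, timeout=5.0, interval=0.01):
    """Call check() every `interval` seconds until it returns a true value.

    Hypothetical helper for illustration only. It raises PollTimeout once
    `timeout` seconds have passed, so the caller never waits forever.
    """
    deadline = time.time() + timeout
    while True:
        if check():
            return True
        if time.time() >= deadline:
            raise PollTimeout("condition still false after %.2f secs" % timeout)
        time.sleep(interval)
```

With a poller like this, a condition that never becomes true surfaces as an explicit PollTimeout from the poller itself rather than as a test-runner timeout plus an unclean reactor.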

Rerunning the tests with the exact same build (using Buildbot's "force rebuild" feature) resulted in success:

https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28

In that run (build number 28), those tests took only a few seconds:

 19.917 seconds: allmydata.test.test_runner.RunNode.test_client
 13.758 seconds: allmydata.test.test_runner.RunNode.test_client_no_noise

(from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28/steps/test/logs/timings)

So there is a non-deterministic bug, seen so far on sickness's buildslave, that causes those two tests to hang.

Questions:

  1. Does this happen on any other buildslaves?
  2. Did this ever happen before the recent patches which changed the behavior of iputil: [b0883807361830c609dff1677c3cb34fd64d3ebb], [f97b8e5e1df75284aa9b89dd830f8728040eab67], [08590b1f6a880d51751fdcacea6a007ebc568f2e], [16b245563db2f6ca71b9332b06debbe3e1d734b4], [b31a4f6e870cb56efa40c785a868a944b964e8b9], [a493ee0bb641175ecf918e28fce4d25df15994b6], [6104950ed8a7a356eed2218f2df958d074022eea], [f77ec470d75f4b8fb81b1abca4ee3b73f1ad8b22], [8e31d66cd0b0821ccaa2c7c259e7d6f262ad4738], [6a445d73bc5253ec4ae0dec70af02e33bc869cf6]?

I suspect those iputil patches of causing this hang.

sickness: could you please run the unit tests from the current trunk repeatedly with trial's --until-failure option?

  ./bin/tahoe debug trial --until-failure allmydata.test

(See HowToWriteTests for more options.) If you can reliably reproduce the problem, could you then use git to rewind to a commit before those patches and see whether the problem goes away? Thanks!
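
If it is more convenient to drive the reruns from a script (for example, to start each run in a fresh process and to record how many runs it took to hit the failure), something like the following sketch might help. It is only an illustration: run_until_failure and max_runs are hypothetical names, not part of Tahoe-LAFS or trial.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=100):
    """Run `cmd` repeatedly; return the 1-based number of the first failing run.

    Returns None if all `max_runs` runs exit with status 0.
    """
    for run in range(1, max_runs + 1):
        returncode = subprocess.call(cmd)
        # Report progress on stderr so it does not mix with the test output.
        sys.stderr.write("run %d exited with status %d\n" % (run, returncode))
        if returncode != 0:
            return run
    return None
```

For example, run_until_failure(["./bin/tahoe", "debug", "trial", "allmydata.test.test_runner"], max_runs=50) would rerun just that test module from the top of a source checkout. Restricting the run to allmydata.test.test_runner is my assumption to keep each iteration short; use allmydata.test to match the command above. The same call can be repeated after checking out a commit from before the iputil patches.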
