[tahoe-lafs-trac-stream] [tahoe-lafs] #2017: non-deterministic test hang on OpenBSD

tahoe-lafs trac at tahoe-lafs.org
Sat Jul 6 15:16:56 UTC 2013


#2017: non-deterministic test hang on OpenBSD
--------------------------------------+---------------------------
 Reporter:  zooko                     |          Owner:  sickness
     Type:  defect                    |         Status:  new
 Priority:  normal                    |      Milestone:  undecided
Component:  code                      |        Version:  1.10.0
 Keywords:  iputil heisenbug openbsd  |  Launchpad Bug:
--------------------------------------+---------------------------
 sickness's !OpenBSD buildslave showed a test timeout:

 {{{
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_runner.RunNode testMethod=test_client_no_noise> (test_client_no_noise) still running at 240.0 secs

 allmydata.test.test_runner.RunNode.test_client_no_noise
 ===============================================================================
 [ERROR]
 Traceback (most recent call last):
 Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
 DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
 <DelayedCall 0x816eb82c [0.00169348716736s] called=0 cancelled=0 LoopingCall<0.01>(RunNode._poll, *(<function _node_has_started at 0x7ff29ed4>, 1373030506.664452), **{})()>

 allmydata.test.test_runner.RunNode.test_client_no_noise
 -------------------------------------------------------------------------------
 Ran 1139 tests in 1784.336s

 FAILED (skips=15, expectedFailures=3, errors=2, successes=1120)
 }}}

 (from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/27)

 Rerunning the tests with the exact same build (using Buildbot's "force
 rebuild" feature) resulted in success:

 https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28

 In that run (build number 28), those tests each took under 20 seconds:

 {{{
  19.917 seconds: allmydata.test.test_runner.RunNode.test_client
 }}}
 {{{
  13.758 seconds: allmydata.test.test_runner.RunNode.test_client_no_noise
 }}}

 (from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28/steps/test/logs/timings)

 So there is a non-deterministic bug that shows up on sickness's buildslave
 and causes those two tests to hang.

 Questions:

 1. Does this happen on any other buildslaves?

 2. Did this ever happen before the recent patches which changed the
 behavior of iputil — [b0883807361830c609dff1677c3cb34fd64d3ebb],
 [f97b8e5e1df75284aa9b89dd830f8728040eab67],
 [08590b1f6a880d51751fdcacea6a007ebc568f2e],
 [16b245563db2f6ca71b9332b06debbe3e1d734b4],
 [b31a4f6e870cb56efa40c785a868a944b964e8b9],
 [a493ee0bb641175ecf918e28fce4d25df15994b6],
 [6104950ed8a7a356eed2218f2df958d074022eea],
 [f77ec470d75f4b8fb81b1abca4ee3b73f1ad8b22],
 [8e31d66cd0b0821ccaa2c7c259e7d6f262ad4738],
 [6a445d73bc5253ec4ae0dec70af02e33bc869cf6]?

 I suspect those iputil patches of causing this hang.

 sickness: could you please run the unit tests from the current trunk
 version repeatedly with trial's {{{--until-failure}}} option?
 {{{./bin/tahoe debug trial --until-failure allmydata.test}}} (See
 [wiki:HowToWriteTests] for more options.) If you can reliably reproduce
 the problem, could you then use git to check out a revision from before
 those patches and see whether the problem goes away? Thanks!
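
 One way to narrow this down is to time repeated runs of only the hanging
 test. The following is a minimal sketch, not part of Tahoe-LAFS:
 {{{run_until_failure}}} is a hypothetical helper, the 300-second limit is
 an arbitrary assumption chosen to sit above trial's 240-second test
 timeout, and the script itself needs Python 3.3 or later for
 {{{subprocess.call(..., timeout=...)}}}, independent of the Python that
 runs the tests.

```python
import subprocess
import time


# Hypothetical helper (not part of Tahoe-LAFS): rerun a command, record the
# wall-clock time of each run, and stop at the first failure or timeout.
def run_until_failure(cmd, max_runs=50, timeout=300.0):
    """Return (failing_attempt_or_None, list_of_durations_in_seconds)."""
    durations = []
    for attempt in range(1, max_runs + 1):
        start = time.monotonic()
        try:
            # subprocess.call kills the child and re-raises on timeout.
            returncode = subprocess.call(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            returncode = None  # treated as a failure below
        elapsed = time.monotonic() - start
        durations.append(elapsed)
        print("run %d: %.1f s, exit status %r" % (attempt, elapsed, returncode))
        if returncode != 0:
            return attempt, durations
    # Every run exited with status 0.
    return None, durations
```

 For example,
 {{{run_until_failure(["./bin/tahoe", "debug", "trial", "allmydata.test.test_runner.RunNode.test_client_no_noise"])}}}
 would rerun only the test that timed out in build 27, assuming trial
 accepts that fully qualified test name here as it normally does. The
 recorded durations make it easy to see whether slow runs precede a hang.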

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2017>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

