[tahoe-lafs-trac-stream] [tahoe-lafs] #1336: improve the mechanism that causes test nodes to exit even if not successfully stopped
tahoe-lafs
trac at tahoe-lafs.org
Tue Jan 25 01:36:25 UTC 2011
#1336: improve the mechanism that causes test nodes to exit even if not
successfully stopped
--------------------------+-------------------------------------------------
Reporter: davidsarah | Owner: somebody
Type: defect | Status: new
Priority: major | Milestone: undecided
Component: code | Version: 1.8.1
Keywords: cleanup test | Launchpad Bug:
--------------------------+-------------------------------------------------
[source:src/allmydata/test/test_runner.py] includes some tests (in the
!RunNode class) for whether node processes can be successfully started and
stopped. If stopping the node fails, we don't want the node process to be
left running. (On Windows the process would hold open file handles that
prevent the _trial_test directory from being deleted, interfering with
subsequent test runs -- although currently these tests don't work on
Windows anyway, as discussed below.)
Currently this is done by writing a file, with the poorly-chosen name
"suicide_prevention_hotline", in the node directory. If a node sees this
file at startup, it will set a 1-second
[http://twistedmatrix.com/documents/10.2.0/api/twisted.application.internet.TimerService.html
periodic timer] ([source:src/allmydata/client.py#L154]). Each time the timer
fires, the node process exits if either the file's mtime is more than 120
seconds ago or the file no longer exists
([source:src/allmydata/client.py#L440]).
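The check described above can be sketched roughly as follows. This is a minimal illustration written for Python 3 using only the standard library, not the code in client.py: the function name `should_exit`, its parameters, and the constant `MAX_AGE_SECONDS` are hypothetical, and only the 120-second threshold and the "missing or stale file" rule come from the description above. In the real node the check would be driven by the 1-second periodic timer.

```python
import os
import time

# Threshold taken from the ticket text; the name is hypothetical.
MAX_AGE_SECONDS = 120


def should_exit(hotline_path, now=None, max_age=MAX_AGE_SECONDS):
    """Return True if the node should exit.

    The node exits when the marker file no longer exists, or when its
    mtime is more than max_age seconds before 'now'.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(hotline_path).st_mtime
    except FileNotFoundError:
        # The file has been removed: treat this as a request to exit.
        return True
    return (now - mtime) > max_age
```

A test that wants to keep the node alive would have to refresh the file's mtime (for example with `os.utime`) at least once every 120 seconds. On a slow machine that window may close before the test reaches the point where it stops the node.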
There are several problems with this mechanism:
* On slow machines, the node process may exit before the test has had a
chance to stop it, causing a spurious test failure. This seems to be
happening on the '!FranXois lenny-armv5tel' buildbot
([http://tahoe-lafs.org/buildbot/builders/FranXois%20lenny-armv5tel/builds/438/steps/test/logs/stdio
example]).
* There is no way to distinguish an exit due to this cause from the
process being killed or exiting for another reason.
* The name of the file is based on a very poor choice of metaphor that is
both unpleasant and misleading. (The existence of the file doesn't prevent
the node from exiting, as the name might imply.)
In addition, the tests of starting nodes don't work on Windows, because
twistd doesn't daemonize or write the pid file on that platform. While
that isn't directly due to this mechanism, it would be nice to redesign
these tests in a way that does work on Windows (if we're not going to
change the Windows behaviour to be more like Unix).
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1336>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage