Changes between Initial Version and Version 2 of Ticket #1336


Timestamp:
2014-08-17T15:05:57Z
Author:
daira
Comment:

  • Ticket #1336

    • Property Owner changed from somebody to daira
    • Property Status changed from new to assigned
  • Ticket #1336 – Description

    initial v2  
    1 1 [source:src/allmydata/test/test_runner.py] includes some tests (in the !RunNode class) for whether node processes can be successfully started and stopped. If stopping the node fails, we don't want the node process to be left running. (On Windows the process would hold open file handles that prevent the _trial_test directory from being deleted, interfering with subsequent test runs -- although currently these tests don't work on Windows anyway, as discussed below.)
    2 2
    3   Currently this is done by writing a file, with the poorly-chosen name "suicide_prevention_hotline", in the node directory. If a node sees this file at startup, it will set a 1-second [http://twistedmatrix.com/documents/10.2.0/api/twisted.application.internet.TimerService.html periodic timer] ([source:src/allmydata/client.py#L154]) that each time it triggers, causes the node process to exit if either the file's mtime is more than 120 seconds ago, or the file no longer exists ([source:src/allmydata/client.py#L440]).
      3 Currently this is done by writing a file, ~~with the poorly-chosen name "suicide_prevention_hotline"~~ called "exit_trigger", in the node directory. If a node sees this file at startup, it will set a 1-second [http://twistedmatrix.com/documents/10.2.0/api/twisted.application.internet.TimerService.html periodic timer] ([source:src/allmydata/client.py#L161]) that each time it triggers, causes the node process to exit if either the file's mtime is more than 120 seconds ago, or the file no longer exists ([source:src/allmydata/client.py#L498]).
    4 4
    5 5 There are several problems with this mechanism:
    6 6
    7 7 * On slow machines, the node process may exit before the test has had a chance to stop it, causing a spurious test failure. This seems to be happening on the '!FranXois lenny-armv5tel' buildbot ([http://tahoe-lafs.org/buildbot/builders/FranXois%20lenny-armv5tel/builds/438/steps/test/logs/stdio example]).
    8 8 * There is no way to distinguish an exit due to this cause from the process being killed or exiting for another reason.
    9   * The name of the file is based on a very poor choice of metaphor, that is both unpleasant and misleading. (The existence of the file doesn't prevent the node from exiting, as the name might imply.)
      9 * ~~The name of the file is based on a very poor choice of metaphor, that is both unpleasant and misleading. (The existence of the file doesn't prevent the node from exiting, as the name might imply.)~~
    10 10
    11 11 In addition, the tests of starting nodes don't work on Windows, because twistd doesn't daemonize or write the pid file on that platform. While that isn't directly due to this mechanism, it would be nice to redesign these tests in a way that does work on Windows (if we're not going to change the Windows behaviour to be more like Unix).
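
For illustration only, the exit condition described above (exit if the trigger file's mtime is more than 120 seconds ago, or the file no longer exists) might be sketched roughly as follows. This is not the code in src/allmydata/client.py: the names `should_exit` and `EXIT_TRIGGER_MAX_AGE` are hypothetical, and the Twisted periodic timer that would call the check once per second is left out so that the sketch stays self-contained.

```python
import os
import time

# 120 seconds is the threshold given in the ticket description.
# The constant name itself is an assumption made for this sketch.
EXIT_TRIGGER_MAX_AGE = 120


def should_exit(trigger_path, now=None, max_age=EXIT_TRIGGER_MAX_AGE):
    """Return True if a node polling trigger_path ought to exit.

    Hypothetical helper: exits are requested when the file is missing
    or when its mtime is more than max_age seconds before `now`.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(trigger_path).st_mtime
    except OSError:
        # The file no longer exists (or cannot be examined).
        return True
    return (now - mtime) > max_age
```

A caller would invoke `should_exit(path)` from whatever periodic mechanism it uses. Passing `now` explicitly makes the check easy to exercise in tests without waiting, although it does not address the slow-machine problem listed above, since the 120-second limit is still fixed.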