#532 closed defect (worksforme)

occasional failure in iputil (timeout in test_runner): use 'netifaces' package?

Reported by: warner
Owned by: somebody
Priority: minor
Milestone: 1.5.0
Component: code
Version: 1.2.0
Keywords:
Cc:
Launchpad Bug:


I'm seeing very occasional failures in the allmydata.test.test_runner.RunNode.test_introducer test. To reproduce it, in one shell I run:

run_to_death.pl 'make quicktest TEST=allmydata.test.test_runner.RunNode.test_introducer'

(where run_to_death.pl is a little Perl script I've got that just keeps running the same command over and over until the exit status is nonzero)

while in another shell I slow things down by doing python -c "while 1: pass".

This usually fails after about 10 minutes.
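The run_to_death.pl script itself isn't attached to this ticket. As a rough illustration of the loop described above, here is a minimal Python sketch that re-runs a shell command until it exits nonzero. The function name and the max_runs cap are hypothetical additions for illustration, not part of the original script.

```python
import subprocess


def run_to_death(command, max_runs=None):
    """Re-run a shell command until it exits nonzero.

    Returns (runs, exit_status). max_runs is a hypothetical safety cap;
    with the default of None the loop only ends on a failing run.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        runs += 1
        # shell=True so the command can be given as one string,
        # the way it is passed to run_to_death.pl above.
        status = subprocess.call(command, shell=True)
        if status != 0:
            return runs, status
    return runs, 0


# Example call (not executed here):
# run_to_death("make quicktest TEST=allmydata.test.test_runner.RunNode.test_introducer")
```

Run it in one shell while the busy loop runs in another, as described above.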

The actual failure is a timeout. It appears that the iputil.py routine that uses reactor.spawnProcess to run /sbin/ifconfig (to figure out which interfaces are available and therefore what local IP addresses we should advertise) just plain fails: the Deferred never fires. My hunch is that somehow the SIGCHLD handler is broken, so the child process has finished but the parent doesn't notice.

This doesn't happen frequently enough to really worry about, but some day it'd be nice to fix it.

One possibility is to switch to the 'netifaces' package (python-netifaces), which unfortunately includes compiled C code, but which claims to be fairly cross-platform and probably doesn't require spawning a separate command.

Change History (8)

comment:1 Changed at 2008-11-03T22:18:29Z by zooko

So if your hunch is correct then this reveals the existence of a bug in Twisted?

comment:2 Changed at 2008-11-03T23:30:54Z by warner

Seems plausible, yes. A smaller test case (which I don't quite have the time to build right now) would be to just run the /sbin/ifconfig command via reactor.spawnProcess, gathering but mostly ignoring the output, and then see if that can be made to fail.

comment:3 Changed at 2009-01-18T15:42:12Z by zooko

Should trial --until-failure allmydata.test.test_runner.RunNode.test_introducer also trigger the bug, then?

I'm running trial --until-failure pyutil.test.test_iputil.

comment:4 Changed at 2009-01-18T17:35:44Z by zooko

trial --until-failure pyutil.test.test_iputil wasn't able to reproduce this failure after about an hour of running. I'll try Brian's script next.

comment:5 Changed at 2009-01-18T17:37:55Z by zooko

Okay, now I'm running this script:

time ( /bin/true; while [ $? = 0 ] ; do trial pyutil.test.test_iputil; done ) &> x.txt

comment:6 Changed at 2009-01-19T02:45:01Z by zooko

Okay, I let that script run all day and it didn't fail, even though the workstation (yukyuk) was loaded down with other jobs at the same time.

comment:7 Changed at 2009-06-21T20:22:02Z by warner

  • Milestone changed from undecided to 1.5.0
  • Resolution set to worksforme
  • Status changed from new to closed

I ran this test in a loop on a loaded box for a while and it didn't fail either, so maybe it's been fixed in whatever new version of Twisted I'm using now. It sounds like we can let this one go. Closing as "works for me".

comment:8 Changed at 2012-12-16T15:22:50Z by davidsarah

Note that there definitely are nondeterministic bugs due to how we spawn the command for iputil; see #1381. I think that bug would not cause a timeout, though.
