[tahoe-lafs-trac-stream] [tahoe-lafs] #532: occasional failure in iputil (timeout in test_runner): use 'netifaces' package?
tahoe-lafs
trac at tahoe-lafs.org
Sun Dec 16 15:22:51 UTC 2012
#532: occasional failure in iputil (timeout in test_runner): use 'netifaces'
package?
----------------------------+----------------------
Reporter: warner | Owner: somebody
Type: defect | Status: closed
Priority: minor | Milestone: 1.5.0
Component: code | Version: 1.2.0
Resolution: worksforme | Keywords:
Launchpad Bug: |
----------------------------+----------------------
New description:
I'm seeing very occasional failures in the
{{{allmydata.test.test_runner.RunNode.test_introducer}}} test. To
reproduce it, in one shell I run:

{{{
run_to_death.pl 'make quicktest TEST=allmydata.test.test_runner.RunNode.test_introducer'
}}}

(where run_to_death.pl is a little perl script I've got to just keep
running the same command over and over again until the exit status is
nonzero)

while in another shell I slow things down by doing
{{{python -c "while 1: pass"}}}.

This usually fails after about 10 minutes.
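
For readers without the Perl script, the rerun-until-failure loop can be sketched in Python. This is only an approximation of what run_to_death.pl is described as doing; the function name, the `max_runs` cap, and the progress message are assumptions, not part of the original script.

```python
import shlex
import subprocess
import sys


def run_to_death(command, max_runs=None):
    """Run `command` repeatedly until it exits with a nonzero status.

    `command` is a string (split with shlex) or an argument list.
    Returns (number_of_runs, last_exit_status). `max_runs` is an optional
    safety cap added for this sketch.
    """
    args = shlex.split(command) if isinstance(command, str) else list(command)
    runs = 0
    status = 0
    while max_runs is None or runs < max_runs:
        runs += 1
        status = subprocess.call(args)
        if status != 0:
            # Report the failing iteration so a long unattended run is easy to read.
            print("run %d exited with status %d" % (runs, status), file=sys.stderr)
            break
    return runs, status
```

It would be invoked as `run_to_death("make quicktest TEST=allmydata.test.test_runner.RunNode.test_introducer")` while the busy loop runs in the other shell.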

The actual failure is a timeout. It appears that the iputil.py routine
that uses {{{reactor.spawnProcess}}} to run {{{/sbin/ifconfig}}} (to
figure out which interfaces are available and therefore what local IP
addresses we should advertise) just plain fails: the Deferred never fires.
My hunch is that somehow the SIGCHLD handler is broken, so the child
process has finished but the parent doesn't notice.
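
To make the failure mode concrete, here is a minimal sketch of running ifconfig with a bounded wait and extracting IPv4 addresses from its output. It is not the iputil.py code: it swaps Twisted's {{{reactor.spawnProcess}}} for the blocking standard-library `subprocess.run`, and the function names, the regular expression, and the 10-second default are assumptions. It does not diagnose the suspected SIGCHLD problem; it only shows how a timeout turns "never fires" into an explicit error.

```python
import re
import subprocess

# Matches both "inet addr:10.0.0.2" (older Linux) and "inet 10.0.0.2" (BSD, newer Linux).
# "inet6" lines do not match because "inet" must be followed by whitespace.
_INET_RE = re.compile(r"\binet\s+(?:addr:\s*)?(\d{1,3}(?:\.\d{1,3}){3})")


def parse_ifconfig(output):
    """Return the IPv4 addresses found in ifconfig-style output, in order."""
    return _INET_RE.findall(output)


def local_addresses(ifconfig_path="/sbin/ifconfig", timeout=10.0):
    """Run ifconfig and parse its output.

    Raises subprocess.TimeoutExpired if the child has not finished within
    `timeout` seconds, instead of waiting forever.
    """
    result = subprocess.run(
        [ifconfig_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
        check=False,
    )
    return parse_ifconfig(result.stdout.decode("utf-8", "replace"))
```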

This doesn't happen frequently enough to really worry about, but some day
it'd be nice to fix it.

One possibility is to switch to the 'python-netifaces' package, which
unfortunately has compiled C code, but which claims to be fairly
cross-platform and probably doesn't require a separate command to be spawned.
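
As a rough illustration of that alternative, the sketch below lists IPv4 addresses through netifaces' `interfaces()` and `ifaddresses()` calls, with no child process involved. The function name and the injectable `nif` parameter are assumptions made so the example can be exercised without the package installed; nothing here is taken from Tahoe-LAFS.

```python
def local_ipv4_addresses(nif=None):
    """List the IPv4 addresses of all interfaces via a netifaces-like module.

    `nif` defaults to the real third-party `netifaces` package; the parameter
    exists only so this sketch can be tried with a stand-in object.
    """
    if nif is None:
        import netifaces as nif  # third-party; contains compiled C code
    addresses = []
    for name in nif.interfaces():
        # ifaddresses() maps address families to lists of dicts with an "addr" key.
        for entry in nif.ifaddresses(name).get(nif.AF_INET, []):
            if "addr" in entry:
                addresses.append(entry["addr"])
    return addresses
```

Because the lookup happens in-process, there is no child exit to wait for, so a hang of the kind described above could not arise from this path.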
--
Comment (by davidsarah):
Note that there definitely are nondeterministic bugs due to how we spawn
the command for iputil; see #1381. I think that bug would not cause a
timeout, though.
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/532#comment:8>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage