[tahoe-lafs-trac-stream] [tahoe-lafs] #532: occasional failure in iputil (timeout in test_runner): use 'netifaces' package?

tahoe-lafs trac at tahoe-lafs.org
Sun Dec 16 15:22:51 UTC 2012


#532: occasional failure in iputil (timeout in test_runner): use 'netifaces' package?
----------------------------+----------------------
     Reporter:  warner      |      Owner:  somebody
         Type:  defect      |     Status:  closed
     Priority:  minor       |  Milestone:  1.5.0
    Component:  code        |    Version:  1.2.0
   Resolution:  worksforme  |   Keywords:
Launchpad Bug:              |
----------------------------+----------------------

New description:

 I'm seeing very occasional failures in the
 {{{allmydata.test.test_runner.RunNode.test_introducer}}} test. To
 reproduce it, in one shell I run:

 {{{
 run_to_death.pl 'make quicktest TEST=allmydata.test.test_runner.RunNode.test_introducer'
 }}}

 (where run_to_death.pl is a small Perl script that keeps re-running the
 same command until its exit status is nonzero), while in another shell I
 slow the machine down by running {{{python -c "while 1: pass"}}}.

 This usually fails after about 10 minutes.
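
 run_to_death.pl itself is not part of this ticket. Below is a rough
 Python stand-in that assumes the script only re-runs a shell command until
 it exits nonzero. The run_until_failure() name and its max_runs parameter
 are hypothetical.

{{{#!python
import subprocess


def run_until_failure(command, max_runs=None):
    """Re-run a shell command until it exits nonzero.

    Hypothetical stand-in for run_to_death.pl. If max_runs is given, stop
    after that many successful runs. Returns (attempts, last_exit_status).
    """
    attempts = 0
    while max_runs is None or attempts < max_runs:
        attempts += 1
        status = subprocess.call(command, shell=True)
        if status != 0:
            return attempts, status
    return attempts, 0


if __name__ == "__main__":
    runs, status = run_until_failure(
        "make quicktest TEST=allmydata.test.test_runner.RunNode.test_introducer")
    print("exit status %d after %d runs" % (status, runs))
}}}

 Running this in one shell while the busy loop runs in another should
 approximate the reproduction recipe above.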

 The actual failure is a timeout. It appears that the iputil.py routine
 that uses {{{reactor.spawnProcess}}} to run {{{/sbin/ifconfig}}} (to
 figure out which interfaces are available, and therefore which local IP
 addresses we should advertise) simply fails: its Deferred never fires.
 My hunch is that the SIGCHLD handler is somehow broken, so the child
 process has finished but the parent never notices.
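
 The sketch below is for illustration only and is not Tahoe's actual
 iputil.py code. It uses the same "run ifconfig, scrape the inet lines"
 approach, but with the standard-library subprocess module and an explicit
 timeout. A child whose exit is never noticed then becomes an error instead
 of a Deferred that never fires. On POSIX, subprocess reaps the child with
 waitpid() itself rather than relying on a SIGCHLD handler. The regular
 expression and both function names are hypothetical, and real ifconfig
 output varies between platforms.

{{{#!python
import re
import subprocess

# Hypothetical pattern for IPv4 "inet" lines in ifconfig output. It accepts
# both "inet 192.168.1.5" and the older Linux "inet addr:192.168.1.5" forms,
# and skips "inet6" lines because "inet" must be followed by whitespace.
_INET_RE = re.compile(r"^\s*inet\s+(?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})\b", re.M)


def parse_inet_addresses(output):
    """Return IPv4 addresses from 'inet' lines, in order, without duplicates."""
    addresses = []
    for addr in _INET_RE.findall(output):
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def run_with_timeout(argv, timeout=10.0):
    """Run argv and return its stdout as text.

    Raises subprocess.TimeoutExpired (after killing the child) if it does not
    finish within `timeout` seconds, and CalledProcessError on a nonzero exit.
    """
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
        check=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(parse_inet_addresses(run_with_timeout(["/sbin/ifconfig"])))
}}}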

 This doesn't happen frequently enough to really worry about, but some day
 it'd be nice to fix it.

 One possibility is to switch to the 'python-netifaces' package, which
 unfortunately contains compiled C code, but which claims to be fairly
 cross-platform and probably doesn't require spawning a separate command.
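
 Below is a rough sketch of the netifaces route. It assumes the third-party
 netifaces package and its interfaces(), ifaddresses() and AF_INET names.
 The local_ipv4_addresses() helper is hypothetical. Everything happens
 in-process, so there is no child process whose exit has to be noticed. The
 module is an optional parameter only so the function can be exercised
 without netifaces installed.

{{{#!python
def local_ipv4_addresses(netifaces_module=None):
    """Collect IPv4 addresses for all interfaces without spawning a process.

    netifaces_module defaults to the third-party 'netifaces' package; any
    object with interfaces(), ifaddresses() and AF_INET can stand in for it.
    """
    if netifaces_module is None:
        import netifaces as netifaces_module  # compiled C extension
    addresses = []
    for iface in netifaces_module.interfaces():
        # ifaddresses() maps address-family constants to lists of dicts such
        # as {'addr': '192.168.1.5', 'netmask': '255.255.255.0'}.
        for entry in netifaces_module.ifaddresses(iface).get(netifaces_module.AF_INET, []):
            addr = entry.get("addr")
            if addr and addr not in addresses:
                addresses.append(addr)
    return addresses
}}}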

--

Comment (by davidsarah):

 Note that there definitely are nondeterministic bugs due to how we spawn
 the command for iputil; see #1381. I think that bug would not cause a
 timeout, though.

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/532#comment:8>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage
