[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2023: regression coincident with iputil fixes, on FreeBSD and Slackware

Wed Sep 3 05:31:55 UTC 2014

#2023: regression coincident with iputil fixes, on FreeBSD and Slackware
-------------------------+-------------------------------------------------
     Reporter:  zooko    |      Owner:  warner
         Type:  defect   |     Status:  assigned
     Priority:  normal   |  Milestone:  1.11.0
    Component:  code-    |    Version:  1.10.0
  network                |   Keywords:  regression portability iputil
   Resolution:           |  blocks-release
Launchpad Bug:           |
-------------------------+-------------------------------------------------

Comment (by warner):

 Oh, and now I can, by editing `_auto_deps.py` to require Twisted >= 13.0.
 (Maybe this causes Twisted to be installed into `support/lib/` before
 whatever other mysterious thing gets installed that demands >= 13.0 but is
 unable to install it.)

 The cluster of failures I'm getting includes:

 * `close failed in file object destructor: IOError: [Errno 9] Bad file
   descriptor`. This appears to be happening in `node.Node._setup_tub >
   fileutil.write_atomically`, just after calling
   `iputil.get_local_addresses_async()`, as it tries to write the
   kernel-assigned port number to disk. In one case, this caused an
   exception to be caught by the node-startup-time Deferred chain, which
   then bails (os.abort) via the `Node._startService failed, aborting`
   path. In another case, this didn't actually flunk the test.

 * `twisted.internet.error.CannotListenError: Couldn't listen on
   any:64198: [Errno 48] Address already in use`, in
   `SystemTestMixin.bounce_client` as it tries to start up a new Client
   service, just after shutting down the old one (and waiting for the
   `disownServiceParent` deferred to fire, then waiting an extra 1.0
   seconds for good measure). The arbitrary 1.0 second stall already
   smells funny (I left a note there blaming Windows, but I'm seeing this
   problem on OS X too).

 * `CannotListenError`, on `any:0`, with `[Errno 9] Bad file descriptor`,
   in `get_local_addresses_async > get_local_ip_for > listenUDP >
   startListening > _bindSocket`, again with the "close failed in file
   object destructor" message, and triggering the `Node._startService
   failed, aborting` path.

 * `exceptions.OSError: [Errno 9] Bad file descriptor` on a call to
   `os.urandom()` inside Foolscap. I think this might be collateral
   damage from `--rterrors` or the bail-on-failure stuff: the test fails,
   but part of the code charges on ahead without realizing it, so one
   thread is closing all fds in preparation for shutdown while a different
   thread is still trying to use them.

 * Sometimes, combinations of these errors.

 * I also see "Malformed file descriptor found. Preening lists." in the
   logs, which happens when `select()` gets an error (`ValueError`,
   `TypeError`, or our old friend Bad File Descriptor).
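
 As a point of comparison for the errno values in the list above, here is a
 minimal sketch (not Tahoe or Twisted code) of how the two underlying
 conditions can be provoked with plain sockets. It assumes a POSIX system
 and Python 3; the function names `bind_after_foreign_close` and
 `bind_port_twice` are made up for illustration. The first closes a
 socket's fd behind its back and then binds, the second binds a port that
 is still being listened on.

```python
import errno
import os
import socket


def bind_after_foreign_close():
    """Close a socket's fd behind its back, then try to bind the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    os.close(sock.fileno())  # stand-in for unrelated code closing "extra" fds
    try:
        sock.bind(("127.0.0.1", 0))
    except OSError as e:
        return e.errno
    finally:
        # Forget the dead fd number so the socket object never closes it again
        # (that number could have been reused by something else by then).
        sock.detach()
    return None


def bind_port_twice():
    """Bind a second TCP socket to a port that is still being listened on."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
        first.listen(1)
        port = first.getsockname()[1]
        second.bind(("127.0.0.1", port))
    except OSError as e:
        return e.errno
    finally:
        second.close()
        first.close()
    return None


if __name__ == "__main__":
    # Print the symbolic errno names rather than the platform-specific numbers.
    print(errno.errorcode[bind_after_foreign_close()])
    print(errno.errorcode[bind_port_twice()])
```

 The numeric value of `EADDRINUSE` differs by platform (48 on the BSDs and
 OS X, 98 on Linux), so comparing against `errno.EADDRINUSE` is more
 portable than matching the number. This only shows the mechanics; it says
 nothing about which code path is closing the fds in the failing tests.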

 Exceptions that occur during object destructors are always screwy
 (actually anything that happens inside a destructor call is screwy).
 They're a concurrency hazard that's worse than threads: at least with
 threads you can pretend to fix the problem with locks. But it seems like
 *something* is triggering a bunch of Bad File Descriptor errors in random
 places.

 Hm, I know Twisted has had, at various times, a feature to close extra fds
 when preparing for a fork(), and I think that code got simplified or
 changed recently (last two years?) to take advantage of some feature that
 lets you mark fds for automatic closing instead of manually calling
 `os.close()` on them. Maybe something in a newer version of Python? I'm
 wondering whether existing fds are being killed, wreaking all sorts of
 havoc, either by the thread that gets started when Twisted's DNS resolver
 is created (the one on which the blocking `gethostbyname()` is called) or
 by the fork/exec that might happen when `iputil.py` spawns off ifconfig.
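
 One mechanism that fits the "mark fds for automatic closing" description
 is the POSIX close-on-exec flag, though whether that is the feature
 Twisted adopted is an assumption here, not something confirmed in this
 ticket. A minimal sketch of reading and setting the flag with the standard
 `fcntl` module follows; the helper names `set_cloexec` and `is_cloexec`
 are hypothetical.

```python
import fcntl
import os


def set_cloexec(fd, enabled=True):
    """Set or clear the close-on-exec flag on a file descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)


def is_cloexec(fd):
    """Return True if the fd will be closed automatically across exec()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


if __name__ == "__main__":
    r, w = os.pipe()
    try:
        set_cloexec(r, True)
        print("read end close-on-exec:", is_cloexec(r))
    finally:
        os.close(r)
        os.close(w)
```

 Note that close-on-exec takes effect only in the process image that
 results from `exec()`; it does not close anything in the parent. If fds
 are disappearing in the parent, as the errors above suggest, the flag by
 itself would not explain it, and an explicit close loop running in the
 wrong process or thread would be a more likely suspect.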

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2023#comment:10>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage