Opened at 2011-03-22T20:04:17Z
Closed at 2013-07-17T13:00:09Z
#1381 closed defect (fixed)
EINTR from communication with subprocess in allmydata/util/iputil.py _query
Reported by: | davidsarah | Owned by: | zooko |
---|---|---|---|
Priority: | major | Milestone: | 1.10.1 |
Component: | code-network | Version: | 1.8.2 |
Keywords: | iputil heisenbug review-needed | Cc: | |
Launchpad Bug: |
Description (last modified by zooko)
Reported by 'sickness' on IRC:
    Run
    test_loadable ... [OK]
    test_reloadable ... Node._startService failed, aborting
    [Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 4] Interrupted system call
    /usr/lib/python2.6/threading.py:497:__bootstrap
    /usr/lib/python2.6/threading.py:525:__bootstrap_inner
    /usr/lib/python2.6/threading.py:477:run
    --- <exception caught here> ---
    /usr/lib/python2.6/vendor-packages/twisted/python/threadpool.py:210:_worker
    /usr/lib/python2.6/vendor-packages/twisted/python/context.py:59:callWithContext
    /usr/lib/python2.6/vendor-packages/twisted/python/context.py:37:callWithContext
    /home/righdieg/allmydata-tahoe-1.8.2/src/allmydata/util/iputil.py:222:_synchronously_find_addresses_via_config
    /home/righdieg/allmydata-tahoe-1.8.2/src/allmydata/util/iputil.py:237:_query
    /usr/lib/python2.6/subprocess.py:689:communicate
    /usr/lib/python2.6/subprocess.py:1233:_communicate
    /usr/lib/python2.6/subprocess.py:1157:wait
    ]
    calling os.abort()
Possibly related: http://bugs.python.org/issue1068268 . It may be that the patch for that bug was incomplete. EINTR failures are usually hard to reproduce, but the fix is simply to repeat the query until it succeeds (or fails with a different error).
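The "repeat until it succeeds or fails differently" approach can be sketched as a small generic wrapper. This is an illustrative sketch, not Tahoe-LAFS code: the helper name `retry_on_eintr` is hypothetical, and it retries without any limit, exactly as the description states. On Python 3.5 and later, PEP 475 makes the interpreter retry most interrupted system calls itself, so a wrapper like this matters mainly for the Python 2 versions discussed in this ticket.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), repeating the call while it fails with EINTR.

    Hypothetical helper for illustration only; any other OSError is re-raised.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as e:
            if e.errno == errno.EINTR:
                continue  # interrupted by a signal: repeat the call
            raise  # any other error is a real failure


# Usage: a fake call that is interrupted twice before succeeding.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError(errno.EINTR, "Interrupted system call")
    return "ok"


print(retry_on_eintr(flaky), len(attempts))  # prints "ok 3"
```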
Change History (18)
comment:1 follow-up: ↓ 2 Changed at 2011-03-22T21:28:16Z by sickness
comment:2 in reply to: ↑ 1 Changed at 2011-03-23T01:26:42Z by davidsarah
Replying to sickness:
> python: 2.6.4,
Hmm, that should have had the backported fix for http://bugs.python.org/issue1068268 . Oh well, we would need to work around it for earlier Pythons anyway.
comment:3 Changed at 2011-05-28T22:09:17Z by davidsarah
- Keywords heisenbug added
comment:4 follow-up: ↓ 5 Changed at 2011-05-29T04:32:59Z by zooko
Should we work around this by catching OSError with errno==4 and retrying the subprocess?
comment:5 in reply to: ↑ 4 Changed at 2011-05-29T15:33:32Z by davidsarah
Replying to zooko:
> Should we work around this by catching OSError with errno==4 and retrying the subprocess?
Yes, I believe so. We probably shouldn't retry forever, so let's retry 10 times. The try/except should cover lines 236 and 237 of iputil.py.
BTW, rather than 4 we should use errno.EINTR (I think this is defined on all platforms, even though EINTR is only really relevant on Unix).
Should _query return [] (i.e. no addresses) if the subprocess fails? Oh, I see that issue is #854 ('what to do when you can't find any IP address for yourself').
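A bounded version of the retry discussed in this comment might look roughly like the sketch below. It is not the patch that was eventually committed: `query_with_retry` and `MAX_ATTEMPTS` are hypothetical names, and it assumes that lines 236 and 237 of iputil.py are the `Popen` call and the `communicate()` call (the traceback above confirms that line 237 is inside `communicate`). It checks `errno.EINTR` rather than the literal 4 and gives up after 10 attempts.

```python
import errno
import subprocess

# Comment:5 suggests a bounded number of retries rather than retrying forever.
MAX_ATTEMPTS = 10


def query_with_retry(argv, max_attempts=MAX_ATTEMPTS):
    """Run argv and return its captured stdout (bytes).

    Hypothetical stand-in for iputil's _query: if starting the process or
    communicating with it fails with EINTR, the whole query is repeated, up
    to max_attempts attempts in total. Any other OSError is re-raised at once.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            p = subprocess.Popen(argv, stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
            output, _err = p.communicate()
            return output
        except OSError as e:
            # Use errno.EINTR rather than the literal 4, as comment:5 suggests.
            if e.errno != errno.EINTR or attempt >= max_attempts:
                raise
```

After an interrupted `communicate()`, this sketch simply starts a fresh process, matching "repeat the query"; it does not explicitly wait for the earlier child. It also leaves open what `_query` should return once every attempt has failed, which is the question deferred to #854.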
comment:6 Changed at 2011-08-14T00:09:40Z by davidsarah
- Milestone changed from 1.9.0 to 1.10.0
comment:7 Changed at 2011-08-14T00:09:58Z by davidsarah
- Status changed from new to assigned
comment:9 follow-up: ↓ 10 Changed at 2013-05-27T20:50:38Z by daira
This is a separate bug to #1988, though. The correct fix is to retry.
comment:10 in reply to: ↑ 9 Changed at 2013-05-29T17:26:28Z by zooko
comment:11 follow-up: ↓ 12 Changed at 2013-05-30T18:36:06Z by daira
- Keywords review-needed added
- Owner davidsarah deleted
- Status changed from assigned to new
Review needed for https://github.com/daira/tahoe-lafs/commits/refactor-address-finding.
comment:12 in reply to: ↑ 11 Changed at 2013-06-06T21:10:05Z by zooko
- Keywords review-needed removed
- Owner set to daira
Replying to daira:
> Review needed for https://github.com/daira/tahoe-lafs/commits/refactor-address-finding.
Not ready yet, tests fail.
comment:13 Changed at 2013-06-14T23:52:37Z by daira
Oops, I accidentally committed the patch for this while committing the reviewed fix for #1717. Sorry :-(
I'll fix the tests next.
comment:14 Changed at 2013-06-25T18:15:57Z by Daira Hopwood <david-sarah@…>
comment:15 Changed at 2013-06-27T02:09:40Z by daira
- Owner changed from daira to zooko
comment:16 Changed at 2013-06-27T16:55:44Z by daira
- Keywords review-needed added
comment:17 Changed at 2013-07-12T17:32:41Z by markberger
+1
comment:18 Changed at 2013-07-17T13:00:09Z by daira
- Resolution set to fixed
- Status changed from new to closed
Platform details reported by sickness (comment:1): the OS is OpenSolaris snv_134, 64-bit.
    $ uname -a
    SunOS MYWORKPC 5.11 snv_134 i86pc i386 i86pc Solaris
    $ psrinfo -pv
    The physical processor has 2 virtual processors (0 1)
    x86 (GenuineIntel 1067A family 6 model 23 step 10 clock 2800 MHz)
    Pentium(r) Dual-Core CPU E6300 @ 2.80GHz
    $ isainfo -x
    amd64: ssse3 cx16 mon sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
    i386: ssse3 ahf cx16 mon sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu
And this is the Tahoe version information:
    $ allmydata-tahoe-1.8.2/bin/tahoe --version
    allmydata-tahoe: 1.8.2,
    foolscap: 0.6.1,
    pycryptopp: 0.5.29,
    zfec: 1.4.22,
    Twisted: 8.2.0,
    Nevow: 0.10.0,
    zope.interface: unknown,
    python: 2.6.4,
    platform: SunOS-5.11-i86pc-i386-32bit-ELF,
    pyOpenSSL: 0.11,
    simplejson: 2.0.9,
    pycrypto: 2.3,
    pyasn1: unknown,
    mock: 0.7.0,
    sqlite3: 2.4.1 [sqlite 3.6.17],
    setuptools: 0.6c16dev3