Opened at 2011-03-22T20:04:17Z
Last modified at 2013-07-17T13:00:09Z
#1381 closed defect
EINTR from communication with subprocess in allmydata/util/iputil.py _query — at Version 8
Reported by: | davidsarah | Owned by: | davidsarah |
---|---|---|---|
Priority: | major | Milestone: | 1.10.1 |
Component: | code-network | Version: | 1.8.2 |
Keywords: | iputil heisenbug review-needed | Cc: | |
Launchpad Bug: |
Description (last modified by zooko)
Reported by 'sickness' on irc:
# Run
# test_loadable ... [OK]
# test_reloadable ... Node._startService failed, aborting
# [Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 4] Interrupted system call
# /usr/lib/python2.6/threading.py:497:__bootstrap
# /usr/lib/python2.6/threading.py:525:__bootstrap_inner
# /usr/lib/python2.6/threading.py:477:run
# --- <exception caught here> ---
# /usr/lib/python2.6/vendor-packages/twisted/python/threadpool.py:210:_worker
# /usr/lib/python2.6/vendor-packages/twisted/python/context.py:59:callWithContext
# /usr/lib/python2.6/vendor-packages/twisted/python/context.py:37:callWithContext
# /home/righdieg/allmydata-tahoe-1.8.2/src/allmydata/util/iputil.py:222:_synchronously_find_addresses_via_config
# /home/righdieg/allmydata-tahoe-1.8.2/src/allmydata/util/iputil.py:237:_query
# /usr/lib/python2.6/subprocess.py:689:communicate
# /usr/lib/python2.6/subprocess.py:1233:_communicate
# /usr/lib/python2.6/subprocess.py:1157:wait
# ]
# calling os.abort()
Possibly related: http://bugs.python.org/issue1068268 . It may be that the patch for that bug wasn't complete enough. EINTR failures are usually not very reproducible, but the fix is just to repeat the query until it works (or fails with a different error).
Change History (8)
comment:1 follow-up: ↓ 2 Changed at 2011-03-22T21:28:16Z by sickness
comment:2 in reply to: ↑ 1 Changed at 2011-03-23T01:26:42Z by davidsarah
Replying to sickness:
python: 2.6.4,
Hmm, that should have had the backported fix for http://bugs.python.org/issue1068268 . Oh well, we would need to work around it for earlier Pythons anyway.
comment:3 Changed at 2011-05-28T22:09:17Z by davidsarah
- Keywords heisenbug added
comment:4 follow-up: ↓ 5 Changed at 2011-05-29T04:32:59Z by zooko
Should we work around this by catching OSError with errno==4 and retrying the subprocess?
comment:5 in reply to: ↑ 4 Changed at 2011-05-29T15:33:32Z by davidsarah
Replying to zooko:
Should we work around this by catching OSError with errno==4 and retrying the subprocess?
Yes, I believe so. We probably shouldn't retry forever, so let's retry 10 times. The try/except should cover lines 236 and 237 of iputil.py.
BTW, rather than 4 we should use errno.EINTR (I think this is defined on all platforms, even though EINTR is only really relevant on Unix).
Should _query return [] (i.e. no addresses) if the subprocess fails? Oh, I see that issue is #854 ('what to do when you can't find any IP address for yourself').
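A minimal sketch of the retry discussed in this comment, not the actual patch: it wraps both the subprocess creation and the communicate() call in one try/except, compares against errno.EINTR rather than the literal 4, and gives up after 10 attempts. The names query_with_retry, MAX_EINTR_RETRIES and the popen parameter are hypothetical and exist only for illustration; the real _query in iputil.py has a different signature. Note that Python 3.5 and later already retry most interrupted system calls internally (PEP 475), so a loop like this mainly matters on older Pythons such as the 2.6 in the report.

```python
import errno
import subprocess

# Hypothetical constant; the value 10 is the retry count proposed above.
MAX_EINTR_RETRIES = 10

def query_with_retry(argv, retries=MAX_EINTR_RETRIES, popen=subprocess.Popen):
    """Run argv and return its stdout, restarting the subprocess on EINTR.

    `popen` is injectable only so the retry path can be exercised in tests.
    """
    for attempt in range(retries):
        try:
            # Both calls sit inside the try block, since either may be interrupted.
            p = popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            output, _ = p.communicate()
            return output
        except OSError as e:
            # Re-raise anything that is not EINTR, and EINTR on the last attempt.
            if e.errno != errno.EINTR or attempt == retries - 1:
                raise
            # Otherwise loop around and run the subprocess again.
```

One limitation of this sketch: if communicate() is interrupted, the child from that attempt is abandoned rather than reaped, so a real fix may want to clean it up before retrying. What the caller should do once the retries are exhausted is the separate question raised in #854.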
comment:6 Changed at 2011-08-14T00:09:40Z by davidsarah
- Milestone changed from 1.9.0 to 1.10.0
comment:7 Changed at 2011-08-14T00:09:58Z by davidsarah
- Status changed from new to assigned
The OS is OpenSolaris snv_134, 64-bit:
$ uname -a
SunOS MYWORKPC 5.11 snv_134 i86pc i386 i86pc Solaris
$ psrinfo -pv
The physical processor has 2 virtual processors (0 1)
x86 (GenuineIntel 1067A family 6 model 23 step 10 clock 2800 MHz)
Pentium(r) Dual-Core CPU E6300 @ 2.80GHz
$ isainfo -x
amd64: ssse3 cx16 mon sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
i386: ssse3 ahf cx16 mon sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu
And this is the tahoe version:
$ allmydata-tahoe-1.8.2/bin/tahoe --version
allmydata-tahoe: 1.8.2,
foolscap: 0.6.1,
pycryptopp: 0.5.29,
zfec: 1.4.22,
Twisted: 8.2.0,
Nevow: 0.10.0,
zope.interface: unknown,
python: 2.6.4,
platform: SunOS-5.11-i86pc-i386-32bit-ELF,
pyOpenSSL: 0.11,
simplejson: 2.0.9,
pycrypto: 2.3,
pyasn1: unknown,
mock: 0.7.0,
sqlite3: 2.4.1 [sqlite 3.6.17],
setuptools: 0.6c16dev3