[tahoe-lafs-trac-stream] [tahoe-lafs] #1824: Tahoe process gone wild

tahoe-lafs trac at tahoe-lafs.org
Tue Oct 16 03:14:16 UTC 2012


#1824: Tahoe process gone wild
------------------------+--------------------------------
     Reporter:  kpreid  |      Owner:  kpreid
         Type:  defect  |     Status:  new
     Priority:  normal  |  Milestone:  undecided
    Component:  code    |    Version:  1.9.2
   Resolution:          |   Keywords:  hang repair memory
Launchpad Bug:          |
------------------------+--------------------------------

Comment (by kpreid):

 There are no recent incident report files, and no recent twistd.log
 entries. My tahoesvc.log is more interesting: it is full of repetitions
 (at 0.11-second intervals) of the following sequence, followed at the end
 by the startup log from after I killed the process.

 {{{
 2012-10-15 14:05:34.469Z [-] Unhandled error in Deferred:
 2012-10-15 14:05:38.016Z [-] Unhandled Error
         Traceback (most recent call last):
           File "/External/Projects/tahoe/src/allmydata/mutable/retrieve.py", line 610, in _download_current_segment
             d = self._process_segment(self._current_segment)
           File "/External/Projects/tahoe/src/allmydata/mutable/retrieve.py", line 638, in _process_segment
             dl.addErrback(self._validation_or_decoding_failed, [reader])
           File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 308, in addErrback
             errbackKeywords=kw)
           File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 286, in addCallbacks
             self._runCallbacks()
         --- <exception caught here> ---
           File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 542, in _runCallbacks
             current.result = callback(current.result, *args, **kw)
           File "/External/Projects/tahoe/src/allmydata/mutable/retrieve.py", line 736, in _validation_or_decoding_failed
             self._mark_bad_share(reader.server, reader.shnum, reader, f)
           File "/External/Projects/tahoe/src/allmydata/mutable/retrieve.py", line 595, in _mark_bad_share
             self.notify_server_corruption(server, shnum, str(f.value))
           File "/External/Projects/tahoe/src/allmydata/mutable/retrieve.py", line 938, in notify_server_corruption
             rref.callRemoteOnly("advise_corrupt_share",
         exceptions.AttributeError: 'NoneType' object has no attribute 'callRemoteOnly'
 }}}
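
 The failing line is the last hop of an errback chain: the retrieve code
 tries to tell the server about a corrupt share, but the server's remote
 reference has apparently already become None (the connection is gone), so
 the advisory itself raises AttributeError inside the error handler. A
 minimal defensive sketch, assuming the reference comes from an accessor
 like server.get_rref() that can return None after a lost connection (the
 accessor name and the advise_corrupt_share argument list are assumptions;
 the log truncates the real call):

 {{{
 def notify_server_corruption(server, storage_index, shnum, reason):
     """Best-effort corruption advisory; a no-op if the server is gone.

     Sketch only: get_rref() returning None for a lost connection is
     inferred from the traceback above, not confirmed.
     """
     rref = server.get_rref()
     if rref is None:
         # Connection already lost (e.g. after a DeadReferenceError):
         # skip the advisory instead of raising AttributeError on None.
         return
     # callRemoteOnly is foolscap's fire-and-forget variant of callRemote:
     # it ignores both the result and any failure of the remote call.
     rref.callRemoteOnly("advise_corrupt_share",
                         "mutable", storage_index, shnum, reason)
 }}}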

 Going back to the beginning of this error (85 megabytes of logs ago), I
 find the following immediately preceding the spew, with nothing else
 recent before it:

 {{{
 2012-10-15 01:43:37.119Z [-] Unhandled error in Deferred:
 2012-10-15 01:43:37.136Z [-] Unhandled Error
         Traceback (most recent call last):
         Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=gtnn) (during method=RIStorageServer.tahoe.allmydata.com:slot_readv)
 }}}
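
 That DeadReferenceError from slot_readv looks like the trigger: the
 connection to one server died mid-retrieve, the failure was never handled,
 and the retrieve loop then spun on the dead reference. A minimal sketch of
 trapping it at the call site, assuming the caller wants to treat a lost
 server as "no answer" rather than fail (rref and the slot_readv arguments
 here are placeholders, not the real call):

 {{{
 from foolscap.ipb import DeadReferenceError

 def _lost_connection(f):
     # trap() re-raises anything that is not a DeadReferenceError, so
     # only the expected lost-connection case is swallowed here.
     f.trap(DeadReferenceError)
     return None  # treat the dead server as having returned nothing

 d = rref.callRemote("slot_readv", storage_index, [shnum], readv)
 d.addErrback(_lost_connection)
 }}}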

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1824#comment:4>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

