[tahoe-dev] [tahoe-lafs] #758: maximum recursion depth exceeded in Tahoe2PeerSelector

tahoe-lafs trac at allmydata.org
Mon Jul 13 21:19:24 PDT 2009


#758: maximum recursion depth exceeded in Tahoe2PeerSelector
--------------------------------+-------------------------------------------
 Reporter:  zooko               |           Owner:       
     Type:  defect              |          Status:  new  
 Priority:  major               |       Milestone:  1.5.0
Component:  code-peerselection  |         Version:  1.4.1
 Keywords:                      |   Launchpad_bug:       
--------------------------------+-------------------------------------------
 I just got this traceback from a node using the volunteergrid:

 {{{
 /usr/local/lib/python2.6/dist-packages/Twisted-8.2.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py, line 328 in _runCallbacks
 326                    self._runningCallbacks = True
 327                    try:
 328                        self.result = callback(self.result, *args, **kw)
 329                    finally:
 Locals
 callback        <bound method Tahoe2PeerSelector._got_response of <Tahoe2PeerSelector for upload nztp5>>
 self    <Deferred at 0x4d93a70 current result: None>
 args    (<PeerTracker for peer xjy2clbq and SI nztp5>, set([19, 20]), [<PeerTracker for peer gapnio7p and SI nztp5>])
 kw      {}
 /home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 384 in _got_response
 382
 383        # now loop
 384        return self._loop()
 385
 Locals
 self    <Tahoe2PeerSelector for upload nztp5>
 /home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
 282            self.contacted_peers.extend(self.contacted_peers2)
 283            self.contacted_peers[:] = []
 284            return self._loop()
 285        else:
 Locals
 self    <Tahoe2PeerSelector for upload nztp5>
 /home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
 282            self.contacted_peers.extend(self.contacted_peers2)
 283            self.contacted_peers[:] = []
 284            return self._loop()
 285        else:
 Locals
 self    <Tahoe2PeerSelector for upload nztp5>
 }}}

 (And so forth until maximum recursion depth exceeded.)

 There are only 15 servers on the volunteergrid right now.  The clause
 shown in the traceback, around
 [source:src/allmydata/immutable/upload.py#L279 line 279 of upload.py],
 handles the case where all servers have been asked to hold a share, then
 all servers have been asked to hold a second share, and the selector
 iterates to ask them to hold a third-or-greater share.

 It appears that this loop never terminated before the recursion depth was
 exceeded.  We have
 [source:src/allmydata/tahoe/test/test_upload.py@20090625021809-4233b-9cdbf53c54025466fea8ab97bed668cd0017b142#L483
 tests of this case], but...
 Hey waitaminute!  That code in upload.py says:

 {{{
 elif self.contacted_peers2:
     # we've finished the second-or-later pass. Move all the remaining
     # peers back into self.contacted_peers for the next pass
     self.contacted_peers.extend(self.contacted_peers2)
     self.contacted_peers[:] = []
     return self._loop()
 }}}

 That can't be right.  It probably means to say:

 {{{
     self.contacted_peers.extend(self.contacted_peers2)
     del self.contacted_peers2[:]
 }}}
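
 To make the difference between the two handoffs concrete, here is a
 minimal sketch.  `ToySelector` and its `buggy` flag are hypothetical
 stand-ins, not the real Tahoe2PeerSelector; the sketch assumes only that
 `_loop` checks `contacted_peers` before `contacted_peers2`, and it
 pretends that a single pass over `contacted_peers` places every
 remaining share.

```python
class ToySelector(object):
    """Hypothetical stand-in for the peer-selection loop (not the real class)."""

    def __init__(self, peers, buggy):
        self.contacted_peers = []            # peers to ask in the current pass
        self.contacted_peers2 = list(peers)  # peers set aside for the next pass
        self.buggy = buggy

    def _loop(self):
        if self.contacted_peers:
            # Assumption: one pass over these peers places all shares.
            self.contacted_peers[:] = []
            return "done"
        elif self.contacted_peers2:
            self.contacted_peers.extend(self.contacted_peers2)
            if self.buggy:
                # Clears the list that was just filled; contacted_peers2
                # stays non-empty, so this branch is taken again next time.
                self.contacted_peers[:] = []
            else:
                # Clears the source list, so the next call sees the peers
                # in contacted_peers and makes progress.
                del self.contacted_peers2[:]
            return self._loop()
        else:
            return "no peers left"


def run(buggy):
    selector = ToySelector(["peer-a", "peer-b"], buggy)
    try:
        return selector._loop()
    except RuntimeError:
        # Python 2 raises RuntimeError here; Python 3 raises RecursionError,
        # which is a subclass of RuntimeError.
        return "recursion limit hit"


buggy_result = run(True)
fixed_result = run(False)
print(buggy_result)
print(fixed_result)
```

 In this toy model the buggy handoff never changes the state that the
 `elif` tests, so every call re-enters the same branch until the
 interpreter gives up, which matches the repeated `_loop` frames in the
 traceback.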

 Why doesn't that test catch this bug?

 But it is too late at night for me to be messing with such stuff.

 If someone in a different timezone or on a different sleep schedule wants
 to fix the test to catch this bug while I sleep, that would be great!  :-)

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/758>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

