#758 closed defect (fixed)

maximum recursion depth exceeded in Tahoe2PeerSelector

Reported by: zooko
Owned by:
Priority: major
Milestone: 1.5.0
Component: code-peerselection
Version: 1.4.1
Keywords:
Cc:
Launchpad Bug:

Description

I just got this traceback from a node using the volunteergrid:

/usr/local/lib/python2.6/dist-packages/Twisted-8.2.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py, line 328 in _runCallbacks
326                    self._runningCallbacks = True
327                    try:
328                        self.result = callback(self.result, *args, **kw)
329                    finally:
Locals
callback	<bound method Tahoe2PeerSelector._got_response of <Tahoe2PeerSelector for upload nztp5>>
self	<Deferred at 0x4d93a70 current result: None>
args	(<PeerTracker for peer xjy2clbq and SI nztp5>, set([19, 20]), [<PeerTracker for peer gapnio7p and SI nztp5>])
kw	{}
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 384 in _got_response
382
383        # now loop
384        return self._loop()
385
Locals
self	<Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
282            self.contacted_peers.extend(self.contacted_peers2)
283            self.contacted_peers[:] = []
284            return self._loop()
285        else:
Locals
self	<Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
282            self.contacted_peers.extend(self.contacted_peers2)
283            self.contacted_peers[:] = []
284            return self._loop()
285        else:
Locals
self	<Tahoe2PeerSelector for upload nztp5>

(And so forth until maximum recursion depth exceeded.)

There are only 15 servers on the volunteergrid right now. The clause shown, around line 279 of upload.py, handles the case where all servers have been asked to hold a share and then all servers have been asked to hold a second share; it is meant to iterate and go on to ask them to hold a third-or-greater share.

It appears that this loop never terminated before the recursion depth was exceeded. We have tests of this case, but... Hey waitaminute! That code in upload.py says:

elif self.contacted_peers2:
    # we've finished the second-or-later pass. Move all the remaining
    # peers back into self.contacted_peers for the next pass
    self.contacted_peers.extend(self.contacted_peers2)
    self.contacted_peers[:] = []
    return self._loop()

That can't be right. It probably means to say:

    self.contacted_peers.extend(self.contacted_peers2)
    del self.contacted_peers2[:]
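
To illustrate the analysis above, here is a minimal sketch of the two variants. It is not the real uploader: `ToySelector`, its `buggy` flag, the `passes` counter, and the `run` helper are hypothetical names made up for this example, and the "ask a peer" step is reduced to popping an entry off the list. Only the handling of `contacted_peers` and `contacted_peers2` mirrors the clause quoted from upload.py.

```python
class ToySelector:
    """Hypothetical stand-in for the selector; only the two peer lists matter."""

    def __init__(self, peers, buggy):
        self.contacted_peers = []
        self.contacted_peers2 = list(peers)
        self.buggy = buggy
        self.passes = 0  # how many times the "next pass" clause ran

    def _loop(self):
        if self.contacted_peers:
            # Toy version of "ask one peer": it accepts and drops out.
            self.contacted_peers.pop(0)
            return self._loop()
        elif self.contacted_peers2:
            # Move the remaining peers back for the next pass.
            self.passes += 1
            self.contacted_peers.extend(self.contacted_peers2)
            if self.buggy:
                # Empties the list that was just filled; contacted_peers2
                # stays non-empty, so this clause is taken again forever.
                self.contacted_peers[:] = []
            else:
                # Proposed fix: empty the list the peers were moved from.
                del self.contacted_peers2[:]
            return self._loop()
        else:
            return "done"


def run(buggy):
    selector = ToySelector(["peer-a", "peer-b", "peer-c"], buggy)
    try:
        return selector._loop()
    except RuntimeError:
        # Python 3 raises RecursionError, a subclass of RuntimeError;
        # Python 2 raises RuntimeError directly.
        return "maximum recursion depth exceeded"


print(run(buggy=False))  # done
print(run(buggy=True))   # maximum recursion depth exceeded
```

In the buggy variant no peer is ever contacted on the later pass, so nothing can change the state between recursive calls, which would match the identical frames in the traceback.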

Why doesn't the existing test catch this bug?

But it is too late at night for me to be messing with such stuff.

If someone in a different timezone or a different sleep schedule wants to fix the test to catch this bug while I sleep, that would be great! :-)

Change History (3)

comment:1 Changed at 2009-07-15T03:45:54Z by terrell

  • Summary changed from maxmimum recursion depth exceeded in Tahoe2PeerSelector to maximum recursion depth exceeded in Tahoe2PeerSelector

comment:2 Changed at 2009-07-15T07:15:50Z by warner

Huh, yeah, that code is odd. Your analysis feels right, but I'm too jetlagged to understand this code right now either. I want to rewrite the uploader anyway, but that's not going to happen for 1.5.

comment:3 Changed at 2009-07-17T05:13:14Z by warner

  • Resolution set to fixed
  • Status changed from new to closed

This should be fixed by 1192b61dfed62a49.
