Opened at 2009-07-14T04:19:24Z
Closed at 2009-07-17T05:13:14Z
#758 closed defect (fixed)
maximum recursion depth exceeded in Tahoe2PeerSelector
| Reported by: | zooko | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 1.5.0 |
| Component: | code-peerselection | Version: | 1.4.1 |
| Keywords: | | Cc: | |
| Launchpad Bug: | | | |
Description
I just got this traceback from a node using the volunteergrid:
/usr/local/lib/python2.6/dist-packages/Twisted-8.2.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py, line 328 in _runCallbacks
326 self._runningCallbacks = True
327 try:
328 self.result = callback(self.result, *args, **kw)
329 finally:
Locals
callback <bound method Tahoe2PeerSelector._got_response of <Tahoe2PeerSelector for upload nztp5>>
self <Deferred at 0x4d93a70 current result: None>
args (<PeerTracker for peer xjy2clbq and SI nztp5>, set([19, 20]), [<PeerTracker for peer gapnio7p and SI nztp5>])
kw {}
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 384 in _got_response
382
383 # now loop
384 return self._loop()
385
Locals
self <Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
282 self.contacted_peers.extend(self.contacted_peers2)
283 self.contacted_peers[:] = []
284 return self._loop()
285 else:
Locals
self <Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
282 self.contacted_peers.extend(self.contacted_peers2)
283 self.contacted_peers[:] = []
284 return self._loop()
285 else:
Locals
self <Tahoe2PeerSelector for upload nztp5>
(And so forth until maximum recursion depth exceeded.)
There are only 15 servers on the volunteergrid right now. The clause shown, around line 279 of upload.py, handles the case where all servers have been asked to hold a share and then all servers have been asked to hold a second share; it iterates so that they can be asked to hold a third-or-later share.
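A rough sketch of the multi-pass idea described above, for readers unfamiliar with it. This is not the code in upload.py: `place_shares` and its arguments are hypothetical names, and the sketch assumes every server accepts every share it is offered, whereas the real selector is asynchronous and has to cope with refusals and errors.

```python
from collections import deque


def place_shares(servers, num_shares):
    """Hand out share numbers to servers in repeated passes.

    Hypothetical simplification: each pass offers one more share to
    every server, and every offer is accepted.
    """
    placements = {server: [] for server in servers}
    homeless = deque(range(num_shares))  # shares that still need a home
    passes = 0
    while homeless:
        passes += 1
        for server in servers:
            if not homeless:
                break
            placements[server].append(homeless.popleft())
    return placements, passes


# 15 servers, as on the volunteergrid; 40 shares is an arbitrary choice
# that forces a third pass.
servers = ["s%d" % i for i in range(15)]
placements, passes = place_shares(servers, 40)
print(passes)            # 3
print(placements["s0"])  # [0, 15, 30]
```

With 15 servers, any upload that needs more than 30 placements reaches the third pass, which is the situation the clause in question is meant to handle.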
It appears that this loop never terminated before the recursion depth was exceeded. We have tests of this case, but... Hey waitaminute! That code in upload.py says:
    elif self.contacted_peers2:
        # we've finished the second-or-later pass. Move all the remaining
        # peers back into self.contacted_peers for the next pass
        self.contacted_peers.extend(self.contacted_peers2)
        self.contacted_peers[:] = []
        return self._loop()
That can't be right. It probably means to say:
        self.contacted_peers.extend(self.contacted_peers2)
        del self.contacted_peers2[:]
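To see why the original two lines can never make progress, here is a small model of the branch structure. Only the `elif` clause is taken from the ticket; the `if` and `else` branches, the `run_loop` name, and the `max_steps` guard are assumptions made for illustration, and the model uses a bounded `for` loop in place of the recursive `return self._loop()` so that the buggy variant stops instead of hitting the recursion limit.

```python
def run_loop(peers2, fixed, max_steps=100):
    """Model the peer-list shuffle; returns (outcome, steps taken)."""
    contacted_peers = []
    contacted_peers2 = list(peers2)
    for step in range(1, max_steps + 1):
        if contacted_peers:
            # assumed branch: there is a peer to ask in this pass
            return "query " + contacted_peers.pop(0), step
        elif contacted_peers2:
            contacted_peers.extend(contacted_peers2)
            if fixed:
                # proposed fix: empty the *second* list
                del contacted_peers2[:]
            else:
                # original code: empties the list it just filled
                contacted_peers[:] = []
            continue  # stands in for `return self._loop()`
        else:
            # assumed branch: nobody left to ask
            return "no peers left", step
    return "gave up", max_steps


print(run_loop(["a", "b"], fixed=False))  # ('gave up', 100)
print(run_loop(["a", "b"], fixed=True))   # ('query a', 2)
```

In the buggy variant both lists look the same on every re-entry (`contacted_peers` empty, `contacted_peers2` non-empty), so the same clause fires each time; in the real recursive code that repeats until the interpreter raises the recursion-depth error shown in the traceback.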
Why doesn't that test catch this bug?
But it is too late at night for me to be messing with such stuff.
If someone in a different timezone or on a different sleep schedule wants to fix the test to catch this bug while I sleep, that would be great! :-)
Change History (3)
comment:1 Changed at 2009-07-15T03:45:54Z by terrell
- Summary changed from maxmimum recursion depth exceeded in Tahoe2PeerSelector to maximum recursion depth exceeded in Tahoe2PeerSelector
comment:2 Changed at 2009-07-15T07:15:50Z by warner
Huh, yeah, that code *is* odd. Your analysis feels right, but I'm too jetlagged to understand this code right now either. I want to rewrite the uploader anyways, but that's not going to happen for 1.5.
comment:3 Changed at 2009-07-17T05:13:14Z by warner
- Resolution set to fixed
- Status changed from new to closed
This should be fixed by 1192b61dfed62a49.