[tahoe-dev] [tahoe-lafs] #758: maximum recursion depth exceeded in Tahoe2PeerSelector
tahoe-lafs
trac at allmydata.org
Mon Jul 13 21:19:24 PDT 2009
#758: maximum recursion depth exceeded in Tahoe2PeerSelector
--------------------------------+-------------------------------------------
Reporter: zooko | Owner:
Type: defect | Status: new
Priority: major | Milestone: 1.5.0
Component: code-peerselection | Version: 1.4.1
Keywords: | Launchpad_bug:
--------------------------------+-------------------------------------------
I just got this traceback from a node using the volunteergrid:
{{{
/usr/local/lib/python2.6/dist-packages/Twisted-8.2.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py, line 328 in _runCallbacks
326 self._runningCallbacks = True
327 try:
328 self.result = callback(self.result, *args, **kw)
329 finally:
Locals
callback <bound method Tahoe2PeerSelector._got_response of <Tahoe2PeerSelector for upload nztp5>>
self <Deferred at 0x4d93a70 current result: None>
args (<PeerTracker for peer xjy2clbq and SI nztp5>, set([19, 20]), [<PeerTracker for peer gapnio7p and SI nztp5>])
kw {}
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 384 in _got_response
382
383 # now loop
384 return self._loop()
385
Locals
self <Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
282 self.contacted_peers.extend(self.contacted_peers2)
283 self.contacted_peers[:] = []
284 return self._loop()
285 else:
Locals
self <Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
282 self.contacted_peers.extend(self.contacted_peers2)
283 self.contacted_peers[:] = []
284 return self._loop()
285 else:
Locals
self <Tahoe2PeerSelector for upload nztp5>
}}}
(And so forth until the maximum recursion depth was exceeded.)
There are only 15 servers on the volunteergrid right now. The clause shown,
around [source:src/allmydata/immutable/upload.py#L279 line 279 of
upload.py], handles the case where all servers have been asked to hold a
share and then all servers have been asked to hold a second share; this
clause iterates so that the selector can go on to ask them to hold a third
or later share.
It appears that this loop kept recursing without ever terminating until the
recursion depth was exceeded. We have
[source:src/allmydata/tahoe/test/test_upload.py@20090625021809-4233b-9cdbf53c54025466fea8ab97bed668cd0017b142#L483 tests of this case], but...
Hey waitaminute! That code in upload.py says:
{{{
elif self.contacted_peers2:
# we've finished the second-or-later pass. Move all the remaining
# peers back into self.contacted_peers for the next pass
self.contacted_peers.extend(self.contacted_peers2)
self.contacted_peers[:] = []
return self._loop()
}}}
That can't be right. It probably means to say:
{{{
self.contacted_peers.extend(self.contacted_peers2)
del self.contacted_peers2[:]
}}}
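To see why the quoted branch recurses forever, here is a minimal sketch that models only the two lists involved. `start_next_pass` and `loop` are hypothetical stand-ins for the `elif self.contacted_peers2:` branch of `Tahoe2PeerSelector._loop`, not Tahoe code, and the peers are plain strings borrowed from the traceback.

```python
def start_next_pass(contacted_peers, contacted_peers2, fixed):
    """Hypothetical model of the elif branch that begins a new pass."""
    contacted_peers.extend(contacted_peers2)
    if fixed:
        del contacted_peers2[:]    # proposed fix: empty the list we copied from
    else:
        contacted_peers[:] = []    # quoted code: empties the list just filled


def loop(contacted_peers, contacted_peers2, fixed):
    """Toy _loop that only follows the pass-transition branch."""
    if contacted_peers:
        return "next pass starts with %d peers" % len(contacted_peers)
    elif contacted_peers2:
        start_next_pass(contacted_peers, contacted_peers2, fixed)
        # Same recursive tail call as upload.py line 284.
        return loop(contacted_peers, contacted_peers2, fixed)
    return "no peers left"


print(loop([], ["xjy2clbq", "gapnio7p"], fixed=True))

# With fixed=False both lists come back unchanged, so every call recurses
# again until Python raises RuntimeError (RecursionError on Python 3).
try:
    loop([], ["xjy2clbq", "gapnio7p"], fixed=False)
except RuntimeError as e:
    print("quoted version: %s" % e)
```

With the fix, `contacted_peers` ends up holding both peers and `contacted_peers2` is empty, so the next call has real work to do. With the quoted code, each recursive call sees exactly the same state as the one before it.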
Why doesn't that test catch this bug?
But it is too late at night for me to be messing with such stuff.
If someone in a different timezone or on a different sleep schedule wants to
fix the test so that it catches this bug while I sleep, that would be great! :-)
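One way to make a test reach this branch is to force a third pass: some shares must still be homeless after the second pass while at least one peer is waiting in the second-pass list. The sketch below is a synchronous toy model, not Tahoe's code or test suite. `ToyPeer`, `place_shares`, `max_steps` and the `fixed` switch are hypothetical, and the model assumes, as a simplification, that the second-or-later pass asks each peer for a ceiling-divided slice of the remaining shares and re-queues a peer only if it accepted everything it was asked for. A step limit turns the quoted code's endless loop into a `RuntimeError` that a test can detect.

```python
import unittest


class ToyPeer(object):
    """Hypothetical peer that can hold at most `capacity` shares."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.held = set()

    def query(self, shares):
        # Accept as many of the requested shares as there is room for.
        room = max(self.capacity - len(self.held), 0)
        accepted = set(sorted(shares)[:room])
        self.held |= accepted
        return accepted


def place_shares(peers, num_shares, fixed=True, max_steps=1000):
    """Toy three-list loop; returns how many third-or-later passes started."""
    homeless = list(range(num_shares))
    uncontacted, contacted, contacted2 = list(peers), [], []
    restarts = 0
    for _ in range(max_steps):
        if not homeless:
            return restarts
        if uncontacted:                      # first pass: one share each
            peer, ask, dest = uncontacted.pop(0), [homeless.pop(0)], contacted
        elif contacted:                      # second-or-later pass
            n = -(-len(homeless) // len(contacted))  # ceiling division
            peer, ask, dest = contacted.pop(0), homeless[:n], contacted2
            del homeless[:n]
        elif contacted2:                     # start another pass
            contacted.extend(contacted2)
            if fixed:
                del contacted2[:]
            else:
                contacted[:] = []            # the line quoted from upload.py
            restarts += 1
            continue
        else:
            raise ValueError("ran out of peers")
        accepted = peer.query(ask)
        homeless.extend(s for s in ask if s not in accepted)
        if len(accepted) == len(ask):        # took everything: ask again later
            dest.append(peer)
    raise RuntimeError("no progress after %d steps" % max_steps)


class ThirdPassTest(unittest.TestCase):
    def make_peers(self):
        # The small peer comes last, so it refuses part of the second pass
        # after the other two have already been moved to contacted2.
        return [ToyPeer("big1", 10), ToyPeer("big2", 10), ToyPeer("small", 1)]

    def test_fixed_loop_places_all_shares(self):
        self.assertEqual(place_shares(self.make_peers(), 9, fixed=True), 1)

    def test_quoted_loop_makes_no_progress(self):
        self.assertRaises(RuntimeError, place_shares,
                          self.make_peers(), 9, fixed=False)
```

Run it with `python -m unittest` against the file holding the sketch. The peer order matters in this toy: if the small peer came first, a later peer would absorb the leftover shares within the second pass and the third-pass branch would never run.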
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/758>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid