[tahoe-dev] Sighting reports

Tue Jul 13 19:36:20 UTC 2010

Hey developers,

I've been stress-testing my 4-node grid and have run into a few problems
that I want to report.

1) Sometimes I get backup operations failing like this:

allmydata.scripts.common_http.HTTPError: Error during file PUT: 500 Internal Server Error
Traceback (most recent call last):
  File "build/bdist.openbsd-4.6-amd64/egg/foolscap/call.py", line 674, in _done

  File "build/bdist.openbsd-4.6-amd64/egg/foolscap/call.py", line 60, in complete

  File "/usr/local/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-openbsd-4.6-amd64.egg/twisted/internet/defer.py", line 280, in callback
    self._startRunCallbacks(result)
  File "/usr/local/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-openbsd-4.6-amd64.egg/twisted/internet/defer.py", line 354, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-openbsd-4.6-amd64.egg/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/local/lib/python2.6/site-packages/allmydata_tahoe-1.7.0-py2.6.egg/allmydata/immutable/upload.py", line 506, in _got_response
    return self._loop()
  File "/usr/local/lib/python2.6/site-packages/allmydata_tahoe-1.7.0-py2.6.egg/allmydata/immutable/upload.py", line 359, in _loop
    self._get_progress_message()))
allmydata.interfaces.UploadUnhappinessError: shares could be placed on
only 3 server(s) such that any 2 of them have enough shares to recover the
file, but we were asked to place shares on at least 4 such servers. (placed
all 4 shares, want to place shares on at least 4 servers such that any 2 of
them have enough shares to recover the file, sent 4 queries to 4 peers, 3
queries placed some shares, 1 placed none (of which 1 placed none due to
the server being full and 0 placed none due to an error))

This error report is incorrect -- all of the storage nodes show on their
status pages that they are still accepting new shares!  Further, if I keep
restarting the backup, the reported situation degrades until eventually the
error says that none of the 4 shares could be placed because the servers
are full.  If I restart the tahoe node that is running the backup, the
problem goes away, at least for a while.

This backup operation is not using a helper, but is running on the node
that runs the helper.
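
For readers unfamiliar with the "3 server(s)" figure in the message above,
here is a rough sketch of how such a count can arise.  It assumes that
"happiness" is measured as the size of a maximum matching between servers
and distinct shares; that is a reading of the error text, not something
confirmed here from the 1.7.0 source.  The function name, the server ids
and the share layout are all hypothetical; the layout is just one that is
consistent with "placed all 4 shares ... 3 queries placed some shares, 1
placed none".

```python
def servers_of_happiness(sharemap):
    """Return the size of a maximum matching between servers and shares.

    sharemap maps a (hypothetical) server id to the set of share numbers
    that server holds.  Each server can be matched to at most one share
    and each share to at most one server.
    """
    match = {}  # share number -> server currently matched to it

    def try_assign(server, seen):
        # Standard augmenting-path search (Kuhn's algorithm).
        for shnum in sorted(sharemap[server]):
            if shnum in seen:
                continue
            seen.add(shnum)
            if shnum not in match or try_assign(match[shnum], seen):
                match[shnum] = server
                return True
        return False

    return sum(1 for server in sharemap if try_assign(server, set()))


# Hypothetical layout: all 4 shares placed, but one server got none.
lopsided = {"A": {0}, "B": {1}, "C": {2, 3}, "D": set()}
# Hypothetical layout: one share on each of the 4 servers.
spread = {"A": {0}, "B": {1}, "C": {2}, "D": {3}}

print(servers_of_happiness(lopsided))  # 3
print(servers_of_happiness(spread))    # 4
```

Under this assumption, an upload that requires 4 such servers fails on the
first layout even though every share was stored somewhere, which matches
the shape of the message but does not explain why a server that reports
free space would be counted as full.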

2) A long tahoe backup aborted with this error:

allmydata.scripts.common_http.HTTPError: Error during file PUT: 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-openbsd-4.6-amd64.egg/twisted/internet/defer.py", line 325, in unpause
    self._runCallbacks()
  File "/usr/local/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-openbsd-4.6-amd64.egg/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/local/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-openbsd-4.6-amd64.egg/twisted/internet/defer.py", line 330, in _continue
    self.unpause()
  File "/usr/local/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-openbsd-4.6-amd64.egg/twisted/internet/defer.py", line 325, in unpause
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.6/site-packages/Twisted-10.0.0-py2.6-openbsd-4.6-amd64.egg/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/local/lib/python2.6/site-packages/allmydata_tahoe-1.7.0-py2.6.egg/allmydata/immutable/upload.py", line 896, in set_shareholders
    assert len(buckets) == sum([len(peer.buckets) for peer in used_peers])
exceptions.AssertionError: 

I can reproduce this error, which should make it more debuggable.
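
As a possible starting point for debugging, here is a minimal sketch of
one way the assertion at line 896 could fail.  It assumes that `buckets`
is a dictionary keyed by share number, built by merging each peer's own
`buckets` dictionary; that is a guess from the assertion text, not taken
from the traceback.  `PeerTracker`, `merge_buckets` and `invariant_holds`
are hypothetical stand-ins, and the string values stand in for bucket
writers.

```python
class PeerTracker:
    """Hypothetical stand-in for one server's allocated buckets."""

    def __init__(self, peerid, buckets):
        self.peerid = peerid
        self.buckets = buckets  # share number -> bucket writer


def merge_buckets(used_peers):
    # Assumed merge step: later peers overwrite earlier ones when two
    # peers hold a bucket for the same share number.
    buckets = {}
    for peer in used_peers:
        buckets.update(peer.buckets)
    return buckets


def invariant_holds(used_peers):
    # Same comparison as the failing assertion in the traceback.
    buckets = merge_buckets(used_peers)
    return len(buckets) == sum([len(peer.buckets) for peer in used_peers])


distinct = [PeerTracker("A", {0: "w0"}), PeerTracker("B", {1: "w1"})]
overlapping = [PeerTracker("A", {0: "w0", 1: "w1"}),
               PeerTracker("B", {1: "w1b"})]

print(invariant_holds(distinct))     # True
print(invariant_holds(overlapping))  # False
```

If that assumption is right, the assertion would trip whenever two servers
end up with a bucket for the same share number, so logging the share
numbers held by each entry in `used_peers` just before the assert may show
whether that is what happens here.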

-- 
Kyle Markley