[tahoe-dev] Help uploading when file exists but needs repair
Kyle Markley
kyle at arbyte.us
Wed Dec 1 06:58:50 UTC 2010
Brian et al,
> Huh? Shouldn't the new upload just put new shares in place? I know our
> uploader isn't particularly clever in the face of existing shares (it
> will put multiple shares on one server, and in general not achieve the
> ideal diversity), but it shouldn't just fail.
Ok; maybe I'm misunderstanding the failure. Let's do a more robust
diagnosis.
Start with this to clear out old cruft:
rm ~/.tahoe/private/aliases
rm ~/.tahoe/private/backupdb.sqlite
tahoe create-alias $USER
$ tahoe --version
allmydata-tahoe: 1.8.1, foolscap: 0.5.1, pycryptopp: 0.5.25, zfec:
1.4.7, Twisted: 10.1.0, Nevow: 0.10.0, zope.interface: 3.6.1, python:
2.6.5, platform:
OpenBSD-4.8-amd64-Genuine_Intel-R-_CPU_000_@_2.93GHz-64bit-ELF, sqlite:
3.6.23, simplejson: 2.1.2, argparse: 1.1, pycrypto: 2.3, pyOpenSSL:
0.11, pyutil: 1.7.12, zbase32: 1.1.2, setuptools: 0.6c11, pyasn1:
0.0.11a, pysqlite: 2.4.1
$ tahoe backup -v --exclude-vcs --exclude=build --exclude=.darcs \
    --exclude=.python-eggs $HOME $USER:
.... lots of normal-looking output, followed by ....
uploading '/storage/_buildbot/.login'..
Traceback (most recent call last):
File "/usr/local/bin/tahoe", line 9, in <module>
load_entry_point('allmydata-tahoe==1.8.1', 'console_scripts',
'tahoe')()
File
"/usr/local/lib/python2.6/site-packages/allmydata/scripts/runner.py",
line 111, in run
File
"/usr/local/lib/python2.6/site-packages/allmydata/scripts/runner.py",
line 97, in runner
File
"/usr/local/lib/python2.6/site-packages/allmydata/scripts/cli.py", line
513, in backup
File
"/usr/local/lib/python2.6/site-packages/allmydata/scripts/tahoe_backup.py",
line 324, in backup
File
"/usr/local/lib/python2.6/site-packages/allmydata/scripts/tahoe_backup.py",
line 117, in run
File
"/usr/local/lib/python2.6/site-packages/allmydata/scripts/tahoe_backup.py",
line 193, in process
File
"/usr/local/lib/python2.6/site-packages/allmydata/scripts/tahoe_backup.py",
line 304, in upload
allmydata.scripts.common_http.HTTPError: Error during file PUT: 500
Internal Server Error
"Traceback (most recent call last):\x0a File
\"build/bdist.openbsd-4.8-amd64/egg/foolscap/call.py\", line 674, in
_done\x0a \x0a File
\"build/bdist.openbsd-4.8-amd64/egg/foolscap/call.py\", line 60, in
complete\x0a \x0a File
\"/usr/local/lib/python2.6/site-packages/Twisted-10.1.0-py2.6-openbsd-4.8-amd64.egg/twisted/internet/defer.py\",
line 318, in callback\x0a self._startRunCallbacks(result)\x0a File
\"/usr/local/lib/python2.6/site-packages/Twisted-10.1.0-py2.6-openbsd-4.8-amd64.egg/twisted/internet/defer.py\",
line 424, in _startRunCallbacks\x0a self._runCallbacks()\x0a---
<exception caught here> ---\x0a File
\"/usr/local/lib/python2.6/site-packages/Twisted-10.1.0-py2.6-openbsd-4.8-amd64.egg/twisted/internet/defer.py\",
line 441, in _runCallbacks\x0a self.result = callback(self.result,
*args, **kw)\x0a File
\"/usr/local/lib/python2.6/site-packages/allmydata/immutable/upload.py\",
line 546, in _got_response\x0a \x0a File
\"/usr/local/lib/python2.6/site-packages/allmydata/immutable/upload.py\",
line 396, in _loop\x0a \x0a File
\"/usr/local/lib/python2.6/site-packages/allmydata/immutable/upload.py\",
line 561, in _failed\x0a
\x0aallmydata.interfaces.UploadUnhappinessError: shares could be placed
on only 3 server(s) such that any 2 of them have enough shares to
recover the file, but we were asked to place shares on at least 4 such
servers. (placed all 4 shares, want to place shares on at least 4
servers such that any 2 of them have enough shares to recover the file,
sent 4 queries to 4 peers, 4 queries placed some shares, 0 placed none
(of which 0 placed none due to the server being full and 0 placed none
due to an error))\x0a"
So it appears it's failing to upload the .login file. The specific
error message doesn't make sense to me -- if all 4 queries placed some
shares, and 0 queries placed none, then why hasn't the file become
healthy?
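One possible reading, offered as a guess rather than something the error text confirms: if the "servers such that any 2 of them have enough shares" figure is computed as a maximum matching between servers and distinct share numbers, then two servers holding only the same share would count once. The sketch below illustrates that idea. The function name happiness and the placement with servers A to D are hypothetical; this is not the uploader's actual code, and the placement is not taken from my grid.

```python
def happiness(placement):
    """Return the size of a maximum matching between servers and shares.

    placement maps a server name to the set of share numbers it holds.
    ASSUMPTION: this models the "happiness" count as a bipartite matching;
    it is an illustration, not code taken from allmydata.
    """
    match = {}  # share number -> server currently matched to that share

    def try_assign(server, seen):
        # Look for a share for this server, re-routing earlier
        # assignments along an augmenting path when necessary.
        for share in sorted(placement[server]):
            if share in seen:
                continue
            seen.add(share)
            if share not in match or try_assign(match[share], seen):
                match[share] = server
                return True
        return False

    return sum(1 for server in placement if try_assign(server, set()))


# Hypothetical placement: all 4 shares are placed and all 4 servers hold
# something, yet C and D both hold only share 2.
example = {"A": {0, 3}, "B": {1}, "C": {2}, "D": {2}}
print(happiness(example))
```

Under that assumption, every query can place a share and every share number can be present while the count still falls one short of 4, which has the same shape as the message above.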
In this particular case I am able to locate a copy of that file on the
grid. This is the output from tahoe check --raw for what I believe is
the corresponding file. Note that one server has two shares and two
servers have none; I don't know whether that's relevant. (I'd like to
learn how to be more certain I'm looking at the correct object, to begin
with!):
{
  "results": {
    "needs-rebalancing": true,
    "count-shares-expected": 4,
    "healthy": false,
    "count-unrecoverable-versions": 0,
    "count-shares-needed": 2,
    "sharemap": {
      "0": [
        "juwmgssmwnhrhfdcpxxmrz3bghh37esx"
      ],
      "1": [
        "vjqcroalrgmft66mgiwfjug667fl6qjd"
      ],
      "3": [
        "juwmgssmwnhrhfdcpxxmrz3bghh37esx"
      ]
    },
    "count-recoverable-versions": 1,
    "servers-responding": [
      "vjqcroalrgmft66mgiwfjug667fl6qjd",
      "juwmgssmwnhrhfdcpxxmrz3bghh37esx",
      "47cslusczp3uu2kygodi3nlalcruscif",
      "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ],
    "count-good-share-hosts": 2,
    "count-wrong-shares": 0,
    "count-shares-good": 3,
    "count-corrupt-shares": 0,
    "list-corrupt-shares": [],
    "recoverable": true
  },
  "storage-index": "dumi26otgmnemrypt3zlesxm5y",
  "summary": "Not Healthy: 3 shares (enc 2-of-4)"
}
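To double-check my reading of the sharemap, here is a minimal sketch that tallies shares per server from the check output. The RAW string is a trimmed copy of the JSON above, and the helper name summarize is my own. It assumes share numbers run from 0 to count-shares-expected minus 1, which I have not verified against the code.

```python
import json

# Trimmed copy of the check output above (only the fields used here).
RAW = """
{
  "results": {
    "count-shares-expected": 4,
    "sharemap": {
      "0": ["juwmgssmwnhrhfdcpxxmrz3bghh37esx"],
      "1": ["vjqcroalrgmft66mgiwfjug667fl6qjd"],
      "3": ["juwmgssmwnhrhfdcpxxmrz3bghh37esx"]
    },
    "servers-responding": [
      "vjqcroalrgmft66mgiwfjug667fl6qjd",
      "juwmgssmwnhrhfdcpxxmrz3bghh37esx",
      "47cslusczp3uu2kygodi3nlalcruscif",
      "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ]
  }
}
"""


def summarize(raw):
    """Return (shares per server, missing share numbers, servers with no shares)."""
    results = json.loads(raw)["results"]
    sharemap = results["sharemap"]

    # Invert the share -> servers map into server -> sorted share numbers.
    per_server = {}
    for sharenum, servers in sharemap.items():
        for server in servers:
            per_server.setdefault(server, []).append(int(sharenum))
    for shares in per_server.values():
        shares.sort()

    # ASSUMPTION: share numbers are 0 .. count-shares-expected - 1.
    expected = set(range(results["count-shares-expected"]))
    missing = sorted(expected - set(int(s) for s in sharemap))

    # Servers that answered the check but hold no share of this file.
    empty = sorted(s for s in results["servers-responding"] if s not in per_server)
    return per_server, missing, empty


per_server, missing, empty = summarize(RAW)
print(per_server)
print("missing shares:", missing)
print("servers holding nothing:", empty)
```

With the data above this should report shares 0 and 3 on juwmgssm..., share 1 on vjqcroal..., share 2 missing, and the other two responding servers holding nothing.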
What do the expert folk make of this situation?
--
Kyle Markley