[tahoe-dev] Manual rebalancing in 1.10.0?
Kyle Markley
kyle at arbyte.us
Sun Sep 22 16:46:24 UTC 2013
Mark Berger, et al.,
I believe I have tried my scenario with your code, and it does not fix
the behavior I have been seeing.
Given a file on the grid for which all shares exist, but which needs
rebalancing, "tahoe put" for that same file will fail. (And "tahoe
check --repair" does not attempt to rebalance.)
This is what I did. I'm a git novice, so maybe I didn't get the right code:
$ git clone https://github.com/markberger/tahoe-lafs.git
$ cd tahoe-lafs/
$ git checkout 1382-rewrite
Branch 1382-rewrite set up to track remote branch 1382-rewrite from origin.
Switched to a new branch '1382-rewrite'
$ python setup.py build
<snip>
$ bin/tahoe --version
allmydata-tahoe: 1.10b1.post68 [1382-rewrite: 7b95f937089d59b595dfe5e85d2d81ec36d5cf9d]
foolscap: 0.6.4
pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958
zfec: 1.4.24
Twisted: 13.0.0
Nevow: 0.10.0
zope.interface: unknown
python: 2.7.3
platform: OpenBSD-5.3-amd64-64bit
pyOpenSSL: 0.13
simplejson: 3.3.0
pycrypto: 2.6
pyasn1: 0.1.7
mock: 1.0.1
setuptools: 0.6c16dev4
Original output from tahoe check --raw:
{
  "results": {
    "needs-rebalancing": true,
    "count-unrecoverable-versions": 0,
    "count-good-share-hosts": 2,
    "count-shares-good": 10,
    "count-corrupt-shares": 0,
    "list-corrupt-shares": [],
    "count-shares-expected": 10,
    "healthy": true,
    "count-shares-needed": 4,
    "sharemap": {
      "0": ["v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"],
      "1": ["v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"],
      "2": ["v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa"],
      "3": ["v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"],
      "4": ["v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"],
      "5": ["v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"],
      "6": ["v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"],
      "7": ["v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"],
      "8": ["v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa"],
      "9": ["v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"]
    },
    "count-recoverable-versions": 1,
    "count-wrong-shares": 0,
    "servers-responding": [
      "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya",
      "v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa",
      "v0-jqs2izy4yo2wusmsso2mzkfqpqrmmbhegtxcyup7heisfrf4octa",
      "v0-rbwrud2e6alixe4xwlaynv7jbzvhn2wxbs4jniqlgu6wd5sk724q"
    ],
    "recoverable": true
  },
  "storage-index": "rfomclj5ogk434v2gchspipv3i",
  "summary": "Healthy"
}
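To make the imbalance concrete, here is a small Python 2 sketch that
tallies shares per server from the sharemap above (check.json is a
hypothetical file holding the --raw output; this is my own throwaway
script, not anything shipped with Tahoe-LAFS):

    # Summarize share placement from `tahoe check --raw` output.
    import json
    from collections import defaultdict

    with open("check.json") as f:  # hypothetical dump of the JSON above
        results = json.load(f)["results"]

    shares_per_server = defaultdict(list)
    for sharenum, servers in results["sharemap"].items():
        for server in servers:
            shares_per_server[server].append(int(sharenum))

    for server, shares in sorted(shares_per_server.items()):
        print "%s... holds %d shares: %s" % (server[:12], len(shares),
                                             sorted(shares))

For this file it reports one server holding 8 shares and another
holding 2, so all 10 shares sit on only 2 of the 4 responding servers.
That is why needs-rebalancing is true even though the file is "Healthy".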
Then I try to re-upload the unbalanced file:
$ bin/tahoe put /tmp/temp_file
Error: 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py", line 677, in _done
    self.request.complete(res)
  File "/usr/local/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py", line 60, in complete
    self.deferred.callback(res)
  File "/usr/local/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-openbsd-5.3-amd64.egg/twisted/internet/defer.py", line 380, in callback
    self._startRunCallbacks(result)
  File "/usr/local/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-openbsd-5.3-amd64.egg/twisted/internet/defer.py", line 488, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-openbsd-5.3-amd64.egg/twisted/internet/defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/site-packages/allmydata/immutable/upload.py", line 604, in _got_response
    return self._loop()
  File "/usr/local/lib/python2.7/site-packages/allmydata/immutable/upload.py", line 455, in _loop
    return self._failed("%s (%s)" % (failmsg, self._get_progress_message()))
  File "/usr/local/lib/python2.7/site-packages/allmydata/immutable/upload.py", line 617, in _failed
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: shares could be placed or
found on 4 server(s), but they are not spread out evenly enough to
ensure that any 4 of these servers would have enough shares to recover
the file. We were asked to place shares on at least 4 servers such that
any 4 of them have enough shares to recover the file. (placed all 10
shares, want to place shares on at least 4 servers such that any 4 of
them have enough shares to recover the file, sent 4 queries to 4
servers, 4 queries placed some shares, 0 placed none (of which 0 placed
none due to the server being full and 0 placed none due to an error))
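If I understand the servers-of-happiness metric correctly, happiness is
the size of a maximum matching in the bipartite graph between servers
and the shares they hold, and an upload fails when that number is below
shares.happy. Here is a toy Python 2 sketch of that computation (my own
paraphrase of the idea, not Tahoe-LAFS code), applied to the placement
from my check output:

    # Happiness as maximum bipartite matching (Kuhn's algorithm).
    def happiness(server_to_shares):
        match = {}  # share number -> server matched to it

        def augment(server, seen):
            # Try to give `server` a share of its own, re-routing other
            # servers along an augmenting path if necessary.
            for share in server_to_shares[server]:
                if share not in seen:
                    seen.add(share)
                    if share not in match or augment(match[share], seen):
                        match[share] = server
                        return True
            return False

        return sum(1 for server in server_to_shares
                   if augment(server, set()))

    # Placement from my check output: all 10 shares on 2 servers.
    placement = {
        "server-ylkb": [0, 1, 3, 4, 5, 6, 7, 9],
        "server-7ags": [2, 8],
    }
    print happiness(placement)  # -> 2, below shares.happy = 4

Since no matching can use more servers than actually hold shares, this
placement can never reach a happiness of 4 without moving shares, which
(as far as I can tell) neither "tahoe put" nor "tahoe check --repair"
is willing to do.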
On 09/17/13 08:45, Kyle Markley wrote:
> It would be my pleasure. But I won't have time to do it until the
> weekend.
>
> It might be faster, and all-around better, to create a unit test that
> exercises the scenario in my original message. Then my buildbot
> (which has way more free time than I do) can try it for me.
>
> Incidentally, I understand how I created that scenario. The machine
> that had all the shares is always on, and runs deep-check --repair
> from cron. My other machines aren't reliably on the grid, so after
> repeated repair operations, the always-on machine tends to get a lot
> of shares. Eventually, it accumulated shares.needed shares, and then a
> repair happened while it was the only machine on the grid. Because
> repair didn't care about shares.happy, this machine got all
> shares.total shares. Then, because an upload cares about shares.happy
> but wouldn't rebalance, it had to fail.
>
> A grid whose nodes don't have similar uptime is surprisingly fragile.
> Failure of that single always-on machine makes the file totally
> unretrievable, which is definitely not the desired behavior.
>
>
>
> On 09/16/13 09:57, Zooko O'Whielacronx wrote:
>> Dear Kyle:
>>
>> Could you try Mark Berger's #1382 patch on your home grid and tell us
>> if it fixes the problem?
>>
>> https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1382 ("immutable peer
>> selection refactoring and enhancements")
>>
>> https://github.com/tahoe-lafs/tahoe-lafs/pull/60
>>
>> Regards,
>>
>> Zooko
>> _______________________________________________
>> tahoe-dev mailing list
>> tahoe-dev at tahoe-lafs.org
>> https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
>
>
--
Kyle Markley