<div dir="ltr">Hi Kyle, sorry for not getting back to you sooner. I just started school, so I've been pretty busy.<div><br></div><div>Tahoe treats a new file upload as a special case of file repair, so my branch should address your issue. Currently the only way to rebalance a file is to repair it, but Brian <a href="https://tahoe-lafs.org/trac/tahoe-lafs/ticket/543">has written an extensive ticket</a> on rebalancing shares. However, I don't think anyone is actively working on it.</div>
<div><br></div><div>I think the first error you received is caused by two things:</div><div><br></div><div>1. While the original scope of 1382 is pretty large, my patch only implements a new upload algorithm which distributes shares effectively. The checker doesn't upload anything unless an item needs to be repaired, so you need to supply the "--repair" flag on the CLI.</div>
<div><br></div><div>2. The checker does not consider the file to be unhealthy. Since the file is "healthy", Tahoe doesn't attempt to repair it, so rebalancing doesn't occur. There is a <a href="https://tahoe-lafs.org/trac/tahoe-lafs/ticket/614">separate ticket for this issue</a>, but its fix hasn't been committed to trunk and isn't included in the 1382 branch.</div>
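<div><br></div><div>A rough Python sketch of point 2, for illustration only (this is not Tahoe's checker code): would_repair is a hypothetical helper that just encodes the rule described above, namely that a repair is attempted only when "healthy" is false, applied to a trimmed copy of your tahoe check --raw results.</div>

```python
import json

# Trimmed copy of the "tahoe check --raw" results quoted in this thread.
CHECK_OUTPUT = """
{"results": {"needs-rebalancing": true, "healthy": true,
             "count-shares-good": 10, "count-shares-needed": 4,
             "count-good-share-hosts": 2, "recoverable": true}}
"""


def would_repair(results):
    """Hypothetical model of the rule described above: a repair is only
    attempted when the checker reports the file as unhealthy. The
    needs-rebalancing flag is reported but not acted on."""
    return not results["healthy"]


results = json.loads(CHECK_OUTPUT)["results"]
print("needs-rebalancing:", results["needs-rebalancing"])  # needs-rebalancing: True
print("repair triggered:", would_repair(results))          # repair triggered: False
```

<div>With your results the file is flagged as needing rebalancing, but no repair is triggered, which is why nothing moves.</div>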
<div><br></div><div>To fix your problem, you can manually repair the file from the web UI using my branch.</div><div><br></div><div>As for the UploadUnhappinessError, I'm not sure why that is happening. My initial guess is that you weren't connected to four servers when you ran the command. Are you able to upload other files with my branch? Are there any details about your grid that you think might have caused the issue?</div>
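<div><br></div><div>For reference, as I understand it Tahoe measures "servers of happiness" as the size of a maximum matching between servers and shares, i.e. how many servers can each be credited with a distinct share. The sketch below is my own illustration, not the code in upload.py: servers_of_happiness is a hypothetical name, it uses simple augmenting paths (Kuhn's algorithm), and the server IDs are shortened to eight characters. Applied to the sharemap from your check output, it gives 2 for the existing placement, below the 4 servers named in the error.</div>

```python
# Sharemap from the "tahoe check --raw" output in this thread, with server
# IDs shortened to their first eight characters for readability.
SHAREMAP = {
    0: ["ylkbcys5"], 1: ["ylkbcys5"], 2: ["7ags2kyn"], 3: ["ylkbcys5"],
    4: ["ylkbcys5"], 5: ["ylkbcys5"], 6: ["ylkbcys5"], 7: ["ylkbcys5"],
    8: ["7ags2kyn"], 9: ["ylkbcys5"],
}


def servers_of_happiness(sharemap):
    """Size of a maximum matching between servers and shares, found with
    augmenting paths (Kuhn's algorithm). Each server may be credited with
    at most one distinct share."""
    match = {}  # maps share number to the server currently matched to it

    def try_assign(server, seen):
        for share, servers in sharemap.items():
            if server in servers and share not in seen:
                seen.add(share)
                # Take a free share, or re-route the server that holds it.
                if share not in match or try_assign(match[share], seen):
                    match[share] = server
                    return True
        return False

    all_servers = {s for servers in sharemap.values() for s in servers}
    return sum(try_assign(server, set()) for server in sorted(all_servers))


print(servers_of_happiness(SHAREMAP))  # 2
```

<div>Eight shares on one server still count for only one server, which is why piling shares onto the always-on machine doesn't help happiness.</div>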
<div><br></div><div>- Mark</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Sep 28, 2013 at 2:17 PM, Kyle Markley <span dir="ltr"><<a href="mailto:kyle@arbyte.us" target="_blank">kyle@arbyte.us</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Reading the text of 1382, it isn't clear to me whether it's expected to address the error from "tahoe put" at all. (It only mentions check, verify, and repair.) And although it mentions repair, in my scenario all the shares are present on the grid, so the file doesn't need "repair" at all... it *only* needs rebalancing.<br>
<br>
Should I file my scenario in a new ticket? Or is it actually intended to be covered by 1382?<br>
Did I test the new code from 1382 correctly?<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
On 09/22/13 09:46, Kyle Markley wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Mark Berger, et al,<br>
<br>
I (believe I have) tried my scenario with your code, and it doesn't fix the behavior I have been seeing.<br>
<br>
Given a file on the grid for which all shares exist, but which needs rebalancing, "tahoe put" for that same file will fail. (And "tahoe check --repair" does not attempt to rebalance.)<br>
<br>
This is what I did. I'm a git novice, so maybe I didn't get the right code:<br>
$ git clone <a href="https://github.com/markberger/tahoe-lafs.git" target="_blank">https://github.com/markberger/tahoe-lafs.git</a><br>
$ cd tahoe-lafs/<br>
$ git checkout 1382-rewrite<br>
Branch 1382-rewrite set up to track remote branch 1382-rewrite from origin.<br>
Switched to a new branch '1382-rewrite'<br>
$ python setup.py build<br>
<snip><br>
$ bin/tahoe --version<br>
allmydata-tahoe: 1.10b1.post68 [1382-rewrite: 7b95f937089d59b595dfe5e85d2d81ec36d5cf9d]<br>
foolscap: 0.6.4<br>
pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958<br>
zfec: 1.4.24<br>
Twisted: 13.0.0<br>
Nevow: 0.10.0<br>
zope.interface: unknown<br>
python: 2.7.3<br>
platform: OpenBSD-5.3-amd64-64bit<br>
pyOpenSSL: 0.13<br>
simplejson: 3.3.0<br>
pycrypto: 2.6<br>
pyasn1: 0.1.7<br>
mock: 1.0.1<br>
setuptools: 0.6c16dev4<br>
<br>
<br>
Original output from tahoe check --raw:<br>
{<br>
"results": {<br>
"needs-rebalancing": true,<br>
"count-unrecoverable-versions": 0,<br>
"count-good-share-hosts": 2,<br>
"count-shares-good": 10,<br>
"count-corrupt-shares": 0,<br>
"list-corrupt-shares": [],<br>
"count-shares-expected": 10,<br>
"healthy": true,<br>
"count-shares-needed": 4,<br>
"sharemap": {<br>
"0": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"<br>
],<br>
"1": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"<br>
],<br>
"2": [<br>
"v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa"<br>
],<br>
"3": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"<br>
],<br>
"4": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"<br>
],<br>
"5": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"<br>
],<br>
"6": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"<br>
],<br>
"7": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"<br>
],<br>
"8": [<br>
"v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa"<br>
],<br>
"9": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"<br>
]<br>
},<br>
"count-recoverable-versions": 1,<br>
"count-wrong-shares": 0,<br>
"servers-responding": [<br>
"v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya",<br>
"v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa",<br>
"v0-jqs2izy4yo2wusmsso2mzkfqpqrmmbhegtxcyup7heisfrf4octa",<br>
"v0-rbwrud2e6alixe4xwlaynv7jbzvhn2wxbs4jniqlgu6wd5sk724q"<br>
],<br>
"recoverable": true<br>
},<br>
"storage-index": "rfomclj5ogk434v2gchspipv3i",<br>
"summary": "Healthy"<br>
}<br>
<br>
<br>
Then I try to re-upload the unbalanced file:<br>
$ bin/tahoe put /tmp/temp_file<br>
<br>
Error: 500 Internal Server Error<br>
Traceback (most recent call last):<br>
File "/usr/local/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py", line 677, in _done<br>
self.request.complete(res)<br>
File "/usr/local/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py", line 60, in complete<br>
self.deferred.callback(res)<br>
File "/usr/local/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-openbsd-5.3-amd64.egg/twisted/internet/defer.py", line 380, in callback<br>
self._startRunCallbacks(result)<br>
File "/usr/local/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-openbsd-5.3-amd64.egg/twisted/internet/defer.py", line 488, in _startRunCallbacks<br>
self._runCallbacks()<br>
--- <exception caught here> ---<br>
File "/usr/local/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-openbsd-5.3-amd64.egg/twisted/internet/defer.py", line 575, in _runCallbacks<br>
current.result = callback(current.result, *args, **kw)<br>
File "/usr/local/lib/python2.7/site-packages/allmydata/immutable/upload.py", line 604, in _got_response<br>
return self._loop()<br>
File "/usr/local/lib/python2.7/site-packages/allmydata/immutable/upload.py", line 455, in _loop<br>
return self._failed("%s (%s)" % (failmsg, self._get_progress_message()))<br>
File "/usr/local/lib/python2.7/site-packages/allmydata/immutable/upload.py", line 617, in _failed<br>
raise UploadUnhappinessError(msg)<br>
allmydata.interfaces.UploadUnhappinessError: shares could be placed or found on 4 server(s), but they are not spread out evenly enough to ensure that any 4 of these servers would have enough shares to recover the file. We were asked to place shares on at least 4 servers such that any 4 of them have enough shares to recover the file. (placed all 10 shares, want to place shares on at least 4 servers such that any 4 of them have enough shares to recover the file, sent 4 queries to 4 servers, 4 queries placed some shares, 0 placed none (of which 0 placed none due to the server being full and 0 placed none due to an error))<br>
<br>
<br>
<br>
<br>
<br>
On 09/17/13 08:45, Kyle Markley wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
It would be my pleasure. But I won't have time to do it until the weekend.<br>
<br>
It might be faster, and all-around better, to create a unit test that exercises the scenario in my original message. Then my buildbot (which has way more free time than I do) can try it for me.<br>
<br>
Incidentally, I understand how I created that scenario. The machine that had all the shares is always on, and runs deep-check --repair crons. My other machines aren't reliably on the grid, so after repeated repair operations, the always-on machine tends to accumulate a lot of shares. Eventually it held shares.needed shares, and then a repair happened while it was the only machine on the grid. Because repair didn't care about shares.happy, this machine got all shares.total shares. Then, because upload does care about shares.happy but won't rebalance, the upload had to fail.<br>
<br>
A grid whose nodes don't have similar uptime is surprisingly fragile. Failure of that single always-on machine makes the file totally unretrievable, which is definitely not the desired behavior.<br>
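This fragility can be checked against the sharemap reported by tahoe check --raw in this thread. The sketch below is illustrative only: recoverable_without is a hypothetical helper, server IDs are shortened to eight characters, and it simply counts the distinct shares left on the surviving servers and compares that with shares.needed (4 here).<br>

```python
# Sharemap from the "tahoe check --raw" output in this thread
# (server IDs shortened to their first eight characters).
SHAREMAP = {
    0: ["ylkbcys5"], 1: ["ylkbcys5"], 2: ["7ags2kyn"], 3: ["ylkbcys5"],
    4: ["ylkbcys5"], 5: ["ylkbcys5"], 6: ["ylkbcys5"], 7: ["ylkbcys5"],
    8: ["7ags2kyn"], 9: ["ylkbcys5"],
}
SHARES_NEEDED = 4  # "count-shares-needed" in the check output


def recoverable_without(sharemap, failed_server, needed):
    """Return True if the distinct shares held by the surviving servers
    are still enough to reconstruct the file."""
    remaining = {
        share
        for share, servers in sharemap.items()
        if any(server != failed_server for server in servers)
    }
    return len(remaining) >= needed


print(recoverable_without(SHAREMAP, "ylkbcys5", SHARES_NEEDED))  # False: only shares 2 and 8 survive
print(recoverable_without(SHAREMAP, "7ags2kyn", SHARES_NEEDED))  # True: 8 distinct shares survive
```

Losing the server that holds most of the shares leaves only 2 of the 4 needed, while losing the other server is harmless.<br>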
<br>
<br>
<br>
On 09/16/13 09:57, Zooko O'Whielacronx wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Dear Kyle:<br>
<br>
Could you try Mark Berger's #1382 patch on your home grid and tell us<br>
if it fixes the problem?<br>
<br>
<a href="https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1382#" target="_blank">https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1382#</a> immutable peer<br>
selection refactoring and enhancements<br>
<br>
<a href="https://github.com/tahoe-lafs/tahoe-lafs/pull/60" target="_blank">https://github.com/tahoe-lafs/tahoe-lafs/pull/60</a><br>
<br>
Regards,<br>
<br>
Zooko<br>
_______________________________________________<br>
tahoe-dev mailing list<br>
<a href="mailto:tahoe-dev@tahoe-lafs.org" target="_blank">tahoe-dev@tahoe-lafs.org</a><br>
<a href="https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev" target="_blank">https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev</a><br>
</blockquote>
<br>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
<br>
-- <br>
Kyle Markley<br>
<br>
</div></div></blockquote></div><br></div>