[tahoe-dev] Manual rebalancing in 1.10.0?
Kyle Markley
kyle at arbyte.us
Sun Sep 1 06:47:37 UTC 2013
I have run into an interesting scenario on my home grid.
This grid has 6 machines and all are configured for shares.total=10,
shares.happy=4, shares.needed=4. (Five machines are running 1.10.0, one
is running 1.9.2, but I don't think this matters.)
Somehow, over time, I've managed to get a file seriously unbalanced.
One node has all ten shares, another has just two, and that's it!
{
  "results": {
    "needs-rebalancing": true,
    "count-unrecoverable-versions": 0,
    "count-good-share-hosts": 2,
    "count-shares-good": 10,
    "count-corrupt-shares": 0,
    "list-corrupt-shares": [],
    "count-shares-expected": 10,
    "healthy": true,
    "count-shares-needed": 4,
    "sharemap": {
      "0": [
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "1": [
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "2": [
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "3": [
        "v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa",
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "4": [
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "5": [
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "6": [
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "7": [
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "8": [
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ],
      "9": [
        "v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa",
        "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya"
      ]
    },
    "count-recoverable-versions": 1,
    "count-wrong-shares": 0,
    "servers-responding": [
      "v0-ylkbcys5oqliy26d6s6kuwk5nmw5ktlcxmx254dfprm4rwrojhya",
      "47cslusczp3uu2kygodi3nlalcruscif",
      "v0-7zw5vd263ktna2nuxouv5byodxrsxo4pfdokz3qixdgft7bkvlmq",
      "v0-jqs2izy4yo2wusmsso2mzkfqpqrmmbhegtxcyup7heisfrf4octa",
      "v0-rbwrud2e6alixe4xwlaynv7jbzvhn2wxbs4jniqlgu6wd5sk724q",
      "v0-7ags2kynskk5rrmbyk6yzjzmceswxh7x5lekghwsfbwdpfeaztxa"
    ],
    "recoverable": true
  },
  "storage-index": "x4ahcfbulwcaltkohuz55ttwke",
  "summary": "Healthy"
}
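The imbalance is easier to see once the sharemap is tallied per server. The following is only a rough sketch, assuming the check output has been parsed into a dict with the same "results" / "sharemap" layout as above; the function name and the short server names in the sample are made up for illustration and are not part of Tahoe-LAFS.

```python
from collections import Counter

def shares_per_server(check_results):
    """Count distinct shares held by each server in a check-results dict."""
    counts = Counter()
    for servers in check_results["results"]["sharemap"].values():
        # set() guards against a server being listed twice for one share
        counts.update(set(servers))
    return counts

# Hypothetical, abbreviated sample with the same shape as the output above.
sample = {
    "results": {
        "sharemap": {
            "0": ["server-A"],
            "1": ["server-A"],
            "3": ["server-B", "server-A"],
        }
    }
}

for server, count in shares_per_server(sample).most_common():
    print(server, count)
```

Run against the full output above, a tally like this would show ten shares on the "ylkb..." server and two on the "7ags..." server.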
I'm not able to re-upload this file:
allmydata.interfaces.UploadUnhappinessError: shares could be placed or
found on 5 server(s), but they are not spread out evenly enough to
ensure that any 4 of these servers would have enough shares to recover
the file. We were asked to place shares on at least 4 servers such that
any 4 of them have enough shares to recover the file. (placed all 10
shares, want to place shares on at least 4 servers such that any 4 of
them have enough shares to recover the file, sent 6 queries to 6
servers, 5 queries placed some shares, 1 placed none (of which 1 placed
none due to the server being full and 0 placed none due to an error))
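For reference, the existing placement can be scored with a small script. This is a minimal sketch, assuming the definition in Tahoe's servers-of-happiness documentation (happiness is the size of a maximum matching between servers and shares). It is not the uploader's actual code; the function name and the short server names "A" and "B" are stand-ins for illustration.

```python
def happiness(sharemap):
    """Size of a maximum matching between shares and servers.

    sharemap maps a share number to the list of servers holding that share.
    """
    matched = {}  # server -> share currently assigned to it

    def try_assign(share, seen):
        # Look for a free server for this share, re-homing other shares if needed.
        for server in sharemap[share]:
            if server in seen:
                continue
            seen.add(server)
            if server not in matched or try_assign(matched[server], seen):
                matched[server] = share
                return True
        return False

    return sum(try_assign(share, set()) for share in sharemap)

# Same structure as the posted sharemap: "A" holds all ten shares,
# "B" holds only shares 3 and 9.
posted = {str(n): ["A"] for n in range(10)}
posted["3"] = ["B", "A"]
posted["9"] = ["B", "A"]

print(happiness(posted))
```

Under that assumption the existing placement scores 2, the same as "count-good-share-hosts" in the check output and below shares.happy=4.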
Is there any command line mechanism in 1.10.0 for me to fix this, or
must I go "outside" the system and delete files from .tahoe/storage/shares/?
Critically, this situation causes my "tahoe backup" command to fail.
The root directory of the backup is the last thing to be created, so
although this failure occurs ~80% of the way through my backup, it
results in 0% of the backup being actually available. My backups are
getting stale.
It would be best if the backup would rebalance that file for me (which I
know is a long-requested feature), but even without that, wouldn't it be
better if the backup continued to run instead of stopping? The backup
could complete successfully with all files being recoverable, even if
not well-balanced, and that would still have value.
And a minor gripe: when running deep-check, a failed repair results in a
message that there was a failed repair. But it doesn't identify which
file couldn't be repaired! To figure that out, I have to rerun the
command in verbose mode, but then there's so much output that I have to
redirect it to a file and then search for the failures. The original
non-verbose-mode error message should tell me specifically what failed.
--
Kyle Markley