[tahoe-dev] Question about changing k/N
Kyle Markley
kyle at arbyte.us
Mon Jun 4 04:48:42 UTC 2012
Zooko,
While gathering information to submit a ticket, I discovered something
interesting: the problem probably depends on the health of the target
directory. I found one 2/4 directory that works cleanly, and another
2/4 directory that gives an error (after *successfully* creating the
link). I haven't created a ticket yet because I'm not sure what's
supposed to happen here. :)
The directory that works:
$ tahoe check --raw kyle:
{
  "results": {
    "needs-rebalancing": true,
    "count-shares-expected": 4,
    "healthy": false,
    "count-unrecoverable-versions": 0,
    "count-shares-needed": 2,
    "sharemap": {
      "seq52-2x5m-sh3": [
        "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
      ],
      "seq52-2x5m-sh2": [
        "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
      ],
      "seq52-2x5m-sh1": [
        "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
      ],
      "seq52-2x5m-sh0": [
        "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
      ],
      "seq49-cqp3-sh2": [
        "juwmgssmwnhrhfdcpxxmrz3bghh37esx"
      ],
      "seq49-cqp3-sh3": [
        "vjqcroalrgmft66mgiwfjug667fl6qjd"
      ]
    },
    "count-recoverable-versions": 2,
    "servers-responding": [
      "vjqcroalrgmft66mgiwfjug667fl6qjd",
      "juwmgssmwnhrhfdcpxxmrz3bghh37esx",
      "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx",
      "47cslusczp3uu2kygodi3nlalcruscif",
      "eqtprtidz5emkvzlqt27dylgocdf3f77"
    ],
    "count-good-share-hosts": 1,
    "count-wrong-shares": 2,
    "count-shares-good": 4,
    "count-corrupt-shares": 0,
    "list-corrupt-shares": [],
    "recoverable": true
  },
  "storage-index": "rsi6ge4hmbzhxplyqjzkmd254e",
  "summary": "Unhealthy: multiple versions are recoverable"
}
The directory that gives an error:
$ tahoe check --raw share:
{
  "results": {
    "needs-rebalancing": true,
    "count-shares-expected": 4,
    "healthy": false,
    "count-unrecoverable-versions": 1,
    "count-shares-needed": 2,
    "sharemap": {
      "seq45-7bxx-sh3": [
        "juwmgssmwnhrhfdcpxxmrz3bghh37esx"
      ],
      "seq46-jiym-sh1": [
        "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
      ],
      "seq46-jiym-sh0": [
        "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
      ],
      "seq56-aiot-sh0": [
        "vjqcroalrgmft66mgiwfjug667fl6qjd"
      ],
      "seq56-aiot-sh1": [
        "vjqcroalrgmft66mgiwfjug667fl6qjd"
      ],
      "seq56-aiot-sh2": [
        "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
      ],
      "seq56-aiot-sh3": [
        "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
      ]
    },
    "count-recoverable-versions": 2,
    "servers-responding": [
      "vjqcroalrgmft66mgiwfjug667fl6qjd",
      "juwmgssmwnhrhfdcpxxmrz3bghh37esx",
      "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx",
      "47cslusczp3uu2kygodi3nlalcruscif",
      "eqtprtidz5emkvzlqt27dylgocdf3f77"
    ],
    "count-good-share-hosts": 2,
    "count-wrong-shares": 3,
    "count-shares-good": 4,
    "count-corrupt-shares": 0,
    "list-corrupt-shares": [],
    "recoverable": true
  },
  "storage-index": "lb2mpyg4fnznnfebfayevcdpki",
  "summary": "Unhealthy: some versions are unrecoverable multiple versions are recoverable"
}
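To make the version counts in the check output easier to follow, here is a
minimal Python sketch that groups the sharemap keys of the failing directory
by version and asks whether at least k distinct shares were seen for each.
It assumes the keys have the form "seq<N>-<roothash prefix>-sh<M>", which is
only inferred from the output above; group_versions is a hypothetical helper,
not part of Tahoe-LAFS.

```python
from collections import defaultdict

# Sharemap keys copied from the `tahoe check --raw share:` output above.
SHARE_KEYS = [
    "seq45-7bxx-sh3",
    "seq46-jiym-sh1", "seq46-jiym-sh0",
    "seq56-aiot-sh0", "seq56-aiot-sh1", "seq56-aiot-sh2", "seq56-aiot-sh3",
]


def group_versions(keys, k):
    """Map (seqnum, roothash prefix) -> (sorted share numbers, recoverable?).

    Hypothetical helper: a version is treated as recoverable when at least
    k distinct share numbers were seen for it.
    """
    found = defaultdict(set)
    for key in keys:
        # Assumed key layout: "seq<N>-<roothash prefix>-sh<M>"
        seq, roothash, share = key.split("-")
        found[(int(seq[3:]), roothash)].add(int(share[2:]))
    return {ver: (sorted(nums), len(nums) >= k) for ver, nums in found.items()}


versions = group_versions(SHARE_KEYS, k=2)
for ver in sorted(versions):
    print(ver, versions[ver])
```

With k=2 this yields one unrecoverable version (seq45, a single share) and
two recoverable ones (seq46 and seq56), which agrees with the
count-unrecoverable-versions and count-recoverable-versions fields reported
for share:.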
My error looks something like this. The exact message has changed; it
used to say there was an UncoordinatedWriteError, but my experimentation
seems to have changed things a bit, and right now I only see that error
mentioned in the incident report.
$ tahoe ln foo: share:foo
Error: 500 Internal Server Error
"Traceback (most recent call last):\x0a File
\"/usr/local/lib/python2.7/site-packages/twisted/internet/base.py\",
line 800, in runUntilCurrent\x0a call.func(*call.args,
**call.kw)\x0a File
\"/usr/local/lib/python2.7/site-packages/foolscap-0.6.3-py2.7.egg/foolscap/eventual.py\",
line 26, in _turn\x0a cb(*args, **kwargs)\x0a File
\"/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py\",
line 368, in callback\x0a self._startRunCallbacks(result)\x0a File
\"/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py\",
line 464, in _startRunCallbacks\x0a self._runCallbacks()\x0a---
<exception caught here> ---\x0a File
\"/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py\",
line 551, in _runCallbacks\x0a current.result =
callback(current.result, *args, **kw)\x0a File
\"/usr/local/lib/python2.7/site-packages/allmydata/mutable/filenode.py\", line
855, in <lambda>\x0a self._modify_once(modifier, first_time))\x0a
File
\"/usr/local/lib/python2.7/site-packages/allmydata/mutable/filenode.py\", line
881, in _modify_once\x0a d = self._try_to_download_data()\x0a File
\"/usr/local/lib/python2.7/site-packages/allmydata/mutable/filenode.py\", line
959, in _try_to_download_data\x0a d = self._read(c,
fetch_privkey=True)\x0a File
\"/usr/local/lib/python2.7/site-packages/allmydata/mutable/filenode.py\", line
980, in _read\x0a d = r.download(consumer, offset, size)\x0a File
\"/usr/local/lib/python2.7/site-packages/allmydata/mutable/retrieve.py\", line
237, in download\x0a self._setup_download()\x0a File
\"/usr/local/lib/python2.7/site-packages/allmydata/mutable/retrieve.py\", line
277, in _setup_download\x0a shares =
versionmap[self.verinfo]\x0aexceptions.KeyError: (57,
'\\x16i\\xdb\\xa8\\xbc\\xd7\\xabrY\\xcdpv\\xa4I\\x82\\xfe\\xa5i\\xed\\x82;\\xca\\xe8\\xcaL\\xf7\\xdav\\xa9\\xf2O\\t',
'\\x19f_S!&\\xb0\\xa1\\xeb\\x94\\x81F)\\xbb\\x89q', 336, 335, 2, 4,
'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x009\\x16i\\xdb\\xa8\\xbc\\xd7\\xabrY\\xcdpv\\xa4I\\x82\\xfe\\xa5i\\xed\\x82;\\xca\\xe8\\xcaL\\xf7\\xdav\\xa9\\xf2O\\t\\x19f_S!&\\xb0\\xa1\\xeb\\x94\\x81F)\\xbb\\x89q\\x02\\x04\\x00\\x00\\x00\\x00\\x00\\x00\\x01P\\x00\\x00\\x00\\x00\\x00\\x00\\x01O',
(('enc_privkey', 923), ('EOF', 2138), ('share_data', 755), ('signature',
399), ('block_hash_tree', 723), ('share_hash_chain', 655)))\x0a"
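The long byte string in the KeyError tuple can be unpacked to see which
version the download was looking for. The sketch below assumes the string is
a big-endian packed prefix of: version byte, 8-byte sequence number, 32-byte
root hash, 16-byte salt, k, N, 8-byte segment size, 8-byte data length. That
layout is an assumption inferred from the other fields of the tuple, and
ASSUMED_FORMAT is an illustrative name.

```python
import struct

# Bytes copied from the KeyError tuple in the traceback above, with the
# doubled backslashes undone.
PREFIX = (
    b"\x00\x00\x00\x00\x00\x00\x00\x009"
    b"\x16i\xdb\xa8\xbc\xd7\xabrY\xcdpv\xa4I\x82\xfe"
    b"\xa5i\xed\x82;\xca\xe8\xcaL\xf7\xdav\xa9\xf2O\t"
    b"\x19f_S!&\xb0\xa1\xeb\x94\x81F)\xbb\x89q"
    b"\x02\x04"
    b"\x00\x00\x00\x00\x00\x00\x01P"
    b"\x00\x00\x00\x00\x00\x00\x01O"
)

# Assumed layout (not confirmed by this thread): version, seqnum, root hash,
# salt, k, N, segment size, data length, all big-endian without padding.
ASSUMED_FORMAT = ">BQ32s16sBBQQ"

(version, seqnum, root_hash, salt,
 k, n, segsize, datalen) = struct.unpack(ASSUMED_FORMAT, PREFIX)

print("seqnum:", seqnum)
print("k/N:", k, n)
print("segment size / data length:", segsize, datalen)
```

Under that assumption the numeric fields decode to the same values that
appear elsewhere in the tuple (57, 2, 4, 336, 335), so the missing key is for
sequence number 57. The check output for share: lists only seq45, seq46 and
seq56, although the two commands may not have been run at the same moment.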
From the incident report:
local#11372110 21:30:59.251: current goal: before update: , sh0 to
[vjqcroal], sh0 to [xxaj2tgm], sh1 to [vjqcroal], sh1 to [xxaj2tgm], sh2
to [xxaj2tgm], sh3 to [juwmgssm], sh3 to [xxaj2tgm]
local#11372111 21:30:59.251: we are planning to push new seqnum=#58
local#11372112 21:30:59.252: Starting push
local#11372113 21:30:59.252: Pushing segment 1 of 1
local#11372114 21:30:59.275: storage: slot_writev lb2mpyg4fnznnfebfayevcdpki
local#11372115 21:30:59.277: storage: slot_writev lb2mpyg4fnznnfebfayevcdpki
local#11372116 21:30:59.281: _got_write_answer from xxaj2tgm, share 2
local#11372117 21:30:59.281: found the following surprise shares:
set([0, 1])
local#11372118 21:30:59.281: they had shares [0, 1] that we didn't know
about [INCIDENT-TRIGGER]
local#11372119 21:31:00.433: wrote successfully: adding new share to
servermap
local#11372120 21:31:00.435: _got_write_answer from xxaj2tgm, share 3
local#11372121 21:31:00.435: found the following surprise shares:
set([0, 1])
local#11372122 21:31:00.435: they had shares [0, 1] that we didn't know
about
local#11372123 21:31:00.435: wrote successfully: adding new share to
servermap
local#11372124 21:31:00.436: _got_write_answer from vjqcroal, share 0
local#11372125 21:31:00.436: found the following surprise shares: set([])
local#11372126 21:31:00.436: wrote successfully: adding new share to
servermap
local#11372127 21:31:00.437: _got_write_answer from vjqcroal, share 1
local#11372128 21:31:00.437: found the following surprise shares: set([])
local#11372129 21:31:00.437: wrote successfully: adding new share to
servermap
local#11372130 21:31:00.437: Publish failed with UncoordinatedWriteError
What's up with these surprise shares?
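One cross-check that may be relevant: the sketch below inverts the share:
sharemap from the check output to list, per server, the share numbers that
belong to versions other than the newest one seen. This is only a comparison
of the two outputs in this message, not the publisher's actual logic, and
the check may have been run at a different time than the failing write. The
helper name shares_by_server and the shortened server ids are illustrative.

```python
from collections import defaultdict

# (sharemap key, server id) pairs from the `tahoe check --raw share:` output,
# with server ids shortened to the 8-character form used in the incident log.
SHAREMAP = [
    ("seq45-7bxx-sh3", "juwmgssm"),
    ("seq46-jiym-sh1", "xxaj2tgm"),
    ("seq46-jiym-sh0", "xxaj2tgm"),
    ("seq56-aiot-sh0", "vjqcroal"),
    ("seq56-aiot-sh1", "vjqcroal"),
    ("seq56-aiot-sh2", "xxaj2tgm"),
    ("seq56-aiot-sh3", "xxaj2tgm"),
]


def shares_by_server(sharemap):
    """Return {server: {seqnum: set of share numbers}} (hypothetical helper)."""
    held = defaultdict(lambda: defaultdict(set))
    for key, server in sharemap:
        # Assumed key layout: "seq<N>-<roothash prefix>-sh<M>"
        seq, _roothash, share = key.split("-")
        held[server][int(seq[3:])].add(int(share[2:]))
    return held


held = shares_by_server(SHAREMAP)
newest = max(seq for per_seq in held.values() for seq in per_seq)

# Share numbers each server holds from any version older than the newest.
stale = {
    server: set().union(*(nums for seq, nums in per_seq.items() if seq != newest))
    for server, per_seq in held.items()
}

for server in sorted(stale):
    print(server, sorted(stale[server]))
```

In this data, xxaj2tgm holds sh0 and sh1 from the older seq46 version while
vjqcroal holds nothing older than seq56. That lines up with the log, where
the surprise shares are set([0, 1]) on xxaj2tgm and set([]) on vjqcroal, but
it does not by itself show that the old shares are the cause.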
On 06/03/12 21:04, Zooko Wilcox-O'Hearn wrote:
> Dear Kyle:
>
> Thanks for the information!
>
> There are no known bugs which would cause changing K or N values to
> lead to UncoordinatedWriteError or other errors. Please report more
> details about the errors you've encountered and let's try to reproduce
> them and narrow down their cause.
>
> https://tahoe-lafs.org/trac/tahoe-lafs/wiki/HowToReportABug
>
> Regards,
>
> Zooko
--
Kyle Markley