[tahoe-lafs-trac-stream] [tahoe-lafs] #1670: KeyError in mutable retrieve
tahoe-lafs
trac at tahoe-lafs.org
Mon Jun 4 17:14:42 UTC 2012
#1670: KeyError in mutable retrieve
-------------------------+-------------------------------------------------
Reporter: vikarti | Owner: davidsarah
Type: defect | Status: new
Priority: | Milestone: 1.9.2
critical | Version: 1.9.1
Component: code- | Keywords: mutable retrieve error tahoe-backup
mutable | regression
Resolution: |
Launchpad Bug: |
-------------------------+-------------------------------------------------
Comment (by davidsarah):
Kyle Markley reported a bug with a similar stacktrace:
> As I went about gathering information to submit a ticket, I discovered
something interesting. This probably has something to do with the health
of the target directories. I discovered one 2/4 directory that works
cleanly, and another 2/4 directory that gives an error (after
*successfully* creating the link). I haven't created a ticket yet because
I'm not sure what's supposed to happen here. :)
> Working directory:
{{{
$ tahoe check --raw kyle:
{
"results": {
"needs-rebalancing": true,
"count-shares-expected": 4,
"healthy": false,
"count-unrecoverable-versions": 0,
"count-shares-needed": 2,
"sharemap": {
"seq52-2x5m-sh3": [
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
],
"seq52-2x5m-sh2": [
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
],
"seq52-2x5m-sh1": [
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
],
"seq52-2x5m-sh0": [
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
],
"seq49-cqp3-sh2": [
"juwmgssmwnhrhfdcpxxmrz3bghh37esx"
],
"seq49-cqp3-sh3": [
"vjqcroalrgmft66mgiwfjug667fl6qjd"
]
},
"count-recoverable-versions": 2,
"servers-responding": [
"vjqcroalrgmft66mgiwfjug667fl6qjd",
"juwmgssmwnhrhfdcpxxmrz3bghh37esx",
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx",
"47cslusczp3uu2kygodi3nlalcruscif",
"eqtprtidz5emkvzlqt27dylgocdf3f77"
],
"count-good-share-hosts": 1,
"count-wrong-shares": 2,
"count-shares-good": 4,
"count-corrupt-shares": 0,
"list-corrupt-shares": [],
"recoverable": true
},
"storage-index": "rsi6ge4hmbzhxplyqjzkmd254e",
"summary": "Unhealthy: multiple versions are recoverable"
}
}}}
> Gives an error:
{{{
$ tahoe check --raw share:
{
"results": {
"needs-rebalancing": true,
"count-shares-expected": 4,
"healthy": false,
"count-unrecoverable-versions": 1,
"count-shares-needed": 2,
"sharemap": {
"seq45-7bxx-sh3": [
"juwmgssmwnhrhfdcpxxmrz3bghh37esx"
],
"seq46-jiym-sh1": [
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
],
"seq46-jiym-sh0": [
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
],
"seq56-aiot-sh0": [
"vjqcroalrgmft66mgiwfjug667fl6qjd"
],
"seq56-aiot-sh1": [
"vjqcroalrgmft66mgiwfjug667fl6qjd"
],
"seq56-aiot-sh2": [
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
],
"seq56-aiot-sh3": [
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
]
},
"count-recoverable-versions": 2,
"servers-responding": [
"vjqcroalrgmft66mgiwfjug667fl6qjd",
"juwmgssmwnhrhfdcpxxmrz3bghh37esx",
"xxaj2tgmnl7debjdpn4mgv2oks6pjjnx",
"47cslusczp3uu2kygodi3nlalcruscif",
"eqtprtidz5emkvzlqt27dylgocdf3f77"
],
"count-good-share-hosts": 2,
"count-wrong-shares": 3,
"count-shares-good": 4,
"count-corrupt-shares": 0,
"list-corrupt-shares": [],
"recoverable": true
},
"storage-index": "lb2mpyg4fnznnfebfayevcdpki",
"summary": "Unhealthy: some versions are unrecoverable multiple versions
are recoverable"
}
}}}
> My error looks something like this. The exact message has changed; it
used to say there was an {{{UncoordinatedWriteError}}}, but my
experimentation seems to have changed things a bit, and right now I only
see that error mentioned in the incident report.
{{{
$ tahoe ln foo: share:foo
Error: 500 Internal Server Error
"Traceback (most recent call last):\x0a File \"/usr/local/lib/python2.7
/site-packages/twisted/internet/base.py\", line 800, in
runUntilCurrent\x0a call.func(*call.args, **call.kw)\x0a File
\"/usr/local/lib/python2.7/site-
packages/foolscap-0.6.3-py2.7.egg/foolscap/eventual.py\", line 26, in
_turn\x0a cb(*args, **kwargs)\x0a File \"/usr/local/lib/python2.7
/site-packages/twisted/internet/defer.py\", line 368, in callback\x0a
self._startRunCallbacks(result)\x0a File \"/usr/local/lib/python2.7/site-
packages/twisted/internet/defer.py\", line 464, in _startRunCallbacks\x0a
self._runCallbacks()\x0a--- <exception caught here> ---\x0a File
\"/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py\", line
551, in _runCallbacks\x0a current.result = callback(current.result,
*args, **kw)\x0a File \"/usr/local/lib/python2.7/site-
packages/allmydata/mutable/filenode.py\", line 855, in <lambda>\x0a
self._modify_once(modifier, first_time))\x0a File
\"/usr/local/lib/python2.7/site-packages/allmydata/mutable/filenode.py\",
line 881, in _modify_once\x0a d = self._try_to_download_data()\x0a
File \"/usr/local/lib/python2.7/site-
packages/allmydata/mutable/filenode.py\", line 959, in
_try_to_download_data\x0a d = self._read(c, fetch_privkey=True)\x0a
File \"/usr/local/lib/python2.7/site-
packages/allmydata/mutable/filenode.py\", line 980, in _read\x0a d =
r.download(consumer, offset, size)\x0a File \"/usr/local/lib/python2.7
/site-packages/allmydata/mutable/retrieve.py\", line 237, in download\x0a
self._setup_download()\x0a File \"/usr/local/lib/python2.7/site-
packages/allmydata/mutable/retrieve.py\", line 277, in _setup_download\x0a
shares = versionmap[self.verinfo]\x0aexceptions.KeyError: (57,
'\\x16i\\xdb\\xa8\\xbc\\xd7\\xabrY\\xcdpv\\xa4I\\x82\\xfe\\xa5i\\xed\\x82;\\xca\\xe8\\xcaL\\xf7\\xdav\\xa9\\xf2O\\t',
'\\x19f_S!&\\xb0\\xa1\\xeb\\x94\\x81F)\\xbb\\x89q', 336, 335, 2, 4,
'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x009\\x16i\\xdb\\xa8\\xbc\\xd7\\xabrY\\xcdpv\\xa4I\\x82\\xfe\\xa5i\\xed\\x82;\\xca\\xe8\\xcaL\\xf7\\xdav\\xa9\\xf2O\\t\\x19f_S!&\\xb0\\xa1\\xeb\\x94\\x81F)\\xbb\\x89q\\x02\\x04\\x00\\x00\\x00\\x00\\x00\\x00\\x01P\\x00\\x00\\x00\\x00\\x00\\x00\\x01O',
(('enc_privkey', 923), ('EOF', 2138), ('share_data', 755), ('signature',
399), ('block_hash_tree', 723), ('share_hash_chain', 655)))\x0a"
}}}
> From the incident report:
{{{
local#11372110 21:30:59.251: current goal: before update: , sh0 to
[vjqcroal], sh0 to [xxaj2tgm], sh1 to [vjqcroal], sh1 to [xxaj2tgm], sh2
to [xxaj2tgm], sh3 to [juwmgssm], sh3 to [xxaj2tgm]
local#11372111 21:30:59.251: we are planning to push new seqnum=#58
local#11372112 21:30:59.252: Starting push
local#11372113 21:30:59.252: Pushing segment 1 of 1
local#11372114 21:30:59.275: storage: slot_writev
lb2mpyg4fnznnfebfayevcdpki
local#11372115 21:30:59.277: storage: slot_writev
lb2mpyg4fnznnfebfayevcdpki
local#11372116 21:30:59.281: _got_write_answer from xxaj2tgm, share 2
local#11372117 21:30:59.281: found the following surprise shares: set([0,
1])
local#11372118 21:30:59.281: they had shares [0, 1] that we didn't know
about [INCIDENT-TRIGGER]
local#11372119 21:31:00.433: wrote successfully: adding new share to
servermap
local#11372120 21:31:00.435: _got_write_answer from xxaj2tgm, share 3
local#11372121 21:31:00.435: found the following surprise shares: set([0,
1])
local#11372122 21:31:00.435: they had shares [0, 1] that we didn't know
about
local#11372123 21:31:00.435: wrote successfully: adding new share to
servermap
local#11372124 21:31:00.436: _got_write_answer from vjqcroal, share 0
local#11372125 21:31:00.436: found the following surprise shares: set([])
local#11372126 21:31:00.436: wrote successfully: adding new share to
servermap
local#11372127 21:31:00.437: _got_write_answer from vjqcroal, share 1
local#11372128 21:31:00.437: found the following surprise shares: set([])
local#11372129 21:31:00.437: wrote successfully: adding new share to
servermap
local#11372130 21:31:00.437: Publish failed with UncoordinatedWriteError
What's up with these surprise shares?
}}}
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1670#comment:13>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage
More information about the tahoe-lafs-trac-stream
mailing list