[tahoe-lafs-trac-stream] [tahoe-lafs] #1670: KeyError in mutable retrieve

tahoe-lafs trac at tahoe-lafs.org
Mon Jun 4 17:14:42 UTC 2012


#1670: KeyError in mutable retrieve
-------------------------+-------------------------------------------------
     Reporter:  vikarti  |      Owner:  davidsarah
         Type:  defect   |     Status:  new
     Priority:           |  Milestone:  1.9.2
  critical               |    Version:  1.9.1
    Component:  code-    |   Keywords:  mutable retrieve error tahoe-backup
  mutable                |  regression
   Resolution:           |
Launchpad Bug:           |
-------------------------+-------------------------------------------------

Comment (by davidsarah):

 Kyle Markley reported a bug with a similar stacktrace:
 > As I went about gathering information to submit a ticket, I discovered
 something interesting.  This probably has something to do with the health
 of the target directories.  I discovered one 2/4 directory that works
 cleanly, and another 2/4 directory that gives an error (after
 *successfully* creating the link).  I haven't created a ticket yet because
 I'm not sure what's supposed to happen here.  :)

 > Working directory:
 {{{
 $ tahoe check --raw kyle:
 {
  "results": {
   "needs-rebalancing": true,
   "count-shares-expected": 4,
   "healthy": false,
   "count-unrecoverable-versions": 0,
   "count-shares-needed": 2,
   "sharemap": {
    "seq52-2x5m-sh3": [
     "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ],
    "seq52-2x5m-sh2": [
     "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ],
    "seq52-2x5m-sh1": [
     "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ],
    "seq52-2x5m-sh0": [
     "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ],
    "seq49-cqp3-sh2": [
     "juwmgssmwnhrhfdcpxxmrz3bghh37esx"
    ],
    "seq49-cqp3-sh3": [
     "vjqcroalrgmft66mgiwfjug667fl6qjd"
    ]
   },
   "count-recoverable-versions": 2,
   "servers-responding": [
    "vjqcroalrgmft66mgiwfjug667fl6qjd",
    "juwmgssmwnhrhfdcpxxmrz3bghh37esx",
    "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx",
    "47cslusczp3uu2kygodi3nlalcruscif",
    "eqtprtidz5emkvzlqt27dylgocdf3f77"
   ],
   "count-good-share-hosts": 1,
   "count-wrong-shares": 2,
   "count-shares-good": 4,
   "count-corrupt-shares": 0,
   "list-corrupt-shares": [],
   "recoverable": true
  },
  "storage-index": "rsi6ge4hmbzhxplyqjzkmd254e",
  "summary": "Unhealthy: multiple versions are recoverable"
 }
 }}}

 > Gives an error:
 {{{
 $ tahoe check --raw share:
 {
  "results": {
   "needs-rebalancing": true,
   "count-shares-expected": 4,
   "healthy": false,
   "count-unrecoverable-versions": 1,
   "count-shares-needed": 2,
   "sharemap": {
    "seq45-7bxx-sh3": [
     "juwmgssmwnhrhfdcpxxmrz3bghh37esx"
    ],
    "seq46-jiym-sh1": [
     "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ],
    "seq46-jiym-sh0": [
     "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ],
    "seq56-aiot-sh0": [
     "vjqcroalrgmft66mgiwfjug667fl6qjd"
    ],
    "seq56-aiot-sh1": [
     "vjqcroalrgmft66mgiwfjug667fl6qjd"
    ],
    "seq56-aiot-sh2": [
     "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ],
    "seq56-aiot-sh3": [
     "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
    ]
   },
   "count-recoverable-versions": 2,
   "servers-responding": [
    "vjqcroalrgmft66mgiwfjug667fl6qjd",
    "juwmgssmwnhrhfdcpxxmrz3bghh37esx",
    "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx",
    "47cslusczp3uu2kygodi3nlalcruscif",
    "eqtprtidz5emkvzlqt27dylgocdf3f77"
   ],
   "count-good-share-hosts": 2,
   "count-wrong-shares": 3,
   "count-shares-good": 4,
   "count-corrupt-shares": 0,
   "list-corrupt-shares": [],
   "recoverable": true
  },
  "storage-index": "lb2mpyg4fnznnfebfayevcdpki",
  "summary": "Unhealthy: some versions are unrecoverable multiple versions
 are recoverable"
 }
 }}}
 > My error looks something like this.  The exact message has changed; it
 used to say there was an {{{UncoordinatedWriteError}}}, but my
 experimentation seems to have changed things a bit, and right now I only
 see that error mentioned in the incident report.
 {{{
 $ tahoe ln foo: share:foo
 Error: 500 Internal Server Error
 "Traceback (most recent call last):\x0a  File \"/usr/local/lib/python2.7
 /site-packages/twisted/internet/base.py\", line 800, in
 runUntilCurrent\x0a    call.func(*call.args, **call.kw)\x0a  File
 \"/usr/local/lib/python2.7/site-
 packages/foolscap-0.6.3-py2.7.egg/foolscap/eventual.py\", line 26, in
 _turn\x0a    cb(*args, **kwargs)\x0a  File \"/usr/local/lib/python2.7
 /site-packages/twisted/internet/defer.py\", line 368, in callback\x0a
 self._startRunCallbacks(result)\x0a  File \"/usr/local/lib/python2.7/site-
 packages/twisted/internet/defer.py\", line 464, in _startRunCallbacks\x0a
 self._runCallbacks()\x0a--- <exception caught here> ---\x0a  File
 \"/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py\", line
 551, in _runCallbacks\x0a    current.result = callback(current.result,
 *args, **kw)\x0a  File \"/usr/local/lib/python2.7/site-
 packages/allmydata/mutable/filenode.py\", line 855, in <lambda>\x0a
 self._modify_once(modifier, first_time))\x0a  File
 \"/usr/local/lib/python2.7/site-packages/allmydata/mutable/filenode.py\",
 line 881, in _modify_once\x0a    d = self._try_to_download_data()\x0a
 File \"/usr/local/lib/python2.7/site-
 packages/allmydata/mutable/filenode.py\", line 959, in
 _try_to_download_data\x0a    d = self._read(c, fetch_privkey=True)\x0a
 File \"/usr/local/lib/python2.7/site-
 packages/allmydata/mutable/filenode.py\", line 980, in _read\x0a    d =
 r.download(consumer, offset, size)\x0a  File \"/usr/local/lib/python2.7
 /site-packages/allmydata/mutable/retrieve.py\", line 237, in download\x0a
 self._setup_download()\x0a  File \"/usr/local/lib/python2.7/site-
 packages/allmydata/mutable/retrieve.py\", line 277, in _setup_download\x0a
 shares = versionmap[self.verinfo]\x0aexceptions.KeyError: (57,
 '\\x16i\\xdb\\xa8\\xbc\\xd7\\xabrY\\xcdpv\\xa4I\\x82\\xfe\\xa5i\\xed\\x82;\\xca\\xe8\\xcaL\\xf7\\xdav\\xa9\\xf2O\\t',
 '\\x19f_S!&\\xb0\\xa1\\xeb\\x94\\x81F)\\xbb\\x89q', 336, 335, 2, 4,
 '\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x009\\x16i\\xdb\\xa8\\xbc\\xd7\\xabrY\\xcdpv\\xa4I\\x82\\xfe\\xa5i\\xed\\x82;\\xca\\xe8\\xcaL\\xf7\\xdav\\xa9\\xf2O\\t\\x19f_S!&\\xb0\\xa1\\xeb\\x94\\x81F)\\xbb\\x89q\\x02\\x04\\x00\\x00\\x00\\x00\\x00\\x00\\x01P\\x00\\x00\\x00\\x00\\x00\\x00\\x01O',
 (('enc_privkey', 923), ('EOF', 2138), ('share_data', 755), ('signature',
 399), ('block_hash_tree', 723), ('share_hash_chain', 655)))\x0a"
 }}}

 > From the incident report:
 {{{
 local#11372110 21:30:59.251: current goal: before update: , sh0 to
 [vjqcroal], sh0 to [xxaj2tgm], sh1 to [vjqcroal], sh1 to [xxaj2tgm], sh2
 to [xxaj2tgm], sh3 to [juwmgssm], sh3 to [xxaj2tgm]
 local#11372111 21:30:59.251: we are planning to push new seqnum=#58
 local#11372112 21:30:59.252: Starting push
 local#11372113 21:30:59.252: Pushing segment 1 of 1
 local#11372114 21:30:59.275: storage: slot_writev
 lb2mpyg4fnznnfebfayevcdpki
 local#11372115 21:30:59.277: storage: slot_writev
 lb2mpyg4fnznnfebfayevcdpki
 local#11372116 21:30:59.281: _got_write_answer from xxaj2tgm, share 2
 local#11372117 21:30:59.281: found the following surprise shares: set([0,
 1])
 local#11372118 21:30:59.281: they had shares [0, 1] that we didn't know
 about [INCIDENT-TRIGGER]
 local#11372119 21:31:00.433: wrote successfully: adding new share to
 servermap
 local#11372120 21:31:00.435: _got_write_answer from xxaj2tgm, share 3
 local#11372121 21:31:00.435: found the following surprise shares: set([0,
 1])
 local#11372122 21:31:00.435: they had shares [0, 1] that we didn't know
 about
 local#11372123 21:31:00.435: wrote successfully: adding new share to
 servermap
 local#11372124 21:31:00.436: _got_write_answer from vjqcroal, share 0
 local#11372125 21:31:00.436: found the following surprise shares: set([])
 local#11372126 21:31:00.436: wrote successfully: adding new share to
 servermap
 local#11372127 21:31:00.437: _got_write_answer from vjqcroal, share 1
 local#11372128 21:31:00.437: found the following surprise shares: set([])
 local#11372129 21:31:00.437: wrote successfully: adding new share to
 servermap
 local#11372130 21:31:00.437: Publish failed with UncoordinatedWriteError

 What's up with these surprise shares?
 }}}

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1670#comment:13>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list