#1669 closed defect (fixed)
ValueError (need more than 8 values to unpack) during mutable MDMF repair
Reported by: | vikarti | Owned by: | nobody |
---|---|---|---|
Priority: | critical | Milestone: | 1.9.2 |
Component: | code-mutable | Version: | 1.9.1 |
Keywords: | mutable mdmf repair error regression preservation review-needed | Cc: | |
Launchpad Bug: |
Description
Gateway is running Mac OS X, mutable.format=mdmf
Sometimes running deep check from WUI results in
{no <type 'exceptions.ValueError'>: need more than 8 values to unpack /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py, line 542 in _runCallbacks 540 current._runningCallbacks = True 541 try: 542 current.result = callback(current.result, *args, **kw) 543 finally: Locals callback <function _repair_finished at 0x10e9b8050> args () current <Deferred at 0x10dfba6c8 current result: None> kw {} /Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py, line 269 in _repair_finished 267 r = CheckResults(from_string(self._node.get_uri()), self._storage_index) 268 self.cr_results.post_repair_results = r 269 self._fill_checker_results(repair_results.servermap, r) 270 self.cr_results.repair_results = repair_results # TODO? Locals repair_results <allmydata.mutable.repairer.RepairResults instance at 0x10df61488> r <allmydata.check_results.CheckResults instance at 0x10e8faef0> self <allmydata.mutable.checker.MutableCheckAndRepairer instance at 0x10df75908> Globals CheckResults <class allmydata.check_results.CheckResults at 0x10d0b6d50> from_string <function from_string at 0x10d024140> /Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py, line 130 in _fill_checker_results 128 summary = [] 129 vmap = smap.make_versionmap() 130 recoverable = smap.recoverable_versions() 131 unrecoverable = smap.unrecoverable_versions() Locals smap <allmydata.mutable.servermap.ServerMap instance at 0x10e97cab8> summary [] vmap {(2, "\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd", None, 131073, 343, 3, 10, "\x01\x00\x00\x00\x00\x00\x00\x00\x02\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x01W", (('EOF', 2330), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2298), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(8, '\x1f\x0b\xf9H\x83\xca\xeas\xea5\xdf\x8eo\xd2u\xfa\xb1\x87\xe1\xd9', 1328857611.610529)]), (5, '-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R', 3, 10, 131073, 598, '\x01\x00\x00\x00\x00\x00\x00\x00\x05-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(4, '\xfd\xc7gI\xc4m\xba!\x97\xf9\xe6\x82`\xf16\xb4\xfaY\xbf7', 1328857613.566889), (8, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (0, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (1, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (5, '\x9e\xf2\xc7\x10 \x90\x80W\xd0 \xce\xe2\xf2\xc2+Z\xaf\xb2\xb1+', 1328857613.566889), (9, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (6, ']\xe9\x1d:\x1b\x05\x04\xba\xa4\xf3\xc7xuq\xa83\xaeT\x04\x90', 1328857613.566889), (3, 'YE\xcb1\x8af0\x918J\xed\x1d@ed\xf6B\xd0?q', 1328857613.566889), (7, '\xcb\xd9\xdc\xa2\x07m\xd8\x10Szq\x1b\xde\x8c\xf5\xf6\xa3\x97\x10\x93', 1328857613.566889), (2, '\xc1|\x03\xe0\x1c"\xb6\xd58\xccO\xc6\x90\xf6\xe8f\x1dk\xc3\x86', 1328857613.566889)]), (3, '\x15%\xa7\\\xd3p\xa0\xe1\xe7\xbc\xd9\xe8\xb6\x9c/\xd90\xd5\xd9:\x92\x07\x0e.Q\x0c\x1b\xc5l\xb0\xf5\xbe', None, 131073, 598, 3, 10, '\x01\x00\x00\x00\x00\x00\x00\x00\x03\x15%\xa7\\\xd3p\xa0\xe1\xe7\xbc\xd9\xe8\xb6\x9c/\xd90\xd5\xd9:\x92\x07\x0e.Q\x0c\x1b\xc5l\xb0\xf5\xbe\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(9, '\x88\xb2wj\x93\x07k\xf5D\xc9M2\x92d\xbdt>N\xb50', 1328857611.60796)])} /Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/servermap.py, line 258 in recoverable_versions 256 for (verinfo, shares) in versionmap.items(): 257 (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, 258 offsets_tuple) = verinfo 259 shnums = set([shnum for (shnum, peerid, timestamp) in shares]) Locals root_hash "\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd" timestamp 1328857611.610529 offsets_tuple (('EOF', 2330), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2298), ('share_hash_chain', 1339), ('verification_key_end', 2023)) IV None prefix "\x01\x00\x00\x00\x00\x00\x00\x00\x02\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x01W" peerid '\x1f\x0b\xf9H\x83\xca\xeas\xea5\xdf\x8eo\xd2u\xfa\xb1\x87\xe1\xd9' N 10 seqnum 2 k 3 versionmap {(2, "\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd", None, 131073, 343, 3, 10, "\x01\x00\x00\x00\x00\x00\x00\x00\x02\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x01W", (('EOF', 2330), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2298), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(8, '\x1f\x0b\xf9H\x83\xca\xeas\xea5\xdf\x8eo\xd2u\xfa\xb1\x87\xe1\xd9', 1328857611.610529)]), (5, '-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R', 3, 10, 131073, 598, '\x01\x00\x00\x00\x00\x00\x00\x00\x05-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(4, '\xfd\xc7gI\xc4m\xba!\x97\xf9\xe6\x82`\xf16\xb4\xfaY\xbf7', 1328857613.566889), (8, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (0, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (1, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (5, '\x9e\xf2\xc7\x10 \x90\x80W\xd0 \xce\xe2\xf2\xc2+Z\xaf\xb2\xb1+', 1328857613.566889), (9, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (6, ']\xe9\x1d:\x1b\x05\x04\xba\xa4\xf3\xc7xuq\xa83\xaeT\x04\x90', 1328857613.566889), (3, 'YE\xcb1\x8af0\x918J\xed\x1d@ed\xf6B\xd0?q', 1328857613.566889), (7, '\xcb\xd9\xdc\xa2\x07m\xd8\x10Szq\x1b\xde\x8c\xf5\xf6\xa3\x97\x10\x93', 1328857613.566889), (2, '\xc1|\x03\xe0\x1c"\xb6\xd58\xccO\xc6\x90\xf6\xe8f\x1dk\xc3\x86', 1328857613.566889)]), (3, '\x15%\xa7\\\xd3p\xa0\xe1\xe7\xbc\xd9\xe8\xb6\x9c/\xd90\xd5\xd9:\x92\x07\x0e.Q\x0c\x1b\xc5l\xb0\xf5\xbe', None, 131073, 598, 3, 10, '\x01\x00\x00\x00\x00\x00\x00\x00\x03\x15%\xa7\\\xd3p\xa0\xe1\xe7\xbc\xd9\xe8\xb6\x9c/\xd90\xd5\xd9:\x92\x07\x0e.Q\x0c\x1b\xc5l\xb0\xf5\xbe\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(9, '\x88\xb2wj\x93\x07k\xf5D\xc9M2\x92d\xbdt>N\xb50', 1328857611.60796)])} shnum 8 shares set([(4, '\xfd\xc7gI\xc4m\xba!\x97\xf9\xe6\x82`\xf16\xb4\xfaY\xbf7', 1328857613.566889), (8, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (0, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (1, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (5, '\x9e\xf2\xc7\x10 \x90\x80W\xd0 \xce\xe2\xf2\xc2+Z\xaf\xb2\xb1+', 1328857613.566889), (9, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (6, ']\xe9\x1d:\x1b\x05\x04\xba\xa4\xf3\xc7xuq\xa83\xaeT\x04\x90', 1328857613.566889), (3, 'YE\xcb1\x8af0\x918J\xed\x1d@ed\xf6B\xd0?q', 1328857613.566889), (7, '\xcb\xd9\xdc\xa2\x07m\xd8\x10Szq\x1b\xde\x8c\xf5\xf6\xa3\x97\x10\x93', 1328857613.566889), (2, '\xc1|\x03\xe0\x1c"\xb6\xd58\xccO\xc6\x90\xf6\xe8f\x1dk\xc3\x86', 1328857613.566889)]) verinfo (5, '-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R', 3, 10, 131073, 598, '\x01\x00\x00\x00\x00\x00\x00\x00\x05-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))) shnums set([8]) datalength 343 segsize 131073 <type 'exceptions.ValueError'>: need more than 8 values to unpack
Attachments (6)
Change History (36)
comment:1 Changed at 2012-02-10T07:31:43Z by vikarti
comment:2 Changed at 2012-02-10T07:40:30Z by vikarti
added possible related incident files
comment:3 Changed at 2012-02-15T22:37:47Z by zooko
- Priority changed from major to critical
comment:4 Changed at 2012-02-17T18:17:18Z by vikarti
Addition: same from console, reproducible. even if gateway is not doing anything. no incident file this time(none was generated)
bash-3.2$ ./tahoe-fullrepair.sh *** checking alias tahoe ERROR: ValueError(need more than 8 values to unpack) "[Failure instance: Traceback: <type 'exceptions.ValueError'>: need more than 8 values to unpack" /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py:793:runUntilCurrent /Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.7/site-packages/foolscap-0.6.3-py2.7.egg/foolscap/eventual.py:26:_turn /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py:361:callback /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py:455:_startRunCallbacks --- <exception caught here> --- /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py:542:_runCallbacks /Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py:269:_repair_finished /Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py:130:_fill_checker_results /Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/servermap.py:258:recoverable_versions *** checking alias test '<root>': healthy done: 1 objects checked pre-repair: 1 healthy, 0 unhealthy 0 repairs attempted, 0 successful, 0 failed post-repair: 1 healthy, 0 unhealthy bash-3.2$ tahoe ls WindowsSupport backups_Calibre_Library backups_INARI4_Evernote backups_INARI4_Libs backups_INARI4_Pictures backups_INARI4_Python26 backups_INARI4_Python27 backups_INARI4_Win_Libs backups_INARI4_Win_Python26 backups_INARI4_Win_Python27 backups_INARI4_Win_TahoeSupport backups_INARI4_Win_Work backups_Work bash-3.2$ cat ./tahoe-fullrepair.sh #!/bin/sh #sometimes we are not started in login shell! export PATH="/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/bin:$PATH" for item in `tahoe list-aliases | cut -f 1 -d :`; do echo '*** checking alias '"$item" #--verify -v --repair tahoe deep-check --repair -v --add-lease $item: echo done bash-3.2$ tahoe list-aliases tahoe: URI:DIR2-MDMF:yhu<skipped> test: URI:DIR2:2lz<skipped>
last (possible relevant) lines of twisted.log
2012-02-18 01:05:06+0700 [-] Unhandled error in Deferred: 2012-02-18 01:05:06+0700 [-] Unhandled Error Traceback (most recent call last): File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py", line 793, in runUntilCurrent call.func(*call.args, **call.kw) File "/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.7/site-packages/foolscap-0.6.3-py2.7.egg/foolscap/eventual.py", line 26, in _turn cb(*args, **kwargs) File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 361, in callback self._startRunCallbacks(result) File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 455, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.7/site-packages/Nevow-0.10.0-py2.7.egg/nevow/appserver.py", line 91, in processingFailed request.finishRequest( False ) File "/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.7/site-packages/Nevow-0.10.0-py2.7.egg/nevow/appserver.py", line 176, in finishRequest server.Request.finish(self) File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/web/http.py", line 866, in finish "Request.finish called on a request after its connection was lost; " exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.
attempt to perform same repair operation on Windows or Linux nodes in same grid results in basically same issue.
comment:5 Changed at 2012-02-17T18:56:30Z by davidsarah
- Component changed from unknown to code-mutable
- Keywords mutable mdmf repair error regression preservation added
- Summary changed from Crashes while repair on Mac OS X to ValueError (need more than 8 values to unpack) during mutable MDMF repair
#1672 was marked as a duplicate of this.
Note that although mutable.format is mdmf, we don't have enough evidence to conclude that this is specific to MDMF. It seems likely that it is a regression in the 1.9.x mutable publisher/repairer, though.
comment:6 Changed at 2012-02-17T18:57:20Z by vikarti
this was planned to be added to #1672 davidsarah, mutable.format is not set but directory was created as mutable in WUI(becouse I have other problems with concurrent access otherwise, mutable dirs at least work and manual deep-checks of subdirs - shows all is fine) also,
this one is from linux
[root@lalangamena ~]# tahoe deep-check --repair -v --add-lease URI:DIR2-MDMF:yhu<skip> ERROR: ValueError(need more than 8 values to unpack) [Failure instance: Traceback: exceptions.ValueError: need more than 8 values to unpack /root/tahoe-new/allmydata-tahoe-1.9.1/Twisted-10.2.0-py2.4-linux-i686.egg/twisted/internet/base.py:789:runUntilCurrent /root/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.4/site-packages/foolscap-0.6.3-py2.4.egg/foolscap/eventual.py:26:_turn /root/tahoe-new/allmydata-tahoe-1.9.1/Twisted-10.2.0-py2.4-linux-i686.egg/twisted/internet/defer.py:361:callback /root/tahoe-new/allmydata-tahoe-1.9.1/Twisted-10.2.0-py2.4-linux-i686.egg/twisted/internet/defer.py:455:_startRunCallbacks --- <exception caught here> --- /root/tahoe-new/allmydata-tahoe-1.9.1/Twisted-10.2.0-py2.4-linux-i686.egg/twisted/internet/defer.py:542:_runCallbacks /root/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py:269:_repair_finished /root/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py:130:_fill_checker_results /root/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/servermap.py:257:recoverable_versions [root@lalangamena ~]#
CentOS5.4 is exact python versions be of interest?
comment:7 Changed at 2012-02-17T19:19:35Z by davidsarah
The "exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this." is probably a duplicate of #1664; I don't see any strong evidence that it's related to the ValueError.
comment:8 Changed at 2012-04-01T22:54:08Z by zooko
- Milestone changed from undecided to 1.9.2
comment:9 Changed at 2012-06-12T16:52:02Z by davidsarah
- Owner changed from nobody to davidsarah
- Status changed from new to assigned
comment:10 Changed at 2012-06-18T15:38:08Z by zooko
- Owner changed from davidsarah to zooko
- Status changed from assigned to new
Investigating.
comment:11 Changed at 2012-06-18T15:38:27Z by zooko
- Status changed from new to assigned
comment:12 Changed at 2012-06-18T20:54:21Z by zooko
Update: I hate it when I post partial results of my investigations and then someone (usually Brian) goes and finishes debugging it and fixing it before I do. But, I figure that's not a good reason to withhold information, so here's my partial results.
- The stack trace in the original post shows a tuple with 8 elements being returned in place of "verinfo", but "verinfos" are supposed to have 9 elements.
- The elements shown in the stack trace in the original post are not the elements of a verinfo. A verinfo is (according to mutable/layout.py) is (int -- seqnum, cryptoval -- hash, cryptoval -- salt or hash, int -- segsize, int -- datalen, int -- k, int -- n, str -- prefix for signing, tuple -- offsets). The stack trace in the original post shows something that is (int -- seqnum, cryptoval -- hash?, int -- k, int -- n, int -- segsize, int -- datalen, str -- prefix for signing?, tuple -- offsets).
- The docstring of SDMFSlotWriteProxy.get_verinfo is wrong about the 2-indexed item -- it is a hash or a salt, not "a blank (nothing)"
- SDMFSlotWriteProxy.get_verinfo and MDMFSlotReadProxy.get_verinfo return the same shape of tuple, but MDMFSlotWriteProxy.get_verinfo returns a different shape. Aha! And the latter matches the data shown in the stack trace in the original post, so it is one of those that is ending up getting parsed as a "verinfo" and causing this exception.
- I wrote a type-checking function that asserts the shape of a verinfo tuple and sprinkled it everywhere and ran unit tests and got unit tests failures.
- I don't know how this code that says "d2 = reader.get_verinfo(); d2.addErrback(…)" in mutable/servermap.py ever worked! None of the functions named "get_verinfo" that I see return a Deferred. But, if that function raises an AttributeError every time (saying "tuple has no such attribute 'addErrback'"), wouldn't we have noticed before now? That line of code was added in bb10d685ed86eb08 11 months ago and has been in 1.9.x all along. Hm... but I see that any AttributeError raised from there would get handled by _query_failed...
Okay, I'm taking a brief break. but DON'T GO AND FIX THIS WITHOUT ME AND HAVE ALL THE FUN.
comment:13 Changed at 2012-06-19T03:48:38Z by zooko
- Owner changed from zooko to nobody
- Status changed from assigned to new
Okay since I wrote comment:12 I patched pristine 1.9.1 to raise an instance of a newly defined exception class instead of to execute d2.addErrback. I expected that this would not change unit test behavior since I believed that d2.addErrback would never do anything but raise an exception (AttributeError). However, instead I got a lot of test failures/errors.
So I commented out the "raise exception" and wrote a log.msg(…) debugprint. That shows that d2 is in fact a Deferred. I don't see how:
_trial_temp/test.log:2012-06-19 00:37:17-0300 [-] xxx testing testing 1 2 3 d2: <Deferred at 0x4719518 current result: (1, '\x08\x92\xa5\x90%\xfbCI0$\xbf8$\x9d[s\x92!\xab1\xfea\xc6aM\x8e\x0ch~\x92r\x03', '\xf4\xce\x8a}G\x94\x87:U\xf0\xf2\xaa\x9e\xa1\x86\xcc', 9, 8, 3, 10, '\x00\x00\x00\x00\x00\x00\x00\x00\x01\x08\x92\xa5\x90%\xfbCI0$\xbf8$\x9d[s\x92!\xab1\xfea\xc6aM\x8e\x0ch~\x92r\x03\xf4\xce\x8a}G\x94\x87:
Oh, okay, I see that !MDMFSlotReadProxy does return a deferred tuple, where the other two implementations of get_verinfo in mutable/layout.py return an immediate tuple... I guess that one is the only one that is getting invoked in that _got_results? ...
Also, I'll attach the patch that adds type-checking.
So, basically I'm pretty sure I see at least one or two serious bugs. 1. different shape of tuple passed around as a "verinfo" from !MDMFSlotWriteProxy than from !SDMFSlotWriteProxy or !MDMFSlotReadProxy, and 2. d2 = reader.get_verinfo(); d2.addErrback(…). However, I don't understand how the latter isn't already showing up as unit test failures and common failures of mutable file operations. And, I don't understand the full scope of these two -- under what conditions they strike and what their consequences are.
Unsetting my ownership of this ticket since I'm going to bed and I no longer mind if you snatch it out from under me. :-)
Changed at 2012-06-19T03:49:07Z by zooko
comment:14 Changed at 2012-06-19T12:54:26Z by zooko
- Owner changed from nobody to zooko
- Status changed from new to assigned
Taking this ticket back...
comment:15 Changed at 2012-06-20T15:55:00Z by zooko
Here's a patch against 1.9.2 which I don't intend to commit for the actual 1.9.2 release. Probably we could use this patch to finish understanding the scope of the bug, commit some variant of this patch to trunk (I like Brian's suggestion to replace the verinfo tuple with an object), and then commit some subset or minimized variant of this patch to 1.9.2.
Also, there's one test failure that I don't understand.
Changed at 2012-06-20T15:55:28Z by zooko
comment:16 Changed at 2012-06-21T02:45:37Z by zooko
Update: I think that the only place the 8-tuple "verinfo" could get generated and passed around is here. However, tracing backwards to see what could have led to that line of code, the only thing that can lead to that code is someone updating an MDMF by changing only part of it, and the only time that is ever done is if someone uses the WAPI to do so. I doubt that anything vikarti did, and certainly not a deep-check, was doing that, which leads me to think that the 8-tuple "verinfo" must have gotten generated and passed around in some other way.
Aha! Then it must be the one other possibility -- here. That is called from exactly one place -- here. Which is called from exactly one place -- here. Which is called from exactly one place -- here. Which means that it is when a Publish object publishes the file's contents...
Investigating a bit more...
comment:17 Changed at 2012-06-21T03:26:36Z by zooko
Okay, I think I've finally figured out why this doesn't happen in the unit tests. There are no unit tests of mutable file repair that involve actually repairing damage!
test_repairer is all immutable The one occurrence of "check_and_repair" in test_mutable says this:
def test_check_and_repair_readcap(self): # we can't currently repair from a mutable readcap: #625
So it doesnt' actually repair, just check, since it is a readcap.
All the occurrences of check_and_repair in test_deepcheck appear to be running on perfectly healthy files, so now repairs are performed.
Here is a patch which adds debugprintouts showing the trail of the problem. In practice, e.g. in vikarti's original post to this ticket, I believe that "xxx 1", then "xxx 2", then "xxx 3" will be encountered, leading to an exception. However when running the unit tests with this patch in place, you'll see xxx 1 and 2 but never "xxx 3", because there is no test of repair-which-actually-fixes-damage.
So I currently believe that vikarti's bug will happen every single time you have an MDMF with damage and you repair it. Next step: to write a unit test that tests that belief! But first, sleep.
Unsetting my ownership of this ticket in order to signal that I would be okay with (indeed, quite grateful for) someone else working on it while I sleep.
diff -u -r 1.9.1-pristine/src/allmydata/mutable/publish.py 1.9.1/src/allmydata/mutable/publish.py --- 1.9.1-pristine/src/allmydata/mutable/publish.py 2012-01-12 16:05:20.000000000 -0400 +++ 1.9.1/src/allmydata/mutable/publish.py 2012-06-21 00:13:01.051139281 -0300 @@ -883,7 +883,8 @@ def _record_verinfo(self): self.versioninfo = self.writers.values()[0].get_verinfo() - + if len(self.versioninfo) != 9: + print "xxx 1 %r" % (self.versioninfo,) def _connection_problem(self, f, writer): """ @@ -1141,6 +1142,8 @@ # If we get there, we know that responses correspond to placed # shares, and can safely execute these statements. if self.versioninfo: + if len(self.versioninfo) != 9: + print "xxx 2 self.versioninfo: %r" % (self.versioninfo,) self.log("wrote successfully: adding new share to servermap") self._servermap.add_new_share(peerid, writer.shnum, self.versioninfo, started) diff -u -r 1.9.1-pristine/src/allmydata/mutable/servermap.py 1.9.1/src/allmydata/mutable/servermap.py --- 1.9.1-pristine/src/allmydata/mutable/servermap.py 2012-01-12 16:05:20.000000000 -0400 +++ 1.9.1/src/allmydata/mutable/servermap.py 2012-06-21 00:13:13.631090217 -0300 @@ -254,6 +254,8 @@ versionmap = self.make_versionmap() recoverable_versions = set() for (verinfo, shares) in versionmap.items(): + if len(verinfo) != 9: + print "xxx 3 %r" % (verinfo,) (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, offsets_tuple) = verinfo shnums = set([shnum for (shnum, peerid, timestamp) in shares])
comment:18 Changed at 2012-06-21T03:34:00Z by zooko
Here's the minimal patch that probably fixes the bug. Please don't ship this in a new stable release until we have a unit test of it!
HACK zompu:~/playground/tahoe-lafs/ticket1669-debuggery/1.9.2-pristine$ darcs diff -u diff -rN -u old-1.9.2-pristine/src/allmydata/mutable/layout.py new-1.9.2-pristine/src/allmydata/mutable/layout.py --- old-1.9.2-pristine/src/allmydata/mutable/layout.py 2012-06-21 00:33:48.246275199 -0300 +++ new-1.9.2-pristine/src/allmydata/mutable/layout.py 2012-06-21 00:33:48.386274653 -0300 @@ -1102,10 +1102,11 @@ def get_verinfo(self): return (self._seqnum, self._root_hash, - self._required_shares, - self._total_shares, + None, self._segment_size, self._data_length, + self._required_shares, + self._total_shares, self.get_signable(), self._get_offsets_tuple())
comment:19 Changed at 2012-06-21T03:37:17Z by zooko
Attaching a patch which is intended to eventually be submitted for trunk (i.e., for Tahoe-LAFS v1.10) but not for 1.9.2. It is basically "a bunch of small cleanups" around mutable files. Needs work. I think it causes one unit test to fail. There are several things I want to talk to Kevan about, most of which are currently indicated in this patch with the addition of a comment that says something "XXX Why such and such?".
Brian: your idea of replace verinfo tuple with an object is a good one, and I would be happy if you would do it, but I would prefer it if you would base that on top of this patch.
Changed at 2012-06-21T03:37:37Z by zooko
comment:20 Changed at 2012-06-21T03:43:00Z by zooko
- Owner changed from zooko to nobody
- Status changed from assigned to new
comment:21 Changed at 2012-06-21T15:28:23Z by zooko
- Owner changed from nobody to zooko
- Status changed from new to assigned
Ah, I was wrong to say that there are no tests. There is test_mdmf_repairable_5shares. Now why doesn't this test exercise the bug...
comment:22 Changed at 2012-06-21T15:54:24Z by zooko
Aha. I turned on code coverage display (see wiki:HowToWriteTests) and ran test_mdmf_repairable_5shares. This immediately shows that the code in question never gets exercised by that test. Aha! Because that test invokes check and then invokes repair. The code in question is in the implementation of check_and_repair.
There, I made a new test that invokes check_and_repair and it exercises the bug. Victory!
comment:23 Changed at 2012-06-21T16:31:04Z by zooko
- Keywords review-needed added
- Owner changed from zooko to nobody
- Status changed from assigned to new
Hm, I can't attach files of this size to this ticket (due to #1581). Fortunately, I can use tahoe-lafs instead!
Review needed!
comment:24 Changed at 2012-06-21T18:58:12Z by davidsarah
I'd like there to be less duplicated code in the check/repair tests, but that can wait until 1.10. Continuing review...
comment:25 Changed at 2012-06-21T20:53:28Z by david-sarah@…
- Resolution set to fixed
- Status changed from new to closed
In 8c6c410b87ebeea1:
comment:26 Changed at 2012-06-21T20:55:01Z by davidsarah
Also adca74d9f8a07d18, which did not get auto-commented for some reason: "Tests for ref #1669. Also refactor the existing tests to reduce duplicated code and to fix a cut-and-paste error that caused one case (successful SDMF repair) to go untested."
comment:27 Changed at 2012-06-21T21:59:03Z by david-sarah <david-sarah@…>
In 8c6c410b87ebeea1:
comment:28 Changed at 2012-06-21T22:53:06Z by david-sarah@…
In 5524/1.9.2:
comment:29 Changed at 2012-07-02T16:12:17Z by davidsarah
Also [5523/1.9.2] "Tests for ref #1669. Also refactor the existing tests to reduce duplicated code and to fix a cut-and-paste error that caused one case (successful SDMF repair) to go untested."
this check is that happened after https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1670