#1669 closed defect (fixed)

ValueError (need more than 8 values to unpack) during mutable MDMF repair

Reported by: vikarti Owned by: nobody
Priority: critical Milestone: 1.9.2
Component: code-mutable Version: 1.9.1
Keywords: mutable mdmf repair error regression preservation review-needed Cc:
Launchpad Bug:

Description

Gateway is running Mac OS X, mutable.format=mdmf

Sometimes running deep check from WUI results in

{no
<type 'exceptions.ValueError'>: need more than 8 values to unpack
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py, line 542 in _runCallbacks
540                    current._runningCallbacks = True
541                    try:
542                        current.result = callback(current.result, *args, **kw)
543                    finally:
Locals
callback	<function _repair_finished at 0x10e9b8050>
args	()
current	<Deferred at 0x10dfba6c8 current result: None>
kw	{}
/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py, line 269 in _repair_finished
267            r = CheckResults(from_string(self._node.get_uri()), self._storage_index)
268            self.cr_results.post_repair_results = r
269            self._fill_checker_results(repair_results.servermap, r)
270            self.cr_results.repair_results = repair_results # TODO? 
Locals
repair_results	<allmydata.mutable.repairer.RepairResults instance at 0x10df61488>
r	<allmydata.check_results.CheckResults instance at 0x10e8faef0>
self	<allmydata.mutable.checker.MutableCheckAndRepairer instance at 0x10df75908>
Globals
CheckResults	<class allmydata.check_results.CheckResults at 0x10d0b6d50>
from_string	<function from_string at 0x10d024140>
/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py, line 130 in _fill_checker_results
128        summary = []
129        vmap = smap.make_versionmap()
130        recoverable = smap.recoverable_versions()
131        unrecoverable = smap.unrecoverable_versions()
Locals
smap	<allmydata.mutable.servermap.ServerMap instance at 0x10e97cab8>
summary	[]
vmap	{(2, "\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd", None, 131073, 343, 3, 10, "\x01\x00\x00\x00\x00\x00\x00\x00\x02\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x01W", (('EOF', 2330), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2298), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(8, '\x1f\x0b\xf9H\x83\xca\xeas\xea5\xdf\x8eo\xd2u\xfa\xb1\x87\xe1\xd9', 1328857611.610529)]), (5, '-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R', 3, 10, 131073, 598, '\x01\x00\x00\x00\x00\x00\x00\x00\x05-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(4, '\xfd\xc7gI\xc4m\xba!\x97\xf9\xe6\x82`\xf16\xb4\xfaY\xbf7', 1328857613.566889), (8, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (0, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (1, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (5, '\x9e\xf2\xc7\x10 \x90\x80W\xd0 \xce\xe2\xf2\xc2+Z\xaf\xb2\xb1+', 1328857613.566889), (9, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (6, ']\xe9\x1d:\x1b\x05\x04\xba\xa4\xf3\xc7xuq\xa83\xaeT\x04\x90', 1328857613.566889), (3, 'YE\xcb1\x8af0\x918J\xed\x1d@ed\xf6B\xd0?q', 1328857613.566889), (7, '\xcb\xd9\xdc\xa2\x07m\xd8\x10Szq\x1b\xde\x8c\xf5\xf6\xa3\x97\x10\x93', 1328857613.566889), (2, '\xc1|\x03\xe0\x1c"\xb6\xd58\xccO\xc6\x90\xf6\xe8f\x1dk\xc3\x86', 1328857613.566889)]), (3, '\x15%\xa7\\\xd3p\xa0\xe1\xe7\xbc\xd9\xe8\xb6\x9c/\xd90\xd5\xd9:\x92\x07\x0e.Q\x0c\x1b\xc5l\xb0\xf5\xbe', None, 131073, 598, 3, 10, '\x01\x00\x00\x00\x00\x00\x00\x00\x03\x15%\xa7\\\xd3p\xa0\xe1\xe7\xbc\xd9\xe8\xb6\x9c/\xd90\xd5\xd9:\x92\x07\x0e.Q\x0c\x1b\xc5l\xb0\xf5\xbe\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(9, '\x88\xb2wj\x93\x07k\xf5D\xc9M2\x92d\xbdt>N\xb50', 1328857611.60796)])}
/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/servermap.py, line 258 in recoverable_versions
256        for (verinfo, shares) in versionmap.items():
257            (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
258             offsets_tuple) = verinfo
259            shnums = set([shnum for (shnum, peerid, timestamp) in shares])
Locals
root_hash	"\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd"
timestamp	1328857611.610529
offsets_tuple	(('EOF', 2330), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2298), ('share_hash_chain', 1339), ('verification_key_end', 2023))
IV	None
prefix	"\x01\x00\x00\x00\x00\x00\x00\x00\x02\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x01W"
peerid	'\x1f\x0b\xf9H\x83\xca\xeas\xea5\xdf\x8eo\xd2u\xfa\xb1\x87\xe1\xd9'
N	10
seqnum	2
k	3
versionmap	{(2, "\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd", None, 131073, 343, 3, 10, "\x01\x00\x00\x00\x00\x00\x00\x00\x02\x94(:\xb2\xb1}\x7f\xbe\xf3\x08\xdac\x1f\xaf)r\x89V\xdfZJ\xa1[\x1c\xfb\x19\r'\xb9\x96s\xbd\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x01W", (('EOF', 2330), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2298), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(8, '\x1f\x0b\xf9H\x83\xca\xeas\xea5\xdf\x8eo\xd2u\xfa\xb1\x87\xe1\xd9', 1328857611.610529)]), (5, '-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R', 3, 10, 131073, 598, '\x01\x00\x00\x00\x00\x00\x00\x00\x05-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(4, '\xfd\xc7gI\xc4m\xba!\x97\xf9\xe6\x82`\xf16\xb4\xfaY\xbf7', 1328857613.566889), (8, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (0, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (1, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (5, '\x9e\xf2\xc7\x10 \x90\x80W\xd0 \xce\xe2\xf2\xc2+Z\xaf\xb2\xb1+', 1328857613.566889), (9, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (6, ']\xe9\x1d:\x1b\x05\x04\xba\xa4\xf3\xc7xuq\xa83\xaeT\x04\x90', 1328857613.566889), (3, 'YE\xcb1\x8af0\x918J\xed\x1d@ed\xf6B\xd0?q', 1328857613.566889), (7, '\xcb\xd9\xdc\xa2\x07m\xd8\x10Szq\x1b\xde\x8c\xf5\xf6\xa3\x97\x10\x93', 1328857613.566889), (2, '\xc1|\x03\xe0\x1c"\xb6\xd58\xccO\xc6\x90\xf6\xe8f\x1dk\xc3\x86', 1328857613.566889)]), (3, '\x15%\xa7\\\xd3p\xa0\xe1\xe7\xbc\xd9\xe8\xb6\x9c/\xd90\xd5\xd9:\x92\x07\x0e.Q\x0c\x1b\xc5l\xb0\xf5\xbe', None, 131073, 598, 3, 10, '\x01\x00\x00\x00\x00\x00\x00\x00\x03\x15%\xa7\\\xd3p\xa0\xe1\xe7\xbc\xd9\xe8\xb6\x9c/\xd90\xd5\xd9:\x92\x07\x0e.Q\x0c\x1b\xc5l\xb0\xf5\xbe\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023))): set([(9, '\x88\xb2wj\x93\x07k\xf5D\xc9M2\x92d\xbdt>N\xb50', 1328857611.60796)])}
shnum	8
shares	set([(4, '\xfd\xc7gI\xc4m\xba!\x97\xf9\xe6\x82`\xf16\xb4\xfaY\xbf7', 1328857613.566889), (8, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (0, "'W\xb0P\xc3\xeaH\xfc\x18\xe5\xdb\xc2\x87\x8b\xab\x8efn\xfa\xaa", 1328857613.566889), (1, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (5, '\x9e\xf2\xc7\x10 \x90\x80W\xd0 \xce\xe2\xf2\xc2+Z\xaf\xb2\xb1+', 1328857613.566889), (9, '\xba=\x91Y\x1a\xf3&\xaeXm\xc8\x1b\x0eO4!\xef\xa9g.', 1328857613.566889), (6, ']\xe9\x1d:\x1b\x05\x04\xba\xa4\xf3\xc7xuq\xa83\xaeT\x04\x90', 1328857613.566889), (3, 'YE\xcb1\x8af0\x918J\xed\x1d@ed\xf6B\xd0?q', 1328857613.566889), (7, '\xcb\xd9\xdc\xa2\x07m\xd8\x10Szq\x1b\xde\x8c\xf5\xf6\xa3\x97\x10\x93', 1328857613.566889), (2, '\xc1|\x03\xe0\x1c"\xb6\xd58\xccO\xc6\x90\xf6\xe8f\x1dk\xc3\x86', 1328857613.566889)])
verinfo	(5, '-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R', 3, 10, 131073, 598, '\x01\x00\x00\x00\x00\x00\x00\x00\x05-F\xf0\xd4T\x82k\xd2\x02{\xf2{7\xf3\x9c\x1d\xfb\xc9Z<J\xcb\xbf\x80\xe4MP\xffMI\x89R\x03\n\x00\x00\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x02V', (('EOF', 2415), ('verification_key', 1731), ('enc_privkey', 123), ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2383), ('share_hash_chain', 1339), ('verification_key_end', 2023)))
shnums	set([8])
datalength	343
segsize	131073
<type 'exceptions.ValueError'>: need more than 8 values to unpack

Attachments (6)

incident-2012-02-10--07-10-36Z-yyfpk7a.flog.bz2 (41.4 KB) - added by vikarti at 2012-02-10T07:38:51Z.
possible related incident file
incident-2012-02-10--07-11-26Z-qwg5ivy.flog.bz2 (41.1 KB) - added by vikarti at 2012-02-10T07:39:07Z.
possible related incident file #3
incident-2012-02-10--07-14-46Z-fpevtli.flog.bz2 (41.6 KB) - added by vikarti at 2012-02-10T07:39:53Z.
possible related incident file 4
1.9.1-debugging-ticket1669.diff.txt (6.8 KB) - added by zooko at 2012-06-19T03:49:07Z.
patch1.diff.txt (40.2 KB) - added by zooko at 2012-06-20T15:55:28Z.
diff.txt (39.6 KB) - added by zooko at 2012-06-21T03:37:37Z.

Download all attachments as: .zip

Change History (36)

comment:1 Changed at 2012-02-10T07:31:43Z by vikarti

this check is that happened after https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1670

Changed at 2012-02-10T07:38:51Z by vikarti

possible related incident file

Changed at 2012-02-10T07:39:07Z by vikarti

possible related incident file #3

Changed at 2012-02-10T07:39:53Z by vikarti

possible related incident file 4

comment:2 Changed at 2012-02-10T07:40:30Z by vikarti

added possible related incident files

comment:3 Changed at 2012-02-15T22:37:47Z by zooko

  • Priority changed from major to critical

comment:4 Changed at 2012-02-17T18:17:18Z by vikarti

Addition: same from console, reproducible. even if gateway is not doing anything. no incident file this time(none was generated)

bash-3.2$ ./tahoe-fullrepair.sh
*** checking alias tahoe
ERROR: ValueError(need more than 8 values to unpack)
"[Failure instance: Traceback: <type 'exceptions.ValueError'>: need more than 8 values to unpack"
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py:793:runUntilCurrent
/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.7/site-packages/foolscap-0.6.3-py2.7.egg/foolscap/eventual.py:26:_turn
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py:361:callback
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py:455:_startRunCallbacks
--- <exception caught here> ---
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py:542:_runCallbacks
/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py:269:_repair_finished
/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py:130:_fill_checker_results
/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/servermap.py:258:recoverable_versions

*** checking alias test
'<root>': healthy
done: 1 objects checked
 pre-repair: 1 healthy, 0 unhealthy
 0 repairs attempted, 0 successful, 0 failed
 post-repair: 1 healthy, 0 unhealthy


bash-3.2$ tahoe ls
WindowsSupport
backups_Calibre_Library
backups_INARI4_Evernote
backups_INARI4_Libs
backups_INARI4_Pictures
backups_INARI4_Python26
backups_INARI4_Python27
backups_INARI4_Win_Libs
backups_INARI4_Win_Python26
backups_INARI4_Win_Python27
backups_INARI4_Win_TahoeSupport
backups_INARI4_Win_Work
backups_Work
bash-3.2$ cat ./tahoe-fullrepair.sh 
#!/bin/sh
#sometimes we are not started in login shell!
export PATH="/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/bin:$PATH"

                                                                                                                                                         
for item in `tahoe list-aliases | cut -f 1 -d :`; do
    echo '*** checking alias '"$item"
    #--verify -v --repair
    tahoe deep-check --repair -v --add-lease $item: 
    echo
done


bash-3.2$ tahoe list-aliases
  tahoe: URI:DIR2-MDMF:yhu<skipped>
   test: URI:DIR2:2lz<skipped>
   

last (possible relevant) lines of twisted.log

2012-02-18 01:05:06+0700 [-] Unhandled error in Deferred:
2012-02-18 01:05:06+0700 [-] Unhandled Error
	Traceback (most recent call last):
	  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py", line 793, in runUntilCurrent
	    call.func(*call.args, **call.kw)
	  File "/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.7/site-packages/foolscap-0.6.3-py2.7.egg/foolscap/eventual.py", line 26, in _turn
	    cb(*args, **kwargs)
	  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 361, in callback
	    self._startRunCallbacks(result)
	  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 455, in _startRunCallbacks
	    self._runCallbacks()
	--- <exception caught here> ---
	  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 542, in _runCallbacks
	    current.result = callback(current.result, *args, **kw)
	  File "/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.7/site-packages/Nevow-0.10.0-py2.7.egg/nevow/appserver.py", line 91, in processingFailed
	    request.finishRequest( False )
	  File "/Users/vikarti/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.7/site-packages/Nevow-0.10.0-py2.7.egg/nevow/appserver.py", line 176, in finishRequest
	    server.Request.finish(self)
	  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/web/http.py", line 866, in finish
	    "Request.finish called on a request after its connection was lost; "
	exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.
	

attempt to perform same repair operation on Windows or Linux nodes in same grid results in basically same issue.

comment:5 Changed at 2012-02-17T18:56:30Z by davidsarah

  • Component changed from unknown to code-mutable
  • Keywords mutable mdmf repair error regression preservation added
  • Summary changed from Crashes while repair on Mac OS X to ValueError (need more than 8 values to unpack) during mutable MDMF repair

#1672 was marked as a duplicate of this.

Note that although mutable.format is mdmf, we don't have enough evidence to conclude that this is specific to MDMF. It seems likely that it is a regression in the 1.9.x mutable publisher/repairer, though.

comment:6 Changed at 2012-02-17T18:57:20Z by vikarti

this was planned to be added to #1672 davidsarah, mutable.format is not set but directory was created as mutable in WUI(becouse I have other problems with concurrent access otherwise, mutable dirs at least work and manual deep-checks of subdirs - shows all is fine) also,

this one is from linux

[root@lalangamena ~]# tahoe deep-check --repair -v --add-lease URI:DIR2-MDMF:yhu<skip>
ERROR: ValueError(need more than 8 values to unpack)
[Failure instance: Traceback: exceptions.ValueError: need more than 8 values to unpack
/root/tahoe-new/allmydata-tahoe-1.9.1/Twisted-10.2.0-py2.4-linux-i686.egg/twisted/internet/base.py:789:runUntilCurrent
/root/tahoe-new/allmydata-tahoe-1.9.1/support/lib/python2.4/site-packages/foolscap-0.6.3-py2.4.egg/foolscap/eventual.py:26:_turn
/root/tahoe-new/allmydata-tahoe-1.9.1/Twisted-10.2.0-py2.4-linux-i686.egg/twisted/internet/defer.py:361:callback
/root/tahoe-new/allmydata-tahoe-1.9.1/Twisted-10.2.0-py2.4-linux-i686.egg/twisted/internet/defer.py:455:_startRunCallbacks
--- <exception caught here> ---
/root/tahoe-new/allmydata-tahoe-1.9.1/Twisted-10.2.0-py2.4-linux-i686.egg/twisted/internet/defer.py:542:_runCallbacks
/root/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py:269:_repair_finished
/root/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/checker.py:130:_fill_checker_results
/root/tahoe-new/allmydata-tahoe-1.9.1/src/allmydata/mutable/servermap.py:257:recoverable_versions
[root@lalangamena ~]# 

CentOS5.4 is exact python versions be of interest?

comment:7 Changed at 2012-02-17T19:19:35Z by davidsarah

The "exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this." is probably a duplicate of #1664; I don't see any strong evidence that it's related to the ValueError.

comment:8 Changed at 2012-04-01T22:54:08Z by zooko

  • Milestone changed from undecided to 1.9.2

comment:9 Changed at 2012-06-12T16:52:02Z by davidsarah

  • Owner changed from nobody to davidsarah
  • Status changed from new to assigned

comment:10 Changed at 2012-06-18T15:38:08Z by zooko

  • Owner changed from davidsarah to zooko
  • Status changed from assigned to new

Investigating.

comment:11 Changed at 2012-06-18T15:38:27Z by zooko

  • Status changed from new to assigned

comment:12 Changed at 2012-06-18T20:54:21Z by zooko

Update: I hate it when I post partial results of my investigations and then someone (usually Brian) goes and finishes debugging it and fixing it before I do. But, I figure that's not a good reason to withhold information, so here's my partial results.

  • The stack trace in the original post shows a tuple with 8 elements being returned in place of "verinfo", but "verinfos" are supposed to have 9 elements.
  • The elements shown in the stack trace in the original post are not the elements of a verinfo. A verinfo is (according to mutable/layout.py) is (int -- seqnum, cryptoval -- hash, cryptoval -- salt or hash, int -- segsize, int -- datalen, int -- k, int -- n, str -- prefix for signing, tuple -- offsets). The stack trace in the original post shows something that is (int -- seqnum, cryptoval -- hash?, int -- k, int -- n, int -- segsize, int -- datalen, str -- prefix for signing?, tuple -- offsets).
  • I wrote a type-checking function that asserts the shape of a verinfo tuple and sprinkled it everywhere and ran unit tests and got unit tests failures.
  • I don't know how this code that says "d2 = reader.get_verinfo(); d2.addErrback(…)" in mutable/servermap.py ever worked! None of the functions named "get_verinfo" that I see return a Deferred. But, if that function raises an AttributeError every time (saying "tuple has no such attribute 'addErrback'"), wouldn't we have noticed before now? That line of code was added in bb10d685ed86eb08 11 months ago and has been in 1.9.x all along. Hm... but I see that any AttributeError raised from there would get handled by _query_failed...

Okay, I'm taking a brief break. but DON'T GO AND FIX THIS WITHOUT ME AND HAVE ALL THE FUN.

comment:13 Changed at 2012-06-19T03:48:38Z by zooko

  • Owner changed from zooko to nobody
  • Status changed from assigned to new

Okay since I wrote comment:12 I patched pristine 1.9.1 to raise an instance of a newly defined exception class instead of to execute d2.addErrback. I expected that this would not change unit test behavior since I believed that d2.addErrback would never do anything but raise an exception (AttributeError). However, instead I got a lot of test failures/errors.

So I commented out the "raise exception" and wrote a log.msg(…) debugprint. That shows that d2 is in fact a Deferred. I don't see how:

_trial_temp/test.log:2012-06-19 00:37:17-0300 [-] xxx testing testing 1 2 3 d2: <Deferred at 0x4719518 current result: (1, '\x08\x92\xa5\x90%\xfbCI0$\xbf8$\x9d[s\x92!\xab1\xfea\xc6aM\x8e\x0ch~\x92r\x03', '\xf4\xce\x8a}G\x94\x87:U\xf0\xf2\xaa\x9e\xa1\x86\xcc', 9, 8, 3, 10, '\x00\x00\x00\x00\x00\x00\x00\x00\x01\x08\x92\xa5\x90%\xfbCI0$\xbf8$\x9d[s\x92!\xab1\xfea\xc6aM\x8e\x0ch~\x92r\x03\xf4\xce\x8a}G\x94\x87:

Oh, okay, I see that !MDMFSlotReadProxy does return a deferred tuple, where the other two implementations of get_verinfo in mutable/layout.py return an immediate tuple... I guess that one is the only one that is getting invoked in that _got_results? ...

Also, I'll attach the patch that adds type-checking.

So, basically I'm pretty sure I see at least one or two serious bugs. 1. different shape of tuple passed around as a "verinfo" from !MDMFSlotWriteProxy than from !SDMFSlotWriteProxy or !MDMFSlotReadProxy, and 2. d2 = reader.get_verinfo(); d2.addErrback(…). However, I don't understand how the latter isn't already showing up as unit test failures and common failures of mutable file operations. And, I don't understand the full scope of these two -- under what conditions they strike and what their consequences are.

Unsetting my ownership of this ticket since I'm going to bed and I no longer mind if you snatch it out from under me. :-)

comment:14 Changed at 2012-06-19T12:54:26Z by zooko

  • Owner changed from nobody to zooko
  • Status changed from new to assigned

Taking this ticket back...

comment:15 Changed at 2012-06-20T15:55:00Z by zooko

Here's a patch against 1.9.2 which I don't intend to commit for the actual 1.9.2 release. Probably we could use this patch to finish understanding the scope of the bug, commit some variant of this patch to trunk (I like Brian's suggestion to replace the verinfo tuple with an object), and then commit some subset or minimized variant of this patch to 1.9.2.

Also, there's one test failure that I don't understand.

Changed at 2012-06-20T15:55:28Z by zooko

comment:16 Changed at 2012-06-21T02:45:37Z by zooko

Update: I think that the only place the 8-tuple "verinfo" could get generated and passed around is here. However, tracing backwards to see what could have led to that line of code, the only thing that can lead to that code is someone updating an MDMF by changing only part of it, and the only time that is ever done is if someone uses the WAPI to do so. I doubt that anything vikarti did, and certainly not a deep-check, was doing that, which leads me to think that the 8-tuple "verinfo" must have gotten generated and passed around in some other way.

Aha! Then it must be the one other possibility -- here. That is called from exactly one place -- here. Which is called from exactly one place -- here. Which is called from exactly one place -- here. Which means that it is when a Publish object publishes the file's contents...

Investigating a bit more...

comment:17 Changed at 2012-06-21T03:26:36Z by zooko

Okay, I think I've finally figured out why this doesn't happen in the unit tests. There are no unit tests of mutable file repair that involve actually repairing damage!

test_repairer is all immutable The one occurrence of "check_and_repair" in test_mutable says this:

def test_check_and_repair_readcap(self):
    # we can't currently repair from a mutable readcap: #625

So it doesn't actually repair, just check, since it is a readcap.

All the occurrences of check_and_repair in test_deepcheck appear to be running on perfectly healthy files, so now repairs are performed.

Here is a patch which adds debugprintouts showing the trail of the problem. In practice, e.g. in vikarti's original post to this ticket, I believe that "xxx 1", then "xxx 2", then "xxx 3" will be encountered, leading to an exception. However when running the unit tests with this patch in place, you'll see xxx 1 and 2 but never "xxx 3", because there is no test of repair-which-actually-fixes-damage.

So I currently believe that vikarti's bug will happen every single time you have an MDMF with damage and you repair it. Next step: to write a unit test that tests that belief! But first, sleep.

Unsetting my ownership of this ticket in order to signal that I would be okay with (indeed, quite grateful for) someone else working on it while I sleep.

diff -u -r 1.9.1-pristine/src/allmydata/mutable/publish.py 1.9.1/src/allmydata/mutable/publish.py
--- 1.9.1-pristine/src/allmydata/mutable/publish.py     2012-01-12 16:05:20.000000000 -0400
+++ 1.9.1/src/allmydata/mutable/publish.py      2012-06-21 00:13:01.051139281 -0300
@@ -883,7 +883,8 @@
 
     def _record_verinfo(self):
         self.versioninfo = self.writers.values()[0].get_verinfo()
-
+        if len(self.versioninfo) != 9:
+            print "xxx 1 %r" % (self.versioninfo,)
 
     def _connection_problem(self, f, writer):
         """
@@ -1141,6 +1142,8 @@
         # If we get there, we know that responses correspond to placed
         # shares, and can safely execute these statements.
         if self.versioninfo:
+            if len(self.versioninfo) != 9:
+                print "xxx 2 self.versioninfo: %r" % (self.versioninfo,)
             self.log("wrote successfully: adding new share to servermap")
             self._servermap.add_new_share(peerid, writer.shnum,
                                           self.versioninfo, started)
diff -u -r 1.9.1-pristine/src/allmydata/mutable/servermap.py 1.9.1/src/allmydata/mutable/servermap.py
--- 1.9.1-pristine/src/allmydata/mutable/servermap.py   2012-01-12 16:05:20.000000000 -0400
+++ 1.9.1/src/allmydata/mutable/servermap.py    2012-06-21 00:13:13.631090217 -0300
@@ -254,6 +254,8 @@
         versionmap = self.make_versionmap()
         recoverable_versions = set()
         for (verinfo, shares) in versionmap.items():
+            if len(verinfo) != 9:
+                print "xxx 3 %r" % (verinfo,)
             (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
              offsets_tuple) = verinfo
             shnums = set([shnum for (shnum, peerid, timestamp) in shares])
Last edited at 2012-06-21T03:42:50Z by zooko (previous) (diff)

comment:18 Changed at 2012-06-21T03:34:00Z by zooko

Here's the minimal patch that probably fixes the bug. Please don't ship this in a new stable release until we have a unit test of it!

HACK zompu:~/playground/tahoe-lafs/ticket1669-debuggery/1.9.2-pristine$ darcs diff -u
diff -rN -u old-1.9.2-pristine/src/allmydata/mutable/layout.py new-1.9.2-pristine/src/allmydata/mutable/layout.py
--- old-1.9.2-pristine/src/allmydata/mutable/layout.py  2012-06-21 00:33:48.246275199 -0300
+++ new-1.9.2-pristine/src/allmydata/mutable/layout.py  2012-06-21 00:33:48.386274653 -0300
@@ -1102,10 +1102,11 @@
     def get_verinfo(self):
         return (self._seqnum,
                 self._root_hash,
-                self._required_shares,
-                self._total_shares,
+                None,
                 self._segment_size,
                 self._data_length,
+                self._required_shares,
+                self._total_shares,
                 self.get_signable(),
                 self._get_offsets_tuple())
 

comment:19 Changed at 2012-06-21T03:37:17Z by zooko

Attaching a patch which is intended to eventually be submitted for trunk (i.e., for Tahoe-LAFS v1.10) but not for 1.9.2. It is basically "a bunch of small cleanups" around mutable files. Needs work. I think it causes one unit test to fail. There are several things I want to talk to Kevan about, most of which are currently indicated in this patch with the addition of a comment that says something "XXX Why such and such?".

Brian: your idea of replace verinfo tuple with an object is a good one, and I would be happy if you would do it, but I would prefer it if you would base that on top of this patch.

Changed at 2012-06-21T03:37:37Z by zooko

comment:20 Changed at 2012-06-21T03:43:00Z by zooko

  • Owner changed from zooko to nobody
  • Status changed from assigned to new

comment:21 Changed at 2012-06-21T15:28:23Z by zooko

  • Owner changed from nobody to zooko
  • Status changed from new to assigned

Ah, I was wrong to say that there are no tests. There is test_mdmf_repairable_5shares. Now why doesn't this test exercise the bug...

comment:22 Changed at 2012-06-21T15:54:24Z by zooko

Aha. I turned on code coverage display (see wiki:HowToWriteTests) and ran test_mdmf_repairable_5shares. This immediately shows that the code in question never gets exercised by that test. Aha! Because that test invokes check and then invokes repair. The code in question is in the implementation of check_and_repair.

There, I made a new test that invokes check_and_repair and it exercises the bug. Victory!

comment:23 Changed at 2012-06-21T16:31:04Z by zooko

  • Keywords review-needed added
  • Owner changed from zooko to nobody
  • Status changed from assigned to new

Hm, I can't attach files of this size to this ticket (due to ). Fortunately, I can use tahoe-lafs instead!

https://lafsgateway.zooko.com/file/URI:CHK:bbrr25dwjjmirkmhefxfaepj5y:4p56766gfzu4pztkya3rpo3sok5bffyggrp3u7ody5nnwjipntfa:1:1:102848/@@named=/fix.dpatch.txt

Review needed!

Version 0, edited at 2012-06-21T16:31:04Z by zooko (next)

comment:24 Changed at 2012-06-21T18:58:12Z by davidsarah

I'd like there to be less duplicated code in the check/repair tests, but that can wait until 1.10. Continuing review...

comment:25 Changed at 2012-06-21T20:53:28Z by david-sarah@…

  • Resolution set to fixed
  • Status changed from new to closed

In 8c6c410b87ebeea1:

mutable: fix shape of 'verinfo' tuple returned from MDMFSlotWriteProxy.get_verinfo(). fixes #1669

comment:26 Changed at 2012-06-21T20:55:01Z by davidsarah

Also adca74d9f8a07d18, which did not get auto-commented for some reason: "Tests for ref #1669. Also refactor the existing tests to reduce duplicated code and to fix a cut-and-paste error that caused one case (successful SDMF repair) to go untested."

comment:27 Changed at 2012-06-21T21:59:03Z by david-sarah <david-sarah@…>

In 8c6c410b87ebeea1:

mutable: fix shape of 'verinfo' tuple returned from MDMFSlotWriteProxy.get_verinfo(). fixes #1669

comment:28 Changed at 2012-06-21T22:53:06Z by david-sarah@…

In 5524/1.9.2:

mutable: fix shape of 'verinfo' tuple returned from MDMFSlotWriteProxy.get_verinfo(). fixes #1669

comment:29 Changed at 2012-07-02T16:12:17Z by davidsarah

Also [5523/1.9.2] "Tests for ref #1669. Also refactor the existing tests to reduce duplicated code and to fix a cut-and-paste error that caused one case (successful SDMF repair) to go untested."

comment:30 Changed at 2012-07-16T16:33:00Z by david-sarah@…

In 5878/cloud-backend:

mutable: fix shape of 'verinfo' tuple returned from MDMFSlotWriteProxy.get_verinfo(). fixes #1669

Note: See TracTickets for help on using tickets.