[tahoe-dev] So, I wanted to track down and properly report a reproducible error.

Wed Mar 21 13:56:01 UTC 2012

So, I wanted to track down and properly report a reproducible error.
The following is a write-up of the procedure that led to
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1689

On #tahoe-lafs, folks use flogtool for this kind of task:

 flogtool dump .tahoe/private/logport.furl
Traceback (most recent call last):
  File "/usr/bin/flogtool", line 18, in <module>
    run_flogtool()
  File "/usr/lib64/python2.6/site-packages/foolscap/logging/cli.py", line 103, in run_flogtool
    dispatch(command, so)
  File "/usr/lib64/python2.6/site-packages/foolscap/logging/cli.py", line 63, in dispatch
    ld.run(options)
  File "/usr/lib64/python2.6/site-packages/foolscap/logging/dumper.py", line 36, in run
    self.start(f)
  File "/usr/lib64/python2.6/site-packages/foolscap/logging/dumper.py", line 47, in start
    for e in self.get_events(f):
  File "/usr/lib64/python2.6/site-packages/foolscap/logging/dumper.py", line 130, in get_events
    e = pickle.load(f)
  File "/usr/lib64/python2.6/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib64/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib64/python2.6/pickle.py", line 1165, in load_put
    self.memo[self.readline()[:-1]] = self.stack[-1]
IndexError: list index out of range

Great, one more bug! Let's use flogtool tail instead of flogtool dump:

 flogtool tail .tahoe/private/logport.furl
starting..
Connecting..

...and it just sits there. This might be network-related, so let's check
what logport.furl looks like:

cat .tahoe/private/logport.furl
pb://hjsdfuzz2jfsjkhfjksdfuisdzfifffa@127.0.0.1/hkjahf82364jgadhkhfllkjlsjhhhfhb

But tahoe.cfg shows a port that isn't listed in that furl:
tub.port = 12344
tub.location = 127.0.0.1

Adding the port to tub.location and restarting the node does the trick:
tub.location = 127.0.0.1:12344
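If you want to catch this misconfiguration before restarting the node, the check can be scripted. The sketch below is a minimal example in Python 3; the helper name `hints_without_port` is hypothetical and not part of Tahoe-LAFS. It reads the `[node]` section of a tahoe.cfg and lists every comma-separated `tub.location` hint that lacks a `:port` suffix. It assumes plain `host` / `host:port` hints like the ones above and does not handle IPv6 literals.

```python
import configparser


def hints_without_port(cfg_text):
    """Return the tub.location hints in [node] that carry no ':port' part.

    Hypothetical helper: assumes plain 'host' or 'host:port' hints, as in
    the tahoe.cfg above; IPv6 literals are not handled.
    """
    # interpolation=None so a literal '%' in any value is not misparsed
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # fallback covers both a missing [node] section and a missing option
    location = parser.get("node", "tub.location", fallback="")
    missing = []
    for hint in location.split(","):
        hint = hint.strip()
        if hint and ":" not in hint:
            missing.append(hint)
    return missing


broken = "[node]\ntub.port = 12344\ntub.location = 127.0.0.1\n"
fixed = "[node]\ntub.port = 12344\ntub.location = 127.0.0.1:12344\n"
print(hints_without_port(broken))  # ['127.0.0.1']
print(hints_without_port(fixed))   # []
```

In practice you would pass it the contents of .tahoe/tahoe.cfg instead of the inline strings.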

cat .tahoe/private/logport.furl
pb://hjsdfuzz2jfsjkhfjksdfuisdzfifffa@127.0.0.1:12344/hkjahf82364jgadhkhfllkjlsjhhhfhb
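A FURL has the shape pb://TUBID@HINT[,HINT...]/NAME, so the difference between the two logport.furl versions above sits entirely in the location hints. A minimal sketch that splits a FURL into its parts makes that visible; `split_furl` is a hypothetical helper, not a foolscap API, and it only does string splitting (no validation of the tub ID or the name).

```python
def split_furl(furl):
    """Split 'pb://TUBID@HINT[,HINT...]/NAME' into (tubid, [hints], name)."""
    prefix = "pb://"
    if not furl.startswith(prefix):
        raise ValueError("not a pb:// FURL: %r" % (furl,))
    tubid, sep, rest = furl[len(prefix):].partition("@")
    if not sep:
        raise ValueError("FURL has no '@' before its location hints")
    hints, sep, name = rest.partition("/")
    if not sep:
        raise ValueError("FURL has no '/' before its object name")
    # location hints are comma-separated host:port pairs
    return tubid, [h for h in hints.split(",") if h], name


before = "pb://hjsdfuzz2jfsjkhfjksdfuisdzfifffa@127.0.0.1/hkjahf82364jgadhkhfllkjlsjhhhfhb"
after = "pb://hjsdfuzz2jfsjkhfjksdfuisdzfifffa@127.0.0.1:12344/hkjahf82364jgadhkhfllkjlsjhhhfhb"
for furl in (before, after):
    tubid, hints, name = split_furl(furl)
    # list the hints and, separately, those that carry no port
    print(hints, [h for h in hints if ":" not in h])
# ['127.0.0.1'] ['127.0.0.1']
# ['127.0.0.1:12344'] []
```

When reading the file itself, strip the trailing newline first, e.g. `split_furl(open(path).read().strip())`.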

and now we can finally use flogtool as intended:

 flogtool tail --save-to=/tmp/flogtool.tail-1.txt .tahoe/private/logport.furl
starting..
Connecting..
Connected (to pid 1729)
Remote Versions:
 Nevow: 0.10.0
 Twisted: 11.1.0
 allmydata-tahoe: unknown
 foolscap: 0.6.3
 mock: 0.7.2
 platform: Linux-slackware_13.37.0-x86_64-64bit_ELF
 pyOpenSSL: 0.13
 pyasn1: unknown
 pycrypto: 2.5
 pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958
 python: 2.6.6
 setuptools: 0.6c11
 simplejson: 2.3.2
 sqlite3: 2.4.1
 twisted: 11.1.0
 zfec: 1.4.22
 zope.interface: unknown
[...]

/tmp/flogtool.tail-1.txt contains the flogtool output captured while running:

tahoe deep-check -v --repair --add-lease tahoe:
'<root>': healthy
done: 1 objects checked
 pre-repair: 1 healthy, 0 unhealthy
 0 repairs attempted, 0 successful, 0 failed
 post-repair: 1 healthy, 0 unhealthy

when 9 out of 9 storage servers are available.

tahoe: is an alias for an empty directory.

The error I found is reproducible simply by stopping one storage
server. After doing that, I run flogtool again to capture debug data:

flogtool tail --save-to=/tmp/flogtool.tail-2.txt .tahoe/private/logport.furl

and run once more:
tahoe deep-check -v --repair --add-lease tahoe:
ERROR: AssertionError()
"[Failure instance: Traceback: <type 'exceptions.AssertionError'>: "
/usr/lib64/python2.6/site-packages/allmydata/mutable/filenode.py:563:upload
/usr/lib64/python2.6/site-packages/allmydata/mutable/filenode.py:661:_do_serialized
/usr/lib64/python2.6/site-packages/twisted/internet/defer.py:298:addCallback
/usr/lib64/python2.6/site-packages/twisted/internet/defer.py:287:addCallbacks
--- <exception caught here> ---
/usr/lib64/python2.6/site-packages/twisted/internet/defer.py:545:_runCallbacks
/usr/lib64/python2.6/site-packages/allmydata/mutable/filenode.py:661:<lambda>
/usr/lib64/python2.6/site-packages/allmydata/mutable/filenode.py:689:_upload
/usr/lib64/python2.6/site-packages/allmydata/mutable/publish.py:404:publish

The second capture shows that tahoe tries to connect to exactly the storage
server that was stopped: connectTCP to ('256.256.256.256', 66666)
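With a long capture it helps to pull such connection attempts out mechanically. The sketch below assumes the capture has already been rendered as text (for instance with `flogtool dump /tmp/flogtool.tail-2.txt`) and that the relevant lines contain the fragment quoted above; the exact wording in a real dump may differ, so treat the regular expression as an assumption. `connect_attempts` is a hypothetical helper.

```python
import re

# Assumed line fragment, taken from the quote above:
#   connectTCP to ('256.256.256.256', 66666)
CONNECT_RE = re.compile(r"connectTCP to \('([^']*)', (\d+)\)")


def connect_attempts(lines):
    """Return (host, port) for every connectTCP fragment found in the lines."""
    attempts = []
    for line in lines:
        for host, port in CONNECT_RE.findall(line):
            attempts.append((host, int(port)))
    return attempts


sample = [
    "... connectTCP to ('256.256.256.256', 66666)",
    "... some unrelated event",
]
print(connect_attempts(sample))  # [('256.256.256.256', 66666)]
```

Comparing the result against the address of the server you stopped confirms whether the client is still trying to reach it.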

Versions used locally:
python -c "import pkg_resources;print ', '.join([d.project_name+': '+d.version for d in set(pkg_resources.require('allmydata-tahoe'))])"

Nevow: 0.10.0, foolscap: 0.6.3, setuptools: 0.6c11, Twisted: 11.1.0, zfec: 1.4.22, zbase32: 1.1.3, pyOpenSSL: 0.13, simplejson: 2.3.2, mock: 0.7.2, argparse: 1.2.1, pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958, pyutil: 1.8.4, zope.interface: 3.8.0, allmydata-tahoe: allmydata-tahoe-1.9.0-94-gcef646c, pycrypto: 2.5, pyasn1: 0.0.13

All 9 storage servers use these:
Nevow: 0.10.0, foolscap: 0.6.3, setuptools: 0.6c11, Twisted: 11.1.0, zfec: 1.4.22, pycrypto: 2.4.1, zbase32: 1.1.3, pyOpenSSL: 0.13, simplejson: 2.3.2, mock: 0.7.2, argparse: 1.2.1, pyutil: 1.8.4, zope.interface: 3.8.0, allmydata-tahoe: 1.9.1, pyasn1: 0.0.13, pycryptopp: 0.5.29
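Spotting the differences between the two version lists by eye is error-prone. A minimal sketch, with hypothetical helper names, that parses the "name: version, name: version" format printed by the one-liner above and reports the packages whose versions differ (a package missing on one side shows up as None):

```python
def parse_versions(text):
    """Turn 'name: version, name: version, ...' into a dict."""
    versions = {}
    for item in text.split(", "):
        name, sep, version = item.partition(": ")
        if sep:
            versions[name.strip()] = version.strip()
    return versions


def version_differences(local, remote):
    """Return {name: (local_version, remote_version)} for differing packages."""
    a, b = parse_versions(local), parse_versions(remote)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


local = "pycrypto: 2.5, zfec: 1.4.22, allmydata-tahoe: allmydata-tahoe-1.9.0-94-gcef646c"
remote = "zfec: 1.4.22, pycrypto: 2.4.1, allmydata-tahoe: 1.9.1"
print(version_differences(local, remote))
```

Fed the full local and storage-server lines above, it reports three differences: allmydata-tahoe, pycrypto and pycryptopp.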

-- 
left blank, right bald