[tahoe-dev] So, I wanted to track down and properly report a reproducible error.

Brian Warner warner at lothar.com
Thu Mar 29 22:53:35 UTC 2012


On 3/21/12 6:56 AM, markus reichelt wrote:

> So, I wanted to track down and properly report a reproducible error.
> The following is a write-up of the procedure that led to
> https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1689

Thanks for the detailed writeup!

>  flogtool dump .tahoe/private/logport.furl

Yeah, "flogtool dump" is for dumping the contents of saved logfiles.
"flogtool tail" is the one that knows how to connect to a running
program and stream out log events. I just added code to Foolscap to
notice when you point "dump" at a furl file and suggest that you try
"tail" instead.
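
For illustration only, here is a rough Python sketch of that kind of
check. It is not the actual Foolscap code: the function names are made
up, and it assumes a furl file is a short text file whose first line
starts with "pb://" (like the logport.furl quoted below), with anything
else treated as a saved log.

```python
def looks_like_furl(text):
    """Return True if the first non-blank line starts with 'pb://'."""
    lines = text.strip().splitlines()
    return bool(lines) and lines[0].startswith("pb://")


def suggest_flogtool_command(path):
    """Guess which flogtool subcommand fits the file at `path`.

    Hypothetical helper: returns "tail" for something that looks like
    a furl file and "dump" for everything else.
    """
    with open(path, "rb") as f:
        head = f.read(200)  # a furl is short, so a small prefix is enough
    try:
        text = head.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII bytes: assume this is a saved event log, not a furl.
        return "dump"
    return "tail" if looks_like_furl(text) else "dump"
```

Given a path like .tahoe/private/logport.furl, suggest_flogtool_command()
would answer "tail", which is the hint "dump" can then print.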

> ...and it just sits there. Might be network related, let's check
> what logport.furl looks like:
> 
> cat .tahoe/private/logport.furl
> pb://hjsdfuzz2jfsjkhfjksdfuisdzfifffa@127.0.0.1/hkjahf82364jgadhkh

> tub.location = 127.0.0.1

Ah, good catch. "tub.location" is specified to contain a comma-joined
list of "connection hints", and each hint is either "HOST:PORT" (for
IPv4 hosts) or "PREFIX:STUFF:MORESTUFF" for other kinds of hints that
may be invented in the future (IPv6, I2P, whatever). A bare "127.0.0.1"
has no port, so it matches neither form. So this should have been
rejected much earlier: that client node shouldn't have been willing to
start with a tub.location like that, and "flogtool tail" should have
rejected it too.

I've updated Foolscap to reject these locations.
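
To make the hint rule concrete, here is a minimal Python sketch of such
a check. It is not the code that went into Foolscap: the function names
are hypothetical, and two details are my own assumptions, namely that a
HOST:PORT port must be an integer from 1 to 65535 and that any hint
with three or more colon-separated pieces and a non-empty prefix is
accepted as a future-style hint.

```python
def split_hints(location):
    """Split a comma-joined tub.location value into individual hints."""
    return [h.strip() for h in location.split(",") if h.strip()]


def is_valid_hint(hint):
    """Check one hint against the HOST:PORT / PREFIX:STUFF:... shapes."""
    pieces = hint.split(":")
    if len(pieces) < 2:
        # No colon at all, e.g. a bare "127.0.0.1": neither form matches.
        return False
    if len(pieces) == 2:
        host, port = pieces
        try:
            port_number = int(port)
        except ValueError:
            return False
        # Assumption: ports are limited to the usual TCP range.
        return bool(host) and 0 < port_number < 65536
    # Three or more pieces: assumed to be "PREFIX:STUFF:MORESTUFF".
    return bool(pieces[0])


def check_location(location):
    """Return the list of hints, or raise ValueError naming the bad ones."""
    hints = split_hints(location)
    bad = [h for h in hints if not is_valid_hint(h)]
    if bad:
        raise ValueError("bad connection hints in tub.location: %r" % (bad,))
    return hints
```

With this sketch, check_location("127.0.0.1") raises ValueError, while
check_location("127.0.0.1:3456") returns the single hint unchanged.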

> The error I have found is reproducible by just stopping one storage
> server. After having done that, I again run flogtool to capture debug
> data:
> 
> flogtool tail --save-to=/tmp/flogtool.tail-2.txt .tahoe/private/logport.furl

FYI, that --save-to= file is an event log (usable by "flogtool dump"),
not a text file, so I usually name them "stuff.flog" instead of ".txt",
just to avoid confusion later.

> ERROR: AssertionError()
> "[Failure instance: Traceback: <type 'exceptions.AssertionError'>: "
> /usr/lib64/python2.6/site-packages/allmydata/mutable/filenode.py:563:upload
> /usr/lib64/python2.6/site-packages/allmydata/mutable/filenode.py:661:_do_serialized
> /usr/lib64/python2.6/site-packages/twisted/internet/defer.py:298:addCallback
> /usr/lib64/python2.6/site-packages/twisted/internet/defer.py:287:addCallbacks
> --- <exception caught here> ---
> /usr/lib64/python2.6/site-packages/twisted/internet/defer.py:545:_runCallbacks
> /usr/lib64/python2.6/site-packages/allmydata/mutable/filenode.py:661:<lambda>
> /usr/lib64/python2.6/site-packages/allmydata/mutable/filenode.py:689:_upload
> /usr/lib64/python2.6/site-packages/allmydata/mutable/publish.py:404:publish
> 
> I can see from the second capture that tahoe connects exactly to that storage
> server that has been stopped: connectTCP to ('256.256.256.256', 66666)

Hmm.. that suggests that we're not paying attention to the server
disconnect, then trying to use that server during the repair, and then
getting surprised when we can't talk to it, or something.

I'll see if I can reproduce this locally and track it down.

Great detective work!

cheers,
 -Brian

