#896 assigned defect

log more info about Foolscap disconnections on storages nodes (to twistd.log)

Reported by: francois Owned by: francois
Priority: minor Milestone: undecided
Component: code-network Version: 1.5.0
Keywords: foolscap DeadReferenceError Cc:
Launchpad Bug:

Description

All the storage nodes in my grid are logging the following exception a few times a day in logs/twistd.log.

2010-01-04 01:37:21+0100 [-] Unhandled Error
        Traceback (most recent call last):
        Failure: foolscap.ipb.DeadReferenceError: Connection was lost

It'd be great to have a little more details about those disconnections, especially the peerid of the node with which the connection was lost.

Change History (3)

comment:1 follow-up: Changed at 2010-01-13T00:13:39Z by warner

  • Summary changed from Foolscap disconnections on storages nodes to log more info about Foolscap disconnections on storages nodes (to twistd.log)

hm, is there anything else in twistd.log that might help us figure out which piece of code is emitting that? We probably have dozens of bare log.err calls that could be producing it.

OTOH, we should probably do a comprehensive audit and add at least a msgid to all of our log.err calls.

comment:2 in reply to: ↑ 1 Changed at 2010-01-13T08:54:39Z by francois

  • Owner set to francois
  • Status changed from new to assigned

Replying to warner:

hm, is there anything else in twistd.log that might help us figure out which piece of code is emitting that? We probably have dozens of bare log.err calls that could be producing it.

Unfortunately not.

OTOH, we should probably do a comprehensive audit and add at least a msgid to all of our log.err calls.

I can take care of it, but how do you generate these msgid's ?

comment:3 Changed at 2010-01-13T18:56:26Z by warner

In the context of twisted's log.err, it means adding a second positional argument to all invocations of twisted.python.log.err, a string with a unique name. For example, turning log.err(f) into log.err(f, "OMG kaboom") will turn this:

2010-01-13 10:26:08-0800 [-] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python/twisted/internet/base.py", line 757, in runUntilCurrent
            call.func(*call.args, **call.kw)
        --- <exception caught here> ---
          File "/tmp/ftest.py", line 13, in fire
            raise ValueError("oops")
        exceptions.ValueError: oops

into this:

2010-01-13 10:26:08-0800 [-] OMG kaboom
        Traceback (most recent call last):
          File "/usr/lib/python/twisted/internet/base.py", line 757, in runUntilCurrent
            call.func(*call.args, **call.kw)
        --- <exception caught here> ---
          File "/tmp/ftest.py", line 13, in fire
            raise ValueError("oops")
        exceptions.ValueError: oops

The source tree has a simple tool for creating (probably-) unique message ids (UMIDs), just run python misc/make_umid. Putting that in the log.err() call will then let us grep the source tree for the UMID later. It would also be appropriate to include a short description of the code path that led to the log.err call, but in practice it's usually more precise to use the UMID, since the descriptions frequently end up saying stuff like "got unhandled error from peer", without also saying "while we were trying to download the share hash tree for an immutable file in src/allmydata/immutable/downloader.py line 1234". But if you're feeling wordy, go for it :)

For *foolscap* log calls, we usually pass this UMID as the msgid= argument, because it's pretty easy to extract the extra keyword arguments from the logged message later on. But Twisted's log.err doesn't make this so easy (kwargs are passed to observers but the twisted.log writer ignores them), so the msgid needs to be passed as the second positional argument (which *does* get written to twistd.log). Foolscap's log.err (and the tahoe allmydata.util.log.err wrapper) accept the same second posarg.

find src -name '*.py' | xargs grep -n log.err will tell you about all the calls to log.err. There are three forms to be aware of:

d = create_some_deferred()
d.addCallback(do_some_stuff)
d.addErrback(log.err)   # turn into d.addErrback(log.err, "message")
d = create_some_deferred()
d.addCallback(do_some_stuff)
def error_handler(f)
    log.err(f)   # turn into log.err(f, "message")
d.addErrback(error_handler)
from twisted.python.failure import Failure
try:
    do_stuff()
except ExceptionType:
    log.err()  # turn into log.err(Failure(), "message")

thanks!

Note: See TracTickets for help on using tickets.