Opened at 2010-01-12T13:37:10Z
Last modified at 2010-01-13T18:56:26Z
#896 assigned defect
log more info about Foolscap disconnections on storages nodes (to twistd.log)
Reported by: | francois | Owned by: | francois |
---|---|---|---|
Priority: | minor | Milestone: | undecided |
Component: | code-network | Version: | 1.5.0 |
Keywords: | foolscap DeadReferenceError | Cc: | |
Launchpad Bug: |
Description
All the storage nodes in my grid are logging the following exception a few times a day in logs/twistd.log.
2010-01-04 01:37:21+0100 [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost
It'd be great to have a little more details about those disconnections, especially the peerid of the node with which the connection was lost.
Change History (3)
comment:1 follow-up: ↓ 2 Changed at 2010-01-13T00:13:39Z by warner
- Summary changed from Foolscap disconnections on storages nodes to log more info about Foolscap disconnections on storages nodes (to twistd.log)
comment:2 in reply to: ↑ 1 Changed at 2010-01-13T08:54:39Z by francois
- Owner set to francois
- Status changed from new to assigned
Replying to warner:
hm, is there anything else in twistd.log that might help us figure out which piece of code is emitting that? We probably have dozens of bare log.err calls that could be producing it.
Unfortunately not.
OTOH, we should probably do a comprehensive audit and add at least a msgid to all of our log.err calls.
I can take care of it, but how do you generate these msgid's ?
comment:3 Changed at 2010-01-13T18:56:26Z by warner
In the context of twisted's log.err, it means adding a second positional argument to all invocations of twisted.python.log.err, a string with a unique name. For example, turning log.err(f) into log.err(f, "OMG kaboom") will turn this:
2010-01-13 10:26:08-0800 [-] Unhandled Error Traceback (most recent call last): File "/usr/lib/python/twisted/internet/base.py", line 757, in runUntilCurrent call.func(*call.args, **call.kw) --- <exception caught here> --- File "/tmp/ftest.py", line 13, in fire raise ValueError("oops") exceptions.ValueError: oops
into this:
2010-01-13 10:26:08-0800 [-] OMG kaboom Traceback (most recent call last): File "/usr/lib/python/twisted/internet/base.py", line 757, in runUntilCurrent call.func(*call.args, **call.kw) --- <exception caught here> --- File "/tmp/ftest.py", line 13, in fire raise ValueError("oops") exceptions.ValueError: oops
The source tree has a simple tool for creating (probably-) unique message ids (UMIDs), just run python misc/make_umid. Putting that in the log.err() call will then let us grep the source tree for the UMID later. It would also be appropriate to include a short description of the code path that led to the log.err call, but in practice it's usually more precise to use the UMID, since the descriptions frequently end up saying stuff like "got unhandled error from peer", without also saying "while we were trying to download the share hash tree for an immutable file in src/allmydata/immutable/downloader.py line 1234". But if you're feeling wordy, go for it :)
For *foolscap* log calls, we usually pass this UMID as the msgid= argument, because it's pretty easy to extract the extra keyword arguments from the logged message later on. But Twisted's log.err doesn't make this so easy (kwargs are passed to observers but the twisted.log writer ignores them), so the msgid needs to be passed as the second positional argument (which *does* get written to twistd.log). Foolscap's log.err (and the tahoe allmydata.util.log.err wrapper) accept the same second posarg.
find src -name '*.py' | xargs grep -n log.err will tell you about all the calls to log.err. There are three forms to be aware of:
d = create_some_deferred() d.addCallback(do_some_stuff) d.addErrback(log.err) # turn into d.addErrback(log.err, "message")
d = create_some_deferred() d.addCallback(do_some_stuff) def error_handler(f) log.err(f) # turn into log.err(f, "message") d.addErrback(error_handler)
from twisted.python.failure import Failure try: do_stuff() except ExceptionType: log.err() # turn into log.err(Failure(), "message")
thanks!
hm, is there anything else in twistd.log that might help us figure out which piece of code is emitting that? We probably have dozens of bare log.err calls that could be producing it.
OTOH, we should probably do a comprehensive audit and add at least a msgid to all of our log.err calls.