#1936 new defect

the tahoe-lafs logging system is hard to discover

Reported by: zooko Owned by:
Priority: normal Milestone: undecided
Component: code-nodeadmin Version: 1.9.2
Keywords: usability transparency logging Cc:
Launchpad Bug:

Description (last modified by leif)

docs/logging.rst explains how to get logging info out of your Tahoe-LAFS node. (Mostly for developers editing the source code, but also potentially useful for operators.) Nobody discovers this! Many people have reported a similar problem, frustration, and waste of time as Itamar did just now on IRC:

<itamar1> can someone help me with tahoe's logging code?
<zooko> itamar1: I can!  [10:14]
<zooko> Actually, there's a doc.
<itamar1> oh
<zooko> So, let's see if that doc answers your questions
<itamar1> I guess I should've looked for docs first :)
<zooko> find docs -iname '*log*'
* itamar1 goes looking
<zooko> Well, secondly I wonder if the existence of that doc could be
		better-documented!  [10:15]
<zooko> Maybe linked from some "devel" page about tahoe-lafs...
* zooko looks at https://tahoe-lafs.org/trac/tahoe-lafs/wiki/Dev
<itamar1> I sort've expected everything to be logged by default  [10:16]
<zooko> Hm, well I'm not sure how to make the logging doc more discoverable.
<zooko> Many people have had that exact same problem, including me many times.
<zooko> I more or less approve of Brian's decision to make logging terse by
		default.
<zooko> But
<itamar1> and then spent sometime trying to find log level settings in command
		  line help
<zooko> I'd really like to help people figure it out more easily.
<zooko> Yeah, I'm sure a lot of people have lost time like that. Many have
		reported it on the IRC channel.

Change History (13)

comment:1 Changed at 2013-03-25T16:26:15Z by zooko

Proposal: the tahoe --help output points to the logging docs.

<zooko> Maybe the command-line help should point to the logging doc file.
<itamar1> seems reasonable

comment:2 Changed at 2013-04-23T16:50:31Z by zooko

This problem is compounded by the fact that there exists a ~/.tahoe/logs directory which looks at first glance like it has all the logs in it. This means people will assume that they've seen the logs after looking in there, and if they don't see any error messages in there then there aren't any error message being logged. An example of this (probably) just happened:

<dash> hmm. anyone tried bzr with tahoe-lafs over sftp recently?   [10:01]
<dash> getting errors about "readv() returned 0 bytes"  [10:02]
<zooko> dash: hm. I have not.
<zooko> This is... bzr talking to the tahoe-lafs SFTP interface?  [10:28]
<zooko> dash: do you know how to loo for tahoe-lafs logs?
<dash> sure, in ~/.tahoe/logs

comment:3 follow-up: Changed at 2013-04-23T18:35:02Z by warner

Let's see, ~/.tahoe/logs contains two things:

  • twistd.log, which has top-level crashes that happen after daemonization (like "address already in use"), and other unhandled exceptions
  • incidents/, which are foolscap "flog" bundles, created when something weird happens

In addition, there's stuff written to stderr during tahoe start, if it happens before daemonization. There's also runtime logging accessible with flogtool tail NODEDIR/private/logport.furl.

We have a couple tickets about moving errors from post-daemonization to pre- , where they're immediately visible to the person running tahoe start or tahoe run (in contrast, post-daemonization errors are pretty hard to discover). I think we've managed to move most configuration errors to the pre-daemonization phase (by doing all our checking in Client.__init__), but address-already-in-use might be a significant exception.

What if we had the node write out a ~/.tahoe/logs/README each time it started, with a few lines about what each logfile was for, and how to run flogtool tail ?

comment:4 in reply to: ↑ 3 Changed at 2013-04-23T18:41:59Z by zooko

Replying to warner:

Let's see, ~/.tahoe/logs contains two things:

  • twistd.log, which has top-level crashes that happen after daemonization (like "address already in use"), and other unhandled exceptions
  • incidents/, which are foolscap "flog" bundles, created when something weird happens

In addition, there's stuff written to stderr during tahoe start, if it happens before daemonization. There's also runtime logging accessible with flogtool tail NODEDIR/private/logport.furl.

We have a couple tickets about moving errors from post-daemonization to pre- , where they're immediately visible to the person running tahoe start or tahoe run (in contrast, post-daemonization errors are pretty hard to discover). I think we've managed to move most configuration errors to the pre-daemonization phase (by doing all our checking in Client.__init__), but address-already-in-use might be a significant exception.

I agree it would be an improvement to move more things to pre-daemonization. For that matter, what about the major strategy change of not implementing daemonization ourselves at all! We could leave that job up to daemontools, supervisord, upstart, systemd, or tahoe run > tahoe-stdout.log.txt 2> tahoe-stderr.log.txt &.

What if we had the node write out a ~/.tahoe/logs/README each time it started, with a few lines about what each logfile was for, and how to run flogtool tail ?

+1

Well, actually it should probably instruct you to read the docs/logging.rst file (on the principle of Don't Repeat Yourself).

Also, what do you think about the earlier proposal to have tahoe --help mention the existence of the docs/logging.rst file?

Last edited at 2014-03-05T02:49:45Z by daira (previous) (diff)

comment:5 Changed at 2013-04-23T19:01:08Z by zooko

<zooko> Updated #1936 to say that our system misleads people like dash into
		thinking they've seen the logs when they haven't.  [10:51]

comment:6 Changed at 2013-04-23T19:19:20Z by warner

Replying to zooko:

For that matter, what about the major strategy change of not implementing daemonization ourselves at all! We could leave that job up to daemontools, supervisord, upstart, systemd, or {{{tahoe run > tahoe-stdout.log.txt 2> tahoe-stderr.log.txt &}}}.

I dunno. Daemonizing a tahoe node is a very common thing for people to do. Twisted has excellent built-in daemonization tools. I don't want to ask users to discover and learn some external daemonization tool before they're able to do a very common operation. And my own test/deployment workflow would be a lot harder if I had to use daemontools just to launch a half-dozen local nodes. I still miss tahoe's built-in --multiple feature. So I'm -1 on removing this functionality from tahoe.

Incidentally, that tahoe run >logs & command doesn't record a PID anywhere (making it hard to kill, especially when you run multiple nodes), doesn't necessarily detach from the terminal (so logging out might kill the process), doesn't rotate the logfiles, throws any init-time errors into the stderr logfile where they won't be noticed, and has no way to report setup/config errors via an exit status code. All things that tahoe start provides :).

OS packagers, on the other hand, will have some preferred daemonization solution in mind already, so the non-daemonizing tahoe run is important to maintain for them to use.

Also, what do you think about the earlier proposal to have tahoe --help mention the existence of the docs/logging.rst file?

It's not a bad idea, although I worry that tahoe --help is too long (61 lines at the moment). Once it grows beyond a page, it becomes hard to use. But one more line might be ok.

comment:7 follow-ups: Changed at 2013-04-24T09:46:44Z by gdt

I don't think it's accurate to say "OS packagers will have a daemonization plan". As a packager I expect daemons to be well behaved by themselves. Having a pidfile written would be good.

It would also be nice to have an easy way to use syslog. I gather flogtool has advantages, but syslog is the standard approach.

comment:8 in reply to: ↑ 7 Changed at 2013-04-24T21:42:43Z by zooko

Replying to gdt:

I don't think it's accurate to say "OS packagers will have a daemonization plan". As a packager I expect daemons to be well behaved by themselves. Having a pidfile written would be good.

It would also be nice to have an easy way to use syslog. I gather flogtool has advantages, but syslog is the standard approach.

Would you want the complete, verbose logs sent to syslog? It is currently possible by running flogtool tail ~/.tahoe/private/logport.furl | logger. Is that the behavior you want?

comment:9 in reply to: ↑ 7 Changed at 2013-04-24T21:43:01Z by zooko

Replying to gdt:

I don't think it's accurate to say "OS packagers will have a daemonization plan". As a packager I expect daemons to be well behaved by themselves. Having a pidfile written would be good.

It currently has all of the above.

comment:10 follow-up: Changed at 2013-05-01T22:24:13Z by gdt

Replying to https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1936#comment:8:

What I would expect is for tahoe to use the LOG_{DEBUG,INFO,NOTICE,WARNING,ERR} hierarchy from syslog.h, and for most of the current logs to be DEBUG. INFO might be peer connections coming and going, and perhaps NOTICE for the introducer. Then people can configure syslogd appropriately for what they want. So piping flogtool to logger fails to label messages with priority level. And then there's assessing what logs make sense for routine operations when nothing is known to be wrong. (Presumably fetching files when all works ok is not noteworthy.)

One thing I noticed about flogtool is that when starting it, the past is apparently in the past, and not accessible. That sort of makes sense, but it somewhat defeats the purpose of logging.

comment:11 in reply to: ↑ 10 Changed at 2013-05-02T00:07:38Z by daira

Replying to gdt:

One thing I noticed about flogtool is that when starting it, the past is apparently in the past, and not accessible. That sort of makes sense, but it somewhat defeats the purpose of logging.

Did you see the --catch-up option to flogtool tail? It only outputs entries that are still in memory, but that's often enough.

comment:12 Changed at 2013-05-03T15:48:47Z by zooko

Let's open a separate ticket for integrating tahoe-lafs logging with syslog.

comment:13 Changed at 2013-05-19T07:11:28Z by leif

  • Description modified (diff)

Related: #1974 (wui: the "report an incident" feature doesn't submit incident reports)

Note: See TracTickets for help on using tickets.