#562 new defect

add a "censor" command to filter out sensitive information from log files — at Version 18

Reported by: zooko Owned by: somebody
Priority: major Milestone: eventually
Component: code Version: 1.2.0
Keywords: privacy logging confidentiality Cc:
Launchpad Bug:

Description (last modified by zooko)

per tahoe-dev/2008-December/000946.html it would be good to omit the introducer furl from the log file.

This is part of a cluster of tickets including: #562, #563, #685, #1008, and #1904.

Change History (20)

comment:1 Changed at 2009-11-01T02:04:45Z by davidsarah

If you like this bug, you might also like #823.

comment:2 Changed at 2009-12-20T23:41:01Z by davidsarah

  • Keywords privacy added

If you like this bug, you might also like #860.

comment:3 Changed at 2010-02-01T19:51:37Z by davidsarah

  • Keywords logging added
  • Milestone changed from undecided to 1.7.0

comment:4 Changed at 2010-02-21T20:29:58Z by kevan

  • Owner changed from somebody to kevan

comment:5 Changed at 2010-02-23T01:25:02Z by kevan

First, note that the log file that inspired this ticket is here: pipermail/tahoe-dev/attachments/20081222/20cc919e/attachment-0001.html

The tahoe-lafs code itself, unless I'm missing something, doesn't ever print the introducer_furl to a log. I notice that there's one exception in there with a censored furl; perhaps that's an artifact from how things were then, or something that foolscap is doing? I'll look into that more thoroughly later.

I do notice that the storage server furls are also censored in the motivating log file. I don't mind having them there in my log files, and, as Zooko points out in that thread, censoring too much makes the log files less useful. Maybe this can be a configuration switch -- if paranoid logging is turned on, then IP addresses, storage server furls, storage indices/verify caps are censored somehow, and if not they aren't.

Last edited at 2014-08-27T04:44:46Z by zooko

comment:6 Changed at 2010-02-23T03:43:22Z by kevan

..alternatively, maybe there's a way that we could add a tool to censor logs after they've been created.

For example, you can do

flogtool filter --after=5 logs/from-2010-02-21-124158--to-present.flog filtered.flog

to post-process logs that way. So maybe you could, if you wanted a censored log snippet to post to tahoe-dev or on the Trac, do something like

flogtool censor logs/from-2010-02-21-124158--to-present.flog censored.log

and have flogtool (or whatever) obfuscate the SIs, furls, and so on. Of course, it's probably much harder to do it that way.

Censorship in a running node is relatively easy, as you can easily determine what is what as it is being logged, and censor accordingly. Censorship after the fact is much harder, because you need to be able to reliably determine whether a certain string is a furl, a storage index, an IP address, something else that should be censored, or nothing at all. It seems to be closer to what I as a user would want, though; if I want to have a useful, low-effort log to attach to a bug report, I shouldn't have to run my node such that it never produces logs with information that might help me later, nor should I have to stop, reconfigure, and restart my node, then hope that the problem reappears.
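To illustrate why after-the-fact censorship is harder, here is a rough sketch of a pattern-based censor. It is not the attached 'tahoe censor' implementation, and it works on plain text lines rather than on the actual flog event format. The name `censor_text` and both patterns are assumptions made for this example; in particular, a furl is assumed to look like `pb://<tubid>@<location hints>/<name>`.

```python
import re

# Assumed shapes, for illustration only: a furl is taken to look like
# pb://<tubid>@<location hints>/<name>, and an IP address is dotted-quad IPv4.
FURL = re.compile(r"pb://[a-z2-7]+@[^/\s]+/[^\s'\"]+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def censor_text(text):
    """Best-effort censoring of furls and IPv4 addresses in plain text."""
    # Censor furls first so that addresses inside their location hints
    # disappear together with the rest of the furl.
    text = FURL.sub("[censored furl]", text)
    return IPV4.sub("[censored ip]", text)

print(censor_text("connectTCP to ('127.0.0.1', 55368)"))
```

A guesser like this is unreliable in both directions: the IPv4 pattern would also hit a four-part version string such as 1.2.3.4, and nothing here can tell a storage index apart from any other base32-looking string.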

comment:7 Changed at 2010-04-05T12:23:40Z by francois

Kevan,

I like your idea of creating a new 'flogtool censor' command.

What about tagging potentially sensitive information at logging time? For example, we could change this type of log line

 connectTCP to ('127.0.0.1', 55368)

into

 connectTCP to ('<IP>127.0.0.1</IP>', 55368)

It would then be pretty easy to filter out IP addresses, furls, storage indexes and so on.
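A minimal sketch of the filtering side, assuming the `<IP>...</IP>` markup from the example above. The `FURL` and `SI` tag names and the function `censor_tagged` are hypothetical, invented for this illustration; neither foolscap nor tahoe-lafs defines them.

```python
import re

# Hypothetical tag names; only <IP> appears in the proposal above.
TAGS = ("IP", "FURL", "SI")

# Matches <TAG>...</TAG> for any known tag; the backreference \1 ensures
# that the closing tag has the same name as the opening one.
_TAGGED = re.compile(r"<(%s)>(.*?)</\1>" % "|".join(TAGS))

def censor_tagged(line):
    """Replace the content of each tagged span with a placeholder."""
    return _TAGGED.sub(
        lambda m: "<%s>[censored]</%s>" % (m.group(1), m.group(1)), line
    )

print(censor_tagged("connectTCP to ('<IP>127.0.0.1</IP>', 55368)"))
# prints: connectTCP to ('<IP>[censored]</IP>', 55368)
```

Keeping the tag itself in the output still tells a reader what kind of value was removed, which preserves some of the log's usefulness.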

comment:8 Changed at 2010-04-13T22:53:46Z by kevan

  • Status changed from new to assigned

That would solve the problem.

I haven't had much time to play with the censorer lately, but it's more or less functional now, with that idea. I'm hoping I can have some patches and tests for people to play with by the end of this weekend.

comment:9 Changed at 2010-05-01T23:48:24Z by kevan

A correct solution to this will probably need to be implemented in foolscap, since it turns out that a lot of the compromising log entries come from there.

David-Sarah suggested that foolscap could offer callers of its logging system a way to mark certain log messages (or certain parts of certain log messages) as sensitive, so flogtool censor or whatever would know to censor them. For example,

from foolscap.logging import log
[...]
log.msg("some stuff" + log.sensitive("sensitive information"))

You'd basically need to do the following to solve this ticket, if you wanted to do it as above:

  1. Decide how to represent sensitive information in foolscap logs, and implement the sensitive function.
  2. Implement flogtool censor.
  3. Go through and audit logging code in foolscap and tahoe-lafs so that it uses sensitive where appropriate.
  4. Make patches for your changes and get them accepted into foolscap and tahoe-lafs.
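One possible shape for step 1, as a standalone sketch rather than anything foolscap provides: `sensitive` wraps a value in marker characters that survive string concatenation, and a renderer either strips the markers or replaces the marked span. The marker characters and the names `sensitive` and `render` are assumptions chosen for this example.

```python
import re

# Hypothetical markers; foolscap does not define these. Control characters
# are used because they are unlikely to occur in ordinary log text.
START, END = "\x02", "\x03"
_MARKED = re.compile(re.escape(START) + "(.*?)" + re.escape(END), re.DOTALL)

def sensitive(value):
    """Mark a value as sensitive; the result is a plain str, so it can
    be concatenated into a larger message without losing the marking."""
    return START + str(value) + END

def render(message, censor=False):
    """Return the message with marked spans censored or merely unmarked."""
    if censor:
        return _MARKED.sub("[censored]", message)
    return _MARKED.sub(lambda m: m.group(1), message)

msg = "connecting to " + sensitive("127.0.0.1")
print(render(msg))               # connecting to 127.0.0.1
print(render(msg, censor=True))  # connecting to [censored]
```

With a representation along these lines, the full log stays available locally and censoring only happens when a copy is produced for sharing.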

Between GSoC and school, I'm not going to have time to do all of that before 1.7 is due, so I'm unaccepting this ticket in case someone else wants to finish what I've started. I implemented 2, but as tahoe censor. I'm attaching that, and the tests I wrote for it to this ticket -- maybe they'll be useful somehow to whoever accepts this ticket. If I do get time, I'll re-accept it and continue working on it.

Changed at 2010-05-01T23:49:00Z by kevan

implementation of 'tahoe censor'

Changed at 2010-05-01T23:49:22Z by kevan

tests for 'tahoe censor'

comment:10 Changed at 2010-05-01T23:50:09Z by kevan

  • Owner changed from kevan to somebody
  • Status changed from assigned to new

comment:11 Changed at 2010-06-16T04:25:27Z by davidsarah

  • Keywords review-needed added
  • Milestone changed from 1.7.0 to 1.7.1

comment:12 Changed at 2010-07-11T17:41:57Z by zooko

  • Keywords review-needed removed
  • Milestone changed from 1.7.1 to undecided

From Kevan's comment:9, it sounds like he would not recommend committing these patches to Tahoe-LAFS trunk. Therefore I'm unsetting "review-needed".

comment:13 Changed at 2012-02-23T00:35:24Z by davidsarah

  • Milestone changed from undecided to soon

comment:14 Changed at 2013-01-14T06:29:15Z by zooko

  • Description modified
  • Keywords confidentiality added
  • Milestone changed from soon to eventually
  • Summary changed from censor introducer furl from log files to add a "censor" command to filter out sensitive information from log files

comment:15 Changed at 2013-01-14T08:02:57Z by zooko

Other potentially sensitive information that shows up in foolscap logs (including incident report files):

  • storage server furls
  • the exact sizes of files
  • the self-chosen nicknames of servers

comment:16 Changed at 2013-01-14T08:18:34Z by zooko

This issue is interfering with debugging #1670, because a user has reported an occurrence of #1670, but their incident report files contain information which is sensitive to them, so they don't want their flog files posted to the issue tracker.

comment:17 Changed at 2013-01-14T09:00:43Z by zooko

  • Description modified

comment:18 Changed at 2013-01-14T09:06:26Z by zooko

  • Description modified