#1010 closed enhancement

use only 127.0.0.1 as local address — at Version 34

Reported by: duck
Owned by: warner
Priority: minor
Milestone: 1.12.0
Component: code-network
Version: 1.6.1
Keywords: privacy anonymity docs anti-censorship forward-compatibility i2p-collab i2p tor-protocol
Cc: killyourtv@…
Launchpad Bug:

Description (last modified by killyourtv)

For the anonymous network use case (such as I2P), we want to use only 127.0.0.1 as the local address. Right now Tahoe discovers all local addresses through various strategies and discloses them to (at least) the introducer.

For I2P we have introduced the configuration option anonymize_local_addresses (which we consider renaming to tub.anonymize) to disable this lookup.

Example of configuration in tahoe.cfg:

    [node]
    ...
    anonymize_local_addresses = true

Snippet showing how this is used in node.py:

    if self.get_config("node", "anonymize_local_addresses", False, boolean=True):
        d.addCallback(lambda res: ['127.0.0.1'])
    else:
        d.addCallback(lambda res: iputil.get_local_addresses_async())
    d.addCallback(self._setup_tub)

Change History (41)

comment:1 Changed at 2010-04-03T23:00:44Z by davidsarah

What is the motivation for using all local addresses normally?

comment:2 follow-up: Changed at 2010-04-08T10:24:13Z by ioerror

It seems like a great idea to have a tub.anonymize flag. It would be fantastic to have Tahoe throw exceptions like confetti if that option is set and a few key (tub.address) settings aren't configured. Anything less may lead to PII leakage.

There are a number of sanitization issues that need to be carefully handled. Having a single bit to say we want to try to handle those issues is probably a good start.

comment:3 in reply to: ↑ 2 Changed at 2010-04-12T21:08:23Z by davidsarah

  • Keywords test added

Replying to ioerror:

It seems like a great idea to have a tub.anonymize flag. It would be fantastic to have Tahoe throw exceptions like confetti if that option is set and a few key (tub.address) settings aren't configured. Anything less may lead to PII leakage.

There are a number of sanitization issues that need to be carefully handled. Having a single bit to say we want to try to handle those issues is probably a good start.

+1 for having a single config bit to indicate that the user wants anonymous operation.

However,

  • "tub.*" options apply to foolscap tubs, and the single config bit may need to change behaviour other than in foolscap.
  • rather than giving an error if the other relevant options aren't configured, this option can simply override them. It might give an error if the other option has been explicitly set in conflict with what is needed for anonymity.

comment:4 Changed at 2010-04-13T00:31:20Z by davidsarah

  • Keywords docs added
  • Milestone changed from undecided to 1.7.0

Also needs a doc patch.

comment:5 Changed at 2010-04-13T00:32:20Z by davidsarah

  • Owner set to duck

comment:6 Changed at 2010-04-27T16:48:18Z by zooko

  • Keywords review-needed removed

Unsetting review-needed since it needs a doc patch before going back to the review-needed status.

comment:7 follow-up: Changed at 2010-04-28T03:35:13Z by warner

I think that "tub.location = " (setting it to an empty string) is even better. tub.location is the right thing to set here: everything sent to the introducer will derive from what the Tub concludes, and tub.location is the way to override that automatically-figure-out-my-own-addresses behavior.

Perhaps a "[node]anonymous" flag would be useful as a statement of policy, and implemented as a check in various places: if "anonymous=true", then we scan the tub's location just before sending it to the introducer, and throw an exception if it isn't empty? And if we add other places that reveal identifiable information in the future, we also guard those with the "if not anonymous" check?
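A minimal sketch of the policy check described above, assuming a hypothetical `anonymous` boolean read from the `[node]` section. The exception class and the function name are illustrative only; neither exists in Tahoe-LAFS or foolscap.

```python
class AnonymityViolation(Exception):
    """Raised when an anonymous node is about to reveal a location hint."""


def check_location_before_announce(anonymous, location):
    """Return `location` unchanged, or raise if policy forbids announcing it.

    `anonymous` stands in for a hypothetical "[node]anonymous" setting.
    `location` is the comma-separated hint string the Tub would publish.
    """
    # Ignore empty fragments so that "" and " , " both count as "no hints".
    hints = [h.strip() for h in location.split(",") if h.strip()]
    if anonymous and hints:
        raise AnonymityViolation(
            "anonymous=true but location is not empty: %r" % (hints,))
    return location
```

The same guard could be called from any other place that might reveal identifying information, which is what would make the flag a statement of policy rather than a single special case.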

davidsarah: the motivation for including 127.0.0.1 in the list-of-addresses is to allow two nodes on the same machine to establish a fast (loopback) connection to each other. I use this all the time in test scenarios, and in grids in which the helper runs on the same node as something else (generally an introducer or a storage server). There's half an argument to remove it, but I think that most of those cases are handled better by having people publish an explicit tub.location that doesn't include it.

comment:8 in reply to: ↑ 7 Changed at 2010-04-28T16:28:33Z by davidsarah

Replying to warner:

davidsarah: the motivation for including 127.0.0.1 in the list-of-addresses is to allow two nodes on the same machine to establish a fast (loopback) connection to each other.

I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?

comment:9 Changed at 2010-06-17T04:25:54Z by zooko

  • Milestone changed from 1.7.0 to eventually

comment:10 Changed at 2010-09-07T17:20:51Z by zooko

What's the next step on this ticket? Brian answer David-Sarah's question from comment:8? Duck write docs as requested in comment:4?

comment:11 Changed at 2010-10-06T17:50:44Z by zooko

  • Owner changed from duck to warner

Brian: please answer David-Sarah's question from comment:8.

comment:12 Changed at 2010-10-22T14:50:27Z by zooko

  • Keywords tor added

Adding keyword "tor" since the same issues probably apply to tor users as apply to i2p users. (Indeed, I suspect all "tor" and "i2p" tags should probably just be converted to "privacy" tags.)

comment:13 Changed at 2010-10-23T00:51:12Z by davidsarah

  • Keywords anonymity added; i2p tor removed

comment:14 Changed at 2010-12-16T01:26:30Z by davidsarah

  • Keywords anti-censorship added

Changed at 2011-01-12T21:54:12Z by duck

Allow for local address anonymization

comment:15 Changed at 2011-01-12T22:02:43Z by duck

Brian's suggestion from comment:7 has been taken; if "tub.location = " (an empty string), then the local address discovery is replaced by just using 127.0.0.1 as address; this replaces the previously suggested anonymize_local_addresses = true option.
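To make the distinction concrete, here is a rough sketch of how an omitted option differs from an empty one. It uses the Python 3 standard-library `configparser`, not Tahoe's own config handling, and the function name and returned labels are made up for illustration.

```python
import configparser


def location_mode(cfg_text):
    """Classify how tub.location is set in the given tahoe.cfg text."""
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    # fallback=None lets us tell "option absent" apart from "option empty".
    value = cfg.get("node", "tub.location", fallback=None)
    if value is None:
        return "discover"        # omitted or commented out: discover addresses
    if value.strip() == "":
        return "loopback-only"   # "tub.location =": use only 127.0.0.1
    return "explicit"            # anything else: use the configured hints
```

With this sketch, `#tub.location =` yields `"discover"` while `tub.location =` yields `"loopback-only"`, so a single `#` changes the behaviour.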

While an anonymous policy flag as suggested by ioerror in comment:2 and by Brian in comment:7 would be a good idea, I consider that material for another ticket.

In addition, a unit test has been written. Once #1301 is implemented the trial patch decorator can be used instead.

comment:16 Changed at 2011-01-12T22:03:17Z by duck

  • Keywords review-needed added

comment:17 Changed at 2011-06-26T15:47:26Z by zooko

This is the Ticket of the Week in Tahoe-LAFS Weekly News edition 4: http://tahoe-lafs.org/~zooko/TWN4.html

comment:18 Changed at 2011-06-26T15:52:08Z by gdt

If a node is only a client, then perhaps it should default to not disclosing addresses. (I realize that in theory a client with a public address could get incoming connections from a server without a routable address, but I consider servers without routable addresses to be buggy.)

comment:19 Changed at 2011-07-23T20:31:07Z by zooko

  • Summary changed from Only use 127.0.0.1 as local address to use only 127.0.0.1 as local address

comment:20 Changed at 2011-07-23T21:08:39Z by zooko

  • Milestone changed from eventually to 1.9.0

I'm putting this into the 1.9 Milestone because duck has done the work we asked of him and it would feel good to therefore include his patch in 1.9. (review-needed!)

comment:21 Changed at 2011-07-25T15:02:04Z by marlowe

  • Keywords reviewed added; review-needed removed

Reviewed patch and looks good to me.

comment:22 follow-up: Changed at 2011-07-31T20:00:13Z by warner

whoops, I *really* lost track of this one.

DS> I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?

Um, if everyone in the grid only publishes 127.0.0.1, then how will distant nodes ever connect to each other? Since we don't have UPnP or NAT traversal, we need everyone (well, N-1) to have+publish a public IP address.

This patch needs docs: in addition to making sure people can successfully use this feature, we need a place to answer the user confusion that's likely to occur when someone assumes that "tub.location=" should behave the same way as "#tub.location=". (I'm not as sure that ["tub.location=" == anonymous] is the best UI for this, at least not as sure as I was 15 months ago. The whole-config "anonymous" flag feels like an important addition.)

gdt's observation in comment:18 is a good one. Ideally, pure-clients should be able to hang out behind NAT and not admit to having a real address. (I *think* the FURL location-hint format will tolerate this, but I haven't actually tested it). We've always been on the fence about whether Tahoe is a client-server system or a P2P system. Having clients announce their addresses makes it more P2Pish.

I've had topology problems (servers behind NAT) which made me glad that it's possible for servers to connect to clients too. To actually enable this, I had to make my "clients" pretend to be servers (but with storage.readonly=true): otherwise the servers wouldn't hear about the client and wouldn't try to connect. Nodes only actually publish their storage FURLs to the Introducer if they're configured as servers (see init_storage() in client.py). But they'll reveal their FURLs (along with their IP address) in any reference that passes over the wire.

So anyways, I'm ok with this patch if it includes a paragraph in docs/configuration.rst (in the section on tub.location) explaining what happens when you use "tub.location=" and why you might want to do that. If we find that it's hard to explain this feature in there, then maybe it's not a good feature to add.

Last edited at 2011-08-01T01:09:18Z by warner

comment:23 in reply to: ↑ 22 Changed at 2011-08-01T00:34:31Z by davidsarah

Replying to warner:

whoops, I *really* lost track of this one.

DS> I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?

Um, if everyone in the grid only publishes 127.0.0.1, then how will distant nodes ever connect to each other? Since we don't have UPnP or NAT traversal, we need everyone (well, N-1) to have+publish a public IP address.

Yes, I realized that was a silly question (for nodes in general) but forgot to unask it. I suppose I was thinking only of clients, or more precisely non-servers.

gdt's suggestion of only including 127.0.0.1 in the list of addresses for non-servers makes sense to me, especially given this:

Nodes only actually publish their storage FURLs to the Introducer if they're configured as servers (see init_storage() in client.py).

comment:24 follow-up: Changed at 2011-08-01T01:14:53Z by warner

actually in that case we might as well not advertise *any* addresses. That's a fairly clear hint that we don't want other people trying to contact us :)

comment:25 in reply to: ↑ 24 Changed at 2011-08-01T04:32:08Z by davidsarah

Replying to warner:

actually in that case we might as well not advertise *any* addresses.

I wasn't sure if that would break anything, but yes.

How does this interact with #1086? Do we still want servers not to try to connect to clients by default?

Changed at 2011-08-11T01:50:03Z by davidsarah

docs/configuration.rst: document 'tub.location =' for hiding local IP addresses. refs #1010

Changed at 2011-08-11T02:16:48Z by davidsarah

node.py: implement 'tub.location =' for hiding local IP addresses. fixes #1010

Changed at 2011-08-11T02:23:29Z by davidsarah

test_node.py: test that 'tub.location =' hides local IP addresses. This version unpatches on synchronous exceptions, and uses fileutil.write. refs #1010

comment:26 Changed at 2011-08-11T02:28:45Z by davidsarah

  • Keywords review-needed added; test reviewed removed

Let's kick the question of what addresses the client should advertise by default out to the next release. I don't think that this patch conflicts with any decision we would be likely to make about that. It also doesn't conflict with adding a whole-config anonymous flag.

I've recorded the changes as darcs patches, added some docs, and made a couple of minor improvements to the test (see its description). The fix itself hasn't changed and is already reviewed.

comment:27 Changed at 2011-08-11T03:31:50Z by Zarutian

Reviewed 1010-docs.darcs.patch and found no glaring errors. Review still needed for test-1010.darcs.patch.

Changed at 2011-08-11T18:22:37Z by warner

combined some cleanup with the other three patches

comment:28 Changed at 2011-08-11T18:34:05Z by zooko

Review:

I haven't thought through davidsarah's assertion in comment:26 that this patch won't make it harder to do what I want (clients listen for connections from servers by default, and anonymous-mode is an explicit flag instead of setting location='') in the future, but I'll take their word for it.

I find it very confusing that location='' in the tahoe.cfg file means "Emit only 127.0.0.1" but location='' in src/allmydata/node.py's _setup_tub() means "Discover all local IP addresses and emit them". This is on top of my slight confusion about the fact that location=None in tahoe.cfg has a different meaning from location=''. Or wait -- does it? Is one of them the same as not having a location entry at all?

Off I go to search for answers in the docs in attachment:cleanup-1010.dpatch. But the fact that I experience this much confusion after glancing at a few lines of the code and the config file is a bad sign.

comment:29 Changed at 2011-08-11T19:06:03Z by zooko

  • Keywords review-needed removed

Okay, I've started reading the docs patch from attachment:cleanup-1010.dpatch:

It helps to allay my confusion because it explicitly says "Note that this is not the same as omitting tub.location.". However, it doesn't help all the way: what does it mean if you omit tub.location? (The answer, I believe, is that it discovers your local IP addresses and advertises them.) Does it mean anything if you say tub.location=None, or is that an error?

I think there are four different use cases here, three of which are currently supported, and our docs should be more explicit about enumerating them.

  1. Discover your local IP addresses and announce them.
  2. Don't announce any routable IP address.
  3. Announce a configured IP address instead of the dynamically discovered one(s).
  4. [not supported yet] Announce a configured IP address in addition to the dynamically discovered one(s).

Also: is there a valid distinction between

  • 2.a. Don't announce any routable IP address, but announce unrouteable (LAN-scoped) IP addresses. Then nodes from out on the Internet cannot open connections to you, but nodes on your LAN can. Also, people on your LAN can use this to confirm your identity as a certain Tahoe-LAFS node, even if nodes on the Internet can't. (Is this true? If you have, for example, configured all of your outgoing connections to go through a Tor or I2P proxy, but you used this setting, then nodes within your LAN can open a direct TCP connection to you, do foolscap negotiation with you, and thus learn the mapping between your LAN-internal IP address and your foolscap node ID.)
  • 2.b. Don't announce any IP address except 127.0.0.1, meaning that nodes off your own host can't open TCP connections to you but people on your own host can. Again, does this mean you're vulnerable to an identity-revealing attack, even if you use Tor, if the attacker can open TCP connections from your own host? Is this a valid attack? It seems like it might be. What if there is some sort of TCP proxy running on your host so that, even though they can't execute arbitrary code on your host, they can open a TCP connection which looks to you as though it comes from your own host?
  • 2.c. Don't announce any IP address.
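The variants above can be sketched as a filter over the discovered addresses. This is only an illustration: the policy names are invented here, the classification relies on the Python 3 standard-library `ipaddress` module, and none of this is how node.py or iputil actually works.

```python
import ipaddress


def addresses_to_announce(discovered, policy):
    """Filter discovered IP address strings according to an announce policy.

    Hypothetical policy names, mapped to the cases listed above:
      "all"            -> case 1 (announce everything discovered)
      "non-routable"   -> case 2.a (loopback and LAN-scoped addresses only)
      "loopback-only"  -> case 2.b (127.0.0.1 and other loopback addresses)
      "none"           -> case 2.c (announce nothing)
    """
    addrs = [ipaddress.ip_address(a) for a in discovered]
    if policy == "all":
        keep = addrs
    elif policy == "non-routable":
        keep = [a for a in addrs if a.is_private or a.is_loopback]
    elif policy == "loopback-only":
        keep = [a for a in addrs if a.is_loopback]
    elif policy == "none":
        keep = []
    else:
        raise ValueError("unknown policy: %r" % (policy,))
    return [str(a) for a in keep]
```

Whether 2.a should also include loopback is a judgment call; this sketch includes it. Case 3 (an explicitly configured address) bypasses discovery entirely, so it is not modelled here.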

Now this patch currently lets the user express option 2 by setting tub.location=, option 1 by setting no tub.location, and option 3 by setting tub.location=207.7.145.194. I'm -1 on this design:

  • I find it confusing.
  • The docs here mix this issue with issue #1086, by saying that advertising an unrouteable IP puts you in "client only mode". In my opinion (and I recognize that gdt, at least, disagrees), whether your tahoe node initiates or accepts TCP connections should be independent of whether it acts as a storage server, storage client, or both. (Or introducer or helper.)
  • I'm not sure if it is sufficiently safe to protect the identities (by which I mean IP-address-to-node-ID mappings) of Tor or I2P users. Perhaps we need to support option 2.c., above, instead.
  • I would prefer a design with an explicit node.anonymous = True, as we discussed above.
  • I would also prefer, I think, a design with option 2 being implemented by setting tub.location=127.0.0.1.

We discussed one of those alternative designs above, and we said maybe we can change our minds later, but this may be a mistake because

  • configuration file semantics are major backward-compatibility issues.

Do we have a plan for how to provide backwards compatibility if we were to change to a different design in a future release? I guess we might need to have a phase where the tahoe node understood both old and new configuration formats, stopped with a fatal error if they were inconsistent, and emitted a warning if the old one existed at all. Then eventually we might go through another cycle like the one we just finished with #1385 where we stop with a fatal error if the old style is present. (By the way, #1385 turned out to be a lot more painful of a patch to integrate and debug than I had anticipated.)
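As a rough sketch of that transition phase, assuming the old style is an empty `tub.location` and the new style is a hypothetical boolean flag: the function and exception names below are illustrative, and this is not the logic that was used for #1385.

```python
import warnings


class ConfigConflict(Exception):
    """Raised when old-style and new-style settings disagree."""


def resolve_anonymous(old_location, new_anonymous):
    """Decide whether the node should hide its addresses.

    old_location:  value of tub.location, or None if the option is absent.
    new_anonymous: value of a hypothetical new flag, or None if absent.
    """
    old_says_hide = old_location is not None and old_location.strip() == ""
    if old_says_hide:
        # The old spelling still works during the transition, but is flagged.
        warnings.warn("'tub.location =' is deprecated; set the anonymous flag",
                      DeprecationWarning)
    if new_anonymous is None:
        return old_says_hide
    if old_says_hide and not new_anonymous:
        # Both styles are present and they contradict each other: fatal.
        raise ConfigConflict("tub.location is empty but anonymous is false")
    return new_anonymous
```

A later release would then turn the warning branch into a fatal error, which is the second cycle described above.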

I'm willing to listen to counter-arguments, but at the moment I'm -1 on this design. I haven't finished reviewing the actual patches in attachment:cleanup-1010.dpatch, but I'm going to stop here and focus on #393 instead. I'm sorry I didn't think about the backward-compatibility issues earlier and that I didn't think about the specific configuration format earlier so I would realize I was uncomfortable with it before this late stage. (Also it is too bad I didn't think of the potential identity-revealing weakness earlier so we could have time to think it through before this late stage.)

comment:30 Changed at 2011-08-11T19:08:38Z by zooko

Once we finish this ticket, we should see if that means ticket #517 can also be closed or if there is further work to do for #517.

comment:31 Changed at 2011-08-11T19:13:49Z by zooko

#1207 is a closely related ticket which shows that some people (starting with gdt) want to have yet another variation, where unrouteable IP addresses are excluded from the advertised list.

comment:32 Changed at 2011-08-12T01:22:48Z by davidsarah

  • Keywords forward-compatibility added

It seems as though you can already achieve exactly the effect of the current patch with "tub.location = 127.0.0.1" -- or a similar effect with "tub.location = unreachable.example.org:0" as the documentation suggests.

So on reflection, I'm also -1 on including this in v1.9.

(The cleanup patch also has the improvement of not calling iputil.get_local_addresses_async() if its value is going to be discarded, but that's not urgent.)

Last edited at 2011-08-12T01:23:40Z by davidsarah

comment:33 Changed at 2011-08-15T03:06:25Z by davidsarah

  • Milestone changed from 1.9.0 to 1.10.0

Changed at 2013-08-07T16:47:54Z by killyourtv

cleanup and refactored against current trunk

comment:34 Changed at 2013-08-07T16:48:46Z by killyourtv

  • Cc killyourtv@… added
  • Description modified (diff)

I added an updated patch against trunk which uses Brian's last patch.
