Opened at 2010-03-29T21:40:08Z
Last modified at 2016-10-09T06:11:26Z
#1010 closed enhancement
anonymous client mode — at Version 47
Reported by: | duck | Owned by: | warner |
---|---|---|---|
Priority: | minor | Milestone: | 1.12.0 |
Component: | code-network | Version: | 1.6.1 |
Keywords: | privacy anonymity docs anti-censorship forward-compatibility i2p-collab i2p tor-protocol | Cc: | killyourtv@… |
Launchpad Bug: |
Description (last modified by zooko)
For the anonymous network use case (such as I2P and Tor), we want to only use 127.0.0.1 as loopback address. Right now Tahoe discovers all local addresses through various strategies and discloses them to (at least) the introducer.
For I2P we have introduced the configuration option anonymize_local_addresses (which we consider renaming to tub.anonymize) to disable this lookup.
Example of configuration in tahoe.cfg:
    [node]
    ...
    anonymize_local_addresses = true
Snippet showing how this is used in node.py:
    if self.get_config("node", "anonymize_local_addresses", False, boolean=True):
        d.addCallback(lambda res: ['127.0.0.1'])
    else:
        d.addCallback(lambda res: iputil.get_local_addresses_async())
    d.addCallback(self._setup_tub)
Change History (54)
Changed at 2010-03-29T21:43:07Z by duck
comment:1 Changed at 2010-04-03T23:00:44Z by davidsarah
comment:2 follow-up: ↓ 3 Changed at 2010-04-08T10:24:13Z by ioerror
It seems like a great idea to have a tub.anonymize flag. It would be fantastic to have Tahoe throw exceptions like confetti if that option is set and a few key (tub.address) settings aren't configured. Anything less may lead to PII leakage.
There are a number of sanitization issues that need to be carefully handled. Having a single bit to say we want to try to handle those issues is probably a good start.
comment:3 in reply to: ↑ 2 Changed at 2010-04-12T21:08:23Z by davidsarah
- Keywords test added
Replying to ioerror:
It seems like a great idea to have a tub.anonymize flag. It would be fantastic to have Tahoe throw exceptions like confetti if that option is set and a few key (tub.address) settings aren't configured. Anything less may lead to PII leakage.
There are a number of sanitization issues that need to be carefully handled. Having a single bit to say we want to try to handle those issues is probably a good start.
+1 for having a single config bit to indicate that the user wants anonymous operation.
However,
- "tub.*" options apply to foolscap tubs, and the single config bit may need to change behaviour other than in foolscap.
- rather than giving an error if the other relevant options aren't configured, this option can simply override them. It might give an error if the other option has been explicitly set in conflict with what is needed for anonymity.
comment:4 Changed at 2010-04-13T00:31:20Z by davidsarah
- Keywords docs added
- Milestone changed from undecided to 1.7.0
Also needs a doc patch.
comment:5 Changed at 2010-04-13T00:32:20Z by davidsarah
- Owner set to duck
comment:6 Changed at 2010-04-27T16:48:18Z by zooko
- Keywords review-needed removed
Unsetting review-needed since it needs a doc patch before going back to the review-needed status.
comment:7 follow-up: ↓ 8 Changed at 2010-04-28T03:35:13Z by warner
I think that "tub.location = " (setting it to an empty string) is even better. tub.location is the right thing to set here: everything sent to the introducer will derive from what the Tub concludes, and tub.location is the way to override that automatically-figure-out-my-own-addresses behavior.
Perhaps a "[node]anonymous" flag would be useful as a statement of policy, and implemented as a check in various places: if "anonymous=true", then we scan the tub's location just before sending it to the introducer, and throw an exception if it isn't empty? And if we add other places that reveal identifiable information in the future, we also guard those with the "if not anonymous" check?
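A minimal sketch of the policy check described above. The helper name `check_before_announce`, the exception class, and the plain boolean argument are all hypothetical; nothing here is existing Tahoe-LAFS or foolscap API, and the real check would sit wherever the node hands its location to the introducer.

```python
class AnonymityViolation(Exception):
    """Raised when an anonymous node is about to reveal a location."""


def check_before_announce(anonymous, tub_location):
    """Return the location hints that may be sent to the introducer.

    `anonymous` stands in for a hypothetical "[node]anonymous" flag and
    `tub_location` for the comma-separated hint string the Tub concluded.
    """
    hints = [h.strip() for h in tub_location.split(",") if h.strip()]
    if anonymous and hints:
        # Policy guard: an anonymous node must not publish any hint.
        raise AnonymityViolation(
            "anonymous=true but tub location is not empty: %r" % (hints,))
    return hints
```

Any future code path that reveals identifying information could call the same kind of guard before sending.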
davidsarah: the motivation for including 127.0.0.1 in the list-of-addresses is to allow two nodes on the same machine to establish a fast (loopback) connection to each other. I use this all the time in test scenarios, and in grids in which the helper runs on the same node as something else (generally an introducer or a storage server). There's half an argument to remove it, but I think that most of those cases are handled better by having people publish an explicit tub.location that doesn't include it.
comment:8 in reply to: ↑ 7 Changed at 2010-04-28T16:28:33Z by davidsarah
Replying to warner:
davidsarah: the motivation for including 127.0.0.1 in the list-of-addresses is to allow two nodes on the same machine to establish a fast (loopback) connection to each other.
I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?
comment:9 Changed at 2010-06-17T04:25:54Z by zooko
- Milestone changed from 1.7.0 to eventually
comment:10 Changed at 2010-09-07T17:20:51Z by zooko
comment:11 Changed at 2010-10-06T17:50:44Z by zooko
- Owner changed from duck to warner
Brian: please answer David-Sarah's question from comment:8.
comment:12 Changed at 2010-10-22T14:50:27Z by zooko
- Keywords tor added
Adding keyword "tor" since the same issues probably apply to tor users as apply to i2p users. (Indeed, I suspect all "tor" and "i2p" tags should probably just be converted to "privacy" tags.)
comment:13 Changed at 2010-10-23T00:51:12Z by davidsarah
- Keywords anonymity added; i2p tor removed
comment:14 Changed at 2010-12-16T01:26:30Z by davidsarah
- Keywords anti-censorship added
comment:15 Changed at 2011-01-12T22:02:43Z by duck
Brian's suggestion from comment:7 has been taken; if "tub.location = " (an empty string), then the local address discovery is replaced by just using 127.0.0.1 as address; this replaces the previously suggested anonymize_local_addresses = true option.
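The three cases this creates (option absent, option present but empty, option set to a value) could be told apart roughly as follows. This is only a sketch using Python 3's standard `configparser`, not the patch itself; `local_addresses_for` and the `discover` callable are made-up names, with `discover` standing in for iputil.get_local_addresses_async(), and the explicit-value branch is simplified to splitting on commas.

```python
import configparser


def local_addresses_for(cfg_text, discover):
    """Pick the advertised addresses as comment:15 describes.

    `discover` is a hypothetical callable returning a list of strings.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    location = parser.get("node", "tub.location", fallback=None)
    if location is None:
        # Option absent (or commented out): auto-discover as before.
        return discover()
    if location.strip() == "":
        # "tub.location =" present but empty: advertise loopback only.
        return ["127.0.0.1"]
    # Explicit value: use exactly what the user configured.
    return [hint.strip() for hint in location.split(",")]
```

The first two branches are the ones a user is most likely to confuse, since a commented-out line and an empty value look almost the same in tahoe.cfg.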
While an anonymous policy flag as suggested by ioerror in comment:2 and by Brian in comment:7 would be a good idea, I consider that material for another ticket.
In addition to this, a unit test has been written. Once #1301 is implemented the trial patch decorator can be used instead.
comment:16 Changed at 2011-01-12T22:03:17Z by duck
- Keywords review-needed added
comment:17 Changed at 2011-06-26T15:47:26Z by zooko
This is the Ticket of the Week in Tahoe-LAFS Weekly News edition 4: http://tahoe-lafs.org/~zooko/TWN4.html
comment:18 Changed at 2011-06-26T15:52:08Z by gdt
If a node is only a client, then perhaps it should default to not disclosing addresses. (I realize that in theory a client with a public address could get incoming connections from a server without a routable address, but I consider servers without routable addresses to be buggy.)
comment:19 Changed at 2011-07-23T20:31:07Z by zooko
- Summary changed from Only use 127.0.0.1 as local address to use only 127.0.0.1 as local address
comment:20 Changed at 2011-07-23T21:08:39Z by zooko
- Milestone changed from eventually to 1.9.0
I'm putting this into the 1.9 Milestone because duck has done the work we asked of him and it would feel good to therefore include his patch in 1.9. (review-needed!)
comment:21 Changed at 2011-07-25T15:02:04Z by marlowe
- Keywords reviewed added; review-needed removed
Reviewed patch and looks good to me.
comment:22 follow-up: ↓ 23 Changed at 2011-07-31T20:00:13Z by warner
whoops, I *really* lost track of this one.
DS> I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?
Um, if everyone in the grid only publishes 127.0.0.1, then how will distant nodes ever connect to each other? Since we don't have UPnP or NAT traversal, we need everyone (well, N-1) to have+publish a public IP address.
This patch needs docs: in addition to making sure people can successfully use this feature, we need a place to answer the user confusion that's likely to occur when someone assumes that "tub.location=" should behave the same way as "#tub.location=". (I'm not as sure that ["tub.location=" == anonymous] is the best UI for this, at least not as sure as I was 15 months ago. The whole-config "anonymous" flag feels like an important addition.)
gdt's observation in comment:18 is a good one. Ideally, pure-clients should be able to hang out behind NAT and not admit to having a real address. (I *think* the FURL location-hint format will tolerate this, but I haven't actually tested it). We've always been on the fence about whether Tahoe is a client-server system or a P2P system. Having clients announce their addresses makes it more P2Pish.
I've had topology problems (servers behind NAT) which made me glad that it's possible for servers to connect to clients too. To actually enable this, I had to make my "clients" pretend to be servers (but with storage.readonly=true): otherwise the servers wouldn't hear about the client and wouldn't try to connect. Nodes only actually publish their storage FURLs to the Introducer if they're configured as servers (see init_storage() in client.py). But they'll reveal their FURLs (along with their IP address) in any reference that passes over the wire.
So anyways, I'm ok with this patch if it includes a paragraph in docs/configuration.rst (in the section on tub.location) explaining what happens when you use "tub.location=" and why you might want to do that. If we find that it's hard to explain this feature in there, then maybe it's not a good feature to add.
comment:23 in reply to: ↑ 22 Changed at 2011-08-01T00:34:31Z by davidsarah
Replying to warner:
whoops, I *really* lost track of this one.
DS> I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?
Um, if everyone in the grid only publishes 127.0.0.1, then how will distant nodes ever connect to each other? Since we don't have UPnP or NAT traversal, we need everyone (well, N-1) to have+publish a public IP address.
Yes, I realized that was a silly question (for nodes in general) but forgot to unask it. I suppose I was thinking only of clients, or more precisely non-servers.
gdt's suggestion of only including 127.0.0.1 in the list of addresses for non-servers makes sense to me, especially given this:
Nodes only actually publish their storage FURLs to the Introducer if they're configured as servers (see init_storage() in client.py).
comment:24 follow-up: ↓ 25 Changed at 2011-08-01T01:14:53Z by warner
actually in that case we might as well not advertise *any* addresses. That's a fairly clear hint that we don't want other people trying to contact us :)
comment:25 in reply to: ↑ 24 Changed at 2011-08-01T04:32:08Z by davidsarah
Changed at 2011-08-11T01:50:03Z by davidsarah
docs/configuration.rst: document 'tub.location =' for hiding local IP addresses. refs #1010
Changed at 2011-08-11T02:16:48Z by davidsarah
node.py: implement 'tub.location =' for hiding local IP addresses. fixes #1010
Changed at 2011-08-11T02:23:29Z by davidsarah
test_node.py: test that 'tub.location =' hides local IP addresses. This version unpatches on synchronous exceptions, and uses fileutil.write. refs #1010
comment:26 Changed at 2011-08-11T02:28:45Z by davidsarah
- Keywords review-needed added; test reviewed removed
Let's kick the question of what addresses the client should advertise by default out to the next release. I don't think that this patch conflicts with any decision we would be likely to make about that. It also doesn't conflict with adding a whole-config anonymous flag.
I've recorded the changes as darcs patches, added some docs, and made a couple of minor improvements to the test (see its description). The fix itself hasn't changed and is already reviewed.
comment:27 Changed at 2011-08-11T03:31:50Z by Zarutian
Reviewed 1010-docs.darcs.patch and found no glaring errors. Review still needed for test-1010.darcs.patch.
comment:28 Changed at 2011-08-11T18:34:05Z by zooko
Review:
I haven't thought through davidsarah's assertion in comment:26 that this patch won't make it harder to do what I want (clients listen for connections from servers by default, and anonymous-mode is an explicit flag instead of setting location='') in the future, but I'll take their word for it.
I find it very confusing that location='' in the tahoe.cfg file means "Emit only 127.0.0.1" but location='' in src/allmydata/node.py's _setup_tub() means "Discover all local IP addresses and emit them.". This is on top of my slight confusion about the fact that location=None in tahoe.cfg has a different meaning from location=''. Or wait -- does it? Is one of them the same as not having a location entry at all?
Off I go to search for answers in the docs in attachment:cleanup-1010.dpatch. But the fact that I experience this much confusion after glancing at a few lines of the code and the config file is a bad sign.
comment:29 Changed at 2011-08-11T19:06:03Z by zooko
- Keywords review-needed removed
Okay, I've started reading the docs patch from attachment:cleanup-1010.dpatch:
It helps to allay my confusion because it explicitly says "Note that this is not the same as omitting tub.location.". However, it doesn't help all the way: what does it mean if you omit tub.location? (The answer, I believe, is that it discovers your local IP addresses and advertises them.) Does it mean anything if you say tub.location=None, or is that an error?
I think there are four different use cases here, three of which are currently supported, and our docs should be more explicit about enumerating them.
- Discover your local IP addresses and announce them.
- Don't announce any routable IP address.
- Announce a configured IP address instead of the dynamically discovered one(s).
- [not supported yet] Announce a configured IP address in addition to the dynamically discovered one(s).
Also: is there a valid distinction between
- 2.a. Don't announce any routable IP address, but announce unrouteable (LAN-scoped) IP addresses. Then nodes from out on the Internet cannot open connections to you, but nodes on your LAN can. Also, people on your LAN can use this to confirm your identity as a certain Tahoe-LAFS node, even if nodes on the Internet can't. (Is this true? If you've, for example, configured all of your outgoing connections to go through a Tor or I2P proxy, but you used this setting, then nodes within your LAN can open a direct TCP connection to you, do foolscap negotiation with you, and thus learn the mapping between your LAN-internal IP address and your foolscap node ID.)
- 2.b. Don't announce any IP address except 127.0.0.1, meaning that nodes off your own host can't open TCP connections to you but people on your own host can. Again, does this mean you're vulnerable to an identity-revealing attack, even if you use Tor, if the attacker can open TCP connections from your own host? Is this a valid attack? It seems like it might be. What if there is some sort of TCP proxy running on your host so that, even though they can't execute arbitrary code on your host, they can open a TCP connection which looks to you as though it comes from your own host?
- 2.c. Don't announce any IP address.
Now this patch currently lets the user express option 2 by setting tub.location=, option 1 by setting no tub.location, and option 3 by setting tub.location=207.7.145.194. I'm -1 on this design:
- I find it confusing.
- The docs here mix this issue with issue #1086, by saying that advertising an unrouteable IP puts you in "client only mode". In my opinion (and I recognize that gdt, at least, disagrees), whether your tahoe node initiates or accepts TCP connections should be independent of whether it acts as a storage server, storage client, or both. (Or introducer or helper.)
- I'm not sure if it is sufficiently safe to protect the identities (by which I mean IP-address-to-node-ID mappings) of Tor or I2P users. Perhaps we need to support option 2.c., above, instead.
- I would prefer a design with an explicit node.anonymous = True, as we discussed above.
- I would also prefer, I think, a design with option 2 being implemented by setting tub.location=127.0.0.1.
We discussed one of those alternative designs above, and we said maybe we can change our minds later, but this may be a mistake because
- configuration file semantics are major backward-compatibility issues.
Do we have a plan for how to provide backwards compatibility if we were to change to a different design in a future release? I guess we might need to have a phase where the tahoe node understood both old and new configuration formats, stopped with a fatal error if they were inconsistent, and emitted a warning if the old one existed at all. Then eventually we might go through another cycle like the one we just finished with #1385 where we stop with a fatal error if the old style is present. (By the way, #1385 turned out to be a lot more painful of a patch to integrate and debug than I had anticipated.)
I'm willing to listen to counter-arguments, but at the moment I'm -1 on this design. I haven't finished reviewing the actual patches in attachment:cleanup-1010.dpatch, but I'm going to stop here and focus on #393 instead. I'm sorry I didn't think about the backward-compatibility issues earlier and that I didn't think about the specific configuration format earlier so I would realize I was uncomfortable with it before this late stage. (Also it is too bad I didn't think of the potential identity-revealing weakness earlier so we could have time to think it through before this late stage.)
comment:30 Changed at 2011-08-11T19:08:38Z by zooko
comment:31 Changed at 2011-08-11T19:13:49Z by zooko
#1207 is a closely related ticket which shows that some people (starting with gdt) want to have yet another variation, where unrouteable IP addresses are excluded from the advertised list.
comment:32 Changed at 2011-08-12T01:22:48Z by davidsarah
- Keywords forward-compatibility added
It seems as though you can already achieve exactly the effect of the current patch with "tub.location = 127.0.0.1" -- or a similar effect with "tub.location = unreachable.example.org:0" as the documentation suggests.
So on reflection, I'm also -1 on including this in v1.9.
(The cleanup patch also has the improvement of not calling iputil.get_local_addresses_async() if its value is going to be discarded, but that's not urgent.)
comment:33 Changed at 2011-08-15T03:06:25Z by davidsarah
- Milestone changed from 1.9.0 to 1.10.0
comment:34 Changed at 2013-08-07T16:48:46Z by killyourtv
- Cc killyourtv@… added
- Description modified (diff)
I added an updated patch against trunk which uses Brian's last patch.
comment:35 Changed at 2013-08-21T15:44:17Z by psi
- Keywords i2p-collab added
comment:36 Changed at 2013-08-21T16:05:32Z by psi
In the dev meeting zooko suggested an anonymize flag instead of a blank tub.location. The anonymize flag would ensure that sensitive information like IP addresses is not broadcast. A blank tub.location value could be confused with an auto-detected location. Perhaps a keyword like "auto" could be used to indicate an auto-configured address, with that behavior being the default.
comment:37 Changed at 2013-08-23T21:21:52Z by zooko
I just reviewed attachment:1010-use-only-127.patch (during Weekly Dev Chat).
Thank you for updating this patch to apply to the current trunk! The patch makes sense and is usefully addressing this issue. However, we talked it over at our recent Weekly Dev Chat (notes), and have a few requirements for safety of the configuration:
- Let's add a [node]anonymize flag to the tahoe.cfg file. The meaning of this flag is: stop the process and print an error message if any of the configuration options would compromise my identity. There are also probably going to be other meanings of this flag added in other patches (i.e., this flag will probably come to mean also: do not allow any outgoing connections that are not over an anonymous routing layer such as Tor or I2P).
- Instead of "tub.location=" (the empty string) meaning to not advertise any location, let tub.location=UNREACHABLE mean that. (This is in order to avoid confusion in the mind of the user about the distinction between tub.location being absent versus it being present with an empty value. See also below, about backward compatibility.)
- If tub.location=UNREACHABLE, then pass the special hardcoded value unreachable.example.org:0 to foolscap instead of the empty string. (This is because foolscap currently can't handle the empty string for its connection hints; see http://foolscap.lothar.com/trac/ticket/208 .)
- Instead of expressing that the node's IP address should be auto-detected by the absence of tub.location, express it by tub.location being set to AUTODETECT.
Note that there is a third option besides AUTODETECT and UNREACHABLE, and that is to set tub.location to a specific set of IP address+port, DNS name+port, I2P addresses, or Tor (.onion) addresses. I don't know if Tor or I2P users would always do the latter, or if they would sometimes set it to UNREACHABLE.
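As a rough illustration of the three spellings (AUTODETECT, UNREACHABLE, or an explicit hint list), the translation into the string handed to foolscap might look like the sketch below. `resolve_tub_location` and the `autodetect` callable are hypothetical names, and treating an absent option as AUTODETECT follows the backward-compatibility note in this comment rather than any existing code.

```python
# Placeholder hint named in the list above for the UNREACHABLE case.
UNREACHABLE_HINT = "unreachable.example.org:0"


def resolve_tub_location(value, autodetect):
    """Translate a tub.location setting into a connection-hint string.

    `value` is None when the option is absent from tahoe.cfg.
    `autodetect` is a hypothetical callable returning "host:port" hints.
    """
    if value is None or value == "AUTODETECT":
        # Absent is treated like AUTODETECT for backward compatibility.
        return ",".join(autodetect())
    if value == "UNREACHABLE":
        # Foolscap cannot take an empty hint string, so pass a dummy one.
        return UNREACHABLE_HINT
    # Anything else is an explicit, user-supplied list of hints.
    return value
```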
Therefore, if [node]anonymize is set to True, then:
- If there is no tub.location setting (including if tub.location is commented-out), the node will abort on startup. (This is important because people who created their node with an older release of Tahoe-LAFS will have a tahoe.cfg with tub.location commented out. See below about backward-compatibility.)
- If tub.location is set to AUTODETECT, the node will abort on startup with an error message.
- If tub.location is set to a specific connection-hints value which includes an IP address or domain name, then the node will abort on startup with an error message.
- If tub.location is set to UNREACHABLE, the node will start up normally.
- If tub.location is set to a specific connection-hints value which contains only I2P and/or Tor (.onion) addresses, the node will start up normally.
- Newly generated tahoe.cfg's (generated by the create-client or create-node command) should come with tub.location = AUTODETECT instead of a commented out "#tub.location = put your IP address here" (see create_node.py.)
Okay, now what about backward-compatibility?
- For backwards compatibility, we still accept the absence of tub.location as meaning AUTODETECT. But only if the [node]anonymize flag isn't on, because the [node]anonymize flag makes a setting for tub.location required.
- Maybe in a future release we'll start emitting a warning about the absence of a tub.location setting, but for now, no warning.
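The startup rules in this comment could be sketched as one validation function. This is an illustration only: `validate_anonymize`, `PrivacyError`, and `hint_is_anonymous` are invented names, recognizing anonymous hints by a ".onion" or ".i2p" host suffix is an assumption, and rejecting a hint list that turns out empty is a conservative guess that the comment does not spell out.

```python
class PrivacyError(Exception):
    """Raised when the configuration would reveal the node's address."""


def hint_is_anonymous(hint):
    # Assumption: Tor hints end in ".onion" and I2P hints end in ".i2p".
    host = hint.rsplit(":", 1)[0].strip().lower()
    return host.endswith(".onion") or host.endswith(".i2p")


def validate_anonymize(anonymize, tub_location):
    """Abort (by raising) if anonymize is on and tub.location is unsafe.

    `tub_location` is None when the option is absent or commented out.
    """
    if not anonymize:
        # Without the flag, absence still means autodetect.
        return
    if tub_location is None:
        raise PrivacyError("anonymize requires an explicit tub.location")
    if tub_location == "AUTODETECT":
        raise PrivacyError("anonymize forbids tub.location = AUTODETECT")
    if tub_location == "UNREACHABLE":
        return
    hints = [h.strip() for h in tub_location.split(",") if h.strip()]
    bad = [h for h in hints if not hint_is_anonymous(h)]
    if bad or not hints:
        raise PrivacyError("non-anonymous tub.location hints: %r" % (bad,))
```

Keeping all of the rules in one function makes it easier to give each failure its own specific error message.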
comment:38 Changed at 2013-08-26T21:05:21Z by daira
I agree with most of this design, but I'm unconvinced of the value of requiring an explicit tub.location = AUTODETECT, rather than keeping that as the default as it is now. The [node] anonymize flag would still disallow auto-detection, that's independent of whether auto-detection is the default.
comment:39 Changed at 2013-08-26T21:09:39Z by daira
Note that if we keep auto-detection as the default, we can still change the comment that is added to a new tahoe.cfg to something like
#tub.location = auto-detected by default (this is unsuitable for anonymous deployments)
comment:40 Changed at 2013-08-27T11:56:25Z by zooko
Thanks for the design-review, daira. I still want to eventually switch to spelling this as tub.location = AUTODETECT, even if it isn't necessary to do so, because:
- Explicit is better than implicit.
- I'm not really that comfortable with the autodetect feature anyway (see the 'iputil' tickets) and would like to support users who don't use it or for whom it doesn't work.
- Users could easily confuse tub.location = for meaning "set my tub location to the empty string".
So for those reasons, I'd prefer to move ahead with making an explicit AUTODETECT be the preferred way to indicate this configuration (as in comment:37).
Oh, I see that my specification in comment:37 omits a case: what if tub.location is set to the empty string? I propose that this is treated as a configuration error (the node stops at startup with a verbose error message about this), regardless of the setting of [node]anonymize.
comment:41 Changed at 2013-10-04T17:10:56Z by zooko
comment:42 Changed at 2014-01-14T17:47:57Z by zooko
- Keywords i2p tor added
comment:43 Changed at 2014-01-14T18:19:57Z by gdt
Note that in the modern world, "only use 127.0.0.1" should really be "only use 127.0.0.1 and ::1". I will refrain from updating the ticket-title, but IMHO we should purge v4-only statements.
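For what it's worth, a loopback test that covers both address families is short with Python 3's standard `ipaddress` module. The helper name and the `LOOPBACK_ONLY` list are made up for illustration and are not part of iputil.

```python
import ipaddress

# Hypothetical replacement for a hardcoded ['127.0.0.1'] list.
LOOPBACK_ONLY = ["127.0.0.1", "::1"]


def is_loopback(addr):
    """Return True for IPv4 or IPv6 loopback addresses."""
    return ipaddress.ip_address(addr).is_loopback
```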
comment:44 Changed at 2014-09-04T03:48:48Z by dawuud
Greetings,
My branch implements Zooko's design in comment:37 : https://github.com/david415/tahoe-lafs/tree/david-ticket1010-unittests
Please let me know what else I can do to get this trac ticket resolved.
comment:45 Changed at 2014-09-04T04:33:50Z by str4d
For reference, the relevant commits are here: https://github.com/david415/tahoe-lafs/compare/david-truckee-venv...david-ticket1010-unittests
There is also an earlier commit in the same branch which drafts a Tor-only mode, meaning that the branch can't be directly merged to close this ticket: https://github.com/david415/tahoe-lafs/commit/d9757d75aebe675ca6114d63673ac597e1198084
comment:46 Changed at 2014-09-04T05:28:30Z by str4d
Reviewed the commits for this ticket (from the link I posted above).
- In comment:37 Zooko suggested [node]anonymize for the flag. The commits implement [node]tub.anonymize instead. I don't know what the Tahoe and Foolscap conventions are, but I expect that [node]tub.* are intended to be passed through to Foolscap, and this flag will not be.
- In check_anonymity_config():
- Either the common error messages should be merged, or the error messages for each type of error should be more specific.
- The code for parsing / generating endpoint strings (used here and elsewhere) should probably be centralized to reduce scope for bugs.
- Lines 357 (in _startService()) and 429 (in _setup_tub()) confuse me. Why is tub.location being set to empty? Is this being written to the config file?
- There are no changes yet to new tahoe.cfg files.
comment:47 Changed at 2014-09-04T23:36:00Z by zooko
- Description modified (diff)
- Summary changed from use only 127.0.0.1 as local address to anonymous client mode
What is the motivation for using all local addresses normally?