#1010 closed enhancement (fixed)
anonymous client mode
Reported by: | duck | Owned by: | warner |
---|---|---|---|
Priority: | minor | Milestone: | 1.12.0 |
Component: | code-network | Version: | 1.6.1 |
Keywords: | privacy anonymity docs anti-censorship forward-compatibility i2p-collab i2p tor-protocol | Cc: | killyourtv@… |
Launchpad Bug: |
Description (last modified by zooko)
For the anonymous network use case (such as I2P and Tor), we want to only use 127.0.0.1 as loopback address. Right now Tahoe discovers all local addresses through various strategies and discloses them to (at least) the introducer.
For I2P we have introduced the configuration option anonymize_local_addresses (which we consider renaming to tub.anonymize) to disable this lookup.
Example of configuration in tahoe.cfg:
[node]
...
anonymize_local_addresses = true
Snippet showing how this is used in node.py:
if self.get_config("node", "anonymize_local_addresses", False, boolean=True):
    d.addCallback(lambda res: ['127.0.0.1'])
else:
    d.addCallback(lambda res: iputil.get_local_addresses_async())
d.addCallback(self._setup_tub)
Attachments (7)
Change History (80)
Changed at 2010-03-29T21:43:07Z by duck
comment:1 Changed at 2010-04-03T23:00:44Z by davidsarah
comment:2 follow-up: ↓ 3 Changed at 2010-04-08T10:24:13Z by ioerror
It seems like a great idea to have a tub.anonymize flag. It would be fantastic to have Tahoe throw exceptions like confetti if that option is set and a few key (tub.address) settings aren't configured. Anything less may lead to PII leakage.
There are a number of sanitization issues that need to be carefully handled. Having a single bit to say we want to try to handle those issues is probably a good start.
comment:3 in reply to: ↑ 2 Changed at 2010-04-12T21:08:23Z by davidsarah
- Keywords test added
Replying to ioerror:
It seems like a great idea to have a tub.anonymize flag. It would be fantastic to have Tahoe throw exceptions like confetti if that option is set and a few key (tub.address) settings aren't configured. Anything less may lead to PII leakage.
There are a number of sanitization issues that need to be carefully handled. Having a single bit to say we want to try to handle those issues is probably a good start.
+1 for having a single config bit to indicate that the user wants anonymous operation.
However,
- "tub.*" options apply to foolscap tubs, and the single config bit may need to change behaviour other than in foolscap.
- rather than giving an error if the other relevant options aren't configured, this option can simply override them. It might give an error if the other option has been explicitly set in conflict with what is needed for anonymity.
comment:4 Changed at 2010-04-13T00:31:20Z by davidsarah
- Keywords docs added
- Milestone changed from undecided to 1.7.0
Also needs a doc patch.
comment:5 Changed at 2010-04-13T00:32:20Z by davidsarah
- Owner set to duck
comment:6 Changed at 2010-04-27T16:48:18Z by zooko
- Keywords review-needed removed
Unsetting review-needed since it needs a doc patch before going back to the review-needed status.
comment:7 follow-up: ↓ 8 Changed at 2010-04-28T03:35:13Z by warner
I think that "tub.location = " (setting it to an empty string) is even better. tub.location is the right thing to set here: everything sent to the introducer will derive from what the Tub concludes, and tub.location is the way to override that automatically-figure-out-my-own-addresses behavior.
Perhaps a "[node]anonymous" flag would be useful as a statement of policy, and implemented as a check in various places: if "anonymous=true", then we scan the tub's location just before sending it to the introducer, and throw an exception if it isn't empty? And if we add other places that reveal identifiable information in the future, we also guard those with the "if not anonymous" check?
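The policy check described in the paragraph above could be sketched roughly as follows. The names `AnonymityViolation` and `check_location_before_publish` are hypothetical and are not part of Tahoe-LAFS; only the idea (refuse to publish a non-empty location when `anonymous=true`) comes from this comment.

```python
class AnonymityViolation(Exception):
    """Hypothetical error: an anonymous node was about to reveal a location."""


def check_location_before_publish(location, anonymous):
    """Return the location to publish, or raise if the policy forbids it.

    `location` is the string the Tub concluded for itself; `anonymous`
    is the value of the proposed [node]anonymous flag.
    """
    if anonymous and location.strip():
        raise AnonymityViolation(
            "anonymous=true but tub location is not empty: %r" % (location,)
        )
    return location
```

Any future code path that reveals identifying information could call a guard of the same shape before sending anything over the wire.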
davidsarah: the motivation for including 127.0.0.1 in the list-of-addresses is to allow two nodes on the same machine to establish a fast (loopback) connection to each other. I use this all the time in test scenarios, and in grids in which the helper runs on the same node as something else (generally an introducer or a storage server). There's half an argument to remove it, but I think that most of those cases are handled better by having people publish an explicit tub.location that doesn't include it.
comment:8 in reply to: ↑ 7 Changed at 2010-04-28T16:28:33Z by davidsarah
Replying to warner:
davidsarah: the motivation for including 127.0.0.1 in the list-of-addresses is to allow two nodes on the same machine to establish a fast (loopback) connection to each other.
I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?
comment:9 Changed at 2010-06-17T04:25:54Z by zooko
- Milestone changed from 1.7.0 to eventually
comment:10 Changed at 2010-09-07T17:20:51Z by zooko
comment:11 Changed at 2010-10-06T17:50:44Z by zooko
- Owner changed from duck to warner
Brian: please answer David-Sarah's question from comment:8.
comment:12 Changed at 2010-10-22T14:50:27Z by zooko
- Keywords tor added
Adding keyword "tor" since the same issues probably apply to tor users as apply to i2p users. (Indeed, I suspect all "tor" and "i2p" tags should probably just be converted to "privacy" tags.)
comment:13 Changed at 2010-10-23T00:51:12Z by davidsarah
- Keywords anonymity added; i2p tor removed
comment:14 Changed at 2010-12-16T01:26:30Z by davidsarah
- Keywords anti-censorship added
comment:15 Changed at 2011-01-12T22:02:43Z by duck
Brian's suggestion from comment:7 has been taken; if "tub.location = " (an empty string), then the local address discovery is replaced by just using 127.0.0.1 as address; this replaces the previously suggested anonymize_local_addresses = true option.
While an anonymous policy flag as suggested by ioerror in comment:2 and by Brian in comment:7 would be a good idea, I consider that material for another ticket.
In addition to this, a unit test has been written. Once #1301 is implemented the trial patch decorator can be used instead.
comment:16 Changed at 2011-01-12T22:03:17Z by duck
- Keywords review-needed added
comment:17 Changed at 2011-06-26T15:47:26Z by zooko
This is the Ticket of the Week in Tahoe-LAFS Weekly News edition 4: http://tahoe-lafs.org/~zooko/TWN4.html
comment:18 Changed at 2011-06-26T15:52:08Z by gdt
If a node is only a client, then perhaps it should default to not disclosing addresses. (I realize that in theory a client with a public address could get incoming connections from a server without a routable address, but I consider servers without routable addresses to be buggy.)
comment:19 Changed at 2011-07-23T20:31:07Z by zooko
- Summary changed from Only use 127.0.0.1 as local address to use only 127.0.0.1 as local address
comment:20 Changed at 2011-07-23T21:08:39Z by zooko
- Milestone changed from eventually to 1.9.0
I'm putting this into the 1.9 Milestone because duck has done the work we asked of him and it would feel good to therefore include his patch in 1.9. (review-needed!)
comment:21 Changed at 2011-07-25T15:02:04Z by marlowe
- Keywords reviewed added; review-needed removed
Reviewed patch and looks good to me.
comment:22 follow-up: ↓ 23 Changed at 2011-07-31T20:00:13Z by warner
whoops, I *really* lost track of this one.
DS> I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?
Um, if everyone in the grid only publishes 127.0.0.1, then how will distant nodes ever connect to each other? Since we don't have UPnP or NAT traversal, we need everyone (well, N-1) to have+publish a public IP address.
This patch needs docs: in addition to making sure people can successfully use this feature, we need a place to answer the user confusion that's likely to occur when someone assumes that "tub.location=" should behave the same way as "#tub.location=". (I'm not as sure that ["tub.location=" == anonymous] is the best UI for this, at least not as sure as I was 15 months ago. The whole-config "anonymous" flag feels like an important addition.)
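To make the "tub.location=" versus "#tub.location=" distinction concrete, here is a minimal sketch using Python's standard `configparser` rather than Tahoe's own config code. The function name `describe_tub_location` and the returned labels are illustrative assumptions; the mapping of an empty value to loopback-only follows the patch under discussion in this ticket.

```python
import configparser


def describe_tub_location(cfg_text):
    """Classify the tub.location setting found in a tahoe.cfg-style string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # A commented-out line is simply absent from the parsed config.
    if not parser.has_option("node", "tub.location"):
        return "autodetect"
    value = parser.get("node", "tub.location").strip()
    # Present but empty: the patch here advertises only 127.0.0.1.
    if value == "":
        return "loopback-only"
    return "configured: " + value
```

A commented-out option and an option with an empty value look nearly identical in the file, yet they take different branches here, which is the source of the confusion this comment anticipates.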
gdt's observation in comment:18 is a good one. Ideally, pure-clients should be able to hang out behind NAT and not admit to having a real address. (I *think* the FURL location-hint format will tolerate this, but I haven't actually tested it). We've always been on the fence about whether Tahoe is a client-server system or a P2P system. Having clients announce their addresses makes it more P2Pish.
I've had topology problems (servers behind NAT) which made me glad that it's possible for servers to connect to clients too. To actually enable this, I had to make my "clients" pretend to be servers (but with storage.readonly=true): otherwise the servers wouldn't hear about the client and wouldn't try to connect. Nodes only actually publish their storage FURLs to the Introducer if they're configured as servers (see init_storage() in client.py). But they'll reveal their FURLs (along with their IP address) in any reference that passes over the wire.
So anyways, I'm ok with this patch if it includes a paragraph in docs/configuration.rst (in the section on tub.location) explaining what happens when you use "tub.location=" and why you might want to do that. If we find that it's hard to explain this feature in there, then maybe it's not a good feature to add.
comment:23 in reply to: ↑ 22 Changed at 2011-08-01T00:34:31Z by davidsarah
Replying to warner:
whoops, I *really* lost track of this one.
DS> I meant, what is lost by only having 127.0.0.1 in the list of addresses, all the time?
Um, if everyone in the grid only publishes 127.0.0.1, then how will distant nodes ever connect to each other? Since we don't have UPnP or NAT traversal, we need everyone (well, N-1) to have+publish a public IP address.
Yes, I realized that was a silly question (for nodes in general) but forgot to unask it. I suppose I was thinking only of clients, or more precisely non-servers.
gdt's suggestion of only including 127.0.0.1 in the list of addresses for non-servers makes sense to me, especially given this:
Nodes only actually publish their storage FURLs to the Introducer if they're configured as servers (see init_storage() in client.py).
comment:24 follow-up: ↓ 25 Changed at 2011-08-01T01:14:53Z by warner
actually in that case we might as well not advertise *any* addresses. That's a fairly clear hint that we don't want other people trying to contact us :)
comment:25 in reply to: ↑ 24 Changed at 2011-08-01T04:32:08Z by davidsarah
Changed at 2011-08-11T01:50:03Z by davidsarah
docs/configuration.rst: document 'tub.location =' for hiding local IP addresses. refs #1010
Changed at 2011-08-11T02:16:48Z by davidsarah
node.py: implement 'tub.location =' for hiding local IP addresses. fixes #1010
Changed at 2011-08-11T02:23:29Z by davidsarah
test_node.py: test that 'tub.location =' hides local IP addresses. This version unpatches on synchronous exceptions, and uses fileutil.write. refs #1010
comment:26 Changed at 2011-08-11T02:28:45Z by davidsarah
- Keywords review-needed added; test reviewed removed
Let's kick the question of what addresses the client should advertise by default out to the next release. I don't think that this patch conflicts with any decision we would be likely to make about that. It also doesn't conflict with adding a whole-config anonymous flag.
I've recorded the changes as darcs patches, added some docs, and made a couple of minor improvements to the test (see its description). The fix itself hasn't changed and is already reviewed.
comment:27 Changed at 2011-08-11T03:31:50Z by Zarutian
Reviewed 1010-docs.darcs.patch and found no glaring errors. Review still needed for test-1010.darcs.patch.
comment:28 Changed at 2011-08-11T18:34:05Z by zooko
Review:
I haven't thought through davidsarah's assertion in comment:26 that this patch won't make it harder to do what I want (clients listen for connections from servers by default, and anonymous-mode is an explicit flag instead of setting location='') in the future, but I'll take their word for it.
I find it very confusing that location='' in the tahoe.cfg file means "Emit only 127.0.0.1" but location='' in src/allmydata/node.py's _setup_tub() means "Discover all local IP addresses and emit them.". This is on top of my slight confusion about the fact that location=None in tahoe.cfg has a different meaning from location=''. Or wait -- does it? Is one of them the same as not having a location entry at all?
Off I go to search for answers in the docs in attachment:cleanup-1010.dpatch. But the fact that I experience this much confusion after glancing at a few lines of the code and the config file is a bad sign.
comment:29 Changed at 2011-08-11T19:06:03Z by zooko
- Keywords review-needed removed
Okay, I've started reading the docs patch from attachment:cleanup-1010.dpatch:
It helps to allay my confusion because it explicitly says "Note that this is not the same as omitting tub.location.". However, it doesn't help all the way: what does it mean if you omit tub.location? (The answer, I believe, is that it discovers your local IP addresses and advertises them.) Does it mean anything if you say tub.location=None, or is that an error?
I think there are four different use cases here, three of which are currently supported, and our docs should be more explicit about enumerating them.
- Discover your local IP addresses and announce them.
- Don't announce any routable IP address.
- Announce a configured IP address instead of the dynamically discovered one(s).
- [not supported yet] Announce a configured IP address in addition to the dynamically discovered one(s).
Also: is there a valid distinction between
- 2.a. Don't announce any routable IP address, but announce unrouteable (LAN-scoped) IP addresses. Then nodes from out on the Internet cannot open connections to you, but nodes on your LAN can. Also, people on your LAN can use this to confirm your identity as a certain Tahoe-LAFS node, even if nodes on the Internet can't. (Is this true? If you have, for example, configured all of your outgoing connections to go through a Tor or I2P proxy, but you used this setting, then nodes within your LAN can open a direct TCP connection to you, do foolscap negotiation with you, and thus learn the mapping between your LAN-internal IP address and your foolscap node ID.)
- 2.b. Don't announce any IP address except 127.0.0.1, meaning that nodes off your own host can't open TCP connections to you but people on your own host can. Again, does this mean you're vulnerable to an identity-revealing attack, even if you use Tor, if the attacker can open TCP connections from your own host? Is this a valid attack? It seems like it might be. What if there is some sort of TCP proxy running on your host so that, even though they can't execute arbitrary code on your host, they can open a TCP connection which looks to you as though it comes from your own host?
- 2.c. Don't announce any IP address.
Now this patch currently lets the user express option 2 by setting tub.location=, option 1 by setting no tub.location, and option 3 by setting tub.location=207.7.145.194. I'm -1 on this design:
- I find it confusing.
- The docs here mix this issue with issue #1086, by saying that advertising an unrouteable IP puts you in "client only mode". In my opinion (and I recognize that gdt, at least, disagrees), whether your tahoe node initiates or accepts TCP connections should be independent of whether it acts as a storage server, storage client, or both. (Or introducer or helper.)
- I'm not sure if it is sufficiently safe to protect the identities (by which I mean IP-address-to-node-ID mappings) of Tor or I2P users. Perhaps we need to support option 2.c., above, instead.
- I would prefer a design with an explicit node.anonymous = True, as we discussed above.
- I would also prefer, I think, a design with option 2 being implemented by setting tub.location=127.0.0.1.
We discussed one of those alternative designs above, and we said maybe we can change our minds later, but this may be a mistake because
- configuration file semantics are major backward-compatibility issues.
Do we have a plan for how to provide backwards compatibility if we were to change to a different design in a future release? I guess we might need to have a phase where the tahoe node understood both old and new configuration formats, stopped with a fatal error if they were inconsistent, and emitted a warning if the old one existed at all. Then eventually we might go through another cycle like the one we just finished with #1385 where we stop with a fatal error if the old style is present. (By the way, #1385 turned out to be a lot more painful of a patch to integrate and debug than I had anticipated.)
I'm willing to listen to counter-arguments, but at the moment I'm -1 on this design. I haven't finished reviewing the actual patches in attachment:cleanup-1010.dpatch, but I'm going to stop here and focus on #393 instead. I'm sorry I didn't think about the backward-compatibility issues earlier and that I didn't think about the specific configuration format earlier so I would realize I was uncomfortable with it before this late stage. (Also it is too bad I didn't think of the potential identity-revealing weakness earlier so we could have time to think it through before this late stage.)
comment:30 Changed at 2011-08-11T19:08:38Z by zooko
comment:31 Changed at 2011-08-11T19:13:49Z by zooko
#1207 is a closely related ticket which shows that some people (starting with gdt) want to have yet another variation, where unrouteable IP addresses are excluded from the advertised list.
comment:32 Changed at 2011-08-12T01:22:48Z by davidsarah
- Keywords forward-compatibility added
It seems as though you can already achieve exactly the effect of the current patch with "tub.location = 127.0.0.1" -- or a similar effect with "tub.location = unreachable.example.org:0" as the documentation suggests.
So on reflection, I'm also -1 on including this in v1.9.
(The cleanup patch also has the improvement of not calling iputil.get_local_addresses_async() if its value is going to be discarded, but that's not urgent.)
comment:33 Changed at 2011-08-15T03:06:25Z by davidsarah
- Milestone changed from 1.9.0 to 1.10.0
comment:34 Changed at 2013-08-07T16:48:46Z by killyourtv
- Cc killyourtv@… added
- Description modified (diff)
I added an updated patch against trunk which uses Brian's last patch.
comment:35 Changed at 2013-08-21T15:44:17Z by psi
- Keywords i2p-collab added
comment:36 Changed at 2013-08-21T16:05:32Z by psi
In the dev meeting zooko suggested an anonymize flag instead of a blank tub.location. The anonymize flag would ensure that sensitive information like IP addresses is not broadcast. A blank tub.location value would be confused with an auto-detected location. Perhaps a keyword like "auto" could be used to indicate an auto-configured address, with that behavior being the default.
comment:37 Changed at 2013-08-23T21:21:52Z by zooko
I just reviewed attachment:1010-use-only-127.patch (during Weekly Dev Chat).
Thank you for updating this patch to apply to the current trunk! The patch makes sense and is usefully addressing this issue. However, we talked it over at our recent Weekly Dev Chat (notes), and have a few requirements for safety of the configuration:
- Let's add a [node]anonymize flag to the tahoe.cfg file. The meaning of this flag is: stop the process and print an error message if any of the configuration options would compromise my identity. There are also probably going to be other meanings of this flag added in other patches (i.e., this flag will probably come to mean also: do not allow any outgoing connections that are not over an anonymous routing layer such as Tor or I2P).
- Instead of "tub.location=" (the empty string) meaning to not advertise any location, let tub.location=UNREACHABLE mean that. (This is in order to avoid confusion in the mind of the user about the distinction between tub.location being absent versus it being present with an empty value. See also below, about backward compatibility.)
- If tub.location=UNREACHABLE, then pass the special hardcoded value unreachable.example.org:0 to foolscap instead of the empty string. (This is because foolscap currently can't handle the empty string for its connection hints; see http://foolscap.lothar.com/trac/ticket/208 .)
- Instead of expressing that the node's IP address should be auto-detected by the absence of tub.location, express it by tub.location being set to AUTODETECT.
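The keyword handling proposed in the list above can be sketched as a small translation step. This is an illustration under stated assumptions, not the eventual implementation: `location_for_foolscap` and the `autodetect` callback are hypothetical names, and treating an absent setting (`None`) like AUTODETECT reflects the backward-compatibility rule given later in this comment.

```python
# Placeholder hint taken from the proposal above.
UNREACHABLE_HINT = "unreachable.example.org:0"


def location_for_foolscap(tub_location, autodetect):
    """Map a tub.location setting to the hint string handed to foolscap.

    `tub_location` is None when the option is absent from tahoe.cfg.
    `autodetect` is a caller-supplied function returning a hint string;
    it stands in for Tahoe's real address discovery.
    """
    if tub_location is None or tub_location == "AUTODETECT":
        return autodetect()
    if tub_location == "UNREACHABLE":
        return UNREACHABLE_HINT
    # Anything else is taken as an explicit set of connection hints.
    return tub_location
```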
Note that there is a third option besides AUTODETECT and UNREACHABLE, and that is to set tub.location to a specific set of IP address+port, DNS name+port, I2P addresses, or Tor (.onion) addresses. I don't know if Tor or I2P users would always do the latter, or if they would sometimes set it to UNREACHABLE.
Therefore, if [node]anonymize is set to True, then:
- If there is no tub.location setting (including if tub.location is commented-out), the node will abort on startup. (This is important because people who created their node with an older release of Tahoe-LAFS will have a tahoe.cfg with tub.location commented out. See below about backward-compatibility.)
- If tub.location is set to AUTODETECT, the node will abort on startup with an error message.
- If tub.location is set to a specific connection-hints value which includes an IP address or domain name, then the node will abort on startup with an error message.
- If tub.location is set to UNREACHABLE, the node will start up normally.
- If tub.location is set to a specific connection-hints value which contains only I2P and/or Tor (.onion) addresses, the node will start up normally.
- Newly generated tahoe.cfg files (generated by the create-client or create-node command) should come with tub.location = AUTODETECT instead of a commented-out "#tub.location = put your IP address here" (see create_node.py).
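The startup rules listed above for [node]anonymize can be read as a single validation function. The sketch below is one possible reading, with several assumptions: the names `is_anonymous_hint` and `check_anonymize` are hypothetical, hints are assumed to be comma-separated "host:port" strings, and a host is treated as anonymous only if it ends in ".onion" or ".i2p". The rules above do not cover an empty tub.location, so the sketch rejects it conservatively.

```python
def is_anonymous_hint(hint):
    """Assumption: a hint is anonymous iff its host ends in .onion or .i2p."""
    host = hint.rsplit(":", 1)[0].strip().lower()
    return host.endswith(".onion") or host.endswith(".i2p")


def check_anonymize(tub_location, anonymize):
    """Raise ValueError if the settings would compromise the node's identity.

    `tub_location` is None when the option is absent or commented out.
    """
    if not anonymize:
        return
    if tub_location is None:
        raise ValueError("anonymize requires an explicit tub.location")
    if tub_location == "AUTODETECT":
        raise ValueError("anonymize forbids tub.location = AUTODETECT")
    if tub_location == "UNREACHABLE":
        return
    hints = [h.strip() for h in tub_location.split(",") if h.strip()]
    if not hints:
        # Not covered by the rules above; rejected here as an assumption.
        raise ValueError("tub.location is empty")
    revealing = [h for h in hints if not is_anonymous_hint(h)]
    if revealing:
        raise ValueError("non-anonymous connection hints: %s" % ", ".join(revealing))
```

A node would call this once at startup and abort with the error message if it raises.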
Okay, now what about backward-compatibility?
- For backwards compatibility, we still accept the absence of tub.location as meaning AUTODETECT. But only if the [node]anonymize flag isn't on, because the [node]anonymize flag makes a setting for tub.location required.
- Maybe in a future release we'll start emitting a warning about the absence of a tub.location setting, but for now, no warning.
comment:38 Changed at 2013-08-26T21:05:21Z by daira
I agree with most of this design, but I'm unconvinced of the value of requiring an explicit tub.location = AUTODETECT, rather than keeping that as the default as it is now. The [node] anonymize flag would still disallow auto-detection, that's independent of whether auto-detection is the default.
comment:39 Changed at 2013-08-26T21:09:39Z by daira
Note that if we keep auto-detection as the default, we can still change the comment that is added to a new tahoe.cfg to something like
#tub.location = auto-detected by default (this is unsuitable for anonymous deployments)
comment:40 Changed at 2013-08-27T11:56:25Z by zooko
Thanks for the design-review, daira. I still want to eventually switch to spelling this as tub.location = AUTODETECT, even if it isn't necessary to do so, because:
- Explicit is better than implicit.
- I'm not really that comfortable with the autodetect feature anyway (see the 'iputil' tickets) and would like to support users who don't use it or for whom it doesn't work.
- Users could easily confuse tub.location = for meaning "set my tub location to the empty string".
So for those reasons, I'd prefer to move ahead with making an explicit AUTODETECT be the preferred way to indicate this configuration (as in comment:37).
Oh, I see that my specification in comment:37 omits a case: what if tub.location is set to the empty string? I propose that this is treated as a configuration error (the node stops at startup with a verbose error message about this), regardless of the setting of [node]anonymize.
comment:41 Changed at 2013-10-04T17:10:56Z by zooko
comment:42 Changed at 2014-01-14T17:47:57Z by zooko
- Keywords i2p tor added
comment:43 Changed at 2014-01-14T18:19:57Z by gdt
Note that in the modern world, "only use 127.0.0.1" should really be "only use 127.0.0.1 and ::1". I will refrain from updating the ticket-title, but IMHO we should purge v4-only statements.
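A check that covers both address families can lean on Python's standard `ipaddress` module instead of comparing against the literal "127.0.0.1". This is only a sketch; the helper name `is_loopback` is illustrative and not taken from Tahoe's iputil.

```python
import ipaddress

# The two loopback addresses named in the comment above.
LOOPBACK_ADDRESSES = ["127.0.0.1", "::1"]


def is_loopback(addr):
    """Return True for IPv4 or IPv6 loopback addresses."""
    return ipaddress.ip_address(addr).is_loopback
```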
comment:44 Changed at 2014-09-04T03:48:48Z by dawuud
Greetings,
My branch implements Zooko's design in comment:37: https://github.com/david415/tahoe-lafs/tree/david-ticket1010-unittests
Please let me know what else I can do to get this trac ticket resolved.
comment:45 Changed at 2014-09-04T04:33:50Z by str4d
For reference, the relevant commits are here: https://github.com/david415/tahoe-lafs/compare/david-truckee-venv...david-ticket1010-unittests
There is also an earlier commit in the same branch which drafts a Tor-only mode, meaning that the branch can't be directly merged to close this ticket: https://github.com/david415/tahoe-lafs/commit/d9757d75aebe675ca6114d63673ac597e1198084
comment:46 Changed at 2014-09-04T05:28:30Z by str4d
Reviewed the commits for this ticket (from the link I posted above).
- In comment:37 Zooko suggested [node]anonymize for the flag. The commits implement [node]tub.anonymize instead. I don't know what the Tahoe and Foolscap conventions are, but I expect that [node]tub.* are intended to be passed through to Foolscap, and this flag will not be.
- In check_anonymity_config():
- Either the common error messages should be merged, or the error messages for each type of error should be more specific.
- The code for parsing / generating endpoint strings (used here and elsewhere) should probably be centralized to reduce scope for bugs.
- Lines 357 (in _startService()) and 429 (in _setup_tub()) confuse me. Why is tub.location being set to empty? Is this being written to the config file?
- There are no changes yet to new tahoe.cfg files.
comment:47 Changed at 2014-09-04T23:36:00Z by zooko
- Description modified (diff)
- Summary changed from use only 127.0.0.1 as local address to anonymous client mode
comment:48 Changed at 2014-09-05T23:22:02Z by dawuud
OK... I've cleaned up my code here: https://github.com/david415/tahoe-lafs/tree/david-ticket1010-unittests
Additionally, this latest change uses my Foolscap branch from Foolscap trac ticket 208: http://foolscap.lothar.com/trac/ticket/208
This ticket now requires code review. Thanks!
comment:49 Changed at 2014-09-06T11:11:21Z by str4d
Reviewed :)
src/allmydata/node.py:
- Line 72 can be removed (left over after creating anonymize.py).
- is_err in check_anonymity_config() can be replaced with set unions (joining the expressions with or-s).
- Line 236 (tubport = ...) can be moved inside the self.anonymize check.
- Is a separate _unreachable_tub() necessary? AFAICT removing lines 356-359 and changing line 360 to if location != "UNREACHABLE" and not self.anonymize: would have the same effect (apart from the log line, which could be checked inside _setup_tub() instead). This is probably a coding style decision, I will defer to Zooko et al. on this.
src/allmydata/util/anonymize.py:
- is_anonymous() still contains code to split a location hint into parts, which is what IMHO should be centralized (outside of anonymize.py). I haven't hunted for where else location hints are split up currently, but per the anonymity roadmap there will be more parts in Tahoe that will need to do so (e.g. per-config client endpoint string parameters).
Overall IMHO this is looking very nice. It can't be directly merged to close this ticket because of the Tor-only content, and I'm not sure whether it can even be cherry-picked (I haven't checked which commits do what).
comment:50 Changed at 2015-02-06T20:38:04Z by daira
- Keywords tor-protocol added; tor removed
comment:51 Changed at 2015-04-14T07:33:50Z by daira
I would prefer [node]anonymous rather than [node]anonymize, because it has the same spelling in U.S. and British English. Also I think it more clearly conveys that this option is asserting that other options are compatible with anonymity, not changing the behaviour itself.
comment:52 Changed at 2015-06-14T00:31:29Z by warner
FYI, http://foolscap.lothar.com/trac/ticket/236 is a plan that dawuud and I came up with for making Foolscap handle Tor/i2p Listeners and connection-hints cleanly. I need to re-read this ticket and see how/if it interacts with the changes we propose over there.
comment:53 Changed at 2015-09-06T04:34:42Z by str4d
I have cherry-picked the #1010 changes out of dawuud's branch and rebased them onto master:
https://github.com/str4d/tahoe-lafs/tree/1010-anonymous-client-mode
I have intentionally left out commit e03ac001387f8341240e730cd918027c2d111b7d because Foolscap #208 is still undecided.
Other changes:
- I have included relevant changes from my review in comment:49.
- #754 introduced AUTO as a flag in tub.location, basically implementing use case 4 from comment:29. Therefore I have replaced tub.location == AUTODETECT checks with AUTO in tub.location.
- [node]anonymize has been changed to [node]anonymous per comment:51.
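As a rough illustration of use case 4 (configured hints in addition to discovered ones), the sketch below expands an AUTO entry inside a comma-separated tub.location. The function names and the comma-splitting are assumptions made for this example and are not copied from the #754 implementation.

```python
def uses_auto(tub_location):
    """True if any comma-separated entry of tub.location is exactly AUTO."""
    return "AUTO" in [entry.strip() for entry in tub_location.split(",")]


def expand_location(tub_location, detected_addresses, port):
    """Replace each AUTO entry with host:port hints for the detected addresses."""
    hints = []
    for entry in tub_location.split(","):
        entry = entry.strip()
        if entry == "AUTO":
            hints.extend("%s:%d" % (addr, port) for addr in detected_addresses)
        elif entry:
            hints.append(entry)
    return ",".join(hints)
```

An anonymity check in this style would refuse to start whenever `uses_auto()` is true, since AUTO implies advertising discovered local addresses.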
comment:54 Changed at 2015-09-08T19:15:31Z by warner
In today's meeting, while sketching out the tor-socks-proxy syntax (ticket:517#comment:34), we talked briefly about how the Accounting "client ID" would interact with anonymity. In particular, without other changes, Accounting-enabled Tahoe clients will use the same ed25519 key for all connections. There are (at least) three things that might be linkable, and using Tor/I2P will only remove one of them:
- client IP address
- accounting client-id
- which shares are being accessed
We could change the accounting system to provide a way to use random client-ids (or none at all), but then storage servers can't enforce any of their Accounting things. And even with that, clients who always start at the same rootcap will be linkable by the storage server observing repeated accesses to the same storage index.
The upshot is that now I'm wondering if a tahoe.cfg flag named anonymous might be better spelled pseudonymous, or ip-anonymous, or network-anonymous, or something that makes a slightly weaker claim. Or, should we say that anonymous=true also requires that you've added some other (as-yet undefined) tahoe.cfg flag which disables Accounting client-ids?
comment:55 Changed at 2015-09-15T20:39:38Z by str4d
For more terminology discussion, see this tweet by Zooko, and this one, and their replies.
IMHO random Accounting client-ids are not necessary. From the I2P perspective, a Tahoe client is going to have a visible known I2P Destination at least for the duration of the process (by default client sessions are transient), and so an Accounting client-id that persists as long as the I2P Destination does is within the existing threat model. For a Tahoe client that is also a server, this will be long-term; for client-only nodes, this will be until restart (unless code was added to intentionally cycle Destinations on a shorter timescale). The Tor side will have similar considerations. I don't see a use case that requires a Tahoe client's I2P activity and Tor activity to be unlinked.
Another consideration raised by this: we need to ensure that if the Tahoe configuration is changed, all identifiers are cycled (as best as possible). That means, if a user is running a "non-anonymous" client, and subsequently configures it to be "anonymous", then Tahoe needs to regenerate:
- The I2P Destination
- The Tor HS key
- The Accounting client-id(s)
The shares being accessed is a harder identifier to prevent leaking across, and I'm not sure how it could be done within the Tahoe-LAFS operational model. For comparison: the BitTorrent client Vuze has an I2P plugin, and internally it runs two I2P Destinations - one for "pure" I2P traffic, and one for "mixed" I2P traffic (for any torrents being seeded simultaneously to/from I2P and the clearnet). It explicitly warns the user that once a torrent has been added with a given privacy setting, it should not be changed, because the already-fetched-blocks are a unique fingerprint for incomplete torrents. To change, they recommend deleting and re-adding the torrent. It's not something they can defend against, but they try to at least give the user plenty of warning.
comment:56 Changed at 2015-10-14T04:28:47Z by zooko
- Milestone changed from soon to 1.10.3
comment:57 Changed at 2016-02-02T19:12:12Z by daira
- Milestone changed from 1.10.3 to 1.11.0
We agreed in today's Nuts & Bolts meeting to bump better Tor/I2P support out to 1.11.0.
comment:58 Changed at 2016-03-22T05:02:52Z by warner
- Milestone changed from 1.11.0 to 1.12.0
Milestone renamed
comment:59 Changed at 2016-03-27T10:44:18Z by dawuud
We should release some basic features as described in this ticket soon so that we can complete Phase 1 of native Tor integration for Tahoe-LAFS as described in warner's roadmap in a comment here: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/517#comment:41
It seems like progress on this ticket has stalled. What can I do about it? I don't think we need to implement accounting first. We just need some basic features, and I think it should be really simple to have an anonymize=true option: it should raise an error if another option conflicts with the policy, and it should turn off autodetect. Anything else? str4d rebased my old patch. Perhaps I need to change it to use the feature from foolscap trac ticket 208, now that that ticket is closed: https://foolscap.lothar.com/trac/ticket/208
comment:60 Changed at 2016-03-28T01:20:07Z by dawuud
Here is the latest branch, with upstream/master merged in:
https://github.com/david415/tahoe-lafs/tree/1010-anonymous-client-mode
It seems like we need to fix at least two things in the above changeset:
- tub.port = UNREACHABLE isn't correct. Currently "tub.port=" (an empty value) is how you express "no listening".
- the is_anonymous helper function seems to not account for storage server operators who specify their listening port via a TCP endpoint instead of tor/i2p. Referring to code here:
Question: Should is_anonymous only check for "tub.port=AUTO" condition?
We should not assume that tor will be listening on loopback. As various virtualization environments are becoming more popular tor might well be listening on a bridge interface or something like that.
comment:61 Changed at 2016-03-31T09:03:21Z by dawuud
In various conversations and tickets Leif and Zooko both point out that our language is misleading because in this case the word "anonymous" doesn't mean unlinkable identity from the storage server's perspective at all... but merely means that our origin IP is hidden via the network transport. This is arguably not anonymity at all. It is important to make this distinction I think.
Question: Do any of you have suggestions for how to make this explicitly clear to the user? Should we change the name of the configuration option to something else?
comment:62 Changed at 2016-03-31T18:04:02Z by marlowe
How about mask-origin-ip?
comment:63 Changed at 2016-06-28T18:20:37Z by warner
- Milestone changed from 1.12.0 to 1.13.0
moving most tickets from 1.12 to 1.13 so we can release 1.12 with magic-folders
comment:64 Changed at 2016-08-30T07:00:11Z by warner
- Milestone changed from 1.13.0 to 1.12.0
I think we're ready to add this flag, and then a tahoe create-client CLI argument to turn it on from the very beginning. So we need to make some decisions. I'm going to propose the following; please let me know what you think.
- tahoe create-client --anonymous or tahoe create-node --anonymous causes [node] anonymous = true to be written to tahoe.cfg
- when [node] anonymous = true, any of the following problems will cause tahoe start to throw an exception before any network traffic has occurred:
- [node] tub.location = contains any tcp: hints
- [node] tub.location = is empty or missing, since that means AUTO, which means a tcp: hint with automatically-detected addresses
- [connections] lacks a tcp = tor line, since otherwise introducer and server connections could use raw TCP connections
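The three startup checks proposed above could be sketched roughly as follows. This is an illustration only, not the code that landed: the function name check_anonymous_config and the PrivacyError exception are hypothetical, the config is read with the standard library's configparser rather than Tahoe's own config helpers, and hints are assumed to be comma-separated with an explicit type prefix (legacy host:port hints are not handled).

```python
from configparser import ConfigParser


class PrivacyError(Exception):
    """Hypothetical exception: the config could reveal the node's IP address."""


def check_anonymous_config(cfg_text):
    """Raise PrivacyError if [node] anonymous = true conflicts with other settings."""
    cfg = ConfigParser()
    cfg.read_string(cfg_text)

    # Nothing to enforce unless the flag is explicitly turned on.
    if not cfg.getboolean("node", "anonymous", fallback=False):
        return

    # Check 2: an empty or missing tub.location means AUTO (auto-detected tcp: hints).
    location = cfg.get("node", "tub.location", fallback="").strip()
    if not location:
        raise PrivacyError("tub.location is empty or missing, which means AUTO")

    # Check 1: no tcp: hints may be advertised.
    hints = [hint.strip() for hint in location.split(",")]
    if any(hint.startswith("tcp:") for hint in hints):
        raise PrivacyError("tub.location contains a tcp: hint")

    # Check 3: outbound tcp connections must be routed through Tor.
    if cfg.get("connections", "tcp", fallback="").strip().lower() != "tor":
        raise PrivacyError("[connections] lacks a 'tcp = tor' line")
```

Running all three checks before the node opens any socket is what lets startup fail before any network traffic has occurred.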
There are a few other things we might consider adding, but I'm inclined to not include them:
- require all tub.location hostnames (for any type of hint) to end in .onion or .i2p
- require tub.socks_port to point at a local host (maybe limit it to 127.0.0.1 and localhost, or maybe to any RFC1918 address)
- if [storage] enabled = false and [helper] enabled = false (i.e. we're a pure client), then require tub.port= (empty), to forbid the main tub from listening at all
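For reference, the three optional checks above might look something like this if they were ever added. All function names here are hypothetical, and the sketch assumes the hostname, SOCKS host, and tub.port value have already been extracted from tahoe.cfg as plain strings.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def hostname_is_hidden_service(hostname):
    """True if a tub.location hostname ends in .onion or .i2p."""
    return hostname.lower().endswith((".onion", ".i2p"))


def socks_host_is_local(host):
    """True for localhost, a loopback address, or an RFC 1918 address."""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Any other hostname would need a DNS lookup, so reject it.
        return False
    return addr.is_loopback or any(addr in net for net in RFC1918_NETWORKS)


def pure_client_listens(storage_enabled, helper_enabled, tub_port):
    """True if a pure client (no storage, no helper) has a non-empty tub.port."""
    is_pure_client = not storage_enabled and not helper_enabled
    return is_pure_client and tub_port.strip() != ""
```

The RFC 1918 ranges are listed explicitly because ipaddress's is_private property covers more than those three networks.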
I'm tentatively pulling this into the 1.12 milestone, because I think we're close, and it'd be awesome to include proper (client-side) Tor/I2P support, and I think this flag is a necessary part of that.
comment:65 Changed at 2016-08-30T19:19:01Z by warner
From today's devchat, folks seemed ok with my proposal, and with omitting the other three items (constraints on tub.location hostnames, tub.socks_port, and forbidding pure-clients from listening).
However Zooko (and others) pointed out that "anonymous" is not the best name for this flag (it's inaccurate, imprecise, and carries negative connotations for a lot of folks outside our community). private, or private-ip seems better:
- Tor/I2P can protect your IP address
- servers can still figure out they're dealing with the same client as last time (e.g. you always start by fetching the same rootcap)
- the IntroducerClient will use a persistent TubID, so the Introducer (server) knows they're seeing the same client as last time
- when we add Accounting, the client's accounting pubkey will (probably) be persistent, making it immediately obvious to servers that they're dealing with the same client as last time
Switching to a term that makes it clear that we're specifically protecting the IP address means that we don't need to include #2384 in its scope (randomized TubIDs).
Do people prefer private = true, or private-ip = true ? Or something else?
comment:66 Changed at 2016-08-31T09:04:16Z by warner
My pal George Tankersley suggested a great idea for this today:
[node] reveal-IP-address = false
(the default value of reveal-IP-address= is True, when left unspecified)
That is specific (Tor/I2P are only about not revealing your IP address), non-negatively-connotative, and encourages the obvious constructive question of "why the heck isn't false the default?" (which then begins the conversation about performance consequences of Tor/I2P connections and the additional install/run-time dependencies).
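The default-to-True behaviour can be illustrated with a small sketch. It uses the standard library's configparser rather than Tahoe's own config-reading code, and the helper name reveals_ip_address is made up for this example.

```python
from configparser import ConfigParser


def reveals_ip_address(cfg_text):
    """Return the [node] reveal-IP-address setting, defaulting to True."""
    cfg = ConfigParser()
    cfg.read_string(cfg_text)
    # fallback=True covers both a missing option and a missing [node] section.
    return cfg.getboolean("node", "reveal-IP-address", fallback=True)
```

With this reading, only an explicit reveal-IP-address = false line opts the node into the stricter checks; every existing tahoe.cfg keeps its current behaviour.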
comment:67 Changed at 2016-08-31T09:51:30Z by warner
https://github.com/tahoe-lafs/tahoe-lafs/pull/326 adds this safety flag, and docs/tests.
If the syntax is ok with everyone, the next step will be to add a tahoe create-client/create-node CLI argument which sets this flag. Maybe --no-reveal-IP-address? Or --reveal-IP-address=false? --hide-IP-address?
comment:68 Changed at 2016-08-31T19:32:00Z by Brian Warner <warner@…>
In d47fc0f/trunk:
comment:69 Changed at 2016-08-31T22:20:23Z by warner
I think we're converging on --hide-ip. It's not as provocative as --no-reveal-ip or --reveal-ip=false, but I think it's simpler, and just as accurate.
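As a rough picture of how a --hide-ip switch could map onto the config flag, here is a sketch using argparse. This is an assumption for illustration: Tahoe's real CLI does not use argparse, and render_node_section is a hypothetical helper, not part of the patch.

```python
import argparse


def render_node_section(argv):
    """Return the [node] lines a create-client style command might write."""
    parser = argparse.ArgumentParser(prog="create-client-sketch")
    parser.add_argument(
        "--hide-ip",
        action="store_true",
        help="write reveal-IP-address = false into the generated config",
    )
    args = parser.parse_args(argv)
    # --hide-ip is the positive spelling of reveal-IP-address = false.
    reveal = "false" if args.hide_ip else "true"
    return "[node]\nreveal-IP-address = %s\n" % reveal
```

For example, render_node_section(["--hide-ip"]) yields a [node] section containing reveal-IP-address = false.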
comment:70 Changed at 2016-09-02T06:26:21Z by warner
--hide-ip patch in https://github.com/tahoe-lafs/tahoe-lafs/pull/330
comment:71 Changed at 2016-09-02T17:04:35Z by Brian Warner <warner@…>
In d0da17a/trunk:
comment:72 Changed at 2016-09-02T23:56:50Z by warner
- Resolution set to fixed
- Status changed from new to closed
Ok, with the landing of --hide-ip, I think we can close this one: we've implemented pretty much everything we've talked about in this ticket.
comment:73 Changed at 2016-10-09T06:11:26Z by Brian Warner <warner@…>
In 5a195e2/trunk:
What is the motivation for using all local addresses normally?