removing IP-address autodetection, Tor integration

Thu Jun 18 19:31:16 UTC 2015

So Tahoe's current default, when you create a client+server node with
"tahoe create-node", is to:

* allocate an unused TCP listening port number
* run iputil (or ifconfig/etc) to figure out all your IP addresses
* build ADDR:PORT foolscap "connection hints" for all of them
* concatenate the hints to come up with the Tub location (but allow
  tahoe.cfg "tub.location" to override this)
* use the location to generate the storage-server FURL
* advertise the storage-server's FURL through the Introducer
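
As a rough illustration of the first few steps above (this is not the
actual Tahoe code), here is a minimal Python sketch. The function names
and the example address list are made up; the real node discovers its
addresses with iputil rather than taking them as an argument.

```python
import socket

def allocate_tcp_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
    finally:
        s.close()

def build_location(addresses, port):
    """Join ADDR:PORT hints into one comma-separated location string."""
    return ",".join("%s:%d" % (addr, port) for addr in addresses)

port = allocate_tcp_port()
# Hypothetical address list, standing in for the autodetected one.
print(build_location(["127.0.0.1", "192.168.1.5"], port))
```

On a typical NATed home machine every hint produced this way is either
loopback or a private address, which is why the hints are of so little
use to remote clients.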

Zooko argues that this is not helping most people [1], and causes a
bunch of problems [2], so he advocates removing the autodetection
feature. The argument is that most server operators 1) know what their
hostname is and 2) don't mind (and in fact prefer) configuring it
manually, and that 3) the automatic hints are pretty much useless.

I agree that autodetection only provides value when:

1: you have a public IP address, or
2: you're running a local test grid that uses 127.0.0.1 exclusively, or
3: you're running unit tests

and even in case #1 it's common to edit the location ("tub.location" in
tahoe.cfg) to use HOSTNAME:PORT instead of IP addresses, and to remove
the generally-useless 127.0.0.1 hint. Cases 2 and 3 are not end-user
facing, and could be handled differently (by configuring the nodes to
only advertise 127.0.0.1 hints).

If you don't have a public IP address, but have configured
port-forwarding to allow access from the outside world anyways, you'd
tell the server to listen on one port, but advertise something different
to the Introducer.

My hope is that we'll eventually get to a peer-to-peer architecture
where you can run a storage server without knowing about NAT or what an
IP address is, but (as Zooko points out) today's autodetection feature
is not really a stepping stone towards that world.

Overlapping with this is a recent plan to make it easier to have your
Tahoe server listen on a Tor "hidden service", and to make sure all your
outbound connections are running through Tor too. With hidden services,
you *can* run a storage server without knowing about IP addresses,
because they effectively bypass NAT and similar obstacles (it's not
fast, but there are fast non-Tor things that work this way without
providing anonymity). In this mode, we want to autodetect the .onion
address that we'll be listening on, rather than the IP address (which we
want to hide anyways).

So our plan is to remove the autodetection logic, and make it possible
to tell Tahoe to tell Foolscap to listen on a generalized "listening
specification string", and to have a collection of Foolscap plugins that
can handle Tor, i2p, and other protocols (including a funky IPFS-based
thing which I think might get us closer to that ideal p2p world). Tahoe
would be agnostic about these protocols: it would just pass the strings
through to Foolscap without looking at them.

Here are some examples of creating a Tahoe storage server with various
configurations:

* tahoe create-node --listen tcp:0:hostname=example.com

 This would allocate a TCP port, then advertise "example.com:PORT".

* tahoe create-node --listen tcp:8091:hostname=example.com

 This would listen on a fixed TCP port, advertising "example.com:8091".
 We'd probably recommend something like this in the docs.

* tahoe create-node --listen tcp:8091 --advertise tcp:example.com:5008

 This would override the advertisement, useful when you've externally
 configured a port forwarding through your firewall. "example.com" is
 the external address, 5008 is the external listening port, and 8091 is
 the local port to which those connections will be forwarded.

* tahoe create-node --listen tcp:8091:interface=127.0.0.1
                    --advertise tor:XYZ.onion:80

 This would be useful if you've manually configured a local Tor daemon
 to route hidden-service connections for "XYZ.onion" to local port 8091.

* tahoe create-node --listen onion

 This would use a txtorcon[3]-based foolscap plugin to find a copy of
 Tor (on $PATH), create a persistent working directory for it, allocate
 a local listening port, configure a hidden service which directs
 incoming Tor connections to that port, then advertise the XYZ.onion
 address (instead of a normal hostname). The configuration is
 persistent, so the next time you launch tahoe, it will re-launch Tor in
 the same directory, and will use the same onion address.

* tahoe create-node --listen onion:controlsocket=X:authcookie=Y

 This would use a pre-existing Tor daemon, controlled with the named
 socket and cookie, to allocate an onion address.
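
The exact grammar of these strings isn't settled. Purely as a sketch,
assuming colon-separated fields where anything containing "=" is a
keyword argument, a plugin might split them like this
(parse_listen_spec is a hypothetical helper, not an existing Foolscap
function):

```python
def parse_listen_spec(spec):
    """Split 'TYPE:arg:key=value' into (type, positional, keywords)."""
    pieces = spec.split(":")
    kind, positional, keywords = pieces[0], [], {}
    for piece in pieces[1:]:
        if "=" in piece:
            key, value = piece.split("=", 1)
            keywords[key] = value
        else:
            positional.append(piece)
    return kind, positional, keywords

for spec in ["tcp:8091:hostname=example.com",
             "onion",
             "onion:controlsocket=X:authcookie=Y"]:
    print(parse_listen_spec(spec))
```

A real grammar would need some kind of escaping, since a control-socket
path or an IPv6 address can itself contain a colon.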

There will be a different option (maybe "--connect-with"?) that will
specify how outbound connections are made. The basic idea is that
Foolscap will have some plugins that enable it to use non-TCP
connection hints, and that if the right pieces are in place (i.e.
txtorcon and /usr/bin/tor are installed), it will use them
automatically. But the --connect-with= argument will let you override
the built-in handling of TCP hints, routing them through Tor instead.
Other options to connect-with (maybe "socksport=" ?) might let you use a
pre-existing Tor daemon instead of having Foolscap launch one itself.

(note that Foolscap already ignores connection hints that it can't use)
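
To make the outbound side concrete, here is a hedged sketch of how hints
might be matched to handlers. Everything in it is an assumption for
illustration: the "TYPE:rest" hint format, the handler names, and the
idea that --connect-with is a mapping from hint type to handler.

```python
def choose_handlers(hints, available, overrides=None):
    """Return (hint, handler) pairs, skipping hints nobody can handle."""
    overrides = overrides or {}
    usable = []
    for hint in hints:
        kind = hint.split(":", 1)[0]
        # An override (e.g. tcp -> tor) replaces the default handler.
        handler = overrides.get(kind, kind)
        if handler in available:
            usable.append((hint, handler))
    return usable

hints = ["tcp:example.com:5008", "tor:XYZ.onion:80", "i2p:abc.i2p"]
# Default behavior: each hint goes to the handler of its own type.
print(choose_handlers(hints, {"tcp", "tor"}))
# With an override, TCP hints are routed through Tor as well.
print(choose_handlers(hints, {"tcp", "tor"}, {"tcp": "tor"}))
```

In both calls the i2p hint is dropped silently, because no i2p handler
is in the available set.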

Then we'll add a "--anonymous" option, which sets a safety flag: if
--anonymous is set, but something about the rest of your configuration
could leak your IP address, the node will refuse to start (exiting with
an error message that tells you what needs to be fixed). (ticket #1010)
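
What counts as a leak is still to be decided. As a sketch only, with
rules I've invented for the example (they are not taken from the
ticket), the startup check could look something like this:

```python
def anonymity_problems(listen, advertise, connect_with):
    """Return a list of reasons this configuration could leak an IP."""
    problems = []
    # Hypothetical rule: a TCP listener must be bound to loopback.
    if listen.startswith("tcp:") and "interface=127.0.0.1" not in listen:
        problems.append("--listen accepts connections on a public interface")
    # Hypothetical rule: only tor: hints may be advertised.
    for hint in advertise.split(","):
        if not hint.startswith("tor:"):
            problems.append("--advertise reveals non-Tor hint %r" % hint)
    # Hypothetical rule: outbound connections must go through Tor.
    if connect_with != "tor":
        problems.append("outbound connections are not routed through Tor")
    return problems

print(anonymity_problems("tcp:8091:interface=127.0.0.1",
                         "tor:XYZ.onion:80", "tor"))
print(anonymity_problems("tcp:8091", "tcp:example.com:5008", "tcp"))
```

The node would refuse to start whenever the returned list is non-empty,
and print its contents as the error message.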

When "tahoe create-node" is run with --listen but not --advertise, the
command will run a Foolscap API (during node creation, before tahoe.cfg
is written out) that figures out what needs to be advertised. This may
be as simple as allocating an unused listening port, or as complex as
configuring/launching Tor and registering the hidden service. Once all
of this is complete, the "tub.location" advertisement string is written
into tahoe.cfg, then everything gets shut down. Later, when "tahoe
start" is run, everything will be pre-computed and available in
tahoe.cfg, which will simplify node startup considerably (the FURLs can
be computed immediately, instead of waiting for the Tub to start up).
Starting the Tub might involve starting a Tor daemon, but we don't need
to wait for that to learn what our FURLs are.
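
A small sketch of that split, using Python's configparser. I'm assuming
here that the values land in a [node] section under "tub.port" and
"tub.location"; only "tub.location" is named above, so treat the other
key and both function names as placeholders.

```python
import configparser
import io

def write_node_config(listen, advertise):
    """Render the config text that 'create-node' would write."""
    config = configparser.ConfigParser()
    config["node"] = {"tub.port": listen, "tub.location": advertise}
    out = io.StringIO()
    config.write(out)
    return out.getvalue()

def read_location(text):
    """At 'tahoe start' time, the location is just read back."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config["node"]["tub.location"]

text = write_node_config("tcp:8091", "tcp:example.com:5008")
print(text)
print(read_location(text))
```

Nothing in read_location needs a running Tub or Tor daemon, which is
the point: the FURLs can be built from the stored string alone.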

(For the curious: the --listen and computed/overridden --advertise
values would be written into tahoe.cfg as-is. All calls to Foolscap will
include a state directory, somewhere under NODEDIR/private/foolscap/ ,
which the plugin can use to record allocated ports/addresses. The plugin
will use the provided --listen value and the recorded state to come up
with a twisted ServerEndpoint object, which will do the actual
listening. Likewise the outbound plugin will take a connection hint and
the recorded state to create a ClientEndpoint for the outbound side.)
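
For instance, a TCP plugin could record its allocated port in the state
directory so that later runs reuse it. This is only a sketch: the file
name "tcp-port.json" and the load_or_allocate helper are invented for
the example, and a temporary directory stands in for NODEDIR.

```python
import json
import os
import tempfile

def load_or_allocate(state_dir, allocate):
    """Return the recorded port, allocating and recording on first use."""
    path = os.path.join(state_dir, "tcp-port.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["port"]
    port = allocate()
    os.makedirs(state_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump({"port": port}, f)
    return port

with tempfile.TemporaryDirectory() as nodedir:
    state_dir = os.path.join(nodedir, "private", "foolscap")
    first = load_or_allocate(state_dir, lambda: 8091)
    # The second allocator is never called: the recorded value wins.
    second = load_or_allocate(state_dir, lambda: 9999)
    print(first, second)
```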

txtorcon is pretty small, so we can probably depend upon it
unconditionally. Then it's a matter of whether the Tor executable is
available or not. We can probably do something similar for i2p and other
connection technologies that require a helper daemon.

thoughts?
 -Brian

[1]: https://tahoe-lafs.org/pipermail/tahoe-dev/2012-July/007533.html
[2]:
https://tahoe-lafs.org/trac/tahoe-lafs/query?status=!closed&keywords=~iputil&order=priority
[3]: https://github.com/meejah/txtorcon