[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2759: separate Tub per server connection

Tahoe-LAFS trac at tahoe-lafs.org
Tue Mar 29 21:41:38 UTC 2016


#2759: separate Tub per server connection
--------------------------+---------------------------
 Reporter:  warner        |          Owner:
     Type:  enhancement   |         Status:  new
 Priority:  normal        |      Milestone:  undecided
Component:  code-network  |        Version:  1.10.2
 Keywords:                |  Launchpad Bug:
--------------------------+---------------------------
 Leif, dawuud, and I had an idea during today's devchat: what if we used a
 separate Tub for each server connection?

 The context was Leif's use case, where he wants a grid in which all
 servers (including his own) advertise a Tor .onion address, but he wants
 to connect to his own servers over faster direct TCP connections (these
 servers are on the local network).

 Through a combination of the #68 multi-introducer work, and the #517
 server-override work, the plan is:

 * each introducer's data is written into a cache file (YAML-format, with
 one clause per server)
 * there is also an override file, which contains YAML clauses of server
 data that should be used instead-of/in-addition-to the data received from
 the introducer
 * the !StorageFarmBroker, when deciding how to contact a server, combines
 data from all introducers, then updates that dict with data from the
 override file (see the sketch below)
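
 Roughly, that combine-then-override step could look like this (a sketch
 only; it assumes each cache/override file parses into a dict mapping
 server-ID to clause, which is not a settled schema):

 {{{
 #!python
 import yaml

 def build_server_map(introducer_cache_paths, override_path):
     """Combine cached announcements from all introducers, then apply overrides."""
     servers = {}
     for path in introducer_cache_paths:
         with open(path) as f:
             servers.update(yaml.safe_load(f) or {})  # later introducers win ties
     with open(override_path) as f:
         servers.update(yaml.safe_load(f) or {})      # local override clauses win overall
     return servers
 }}}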

 So Leif can:

 * start up the node normally, wait for the introducers to collect
 announcements
 * copy the cached clauses for his local servers into the override file
 * edit the override file to modify the FURL to use a direct
 "tcp:HOST:PORT" hint, instead of the "tor:XYZ.onion:80" hint that they
 advertised (see the example below)
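
 For example (TubID, swissnum, address, and port are all made up here), the
 edit only swaps the connection hint inside the FURL; the TubID stays the
 same because it's the same server key:

 {{{
 # as announced through the introducer (cached clause):
 pb://u33m4y7klhz3bypswqkozwetvabelhxt@tor:xyz123abc.onion:80/storageswissnum
 # as overridden for a direct LAN connection:
 pb://u33m4y7klhz3bypswqkozwetvabelhxt@tcp:192.168.1.10:12345/storageswissnum
 }}}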

 But now the issue is: tahoe.cfg has an `anonymous=true` flag, which tells
 it to configure Foolscap to remove the `DefaultTCP` connection-hint
 handler, for safety: no direct-TCP hints will be honored. So how should
 this overridden server use an otherwise-prohibited TCP connection?
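
 (For reference, here is what that flag is described as doing to the node's
 single Tub, sketched with foolscap's connection-handler API; this is not the
 actual Tahoe code:)

 {{{
 #!python
 from foolscap.api import Tub

 def make_anonymous_tub():
     tub = Tub()
     # drop DefaultTCP: plain "tcp:HOST:PORT" hints will no longer be honored
     tub.removeAllConnectionHintHandlers()
     # a Tor/SOCKS handler would then be registered, so only "tor:" hints connect
     return tub
 }}}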

 So our idea was that each YAML clause has two chunks of data: one local,
 one copied from the introducer announcement. The local data should include
 a string of some form that specifies the properties of the Tub that should
 be used for connections to this server. The !StorageFarmBroker will spin
 up a new Tub for each connection, configure it according to those
 properties, then call `getReference()` (actually `connectTo()`, to get the
 reconnect-on-drop behavior).
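
 A sketch of that per-server flow (the clause fields "local", "announcement",
 "furl", and "allow-tcp" are placeholders for whatever the YAML schema turns
 out to be; the foolscap calls are the connection-handler API plus
 `connectTo()`):

 {{{
 #!python
 from foolscap.api import Tub
 from foolscap.connections import tcp

 def connect_to_server(parent_service, clause, got_rref):
     tub = Tub()                           # no certFile: ephemeral key, fresh TubID
     tub.removeAllConnectionHintHandlers()
     if clause["local"].get("allow-tcp"):  # per-server Tub properties from the clause
         tub.addConnectionHintHandler("tcp", tcp.default())
     tub.setServiceParent(parent_service)  # attach to the node so the Tub gets started
     # connectTo() returns a Reconnector: this is the reconnect-on-drop behavior
     return tub.connectTo(str(clause["announcement"]["furl"]), got_rref)
 }}}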

 The tahoe.cfg settings for foolscap connection-hint handlers get written
 into the cached introducer data. !StorageFarmBroker creates Tubs that obey
 those rules because those rules are sitting next to the announcement that
 will contain the FURL.

 In this world, we'll have one Tub for the node's own storage server (if it
 runs one), with a persistent identity (storing its key in
 private/node.privkey as usual). Then we'll have a separate ephemeral Tub for
 each storage-server connection, which doesn't store its private key
 anywhere. (I think we'll also have a separate persistent Tub for the
 control-port / logport.)
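
 In foolscap terms the distinction is just whether the Tub is handed a
 certificate file to persist its key (paths illustrative):

 {{{
 #!python
 from foolscap.api import Tub

 node_tub = Tub(certFile="private/node-tub.pem")  # persistent: same TubID across restarts
 per_server_tub = Tub()                           # ephemeral: fresh key/TubID, never written to disk
 }}}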

 Potential issues:

 * performance: we have to make a new TLS key (probably RSA?) for each
 connection. Probably not a big deal.
 * We can't share client-side objects between storage servers. We don't do
 this now, so it's no big loss. The idea would be something like: instead
 of the client getting access to a server !ShareWriter object and sending
 `.write(data)` messages to it, we could flip it around and *give* the
 server access to a client-side !ShareReader object, and the server would
 issue `.read(length)` calls to it (see the sketch after this list). That
 would let the server set the pace more directly. And then the server could
 sub-contract to a different server by passing it the !ShareReader object,
 then step out of the conversation entirely. However, this would only work
 if our client could
 accept inbound connections, or if the subcontractor server already had a
 connection to the client (maybe the client connected to them as well).
 * We lose the sneaky NAT-bypass trick that lets you run a storage server
 on a NAT-bound machine. The trick is that you also run a client on your
 machine; it connects to other client+server nodes, and then when those nodes
 want to use your server, they utilize the existing reverse connection
 (Foolscap doesn't care who originally initiated a connection, as long as
 both sides have proved control over the right TLS key). This trick only
 worked when those other clients had public IP addresses, so your box could
 connect to them.
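
 Here is the promised sketch of the flipped !ShareReader idea (interface and
 method names are invented for illustration, in foolscap's RemoteInterface
 style):

 {{{
 #!python
 from foolscap.api import RemoteInterface, Referenceable

 class RIShareReader(RemoteInterface):
     def read(length=int):
         # the server pulls the next `length` bytes of the share from the client
         return str

 class ShareReader(Referenceable):
     """Lives on the client; the server (or a sub-contracted server that also
     has a connection back to this client) calls remote_read() to pull data."""
     def __init__(self, share_data):
         self._data = share_data
         self._offset = 0

     def remote_read(self, length):
         chunk = self._data[self._offset:self._offset + length]
         self._offset += len(chunk)
         return chunk
 }}}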

 None of those issues are serious: I think we could live with them.

 And one benefit is that we'd eliminate the TubID-based correlation between
 connections to different storage servers. This is the correlation that
 foils your plans if you call yourself Alice when connecting to server1 and
 Bob when connecting to server2.

 It would leave the #666 Accounting pubkey relationship (but you'd probably
 turn that off if you wanted anonymity), and the timing relationship
 (server1 and server2 compare notes, and see that "Alice" and "Bob" connect
 at exactly the same time, and conclude that Alice==Bob). And of course
 there's the usual storage-index correlation: Alice and Bob are always
 asking for the same shares. But removing the TubID correlation is a good
 (and necessary) first step.

 The !StorageFarmBroker object has responsibility for creating IServer
 objects for each storage server, and it doesn't have to expose what Tub
 it's using, so things would be encapsulated pretty nicely. (In the long
 run, the IServer objects it provides won't be using Foolscap at all).

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2759>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

