#2759 closed enhancement (fixed)

separate Tub per server connection

Reported by: warner
Owned by: dawuud
Priority: normal
Milestone: 1.12.0
Component: code-network
Version: 1.10.2
Keywords:
Cc:
Launchpad Bug:

Description

Leif, dawuud, and I had an idea during today's devchat: what if we used a separate Tub for each server connection?

The context was Leif's use case, where he wants a grid in which all servers (including his own) advertise a Tor .onion address, but he wants to connect to his own servers over faster direct TCP connections (these servers are on the local network).

Through a combination of the #68 multi-introducer work, and the #517 server-override work, the plan is:

  • each introducer's data is written into a cache file (YAML-format, with one clause per server)
  • there is also an override file, which contains YAML clauses of server data that should be used instead-of/in-addition-to the data received from the introducer
  • the StorageFarmBroker, when deciding how to contact a server, combines data from all introducers, then updates that dict with data from the override file
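The merge rule described above can be sketched in a few lines of Python (a minimal sketch with hypothetical names; the real StorageFarmBroker logic will be more involved):

```python
# Sketch of the merge described above: announcements from all
# introducers are combined, then the override file wins.
# All names here are hypothetical.

def combine_server_data(introducer_caches, overrides):
    """Merge per-server announcement dicts from each introducer,
    then apply the override file on top."""
    servers = {}
    for cache in introducer_caches:   # one dict per introducer
        servers.update(cache)
    servers.update(overrides)         # override file always wins
    return servers

intro1 = {"server_a": {"furl": "pb://a@tor:xyz.onion:80/swissnum"}}
override = {"server_a": {"furl": "pb://a@tcp:10.0.0.5:1234/swissnum"}}
merged = combine_server_data([intro1], override)
```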

So Leif can:

  • start up the node normally, wait for the introducers to collect announcements
  • copy the cached clauses for his local servers into the override file
  • edit the override file to modify the FURL to use a direct "tcp:HOST:PORT" hint, instead of the "tor:XYZ.onion:80" hint that they advertised

But now the issue is: tahoe.cfg has an anonymous=true flag, which tells it to configure Foolscap to remove the DefaultTCP connection-hint handler, for safety: no direct-TCP hints will be honored. So how can the connection to this overridden server use an otherwise-prohibited direct TCP hint?

So our idea was that each YAML clause has two chunks of data: one local, one copied from the introducer announcement. The local data should include a string of some form that specifies the properties of the Tub that should be used for connections to this server. The StorageFarmBroker will spin up a new Tub for each connection, configure it according to those properties, then call getReference() (actually connectTo(), to get the reconnect-on-drop behavior).

The tahoe.cfg settings for foolscap connection-hint handlers get written into the cached introducer data. StorageFarmBroker creates Tubs that obey those rules because those rules are sitting next to the announcement that will contain the FURL.

In this world, we'll have one Tub for the server (if any), with a persistent identity (storing its key in private/node.privkey as usual). Then we'll have a separate ephemeral Tub for each storage server, which doesn't store its private key anywhere. (I think we'll also have a separate persistent Tub for the control-port / logport).
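A structural sketch of that per-server-Tub behavior (pure Python with a stand-in class; the real code would configure a fresh foolscap Tub with the per-server connection-hint handlers and call connectTo() on the FURL, so all names below are hypothetical):

```python
# Structural sketch of "one ephemeral Tub per storage server".
# EphemeralTub is a stand-in for foolscap's Tub: a fresh key is made
# per connection and nothing is persisted to disk.

class EphemeralTub:
    def __init__(self):
        self.handlers = {}
    def add_handler(self, hint_type, handler):
        # in foolscap this would be addConnectionHintHandler()
        self.handlers[hint_type] = handler
    def connect_to(self, furl):
        # in foolscap this would be connectTo(furl, callback),
        # giving reconnect-on-drop behavior
        return ("connected", furl, sorted(self.handlers))

def make_server_connection(announcement, local_config):
    tub = EphemeralTub()   # fresh Tub, per-server handler policy
    for hint_type, handler in local_config.get("connection_types", {}).items():
        tub.add_handler(hint_type, handler)
    return tub.connect_to(announcement["furl"])
```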

Potential issues:

  • performance: we have to make a new TLS key (probably RSA?) for each connection. probably not a big deal.
  • We can't share client-side objects between storage servers. We don't do this now, so it's no big loss. The idea would be something like: instead of the client getting access to a server ShareWriter object and sending .write(data) messages to it, we could flip it around and *give* the server access to a client-side ShareReader object, and the server would issue .read(length) calls to it. That would let the server set the pace more directly. And then the server could sub-contract to a different server by passing it the ShareReader object, then step out of the conversation entirely. However this would only work if our client could accept inbound connections, or if the subcontractor server already had a connection to the client (maybe the client connected to them as well).
  • We lose the sneaky NAT-bypass trick that lets you run a storage server on a NAT-bound machine. The trick is that you also run a client on your machine, it connects to other client+server nodes, then when those nodes want to use your server, they utilize the existing reverse connection (Foolscap doesn't care who originally initiated a connection, as long as both sides have proved control over the right TLS key). This trick only worked when those other clients had public IP addresses, so your box could connect to them.

None of those issues are serious: I think we could live with them.

And one benefit is that we'd eliminate the TubID-based correlation between connections to different storage servers. This is the correlation that foils your plans when you call yourself Alice when you connect to server1, and Bob when you connect to server2.

It would leave the #666 Accounting pubkey relationship (but you'd probably turn that off if you wanted anonymity), and the timing relationship (server1 and server2 compare notes, and see that "Alice" and "Bob" connect at exactly the same time, and conclude that Alice==Bob). And of course there's the usual storage-index correlation: Alice and Bob are always asking for the same shares. But removing the TubID correlation is a good (and necessary) first step.

The StorageFarmBroker object has responsibility for creating IServer objects for each storage server, and it doesn't have to expose what Tub it's using, so things would be encapsulated pretty nicely. (In the long run, the IServer objects it provides won't be using Foolscap at all).

Change History (25)

comment:1 Changed at 2016-03-30T23:34:28Z by dawuud

Here's my latest dev branch that partially implements this design:

https://github.com/david415/tahoe-lafs/tree/introless-multiintro_yaml_config.1

  • the caching still needs a bit of work: I never delete the old cache, so it just grows. Maybe we should delete the old cache file upon connecting to the introducer?

TODO:

  • teach tahoe to use another YAML configuration file that specifies ALL the introducers w/ furl and transport handler map + server overrides with transport handler map
Last edited at 2016-03-30T23:40:25Z by dawuud

comment:2 Changed at 2016-03-30T23:54:16Z by dawuud

like this?

connections.yaml

introducers:
  intro_nick1:
    furl: "furl://my_furl1"
    connection_types:
      tor:
        handler: foolscap_plugins.socks
        parameters:
          endpoint: "unix:/var/lib/tor/tor_unix.socket"
  intro_nick2:
    furl: "furl://my_furl2"
    connection_types: ...
servers:
  server_id_1:
    server_options:
      key_s: "..."
      announcement:
        server_id: "..."
        furl: "furl://my_storage_server1/..."
        nickname: "storage1"
    connection_types: ...
  server_id_2:
    server_options:
      key_s: "my_secret_crypto_key2"
      announcement: announcement_2
    connection_types: ...
Last edited at 2016-03-31T11:28:39Z by dawuud

comment:3 Changed at 2016-03-31T12:01:03Z by leif

I like this connections.yaml layout a lot!

Maybe we should have a top-level default connection_types key too, to avoid repeating ourselves in each server and introducer definition? (When it exists, the server and introducer-level connection_types dictionary should be used in place of the default dictionary, not in addition to it).
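Leif's proposed resolution rule, where a per-entry connection_types dict is used in place of the default rather than merged with it, might look like this hypothetical lookup:

```python
# Sketch of the lookup described above: a per-server or per-introducer
# connection_types dict, when present, REPLACES the top-level default
# rather than being merged into it. (All names hypothetical.)

def effective_connection_types(entry, defaults):
    return entry.get("connection_types", defaults)

defaults = {"tor": {"handler": "foolscap_plugins.socks"}}
entry_with = {"connection_types": {"tcp": {}}}
entry_without = {}
```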

I'm a little hesitant about requiring (local) introducer nicknames because people will have to make one up and it'll probably often end up being "My Introducer" or something like that, but it will certainly make the introducer list on the welcome page easier to understand when there are several introducers. The nickname can also be used as the filename for the introducer's yaml announcement cache.

comment:4 Changed at 2016-03-31T23:36:17Z by dawuud

In my latest dev branch I have all the unit tests working, and I rewrote the multi-intro tests to use our new connections.yaml file. I also got the static server config working, although I haven't written unit tests for that yet:

https://github.com/david415/tahoe-lafs/tree/introless-multiintro_yaml_config.1

the next step is to load the connection_types sections of the yaml file.

comment:5 Changed at 2016-04-02T12:08:47Z by dawuud

ok! my dev branch is ready for code review. it passes ALL unit tests except two:

  • allmydata.test.test_introducer.SystemTest.test_system_v1_server
  • allmydata.test.test_introducer.SystemTest.test_system_v2_server

note: i did not make these features work for the v1 intro client.

comment:6 Changed at 2016-04-02T17:07:24Z by dawuud

  • Owner set to dawuud

comment:7 Changed at 2016-04-02T17:09:46Z by dawuud

here's a useful diff showing how my dev branch differs from my introless-multiintro branch, which is the same as Leif's introless-multiintro except that it has the latest upstream/master merged in:

https://github.com/david415/tahoe-lafs/pull/7/files

comment:8 Changed at 2016-04-19T12:17:49Z by dawuud

i made a new foolscap dev branch with the SOCKS5 plugin and merged upstream master into it: https://github.com/david415/foolscap/tree/tor-client-plugin.4

i've also updated the latest tahoe-lafs dev branch and i fixed some of the introducer unit tests that were failing... but i thought that i had previously gotten all or almost all of them to pass.

https://github.com/david415/tahoe-lafs/tree/introless-multiintro_yaml_config.1

i'm also a bit confused as to why the web interface is totally broken.

comment:9 Changed at 2016-04-19T12:19:26Z by dawuud

replying here to comment of meejah https://tahoe-lafs.org/trac/tahoe-lafs/ticket/517#comment:73

since my foolscap changes aren't merged upstream, they require some extra work to get everything to build correctly: i usually pip-install tahoe-lafs first, then uninstall the old foolscap and install my new foolscap.

comment:10 Changed at 2016-04-19T17:05:48Z by dawuud

this shows a diff relative to the changes in leif's multiintro introducerless branch but it also has the upstream/master merged in: https://github.com/david415/tahoe-lafs/pull/8

this is diffed against upstream/master https://github.com/tahoe-lafs/tahoe-lafs/pull/260

further code review should be conducted against one of these pull requests and not an older one

comment:11 Changed at 2016-04-29T10:01:24Z by dawuud

09:33 < warner> dawuud: I haven't looked at the branch recently, so maybe you already did it this way, but I think the first step would 
                be a single (merging + test-passing) PR that only adds Tub creation to the NativeStorageServer, and doesn't make any 
                changes to the Introducer or adds the yaml stuff
09:34 < warner> I think that's something which would be ok to land on it's own, and ought not to break anything
09:35 < warner> and wouldn't change user-visible behavior (mostly), so wouldn't require a lot of docs or study
09:35 < warner> step 2 is probably to have the introducer start writing to the yaml file, but not have anything which reads from it yet
09:36 < warner> step 3 would be reading from the yaml file too, but still have exactly one introducer
09:36 < warner> step 4 is to add the override file (but still the only permissible connection type is "tcp")
09:37 < warner> step 5 is to add multiple/zero introducers
09:37 < warner> step 6 is to add tor and the allowed-connection-types stuff
Last edited at 2016-04-29T10:01:42Z by dawuud

comment:12 Changed at 2016-05-02T13:26:06Z by dawuud

here's my attempt to make the storage broker client make one tub per storage server:

https://github.com/david415/tahoe-lafs/tree/storage_broker_tub.0

so far i've been unable to make some of the unit tests pass.

comment:13 Changed at 2016-05-03T06:45:27Z by dawuud

last night meejah fixed the test that was failing, here: https://github.com/meejah/tahoe-lafs/tree/storage_broker_tub.0

warner, please review. this is step 1 as you outlined above.

i'm going to begin work on step 2

comment:14 Changed at 2016-05-03T09:05:15Z by dawuud

warner,

step 2 --> https://github.com/david415/tahoe-lafs/tree/intro_yaml_cache.0

I'll wait for review before proceeding further with this ticket.

comment:15 Changed at 2016-05-03T23:12:57Z by Brian Warner <warner@…>

In f5291b9/trunk:

document Tub-per-server change

refs ticket:2759

comment:16 Changed at 2016-05-03T23:53:26Z by warner

Landed step 1.. thanks!

Looking at step 2: here's some thoughts:

  • which YAML library should we use? Could you update _auto_deps.py to add its PyPI name?
  • let's make self.cache_filepath into self._cache_filepath
  • should we use yaml.safe_load() instead of plain load()? (I don't know what exactly is unsafe about plain load, and we aren't parsing files written by others, but maybe it's good general practice to use safe_load() by default)
  • I didn't know about FilePath.setContent()... that's cool; maybe we should replace fileutil.write_atomically() with it
  • at some point (maybe not now) we should put a helper method in allmydata.node.Node which returns the NODEDIR/private/ filename for a given basename, so the magic string "private" isn't duplicated all over the place.
  • let's wrap the new long lines in test_introducer.py
  • we need a test that adds an announcement, then loads the YAML file and makes sure the announcement is present. Probably in test_introducer.Announcements.test_client_*, basically a yaml.load(filename) and checking that the announcement and key string are correct (including the cases when there is no key, because the sender didn't sign their announcement, or because it went through an old v1 introducer)
  • we should also test duplicate announcements: I'm guessing that we want the YAML file to only contain a single instance of each announcement, and new announcements of the same server should replace the old one (instead of storing both). What's our plan for managing the lifetime of these cached servers? Do we remember everything forever? Or until they've been unreachable for more than X days? (in that case we need to store last-reached timestamps too).

I like where this is going!

comment:17 Changed at 2016-05-04T02:56:48Z by warner

idnar (on IRC) pointed out that yaml.load() will, in fact, perform arbitrary code execution. So I guess safe_load() is a good idea.
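A quick illustration of the safer API, assuming PyYAML (the sample document is hypothetical):

```python
import yaml  # PyYAML

# yaml.load() with the default (unsafe) Loader can instantiate
# arbitrary Python objects via !!python/... tags; yaml.safe_load()
# only builds plain dicts, lists, strings, and numbers, which is all
# an announcement cache needs.
doc = """
server_id_1:
  furl: "pb://example/swissnum"
  nickname: "storage1"
"""
data = yaml.safe_load(doc)
```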

comment:18 Changed at 2016-05-04T10:14:05Z by dawuud

OK, I've made those corrections, although I think my unit tests still need a bit of work: I found that the nickname was not propagated into the announcement for some reason.

I was thinking that instead of having a cache expiry policy we could just replace the old cache file once we connect to the introducer. What do you think of this?

comment:19 Changed at 2016-05-04T18:19:49Z by warner

Oh, I like that. It sounds like the simplest thing to implement, and mostly retains the current behavior.

We need to think through how replacement announcements get made: I think announcements have sequence numbers, and highest-seqnum wins. If we write all announcements into the cache (as opposed to rewriting the cache each time with only the latest announcement for each server), then we'll have lots of old seqnums in the file, but we can filter those out when we read it.
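The highest-seqnum-wins filtering described above could be sketched like this (hypothetical names; real cache entries would carry more fields):

```python
# Sketch of "highest-seqnum wins" when reading an append-only cache:
# keep only the newest announcement for each server.

def latest_announcements(cached):
    """cached: iterable of (server_id, seqnum, announcement) in file order."""
    best = {}
    for server_id, seqnum, ann in cached:
        if server_id not in best or seqnum > best[server_id][0]:
            best[server_id] = (seqnum, ann)
    return {sid: ann for sid, (seq, ann) in best.items()}
```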

Also there's a small window when the introducer restarts, before the servers have reconnected to it, when it won't be announcing very much. Our client will erase its cache when it reconnects, and we'll have a small window when the cache is pretty empty. However if the client is still running (it hasn't bounced), it will still remember all the old announcements in RAM, so those connections will stay up. And if it does bounce, then it's no worse than it was before the cache.

comment:20 Changed at 2016-05-05T10:06:59Z by dawuud

OK here's my "step 3 would be reading from the yaml file too, but still have exactly one introducer"

https://github.com/david415/tahoe-lafs/tree/read_intro_yaml_cache.0

I am not sure exactly how to implement cache purging or announcement replacements. The naive way I described isn't even implemented here... but to do that I could simply remove the cache file when we successfully connect to the introducer.

comment:21 Changed at 2016-05-10T18:27:10Z by dawuud

Here's the latest "step 2 is probably to have the introducer start writing to the yaml file, but not have anything which reads from it yet" :

https://github.com/tahoe-lafs/tahoe-lafs/pull/278

please review

I also have dev branches available for "step 3", here: https://github.com/david415/tahoe-lafs/tree/read_intro_yaml_cache.2

but maybe i can "regenerate" that branch after "step 2" is landed... please do let us know.

comment:22 Changed at 2016-05-10T20:04:21Z by Brian Warner <warner@…>

In b49b409/trunk:

Merge branch 'pr278': write (but don't read) YAML cache

refs ticket:2759

comment:24 Changed at 2016-05-11T10:07:41Z by dawuud

09:36 < warner> step 4 is to add the override file (but still the only permissible connection type is "tcp")

"step 4" -> https://github.com/david415/tahoe-lafs/tree/2759.add_connections_yaml_config.0

Here in this minimal code change I've only added one feature:

  • a connections.yaml configuration file with a "storage" section which allows the user to specify storage nodes. This effectively overrides announcements from the introducer about those storage nodes.

please review.

comment:25 Changed at 2016-05-11T23:58:28Z by warner

  • Milestone changed from undecided to 1.12.0
  • Resolution set to fixed
  • Status changed from new to closed

I think we've exhausted the purview of this ticket, which is specifically about using a separate Tub for each storage-server connection. Let's move the more general "cache server information and use it later, maybe with overrides" into a separate ticket: #2788

Since f5291b9 landed the per-server Tub, I'm closing this ticket. Work on PR281 and dawuud's other branches will continue in #2788.
