[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2788: add connection-policy configuration (was: cache server information, use later (with overrides))

Tue May 17 22:05:53 UTC 2016

#2788: add connection-policy configuration
------------------------------+-----------------------
     Reporter:  warner        |      Owner:
         Type:  enhancement   |     Status:  new
     Priority:  normal        |  Milestone:  undecided
    Component:  code-network  |    Version:  1.11.0
   Resolution:                |   Keywords:
Launchpad Bug:                |
------------------------------+-----------------------

Comment (by warner):

 Updating ticket title: this ticket is now the right place for
 connections.yaml stuff, and using the cache is just a minor part of that
 effort.

 We spent most of today's devchat exploring syntax options for the
 "connection policy configuration file", aka connections.yaml.

 We're still not entirely sure what to call it. `connections.yaml`?
 `policy.yaml`? `config.yaml`? We're also not sure about using YAML at all.
 We did decide that we like the nested-dictionary aspect of YAML, but we're
 leery of the gotchas that YAML offers (like strings vs identifiers vs
 booleans vs numbers, where some things are quoted and others are not). I'd
 still like to evaluate TOML for this, but I don't know if it does nested
 dictionaries nicely.

 But we did come up with syntax ideas that seem pretty good.

 == Introducers ==

 First off: introducers. The `introducers:` section would map introducer
 nickname to FURL:

 {{{
 introducers:
     intro_nick1:
       furl: pb://u33m4y7klhz3bypswqkozwetvabelhxt@tor:abcdef.onion:80/eiu2i7p6d6mm4ihmss7ieou5hac3wn6b
     intro_nick2:
       furl: pb://u33m4y7klhz3bypswqkozwetvabelhxt@tcp:example.org:12345/eiu2i7p6d6mm4ihmss7ieou5hac3wn6b
 }}}

 The nickname is used to name the cache file. The introducer.furl from
 `tahoe.cfg` is also used (if any), and it gets some default-ish nickname
 like `introducer`.

 If someone really convinces us, we could add connection-policy clauses
 (described below). But in the devchat we couldn't think of a use case
 where you'd want different policies for connecting to different
 introducers. And you can already influence the connection by just editing
 the FURL to change the hint types.

 We also agreed that having different connection policies for the servers
 you learned from one introducer versus another was not likely to be a good
 idea (as in, "every server I learn from introducer A should get Tor-ified,
 but I'll make direct TCP connections to servers I meet through introducer
 B"), especially since dawuud pointed out that you might learn about the
 same storage server from two different introducers, in which case you'd have
 to reconcile their two policies. But if you really wanted
 to do this, you could have another policy stanza under each introducer
 section, somehow named to make it clear that you're talking about servers
 learned *through* the introducer, rather than how you talk to the
 introducer itself.

 "Introducerless mode" means `tahoe.cfg` has an empty `introducer.furl=`,
 and there is no `introducers:` section in the YAML file. "Multi-introducer
 mode" means there are two or more introducers among tahoe.cfg and
 `introducers:`.
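
 A rough sketch of how those two modes could be told apart, assuming the
 YAML file has already been parsed into a plain dict. The function names
 and the "single-introducer" label are made up for illustration, and what
 to do when a YAML nickname collides with the default `introducer` nickname
 is left open (here the YAML entry simply wins):

```python
def collect_introducers(tahoe_cfg_furl, yaml_config):
    """Merge the tahoe.cfg introducer (if any) with the YAML `introducers:` section."""
    introducers = {}
    if tahoe_cfg_furl:
        # An empty `introducer.furl=` contributes nothing.
        # "introducer" is the default-ish nickname mentioned above.
        introducers["introducer"] = tahoe_cfg_furl
    # A missing or empty `introducers:` section is treated as "no introducers".
    for nickname, entry in (yaml_config.get("introducers") or {}).items():
        introducers[nickname] = entry["furl"]
    return introducers


def introducer_mode(introducers):
    """Classify the node by how many introducers it ended up with."""
    if not introducers:
        return "introducerless"
    if len(introducers) >= 2:
        return "multi-introducer"
    return "single-introducer"  # hypothetical label for the ordinary case
```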

 == Connection Policy ==

 Now the global connection policy section. This tells Tahoe what to do when
 asked to make connections (to Foolscap connection hints now, later to HTTP
 things). For each type of connection hint, the policy file specifies a
 "handler" and some arguments. The default values are something like:

 {{{
 global:
   connection_types:
     tcp:
       handler: tcp
     tor:
       handler: tor
       launch_my_own_tor: true
 }}}

 (We aren't sure this should be in a "global" section; maybe the
 "connection_types" string should be at the top level. Also we aren't sure
 that it should be spelled "connection_types": maybe "connections" or
 "foolscap_connections" or something shorter.)

 The `tcp:` section controls what happens when a FURL wants you to connect
 via plain TCP (because the connection hint looks like
 `tcp:example.org:1234`). The `handler: tcp` goes to a lookup table that
 tells Tahoe to use a foolscap connection handler that uses plain TCP. In
 the future, `handler: xyz` could ask the setuptools/twisted/zope.interface
 plugin system for a module that has registered itself with the name "xyz"
 (and to handle some specific Interface).

 The `tor:` section handles connection hints like `tor:abcxyz.onion:80`.
 The `tor` handler would probably live in the Tahoe source tree (for now),
 and would try to import txtorcon (and log+ignore if it couldn't be
 imported). The keys under `tor:` that aren't "handler" are passed as
 keyword arguments into the plugin. In this case, we're telling the
 tor-for-foolscap plugin that it's expected to launch its own copy of the
 Tor daemon. Another option would be something like `control_endpoint:
 tcp:localhost:9051`, to use a pre-running system Tor daemon.

 (question: we originally discussed:

 {{{
     tor:
       handler: tor
       args:
         control_endpoint: tcp:localhost:9051
 }}}

 with the extra `args:` because we originally thought `handler:` would be a
 fully-qualified import+funcname string, like what setuptools entrypoint
 specifications do. In that case, `args` would be passed as keyword
 arguments to the referenced callable. But with `handler:` being an index
 instead, we could probably be a bit more flexible about how arguments are
 passed. Also having the args at the same level as `handler:` is a bit
 cleaner. Maybe we should just pass the entire dictionary into the plugin
 and let it ignore `handler:` itself)
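
 A minimal sketch of the "handler is an index into a lookup table" idea,
 with the sibling keys passed along as keyword arguments. The two factory
 functions are stand-ins that just return a tuple; they are not real
 Foolscap or txtorcon calls, and the `HANDLERS` and `build_handler` names
 are hypothetical:

```python
def make_tcp_handler():
    # Stand-in for whatever would build a plain-TCP connection handler.
    return ("tcp", {})


def make_tor_handler(launch_my_own_tor=False, control_endpoint=None):
    # Stand-in for the tor-for-foolscap plugin; it just records its arguments.
    return ("tor", {"launch_my_own_tor": launch_my_own_tor,
                    "control_endpoint": control_endpoint})


# The lookup table: `handler: xyz` in the config selects an entry here.
HANDLERS = {
    "tcp": make_tcp_handler,
    "tor": make_tor_handler,
}


def build_handler(section):
    """Turn one `connection_types` entry into a handler object."""
    kwargs = dict(section)        # copy, so the parsed config is left untouched
    name = kwargs.pop("handler")  # every other key becomes a keyword argument
    if name not in HANDLERS:
        raise ValueError("unknown connection handler: %r" % (name,))
    return HANDLERS[name](**kwargs)
```

 With this shape, an unexpected key under `tor:` shows up as a `TypeError`
 from the factory rather than being silently ignored.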

 For nodes that are configured to use Tor instead of plain TCP, there would
 be two nearly-identical sections, one for `tcp:` and one for `tor:`, both
 specifying the "tor" handler:

 {{{
 global:
   connection_types:
     tcp:
       handler: tor
       launch_my_own_tor: true
     tor:
       handler: tor
       launch_my_own_tor: true
 }}}

 Note that any connection hint type *not* listed here would be ignored, to
 avoid the possibility of a new future connection type accidentally
 violating anonymity. Or maybe we only do that if `anonymous=1` is set.
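
 The "ignore anything not listed" rule could look roughly like this. The
 helper names are made up, and the `i2p:` hint in the usage note below is
 only a hypothetical example of a hint type the policy doesn't mention:

```python
def hint_type(hint):
    """Return the type prefix of a connection hint like 'tcp:example.org:1234'."""
    return hint.split(":", 1)[0]


def usable_hints(hints, connection_types):
    """Keep only the hints whose type has an entry in the connection policy."""
    return [h for h in hints if hint_type(h) in connection_types]
```

 For a policy that lists only `tcp:` and `tor:`, calling `usable_hints()` on
 `["tcp:example.org:1234", "tor:abcxyz.onion:80", "i2p:example.i2p"]` would
 drop the third hint and keep the first two.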

 We can also imagine:

 {{{
     tcp:
       handler: socks
       socks_endpoint: tcp:localhost:9050
 }}}

 == Server Overrides and New Servers ==

 The `servers:` clause would serve two purposes. The first is to provide
 data on brand new (synthetic) servers, ones that might not be advertised
 through any introducer. This would mostly be used for the "introducerless"
 mode, but could also be used to augment the regular introducer-based grid
 with a private server for just your own client.

 The second is to modify data about introducer-advertised servers,
 generally their connection policy and/or FURL. The following example would
 allow a private TCP-based server on a mostly-Tor grid (where the global
 policy says tcp should go to tor).

 {{{
 servers: # storage servers only, not hypothetical other kinds of servers
     "v0-c2ng2pbrmxmlwpijn3mr72ckk5fmzk6uxf6nhowyosaubrt6y5mq": # does this need quotes?
       nickname: foo # mainly for brand-new-servers, not for overriding
       anonymous-storage-FURL: pb://u33m4y7klhz3bypswqkozwetvabelhxt@tcp:10.1.2.3:51298/eiu2i7p6d6mm4ihmss7ieou5hac3wn6b
       permutation-seed-base32: w2hqnbaa25yw4qgcvghl5psa3srpfgw3 # maybe default to pubkey
       connection_types: # replaces global mapping
         tcp:
           handler: tcp
 }}}

 The dictionary key is the server identity key (an !Ed25519 public
 verifying key, currently used as the main index for introducer
 announcements). (we need to experiment to see if YAML can have hyphens or
 non-identifier characters in dictionary keys; we'd rather not put quotes
 around the string if we can avoid it).

 Many of the other keys would replace or augment the data heard from the
 introducer. "nickname" is published by servers themselves, but could be
 overridden locally, in which case it would behave more like a pet name.
 `anonymous-storage-FURL` is the current way that clients connect to
 storage servers, although that will change when #666 Accounting happens
 (probably to `accounting-storage-FURL` or similar), and `storage-URL` will
 happen when we move to an HTTP-based storage protocol.
 `permutation-seed-base32` is important for compatibility of share
 placement for older
 servers, but I think newer servers default to using the !Ed25519 pubkey
 for this, so it could maybe be omitted for synthetic servers.

 And then the `connection_types` section would control the connection
 policy. The idea is that each server (the
 `allmydata.storage_client.NativeStorageServer` instance) will use the
 connection policy from the `servers[$SERVERID][connection_types]` dict if
 that's present, else it will fall back to the `global[connection_types]`
 config.
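
 As a sketch, assuming the whole YAML file has been parsed into one dict
 (called `config` here, which is my name for it, not something we settled
 on), the per-server fallback is just:

```python
def connection_types_for(serverid, config):
    """Pick the connection policy for one storage server."""
    # A missing `servers:` section, or a server with no entry, means "no override".
    server = (config.get("servers") or {}).get(serverid) or {}
    if "connection_types" in server:
        # The per-server mapping replaces the global one; the two are not merged.
        return server["connection_types"]
    return config["global"]["connection_types"]
```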

 We must be careful to prevent servers from publishing `connection_types:`
 themselves. We originally thought of putting the server-provided keys in
 one place, and the locally-specified keys in another, like:

 {{{
 servers:
     v0-c2ng2pbrmxmlwpijn3mr72ckk5fmzk6uxf6nhowyosaubrt6y5mq:
       announcement:
         nickname: foo
         anonymous-storage-FURL: pb://u33m4y7klhz3bypswqkozwetvabelhxt@tcp:10.1.2.3:51298/eiu2i7p6d6mm4ihmss7ieou5hac3wn6b
       connection_types:
         tcp:
           handler: tcp
 }}}

 but it felt too verbose. Without the separate `announcement:` section, the
 code should probably first check for `connection_types`, then overlay the
 announcement with all remaining keys:

 {{{
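   # `config` is the parsed YAML file (`global` is a reserved word in Python)
   server = config["servers"][serverid]
   ct = server.get("connection_types", config["global"]["connection_types"])
   ann = announcement.copy()
   for key, value in server.items():
     if key != "connection_types":  # local-only, never part of the announcement
       ann[key] = value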

 The introducer announcement includes a lot of additional data that isn't
 very useful to override: version strings, sequence numbers, a "nonce" that
 I don't even remember the purpose of. While it might be useful to allow
 the YAML file to override *any* possible announcement key, it might be
 better to throw an error if we see any unknown key, or a key that doesn't
 make sense to override.
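
 The "throw an error on unknown keys" option might look like the sketch
 below. The set of overridable keys is only a guess based on the examples
 in this comment, not a decided list, and the function names are
 hypothetical:

```python
# Keys that replace or augment announcement data (a guess, not a decided list).
OVERRIDABLE_KEYS = frozenset([
    "nickname",
    "anonymous-storage-FURL",
    "permutation-seed-base32",
])
# Keys that are purely local policy and never come from an announcement.
LOCAL_ONLY_KEYS = frozenset(["connection_types"])


def unknown_keys(section):
    """Return the keys of one `servers:` entry that we don't know how to handle."""
    return sorted(set(section) - OVERRIDABLE_KEYS - LOCAL_ONLY_KEYS)


def check_server_section(serverid, section):
    """Refuse to start with a `servers:` entry that contains unexpected keys."""
    bad = unknown_keys(section)
    if bad:
        raise ValueError("server %s: cannot override %s" % (serverid, ", ".join(bad)))
```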

 Finally, note that the Introducer was designed to let you publish more
 than just storage servers: each announcement includes a `service-name`
 field, and we originally planned to advertise Helpers, repairers, and even
 extra introducers through these announcements. The client only subscribes
 to `storage`, and we don't have any code to publish anything else. For
 brevity, we'll probably declare that the YAML file's `servers:` section is
 only referring to *storage* servers. If some future version of tahoe adds
 a new thing that we really want to call a "server" but which isn't a
 *storage* server, we'll have to come up with a new YAML section to
 describe it. Or, we could change the YAML section from `servers:` to
 `storage:` or `storage-servers:` or something.

 == Accidentally Introducing A New Config-File Format ==

 We know that we'd like to eventually move away from the INI-format
 `tahoe.cfg` file, for two reasons. The first is that it doesn't really
 support structured data (the original multi-introducer syntax would have
 needed lines like `introducer.furl1=` and `introducer.furl2=`, since INI
 doesn't have lists). The second is that it'd be nice to safely
 machine-edit the config file.

 In Petmail, I've been experimenting with keeping *all* state in a SQLite
 database (config, runtime updates, user messages, everything). Config
 changes are done with CLI tools, or a web-based wizard-thing that gets to
 write changes as you click the boxes. While it's sad to not be able to
 point emacs at the file, you win transactional changes, schema
 enforcement, and no data-killing race conditions between user edits and
 program changes.

 We talked a week or two ago about a `tahoe configure` command which could
 ask you some questions ("should we offer storage? you have X GB, how much
 should I use?"), walk you through the setup process, examine your network
 situation ("it looks like you're behind NAT. Do you know what
 port-forwarding is? Let me test your setup. Great, we can be reached from
 the outside, enabling server."), sanity-check the config, then start the node.
 To re-run this tool inside a running node, we'd need the ability to safely
 modify the config file after startup.

 So I just want to avoid accidentally moving us to a YAML-based config
 file, if maybe we should be deliberately moving to an SQLite-based one.
 This YAML syntax is kind of nice, and I can imagine moving other tahoe.cfg
 items over to it (eventually deprecating tahoe.cfg and automatically
 converting it to YAML at startup). But if we're going to do that, we
 should probably be intentional about it.

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2788#comment:7>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage