[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2788: add connection-policy configuration (was: cache server information, use later (with overrides))
Tahoe-LAFS
trac at tahoe-lafs.org
Tue May 17 22:05:53 UTC 2016
#2788: add connection-policy configuration
------------------------------+-----------------------
Reporter: warner | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: undecided
Component: code-network | Version: 1.11.0
Resolution: | Keywords:
Launchpad Bug: |
------------------------------+-----------------------
Comment (by warner):
Updating ticket title: this ticket is now the right place for
connections.yaml stuff, and using the cache is just a minor part of that
effort.
We spent most of today's devchat exploring syntax options for the
"connection policy configuration file", aka connections.yaml.
We're still not entirely sure what to call it. `connections.yaml`?
`policy.yaml`? `config.yaml`? We're also not sure about using YAML at all.
We did decide that we like the nested-dictionary aspect of YAML, but we're
leery of the gotchas that YAML offers (like strings vs identifiers vs
booleans vs numbers, where some things are quoted and others are not). I'd
still like to evaluate TOML for this, but I don't know if it does nested
dictionaries nicely.
But we did come up with syntax ideas that seem pretty good.
== Introducers ==
First off: introducers. The `introducers:` section would map introducer
nickname to FURL:
{{{
introducers:
  intro_nick1:
    furl: pb://u33m4y7klhz3bypswqkozwetvabelhxt@tor:abcdef.onion:80/eiu2i7p6d6mm4ihmss7ieou5hac3wn6b
  intro_nick2:
    furl: pb://u33m4y7klhz3bypswqkozwetvabelhxt@tcp:example.org:12345/eiu2i7p6d6mm4ihmss7ieou5hac3wn6b
}}}
The nickname is used to name the cache file. The introducer.furl from
`tahoe.cfg` is also used (if any), and it gets some default-ish nickname
like `introducer`.
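A minimal sketch of how the two sources could be merged, assuming the YAML file has already been parsed into a plain dict. The function name `collect_introducers`, the default nickname, and the choice to raise on a nickname collision are all assumptions for illustration; the devchat didn't settle any of them.

```python
def collect_introducers(yaml_config, tahoe_cfg_furl, default_nickname="introducer"):
    """Return {nickname: furl} from the YAML section plus tahoe.cfg."""
    introducers = {}
    for nickname, entry in (yaml_config.get("introducers") or {}).items():
        introducers[nickname] = entry["furl"]
    if tahoe_cfg_furl:
        # Assumption: a clash with the default-ish nickname is an error.
        if default_nickname in introducers:
            raise ValueError("nickname %r is used twice" % (default_nickname,))
        introducers[default_nickname] = tahoe_cfg_furl
    return introducers
```

Since the nickname names the cache file, refusing duplicates keeps two introducers from sharing one cache.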
If someone really convinces us, we could add connection-policy clauses
(described below). But in the devchat we couldn't think of a use case
where you'd want different policies for connecting to different
introducers. And you can already influence the connection by just editing
the FURL to change the hint types.
We also agreed that having different connection policies for the servers
you learned from one introducer versus another was not likely to be a good
idea (as in, "every server I learn from introducer A should get Tor-ified,
but I'll make direct TCP connections to servers I meet through introducer
B"). Especially when dawuud pointed out that you might learn about the same
storage server from two different introducers, in which case you'd have to
figure out how to reconcile their two policies. But if you really wanted
to do this, you could have another policy stanza under each introducer
section, somehow named to make it clear that you're talking about servers
learned *through* the introducer, rather than how you talk to the
introducer itself.
"Introducerless mode" means `tahoe.cfg` has an empty `introducer.furl=`,
and there is no `introducers:` section in the YAML file. "Multi-introducer
mode" means there are two or more introducers among tahoe.cfg and
`introducers:`.
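Those definitions boil down to counting introducers across both files. A rough sketch, assuming the YAML is already parsed into a dict; the function name and the "single-introducer" label for the one-introducer case are made up here:

```python
def introducer_mode(tahoe_cfg_furl, yaml_config):
    """Classify the node by how many introducers are configured in total."""
    count = len(yaml_config.get("introducers") or {})
    if tahoe_cfg_furl:  # an empty `introducer.furl=` counts as none
        count += 1
    if count == 0:
        return "introducerless"
    if count == 1:
        return "single-introducer"  # label is an assumption
    return "multi-introducer"
```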
== Connection Policy ==
Now the global connection policy section. This tells Tahoe what to do when
asked to make connections (to Foolscap connection hints now, later to HTTP
things). For each type of connection hint, the policy file specifies a
"handler" and some arguments. The default values are something like:
{{{
global:
  connection_types:
    tcp:
      handler: tcp
    tor:
      handler: tor
      launch_my_own_tor: true
}}}
(We aren't sure this should be in a "global" section.. maybe the
"connection_types" string should be at the top level. Also we aren't sure
that it should be spelled "connection_types": maybe "connections" or
"foolscap_connections" or something shorter)
The `tcp:` section controls what happens when a FURL wants you to connect
via plain TCP (because the connection hint looks like
`tcp:example.org:1234`). The `handler: tcp` goes to a lookup table that
tells Tahoe to use a foolscap connection handler that uses plain TCP. In
the future, `handler: xyz` could ask the setuptools/twisted/zope.interface
plugin system for a module that has registered itself with the name "xyz"
(and to handle some specific Interface).
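The lookup table could be as simple as a dict from handler name to a factory. This is only a sketch: the factory functions below are hypothetical stand-ins that record their arguments, not real Foolscap connection handlers, and every name is invented for illustration.

```python
# Hypothetical stand-ins: real factories would build Foolscap connection
# handlers; these just return what they were asked to do.
def make_tcp_handler(**kwargs):
    return ("tcp", kwargs)

def make_tor_handler(**kwargs):
    return ("tor", kwargs)

# value of `handler:` in the policy file -> factory
HANDLER_FACTORIES = {
    "tcp": make_tcp_handler,
    "tor": make_tor_handler,
}

def lookup_handler_factory(name):
    """Map a `handler:` name to its factory, rejecting unknown names."""
    try:
        return HANDLER_FACTORIES[name]
    except KeyError:
        raise ValueError("no connection handler registered as %r" % (name,))
```

A plugin system would replace the hard-coded dict with whatever the plugins register, but the name-to-factory lookup would stay the same shape.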
The `tor:` section handles connection hints like `tor:abcxyz.onion:80`.
The `tor` handler would probably live in the Tahoe source tree (for now),
and would try to import txtorcon (and log+ignore if it couldn't be
imported). The keys under `tor:` that aren't "handler" are passed as
keyword arguments into the plugin. In this case, we're telling the
tor-for-foolscap plugin that it's expected to launch its own copy of the
Tor daemon. Another option would be something like
`control_endpoint: tcp:localhost:9051`, to use a pre-running system Tor
daemon.
(question: we originally discussed:
{{{
tor:
  handler: tor
  args:
    control_endpoint: tcp:localhost:9051
}}}
with the extra `args:` because we originally thought `handler:` would be a
fully-qualified import+funcname string, like what setuptools entrypoint
specifications do. In that case, `args` would be passed as keyword
arguments to the referenced callable. But with `handler:` being an index
instead, we could probably be a bit more flexible about how arguments are
passed. Also having the args at the same level as `handler:` is a bit
cleaner. Maybe we should just pass the entire dictionary into the plugin
and let it ignore `handler:` itself)
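To compare the two layouts side by side, here is a sketch of how each stanza would be split into a handler name plus keyword arguments. Both function names are invented, and the stanzas are assumed to be already-parsed dicts.

```python
def split_flat_stanza(stanza):
    """Flat layout: every key except `handler` is a keyword argument."""
    kwargs = dict(stanza)         # copy, so the parsed config is untouched
    name = kwargs.pop("handler")  # KeyError if the stanza names no handler
    return name, kwargs

def split_args_stanza(stanza):
    """Nested layout: keyword arguments live under an `args:` key."""
    return stanza["handler"], dict(stanza.get("args") or {})
```

Both yield the same `(name, kwargs)` pair for equivalent stanzas, so the choice is mostly about how the file reads, not what the plugin receives.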
For nodes that are configured to use Tor instead of plain TCP, there would
be two nearly-identical sections, one for `tcp:` and one for `tor:`, both
specifying the "tor" handler:
{{{
global:
  connection_types:
    tcp:
      handler: tor
      launch_my_own_tor: true
    tor:
      handler: tor
      launch_my_own_tor: true
}}}
Note that any connection hint type *not* listed here would be ignored, to
avoid the possibility of a new future connection type accidentally
violating anonymity. Or maybe we only do that if `anonymous=1` is set.
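A sketch of the "ignore anything not listed" rule, assuming every connection hint carries an explicit type prefix before the first colon (as in `tcp:example.org:1234`). The function names and the `i2p:` hint in the usage example are purely illustrative.

```python
def hint_type(hint):
    """Return the part of a connection hint before the first colon."""
    return hint.split(":", 1)[0]

def usable_hints(hints, connection_types):
    """Keep only hints whose type has a stanza in the policy."""
    return [h for h in hints if hint_type(h) in connection_types]
```

For example, with a policy that lists only `tcp` and `tor`, `usable_hints(["tcp:example.org:1234", "i2p:example.i2p"], policy)` would drop the second hint rather than guess how to reach it.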
We can also imagine:
{{{
tcp:
  handler: socks
  socks_endpoint: tcp:localhost:9050
}}}
== Server Overrides and New Servers ==
The `servers:` clause would serve two purposes. The first is to provide
data on brand new (synthetic) servers, ones that might not be advertised
through any introducer. This would mostly be used for the "introducerless"
mode, but could also be used to augment the regular introducer-based grid
with a private server for just your own client.
The second is to modify data about introducer-advertised servers,
generally their connection policy and/or FURL. The following example would
allow a private TCP-based server on a mostly-Tor grid (where the global
policy says tcp should go to tor).
{{{
servers: # storage servers only, not hypothetical other kinds of servers
  "v0-c2ng2pbrmxmlwpijn3mr72ckk5fmzk6uxf6nhowyosaubrt6y5mq": # does this need quotes?
    nickname: foo # mainly for brand-new-servers, not for overriding
    anonymous-storage-FURL: pb://u33m4y7klhz3bypswqkozwetvabelhxt@tcp:10.1.2.3:51298/eiu2i7p6d6mm4ihmss7ieou5hac3wn6b
    permutation-seed-base32: w2hqnbaa25yw4qgcvghl5psa3srpfgw3 # maybe default to pubkey
    connection_types: # replaces global mapping
      tcp:
        handler: tcp
}}}
The dictionary key is the server identity key (an !Ed25519 public
verifying key, currently used as the main index for introducer
announcements). (we need to experiment to see if YAML can have hyphens or
non-identifier characters in dictionary keys; we'd rather not put quotes
around the string if we can avoid it).
Many of the other keys would replace or augment the data heard from the
introducer. "nickname" is published by servers themselves, but could be
overridden locally, in which case it would behave more like a pet name.
`anonymous-storage-FURL` is the current way that clients connect to
storage servers, although that will change when #666 Accounting happens
(probably to `accounting-storage-FURL` or similar), and `storage-URL` will
happen when we move to an HTTP-based storage protocol.
`permutation-seed-base32` is important for compatibility of share placement
for older
servers, but I think newer servers default to using the !Ed25519 pubkey
for this, so it could maybe be omitted for synthetic servers.
And then the `connection_types` section would control the connection
policy. The idea is that each server (the
`allmydata.storage_client.NativeStorageServer` instance) will use the
connection policy from the `servers[$SERVERID][connection_types]` dict if
that's present, else it will fall back to the `global[connection_types]`
config.
We must be careful to prevent servers from publishing `connection_types:`
themselves. We originally thought of putting the server-provided keys in
one place, and the locally-specified keys in another, like:
{{{
servers:
  v0-c2ng2pbrmxmlwpijn3mr72ckk5fmzk6uxf6nhowyosaubrt6y5mq:
    announcement:
      nickname: foo
      anonymous-storage-FURL: pb://u33m4y7klhz3bypswqkozwetvabelhxt@tcp:10.1.2.3:51298/eiu2i7p6d6mm4ihmss7ieou5hac3wn6b
    connection_types:
      tcp:
        handler: tcp
}}}
but it felt too verbose. Without the separate `announcement:` section, the
code should probably first check for `connection_types`, then overlay the
announcement with all remaining keys:
{{{
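# per-server policy if present, else the global one ("global" is a
# reserved word in Python, so the parsed file is called "config" here)
ct = servers[serverid].get("connection_types",
                           config["global"]["connection_types"])
# overlay the remaining local keys on a copy of the announcement
ann = announcement.copy()
for key, value in servers[serverid].items():
    if key != "connection_types":
        ann[key] = value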
}}}
The introducer announcement includes a lot of additional data that isn't
very useful to override: version strings, sequence numbers, a "nonce" that
I don't even remember the purpose of. While it might be useful to allow
the YAML file to override *any* possible announcement key, it might be
better to throw an error if we see any unknown key, or a key that doesn't
make sense to override.
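The "throw an error on unknown keys" option could look roughly like this. The allowed set is just the keys used in the example above, not a decided list, and the names are invented for the sketch.

```python
# Assumption: only the keys from the example stanza may be set locally.
ALLOWED_OVERRIDE_KEYS = frozenset([
    "nickname",
    "anonymous-storage-FURL",
    "permutation-seed-base32",
    "connection_types",
])

def check_server_overrides(serverid, overrides):
    """Raise ValueError if a `servers:` stanza uses an unexpected key."""
    unknown = sorted(set(overrides) - ALLOWED_OVERRIDE_KEYS)
    if unknown:
        raise ValueError("server %s: cannot override %s"
                         % (serverid, ", ".join(unknown)))
```

Failing loudly at startup seems safer than silently accepting a typo'd key that then has no effect.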
Finally, note that the Introducer was designed to let you publish more
than just storage servers: each announcement includes a `service-name`
field, and we originally planned to advertise Helpers, repairers, and even
extra introducers through these announcements. The client only subscribes
to `storage`, and we don't have any code to publish anything else. For
brevity, we'll probably declare that the YAML file's `servers:` section is
only referring to *storage* servers. If some future version of tahoe adds
a new thing that we really want to call a "server" but which isn't a
*storage* server, we'll have to come up with a new YAML section to
describe it. Or, we could change the YAML section from `servers:` to
`storage:` or `storage-servers:` or something.
== Accidentally Introducing A New Config-File Format ==
We know that we'd like to eventually move away from the INI-format
`tahoe.cfg` file, for two reasons. The first is that it doesn't really
support structured data (the original multi-introducer syntax would have
needed lines like `introducer.furl1=` and `introducer.furl2=`, since INI
doesn't have lists). The second is that it'd be nice to safely machine-
edit the config file.
In Petmail, I've been experimenting with keeping *all* state in a SQLite
database (config, runtime updates, user messages, everything). Config
changes are done with CLI tools, or a web-based wizard-thing that gets to
write changes as you click the boxes. While it's sad to not be able to
point emacs at the file, you win transactional changes, schema
enforcement, and no data-killing race conditions between user edits and
program changes.
We talked a week or two ago about a `tahoe configure` command which could
ask you some questions ("should we offer storage? you have X GB, how much
should I use?"), walk you through the setup process, examine your network
situation ("it looks like you're behind NAT. Do you know what port-
forwarding is? Let me test your setup. Great, we can be reached from the
outside, enabling server."), sanity-check the config, then start the node.
To re-run this tool inside a running node, we'd need the ability to safely
modify the config file after startup.
So I just want to avoid accidentally moving us to a YAML-based config
file, if maybe we should be deliberately moving to an SQLite-based one.
This YAML syntax is kind of nice; I can imagine moving other tahoe.cfg
items over to it (eventually deprecating tahoe.cfg and automatically
converting it to YAML at startup). But if we're going to do that, we
should probably be intentional about it.
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2788#comment:7>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage