[tahoe-dev] accounting: reachability-graph

Brian Warner warner at lothar.com
Sun Jun 10 19:48:15 UTC 2012


I'm planning out the next step of the Accounting work, with an eye
towards how users are going to control and monitor everything. I wanted
to summarize a scheme that came out of discussions with Zooko and
David-Sarah back at the first Tahoe Summit, and which is inching closer
to implementability.

## Outputs

The basic decision that each client node needs to make is which servers
to send shares to. You only have so much bandwidth, and (if you're
paying for storage space) only so much money/credit/etc to spend on
space, so you want to avoid sending shares to unreliable servers.
Whatever the criteria, the output of this function is a list of servers
that should be included in the share-placement algorithm.

The corresponding decision that server nodes need to make is which
clients to accept shares from. Some sort of accounting relationship
exists, in which the server can keep track of how much space each client
is using, and then cut them off when they exceed what they've paid for
(or provided in trade, etc). Unknown clients, with whom no agreement has
been made, need to be excluded to prevent uncontrolled consumption.

## Basic (manual) UI

We can always offer manual control over these two lists. I'm thinking of
holding the raw data in a SQLite database (NODEDIR/private/grid.db), or
perhaps some human-readable (but machine-editable) format, then offering
commands like:

 tahoe admin set-server SERVERID (use,ignore,require)
 tahoe admin set-client CLIENTID (allow-read,allow-write,deny)
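For concreteness, here is a sketch of what grid.db might hold and what
"tahoe admin set-server" might record. The table and column names are my
illustration only, not a settled schema:

```python
import sqlite3

# Hypothetical schema for NODEDIR/private/grid.db -- names are
# illustrative, not a settled design.
db = sqlite3.connect(":memory:")  # stands in for NODEDIR/private/grid.db
db.executescript("""
CREATE TABLE servers (
  serverid TEXT PRIMARY KEY,   -- pubkey-based server identifier
  policy   TEXT NOT NULL       -- 'use' | 'ignore' | 'require'
);
CREATE TABLE clients (
  clientid   TEXT PRIMARY KEY, -- pubkey-based client identifier
  policy     TEXT NOT NULL,    -- 'allow-read' | 'allow-write' | 'deny'
  space_used INTEGER DEFAULT 0 -- bytes of shares currently held
);
""")

def set_server(serverid, policy):
    # roughly what "tahoe admin set-server SERVERID POLICY" would do
    db.execute("INSERT OR REPLACE INTO servers VALUES (?, ?)",
               (serverid, policy))

set_server("v0-abc123", "use")
policy = db.execute("SELECT policy FROM servers WHERE serverid=?",
                    ("v0-abc123",)).fetchone()[0]
print(policy)  # -> use
```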

I'm also thinking that the WUI can show status (which clients/servers
are known, which are accepted, how much space clients are using, etc),
but to keep it read-only for now. Later, maybe, we can enhance it to
safely let folks click on web buttons to configure their node, but we
need some better web-security infrastructure before we can do that
safely. (One task is to split the control port from the download port,
so HTML+JS in downloaded files can't take over the control port. We also
need some non-cookie secure-token-based web login mechanism; I have a
prototype in https://github.com/warner/toolbed that seems promising.)

## Common Use Patterns

But manually asking all grid members to approve all new client/server
nodes is a drag, and a bunch will probably forget. So Tahoe should
provide easy support for at least two common use cases:

* friendnet: all members run client+server nodes, all clients can+will
  use all servers. Any member can invite a new member to join the grid,
  and they'll get full access to the whole grid, but the invitation is
  public so people know who to blame if the new member is a jerk.

* AllMyData: nodes are split into two categories: client-only
  (customers) and server-only (company-run storage). Customers should
  not send shares to other customers (e.g. client nodes that are
  misconfigured to provide storage service).

In both cases, neither clients nor servers should require
reconfiguration to accommodate new nodes being added to the grid.

## Invitation UI

In these two use cases, when adding a new node to an existing grid, the
primary action is for two specific nodes to get connected. Some node
Alice (who is already a member of the grid) establishes a connection
with some other node Bob (who is not yet a member). For the friendnet,
Alice is a full member, and Bob is the new guy. For the AllMyData case,
Alice is a storage-server-like "Storage Manager" node hosted by the
company, and Bob is a new customer.

My plan is to build an Invitation protocol, using a baked-in relay
server (e.g. relay.tahoe-lafs.org), through which clients can find each
other using only short identifiers. Then, Alice can do:

 tahoe admin invite [(--storage-only,--client-only)] Bob

and gets back an "Invitation Code" like "iqa3wihm2bwtpewpepohz644kg4".
(The "Bob" argument is Alice's pet name for Bob, and can be used later
to reference the node that accepts the invitation.) She gives this to
Bob (via email, IM, postcard, carrier pigeon). Bob then accepts the
invitation (using "Alice" as his pet name for Alice) with:

 tahoe admin accept Alice iqa3wihm2bwtpewpepohz644kg4

In the AllMyData case, the Storage Manager node (which is a regular
storage server) would be instructed to emit --client-only invitations to
new customers.

Note: eventually this is intended to replace the centralized Introducer.
As long as all nodes can connect enough to create a broadcast channel,
we can set up both network reachability data (IP address, port number)
and storage-authority information with the same mechanism. When this
works, you'll create a tahoe node with "tahoe create-node; tahoe start;
tahoe admin accept", instead of needing somebody to create an Introducer
and give you the introducer.furl.

I'll be writing another email about the invitation protocol. The
important aspect for this discussion is that it results in both nodes
acquiring each other's public key. In this case, Alice will get a
(petname=Bob, key=XYZ, what-I-invited-them-to-do=storage/client/both)
record, and Bob gets a (petname=Alice, key=ABC,
what-they-invited-me-to-do=storage/client/both) record.
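Those two records could be sketched as a simple data structure (the
field names and role labels are illustrative, not a wire format):

```python
from collections import namedtuple

# Sketch of the per-peer record each side holds after an invitation
# completes. Field names here are my invention, not a wire format.
PeerRecord = namedtuple("PeerRecord", ["petname", "pubkey", "roles"])

# Alice issued an unrestricted invitation, so each side records both
# roles; a --client-only invitation would leave only "client" in
# Alice's record for Bob.
alice_view_of_bob = PeerRecord(petname="Bob", pubkey="XYZ",
                               roles={"storage", "client"})
bob_view_of_alice = PeerRecord(petname="Alice", pubkey="ABC",
                               roles={"storage", "client"})

print(alice_view_of_bob.petname)  # -> Bob
```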

## Recommendation Records, Reachability Graphs

Here's a proposal to make these two use cases work smoothly:

* there are two types of Recommendation Records: "send-shares-to" and
  "accept-shares-from"

* a full Invitation results in each side publishing two Recommendation
  Records (total of four):
  - Alice says "I recommend all clients send shares to Bob"
  - Alice says "I recommend all servers accept shares from Bob"
  - Bob says "I recommend all clients send shares to Alice"
  - Bob says "I recommend all servers accept shares from Alice"
* a --client-only invitation results in one record each:
  - Alice says "I recommend all servers accept shares from Bob"
  - Bob says "I recommend all clients send shares to Alice"
* nodes are identified by their pubkey

* all Recommendation Records are broadcast to all clients and servers
* all Tahoe nodes take the set of broadcast records and construct two
  directed graphs: one for send-shares-to, one for accept-shares-from.
  Each graph node is a Tahoe node, each graph edge is a Recommendation
  Record of the corresponding type.
* the upload/download engine (specifically the StorageFarmBroker) will,
  unless overridden by the user, use all storage nodes that are
  reachable through the send-shares-to graph:
    # flood-fill: breadth-first traversal outward from our own node
    reachable = set()
    newnodes = set([me])
    while newnodes:
      fromnode = newnodes.pop()
      reachable.add(fromnode)
      for tonode in fromnode.edges:
        if tonode not in reachable:
          newnodes.add(tonode)
* the storage server will (unless overridden) accept shares from all
  clients whose nodes are reachable through the accept-shares-from graph
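A minimal, runnable version of the two steps above (building a directed
graph from broadcast Recommendation Records, then flooding outward from
our own node) might look like this; the record tuples and key names are
illustrative, not Tahoe's actual formats:

```python
# Each record is (recommender, record-type, subject); in practice these
# would be pubkeys and signed announcements, but strings suffice here.
records = [
    ("alice", "send-shares-to", "bob"),
    ("alice", "accept-shares-from", "bob"),
    ("bob", "send-shares-to", "carol"),
]

def build_graph(records, rectype):
    # one digraph per record type: recommender -> set of subjects
    graph = {}
    for recommender, rtype, subject in records:
        if rtype == rectype:
            graph.setdefault(recommender, set()).add(subject)
    return graph

def reachable_from(graph, me):
    # same flood-fill as the pseudocode above
    reachable = set()
    newnodes = {me}
    while newnodes:
        fromnode = newnodes.pop()
        reachable.add(fromnode)
        for tonode in graph.get(fromnode, ()):
            if tonode not in reachable:
                newnodes.add(tonode)
    return reachable

g = build_graph(records, "send-shares-to")
print(sorted(reachable_from(g, "alice")))  # -> ['alice', 'bob', 'carol']
```

Note that reachability is transitive: alice never directly recommended
carol, but reaches her through bob, which is exactly the friendnet
invite-chain behavior described above.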

The upshot is that, when Alice invites Bob to join her friendnet, Bob
will use (and will be allowed to use) any servers that Alice already
uses. If Alice is an AllMyData "storage manager" node and issues a
--client-only invitation instead, Bob will get to use all the storage
nodes that Alice can use (which would be "all the AllMyData servers"),
but won't acquire any send-shares-to power, keeping him isolated as a
client-only node.

We'll have viewer/status tools on the web-ui that show the reachability
graphs and let you see what path connects you with other nodes (using
d3.js like the example on http://bl.ocks.org/1377729).

## Open Issues

* should Recommendation Records include petnames? This would enable
  nodes to be identified with petname-chains like "your friend Alice's
  friend Bob's friend Carol". OTOH, petnames are sometimes considered
  secret. Having a separate property for secret notes might be useful.

* this scheme defines a grid as "anyone reachable through Invitations"
  rather than having some explicit (cryptographic) "Grid ID". That makes
  it hard to build filecaps that include a grid-id, so they can be
  meaningful outside their home grid. Is this a problem? Is there some
  other scheme we could use? (maybe create a new grid keypair, share the
  private key among all clients in the grid).

* we must decide about revocability of the Recommendations. From a UI
  point of view, revoking access may transitively disrupt lots of other
  nodes, so it needs some feedback to indicate the consequences. There
  will be cases where you want to kick out a node but retain their
  invitees, so that needs to be easy. From a code POV, we must decide
  whether Recommendations expire (and must be renewed), or if
  replacements are good enough (assuming that a gossip-based broadcast
  is hard to stifle), and build in some sequence-number mechanism.

* publishing a set of Records via the #466 signed-Announcement system
  (see my other email on "grid-control" announcements) would not be as
  fine-grained as sending individual records. This means slightly more
  traffic in a selective-flood/gossip protocol. Is that ok?

* once operators exercise their power to selectively deny server- or
  client- access, the use of graph-reachability makes it possible for
  weird situations to exist, like two different clients having access
  to overlapping-but-not-identical sets of servers. This is bad for file
  storage and makes filecaps even more weirdly scoped. Simple bi-cliques
  are less weird.


let me know your thoughts!
 -Brian
