"Grid Manager" agent: design ideas

Brian Warner warner at lothar.com
Thu May 11 19:16:48 UTC 2017


At the last few devchats, we've discussed what would be the next
most-useful step in the Accounting space, and we're kind of settling on
the idea of a "Grid Manager" agent. It would work something like this:

* Current grids are "unmanaged": clients use whatever storage servers
  the introducer tells them about, and storage servers accept shares
  from any client that can connect to them. The only access control is
  to keep the introducer FURL secret (or clients can override the
  introducer's announcements, to use additional servers).

* "Managed grids" are controlled by a Grid Manager, which is a special
  kind of node that decides which servers and which clients can be part
  of the grid. Whoever runs the Grid Manager is the Grid Manager Admin,
  and they make the decisions for everyone else.

* Both clients and servers are added to a managed grid by using a
  magic-wormhole-style "invitation code". This would be based on the
  work that Chris Wood and the LAE team are doing with GridSync. The CLI
  commands might be something like "tahoe-grid-manager invite client"
  and "tahoe grid accept".

* The invitation process exchanges public keys (Ed25519 verifying keys):
  the client/server learns the manager's pubkey, and the manager learns
  the client/server's node identity pubkey. In addition, the
  client/server learns the introducer.furl (or maybe multiple ones for
  redundancy).
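
  To make that concrete, here's a rough sketch of the two payloads that
  might travel over the wormhole channel. Every field name here is made
  up; the real format would be whatever the GridSync work settles on.

    import json

    # what the manager's half of the invitation might carry
    manager_to_node = {
        "grid-manager-pubkey": "pub-v0-...",   # manager's Ed25519 key
        "introducer-furls": ["pb://tubid@host:port/swissnum"],
        "role": "storage-server",              # or "client"
    }

    # what the joining node sends back
    node_to_manager = {
        "node-pubkey": "pub-v0-...",           # node identity verifying key
        "nickname": "alice-laptop",
    }

    wire = json.dumps(manager_to_node).encode("utf-8")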

* Clients are only going to pay attention to servers that are authorized
  by the manager. We have some options for this:

  1: The manager publishes a signed list of all blessed storage server
     IDs in a new kind of introducer announcement. Clients subscribe to
     this, and use it as a filter on the normal server announcements
     that they hear.
  2: The manager gives a signed certificate to each server, saying "to
     whom it may concern: server X is cool. love, The Manager". Servers
     include this certificate in their announcements, and clients only
     pay attention to the ones with a valid certificate.
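
  Option 2 might look roughly like this (a minimal sketch using the
  'cryptography' package; the statement fields and certificate layout
  are placeholders, not a real format):

    import base64
    import json
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
    )

    manager_key = Ed25519PrivateKey.generate()  # held by the grid manager
    manager_pub = manager_key.public_key()      # learned at invitation time

    # "to whom it may concern: server X is cool. love, The Manager"
    statement = json.dumps({"version": 1,
                            "server-pubkey": "pub-v0-..."}).encode()
    certificate = {
        "statement": base64.b64encode(statement).decode("ascii"),
        "signature": base64.b64encode(
            manager_key.sign(statement)).decode("ascii"),
    }
    # the server attaches `certificate` to its announcements

    def client_accepts(cert, manager_pub):
        stmt = base64.b64decode(cert["statement"])
        sig = base64.b64decode(cert["signature"])
        try:
            manager_pub.verify(sig, stmt)  # raises if not from the manager
        except InvalidSignature:
            return False
        return True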

* Servers only accept requests from authorized clients. We've got the
  same options.

* The grid manager might also publish recommended values for k/H/N,
  since it knows how big the grid is supposed to be.
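
  If we do this, the announcement body could be as small as the
  following (field names hypothetical; 3/7/10 are just today's defaults
  for shares.needed / shares.happy / shares.total):

    recommended_params = {"k": 3, "happy": 7, "N": 10}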

* Payment: nothing in this deals with payment. The human who runs the
  Grid Manager may choose servers on the basis of cost or reliability,
  and they may charge clients for service, but nothing in the Grid
  Manager code or protocol knows about that. Servers might bill the grid
  admin at the end of the month, or ahead of time, but that will happen
  out-of-band for now.

* Usage/Quotas: to support servers deciding *how much* to bill the grid
  manager, and to let the grid manager know how much to bill each user,
  servers in this scheme *will* keep track of per-client usage, and will
  deliver a machine-readable report to the grid manager. The manager
  will aggregate these reports and display a per-client usage table to
  the admin (via some grid-manager UI, probably web-based).
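
  One possible shape for that report, just to make the discussion
  concrete (every field name here is hypothetical; the real schema is
  part of the design work):

    usage_report = {
        "server-pubkey": "pub-v0-...",
        "period": {"start": "2017-05-01", "end": "2017-05-31"},
        "per-client-usage": {           # bytes of shares held per client
            "pub-v0-alice...": 2 * 10**9,
            "pub-v0-bob...": 5 * 10**8,
        },
    }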

* Price Lists: We might include something like a price list here, so the
  server->manager report could say "Alice is using 2GB, and I charge
  $0.03/GB*month", and then the grid manager can add all that up and say
  "hey admin, based on the price you told me, you should charge Alice
  $0.06". The actual billing would be out-of-band. The idea is to
  provide enough information to correlate the server's out-of-band bill
  to the manager with the manager's out-of-band bill to the client. But
  maybe we just show bytes everywhere and let the humans decide how to
  translate that into money.
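
  The aggregation itself is trivial; this sketch just reproduces the
  arithmetic from the example above (nothing here is a real API, and
  the report fields are made up):

    GB = 10**9
    reports = [
        {"rate-per-gb-month": 0.03,
         "per-client-usage": {"alice": 2 * GB}},
    ]

    totals = {}
    for report in reports:
        for client, used in report["per-client-usage"].items():
            cost = (used / GB) * report["rate-per-gb-month"]
            totals[client] = totals.get(client, 0.0) + cost

    print(totals)  # {'alice': 0.06} -> "you should charge Alice $0.06"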

Ideally, we'd provide a couple of pre-built Grid Manager modes for
common use cases:

* friendnet: somebody volunteers to run the Grid Manager, they're
  responsible for inviting the right people. They get reports of usage
  but no money ever trades hands. If they go away, somebody else starts
  a new manager and everyone switches over, hopefully keeping the same
  set of storage nodes.

* cloud-storage backend: for individuals who like Tahoe's encryption but
  want to use e.g. S3 for the backend. They can run a Grid Manager and
  give it their AWS credentials, and the manager will configure and
  launch an S3-backed storage server. The user has to pay their AWS
  bill, and they can get a report about how much space each client is
  using. (It's not clear whether the server would run in EC2, in the
  same process as the grid manager, or maybe even both.)

* commercial grid provider (S4): someone like LAE or Matador could run
  the grid manager and add servers of their choice to it. Signing up
  with them would get you an invitation code for your clients. The
  reporting would give them enough information to know how much to bill
  you each month. This looks just like the previous case, except that
  you'd write more code around it to automate the billing.

* other needs could be handled by extending or wrapping the Grid Manager
  code.

This "Grid Manager" approach is an alternative to some of the other
ideas we've discussed, hopefully easier to implement:

* clients read from some large "yellow pages" of servers, automatically
  choose servers on the basis of price and crowd-sourced reputation
  data, then make direct BTC/ETH/ZEC/etc payments to each.

* clients (i.e. their human admins) exchange invitations directly with
  (human admins of) servers, maybe with payment involved

The general idea is that reliable long-term storage wants to mostly use
the same set of servers for long periods of time (the set must be *able*
to change, as servers are retired, but we don't want it to change with
  every upload). And scaling requires hierarchy: we don't want every
  client to have a relationship with every server in the universe. So
  clients mostly only know about their grid manager, and the grid
  manager chooses a mostly-stable set of servers. All your uploads go
to that set. When the human that runs a client first sets it up, they
may look at a yellow pages of grid managers and choose one based upon
price and reputation, but then the (machine) client is explicitly told
which one to use.

There's still a bunch of design space to figure out:

* Publishing a signed list of authorized clients/servers via the
  introducer would reveal this data to more people than really need it.
  Client 1 doesn't need to know that client 2 is on the list. And we
  generally want the Introducer to be less powerful than it is now.

* We could gather an *encryption* key from each client/server, and then
  the grid manager could encrypt this list to just the folks who need
  it. Or the manager could publish a FURL for a subscription port: these
  nodes could connect to it, prove their identity by signing a
  challenge, then be allowed to fetch the current list.
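
  The challenge step of that subscription port could be as simple as
  this (just the verification logic, not a real Foolscap interface;
  again using the 'cryptography' package):

    import os
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PublicKey,
    )

    def issue_challenge():
        return os.urandom(32)  # sent to the connecting node

    def verify_and_fetch(challenge, signature, node_pub_bytes,
                         authorized_pubkeys, current_list):
        # authorized_pubkeys: raw Ed25519 keys learned at invitation time
        if node_pub_bytes not in authorized_pubkeys:
            return None
        pub = Ed25519PublicKey.from_public_bytes(node_pub_bytes)
        try:
            pub.verify(signature, challenge)
        except InvalidSignature:
            return None
        return current_list  # the membership list they asked for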

* Servers need a way to deliver the "usage report" to the grid manager.
  This could go to a FURL published by the manager, and servers could
  sign each report. Or servers could advertise a FURL from which the
  report could be fetched, but then do something to keep it private.
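
  If reports are pushed to a manager FURL, the server could wrap each
  one in a signed envelope, something like this (hypothetical layout,
  reusing the node's Ed25519 signing key):

    import base64
    import json

    def signed_report(report, server_key):  # server_key: Ed25519PrivateKey
        body = json.dumps(report, sort_keys=True).encode("utf-8")
        return {
            "report": base64.b64encode(body).decode("ascii"),
            "signature": base64.b64encode(
                server_key.sign(body)).decode("ascii"),
        }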

* Should clients be able to fetch their own usage report from each
  server? Probably, but that needs some more API.

* Revocation: when the grid manager decides to remove a server or
  client, how does everyone else find out? How quickly does it
  propagate, and how do connectivity failures (accidental or deliberate)
  affect it?

  1: If the introducer merely publishes a list (encrypted or not), with
     a sequence number (highest seqnum wins), then a failure of the
     introducer or the grid manager just freezes the membership until
     both are running again and a new list can go out (just like the
     current storage server announcements work). If the manager drops
     offline, but the introducer still retains the latest announcement,
     then clients and servers can be bounced and still get the
     authorization list.

  2: If servers connect to a manager FURL and subscribe to the client
     list, then an offline manager prevents bounced servers from
     learning the current list (unless we cache the list on the server,
     in which case we need to think about expiration)

  3: If the manager delivers certificates to clients/servers, we need to
     think about expiration and renewal, and we need a channel for those
     deliveries. The manager could publish a FURL, nodes could prove
     their identity with signed challenges, and then they could
     subscribe to get fresh certs. This would add another persistent TCP
     connection. Expiration time would dictate how long good usage
     continues once the manager goes down, also how long bad usage could
     continue if the manager were DoSsed offline, and finally how much
     network traffic is added for all the renewals.
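
  Two tiny sketches of the mechanics involved (field names are
  hypothetical): for option 1, a replacement list is only accepted if
  its sequence number is higher; for option 3, a certificate simply
  ages out.

    import time

    def accept_new_list(announcement, cached):
        # option 1: highest seqnum wins
        return announcement["seqnum"] > cached["seqnum"]

    def certificate_is_fresh(cert, now=None):
        # option 3: the expiration time bounds how long good usage
        # survives a manager outage, and how long a revoked node keeps
        # working
        now = time.time() if now is None else now
        return cert["expires-at"] > now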


This will also tie in to the "federated inter-grid access" scheme we're
working on. I'll write more in a later email, but the basic idea is that
all your uploads go to your local grid, but you can download files from
other grids. This might involve a "clearinghouse", where your grid
manager signs up for inter-grid access. Requests for "foreign" filecaps
would go through the manager, who would look up the remote gateway for
that filecap and fetch the erasure-decoded ciphertext for you (maybe
charging you an extra fee). Filecaps might be augmented with a grid-id
(like the area code on a phone number), or might be looked up in a big
table managed by the clearinghouse. More details later.


thoughts?
 -Brian

