[tahoe-lafs-trac-stream] [Tahoe-LAFS] #1310: separate "gateway state directory" from "client state directory"

Mon Sep 22 21:40:20 UTC 2014

-----------------------------------+-----------------------
     Reporter:  zooko              |      Owner:  warner
         Type:  defect             |     Status:  reopened
     Priority:  major              |  Milestone:  undecided
    Component:  code-frontend-cli  |    Version:  1.8.1
   Resolution:                     |   Keywords:  usability
Launchpad Bug:                     |
-----------------------------------+-----------------------

Comment (by warner):

 (circling back to this ticket thanks to zooko's link from #2045, which is
 about larger-scale changes to the node's and code's directory layout)

 Rereading zooko's initial issue, I found myself tempted to yell out "don't
 do that!". I guess I've always optimized tahoe's frontend- and setup-
 management tools for the common case of a single "gateway" per
 (user*computer) tuple. I really want the instructions to be as simple as
 "tahoe create; tahoe start; tahoe webopen". I don't want to complicate
 that for the sake of the less-common use case of multiple
 nodes/gateways/clients/whatevers.

 Partly that indicates a lack of universality in our design (which we've
 always known about, and always regretted, but also know better than to try
 and fix, because it's very hard, certainly distracting, probably
 confusing, and slightly impossible). There's no "one true grid" (#2009).
 Grids can't be too big (#235, #444), increasing the demand for using
 multiple ones, in particular if you want to use tahoe to share files with
 other people. There are no "grid identifiers" in filecaps (#403), so if
 you want to use multiple distinct grids, you need multiple distinct client
 nodes ("gateways" in zooko's lexicon) and must be careful to give the
 right filecap to the right node (WAPI port / gateway process).
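
 Because filecaps carry no grid identifier, keeping track of which cap
 belongs to which grid falls on the user. A minimal sketch of that
 bookkeeping, assuming a hypothetical per-user table `GATEWAYS` that maps a
 grid nickname to the WAPI base URL of that grid's client node. The
 `/uri/$FILECAP` path is the web API's read path; the table, nicknames,
 ports and helper name are illustrative only:

```python
from urllib.parse import quote

# Hypothetical per-user table: grid nickname -> WAPI base URL of the client
# node ("gateway") attached to that grid. Tahoe itself keeps no such table.
GATEWAYS = {
    "public-test": "http://127.0.0.1:3456/",
    "friends": "http://127.0.0.1:3457/",
}


def wapi_url_for(grid: str, filecap: str) -> str:
    """Build the WAPI read URL for `filecap` on the named grid's gateway.

    Nothing in the filecap says which grid it belongs to, so the caller must
    supply `grid`. A gateway handed a cap from another grid cannot tell it is
    foreign; the request would typically just fail to find any shares.
    """
    base = GATEWAYS[grid]  # KeyError for an unknown grid nickname
    return base.rstrip("/") + "/uri/" + quote(filecap, safe=":")
```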

 A side-note on terminology mismatches: I think (and talk) about tahoe in
 terms of three pieces:

 * 1: frontends (CLI scripts, web browsers, FTP/SFTP clients), all
   talking over a network connection (WAPI or others) to the client node
 * 2: client nodes, which respond to WAPI requests, perform the
   upload/download/encode/decode algorithms and make connections to server
   nodes
 * 3: server nodes, which are (so far) agnostic about file-encoding
   formats and just respond to PUT/GET-share requests from client nodes

 I think Zooko thinks/talks in the same three pieces but with different
 names:

 * 1: clients (CLI scripts, web browsers, FTP/SFTP clients)
 * 2: gateways
 * 3: servers

 A related angle is the imperfect distinction between the functions
 performed by pieces 1 and 2. `tahoe backup` is a good example: this is
 currently a CLI command, but I feel that it should really be moved into
 the client node (#1018). Backup is more of an ongoing process than a one-
 off action (#643). A one-shot CLI command needs to be run from cron to
 make it into a process, and then it doesn't have enough information to
 coordinate with other (overlapping) runs (#2062, #2053). I'd like to have
 backups be managed through some sort of control panel (#1588, #1587),
 where you can express your priorities and preferences about what you want
 to be backed up and how much network/CPU it's allowed to consume, and then
 the backup agent handles the rest. This control panel should also be a
 place to check in on the process, especially for progress reports during
 the long initial upload.
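
 To illustrate why a single long-running agent can coordinate better than
 cron-launched one-shot commands, here is a minimal sketch in which one
 agent object owns every backup pass: overlapping requests are coalesced
 into a single follow-up pass instead of starting a concurrent upload, and
 progress can be polled at any time. `BackupAgent`, its methods, and the
 coalescing policy are hypothetical, not the design from #1018 or #643:

```python
import threading
from typing import Callable, Optional


class BackupAgent:
    """Hypothetical long-running backup agent (illustration only).

    One process owns every run, so it can see the overlap that separate
    cron-launched `tahoe backup` processes cannot.
    """

    def __init__(self, run_backup: Callable[["BackupAgent"], None]) -> None:
        self._run_backup = run_backup  # performs one full backup pass
        self._lock = threading.Lock()
        self._running = False
        self._pending = False
        # 0.0..1.0 while a pass runs (run_backup may update it); None when
        # idle. This is the kind of value a control panel would poll.
        self.progress: Optional[float] = None

    def request_backup(self) -> bool:
        """Run a pass now (returns True), or, if one is already running,
        note that another pass is wanted afterwards and return False."""
        with self._lock:
            if self._running:
                self._pending = True  # many overlapping requests -> one rerun
                return False
            self._running = True
        self._loop()
        return True

    def _loop(self) -> None:
        while True:
            self.progress = 0.0
            try:
                self._run_backup(self)
            except BaseException:
                # Reset state so a failed pass does not wedge the agent.
                with self._lock:
                    self._running = self._pending = False
                    self.progress = None
                raise
            with self._lock:
                if not self._pending:
                    self._running = False
                    self.progress = None
                    return
                self._pending = False  # consume the coalesced request
```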

 There are two big blockers for this sort of long-running agent. The first
 is how/whether to split this from the long-running code that knows how to
 upload/download files. We had a good [wiki:Summit2Day1#AgentGatewaysplit
 discussion] about this at the [wiki:Summit2011 2011 summit]. Zooko's
 mental model, which uses the word "gateway", helps make this a bit more
 clear: my desired backup-manager process would live in an "Agent", and the
 upload/download stuff would live in the "Gateway", and maybe the Agent
 would use the Gateway but not vice-versa. Depending upon the value of
 caching server connections and generally having a long-term relationship
 with servers (tracking uptime/speed/reliability), it might even make sense
 for the Gateway functionality to *not* live in a long-term process, and
 instead be a short-lifetime library that gets loaded on demand (imagine if
 "tahoe put" were standalone, maybe learning about cached server
 information from a sqlite database, but establishing its own server
 connections as necessary).
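
 A Gateway that is only a short-lifetime library still needs to persist
 what it learns about servers between invocations. A minimal sketch,
 assuming a hypothetical sqlite table keyed by server ID that records the
 last-known connection hint and observed success/failure counts; the
 schema and helper names are illustrative, not an existing Tahoe-LAFS
 database:

```python
import sqlite3
import time

# Hypothetical schema: Tahoe-LAFS ships no such table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS servers (
    server_id TEXT PRIMARY KEY,
    furl      TEXT NOT NULL,              -- last-known connection hint
    last_seen REAL NOT NULL,              -- unix time of last contact attempt
    successes INTEGER NOT NULL DEFAULT 0,
    failures  INTEGER NOT NULL DEFAULT 0
)
"""


def open_cache(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_contact(conn: sqlite3.Connection, server_id: str, furl: str, ok: bool) -> None:
    """Update the cache after a standalone run has tried to reach a server."""
    now = time.time()
    # Insert-then-update works on any SQLite version (no UPSERT needed).
    conn.execute(
        "INSERT OR IGNORE INTO servers (server_id, furl, last_seen) VALUES (?, ?, ?)",
        (server_id, furl, now),
    )
    conn.execute(
        "UPDATE servers SET furl = ?, last_seen = ?, "
        "successes = successes + ?, failures = failures + ? WHERE server_id = ?",
        (furl, now, int(ok), int(not ok), server_id),
    )
    conn.commit()


def preferred_servers(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Servers a fresh process should try first: most reliable, then most recent."""
    return conn.execute(
        "SELECT server_id, furl FROM servers "
        "ORDER BY CAST(successes AS REAL) / (successes + failures) DESC, "
        "last_seen DESC LIMIT ?",
        (limit,),
    ).fetchall()
```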

 The other is how to safely talk to this agent, honoring our objcap no-
 ambient-authority style: some sort of restricted-access web-based control
 panel (#674, wiki:Summit2Day2#ControlPanel) which I've been prototyping
 externally in my "toolbed" and "petmail" projects (but it's very JS-
 heavy).
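
 In the objcap style, the authority to drive the control panel comes from
 holding an unguessable URL, not from ambient facts such as "this request
 came from localhost". A minimal sketch of that generic capability-URL
 pattern using only the standard library, assuming a hypothetical
 `/control/<token>` path; it is not the toolbed/petmail prototype or the
 design in #674:

```python
import hmac
import secrets


def new_control_token() -> str:
    """Unguessable token: possessing a URL that contains it *is* the authority."""
    return secrets.token_urlsafe(32)


def control_url(base: str, token: str) -> str:
    # Hypothetical path layout for the control panel.
    return f"{base.rstrip('/')}/control/{token}"


def authorized(request_path: str, token: str) -> bool:
    """Check a request path against the token in constant time."""
    prefix = "/control/"
    if not request_path.startswith(prefix):
        return False
    presented = request_path[len(prefix):].split("/", 1)[0]
    return hmac.compare_digest(presented.encode(), token.encode())
```

 The token would be stored somewhere only the owning user can read, so a
 process that cannot read it has no way to drive the agent, regardless of
 which machine or port it connects from.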

 Anyway, that was a long diversion away from the main point: the use of a
 single NODEDIR to manage the states and configurations of all these pieces
 (client-ish stuff, agent-ish stuff, gateway-ish stuff, heck even server-
 ish stuff) is ideal for one-grid cases, and confusing for multiple-grid
 cases.

 I'm warming slightly to the `--cli-directory=` idea. Maybe we could split
 these different bits of functionality into separate subdirs, put all of
 them in the single NODEDIR by default, but make it clear that e.g. CLI
 commands only touch stuff in NODEDIR/cli/*. Then make it possible to
 override either the top-level `--nodedir=` or a CLI-specific
 `--cli-directory=`.
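
 The proposed lookup order might look like the following, assuming CLI
 state defaults to `NODEDIR/cli/` and an explicit `--cli-directory=` wins
 over anything derived from `--nodedir=`. `--cli-directory=` is only a
 proposal in this ticket; the argparse wiring and the `~/.tahoe` default
 are illustrative assumptions, not the current CLI code:

```python
import argparse
import os
from typing import Optional, Sequence


def resolve_cli_directory(argv: Optional[Sequence[str]] = None) -> str:
    """Return the directory CLI commands should read and write.

    Hypothetical precedence:
      --cli-directory=DIR  >  --nodedir=NODEDIR -> NODEDIR/cli  >  ~/.tahoe/cli
    """
    parser = argparse.ArgumentParser(prog="tahoe")
    parser.add_argument("--nodedir", default=os.path.expanduser("~/.tahoe"))
    parser.add_argument("--cli-directory", default=None)  # proposed flag
    # Leave the subcommand and its arguments (e.g. "ls", "put") untouched.
    args, _rest = parser.parse_known_args(argv)
    if args.cli_directory is not None:
        return os.path.abspath(os.path.expanduser(args.cli_directory))
    nodedir = os.path.abspath(os.path.expanduser(args.nodedir))
    return os.path.join(nodedir, "cli")
```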

 It's worth remembering the overlap between these components, though. The
 gateway writes out a `node.url` file, the frontend commands read it.
 Control panels will involve access keys being read from or written to
 agent-accessible databases. We might be able to statically construct enough of
 this that we don't need to think about ongoing config-directory-based
 communication between components after initial `tahoe create`, but maybe
 not.
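
 `node.url` is the simplest case of that config-directory-based
 communication: one component writes a file into the shared directory and
 another reads it later. A minimal sketch, assuming the file holds a single
 WAPI base URL; the helper names and the write-then-rename detail are
 illustrative, not Tahoe-LAFS's actual code:

```python
import os
import tempfile


def gateway_write_node_url(nodedir: str, url: str) -> None:
    """Gateway side: publish the WAPI base URL for frontends to find.

    Written to a temp file and then renamed, so a concurrently running CLI
    command never reads a half-written URL.
    """
    fd, tmp = tempfile.mkstemp(dir=nodedir, prefix=".node.url.")
    with os.fdopen(fd, "w") as f:
        f.write(url + "\n")
    os.replace(tmp, os.path.join(nodedir, "node.url"))


def frontend_read_node_url(nodedir: str) -> str:
    """Frontend side: locate the gateway (FileNotFoundError if it never ran)."""
    with open(os.path.join(nodedir, "node.url")) as f:
        return f.read().strip()
```

 If CLI state moved under a separate `--cli-directory=`, this is the file
 the CLI would still need to find in the gateway's directory (or have
 copied next to it), which is exactly the cross-component overlap
 described above.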

 But I still think it may be easier to tell people "don't do that" and try
 to make the one-grid-per-user case work better.

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1310#comment:14>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage