wiki:AccountingDesign

Version 6 (modified by warner, at 2008-06-30T22:40:52Z) (diff)

--

This is a place to share our work on the Accounting task.

Accounting

Tahoe-1.1 is lacking a significant feature: the ability for the admin of a storage server to keep track of how much disk space is being used by various clients. Friendnet operators can use this information to see how much their friends are using (and perhaps ask them to use less). Commercial grid operators can use this to bill customers according to how much space they use, or to limit them to a pre-defined plan ("1GB maximum"). (related tickets: #119)

Tahoe needs better handling of disk-full situations (related tickets: #390), but this is a server-wide issue, whereas Accounting is specifically per-user.

Potential Requirements

Here is a list of potential requirements, in no particular order of when (or if) we actually need them to be completed:

  1. only grant space to approved clients: "Larry the Leech" should not be able to upload files or cause existing files to be retained
  2. be able to answer the question "How much space is Bob using?" 2a. asking this question about a single server (friendnet) 2b. asking this question system-wide: summed across all storage servers (commercial grid)
  3. prior restraint: prevent Bob from consuming more than X bytes per server
  4. disable a previously-allowed account / revocation
  5. expiration: revoke permission after some amount of time, unless explicitly renewed
  6. delegation: if Bob has permission, he can grant some to Little Bobby Junior 6a. subdivision / resellers : commercial grid operator grants space to a business partner 6b. repair caps: clients delegate limited upload authority to a repairer 6c. renewal cap: clients delegate lease-renewal authority 6d. helper: clients enable the helper to upload files for them
  7. auditing: who owns this share, how did they get permission to upload it
  8. reconcilliation / garbage collection : which shares does Bob own?
  9. measure traffic: how many bytes did Bob upload or download (as opposed to how much is he currently storing)

Immediate Goals

We've established a smaller set of goals for the next few releases:

  • 1.2 (august 08):
    • be able to answer "how much does Bob store?"
    • deny service to Larry The Leech
    • enable accounting in both friendnet and commercial grids
    • enable accounting in webapi and helper interfaces
  • 1.3 (september?)
    • be able to enumerate Bob's shares, for reconcilliation
  • next after that
    • more generalized delegation

Design Overview

As touched upon in source:docs/proposed/accounts-pubkey.txt (and source:docs/proposed/accounts-introducer.txt), each share on a storage server is kept alive (in a garbage-collection sense) by one or more "leases", and each lease is assigned to a given "account"/"user"/"owner". The server has an imaginary "lease table" (imaginary in the snese that it is not actually implemented as a giant table: instead the data is broken up into more efficient/robust pieces). This two-dimensional lease table has Storage Index along one axis, and Account on the other, and each cell of the table represents a potential lease.

Each account-owner gets control over their column of the table: they can add leases to existing shares, upload new shares (which immediately acquire new leases), cancel their lease on a share (possibly causing the share to be garbage-collected), or get a list of all of their leases (for reconcilliation).

Some clients may be "super-powered", meaning that they may have the authority to affect more than one row of the table. It may be necessary to give a Repairer this sort of authority to let it keep files alive when the original uploading client is not participating in the maintenance process. POLA dictates that we try to avoid needing this sort of authority inflation, so superpower delegation is just a fallback plan.

The admin of each storage server decides their own policy by configuring their server with various certificates and public keys: fundamentally, storage authority originates with the server, and is delegated outwards to the clients. Clients are configured with certificates and private keys that allow them to use some portion of the server's authority.

Each time a client uploads a file (or otherwise makes use of storage authority), they must demonstrate their authority to each server, through a negotiation protocol. The client.upload() API will be modified to accept a new argument, tenatively named "cred=", that represents this authority. The webapi will also acquire such an argument, allowing the HTTP client to pass its authority to the webapi server so the server can perform the upload.

Design Pieces

Rough design tasks that need to be done.

  • Add cred= to upload API
    • client.upload() needs a cred= argument
    • the webapi PUT/POST commands need a cred= argument
    • the javascript-based webfront program (used by allmydata.com) needs cred
    • the human-oriented "wui" needs a way (cookies? sessions?) to express storage authority
  • Define how to configure clients with their storage authority
  • define how to create these credentials
    • certificate-signing tools
    • "tahoe sign" subcommand
  • define how to configure servers with their certificates
  • changes to Introduction
    • advertise accepted pubkeys in the storage-v2 announcements?
  • changes to peer selection
  • furlification process, persistence/optimization
  • label format: how should leases be labeled
  • usage-table management: databases, size totals, what to store in each lease
  • Usage/Aggregator? service
    • web interface
    • petname database / display

Design

Things that we've come to an agreement about.

Terminology

  • pubkey: enough data to securely verify a signature
  • pubkey identifier: enough data to securely identify a pubkey
  • pubkey hint: when trying to find a pubkey that validates a signature,

the pubkey hint provides enough data to reduce the search space to an acceptable level.

(Since we're planning to use ECDSA-192, public keys are short enough to use them directly as both pubkey identifiers and pubkey hints. But if we were using, say, RSA-2048, then we might instead want to use the SHA-256 hash of the pubkey as its identifier. If we are tight on space, we can use an arbitrarily short prefix of the ECDSA-192 public key as the pubkey hint).

Lease Labels

Each lease will be labeled with a single public key. This identifies who is responsible for the lease: which account should "pay" for the storage required by this share. The actual definition of "pay" will depend upon the server's policy: in most systems, simply being able to produce a total of the sizes of all shares with leases held by a given user will be enough to make decisions about that user (restrict to limit, pay-per-byte, nag-above-limit, whatever).

Certificate Chain

The v1 cert chain format: each element in the chain has three parts: the encoded certificate, the signature, and the pubkey hint. The encoded certificate has a number of fields that describe what is being delegated, but the most important is a pubkey identifier that indicates to whom this authority is being delegated. The fields we'll define for v1 are:

  • delegate-pubkey: (string) a pubkey identifier. The holder of the corresponding private key is hereby authorized to use the authority of the signer, as attenuated by the remainder of the fields in this certificate.
  • signer-gets-lease: (bool) if True, the signer of this certificate will be given a lease on the resulting shares. A privkey authorized by this chain will have control over a single full column of the lease table (all leases labeled with the signer's pubkey). In a full request chain (which contains a signed operation as well as the certificate chain), there must be exactly one True signer-gets-lease field, to make sure that there is exactly one lease on the resulting share.
  • other attenuations: TBD (things like until=, SI=, UEBhash=, operation=, max-size=)

Attachments (15)