wiki:Accounting

(This was copied from a LeastAuthority wiki page summarizing the steps needed, and the desire, to get the cloud-backend work into master. It is mostly related to the S4 service, but is fairly general.)

# background

We wish to get the 2237-cloud-backend branch onto master. The cloud-backend branch was built off of a minimal Accounting prototype (warner/accounting-2) so that the new "lease-db" could have somewhere to hang.

## currently

As far as leases and accounting go, 2237 / accounting-3 have the following design:

  • the Accountant holds accounts. There are just 2 accounts and no way (yet) to create or manage them:
    • "starter" account
    • "anonymous" account
  • an Account object now implements RIStorageServer (formerly implemented by StorageServer). So from a client perspective, nothing changes: they contact a fURL that implements the RIStorageServer API. During client setup, that fURL is now pointed at the anonymous Account instance (instead of the StorageServer instance).
  • leases are stored in a local sqlite database
  • new "starter" leases are created for anything which lacks a lease
  • all the code that reads/writes leases to the shares themselves is gone
  • the Accountant and Account objects have access to the leasedb
  • the Account object manages leases
  • an AccountingCrawler replaces the LeaseCheckingCrawler. This new crawler will:
    • Remove leases that are past their expiration time.
    • Delete objects containing unleased shares.
    • Discover shares that have been manually added to storage.
    • Discover shares that are present when a storage server is upgraded from a pre-leasedb version, and give them "starter leases".
    • Recover from a situation where the leasedb is lost or detectably corrupted. This is handled in the same way as upgrading.
    • Detect shares that have unexpectedly disappeared from storage.
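
The crawler behaviour listed above can be sketched roughly as follows. This is a toy model, not the actual 2237 code: the two-table schema, the `crawl` function, its return value and the 30-day `STARTER_LEASE_DURATION` constant are all assumptions made for illustration. It treats "upgrade from a pre-leasedb server" and "leasedb was lost" identically, since in both cases every share on disk is simply unknown to the database.

```python
import sqlite3

# Assumed default expiry of 30 days, in seconds (hypothetical constant).
STARTER_LEASE_DURATION = 30 * 24 * 60 * 60


def open_leasedb(path=":memory:"):
    """Open a toy lease database (hypothetical schema, not the real leasedb)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS shares (share_id TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS leases ("
        " share_id TEXT NOT NULL,"
        " account TEXT NOT NULL,"
        " expiration_time INTEGER NOT NULL)"
    )
    return db


def crawl(db, stored_shares, now):
    """Run one crawler pass; `stored_shares` is the set of share ids found in storage."""
    # Remove leases that are past their expiration time.
    db.execute("DELETE FROM leases WHERE expiration_time <= ?", (now,))

    known = {row[0] for row in db.execute("SELECT share_id FROM shares")}
    leased = {row[0] for row in db.execute("SELECT DISTINCT share_id FROM leases")}

    vanished = known - stored_shares               # in the db, missing from storage
    unleased = (known & stored_shares) - leased    # caller should delete these shares
    discovered = stored_shares - known             # manually added, upgraded, or db lost

    # Newly discovered shares get a "starter" lease.
    for share_id in discovered:
        db.execute("INSERT INTO shares VALUES (?)", (share_id,))
        db.execute(
            "INSERT INTO leases VALUES (?, 'starter', ?)",
            (share_id, now + STARTER_LEASE_DURATION),
        )

    # Forget shares that are gone or about to be deleted.
    for share_id in vanished | unleased:
        db.execute("DELETE FROM leases WHERE share_id = ?", (share_id,))
        db.execute("DELETE FROM shares WHERE share_id = ?", (share_id,))

    db.commit()
    return {"discovered": discovered, "unleased": unleased, "vanished": vanished}
```

In this sketch the crawler only reports which shares are unleased; actually removing them from the backend is left to the caller.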

## problems

There are a few problems with this:

### database durability, ops burden

  • ultimately, cloud-backend uses "not local disk" for storage
  • ...but the leasedb is "a thing that should be backed up", and yet it isn't stored in the "not local disk" storage. That is, if we're using S3 for shares, it would be best to have the lease-db in S3 too (or in an AWS database)
  • this is "okay" for now, because the lease-db is built to recover from "zero leases". Basically:
    • if there's no lease for a share, add a "starter" one
    • eventually (after the default-30-days expiry) we will either learn which clients care about that share (because they renewed their leases) or the starter lease expires (and we delete the share)
    • ...but this means we can't use the lease-db to definitively answer the question "how much space is Alice using" if our lease-db is younger than "default-expiry-time".
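
To make the last point concrete, here is a small sketch of a usage query that also reports whether its answer can be trusted. The function name, the `(account, share_size)` input shape and the `DEFAULT_EXPIRY` constant are hypothetical; the only idea taken from the notes above is that a lease-db younger than the default expiry time may still attribute some of Alice's shares to the "starter" account.

```python
from collections import defaultdict

# Assumed default lease duration of 30 days, in seconds (hypothetical constant).
DEFAULT_EXPIRY = 30 * 24 * 60 * 60


def usage_by_account(leases, db_created_at, now, expiry=DEFAULT_EXPIRY):
    """Sum share sizes per account.

    `leases` is an iterable of (account, share_size) pairs. Returns
    (totals, authoritative), where `authoritative` is False while the
    lease-db is younger than one expiry period, because shares may still
    be held only by "starter" leases instead of their real owner's lease.
    """
    totals = defaultdict(int)
    for account, share_size in leases:
        totals[account] += share_size
    authoritative = (now - db_created_at) >= expiry
    return dict(totals), authoritative
```

For a freshly rebuilt database most of the space would show up under "starter" and `authoritative` would be False until a full expiry period has passed.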

### non-async APIs

  • the current LeaseDB API is synchronous. This is "sort of fine if you squint" for a local sqlite database (although still not correct, because a database read can take an arbitrary amount of time). Ideally the LeaseDB API should be async.
  • e.g. by using twisted.enterprise.adbapi (or similar "general-purpose Twisted database API" -- is there a better one?)
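
As an illustration of what an async LeaseDB API could look like, here is a minimal sketch. Note that it uses the standard library's `asyncio` with a single worker thread as a stand-in, not Twisted; `twisted.enterprise.adbapi` takes a broadly similar approach (blocking DB-API calls run in a thread pool and results come back as Deferreds). The class name, method names and schema are all hypothetical.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor


class AsyncLeaseDB:
    """Hypothetical async wrapper: every query runs on one worker thread."""

    def __init__(self, path=":memory:"):
        # A single worker serializes all access to the connection.
        self._executor = ThreadPoolExecutor(max_workers=1)
        # check_same_thread=False lets the worker thread use this connection.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS leases"
            " (share_id TEXT, account TEXT, expiration_time INTEGER)"
        )

    async def _run(self, fn, *args):
        # Hand the blocking call to the worker so the event loop stays free.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, fn, *args)

    def _add_lease(self, share_id, account, expiration_time):
        self._db.execute(
            "INSERT INTO leases VALUES (?, ?, ?)",
            (share_id, account, expiration_time),
        )
        self._db.commit()

    def _get_leases(self, share_id):
        cursor = self._db.execute(
            "SELECT account, expiration_time FROM leases WHERE share_id = ?",
            (share_id,),
        )
        return cursor.fetchall()

    async def add_lease(self, share_id, account, expiration_time):
        await self._run(self._add_lease, share_id, account, expiration_time)

    async def get_leases(self, share_id):
        return await self._run(self._get_leases, share_id)

    def close(self):
        self._executor.shutdown(wait=True)
        self._db.close()
```

Callers `await` every operation, so swapping the sqlite file for a remote database later would not change the calling code.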

### "database as cache"

  • currently, the database is completely throw-away
  • that may limit future designs (i.e. we can't put anything "permanent" in the leasedb)
  • is this a problem? (if so, is it a problem we *can't* easily fix later? i.e. if and when we want to add a feature that needs durable lease-db data?)
  • I *think* we decided at the last Nuts&Bolts that treating the database as "mostly disposable" is okay

## the future

### Remote API Design

  • obviously, to support "not yet upgraded" clients, the "anonymous-storage-FURL" API can't change. That is, it must implement RIStorageServer.
  • but maybe having Account directly implement that isn't great.
  • Consider this:
    • we want introducers to go away
    • thus, "tahoe storage servers" need to stay (as "the" smart thing)
    • what if we call these "tahoe servers" instead, and they provide services
    • one of those services is "storage"
    • (another service might be e.g. a "membrane" that provides temporary access to a read-cap)
    • (another might be a payment API of some kind, to pay for "storage" or other services)
  • ...so I think a better API might be this:
    • Account just provides a "services" API
    • "storage" is one of those services (the only one we provide right now)
    • ...and "storage" implements RIStorageServer
  • not much changes, except the shape of the code: during client setup, we get the "anonymous-storage-FURL" from the "storage" service of the anonymous Account (instead of it just *being* the Account directly).
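
The proposed shape could look something like the sketch below. Everything here is hypothetical: `StorageService` is only a placeholder for whatever object would implement RIStorageServer, and `list_services` / `get_service` are invented names for the "services" API.

```python
class StorageService:
    """Placeholder for the object that would implement RIStorageServer."""

    def __init__(self, account_name):
        self.account_name = account_name


class Account:
    """An Account that exposes named services instead of being the storage API itself."""

    def __init__(self, name):
        self.name = name
        # "storage" is the only service provided right now; a "membrane" or
        # payment service could be registered here later.
        self._services = {"storage": StorageService(name)}

    def list_services(self):
        return sorted(self._services)

    def get_service(self, service_name):
        # Raises KeyError for services this account does not offer.
        return self._services[service_name]


# During client setup, the anonymous storage fURL would point at this object
# rather than at the Account itself.
anonymous_storage = Account("anonymous").get_service("storage")
```

Old clients would see no difference, since the object behind the anonymous fURL would still speak RIStorageServer.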

### Backing up the Database

  • one thing suggested was to just periodically (e.g. every hour) back up the sqlite database to "whatever storage the backend is using". That is, a "storage backend" has an API to backup (and restore) an sqlite file.
  • then we can "mostly" still answer the "how much space is Alice using" question (except for the possibility that shares were added by Alice after the last database backup)
  • ...but you get fast, local queries most of the time for other things
  • (I still think we should make the LeaseDB API async even if we're "always" using sqlite)
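
A rough sketch of the backup/restore idea follows. The backend interface (`put_blob` / `get_blob`), the `InMemoryBackend` class and the `"leasedb-backup"` key are assumptions made for illustration; a real cloud backend would store the bytes in S3 or similar. The snapshot itself uses `sqlite3.Connection.backup` from the Python standard library (3.7+), which copies a consistent image of a live database.

```python
import os
import sqlite3
import tempfile


class InMemoryBackend:
    """Hypothetical storage backend; stands in for S3 or similar."""

    def __init__(self):
        self._blobs = {}

    def put_blob(self, key, data):
        self._blobs[key] = data

    def get_blob(self, key):
        return self._blobs[key]


def backup_leasedb(db, backend, key="leasedb-backup"):
    """Snapshot the live database and hand the bytes to the backend."""
    fd, tmp_path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        snapshot = sqlite3.connect(tmp_path)
        try:
            db.backup(snapshot)  # consistent copy, even if `db` is in use
        finally:
            snapshot.close()
        with open(tmp_path, "rb") as f:
            backend.put_blob(key, f.read())
    finally:
        os.remove(tmp_path)


def restore_leasedb(backend, path, key="leasedb-backup"):
    """Write the last backup to `path` and open it as the new leasedb."""
    with open(path, "wb") as f:
        f.write(backend.get_blob(key))
    return sqlite3.connect(path)
```

Calling `backup_leasedb` from a periodic timer (e.g. hourly) would bound how stale a restored database can be; anything added since the last backup would be rediscovered by the crawler and given a starter lease.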
Last modified at 2019-06-12T20:18:37Z