Changes between Version 1 and Version 2 of QuotaManagement

2008-03-07T02:30:02Z (14 years ago)

document my current plan for shared-secret -based quotas


  • QuotaManagement

    v1 v2  
     2Quota management is about protecting the Storage Servers. Given a limited
     3amount of storage space, how should it be allocated between the users? The
     4admin of a server may give out space in exchange for money, for other storage
     5space, or out of friendship, but many use cases work better if the admin can
     6track and control how much space any given user can consume.
     8There are roughly three different approaches to this: secret-based,
     9private-key-based, and FURL-based. From certain points of view, these are all
     10equivalent: a private key is like a remote capability that can be used to
     11create attenuated capabilities on demand, which is kind of like an asymmetric
     12form of hash derivation. The differences consist of tradeoffs between CPU
     13usage, storage space, and online message exchanges: it is possible to reduce
     14the dependence upon a central server being available by using signed messages
     15and certificates that can be generated and verified offline, at the cost of
     16CPU time and complexity. Likewise the online check can be replaced by a big
     17table of shared secrets, at the cost of storage space.
     19All of these schemes use the same core design: each share kept on a Storage
     20Server is associated with a list of leases. Each lease has an "account
     21number" that refers to some particular user/account that we might like to
     22track. The storage servers have the ability to sum the size of all their
     23shares by account number, to report that account number 4 is consuming 2GB.
     24By adding these reports across all storage servers, we discover that account
     254 is using a total of 10GB. Some other mechanism is used to give a name
     26(Alice) to account 4. This mechanism might also be able to instruct the
     27storage servers to stop accepting new leases for that account until it
     28returns below-quota.
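
The accounting described above can be sketched in a few lines of Python. This is only an illustration: the function names, the shape of the per-share lease data, and the rule that an account with no quota entry is unlimited are all assumptions, not part of the plan itself.

```python
from collections import defaultdict


def usage_by_account(shares):
    """Sum share sizes per account number on one storage server.

    `shares` is an iterable of (size_in_bytes, lease_account_numbers) pairs.
    This data layout is hypothetical.
    """
    totals = defaultdict(int)
    for size, lease_accounts in shares:
        # Charge the share once to each distinct account holding a lease on it.
        for account in set(lease_accounts):
            totals[account] += size
    return dict(totals)


def grid_usage(server_reports):
    """Add the per-server reports together to get grid-wide usage."""
    grid = defaultdict(int)
    for report in server_reports:
        for account, used in report.items():
            grid[account] += used
    return dict(grid)


def over_quota(grid, quotas):
    """Return the accounts whose total usage exceeds their quota.

    Assumption: an account with no entry in `quotas` is unlimited.
    """
    return sorted(a for a, used in grid.items()
                  if used > quotas.get(a, float("inf")))
```

The list returned by `over_quota` is what some other mechanism would use to tell the storage servers to stop accepting new leases for those accounts.
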
     30The schemes differ in the way that we decide which account number to put in
     31the lease. If accounts are in use, the client must be confined to use a
     32specific (authorized) account number: Bob should not be able to get leases
     33placed using Alice's account number, and Carol (who is not a valid user)
     34should not be able to place leases at all.
     36The main design goals we seek to attain here are:
     38 * optional central control over storage space consumed ("quotas") and
     39   ability to participate in the grid at all (storage authority). Grids
     40   which do not wish to maintain this level of control are not required
     41   to use Account Servers at all. Grids can also enable opt-in quota
     42   checking, in which clients are trusted to provide their correct
     43   account number, and to refrain from stealing each other's accounts.
     44 * when an account is added, the user should be able to use the grid
     45   immediately
     46 * when an account is removed, the user should eventually be prohibited
     47   from using the grid. It is ok if it takes a month for this revocation
     48   to take effect.
     49 * most functionality should continue uninterrupted even if a central Account
     50   Server falls offline
     52Secondary design goals (not all of which we can meet) are:
     54 * clients should be able to easily delegate their storage authority to
     55   someone else, like a trusted Helper or a Repair agent. These two agents
     56   may need to create leases in the user's name.
     57 * clients should be able to delegate storage authority for servers they
     58   haven't met yet. Specifically:
     59   * the Repairer may be creating leases on new storage servers that were
     60     added while the client was offline (otherwise the client would be
     61     repairing its own files).
     62   * one purpose of the Helper is to allow clients to be unaware of all
     63     storage servers. If this is the case, the client won't know which
     64     storage servers it should be delegating authority for.
     65 * Storage authority may need to be passed through the web API as a query
     66   parameter. The WUI (the human-facing side) may need a storage-authority
     67   mechanism as well, perhaps through cookies.
     68 * storage requirements should remain sensible. A large grid may have one
     69   million accounts: the storage servers may need to record a shared secret
     70   for each one, but it would be nicer if they didn't have to.
     71 * it should be possible to delegate limited authority. It would be nice if
     72   we could run Helpers on untrusted machines, but if the Helper gets to
     73   consume the full quotas of all clients who use it, then it must be trusted
     74   to not do that. If we could delegate just 2MB of storage authority, or
     75   authority that expired after an hour, we could use more machines for these
     76   services.
     78My current plan is to pursue the "secret-based" approach described here. The
     79other approaches are summarized in subsequent sections.
     81== Secret-based "storage authority" approach ==
     83In this scheme, each user has a master storage authority secret: just a
     84random string, either 128 or 256 bits long. They also have a unique (but
     85non-secret) "Account Number". In centrally-managed grids, these are both
     86created and stored by an Account Server, which uses sequential account
     87numbers. In friend-nets, there is no Account Server, account numbers are
     88either made unique by agreement or by making them large and random, and there
     89is more manual work involved to distribute the various secrets.
     91Each time a client performs a storage operation, it does so under the
     92auspices of a specific storage authority. The Tahoe node on Alice's computer
     93is running solely for the benefit of Alice, so all operations it performs
     94will use Alice's storage authority (i.e. all leases that it creates will
     95include Alice's account number). On the other hand, a shared Tahoe node
     96accessed through its web-API port may be working for a variety of users. This
     97node must be given the storage authority to use for each operation. The means
     98to do this is still under investigation: adding an account= query argument is
     99one approach, passing the information through cookies is another, each with
     100its own advantages and drawbacks.
     102Each time the client talks to a storage server, it computes a per
     103(account*SS) secret by hashing the master secret with the Storage Server's
     104nodeid (the "SSid"). It then prepends the account number, resulting in a
     105per-SS authority string that looks like 123-lmaypcuoh6c4l3icvvloo2656y. The
     106Storage Server has a function that takes this string and decides whether or
     107not the authority is valid, and if valid, which account number to use.
     109(the "123" account number is used as an index when communicating with the AS,
     110to avoid requiring a complete table of user-to-secret mappings. This might
     111not be the same account number that is used in the final lease, to allow the
     112creation of "temporary account numbers" that are attenuated in some way, like
     113a short validity period or limited to a certain number of bytes)
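
A minimal sketch of the client-side derivation follows. The text above does not pin down the hash function, the tag prefix, or the encoding, so the choice of SHA-256, the `per-ss-authority:` tag, truncation to 128 bits, and lowercase base32 are assumptions made only so that the output has the same shape as the example string.

```python
import base64
import hashlib
import os


def make_master_secret(bits=256):
    """Create a random master storage authority secret (128 or 256 bits)."""
    return os.urandom(bits // 8)


def per_server_authority(account_number, master_secret, ssid):
    """Derive the per-(account*SS) authority string for one storage server.

    `ssid` is the storage server's nodeid as bytes. The hash, the tag and
    the truncation length are illustrative assumptions.
    """
    digest = hashlib.sha256(b"per-ss-authority:" + master_secret + ssid).digest()
    # 16 bytes (128 bits) encode to 26 base32 characters once padding is removed.
    secret = base64.b32encode(digest[:16]).decode("ascii").rstrip("=").lower()
    # Prepend the (non-secret) account number.
    return "%d-%s" % (account_number, secret)
```

Because the SSid is mixed into the hash, a string captured by one storage server is of no use when presented to a different one.
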
     115For friend-nets, this "authority-is-valid" function is implemented by a
     116simple static table lookup. The storage server has a file named
     117NODE/private/valid-accounts, that contains one line per account. Each line
     118looks like "123-lmaypcuoh6c4l3icvvloo2656y 123 Alice", and contains the
     119authority string, followed by the account number to use, followed by the
     120account's nickname. The client node has a function that creates one of these
     121lines for every known storage server, and the user can send each line to the
     122SS's admin and ask them to add it to their valid-accounts file. By doing so,
     123the SS's admin is granting that user the right to claim storage on the
     124account named "Alice".
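
For the friend-net case, the static lookup might look like the following sketch. The three-field line layout comes from the description above; skipping blank lines and `#` comments, and the function names, are assumptions.

```python
def parse_valid_accounts(text):
    """Parse the contents of a valid-accounts file into a lookup table.

    Each line holds: authority string, account number, nickname.
    Blank lines and lines starting with '#' are skipped (an assumption).
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # malformed line: ignore it rather than guess
        nickname = parts[2] if len(parts) > 2 else ""
        table[parts[0]] = (int(parts[1]), nickname)
    return table


def authority_is_valid(table, authority):
    """Return (account_number, nickname) if the authority is listed, else None."""
    return table.get(authority)
```

A storage server would load the file once at startup (or whenever it changes) and call `authority_is_valid` for each incoming lease request.
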
     126For a centrally-managed grid, there is a special Account Server node which
     127manages these authorities. Each storage server is configured with a reference
     128to the AS by providing NODE/account-server.furl . The authority-is-valid
     129function works by telling the AS the (SSID,authority-string) pair for each
     130query. The AS responds with either a "no" or a "yes, account=123" answer.
     132To reduce network traffic and improve tolerance to AS downtime, the SS
     133maintains a cache of positive responses. The cache entries are aged out after
     134one month. Negative responses are not cached.
     136Furthermore, the AS gets to pre-emptively manipulate this cache. When the SS
     137connects to the AS, it makes its "valid-accounts-manager" object available to
     138it, and this manager object gives the AS complete control over the
     139valid-accounts table.
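
The caching behaviour, together with the manager interface handed to the AS, could be sketched as below. The class and method names are hypothetical, `ask_account_server` stands in for the remote query to the AS, and a 30-day month is an assumption.

```python
import time

ONE_MONTH = 30 * 24 * 60 * 60  # seconds; "one month" taken as 30 days


class ValidAccountsCache:
    """Storage-server-side cache of positive authority-is-valid answers."""

    def __init__(self, ask_account_server, max_age=ONE_MONTH, clock=time.time):
        self._ask = ask_account_server  # callable: authority -> account or None
        self._max_age = max_age
        self._clock = clock
        self._entries = {}  # authority -> (account_number, time_cached)

    def check(self, authority):
        """Return the account number for a valid authority, else None."""
        now = self._clock()
        entry = self._entries.get(authority)
        if entry is not None:
            account, cached_at = entry
            if now - cached_at < self._max_age:
                return account
            del self._entries[authority]  # aged out: ask the AS again
        account = self._ask(authority)
        if account is not None:
            # Only positive answers are cached; a "no" is asked again next time.
            self._entries[authority] = (account, now)
        return account

    # The two methods below model the "valid-accounts-manager" object
    # that the SS makes available to the AS.
    def add(self, authority, account):
        self._entries[authority] = (account, self._clock())

    def remove(self, authority):
        self._entries.pop(authority, None)
```

With this shape, the AS can call `add` to pre-load a new account on every connected storage server and `remove` to revoke one, while servers that are offline simply keep their existing entries until they age out.
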
     141A user starts by creating a new account, using some AS-specific mechanism.
     142(in the case of the commercial grid, this uses a PHP script
     143that also accepts credit-card payments). The AS records the new user's
     144storage authority in a table, which is used to answer the subsequent
     145"authority-is-valid" queries from storage servers. It extracts the account
     146number from the authority string, looks up the corresponding table entry,
     147hashes the secret it finds there with the SSID, then compares the result to
     148the authority string.
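
The AS-side check described above can be sketched as follows. The derivation (SHA-256, a `per-ss-authority:` tag, 128-bit truncation, base32) is an assumed stand-in, since the text does not specify it; whatever is used must match what clients compute. The `accounts` dict stands in for the AS's authority table.

```python
import base64
import hashlib
import hmac


def derive_secret(master_secret, ssid):
    """Hash the master secret with the SSID (illustrative construction)."""
    digest = hashlib.sha256(b"per-ss-authority:" + master_secret + ssid).digest()
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=").lower()


def check_authority(accounts, ssid, authority):
    """Answer an authority-is-valid query from a storage server.

    `accounts` maps account number -> master secret (bytes).
    Returns the account number for "yes", or None for "no".
    """
    number_text, sep, _ = authority.partition("-")
    if not sep:
        return None
    try:
        account = int(number_text)  # extract the account number
    except ValueError:
        return None
    master = accounts.get(account)  # look up the table entry
    if master is None:
        return None
    expected = "%d-%s" % (account, derive_secret(master, ssid))
    # Constant-time comparison, so response timing does not leak the secret.
    if hmac.compare_digest(expected.encode("utf-8"), authority.encode("utf-8")):
        return account
    return None
```

Note that the AS only needs one table row per account, regardless of how many storage servers exist, because the per-server string is recomputed on each query.
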
     150When the AS creates a new account, it also creates authority strings for all
     151current storage servers, and uses its valid-accounts-manager connections to
     152push these strings into the SS caches. This improves availability: even if
     153the AS fell over and died a moment later, the new user would still be able to
     154use their storage authority for a month without problems.
     156If the AS deletes an account (because the user has stopped paying their
     157bills), the AS uses its valid-accounts-manager connection to delete the cache
     158entries for that account. This accomplishes fast revocation for all storage
     159servers that are currently online. Any SS which were offline at the time of
     160the account termination will continue to provide service for the rest of the
     161month. Other timings are possible: for example the SS might refresh its cache
     162after a day, but treat AS unavailability as meaning it should keep using the
     163previous answer.
     165=== creating attenuated authorities ===
     167Eventually we may want to take advantage of untrusted Helpers, by allowing
     168clients to create attenuated storage authority strings. The possessor of
     169these strings might be allowed to claim leases for a specific storage index,
     170or only for a certain number of bytes. The untrusted Helper might abuse this
     171authority, but the damage it can do is limited by the extra restrictions.
     173Likewise, the Repairer might get a "repair-cap" which contains enough
     174information to download and verify the plaintext, and enough authority to
     175upload new shares in the name of the original uploader. This repair-cap could
     176contain an authority string which can only be used to create shares for the
     177specific storage index. It might also be restricted to creating shares that
     178contain a specific root hash, to prevent the repairer from using the
     179authority to store its own data in the same slot (on new storage servers).
     181The shared-secret scheme is the least favorable for creating attenuated
     182authorities: it requires more work (and more network traffic) than DSA
     183private-key approaches. To provide this, the client contacts the AS and asks
     184it for an attenuated authority string: it specifies the conditions of use
     185(validity period, storage index restrictions, size limits, etc), and gets
     186back a new string and account number. The client uses this pair when
     187delegating authority to Helpers and Repairers, instead of their usual
     188(full-powered) pair.
     190The Account Server adds an entry to its authority table with the new number
     191and string. When storage servers eventually come asking about the validity of
     192derived strings, the AS will find this entry, read out the restrictions, and
     193respond to the SS with the (restrictions, real account number) pair. (one
     194might think of the "real account number" as a special kind of restriction:
     195the string grants the authority to consume space in account 123, possibly
     196with other restrictions).
     198The SS will enforce these restrictions. When the restriction involves total
     199storage space consumed, the SS will need to maintain a table that is indexed
     200by the authority string, counting bytes. This sort of restriction will be
     201much easier to manage if the authority includes a duration restriction,
     202because that will tend to limit the size of this table.
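
One way the SS might enforce a byte limit is sketched below. The class name, the method signature, and the idea of purging expired rows on each call are assumptions; the point is only to show why a duration restriction keeps the table small.

```python
import time


class RestrictionTable:
    """Per-authority byte counters for attenuated authorities (a sketch)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._rows = {}  # authority string -> [bytes_used, expires_at]

    def charge(self, authority, nbytes, max_bytes, expires_at):
        """Try to consume nbytes under this authority; return True if allowed.

        `max_bytes` and `expires_at` are the restrictions the AS reported
        for this authority string.
        """
        now = self._clock()
        # Expired authorities can never be used again, so drop their rows.
        for key in [k for k, row in self._rows.items() if row[1] <= now]:
            del self._rows[key]
        if expires_at <= now:
            return False
        row = self._rows.setdefault(authority, [0, expires_at])
        if row[0] + nbytes > max_bytes:
            return False
        row[0] += nbytes
        return True

    def __len__(self):
        return len(self._rows)
```

Without an expiry time, a row could never be safely deleted, since the same string might be presented again at any point in the future.
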
     204Clearly, the AS must be online and reachable to generate these attenuated
     205authorities. Likewise, either the AS must inform the SS about the strings and
     206restrictions at the time of creation (storage+network), or the AS must be
     207reachable by the SS when the strings are used (availability). A private-key
     208-based scheme would not suffer from this tradeoff.
     210The need for attenuated authorities is not fully established at this point,
     211so the relative simplicity of the shared-secret approach remains appealing
     212(i.e. the fact that private-key makes attenuation easier is not a strong
     213motivation to go with DSA over shared-secret). Untrusted Helpers could
     214alternatively be required to establish leases under their own authority, in a
     215make-before-break handoff: the Helper uploads the shares and adds temporary
     216helper leases, the client is informed about the shares and establishes its
     217own leases, then the helper cancels its temporary leases. Likewise the
     218Repairer could maintain its own leases on behalf of offline clients, keeping
     219track of how much space it is consuming for whom, and reporting the quota
     220data to the accounting machinery (and perhaps simply refusing to repair
     221accounts which have gone over-quota). When the client comes back online and
     222syncs up with the Repairer, the client can establish its own leases on the
     223repairer-generated shares, allowing the repairer to drop the temporary ones.
     225These temporary-leases would add traffic but would remove the need to
     226delegate storage authority to the helper, removing some of the need for
     227attenuated authorities.
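
The make-before-break ordering can be illustrated with a toy model in which a share's leases are just a set of account identifiers. The function name and the data model are hypothetical, and the two accounts are assumed to be distinct.

```python
def make_before_break(helper_account, client_account):
    """Return the lease set on one share after each step of the handoff."""
    leases = set()
    history = []

    # 1. The Helper uploads the share and adds its own temporary lease.
    leases.add(helper_account)
    history.append(frozenset(leases))

    # 2. The client learns about the share and establishes its own lease.
    leases.add(client_account)
    history.append(frozenset(leases))

    # 3. Only then does the Helper cancel its temporary lease.
    leases.discard(helper_account)
    history.append(frozenset(leases))

    return history
```

At every step the share is covered by at least one lease, and the Helper never needs to hold any of the client's storage authority.
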
     229Another conceivable use for attenuated authority would be to give "drop-box"
     230access to other users: provide a write-cap (or a hypothetical append-cap) to
     231a directory, along with enough storage authority to use it. This would allow
     232Alice to safely grant non-account-holders the ability to send her files. The
     233current accounting mechanisms we are developing do not allow this:
     234non-account-holders can read files, but not add new ones.
     237== DSA private-key -based "membership card" approach ==
     239In this scheme, each user has a DSA private key, and a "membership card"
     240(signed by a central Account Server) that declares that the associated pubkey
     241is an authorized member of the grid. Clients then use that key to sign
     242messages that are sent to the storage servers. The SS will verify the
     243signatures before accepting lease requests. Attenuated authority is
     244implemented by certificate chains: the client creates a new private/public
     245key pair, uses their main key to sign a cert declaring that the new key has
     246certain (limited) powers, then gives the new privkey and certificate chain to
     247the delegate. The delegate uses the new privkey to sign request messages.
     249To handle revocation, the certificates need to have expiration dates, and get
     250replaced every once in a while.
     252This approach maximizes offline-ness: the Account Server is only known by its
     253public key, and almost never needs to be spoken to directly. It also
     254minimizes storage requirements: everything can be computed from the
     255certificates that are passed around. PK operations are somewhat expensive,
     256but this can be mitigated through more protocol work: creating a
     257limited-purpose shared secret on-demand that effectively caches the PK
     258verification check.
     261== FURL-based "managed introducer" approach ==
    1263Here's a basic plan for how to configure "managed introducers". The basic
    2264idea is that we have two types of grids: managed and unmanaged. The current