This page is for notes and design considerations for the next version of Tahoe's mutable files. NewCapDesign includes a lot of desired features; this page is about the backend layout that would make those features possible.

 * #217 contains a lot of the original notes.
 * #492 adds a ciphertext hash tree
 * #794 is about creating writecaps from passphrases
 * #795 is about append-only filecaps, #796 is closely related
 * #308 is about deep-traversal dircaps, which will require support from the underlying mutable files
 * [http://allmydata.org/pipermail/tahoe-dev/2009-July/002345.html this tahoe-dev message] and its neighbors have some good discussion about cap design for mutable files
 * [http://allmydata.org/pipermail/tahoe-dev/2009-September/002848.html The Elk Point design] is very interesting, and has not yet been transcribed into this wiki page.

= Yay ECDSA =

Once we have ECDSA (#331), we'll have a general-purpose signing primitive with short fields (K=kappa bits for the signing key, 2K for the verifying key, and 4K for the signature; with an expected K=128 bits, that means 16-byte signing keys, 32-byte verifying keys, and 64-byte signatures). Our current RSA-based signatures use 1216-''byte'' signing keys, 292-byte verifying keys, and 256-byte signatures. The RSA fields are so large that we clearly cannot put them in the filecaps, so the encoding scheme requires them to be stored in the shares, encrypted and hashed as necessary. The DSA keys are short enough (in most cases) to put directly in the filecap, simplifying the design considerably.
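To make the size comparison concrete, here's a quick arithmetic sketch (plain Python, nothing Tahoe-specific; the RSA figures are the serialized sizes quoted above):

{{{
#!python
# Field sizes (in bytes) for an ECDSA-based scheme at security
# parameter K=kappa bits, versus the current RSA-based figures.
K = 128  # bits

ecdsa_signing_key = 1 * K // 8    # 1*K bits -> 16 bytes
ecdsa_verifying_key = 2 * K // 8  # 2*K bits -> 32 bytes
ecdsa_signature = 4 * K // 8      # 4*K bits -> 64 bytes

rsa_signing_key, rsa_verifying_key, rsa_signature = 1216, 292, 256

print(ecdsa_signing_key, ecdsa_verifying_key, ecdsa_signature)  # 16 32 64
print(rsa_signing_key // ecdsa_signing_key)  # signing key is 76x smaller
}}}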
= Desired Features =

 * fewer roundtrips: mutable retrieve must currently fetch the pubkey (or encrypted privkey) from at least one server, which complicates the state machine and can add roundtrips when we guess incorrectly about a good amount of data to fetch on the initial read
 * offline attenuation: it should be possible to attenuate/diminish the writecap into the readcap without retrieving any shares; otherwise operations like adding a child dircap to a parent directory will be much slower (a roundtrip per child)
 * writecap -> readcap -> deep-traversal-cap -> verifycap. This requires some form of intermediate key scheme.
 * server-side share validation
   * one option is to get rid of the "write-enabler" shared secret and rely upon server validation exclusively. This would make share migration easier and remove one need for an encrypted channel (lease secrets would continue to need protection unless/until they too are replaced with signature verification). However, it would also increase server load.

== Filecap Length ==

A likely security parameter K (=kappa) would be 96 or 128 bits, and most of the filecaps will be some multiple of K.
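The number of base62 characters needed for a K-bit cap is ceil(K / log2(62)). A quick sketch to reproduce the encoded lengths (plain arithmetic, not Tahoe code):

{{{
#!python
import math

def base62_len(bits):
    """Number of base62 characters needed to encode `bits` bits."""
    return math.ceil(bits / math.log2(62))

for bits in (96, 128, 192, 256, 288, 384):
    print(bits, base62_len(bits))  # 17, 22, 33, 43, 49, 65 characters
}}}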
Assuming a {{{tahoe:}}} prefix and no additional metadata, here's what various lengths of base62-encoded filecaps would look like:

 * 1*K:
   * 96: {{{tahoe:14efs6T5YNim0vDVV}}}
   * 128: {{{tahoe:4V2uIYVX0PcHu9fQrJ3GSH}}}
 * 2*K:
   * 192: {{{tahoe:072Og6e75IOP9ZZsbR1Twjs6X5xXJnBAF}}}
   * 256: {{{tahoe:fZeioazoWrO62reiAjzUAyV0uz3ssh6Hnanv8cKMClY}}}
 * 3*K:
   * 288: {{{tahoe:11DriaxD9nipA10ueBvv5uoMoehvxgPerpQiXyvMPeiUUdtf6}}}
   * 384: {{{tahoe:3a31SqUbf8fpWE1opRCT3coDhRqTU7bDU2AvC3RQJBu6ZNFhVscyxA9slYtPVT79x}}}

Adding 2 metadata characters and a clear separator gives us:

 * 96: {{{tahoe:MW-14efs6T5YNim0vDVV}}}
 * 128: {{{tahoe:DW-4V2uIYVX0PcHu9fQrJ3GSH}}}
 * 192: {{{tahoe:MR-072Og6e75IOP9ZZsbR1Twjs6X5xXJnBAF}}}
 * 256: {{{tahoe:DR-fZeioazoWrO62reiAjzUAyV0uz3ssh6Hnanv8cKMClY}}}
 * 288: {{{tahoe:MR-11DriaxD9nipA10ueBvv5uoMoehvxgPerpQiXyvMPeiUUdtf6}}}
 * 384: {{{tahoe:MR-3a31SqUbf8fpWE1opRCT3coDhRqTU7bDU2AvC3RQJBu6ZNFhVscyxA9slYtPVT79x}}}

[http://allmydata.org/trac/tahoe/ticket/217#comment:44 #217:c44] says that, if we don't need to prevent collisions, then we can use a K-bit hash for K-bit second-pre-image resistance.

= Design Proposals =

== Commonalities ==

 * once we get the ciphertext, it gets segmented and erasure-coded in the same way as immutable files. Shares include a merkle tree over the share blocks, and a second one over the ciphertext segments.
 * we'd like to add a merkle tree over the plaintext, without reintroducing the partial-information-guessing attack that prompted us to remove it. This means encrypting the nodes of this merkle tree with a key derived from the readcap.
 * we'll continue to use the signing layout of the current mutable files: there will be a UEB that includes the root of the hash trees (and everything else in the share), which will be hashed to compute the "roothash" (which changes with each publish). A block of data that includes the roothash and a sequence number (as well as any data-encrypting salt) will be signed.
 * It might be good to make the layout a bit more extensible, like the way that immutable files have a dictionary-like structure for the UEB.
 * In general, the share will always contain a full copy of the pubkey, for the benefit of server-side validation. I don't think it matters whether the pubkey is stored inside or outside of the signed block, but it will probably make the upload-time share-verification code simpler to put it inside.
 * In general, the storage-index will be equal to the pubkey. If the pubkey is too long for this, the storage-index will be a sufficiently-long secure hash of the pubkey. The SI must be long enough to meet our collision-resistance criteria.

== ECDSA, semi-private keys, no traversalcap ==

Zooko captured the current leading semi-private-key-using mutable file design nicely in the [http://allmydata.org/~zooko/lafs.pdf "StorageSS08" paper] (in Figure 3). The design is:

 * (1K) writecap = K-bit random string (perhaps derived from user-supplied material) (remember, K=kappa, probably 128 bits)
 * (2K) readcap = 2*K-bit semiprivate key
 * verifycap = 2*K-bit public key
 * storage-index = truncated verifycap

On each publish, a random salt is generated and stored in the share. The data is encrypted with H(salt, readcap) and the ciphertext is stored in the share. We store the usual merkle trees.

This provides offline attenuation and full server-side validation. It removes the need to pull a copy of the pubkey or encprivkey from just one of the servers (the salt must still be fetched, but it's small and lives in the signed block that must be fetched anyway).
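The per-publish encryption step could be sketched like this. This is illustrative only: the semiprivate-key math is elided (no stock library implements it, so random bytes stand in for the semiprivate-key readcap), SHA-256 stands in for H, and the function name is hypothetical rather than Tahoe's actual API:

{{{
#!python
import hashlib, os

K_BYTES = 16  # K = kappa = 128 bits

def derive_data_key(salt, readcap):
    """H(salt, readcap), truncated to K bits, used as the AES data key.
    A fresh random salt is generated on each publish and stored in the
    signed block of the share, so readers can re-derive the key from
    the readcap plus the (small) salt they had to fetch anyway."""
    return hashlib.sha256(salt + readcap).digest()[:K_BYTES]

readcap = os.urandom(2 * K_BYTES)  # 2*K-bit semiprivate key (stand-in bytes)
salt = os.urandom(K_BYTES)         # new salt for this publish
key = derive_data_key(salt, readcap)
assert len(key) == K_BYTES
# a later publish, with a fresh salt, gets a different data key:
assert derive_data_key(os.urandom(K_BYTES), readcap) != key
}}}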
=== add traversalcap ===

Like above, but create two levels of semiprivate keys instead of just one:

 * (1K) writecap = K-bit random string
 * (2K) readcap = 2*K-bit first semiprivate key
 * (2K) traversalcap = 2*K-bit second semiprivate key
 * verifycap = 2*K-bit public key
 * storage-index = truncated verifycap

The dirnode encoding would use H(writecap) to protect the child writecaps, H(readcap) to protect the child readcaps, and H(traversalcap) to protect the child verifycap/traversalcaps.

== ECDSA, no semi-private keys, no traversalcap ==

Without semi-private keys, we need something more complicated to protect the readkey: the only thing that can be mathematically derived from the writecap is the pubkey, and that can't be used to protect the data because it's public (and used by the server to validate shares). One approach is to use the current (discrete-log DSA) mutable file structure, and merely move the private key out of the share and into the writecap:

 * (1K) writecap = K-bit random string = privkey
 * (3K) readcap = H(writecap)[:K] + H(pubkey)
 * verifycap = H(pubkey)
 * storage-index = truncated verifycap

In this case, the readcap/verifycap holder is obligated to fetch the pubkey from one of the shares, since they cannot derive it themselves. This preserves offline attenuation and server-side validation. The readcap grows to (1+2)*K: we can truncate the AES key to K bits, since we only need K bits for K-bit confidentiality, but we need the full 2*K bits of H(pubkey) to attain K-bit collision resistance. The verifycap is 2*K.

=== include pubkey in cap ===

Or, if the pubkey is short enough, include it in the cap rather than requiring the client to fetch a copy:

 * (1K) writecap = K-bit random string = privkey
 * (3K) readcap = H(writecap)[:K] + pubkey
 * verifycap = pubkey
 * storage-index = H(pubkey)

I think ECDSA pubkeys are 2*K long, so this would not change the length of the readcaps. It would just simplify and speed up the download process.
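The two variants differ only in whether the readcap carries H(pubkey) or the pubkey itself; when the pubkey is 2*K bits, the readcaps come out the same length. A side-by-side sketch (illustrative only: SHA-256 stands in for H, random bytes stand in for a real ECDSA keypair, and all names are hypothetical):

{{{
#!python
import hashlib, os

K = 16  # kappa in bytes (128 bits)

def H(x):
    return hashlib.sha256(x).digest()  # 2*K = 32 bytes

writecap = os.urandom(K)    # K-bit random string = privkey
pubkey = os.urandom(2 * K)  # stand-in for the pubkey derived from the privkey

# variant 1: reader must fetch the pubkey from a share to use it
readcap1 = H(writecap)[:K] + H(pubkey)
verifycap1 = H(pubkey)
storage_index1 = verifycap1[:K]  # truncated verifycap

# variant 2: pubkey travels in the cap, so no fetch is needed
readcap2 = H(writecap)[:K] + pubkey
verifycap2 = pubkey
storage_index2 = H(pubkey)

assert len(readcap1) == len(readcap2) == 3 * K  # both are 3*K-bit readcaps
}}}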
If we could use shorter pubkeys, this design might give us slightly shorter caps. Alternately, if we could use shorter hashes, then the H(pubkey) design might give us slightly shorter caps.

=== add traversalcap ===

Since a secure pubkey identifier (either H(pubkey) or the original privkey) is present in all caps, it's easy to insert arbitrary intermediate levels. It doesn't even change the way the existing caps are used:

 * (1K) writecap = K-bit random string = privkey
 * (3K) readcap = H(writecap)[:K] + H(pubkey)
 * (3K) traversalcap = H(readcap)[:K] + H(pubkey)
 * verifycap = H(pubkey)
 * storage-index = truncated verifycap

== Shorter readcaps ==

(oh, oops, ignore this part. HMACs using the readcap as key are vulnerable to manipulation by a collusion between Rose-the-readcap-holder and the storage servers, and could be used to cause another readcap-holder to see the wrong data. Nevermind.)

To make the readcap shorter, we must give up something, like complete server-side validation and complete offline attenuation.

 * (1K) writecap = K-bit random string = privkey
 * (1K) readcap = H(writecap)[:K]
 * storage-index = H(readcap)
 * verifycap = storage-index + pubkey

The readcap is used as an HMAC key, and the share contains (inside the signed block) an HMAC of the pubkey. The readcap is also hashed with the per-publish salt to form the AES key with which the actual data is encrypted. The writecap begets the readcap, and the readcap begets the storage-index, so both writers and readers will be able to find the shares, and writecaps can be attenuated into readcaps offline. Wally the writecap-holder can generate the pubkey himself and not use (or validate) the value stored in the share. But Rose the readcap-holder must first retrieve the (pubkey, HMAC) pair and validate them; only then can she use the pubkey to validate the rest of the share. Wally can generate the verifycap offline, but Rose cannot, since she has to fetch the pubkey first.
The verifycap must contain a copy of the pubkey (or its hash), because the storage-index cannot be used to validate the pubkey (the HMAC doesn't help, because it is keyed on the readcap, which is unavailable to Val the verifycap-holder). And it must contain a copy of the storage-index, because the pubkey is insufficient to generate it. The storage-index must be derived from the readcap, not the pubkey, because the pubkey is too long to fit into the readcap, and Rose the readcap-holder must have some way of getting the storage-index.

The server can check the signature against the embedded pubkey, but has no way to confirm that the embedded pubkey is correct, because the validatable binding between pubkey and storage-index is only available to Rose. You could copy the verifycap into the share, but there's no cryptographic binding between it and the storage-index. You could put a copy of the storage-index in the signed block, but again that doesn't prove that the storage-index is the right one. Only a scheme in which the storage-index is securely derived from the pubkey will give the desired property.

Another possibility is to have a 2K-long readcap and put K bits of a pubkey hash in it. That would look like:

 * (1K) writecap = K-bit random string = privkey
 * (1K) storage-index = H(pubkey)[:K]
 * (2K) readcap = H(writecap)[:K] + storage-index
 * verifycap = storage-index

This "half-verifycap" approach restores full offline attenuation, and gives the server 1K bits of validation, but cuts Val the verifycap-holder's validation strength in half (from 2K to 1K bits). A full verifycap, H(pubkey), could be generated offline by Wally, or by Rose after fetching the pubkey. You still need the HMAC on the pubkey to give Rose 2K-bit confidence that she's got the right pubkey: the storage-index only gives 1K.
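For concreteness, the half-verifycap derivations might look like this (subject to the same caveat that opens this section; SHA-256, HMAC-SHA256, and random stand-in bytes are assumptions for illustration, not Tahoe's actual code):

{{{
#!python
import hashlib, hmac, os

K = 16  # kappa in bytes (128 bits)

def H(x):
    return hashlib.sha256(x).digest()

writecap = os.urandom(K)    # K-bit random string = privkey
pubkey = os.urandom(2 * K)  # stand-in for the pubkey derived from the privkey

storage_index = H(pubkey)[:K]              # (1K)
readcap = H(writecap)[:K] + storage_index  # (2K), derivable offline by Wally
verifycap = storage_index                  # half-verifycap: only K validation bits

# Rose still needs the HMAC of the pubkey (keyed on the readcap, stored in
# the signed block) to get 2K-bit confidence in the fetched pubkey:
pubkey_hmac = hmac.new(readcap, pubkey, hashlib.sha256).digest()

assert len(readcap) == 2 * K
assert len(verifycap) == K
}}}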