This page is for notes and design considerations for the next version of tahoe's mutable files. NewCapDesign includes a lot of desired features, this page is about the backend layout that would make those features possible. * #217 contains a lot of the original notes. * #492 adds a ciphertext hash tree * #794 is about creating writecaps from passphrases * #795 is about append-only filecaps, #796 is closely related * #308 is about deep-traversal dircaps, which will require support from the underlying mutable files * [http://allmydata.org/pipermail/tahoe-dev/2009-July/002345.html this tahoe-dev message] and its neighbors have some good discussion about cap design for mutable files * [http://allmydata.org/pipermail/tahoe-dev/2009-September/002848.html The Elk Point design] is very interesting, and has not yet been transcribed into this wiki page. = Parameters K and T = Below, K is the security level needed for an attack against confidentiality: such attacks will require p * 2^K^ work factor (product of machine size and time) for a probability of success p. T is the additional number of bits needed to take into account [http://allmydata.org/trac/tahoe/ticket/882#comment:6 multi-target attacks], so that attacks against integrity also require p * 2^K^ work factor for 2^T^ target files. = Yay ECDSA = Once we have ECDSA (#331), we'll have a general-purpose signing primitive with short fields (K=kappa bits for the signing key, 2K for the verifying key, and 4K for the signature, with an expected K=128bits, so 16-byte signing keys, 32-byte verifying keys, and 64-byte signatures). Our current RSA-based signatures use 1216-*byte* signing keys, 292-byte verifying keys, and 256-byte signatures. The RSA fields are so large that we clearly cannot put them in the filecaps, so the encoding scheme requires them to be stored in the shares, encrypted and hashed as necessary. The ECDSA keys are short enough (in most cases) to put directly in the filecap, although note that this still increases the cap length relative to using a hash truncated to K+T bits. = Desired Features = * fewer roundtrips: mutable retrieve must currently fetch the pubkey (or encrypted privkey) from at least one server, which complicates the state machine and can add roundtrips when we guess incorrectly about a good amount of data to fetch on the initial read * offline attenuation: it should be possible to attenuate/diminish the writecap into the readcap without retrieving any shares, otherwise operations like adding a child dircap to a parent directory will be much slower (a roundtrip per child). * writecap -> readcap -> deep-traversal-cap -> verifycap . This requires some form of intermediate key scheme. * server-side share validation * one option is to get rid of the "write-enabler" shared secret and rely upon server validation exclusively. This would make share migration easier and remove one need for an encrypted channel (lease secrets would continue to need protection unless/until they too are replaced with signature verification). However, it would also increase server load. == Filecap Length == A likely security parameter K (=kappa) would be 96, 128, or 160 bits. [96 bits is too short IMHO --David-Sarah] Assuming a {{{lafs:}}} prefix and no additional metadata, here's what various lengths of base62-encoded filecaps would look like: * 1*K: * 96 {{{lafs:14efs6T5YNim0vDVV}}} * 128 {{{lafs:4V2uIYVX0PcHu9fQrJ3GSH}}} * 160 {{{lafs:8gdR7Epld72UvkF6Pe9hhT8NQx3}}} * 2*K: * 192 {{{lafs:072Og6e75IOP9ZZsbR1Twjs6X5xXJnBAF}}} * 256 {{{lafs:fZeioazoWrO62reiAjzUAyV0uz3ssh6Hnanv8cKMClY}}} * 320 {{{lafs:j6Re0BbWp7DqJtgd9fUXl4pWiD5kr1mjT9DtudJ72o0vhPen83Utza}}} * 3*K: * 288 {{{lafs:11DriaxD9nipA10ueBvv5uoMoehvxgPerpQiXyvMPeiUUdtf6}}} * 384 {{{lafs:3a31SqUbf8fpWE1opRCT3coDhRqTU7bDU2AvC3RQJBu6ZNFhVscyxA9slYtPVT79x}}} * 480 {{{lafs:P6rGeI6CwlG4i8W2l6haSoC9rfPjw0KHBi8xAk4F8vdhnRhL8nwE27CitJ9Rld06HuKt8VucypJRcbGFo}}} Adding 2 metadata characters and a clear separator gives us: * 96: {{{lafs:MW-14efs6T5YNim0vDVV}}} * 128: {{{lafs:DW-4V2uIYVX0PcHu9fQrJ3GSH}}} * 160: {{{lafs:DW-8gdR7Epld72UvkF6Pe9hhT8NQx3}}} * 192: {{{lafs:MR-072Og6e75IOP9ZZsbR1Twjs6X5xXJnBAF}}} * 256: {{{lafs:DR-fZeioazoWrO62reiAjzUAyV0uz3ssh6Hnanv8cKMClY}}} * 288: {{{lafs:MV-11DriaxD9nipA10ueBvv5uoMoehvxgPerpQiXyvMPeiUUdtf6}}} * 320: {{{lafs:MR-j6Re0BbWp7DqJtgd9fUXl4pWiD5kr1mjT9DtudJ72o0vhPen83Utza}}} * 384: {{{lafs:DV-3a31SqUbf8fpWE1opRCT3coDhRqTU7bDU2AvC3RQJBu6ZNFhVscyxA9slYtPVT79x}}} * 480: {{{lafs:MV-P6rGeI6CwlG4i8W2l6haSoC9rfPjw0KHBi8xAk4F8vdhnRhL8nwE27CitJ9Rld06HuKt8VucypJRcbGFo}}} [http://allmydata.org/trac/tahoe/ticket/217#comment:44 #217:c44] says that, if we don't need to prevent collisions, then we can use a K-bit hash for K-bit second-pre-image resistance. However, see [http://allmydata.org/trac/tahoe/ticket/882#comment:6 #882:c6] for a counterargument saying that 50 extra bits or so are needed to be secure against multi-target attacks (i.e. T = 50). This page has now been updated assuming the counterargument is correct. = Design Proposals = == Commonalities == * once we get the ciphertext, it gets segmented and erasure-coded in the same way as immutable files. Shares include a merkle tree over the share blocks, and a second one over the ciphertext segments. * we'd like to add a merkle tree over the plaintext, without reintroducing the partial-information-guessing attack that prompted us to remove it. This means encrypting the nodes of this merkle tree with a key derived from the readcap. * We'll continue to use the signing layout of the current mutable files: there will be a UEB that includes the root of the hash trees (and everything else in the share), which will be hashed to compute the "roothash" (which changes with each publish). A block of data that includes the roothash and a sequence number (as well as any data-encrypting salt) will be signed. * It might be good to make the layout a bit more extensible, like the way that immutable files have a dictionary-like structure for the UEB. * In general, the share will always contain a full copy of the pubkey, for the benefit of server-side validation. I don't think it matters whether the pubkey is stored inside or outside of the signed block, but it will probably make the upload-time share-verification code simpler to put it inside. * In general, the storage-index will be equal to the pubkey. If the pubkey is too long for this, the storage-index will be a sufficiently-long secure hash of the pubkey. The SI must be long enough to meet our collision-resistance criteria. == ECDSA, semi-private keys, no traversalcap == Zooko captured the current leading semi-private-key-using mutable file design nicely in the [http://allmydata.org/~zooko/lafs.pdf "StorageSS08" paper] (in Figure 3). The design is: * (K + T) writecap = (K+T)-bit random string, perhaps derived from user-supplied material (remember, K=kappa, probably 128bits) * (minimum 2K) readcap = minimum 2*K-bit semiprivate key * (minimum 2K) verifycap = public key * storage-index = (possibly truncated) verifycap On each publish, a random salt is generated and stored in the share. The data is encrypted with H(salt, readcap) and the ciphertext stored in the share. We store the usual merkle trees. This provides offline attenuation and full server-side validation. It removes the need to pull a copy of the pubkey or encprivkey from just one of the servers (the salt must still be fetched, but it's small and lives in the signed block that must be fetched anyways). === add traversalcap === Like above, but create two levels of semiprivate keys instead of just one: * (K + T) writecap = (K+T)-bit random string * (minimum 2K) readcap = minimum 2*K-bit first semiprivate key * (minimum 2K) traversalcap = minimum 2*K-bit second semiprivate key * (minimum 2K) verifycap = public key * storage-index = (possibly truncated) verifycap The dirnode encoding would use H(writecap) to protect the child writecaps, H(readcap) to protect the child readcaps, and H(traversapcap) to protect the child verifycap/traversalcaps. == Any signature algorithm, no semi-private keys, no traversalcap == Without semi-private keys, we need something more complicated to protect the readkey: the only thing that can be mathematically derived from the writecap is the pubkey, and that can't be used to protect the data because it's public (and used by the server to validate shares). One approach is to use the current (discrete-log DSA) mutable file structure, and merely move the private key out of the share and into the writecap: * (K + T) writecap = (K+T)-bit random string = privkey * (2K + T) readcap = H(writecap)[:K] + H(pubkey)[:K+T] * (K + T) verifycap = H(pubkey)[:K+T] * storage-index = verifycap In this case, the readcap/verifycap holder is obligated to fetch the pubkey from one of the shares, since they cannot derive it themselves. This preserves offline attenuation and server-side validation. The readcap grows to 2K + T : we only need K bits for K-bit confidentiality, but require K+T bits on H(pubkey) to attain K-bit second-preimage resistance for 2^T^ targets. (To obtain collision-resistance, set T = K, although that shouldn't be necessary for mutable files.) The verifycap is K+T bits. === include ECDSA pubkey in cap === Or, if the pubkey is short enough, include it in the cap rather than requiring the client to fetch a copy: * (K + T) writecap = (K+T)-bit random string = privkey * (minimum 3K) readcap = H(writecap)[:K] + pubkey * (minimum 2K) verifycap = pubkey * storage-index = H(pubkey) ECDSA pubkeys are slightly more than 2*K long, so this would increase the length of the readcaps whenever K > T. The advantage would be simplifying/speeding up the download process. It is highly unlikely that there is any public key algorithm with keys shorter than 2*K for a K-bit security level. Since we can use shorter hashes than public keys, the H(pubkey) design above gives us shorter read caps, although they are not shorter than using semi-private keys. === Any signature algorithm, no semi-private keys, with traversalcap === Since a secure pubkey identifier (either H(pubkey)[:K+T] or the original privkey) is present in all caps, it's easy to insert arbitrary intermediate levels. It doesn't even change the way the existing caps are used: * (K + T) writecap = (K+T)-bit random string = privkey * (2K + T) readcap = H(writecap)[:K] + H(pubkey)[:K+T] * (2K + T) traversalcap: H(readcap)[:K] + H(pubkey)[:K+T] * (K + T) verifycap = H(pubkey)[:K+T] * storage-index = verifycap == Shorter readcaps (insecure) == This section used to describe a scheme using a MAC (with the read cap as key) instead of signatures. However this was vulnerable to manipulation by a collusion between Rose-the-readcap-holder and the storage servers, and could be used to cause another readcap-holder to see the wrong data. See the [http://allmydata.org/trac/tahoe/wiki/NewMutableEncodingDesign?version=7#Shorterreadcaps history] if you're interested.