The NewCapDesign page describes desired features of the next filecap design.
This page is for designing the encoding format for these new immutable files.

= Features =

 * as described on NewCapDesign#filecaplength, we probably need 128-bit
   confidentiality "C" bits, 256-bit integrity "I" bits, and 128-bit
   storage-collision resistance. There are encoding schemes that can combine
   the C and I bits (at the expense of convergence, or certain forms of
   offline attenuation).
 * we may define a "server-selection-index" (which is used to permute or
   otherwise narrow the list of servers to be used) to be separate from the
   "storage-index" (which is used to identify a specific share on whichever
   servers we actually talk to). This may involve a separate field in the
   filecap, or it may continue to be derived from the storage index.
 * some encoding schemes allow the readcap to be attenuated to a verifycap
   offline
   * in general, we don't care how long the verifycap is
 * the server should be able to validate the entire share by itself, without
   the readcap. In general, this means that the storage-index must also be
   the verifycap.
   * note that this implies that the storage-index cannot be computed until
     the end of encoding, when all shares have been generated, the share hash
     tree has been built, and its root has been added to the UEB.
   * this implies that we can't use the storage-index to detect convergence
     with earlier uploads of the same file. Retaining convergence may require
     a lookup table on the server (mapping hash-of-readkey to storage-index,
     or something)
   * it also implies that the storage-index can't be used as a
     server-selection-index, which again points to using hash-of-readkey as
     the SSI (to retain convergence of server-selection). Setting the
     storage-index at the end of upload requires a new uploader protocol,
     which uses an "upload handle" for the data transfer, and finishes with a
     "now commit this share to storage-index=X" message.
 * the original CHK design uses hash-of-readkey as storage-index, which has
   all these good properties except server-side full share validation.
   (servers can compare share contents against the UEB, and we could put a
   copy of the UEB hash into the share, but servers would continue to be
   unable to make sure the share was in the right place)
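
The "upload handle, then commit" flow mentioned in the list above could look roughly like the following in-memory sketch. Everything here is hypothetical: the class name `SketchStorageServer`, its method names, and above all the validation step, where a plain SHA-256 over the share bytes stands in for the real check (walking the share hash tree and the UEB). It only illustrates that the server learns the storage-index at the end and can refuse a share that does not match it.

```python
import hashlib
import secrets


class SketchStorageServer:
    """Hypothetical two-phase upload: write under a handle, commit under an SI."""

    def __init__(self):
        self._pending = {}  # upload handle -> bytearray of share data so far
        self._shares = {}   # storage-index -> committed share bytes

    def begin_upload(self):
        # The handle is an opaque, unguessable token; it carries no meaning.
        handle = secrets.token_hex(16)
        self._pending[handle] = bytearray()
        return handle

    def write(self, handle, data):
        self._pending[handle] += data

    def commit(self, handle, storage_index):
        # "now commit this share to storage-index=X"
        share = bytes(self._pending.pop(handle))
        # ASSUMPTION: stand-in validation. A real server would validate the
        # share against the verifycap, not hash the raw bytes like this.
        if hashlib.sha256(share).digest() != storage_index:
            raise ValueError("share does not match storage-index")
        self._shares[storage_index] = share

    def get(self, storage_index):
        return self._shares[storage_index]
```

In this sketch a failed commit simply discards the pending data; whether a real server should keep it around for a retry is an open design question.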

= Options =

note: all cap-length computations assume the integrity-providing "I" field is
256 bits long, and the confidentiality-providing "C" field is 128 bits long. If
we decide on different values, the sums below should be updated.

== One: current CHK design ==

Readcaps consist of two main pieces: C bits and I bits, plus:

 * k (which improves the accuracy of the initial number of queries to send
   out)
 * N (which improves the guessed upper bound on the number of queries to send
   out, and used to be required by the abandoned TahoeThree algorithm)
 * filesize (advisory only, used by deep-size measurements in lieu of
   fetching share data to measure filesize)

SI = H(C), SSI = SI. Verifycap is SI+I.

 * SSI and SI are known ahead of time, uploader protocol starts with SI
 * good convergence
 * long caps (128+256+len(k+N+filesize)) ~= 400 bits
 * server cannot verify entire share
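
As a rough illustration of the derivation and the length sum above, here is a small sketch. The hash function (plain SHA-256, truncated to 128 bits) and the helper names are assumptions for illustration, not the hash or encoding the real CHK code uses; the 16 extra bits passed in the test stand for an unspecified encoding of k, N, and filesize that brings the total to the ~400 bits quoted above.

```python
import hashlib

C_BITS = 128  # confidentiality field, per the note at the top of Options
I_BITS = 256  # integrity field, per the same note


def storage_index_from_readkey(readkey: bytes) -> bytes:
    # SI = H(C). ASSUMPTION: SHA-256 truncated to 128 bits as a stand-in hash.
    return hashlib.sha256(readkey).digest()[:16]


def chk_readcap_bits(extra_bits: int) -> int:
    # 128 + 256 + len(k + N + filesize); extra_bits is the caller's estimate
    # of how many bits the k/N/filesize fields take.
    return C_BITS + I_BITS + extra_bits
```

Because the SI depends only on the readkey, it is available before any share data is produced, which is what lets the uploader protocol start with the SI.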

== Two: Zooko's scheme ==

Readcaps contain one crypto value that combines C and I fields. (I forget how
this worked... it was clever, but I think it had some fatal flaw, like not
being able to get a storage-index from the readcap without first retrieving
shares, or something. One of us will dig up the notes on it and describe it
here).

 * short caps
 * convergence problems

== Others? ==

== Ideas ==

It might be possible to have the uploader give two values to the server, at
different stages of the upload process, which (together) would allow full
validation of the resulting share. Using a single value (the verifycap), as a
storage index, would be cleaner, but might not be strictly necessary.

The servers could maintain a table, mapping from one sort of index to
another, if that made it easier for the upload process to proceed (or to
achieve convergence). For example, H(readkey) is known at the beginning of
upload, but the I bits aren't known until the end. If the client could use
SSI=H(readkey) and then ask each server to tell them the storage-index of any
shares which used H(readkey), it could achieve convergence and still use the
I bits as the storage-index. The servers would be obligated to maintain a
table with one entry per bucket (so probably ~20M entries), and
errors/malicious behavior in this table would cause convergence failures
(which are hardly fatal).
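
The table idea in the previous paragraph can be sketched as a simple mapping from H(readkey) to the storage-indexes of shares uploaded under that readkey. The names `ConvergenceTable`, `readkey_hash`, and `find_existing_shares` are hypothetical, and SHA-256 is again only a stand-in for whatever hash would actually be chosen.

```python
import hashlib
from collections import defaultdict


def readkey_hash(readkey: bytes) -> bytes:
    # ASSUMPTION: SHA-256 as a stand-in for H(readkey).
    return hashlib.sha256(readkey).digest()


class ConvergenceTable:
    """Hypothetical per-server table: H(readkey) -> set of storage-indexes."""

    def __init__(self):
        self._by_readkey_hash = defaultdict(set)

    def record(self, rk_hash: bytes, storage_index: bytes) -> None:
        # Called by the server once a share has been committed.
        self._by_readkey_hash[rk_hash].add(storage_index)

    def lookup(self, rk_hash: bytes) -> set:
        # Return a copy so callers cannot mutate the table; .get() avoids
        # creating empty entries for unknown hashes.
        return set(self._by_readkey_hash.get(rk_hash, ()))


def find_existing_shares(table: ConvergenceTable, readkey: bytes) -> set:
    # Client side: ask before uploading. An empty answer (or a wrong one)
    # only costs convergence, so the client just uploads as usual.
    return table.lookup(readkey_hash(readkey))
```

A client would still need to validate any storage-index the server reports, since the table itself is not trusted.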

The SSI can be much shorter than the SI. It only needs to be long enough to
provide good load-balancing properties. It could be included explicitly in
the filecap. Alternate (non-TahoeTwo) peer-selection strategies could encode
whatever per-file information they needed into the SSI, assuming some sort of
tradeoff between cap length (i.e. SSI length) and work done by the downloader
to find the right servers.
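
To make the "short SSI" point concrete, here is a sketch of a permuted-list server selection driven by a truncated hash of the readkey. The function names, the 4-byte default length, and the exact hash inputs (SHA-256 over SSI followed by server id) are all assumptions for illustration; they are not taken from the TahoeTwo code.

```python
import hashlib


def short_ssi(readkey: bytes, length: int = 4) -> bytes:
    # ASSUMPTION: the SSI is a truncated hash of the readkey. A few bytes are
    # enough to spread files across servers; it does not need to resist
    # collisions the way the storage-index does.
    return hashlib.sha256(readkey).digest()[:length]


def permuted_servers(ssi: bytes, server_ids):
    # Order servers by hash(SSI + server id). Every client holding the same
    # SSI computes the same ordering, so uploader and downloader agree on
    # which servers to ask first.
    return sorted(server_ids, key=lambda sid: hashlib.sha256(ssi + sid).digest())
```

A downloader would walk the returned list from the front, so a shorter SSI trades cap length against how evenly files are spread over the servers.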