#678 new enhancement

converge same file, same K, different N — at Version 8

Reported by: zooko Owned by:
Priority: major Milestone: undecided
Component: code-encoding Version: 1.3.0
Keywords: newcaps space-efficiency performance research Cc:
Launchpad Bug:

Description (last modified by warner)

The underlying erasure code is "systematic", which means the first K shares are simply the segments of the file (except that the last one, which contains the end of the file, is padded out to be the same size as the others). The erasure code also has the property (I don't know what it is called) that the "check shares" or "secondary shares" -- the ones after the first K -- are also the same regardless of what N is.

Therefore if you upload a file with, e.g. K=13, N=16 and then you re-upload it with K=13, N=26, then the index 0 through the index 15 share that you upload would be exactly the same as they were in the original upload (if you used the same encryption key to encrypt the file before erasure-coding it).

However, Tahoe currently doesn't take advantage of this coincidence at all, because it doesn't use the same encryption key for those two files. Instead it includes N in the generation of the encryption key, and in the resulting immutable-file read-cap, so that the two uploads of the same file with the same K and different N result in completely different sets of shares and different read-caps. specs: file-encoding.txt; source code: upload.py

To fix this ticket, figure out how to retain all of the safety and security properties that Tahoe immutable-file currently have, while also letting those two uploads share their first 16 shares.

(By the way, the reason I was reminded of this is that I'm currently doing an upload exactly like this: dogfood tasting report . :-))

Change History (8)

comment:1 Changed at 2009-08-10T15:29:00Z by zooko

The following clump of tickets might be of interest to people who are interested in this ticket: #711 (repair to different levels of M), #699 (optionally rebalance during repair or upload), #543 ('rebalancing manager'), #232 (Peer selection doesn't rebalance shares on overwrite of mutable file.), #678 (converge same file, same K, different M), #610 (upload should take better advantage of existing shares), #573 (Allow client to control which storage servers receive shares).

comment:2 Changed at 2009-08-10T15:45:46Z by zooko

Also related: #778 ("shares of happiness" is the wrong measure; "servers of happiness" is better).

comment:3 Changed at 2009-08-25T08:37:44Z by warner

One significant barrier to this is the share hash tree. When a file is uploaded, we take the hash of all shares (well, the block hash tree root for all shares) and put them into a list, then we build another merkle tree over that list, and store the root of that tree in the UEB (where it gets folded into the integrity-checking part of the filecap).

This list is of length "N", the number of shares that were created, padded up to a power-of-two boundary (for the merkle tree). This makes it impossible to add more shares to the same hash tree.

So if a file has already been uploaded with 3-of-10, and now you decide you'd like to encode it with 3-of-16, then shares 0..9 will be the same, but the merkle tree that was distributed with the first upload won't contain the hashes for shares 10..15, and the UEB will be different for the two uploads, so the filecaps will be different (to be specific, the readkey [and therefore the storage-index] part will be the same, but the UEB hash that provides all the integrity information will be different).

Let's analyze the compatibility grid. One one axis is the person doing the download: either Sally holding the "short" 3-of-10 filecap, or Lucy holding the "long" 3-of-16 filecap. The other is the class of share being used:

  • A: the original 10 shares from the 3-of-10 upload
  • B: shares 0..9 from the new 3-of-16 upload
  • C: shares 10..15 from the new 3-of-16 upload

(in ideal circumstances, the 3-of-16 upload would skip the "B" shares, but if that uploader didn't happen to see the "A" shares out there already, they might generate "B" shares)

Sally handles "A" shares just fine, as usual. When she sees "B" shares, the current code will reject them because the UEB is different than the filecap. The shares don't contain a full share hash tree, but rather just the minimal chain necessary to validate that one share. If she has enough "A" shares to know the expected value of the "B" share in question (i.e. that leaf of the short tree), then she can use the "B" share with confidence (she'd use the share, but ignore the [long] hash chain that comes with it). This basically means she must get that share's sibling: if she has A0, she can validate B1; if she has A3, she can validate B2, etc.

Should she see "C" shares (say that 5 times fast!), she's stuck. There is no hash chain that will get her from the C share's hash to her filecap's UEB hash. She could download the C share anyways and throw it into FEC, and then test the resulting ciphertext segment against the ciphertext hash tree: if it matches, great, the share was good. If it doesn't, then she won't even know which share had a problem.

Now, what about Lucy? When she sees an "A" share, she'll be in the same boat as Sally looking at a "B" share. If Lucy manages to find a sibling "B" share for each potential "A" share, then she can validate the share hash, and use the share. But if she can't (such as if there are *no* "B" shares, because the uploader didn't create any because it saw the "A" shares out there), then she won't be able to validate the "A" shares, and then she'll be in the Sally-vs-C-shares boat: try FEC and catch problems with the ciphertext hash.

So, what could be done to improve this?

  • include a full copy of the share hash tree in each share, not just the minimal chain: this would let Sally use arbitrary "B" shares and Lucy use arbitrary "A" shares, without forcing them to locate a sibling share. It still wouldn't let Sally use "C" shares with confidence.
  • allow the Downloader to speculate a bit: keep track of which shares went into FEC, if the ciphertext hash fails then iterate through the various combinations of shares, use some sort of constraint-based logic to figure out which combinations are still worth trying. The combinatorics will be smaller with fewer non-validated shares, so the peer-selection code should prioritize validatable shares.
  • if you plan ahead, you can pretend you're encoding to 3-of-16 or 3-of-32 or whatever, but then only actually upload the first 10 shares. You still have to do all the encoding work, however, because you have to compute the correct merkle tree to put in those 10 shares, but you can save the bandwidth and storage space for the shares that you'll create later. It's not clear where the break-even point would be.

I'm not sure what else could be done.

For mutable files, it's a slightly different story. The encoding scheme is basically the same, but if you're willing to modify all the existing shares, you can increase the size of the share hash tree later on (build a bigger tree, sign the root, then update all shares with the new tree). Unfortunately I believe that the share format puts the share hash tree before the share data, so if you increase its size, you must also move the data (and the remote API has no insert() method), so you'll basically have to re-upload everything, including the old share data. So it's possible, but expensive.

comment:4 Changed at 2009-10-28T04:10:07Z by davidsarah

  • Keywords newcaps added

Tagging issues relevant to new cap protocol design.

comment:5 Changed at 2009-12-22T18:54:49Z by davidsarah

  • Keywords space-efficiency added

comment:6 Changed at 2010-01-07T04:00:59Z by davidsarah

  • Keywords performance added

comment:7 Changed at 2011-01-27T19:57:20Z by zooko

See new related ticket #1340 (consider share-at-a-time uploader).

comment:8 Changed at 2011-01-29T21:10:17Z by warner

  • Description modified (diff)
  • Summary changed from converge same file, same K, different M to converge same file, same K, different N

s/M/N/ to match our existing "k-of-N" terminology

Note: See TracTickets for help on using tickets.