[tahoe-dev] Upload misunderstanding or bug?

Brian Warner warner at lothar.com
Wed Aug 4 23:50:08 UTC 2010


On 7/28/10 5:30 AM, Greg Troxel wrote:
>>
>> I needed a workaround, so I thought of one. Since I wanted to use
>> 2/4/4, shares.happy == shares.total, and I had exactly 4 storage
>> nodes in my grid, so I reasoned that I should expect every storage
>> node to get exactly one share. My workaround was to write a script
>> that I ran on each storage node. The script identified storage
>> indexes with more than one share, and deleted those storage indexes.
>>
>> Firstly, was my workaround correct, at least in principle? (It sure
>> seemed to be effective in practice.)

It's correct, except that when you upload a file that is already present
in the grid, it gets treated as a repair instead of a brand-new upload,
and the repair code is not particularly clever. In an ideal world, it
would leave all shares in exactly the right places (the places they'd
have been placed if it were a brand new upload), which would involve
moving some shares and maybe even deleting some shares (pretty weird
behavior for an uploader).

But our uploader leaves all existing shares in place, and doesn't
necessarily refrain from uploading shares to the wrong places (in some
places, it commits to uploading a share before it learns that the share
already exists somewhere else).

So I wouldn't be surprised if your workaround experiences funny behavior
when it ends up re-uploading an existing file, for example if a 'tahoe
backup' or 'tahoe repair' command decides that the file isn't completely
healthy and should be re-uploaded.
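
For reference, the detection half of a workaround like the one described
above could look roughly like the sketch below. It is not the script that
was actually run (that was never posted), and it only reports; it deletes
nothing. It assumes the usual storage-server layout of
shares/<prefix>/<storage index>/<share number> and skips the "incoming"
directory; the function name and the path you pass in are made up for
illustration.

```python
import os


def storage_indexes_with_multiple_shares(shares_root):
    """Map storage index -> sorted share numbers, for every storage index
    directory under shares_root that holds more than one share file.

    Assumed layout: shares_root/<prefix>/<storage index>/<share number>.
    """
    found = {}
    for prefix in sorted(os.listdir(shares_root)):
        prefix_dir = os.path.join(shares_root, prefix)
        # "incoming" holds partially uploaded shares; leave it alone.
        if prefix == "incoming" or not os.path.isdir(prefix_dir):
            continue
        for storage_index in sorted(os.listdir(prefix_dir)):
            si_dir = os.path.join(prefix_dir, storage_index)
            if not os.path.isdir(si_dir):
                continue
            # Share files are named by their share number.
            sharenums = sorted(
                int(name) for name in os.listdir(si_dir) if name.isdigit()
            )
            if len(sharenums) > 1:
                found[storage_index] = sharenums
    return found
```

You would run it on each storage node against that node's shares
directory and look over the result before deciding what, if anything, to
remove.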

> Kyle Markley <kyle at arbyte.us> writes:

> One thing that I'm not 100% clear on: I think the 2/3/4 params may be
> encoded in the URI, and changing the config only affects the creation
> of new objects. If what you really want is to change the rules for the
> root directory, you might need to 'cp -r' and repoint your root.

The "2" and the "4" are encoded in the URI (ie "k" and "N", ie
required-shares and total-shares). The "happy" value is only used during
the upload and then discarded (it is neither copied into the URI nor
into any of the uploaded shares).

The "2" (ie "k") value directly affects the shares being created:
several block sizes are influenced by it, as is the way the zfec encoder
works.

Both "2" and "4" are folded into the "convergence hash", meaning that
any given file will get a different encryption key (and therefore
storage index) when you re-upload it with different values of k or N. Both
the first part of the URI (the encryption key) and the second part (the
integrity-providing "UEB hash") will be different, because k and N are
also copied into the UEB.
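
A toy illustration of why that happens, in case it helps. This is NOT
Tahoe's actual hash construction (the tag string, field encoding, and
truncation lengths below are all invented for the example); it only shows
that once k and N are inputs to the key-deriving hash, changing either one
changes the key, and therefore anything derived from the key.

```python
import hashlib


def toy_convergent_key(plaintext, k, n, secret=b""):
    """Derive a 16-byte key from the file contents AND the encoding
    parameters. Illustrative only; not Tahoe's real derivation."""
    h = hashlib.sha256()
    fields = (b"toy-convergence-v1", secret, b"%d" % k, b"%d" % n, plaintext)
    for field in fields:
        # Length-prefix each field so field boundaries are unambiguous.
        h.update(b"%d:" % len(field) + field)
    return h.digest()[:16]


def toy_storage_index(key):
    """Derive a storage index from the key (again, a stand-in hash)."""
    return hashlib.sha256(b"toy-storage-index:" + key).digest()[:16]


data = b"the same file contents"
key_2_of_4 = toy_convergent_key(data, 2, 4)
key_3_of_4 = toy_convergent_key(data, 3, 4)

# Same file, different k: different key, hence different storage index.
print(key_2_of_4 != key_3_of_4)
print(toy_storage_index(key_2_of_4) != toy_storage_index(key_3_of_4))
```

The same contents with the same k and N still converge on one key, which
is the point of a convergence hash in the first place.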

> It would be nice if the command-line 'tahoe check' could print out
> more info about this.

Hm, yeah. There's a "tahoe debug dump-cap" command to take a URI and
unpack the different fields, and there's "tahoe debug dump-share" to
take a share and tell you about its pieces (including values of k and
N). Running those manually on all the things you can find might be
useful.
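
If running them by hand gets tedious, a small wrapper along these lines
might do. It is only a sketch: it assumes the "tahoe" executable is on your
PATH, the helper names are made up, and it returns whatever text the two
debug commands print without trying to parse it.

```python
import subprocess


def dump_cap_argv(cap):
    """Command line for unpacking the fields of a URI/cap."""
    return ["tahoe", "debug", "dump-cap", cap]


def dump_share_argv(share_path):
    """Command line for describing one share file on a storage node."""
    return ["tahoe", "debug", "dump-share", share_path]


def run_dump(argv):
    """Run one of the commands above and return its output as text.

    Raises subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_dump(dump_share_argv(path)) for each share file you can
find on a storage node, then read through the output for the k and N
values.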


cheers,
 -Brian

