[tahoe-dev] blocks instead of files?

Zooko O'Whielacronx zookog at gmail.com
Wed Mar 10 20:35:53 PST 2010


On Wed, Mar 10, 2010 at 5:55 PM, David-Sarah Hopwood
<david-sarah at jacaranda.org> wrote:
>
> Suppose that each segment is copied to an independent random choice
> of m of the available servers. Then if all m of the servers for *any*
> segment die, then part of the file will be lost. Losing part of a file
> is essentially equivalent to losing all of it for most applications.

I'm sure that keeping all the blocks of a share together is a win if
your measure of success is 1. "minimize the chance that any part of
this file will be lost". That's why we do it the way we do. (Jukka
Santala and Bram Cohen figured this out in the context of Mojo Nation,
which broke up a file into a number of blocks proportional to the size
of the file and distributed the blocks independently. This was part of
the process of Bram inventing BitTorrent.)
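
To put rough numbers on measure 1, here is a minimal sketch. It makes assumptions that are not from this thread: every server dies independently with the same probability p, each segment is copied to m servers as in the quoted model, and losses of different segments are treated as independent events (only approximately true when server choices overlap). The function names are made up for illustration.

```python
def p_loss_together(p: float, m: int) -> float:
    """All segments live on the same m servers.

    Part of the file is lost only if all m of those servers die.
    """
    return p ** m


def p_any_loss_independent(p: float, m: int, k: int) -> float:
    """Each of k segments gets its own random choice of m servers.

    Assumption: segment losses are treated as independent, so the
    chance that at least one segment is lost is 1 - (1 - p^m)^k.
    """
    return 1.0 - (1.0 - p ** m) ** k


if __name__ == "__main__":
    p, m = 0.1, 3  # hypothetical per-server death probability and copy count
    for k in (1, 10, 100, 1000):
        print(k, p_loss_together(p, m), p_any_loss_independent(p, m, k))
```

Under these assumptions the chance of losing some part of the file grows with the number of segments k when segments are placed independently, while it stays at p^m when they are kept together.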

However, I'm not sure if that is the best measure of success. And I'm
not sure if there are other good ways to do it which would still be
good for this measure of success and also good for other measures of
success.

Other possible measures of success include:

2. minimize the chance that all parts of this file will be lost

This might make more sense for a file where parts of the file still
have value even when other parts are lost, such as a long audio
recording. It makes less sense for a file where interpreting some
parts of the file depends upon other parts, such as an
LZMA-compressed database snapshot.

Then there is also the question of the longevity of many files:

A. minimize the chance that any file out of a large set of files will be damaged
B. minimize the number of files from the large set of files which will
be damaged

Suppose you have ten thousand files. Would you rather have a 1-in-10
chance that one of your files dies or a 1-in-100,000 chance that all
of your files die? I don't know. Tough call.
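
One reason this is a tough call: measured by the expected number of files lost, the two options come out the same, one tenth of a file each. The small sketch below just does that arithmetic with exact fractions; the function and variable names are made up for illustration, and expected loss is only one possible yardstick (it ignores how much worse losing everything at once might be).

```python
from fractions import Fraction

N_FILES = 10_000  # size of the file set from the example above


def expected_files_lost(p_event: Fraction, files_lost_if_event: int) -> Fraction:
    """Expected number of files lost for a single all-or-nothing failure event."""
    return p_event * files_lost_if_event


# Option 1: a 1-in-10 chance that exactly one file dies.
lose_one = expected_files_lost(Fraction(1, 10), 1)

# Option 2: a 1-in-100,000 chance that all ten thousand files die.
lose_all = expected_files_lost(Fraction(1, 100_000), N_FILES)

print(lose_one, lose_all)
```

So the choice between A and B is really about how you feel about variance, not about the average.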

Maybe we shouldn't be trying to optimize too much for these sorts of
questions. When Brian is describing the difference between Mountain
View and Tahoe-LAFS, he often starts by saying that Tahoe-LAFS stores
entire shares (all the blocks of the share) together in one place so
that the uploader, downloader, and server have fewer separate objects
to keep track of. Maybe that's the best reason to keep doing it the
way we're doing it now.

Regards,

Zooko

