[tahoe-dev] Questions

Jason Wood jwood275 at googlemail.com
Wed May 12 13:24:49 PDT 2010


>
> Message: 3
> Date: Wed, 12 May 2010 11:33:27 -0700
> From: Brian Warner <warner at lothar.com>
> Subject: Re: [tahoe-dev] Questions
> To: tahoe-dev at allmydata.org
> Message-ID: <4BEAF477.9010302 at lothar.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 5/12/10 9:44 AM, Jason Wood wrote:
> > Hi,
>
> Welcome!
>
> > Suppose I have a file of 100GB and 2 storage nodes each with 75GB
> > available, will I be able to store the file or does it have to fit
> > within the realms of a single node?
>
> I think tahoe does what you want here. What matters is the size of the
> shares that Tahoe generates, not the size of the original file, and you
> have control over those shares.
>
> The ability to store the file will depend upon how you set the encoding
> parameters: you get to choose the tradeoff between expansion (how much
> space gets used) and reliability. The default settings are "3-of-10"
> (very conservative), which means the file is encoded into 10 shares, and
> any 3 will be sufficient to reconstruct it. That means each share will
> be 1/3rd the size of the original file (plus a small overhead, less than
> 0.5% for large files). For your 100GB file, that means 10 shares, each
> of which is 33GB in size, which would not fit (it could get two shares
> on each server, but it couldn't place all ten, so it would return an
> error).
>
> But you could set the encoding to 2-of-2, which would give you two 50GB
> shares, and it would happily put one share on each server. That would
> store the file, but it wouldn't give you any redundancy: a failure of
> either server would prevent you from recovering the file.
>
> You could also set the encoding to 4-of-6, which would generate six 25GB
> shares, and put three on each server. This would still be vulnerable to
> either server being down (since neither server has enough shares to give
> you the whole file by itself), but would become tolerant to errors in an
> individual share (if only one share file were damaged, there are still
> five other shares, and we only need four). A lot of disk errors affect
> only a single file, so there's some benefit to this even if you're still
> vulnerable to a full disk/server failure.
>
> So, you can set the encoding parameters (in the "tahoe.cfg" file) to
> whatever you like, to meet your goals.
>

That's good to hear. My question was really about whether a large file can
be stored when no single storage node has enough free space for it: if I had
10 storage nodes and none of them could hold the whole file on its own, could
it still be stored? It seems the answer is yes, which is fantastic.
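
In case it helps anyone else reading this, my understanding is that those
encoding parameters live in the [client] section of tahoe.cfg. A rough sketch
of what I'm planning to try (the option names are as I understand them from
the docs, and the values are just an example, not a recommendation):

  [client]
  # k: how many shares are needed to reconstruct a file
  shares.needed = 4
  # N: how many shares are generated per file; each share is ~1/k of the
  # file, so a 100GB file becomes six ~25GB shares (~150GB on disk overall)
  shares.total = 6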


>
> > Do I need to shutdown all clients/servers to add a storage node?
>
> Nope. You can add or remove clients or servers anytime you like. The
> central "Introducer" is responsible for telling clients and servers
> about each other, and it acts as a simple publish-subscribe hub, so
> everything is very dynamic. Clients re-evaluate the list of available
> servers each time they do an upload.
>
> This is great for long-term servers, but can be a bit surprising in the
> short-term: if you've just started your client and upload a file before
> it has a chance to connect to all of the servers, your file may be
> stored on a small subset of the servers, with less reliability than you
> wanted. We're still working on a good way to prevent this while still
> retaining the dynamic server discovery properties (probably in the form
> of a client-side configuration statement that lists all the servers that
> you expect to connect to, so it can refuse to do an upload until it's
> connected to at least those). A list like that might require a client
> restart when you wanted to add to this "required" list, but we could
> implement such a feature without a restart requirement too.


Again, great news. Any new nodes we introduce would be long-term ones, added
only as we start to run out of room on the grid.
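
For the record, my rough plan for bringing up an extra storage node later
looks something like this (the paths are placeholders, the introducer FURL
would come from our existing introducer, and the exact commands may vary
between Tahoe versions):

  tahoe create-node ~/tahoe-storage2
  # then edit ~/tahoe-storage2/tahoe.cfg before starting the node:
  #   [node]     nickname = storage2
  #   [client]   introducer.furl = pb://...  (same as the rest of the grid)
  #   [storage]  enabled = true
  tahoe start ~/tahoe-storage2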

>
> > Finally, I see I can link files on the cluster (very useful!), does this
> > make an actual link or copy the data? Does the target file have to
> > reside on the same storage node as the source file? I think I know the
> > answer to this but just want to clarify.
>
> It's just a link. From the point of view of the directories, each file
> just lives "in the cloud", and is not associated with any particular
> storage nodes: each file has a "filecap" string, and directories are
> just lists of filecaps.
>
> Each file has shares on a set of storage nodes (a different set for each
> file). Directories are just special kinds of files, so directories also
> have shares on a set of storage nodes. The storage nodes used for a
> directory are unrelated to the ones used for the files therein.
>
> "Copying" an immutable file from one directory to another just creates a
> second link to that file. In fact, "uploading a file to a directory"
> actually has two steps: first the file is uploaded into the grid and
> returns a filecap, second the directory is modified (by adding the new
> filecap to its list). So copying from one directory to another just does
> the second step (modifies the target directory), and the original file
> isn't touched.
>
> Of course, copying a *mutable* file is different, because the copy must
> be a new object (changing the copy should not cause the original to
> change). In that case, the data itself must be copied. We don't yet
> support efficient large mutable files, and Tahoe uses immutable files by
> default, so in practice you don't tend to run into this very much.


This keeps getting better and better! That is exactly what I was hoping to
hear!
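
Just so I'm sure I've understood the link-versus-copy behaviour, here is how
I picture it in terms of the CLI (the alias and filenames are only
placeholders, so please correct me if the commands behave differently):

  tahoe mkdir tahoe:backups
  # upload once; this prints the immutable filecap (URI:CHK:...)
  tahoe put big.iso tahoe:big.iso
  # add a second directory entry pointing at the same filecap; no file
  # data is copied, only the backups/ directory is modified
  tahoe ln tahoe:big.iso tahoe:backups/big.iso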

OK, so it seems to do everything I need. Now for a couple of questions about
"nice to have" features...

I believe I can set up a grid whose nodes span both a LAN and a WAN as part
of the same storage cluster. So, if I had 3 locations with 5 storage nodes
each, could I configure the grid to ensure that shares of every file are
written to each location, so that I could survive all of the servers at one
location going down?

And finally, is it possible to modify a mutable file by "patching" it? So,
if I have a file stored and I want to update a section in the middle of it,
is that possible, or would the file need to be downloaded, patched and
re-uploaded? I realise I'm asking a lot here, and I already have a plan to
work around it, but since the system seems to do everything else I need, I
figured it was worth asking.
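
In case it's useful context, the workaround I had in mind is just a plain
read-modify-write cycle, roughly like this (the paths are placeholders, and
I'm assuming that a "tahoe put" aimed at an existing mutable file replaces
its contents in place):

  # 1. download the current contents
  tahoe get tahoe:data.bin local-copy.bin
  # 2. patch local-copy.bin locally with whatever tool is appropriate
  # 3. upload the patched copy back over the mutable file
  tahoe put local-copy.bin tahoe:data.bin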


>
> Hope that helps! Let us know how it goes!
>

Will do! Thanks for your very useful answers!

Thanks,

Jason