[tahoe-dev] the small nodes problem

Sun Dec 27 17:49:23 PST 2009

I am forwarding this discussion at David-Sarah's suggestion. When I started the
conversation, which was prompted by a statement made in another thread, I was
not sure it was pertinent to the list.

---------- Forwarded message ----------
From: David-Sarah Hopwood <david-sarah at jacaranda.org>
Date: Sat, Dec 26, 2009 at 10:24 PM
Subject: Re: off-list question -- small nodes
To: Jody Harris <havoc at harrisdev.com>

Jody Harris wrote:
> David-Sarah,
>
> I have a specific question with regard to the following statement you made
> to the Tahoe-dev list:
>
> ----- quote
> However, "4 GB to 2 TB" is probably asking too much: the servers with
> 4 GB will inevitably receive a lot of requests to store shares that they
> don't have room for, and to download shares that they haven't stored.
> I've submitted ticket #872 about that.
> ----- end quote
>
> That statement implies (but does not "say") that "very small storage nodes
> are a distinct disadvantage to tahoe-lafs storage grids."
>
> Is that true?

A large ratio between the largest and smallest node capacities is a distinct
disadvantage, yes. The peer selection algorithm works by choosing a
permutation of all the servers, as a function of the file's storage index.
Since it treats all the servers the same, shares will be uploaded to each
server at roughly the same rate. That will cause the smallest servers to
fill up first. At that point,
 - all of the attempts to upload more shares to a full server are wasted
  effort;
 - when a share cannot be uploaded to a particular server, it must be
  uploaded to one later in the list. But every time the file is downloaded,
  the list is consulted in order, so the servers that didn't have room for
  shares will still be contacted.
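The effect described above can be illustrated with a small simulation. This is
a minimal sketch, not Tahoe's code, and rests on several assumptions: each
server's position in a file's permutation comes from hashing the storage index
together with a server ID (SHA-256 is an arbitrary stand-in), capacity is
counted in whole shares, a file places at most one share per server, and a
download simply queries servers in permuted order until every stored share is
found. The names `permuted_servers` and `simulate` are hypothetical.

```python
import hashlib


def permuted_servers(storage_index, server_ids):
    """Return server_ids in a per-file pseudo-random order.

    Hypothetical scheme: sort by SHA-256(storage_index + server_id), so each
    file gets its own permutation, but a given file always gets the same one.
    """
    return sorted(server_ids,
                  key=lambda sid: hashlib.sha256(storage_index + sid).digest())


def simulate(capacities, num_files, shares_per_file):
    """Upload num_files files, then "download" each one.

    capacities maps server_id -> number of shares that server can hold.
    Returns (used, wasted_uploads, futile_queries).
    """
    used = {sid: 0 for sid in capacities}
    wasted_uploads = 0  # upload attempts refused by a full server
    futile_queries = 0  # download queries to servers that hold no share
    for i in range(num_files):
        storage_index = hashlib.sha256(b"file-%d" % i).digest()[:16]
        order = permuted_servers(storage_index, list(capacities))
        placed = []
        # Upload: walk the permuted list, placing at most one share per server.
        for sid in order:
            if len(placed) == shares_per_file:
                break
            if used[sid] < capacities[sid]:
                used[sid] += 1
                placed.append(sid)
            else:
                wasted_uploads += 1
        # Download (simplified): query in the same order until all stored
        # shares are found; servers that were full at upload time are
        # contacted again and have nothing to return.
        if placed:
            last = max(order.index(sid) for sid in placed)
            futile_queries += (last + 1) - len(placed)
    return used, wasted_uploads, futile_queries


if __name__ == "__main__":
    # Two tiny servers among eight large ones (capacities in shares).
    caps = {b"small-0": 5, b"small-1": 5}
    caps.update({b"large-%d" % j: 1000 for j in range(8)})
    used, wasted, futile = simulate(caps, num_files=200, shares_per_file=3)
    print(used[b"small-0"], used[b"small-1"], wasted, futile)
```

In this toy grid the two small servers fill up early in the run; after that,
every file whose permutation lists one of them before its last placed share
costs a refused upload, and in this model each refused upload reappears as a
futile query on every download, matching the second point above.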

> My reason for asking is that I am trying to pull together a tahoe-lafs grid,
> and I have one participant who insists on creating "discrete" nodes (1-4 GB)
> in virtual machines. He is convinced that this is the best method. I have
> attempted to reason with him, but it has failed. I did NOT see the
> implication you pointed out in your message.
>
> If this is true, it should be documented somewhere. If my extrapolation of
> your statement is correct, then it could be beneficial to the grid in the
> extreme long run to remove smaller contributions to the grid (40-200 GB) at
> some point in the grid's life, if 1-4 TB storage nodes became the nominal
> storage node size -- depending on the usage of the grid and other factors,
> of course.

Probably, yes.

> Thanks for your time!

BTW, you should ask questions like this on the mailing list, because I may
well be wrong, and other folks (particularly Zooko and Brian) have much
better knowledge of the Tahoe implementation. Feel free to forward this
reply to the list.

--
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com