[tahoe-dev] Faster share retrieval by using probabilistic set-testing ?

Mon Mar 5 16:34:29 UTC 2012

On Mon, Mar 5, 2012 at 9:16 AM, David-Sarah Hopwood <
david-sarah at jacaranda.org> wrote:

> On 05/03/12 07:30, Johannes wrote:
> > As far as I understand, if a k/N share is retrieved, the client has to
> > ask each of at least k storage nodes whether they possess matching
> > shares. Now, the simplest way is just to look whether a file name with
> > the share key exists, which seem to happen above. This consumes a little
> > bit time, as disk access is relatively slow.
> >
> > As a possible alternative, there exists an efficient algorithm
> > called "Bloom Filter" which is able to tell very fast whether a set
> > member is *probably* there, this works by implementing a probabilistic
> > set which can be held in memory and is much smaller than the actual
> > list of shares.
>
> Thanks for the suggestion, but I don't think the disk access time here is
> likely to be significant relative to communication latency. Also remember
> that the storage directory contents may be cached by the OS.
>

I think the potential win is if clients can query storage nodes for their
filters.  Then they can determine, with some probability, which storage
nodes have the shares they're seeking without communicating.

That said, for small to moderate-sized grids, I think it's better to
distribute shares to (nearly) all nodes, because that provides the lowest
expansion factor (and hence network overhead) for a given level of
reliability.  With that usage model, there isn't much value in being able
to locally determine which nodes hold shares, because nearly all of them
will.

-- 
Shawn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://tahoe-lafs.org/pipermail/tahoe-dev/attachments/20120305/8aa9ecac/attachment.html>