On Mon, Mar 5, 2012 at 9:16 AM, David-Sarah Hopwood <span dir="ltr"><<a href="mailto:david-sarah@jacaranda.org">david-sarah@jacaranda.org</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On 05/03/12 07:30, Johannes wrote:<br>

> As far as I understand, if a k/N share is retrieved, the client has to<br>

> ask each of at least k storage nodes whether they possess matching<br>

> shares. Now, the simplest way is just to look whether a file name with<br>

> the share key exists, which seem to happen above. This consumes a little<br>

> bit time, as disk access is relatively slow.<br>

><br>

> As a possible alternative, there exists an efficient algorithm<br>

> called "Bloom Filter" which is able to tell very fast whether a set<br>

> member is *probably* there, this works by implementing a probabilistic<br>

> set which can be held in memory and is much smaller than the actual<br>

> list of shares.<br>

<br>

</div>Thanks for the suggestion, but I don't think the disk access time here is<br>

likely to be significant relative to communication latency. Also remember<br>

that the storage directory contents may be cached by the OS.<br></blockquote><div><br></div><div>I think the potential win is if clients can query storage nodes for their filters.  Then they can determine, with some probability, which storage nodes have the shares they're seeking without communicating.</div>

<div><br></div><div>That said, for small to moderate-sized grids, I think it's better to distribute shares to (nearly) all nodes, because that provides the lowest expansion factor (and hence network overhead) for a given level of reliability.  With that usage model, there isn't much value in being able to locally determine which nodes hold shares, because nearly all of them will.</div>

</div><div><br></div>-- <br>Shawn<br>