[tahoe-dev] [tahoe-lafs] #700: have servers publish Bloom filter of which shares they have

tahoe-lafs trac at allmydata.org
Sat Dec 12 21:02:25 PST 2009


#700: have servers publish Bloom filter of which shares they have
--------------------------------+-------------------------------------------
 Reporter:  warner              |           Owner:           
     Type:  enhancement         |          Status:  new      
 Priority:  major               |       Milestone:  undecided
Component:  code-storage        |         Version:  1.4.1    
 Keywords:  performance repair  |   Launchpad_bug:           
--------------------------------+-------------------------------------------

Comment(by zooko):

 I always use Norm Hardy's pages on Bloom Filters, which have a very clear
 explanation of how to calculate the size:

 http://www.cap-lore.com/code/BloomTheory.html

 (Also I very much like Norm Hardy as a scientist and a person.)
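
 For reference, here's a quick Python sketch of the usual textbook
 formulas for sizing a Bloom Filter (the element count and false-positive
 rate below are made up for illustration):

 {{{
 import math

 def bloom_parameters(n, p):
     """Size a Bloom Filter for n elements and false-positive rate p.

     m = -n * ln(p) / (ln 2)^2   -- bits in the filter
     k = (m / n) * ln(2)         -- number of hash functions
     """
     m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
     k = max(1, round((m / n) * math.log(2)))
     return m, k

 # e.g. a server holding a million shares, 1% false positives:
 # roughly 9.6 million bits (~1.2 MB) and 7 hash functions
 print(bloom_parameters(10**6, 0.01))
 }}}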

 I like Shawn's observation, which I interpret as: a checker (unlike a
 downloader) is typically interested in lots and lots of shares at once.
 If the checker has a pregenerated manifest of files that he wants to
 check on (as opposed to actively traversing a directory structure and
 checking files as he finds them), then he has a long list of shares in
 hand and can use this Bloom Filter trick: download one Bloom Filter
 from each storage server and probabilistically find out which shares
 are likely to be all right without doing any further transactions --
 sweet!  Oh, and if he didn't start with a pregenerated manifest, it is
 still a win to traverse the ''directory'' structure, accumulating the
 verify caps of all the files in a first pass and then doing the Bloom
 Filter check in a second pass.  Interesting!
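
 To make that concrete, here is a minimal sketch of the checker-side
 sweep.  The BloomFilter class and check_manifest helper are purely
 illustrative (not part of Tahoe's actual storage protocol); the point
 is just that a share matched by no server's filter is definitely
 missing, while a match is only ''probably'' present and may still need
 a spot check.

 {{{
 import hashlib

 class BloomFilter:
     """Minimal Bloom Filter: k hash functions over an m-bit array.
     False positives are possible; false negatives are not."""

     def __init__(self, m_bits, k):
         self.m, self.k = m_bits, k
         self.bits = bytearray((m_bits + 7) // 8)

     def _positions(self, item):
         # Derive k bit positions from k salted hashes of the item.
         for i in range(self.k):
             digest = hashlib.sha256(b"%d:" % i + item).digest()
             yield int.from_bytes(digest[:8], "big") % self.m

     def add(self, item):
         for p in self._positions(item):
             self.bits[p // 8] |= 1 << (p % 8)

     def __contains__(self, item):
         return all(self.bits[p // 8] & (1 << (p % 8))
                    for p in self._positions(item))

 def check_manifest(share_ids, filters_by_server):
     """Partition a checker's manifest of share IDs using one downloaded
     Bloom Filter per storage server."""
     probably_present, definitely_missing = set(), set()
     for sid in share_ids:
         if any(sid in f for f in filters_by_server.values()):
             probably_present.add(sid)
         else:
             definitely_missing.add(sid)
     return probably_present, definitely_missing
 }}}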

 By the way, you can always add new elements to a Bloom Filter cheaply,
 so a storage server which already has a Bloom Filter built can add a
 new share to the filter instantly when it receives one.  You can't
 remove items from a Bloom Filter, though.  I guess the only time
 storage servers remove shares from disk is when they do a garbage
 collection pass over their entire store, so the thing to do is: when
 you start a garbage collection pass, also throw away your Bloom Filter
 and start building a new one, adding every share which you don't
 collect, so that the new Bloom Filter is ready at the end of the
 garbage collection pass.
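
 A server-side sketch of that rebuild-during-GC idea, reusing the
 illustrative BloomFilter class from the sketch above (is_expired and
 the share iteration stand in for whatever the server's real GC logic
 does):

 {{{
 def on_share_received(bloom, share_id):
     # Incremental update is cheap: just set the new share's k bits.
     bloom.add(share_id)

 def gc_pass(shares_on_disk, is_expired, m_bits, k):
     """Start a fresh (empty) filter at the beginning of the GC sweep,
     add every share that survives, and publish the new filter only
     once the sweep completes (items can never be removed in place)."""
     new_filter = BloomFilter(m_bits, k)   # class from the sketch above
     for share_id in shares_on_disk:
         if not is_expired(share_id):
             new_filter.add(share_id)
     return new_filter
 }}}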

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/700#comment:5>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

