[tahoe-dev] [tahoe-lafs] #700: have servers publish Bloom filter of which shares they have
tahoe-lafs
trac at allmydata.org
Sat Dec 12 21:02:25 PST 2009
#700: have servers publish Bloom filter of which shares they have
--------------------------------+-------------------------------------------
Reporter: warner | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: undecided
Component: code-storage | Version: 1.4.1
Keywords: performance repair | Launchpad_bug:
--------------------------------+-------------------------------------------
Comment(by zooko):
I always use Norm Hardy's pages on Bloom Filters, which have a very clear
explanation of how to calculate the size:
http://www.cap-lore.com/code/BloomTheory.html
(Also I very much like Norm Hardy as a scientist and a person.)
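The arithmetic on that page boils down to two standard formulas: for n elements and a target false-positive rate p, you want m = -n * ln(p) / (ln 2)^2 bits and k = (m/n) * ln 2 hash functions. A quick sketch (the function name and numbers are mine, not anything in Tahoe):

```python
import math

def bloom_parameters(n_items, fp_rate):
    """Bits and hash count for n_items at roughly fp_rate false positives:
    m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2."""
    m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
    k = max(1, round(m / n_items * math.log(2)))
    return m, k

# a server holding a million shares, accepting 1% false positives,
# needs about 9.6 million bits (~1.2 MB) -- cheap to publish and download
bits, hashes = bloom_parameters(1_000_000, 0.01)
```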
I like Shawn's observation, which I interpret as: A checker (unlike a
downloader) is typically interested in lots and lots of shares at once.
If the checker has a pregenerated manifest of files that he wants to check
on (as opposed to actively traversing a directory structure and checking
files as he finds them), then he might have a long list of shares in
hand, and can use this Bloom Filter trick: download one Bloom Filter from
each storage server and probabilistically find out which of those shares
are likely to be all right, without doing any further transactions --
sweet! Even if he didn't start with a pregenerated manifest, it is still
a win to traverse the ''directory'' structure and accumulate the verify
caps of all the files in a first pass, then do the Bloom Filter check in
a second pass. Interesting!
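To make the two-pass idea concrete, here's a minimal sketch (my own toy filter, not code from the Tahoe tree): the checker downloads one filter per server, then tests every storage index from its manifest locally, with zero per-share round trips. Bloom filters never give false negatives, so every share the server really holds survives the check; a few extras may slip through and get verified the slow way.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: num_bits bits in a bytearray, with k bit
    positions per key derived from one SHA-256 digest by double hashing."""

    def __init__(self, num_bits=1 << 16, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# the server builds and publishes one filter over all its shares:
server_filter = BloomFilter()
for share_id in [b"si-aaa.0", b"si-bbb.1", b"si-ccc.2"]:
    server_filter.add(share_id)

# the checker tests its whole manifest locally; no false negatives,
# so every share the server actually holds lands in this list
manifest = [b"si-aaa.0", b"si-bbb.1", b"si-zzz.9"]
probably_present = [s for s in manifest if s in server_filter]
```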
By the way, you can always add new elements to a Bloom Filter cheaply, so
a storage server which already has a Bloom Filter built can instantly add
each new share it receives into the filter. You can't remove items from a
Bloom Filter, though. I guess the only time storage servers remove shares
from disk is when they do a garbage-collection pass over their entire
store, so the thing to do is: when you start a garbage-collection pass,
also throw away your Bloom Filter and start building a new one, adding in
every share you don't collect; the new Bloom Filter will then be ready at
the end of the pass.
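A sketch of that server-side lifecycle, under toy assumptions of my own (a sparse set of bit positions standing in for the bit array, SHA-256 double hashing for the k positions): adding a share is a cheap OR of a few bits, and the garbage-collection pass rebuilds the filter from the surviving shares instead of trying to remove anything.

```python
import hashlib

NUM_BITS, NUM_HASHES = 1 << 16, 7

def positions(share_id):
    # the k bit positions for one share, via double hashing of SHA-256
    d = hashlib.sha256(share_id).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big")
    return {(h1 + i * h2) % NUM_BITS for i in range(NUM_HASHES)}

def add(filter_bits, share_id):
    # adding a freshly received share is cheap: OR in a few bits
    filter_bits |= positions(share_id)

def gc_pass(all_shares, is_garbage):
    # items can't be removed, so start a fresh filter and re-add
    # every share that survives the collection pass
    new_filter = set()
    for share_id in all_shares:
        if not is_garbage(share_id):
            add(new_filter, share_id)
    return new_filter

shares = [b"si-aaa.0", b"si-bbb.1", b"si-old.7"]
rebuilt = gc_pass(shares, is_garbage=lambda s: s == b"si-old.7")
# every kept share's bits are all set in the rebuilt filter
```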
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/700#comment:5>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid