[tahoe-dev] UEB hash size
Shawn Willden
shawn-tahoe at willden.org
Sun Jul 12 21:25:42 PDT 2009
On Sunday 12 July 2009 09:31:28 pm Zooko Wilcox-O'Hearn wrote:
> Can you characterize it as something like "the difference between a
> 24-byte immutable read cap and a 32-byte and a 64-byte is X times as
> many trie node fetches for an average file download"?
Actually, it's not an issue on the download side. The trie structure ensures
that only a single trie node will be downloaded (even when the local cache is
empty; with a fully-populated cache, there will be no Tahoe traffic). The
issue is on the upload side, and it's a little complicated to quantify, as it
depends heavily on the trie parameters and the "fullness" of trie nodes,
which determines the number of trie nodes that need to be updated to add a
set of entries.
I don't fully understand the mathematical characteristics of the burst trie
yet -- the one paper I found on the structure didn't really analyze that in
detail, and my intuition about node fullness has proven to be way off --
nodes seem to be on average much less full that I'd expected, meaning there
are more nodes than I'd expected.
If there are m nodes (including empty nodes on already-burst levels), and n
entries are inserted and if none of the nodes has to "burst", then on average
it will be necessary to update m(1-(1-1/m)**n) nodes. As m grows, that tends
toward n. For small m, it tends toward m. What I can't do is characterize m
as a function of the number of entries in the trie and the set of trie
parameters (bits per level and max entries per node). I wish I could,
because besides helping me quantify this issue, it would probably also let me
optimize the choice of trie parameters.
I guess I need to sit down and think hard about how the trie behaves as it
grows. It seems like it should be easy to get a handle on...
Shawn.
More information about the tahoe-dev
mailing list