[tahoe-dev] UEB hash size

Shawn Willden shawn-tahoe at willden.org
Sun Jul 12 21:25:42 PDT 2009


On Sunday 12 July 2009 09:31:28 pm Zooko Wilcox-O'Hearn wrote:
> Can you characterize it as something like "the difference between a
> 24-byte immutable read cap and a 32-byte and a 64-byte is X times as
> many trie node fetches for an average file download"?

Actually, it's not an issue on the download side.  The trie structure ensures 
that only a single trie node will be downloaded (even when the local cache is 
empty; with a fully-populated cache, there will be no Tahoe traffic).  The 
issue is on the upload side, and it's a little complicated to quantify, as it 
depends heavily on the trie parameters and the "fullness" of trie nodes, 
which determines the number of trie nodes that need to be updated to add a 
set of entries.

I don't fully understand the mathematical characteristics of the burst trie 
yet -- the one paper I found on the structure didn't really analyze that in 
detail, and my intuition about node fullness has proven to be way off -- 
nodes seem to be on average much less full that I'd expected, meaning there 
are more nodes than I'd expected.

If there are m nodes (including empty nodes on already-burst levels), and n 
entries are inserted and if none of the nodes has to "burst", then on average 
it will be necessary to update m(1-(1-1/m)**n) nodes.  As m grows, that tends 
toward n.  For small m, it tends toward m.  What I can't do is characterize m 
as a function of the number of entries in the trie and the set of trie 
parameters (bits per level and max entries per node).  I wish I could, 
because besides helping me quantify this issue, it would probably also let me 
optimize the choice of trie parameters.

I guess I need to sit down and think hard about how the trie behaves as it 
grows.  It seems like it should be easy to get a handle on...

	Shawn.


More information about the tahoe-dev mailing list