[tahoe-dev] Mutable directory update performance

Kyle Markley kyle at arbyte.us
Tue Dec 4 05:17:40 UTC 2012


Hi,

I've been tinkering around with some code that would like to grow up to 
be a parallelized flavor of "tahoe backup" so I could do backup, and 
eventually deep-check, on trees with tens of thousands of files without 
waiting many hours for the standard tools to serially walk through every 
file.  (And without the first serious network glitch aborting the 
entire operation halfway through those many hours!)

As soon as I got this functioning, I noticed that it was spending almost 
all its time on directory updates.

My code is just invoking the tahoe CLI to do all its work, and I don't 
see that immutable directories are available to the CLI, so I'm linking 
files into mutable directories.  Linking then becomes a serial operation 
for all files that belong in the same directory.
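
To illustrate the constraint, here is a minimal sketch of that scheduling: links that share a directory run one after another, while separate directories proceed in parallel.  The names are assumptions, not part of tahoe: "pending" is a hypothetical list of (directory, file name) pairs, and "link_one" is a hypothetical callable standing in for one "tahoe ln" invocation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_directory(pending):
    """Map each target directory to the file names to be linked into it."""
    groups = defaultdict(list)
    for directory, name in pending:
        groups[directory].append(name)
    return dict(groups)


def link_all(pending, link_one, max_workers=4):
    """Call link_one(directory, name) serially within a directory,
    and in parallel across directories.

    link_one is a hypothetical callable; in practice it might wrap a
    subprocess call to "tahoe ln".
    """
    groups = group_by_directory(pending)

    def link_directory(item):
        directory, names = item
        for name in names:  # one update at a time per directory
            link_one(directory, name)
        return directory, len(names)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker owns one directory, so no directory sees two
        # concurrent updates.
        return dict(pool.map(link_directory, groups.items()))
```

With this shape, one very large directory still limits the speedup, because all of its links stay on a single worker.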

Is there a plan to make immutable directories available to the CLI, or 
does anyone have advice on making "tahoe ln" faster?  The only other 
idea I have for the short run is to randomize the file order so I'm 
usually touching many separate directories at once.  That's simple but 
not elegant at all...
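
A rough sketch of the randomizing idea, again assuming a hypothetical "pending" list of (directory, file name) pairs.  It only reorders the work; it does not talk to tahoe at all.  The "adjacent_collisions" helper is likewise made up here, as a crude way to see how often neighbouring work items would still hit the same directory.

```python
import random


def shuffled_work_order(pending, seed=None):
    """Return a shuffled copy of pending, leaving the input untouched."""
    rng = random.Random(seed)  # a fixed seed makes the order repeatable
    order = list(pending)
    rng.shuffle(order)
    return order


def adjacent_collisions(order):
    """Count neighbouring work items that target the same directory."""
    return sum(1 for a, b in zip(order, order[1:]) if a[0] == b[0])
```

Shuffling only lowers the chance that concurrent workers land on the same directory; it does not rule it out, so updates to one directory would still need to be kept from overlapping.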

-- 
Kyle Markley


