[tahoe-dev] Storing large trees on the grid
Shawn Willden
shawn-tahoe at willden.org
Wed Jan 28 21:33:06 PST 2009
On Wednesday 28 January 2009 08:02:54 pm Brian Warner wrote:
> We don't have
> any answers yet, but I imagine that the "backupdb" mentioned in #598 (and
> #597) could include a table that maps from (devno, inodeno) to filecap
My in-progress backup tool notices hard links and doesn't bother uploading
them more than once. When it lstats a file, it checks the link count
(st_nlink; call it nfiles). If nfiles > 1, it stores the inode and device
number alongside the rest of the metadata, and tosses the filename into a
dict like { inode : [ filenames ] }.
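A minimal sketch of that scan step, assuming Python and os.lstat. The names
scan_tree and hardlinks are hypothetical, not taken from the actual tool, and
the sketch keys the dict on the (st_dev, st_ino) pair rather than the inode
alone, since inode numbers are only unique within one device:

```python
import os
from collections import defaultdict

def scan_tree(root):
    """Walk root and group paths that share an inode (hypothetical helper)."""
    hardlinks = defaultdict(list)  # (st_dev, st_ino) -> [paths]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if st.st_nlink > 1:  # more than one name points at this inode
                hardlinks[(st.st_dev, st.st_ino)].append(path)
    return hardlinks
```

Files with a link count of 1 never enter the dict, so its size grows with the
number of hard-linked names, not with the total number of files.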
When it uploads files (right now I'm just copying them to a different place in
the file system, not actually uploading), it again notices files that have
nfiles > 1. It searches a set of inodes to see if this hardlinked inode has
already been uploaded. If so, it skips it. If not, it does the upload and
adds the inode to the set.
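Roughly, the upload pass could look like the sketch below. It assumes Python,
stands in a local copy (shutil.copy2) for the real upload as described above,
and uses hypothetical names (upload_all, uploaded, skipped). It also tracks
(st_dev, st_ino) pairs instead of bare inode numbers, which is an assumption
on top of the description:

```python
import os
import shutil

def upload_all(paths, dest_dir):
    """Copy each path to dest_dir, skipping hard-linked inodes already copied."""
    uploaded = set()  # (st_dev, st_ino) of hard-linked files already handled
    skipped = []      # paths that were not copied because their inode was seen
    for path in paths:
        st = os.lstat(path)
        if st.st_nlink > 1:
            key = (st.st_dev, st.st_ino)
            if key in uploaded:
                skipped.append(path)
                continue
            uploaded.add(key)
        # Stand-in for the real upload: copy into a flat destination directory.
        shutil.copy2(path, os.path.join(dest_dir, os.path.basename(path)))
    return skipped
```

Membership tests on a set are hash lookups, so the check stays cheap even with
many hard-linked files. The cost is memory rather than time.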
However, Ben's question makes me wonder how this would work for him, because
the inode dict and set are in-memory structures. It didn't occur to me that
someone might have enough hard links to make those data structures too big.
I'm not sure what sorts of limits dict and set have, nor how much overhead
they have.
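As far as I know, neither dict nor set has a fixed entry limit in Python; the
practical limit is available memory. One rough way to get a feel for the
overhead is to measure it with sys.getsizeof. This is only a sketch: the
helper name container_overhead and the fake (device, inode) keys are made up
for illustration, and getsizeof counts only the container's own hash table,
not the keys, filename lists, or strings it references, so the numbers are
lower bounds:

```python
import sys

def container_overhead(n):
    """Return (dict_bytes, set_bytes) for n fake (dev, inode) keys.

    Only the containers themselves are measured, not the objects they hold.
    """
    keys = [(0, i) for i in range(n)]
    d = {k: [] for k in keys}   # like { inode : [ filenames ] }
    s = set(keys)               # like the set of uploaded inodes
    return sys.getsizeof(d), sys.getsizeof(s)

for n in (1000, 100000):
    d_bytes, s_bytes = container_overhead(n)
    # Per-entry container overhead in bytes (varies by Python version).
    print(n, d_bytes / float(n), s_bytes / float(n))
```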
Shawn.