[tahoe-dev] Storing large trees on the grid
Benjamin Jansen
tahoe at w007.org
Tue Jan 27 18:35:04 PST 2009
Hello,
I have a local tahoe node that is configured for no local storage and
is attached to the allmydata.com storage grid. I am attempting to copy
my BackupPC backend storage to the grid, so that I have an offsite copy.
I executed "tahoe cp -r -v . bupc:backuppc" a while ago... probably
close to a week ago. After days and about 1.3M lines of "examining N of N"
output, it said:
attaching sources to targets, 0 files / 1 dirs in root
targets assigned, 160300 dirs, 1058696 files
starting copy, 1058696 files, 160300 directories
Right now, it claims that it has copied about 50K files and 7700
directories. If things keep going as they are, that means I have about
5 months remaining. I'd rather not wait that long. :) I have a
symmetric 15 Mbit internet connection; most of the time, when I watch
a graph of traffic at my router, it's sitting at < 5KB/sec out. So,
the bottleneck is definitely not my connection.
Based on my understanding of BackupPC's backend storage, most of those
million files are hard links. Knowing what BPC is backing up, I'd say
10-20% are unique files. Does "tahoe cp" recognize hard links and copy
them as such?
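For what it's worth, I could estimate that ratio locally before deciding
what to upload, by comparing the number of file entries to the number of
distinct inodes. Something like this (assuming GNU find; untested) should
give a rough count:

    find . -type f | wc -l                           # file entries, hard links counted individually
    find . -type f -printf '%i\n' | sort -u | wc -l  # distinct inodes, i.e. unique file contents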
I thought about uploading a tarball instead of each file. The nature
of what I'm storing makes it unlikely that I would want to access an
individual file, anyway. However, my understanding is that tahoe
currently cannot store a 156GB file. Is that correct?
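If the single-file size limit is the issue, one workaround I'm considering
is splitting the tarball into fixed-size pieces and uploading those
instead, roughly along these lines (assuming GNU tar and split; the 10G
chunk size and the backuppc.tar.part. prefix are just placeholders I made
up):

    tar cf - . | split -b 10G - backuppc.tar.part.
    for piece in backuppc.tar.part.*; do
        tahoe put "$piece" bupc:backuppc/"$piece"
    done

Restoring would then just be a matter of fetching the pieces and cat'ing
them back together into a tar stream.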
I'd appreciate any advice on how I can speed this up - I'd like for my
line to be the bottleneck. ;)
Thanks,
Ben