[tahoe-dev] cp -r works!

Brian Warner warner-tahoe at allmydata.com
Wed May 21 18:28:57 PDT 2008


I just finished pushing a big overhaul of the tahoe CLI frontend. The
new syntax uses the rsync-style "alias:" prefix notation that we
discussed a while back. The name of the default alias (and when an
alias is required) is still up in the air, but the basic code
works, and uses "tahoe:" as the "default" alias (i.e. "tahoe ls" is the
same as "tahoe ls tahoe:").

To get started, do:

 tahoe add-alias tahoe `tahoe mkdir`

That will create a brand new unlinked directory and then remember it as
"tahoe:". Then do:

 tahoe ls tahoe:

to experience the wonderful emptiness of your new virtual home. Some other
useful commands:

 echo "READ ME" |tahoe put tahoe:README.txt
 tahoe get tahoe:README.txt |grep "invisible pink elephants"
 tahoe cp ~/.emacs tahoe:.emacs
 tahoe ls --uri tahoe:

The patch that landed an hour ago adds recursive copy. This command is
used both to get data into tahoe:

 tahoe cp -r src/allmydata tahoe:src/allmydata

and out of it:

 tahoe cp -r tahoe:src ./new-src

There are still a lot of rough edges, so don't be surprised if you find bugs.
Also, it's not as fast as you might wish for. I just timed a backup of the
Tahoe src/ directory (173 files, 1.9MB of data, in 9 directories) to testgrid
at 4m14s, or about 1.5 seconds per file. There is some per-file overhead that
I haven't figured out yet. Add --verbose to get some rough progress
information.
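
For reference, here is the arithmetic behind that per-file figure as a small
Python sketch. The variable names are only for illustration, and "1.9MB" is
assumed to mean 1.9 * 10**6 bytes.

```python
# Numbers from the timed testgrid backup described above.
files = 173
total_bytes = 1.9e6        # "1.9MB", assumed to be decimal megabytes
elapsed = 4 * 60 + 14      # 4m14s, in seconds

per_file = elapsed / files            # seconds spent per file
throughput = total_bytes / elapsed    # bytes moved per second
avg_size = total_bytes / files        # average file size in bytes

print(f"{elapsed} s total, {per_file:.2f} s/file, "
      f"{throughput / 1000:.1f} kB/s, {avg_size / 1000:.0f} kB/file")
# prints: 254 s total, 1.47 s/file, 7.5 kB/s, 11 kB/file
```

With files averaging only about 11 kB, the total is probably dominated by the
per-file cost rather than by the number of bytes moved.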

Also note that, for the moment at least, you can use pure URIs as alias
indicators, e.g. to list the tahoe testgrid's public directory, use:

 tahoe ls URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
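
To make the argument forms concrete, here is a minimal Python sketch of how a
"tahoe ls" target could be split into a root directory and a path. It is not
the actual CLI code: split_target() and the aliases dict are hypothetical, and
the cap strings in the usage example are made up. It only models "ls"; for
"cp", an argument with no prefix is a local file rather than a path under the
default alias.

```python
DEFAULT_ALIAS = "tahoe"   # the current default alias, still subject to change

def split_target(arg, aliases):
    """Return (root_cap, path) for one 'tahoe ls' argument.

    `aliases` maps alias names to directory caps. Hypothetical helper,
    written only to illustrate the three argument forms.
    """
    # A pure URI is used directly as the root; anything after "/" is the path.
    if arg.startswith("URI:"):
        root, _, path = arg.partition("/")
        return root, path
    # "alias:path" form: look up the alias, keep the rest as the path.
    if ":" in arg:
        alias, _, path = arg.partition(":")
        return aliases[alias], path
    # No prefix at all: fall back to the default alias.
    return aliases[DEFAULT_ALIAS], arg

# Usage, with a made-up cap standing in for the output of "tahoe mkdir":
aliases = {"tahoe": "URI:DIR2:aaaa:bbbb"}
print(split_target("", aliases))                  # same root as "tahoe:"
print(split_target("tahoe:src/allmydata", aliases))
print(split_target("URI:DIR2:cccc:dddd/docs", aliases))
```

Checking for "URI:" before looking for an alias matters, because a URI
contains colons of its own.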


Next week I'm going to work on adding the "backupdb": a small local database
that keeps track of what you've uploaded to the grid. The idea is to do less
work to discover that you've already backed something up: if a given filename
has the same size and timestamp (and optionally the same hash), then don't
re-upload it. When "cp -r --use-backupdb" works, running the same command
twice in a row will result in zero network activity.
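
A rough sketch of the check described above, in Python with sqlite3. Since the
backupdb has not been written yet, everything here is an assumption: the table
layout, the function names, and the choice of sqlite are all hypothetical. The
optional hash comparison is left out.

```python
import os
import sqlite3

def open_backupdb(dbfile):
    """Open (or create) the local database of already-uploaded files."""
    db = sqlite3.connect(dbfile)
    db.execute("CREATE TABLE IF NOT EXISTS uploaded "
               "(path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, uri TEXT)")
    return db

def check_file(db, path):
    """Return the stored URI if `path` still has the size and timestamp
    recorded at upload time, else None (meaning: upload it again)."""
    st = os.stat(path)
    row = db.execute("SELECT size, mtime_ns, uri FROM uploaded WHERE path = ?",
                     (os.path.abspath(path),)).fetchone()
    if row and row[0] == st.st_size and row[1] == st.st_mtime_ns:
        return row[2]
    return None

def record_upload(db, path, uri):
    """Remember that `path`, in its current state, was uploaded as `uri`."""
    st = os.stat(path)
    db.execute("INSERT OR REPLACE INTO uploaded VALUES (?, ?, ?, ?)",
               (os.path.abspath(path), st.st_size, st.st_mtime_ns, uri))
    db.commit()
```

A copy loop would call check_file() first and only upload (then call
record_upload()) when it returns None, so a second run over unchanged files
touches nothing but the local database.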

Once that is in place, the "tahoe backup FROM tahoe:TO" command will be
written, basically like "rsync -a" with the --delete option turned on and
using the backupdb to minimize work. This will be a big milestone (for me, at
least): Tahoe will finally be usable as a cron-driven backup tool from
Linux (or the CLI shell on other platforms).
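
The "rsync -a with --delete" behaviour boils down to comparing two listings.
A minimal Python sketch of that comparison follows. The "tahoe backup"
command does not exist yet, so plan_backup() and the (size, mtime) tuples are
purely illustrative assumptions, not a design.

```python
def plan_backup(local, remote):
    """Decide what a mirror-style backup has to do.

    `local` and `remote` map relative paths to (size, mtime) tuples:
    the source tree as it is now, and the target as last recorded.
    Returns (to_upload, to_delete) as sorted lists of paths.
    """
    # New or changed files: missing on the target, or metadata differs.
    to_upload = sorted(p for p, meta in local.items() if remote.get(p) != meta)
    # The --delete part: on the target but no longer in the source.
    to_delete = sorted(p for p in remote if p not in local)
    return to_upload, to_delete

# Usage with made-up listings:
local = {"a.py": (10, 100), "b.py": (20, 200), "new.py": (5, 300)}
remote = {"a.py": (10, 100), "b.py": (20, 150), "old.py": (7, 50)}
print(plan_backup(local, remote))
# prints: (['b.py', 'new.py'], ['old.py'])
```

When the two listings are identical both lists come back empty, which is the
"zero network activity" case for a repeated run.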


cheers,
 -Brian

