[tahoe-dev] [tahoe-lafs] #796: write-only backup caps
tahoe-lafs
trac at allmydata.org
Sat Aug 22 16:55:34 PDT 2009
#796: write-only backup caps
--------------------------+-------------------------------------------------
Reporter: warner | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: undecided
Component: code-mutable | Version: 1.5.0
Keywords: | Launchpad_bug:
--------------------------+-------------------------------------------------
David-Sarah Hopwood points out an even more interesting
direction to take in a recent tahoe-dev posting:
http://allmydata.org/pipermail/tahoe-dev/2009-August/002653.html
The goal is to have one cap (used frequently and stored online)
to do write-only backups, and a different cap (used only for
recovery and stored offline) to perform the reads. The effect
would be close to that of the Mac OS X shared public "Drop Box"
folder, or of GPG-encrypting a piece of data to a public key whose
private key is held offline: normally a one-way operation, but when you
need to, you open up the vault and pull out the decryption key.
This would be pretty cool. This ticket is to sketch out what the
crypto layout would look like. #795 (append-only files) will be
a starting point, and there will certainly be an asymmetric
encryption/decryption keypair involved.
From the UI point of view, you'd have some sort of magic
append-only no-reading directory cap, which you keep in your
private/aliases table. There would be a corresponding
read-everything cap (or maybe just the full-fledged writecap;
these could be stored separately), which you keep in a vault and
only type in to test the system and to recover data. Then you
type "tahoe backup ~ backup-appendonlycap:", and you expect that
this unreadable "backup-appendonlycap:" object will acquire
another child, with a timestamp name that is hopefully (but not
guaranteed to be) unique.
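
A minimal sketch of why the timestamp name is only "hopefully" unique:
a client holding just the append-only cap cannot list the existing
children, so it cannot check for a collision before appending. The
function name, the timestamp format, and the optional random suffix
below are all assumptions made for illustration; none of them comes
from the ticket or from the "tahoe backup" code.

```python
import secrets
from datetime import datetime, timezone


def snapshot_child_name(now=None, suffix_bytes=0):
    """Build a child name for one backup snapshot (hypothetical helper).

    The name is a UTC timestamp with one-second resolution. Two backups
    started in the same second get the same name unless suffix_bytes is
    nonzero, in which case a random hex suffix is appended.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    name = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    if suffix_bytes:
        # Random suffix: lowers the collision odds without needing to
        # read the (unreadable) parent directory.
        name += "-" + secrets.token_hex(suffix_bytes)
    return name


if __name__ == "__main__":
    print(snapshot_child_name())
    print(snapshot_child_name(suffix_bytes=4))
```

Whether a random suffix is acceptable depends on how the recovery side
wants to sort and present the snapshots; the sketch only shows that
uniqueness has to come from the name itself, not from a lookup.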
You might also like the unchanged-directory-sharing properties
of "tahoe backup" to keep working, so that you don't spend a lot
of time or disk on things that haven't changed. I don't know if
it's possible to accomplish this without recording some
information which would violate the no-reading properties of the
parent. This would probably be easier to pull off if we have
immutable directories (#607). I suspect that you'll still have
to read and hash your whole disk, and generate the CHK
identifiers, and then discover that they're already uploaded. So
you might save the storage space and the upload bandwidth, but
not the local disk IO.
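
To make the "save bandwidth but not disk IO" point concrete, here is a
rough sketch. It assumes a content-derived identifier that is simply
the SHA-256 of the file bytes, which is NOT how Tahoe derives CHK caps;
content_id, plan_uploads, and the already_stored set (standing in for
"ask the grid whether this is already uploaded") are hypothetical.

```python
import hashlib


def content_id(data):
    """Stand-in for a CHK identifier: plain SHA-256 of the contents."""
    return hashlib.sha256(data).hexdigest()


def plan_uploads(files, already_stored):
    """Split files into (to_upload, skipped) lists of names.

    files          -- dict mapping name -> bytes
    already_stored -- set of content ids assumed to be on the grid
    """
    to_upload, skipped = [], []
    for name, data in sorted(files.items()):
        # Every byte of every file is read and hashed, changed or not.
        cid = content_id(data)
        if cid in already_stored:
            skipped.append(name)      # no upload, no new storage
        else:
            to_upload.append(name)
    return to_upload, skipped


if __name__ == "__main__":
    files = {"notes.txt": b"unchanged", "todo.txt": b"edited today"}
    stored = {content_id(b"unchanged")}
    print(plan_uploads(files, stored))
```

The loop touches every file before it can decide anything, which is
the local IO cost described above; only the upload step is skipped.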
(hm, so the current backupdb would record the uploaded filecaps,
which starts to violate the goals once the original file gets
deleted and the backupdb doesn't also delete the stored filecap.
But if your local filesystem allows you to attach metadata to
the files you're backing up, then just attach the tahoe filecap
and a ctime/mtime/filesize snapshot to the original file, so the
filecap dies with the file. The backup process would look for
this metadata, compare the ctime/mtime/size snapshot to decide
if the cached filecap is stale, then upload or not. This would
be pretty slick, actually, and I think several modern
filesystems let you attach this sort of metadata (HFS+ for one).
If you can attach metadata to directories, then you write the
verifycap of the immutable dirnode last used for that directory:
on each new backup, you figure out the new dirnode contents,
hash them into the CHK key, hash *that*, and compare it against
the verifycap. If they match, then boom, you now have the dirnode
readcap for going up to the parent; if they don't match, then you
must upload the new version of that dirnode. This avoids keeping
the old dircap cleartext around. The only remaining security
issue is that you'd be keeping the individual filecaps around
for old versions, until the next "tahoe backup" process came
along and replaced them, but this is a much smaller exposure
than the dirnodes. It would leak the following information: if
an attacker gets a copy of your disk at time T=2, they might be
able to learn the contents of modified-but-not-deleted files
that we previously backed up at time T=1.)
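
The two checks in the parenthetical above could be sketched roughly as
follows. Everything here is an assumption for illustration: the
Snapshot and CachedEntry records stand in for whatever the filesystem
metadata would hold, the "URI:CHK:example" string is a placeholder,
and derive_key/derive_verifier use plain SHA-256 with made-up prefixes
rather than Tahoe's real CHK and verifycap derivations.

```python
import hashlib
import os
from collections import namedtuple

Snapshot = namedtuple("Snapshot", "ctime mtime size")
CachedEntry = namedtuple("CachedEntry", "filecap snapshot")


def take_snapshot(path):
    """Read the ctime/mtime/size triple for a local file."""
    st = os.stat(path)
    return Snapshot(st.st_ctime, st.st_mtime, st.st_size)


def cached_filecap(entry, current):
    """Return the cached filecap if the file looks unchanged, else None."""
    if entry is not None and entry.snapshot == current:
        return entry.filecap
    return None  # stale or missing: the file must be uploaded again


def derive_key(dirnode_bytes):
    """Hypothetical: hash the serialized dirnode contents into a key."""
    return hashlib.sha256(b"key:" + dirnode_bytes).digest()


def derive_verifier(key):
    """Hypothetical: hash the key into a verifier that is safe to store."""
    return hashlib.sha256(b"verify:" + key).hexdigest()


def reuse_dirnode(dirnode_bytes, stored_verifier):
    """Return the read key if the directory is unchanged, else None.

    Only the verifier is kept on disk. The read key is recomputed from
    the current contents, so it never sits around in cleartext.
    """
    key = derive_key(dirnode_bytes)
    if derive_verifier(key) == stored_verifier:
        return key
    return None  # contents changed: upload a new dirnode


if __name__ == "__main__":
    entry = CachedEntry("URI:CHK:example", Snapshot(1.0, 2.0, 10))
    print(cached_filecap(entry, Snapshot(1.0, 2.0, 10)))
    verifier = derive_verifier(derive_key(b"dir contents v1"))
    print(reuse_dirnode(b"dir contents v1", verifier) is not None)
```

Note the asymmetry: the per-file entry still stores a real filecap
(the remaining exposure mentioned above), while the per-directory
entry stores only something that cannot be used to read.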
It's probably ok for the "tahoe backup" process to upload files
and create directories, generating temporary caps which it is
obligated to forget after the top-level append operation. If the
whole backup is created out of immutable objects, the only
mutable slot is the top-most timestamped-version holding
directory, and that's where the append-only operation would be
used.
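
As a toy model of that shape (immutable tree underneath, one
append-only slot on top), something like the following might do. The
AppendOnlyDir class, the dict-based store, and the SHA-256 "caps" are
all stand-ins invented for this sketch; a real append-only cap would
enforce no-reading cryptographically, not by leaving out a method.

```python
import hashlib
import json


class AppendOnlyDir:
    """Toy top-level directory: children can be added, never listed."""

    def __init__(self):
        self._children = {}

    def append(self, name, cap):
        if name in self._children:
            raise KeyError("child already exists: %r" % (name,))
        self._children[name] = cap


def put_immutable(store, data):
    """Store bytes under a content-derived cap and return the cap."""
    cap = hashlib.sha256(data).hexdigest()
    store[cap] = data
    return cap


def backup_tree(store, tree):
    """Upload a nested dict (leaves are bytes) bottom-up; return root cap."""
    entries = {}
    for name, node in sorted(tree.items()):
        if isinstance(node, dict):
            entries[name] = ["dir", backup_tree(store, node)]
        else:
            entries[name] = ["file", put_immutable(store, node)]
    serialized = json.dumps(entries, sort_keys=True).encode("utf-8")
    return put_immutable(store, serialized)


def run_backup(store, top, tree, snapshot_name):
    """Build the immutable snapshot, append it, and keep no caps."""
    root_cap = backup_tree(store, tree)
    top.append(snapshot_name, root_cap)
    # Nothing is returned: the temporary caps go out of scope here.


if __name__ == "__main__":
    store, top = {}, AppendOnlyDir()
    tree = {"a.txt": b"hi", "sub": {"b.txt": b"yo"}}
    run_backup(store, top, tree, "snapshot-1")
    print(len(store))
```

Because every object below the top is content-addressed, running the
same backup again under a new name adds one child to the top directory
and stores nothing new.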
I'm trying to imagine if it would make sense to add an
"append-only" or "write-only-no-reading" column to the dirnode
table (to provide something like "transitive append-only-ness").
I'm not even sure if that's sane, so I'll put off thinking about
it until later. (If you can't read, is "transitive" even
defined?)
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/796>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid