#796 new enhancement

write-only backup caps — at Initial Version

Reported by: warner
Owned by:
Priority: major
Milestone: undecided
Component: code-mutable
Version: 1.5.0
Keywords: newcaps tahoe-backup research
Cc: tahoe-lafs.org@…
Launchpad Bug:

Description

David-Sarah Hopwood points out an even more interesting direction to take in a recent tahoe-dev posting:

http://allmydata.org/pipermail/tahoe-dev/2009-August/002653.html

The goal is to have one cap (used frequently and stored online) to do write-only backups, and a different cap (used only for recovery and stored offline) to perform the reads. The effect would be close to that of the Mac OS X shared public "Drop Box" folder, or of GPG-encrypting a piece of data to a public key whose private key is held offline: normally a one-way operation, but when you need to, you open up the vault and pull out the decryption key.

This would be pretty cool. This ticket is to sketch out what the crypto layout would look like. #795 (append-only files) will be a starting point, and there will certainly be an asymmetric encryption/decryption keypair involved.
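To make the one-way property concrete, here is a toy sketch of "encrypt to a public key, decrypt with the offline private key". It is only an illustration under loud assumptions: the names `make_keypair`, `seal` and `unseal` are hypothetical, the Diffie-Hellman parameters are far too small to be secure, there is no authentication, and nothing here is Tahoe-LAFS's actual or proposed cap format.

```python
import hashlib
import secrets

# Toy parameters, NOT secure: a 127-bit Mersenne prime and a small base.
P = 2**127 - 1
G = 3

def make_keypair():
    """Return (private, public). The private half plays the offline read cap."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

def _keystream(shared: int, length: int) -> bytes:
    """Stretch the shared secret into `length` bytes (SHA-256 over a counter)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = shared.to_bytes(16, "big") + counter.to_bytes(8, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]

def seal(public: int, plaintext: bytes):
    """Encrypt with only the public key (the online, write-only side)."""
    eph_private = secrets.randbelow(P - 3) + 2
    eph_public = pow(G, eph_private, P)
    shared = pow(public, eph_private, P)
    stream = _keystream(shared, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    # eph_private is dropped on return, so the writer cannot decrypt later.
    return eph_public, ciphertext

def unseal(private: int, sealed):
    """Decrypt with the private key (the offline recovery side)."""
    eph_public, ciphertext = sealed
    shared = pow(eph_public, private, P)
    stream = _keystream(shared, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

The backup machine would hold only the public value, so after `seal` returns it has nothing that lets it read the data back; recovery needs the private value from the vault.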

From the UI point of view, you'd have some sort of magic append-only no-reading directory cap, which you keep in your private/aliases table. There would be a corresponding read-everything cap (or maybe just the full-fledged writecap; these could be stored separately), which you keep in a vault and only type in to test the system and to recover data. Then you type "tahoe backup ~ backup-appendonlycap:", and you expect that this unreadable "backup-appendonlycap:" object will acquire another child, with a timestamp name that is hopefully (but not guaranteed to be) unique.
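A minimal sketch of the "acquire another timestamped child" step follows. `AppendOnlyDir`, `snapshot_name` and `append_snapshot` are hypothetical names, the directory is an in-memory stand-in, and the sketch assumes the append operation itself refuses an existing name, since a writer that cannot read has no other way to notice a collision.

```python
import time

class AppendOnlyDir:
    """Hypothetical stand-in for an append-only, unreadable directory."""

    def __init__(self):
        self._children = {}

    def append(self, name, cap):
        # Assumption: appending never overwrites an existing child.
        if name in self._children:
            raise KeyError(f"child {name!r} already exists")
        self._children[name] = cap

    def __len__(self):
        return len(self._children)

def snapshot_name(now=None):
    """UTC timestamp name; `now` is seconds since the epoch (None = current time)."""
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))

def append_snapshot(directory, cap, now=None):
    """Append `cap` under a timestamp name, adding a suffix on collision."""
    base = snapshot_name(now)
    name, attempt = base, 1
    while True:
        try:
            directory.append(name, cap)
            return name
        except KeyError:
            attempt += 1
            name = f"{base}-{attempt}"
```

For example, two backups started within the same second would end up as `<timestamp>` and `<timestamp>-2` rather than one replacing the other.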

You might also like the unchanged-directory-sharing properties of "tahoe backup" to keep working, so that you don't spend a lot of time or disk on things that haven't changed. I don't know if it's possible to accomplish this without recording some information which would violate the no-reading properties of the parent. This would probably be easier to pull off if we have immutable directories (#607). I suspect that you'll still have to read and hash your whole disk, and generate the CHK identifiers, and then discover that they're already uploaded. So you might save the storage space and the upload bandwidth, but not the local disk IO.
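The trade-off in that last point can be sketched as follows. This is only an illustration: `content_id` uses a plain SHA-256 as a stand-in for a CHK identifier (not the real derivation), and `already_stored` is a hypothetical set standing in for whatever "is this already uploaded?" check the grid offers.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Stand-in for a CHK identifier: just SHA-256 of the content."""
    return hashlib.sha256(data).hexdigest()

def plan_backup(files, already_stored):
    """Decide what to upload.

    files: mapping of name -> file contents (bytes)
    already_stored: set of content ids known to be on the grid
    Returns (to_upload, bytes_hashed, bytes_uploaded).
    """
    to_upload = {}
    bytes_hashed = 0
    bytes_uploaded = 0
    for name, data in files.items():
        # Every file is read and hashed, changed or not: this is the
        # local disk I/O that cannot be avoided without a local cache.
        bytes_hashed += len(data)
        cid = content_id(data)
        if cid not in already_stored and cid not in to_upload:
            to_upload[cid] = data
            bytes_uploaded += len(data)
    return to_upload, bytes_hashed, bytes_uploaded
```

With one unchanged file and one new file of the same size, `bytes_hashed` counts both while `bytes_uploaded` counts only the new one.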

(Hm, so the current backupdb would record the uploaded filecaps, which starts to violate the goals once the original file gets deleted and the backupdb doesn't also delete the stored filecap. But if your local filesystem allows you to attach metadata to the files you're backing up, then just attach the tahoe filecap and a ctime/mtime/filesize snapshot to the original file, so the filecap dies with the file. The backup process would look for this metadata, compare the ctime/mtime/size snapshot to decide whether the cached filecap is stale, then upload or not. This would be pretty slick, actually, and I think several modern filesystems let you attach this sort of metadata (HFS+ for one).

If you can attach metadata to directories, then you write the verifycap of the immutable dirnode last used for that directory. On each new backup, you figure out the new dirnode contents, hash them into the CHK key, hash *that*, and compare it against the verifycap. If they match, then boom, you now have the dirnode readcap for going up to the parent; if they don't match, you must upload the new version of that dirnode. This avoids keeping the old dircap cleartext around.

The only remaining security issue is that you'd be keeping the individual filecaps around for old versions until the next "tahoe backup" process came along and replaced them, but this is a much smaller exposure than the dirnodes. It would leak the following information: if an attacker gets a copy of your disk at time T=2, they might be able to learn the contents of modified-but-not-deleted files that we previously backed up at time T=1.)
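Both checks can be sketched roughly as below. Everything here is an assumption for illustration: the `metadata` dict stands in for per-file extended attributes, the function names are hypothetical, and the two tagged SHA-256 hashes only mimic the "contents -> key -> verifycap" chain; they are not Tahoe-LAFS's real cap derivation.

```python
import hashlib
import os

def snapshot_of(path):
    """ctime/mtime/size snapshot of a local file or directory."""
    st = os.stat(path)
    return (st.st_ctime_ns, st.st_mtime_ns, st.st_size)

def cached_filecap(metadata, current_snapshot):
    """Return the cached filecap if the file is unchanged, else None.

    metadata: dict stand-in for xattrs, e.g.
              {"filecap": ..., "snapshot": (ctime, mtime, size)}
    """
    if metadata and metadata.get("snapshot") == current_snapshot:
        return metadata["filecap"]
    return None  # stale or missing: the file must be uploaded again

def dir_key(contents: bytes) -> bytes:
    """Stand-in for hashing dirnode contents into the CHK key."""
    return hashlib.sha256(b"key:" + contents).digest()

def dir_verifycap(key: bytes) -> str:
    """Stand-in for hashing the key into the verifycap."""
    return hashlib.sha256(b"verify:" + key).hexdigest()

def reuse_dir_key(new_contents: bytes, stored_verifycap: str):
    """Recompute the key from the new contents; keep it only on a match.

    Only the verifycap is stored next to the directory, so an attacker
    who copies the disk does not get the read key directly.
    """
    key = dir_key(new_contents)
    if dir_verifycap(key) == stored_verifycap:
        return key   # unchanged: the old dirnode can be reused
    return None      # changed: a new dirnode must be uploaded
```

The point of storing only `dir_verifycap(...)` is that the key is rederived from the directory's current contents on each run, so nothing readable has to sit on disk between backups.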

It's probably ok for the "tahoe backup" process to upload files and create directories, generating temporary caps which it is obligated to forget after the top-level append operation. If the whole backup is created out of immutable objects, the only mutable slot is the top-most timestamped-version holding directory, and that's where the append-only operation would be used.

I'm trying to imagine if it would make sense to add an "append-only" or "write-only-no-reading" column to the dirnode table (to provide something like "transitive append-only-ness"). I'm not even sure if that's sane, so I'll put off thinking about it until later. (If you can't read, is "transitive" even defined?)
