[tahoe-dev] Thinking about building a P2P backup system

Shawn Willden shawn-tahoe at willden.org
Fri Jan 9 07:15:56 PST 2009


On Friday 09 January 2009 01:00:44 am Drew Perttula wrote:
> In my own case, I can guarantee a frozen filesystem for you to back up
> because my files are under ZFS. Obviously not everyone can make such a
> snapshot, so your algorithm is still valuable.

Right.  I use LVM myself, so I can also easily generate a snapshot for backup, 
but I'm aiming at the lowest common denominator -- Windows.

> Even worse, I can't share my files by giving people tahoe readcaps! That
> was going to be the big bonus of using tahoe for backup, for me.

That's actually a goal of mine as well.  I even want to go one step further 
and be able to arrange for specific peers to get k shares so they have fast 
local access to my files.

> For discussion purposes, here are some other layouts that store the
> current version intact:
>
> Whole-file history tree:
>   current/a/b/c.txt   (a link to version2, perhaps?)
>   history/a/b/c.txt/version1
>   history/a/b/c.txt/version2

That works, but it requires uploading full versions with every change.  I 
agree that it should be an option.  Perhaps it should even be the first 
option implemented, although I really want to get to delta-based versioning.

> Reverse deltas tree:
>   current/a/b/c.txt
>   history/a/b/c.txt/delta1
>   history/a/b/c.txt/delta2

The problem with using reverse deltas is that they're even more bandwidth-intensive
than full versions.  If you have plaintext on both ends and the stored 
version is mutable you can use the rsync algorithm to efficiently patch the 
remote version to current and then save the delta so that you can reverse it 
to recover the previous version (assuming the delta is reversible).
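To make the rsync idea concrete, here is a minimal Python sketch of its two halves. The side holding the old plaintext publishes per-block checksums. The side holding the new plaintext slides a window over its file, using a cheap rolling checksum plus a strong hash to emit either "copy old block N" or literal bytes. The block size, the use of MD5 as the strong hash and the function names are my own illustrative assumptions, not librsync's API. Note that rebuilding the new version requires the old plaintext.

```python
import hashlib

BLOCK = 4        # hypothetical: tiny so the example is easy to trace
MOD = 1 << 16    # weak checksum components are kept mod 2^16

def weak(block: bytes) -> tuple:
    """rsync-style weak checksum (a, b) of one block."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b

def signature(old: bytes) -> dict:
    """Computed where the OLD plaintext lives: weak -> {strong hash: block index}."""
    sig = {}
    for idx in range(len(old) // BLOCK):
        block = old[idx * BLOCK:(idx + 1) * BLOCK]
        sig.setdefault(weak(block), {})[hashlib.md5(block).digest()] = idx
    return sig

def rsync_delta(sig: dict, new: bytes) -> list:
    """Computed where the NEW plaintext lives: ("copy", idx) / ("lit", bytes) ops."""
    ops, literal, i = [], bytearray(), 0
    ab = weak(new[:BLOCK]) if len(new) >= BLOCK else None
    while i + BLOCK <= len(new):
        candidates = sig.get(ab)  # check the cheap checksum before hashing
        idx = candidates.get(hashlib.md5(new[i:i + BLOCK]).digest()) if candidates else None
        if idx is not None:
            if literal:
                ops.append(("lit", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", idx))
            i += BLOCK
            if i + BLOCK <= len(new):
                ab = weak(new[i:i + BLOCK])
        else:
            literal.append(new[i])
            if i + BLOCK < len(new):
                ab = roll(*ab, new[i], new[i + BLOCK], BLOCK)
            i += 1
    literal += new[i:]  # trailing bytes shorter than a block
    if literal:
        ops.append(("lit", bytes(literal)))
    return ops

def rsync_patch(old: bytes, ops: list) -> bytes:
    """Needs the OLD plaintext: rebuild the new version from blocks + literals."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)
```

For example, `rsync_patch(old, rsync_delta(signature(old), new))` returns `new`. Only the literal runs and the small copy commands need to cross the wire. A delta in this copy-plus-literal form does not record the bytes of the old version that were dropped, so by itself it cannot be run backwards.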

But if the remote store is encrypted, the delta can't be applied.  All you can 
do is store it as a forward delta.  To get reverse deltas, you have to upload 
a full copy of the new version, plus the delta.

That's the primary reason I suggested that Tahoe might optionally allow files 
to be stored unencrypted, to facilitate reverse-delta versioning.  Zooko 
doesn't want to go there, and I respect his reasoning.

This is really a tradeoff between three factors:  Access time, storage space 
and bandwidth.

o	Full versioning minimizes access time to all versions, at the expense of 
storage and bandwidth.
o	Reverse deltas minimize storage and access time to the current version, at 
the expense of even more bandwidth.
o	Forward deltas minimize storage and bandwidth, at the expense of access 
time.
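A back-of-the-envelope model of those three options follows. It is a rough sketch under simplifying assumptions of mine: every version is `size` bytes, every change produces a `delta`-byte delta, the store is encrypted (so a reverse delta costs a full upload plus the delta), and "access time" is approximated by the number of deltas that must be applied to restore a version. Erasure-coding expansion and compression are ignored.

```python
from dataclasses import dataclass

@dataclass
class Cost:
    storage: int             # bytes kept on the grid for n versions
    upload_per_version: int  # bytes sent for each new backup
    deltas_for_newest: int   # deltas applied to restore the newest version
    deltas_for_oldest: int   # deltas applied to restore the first version

def costs(n: int, size: int, delta: int) -> dict:
    """Toy model: n versions, each `size` bytes, each change `delta` bytes."""
    history = size + (n - 1) * delta  # one full copy plus n-1 deltas
    return {
        "full": Cost(n * size, size, 0, 0),
        # Encrypted store: nothing can be patched remotely, so each backup
        # uploads the whole new version plus a reverse delta.
        "reverse": Cost(history, size + delta, 0, n - 1),
        "forward": Cost(history, delta, n - 1, 0),
    }
```

With `costs(10, 100, 1)` (arbitrary units), full versioning stores 1000 and uploads 100 per backup. Reverse deltas store 109 but upload 101 per backup, and forward deltas store 109 and upload only 1, but need 9 deltas applied to reach the newest version.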

Ultimately, I envision using both forward and reverse deltas.  Incremental 
backups will use forward deltas, but occasionally the client will 
reconsolidate the history of a file, making a current full version with a 
series of reverse deltas, upon which forward deltas can be incrementally 
added.
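A minimal sketch of that hybrid layout, assuming a hypothetical `FileHistory` object that keeps one full copy, a chain of reverse deltas behind it and a chain of forward deltas after it. Python's `difflib` stands in for a real binary delta library, and the copy-plus-literal delta format is an illustrative assumption. Reconsolidation here is the brute-force version: rebuild every version's plaintext on the client, then re-encode.

```python
import difflib

def make_delta(old: bytes, new: bytes) -> list:
    """One-way delta: ("copy", i1, i2) ranges of old plus ("lit", bytes)."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # replace/insert: the new bytes must be shipped
            ops.append(("lit", new[j1:j2]))
        # "delete": the old bytes are simply not copied
    return ops

def apply_delta(old: bytes, ops: list) -> bytes:
    out = bytearray()
    for op in ops:
        out += old[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)

class FileHistory:
    """Hypothetical store. self.base is the index of the full copy;
    reverse[k] turns version base-k into base-k-1,
    forward[k] turns version base+k into base+k+1."""

    def __init__(self, first: bytes):
        self.full, self.base = first, 0
        self.reverse, self.forward = [], []
        self._latest = first  # client-side copy of the newest plaintext

    @property
    def latest_index(self) -> int:
        return self.base + len(self.forward)

    def add_version(self, new: bytes) -> None:
        # Incremental backup: upload only a forward delta.
        self.forward.append(make_delta(self._latest, new))
        self._latest = new

    def get(self, i: int) -> bytes:
        if not 0 <= i <= self.latest_index:
            raise IndexError(i)
        data = self.full
        if i >= self.base:
            chain = self.forward[:i - self.base]
        else:
            chain = self.reverse[:self.base - i]
        for d in chain:
            data = apply_delta(data, d)
        return data

    def reconsolidate(self) -> None:
        # Newest version becomes the full copy; older ones become reverse deltas.
        versions = [self.get(i) for i in range(self.latest_index + 1)]
        self.reverse = [make_delta(versions[k], versions[k - 1])
                        for k in range(len(versions) - 1, 0, -1)]
        self.full, self.base, self.forward = versions[-1], len(versions) - 1, []
```

After `reconsolidate()`, restoring the newest version needs no deltas at all, and later calls to `add_version()` start a fresh forward chain on top of it.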

It just occurred to me that the reconsolidation process I described in my 
previous post may be unnecessarily complex.  If librsync deltas are 
reversible (need to check), then reconsolidation may be as simple as 
uploading a new full version, deleting the original full version and then 
treating the already-uploaded forward deltas as reverse deltas.
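Here is what "reversible" would have to mean. If each stored delta records both the bytes it removes and the bytes it adds, the same delta can be applied in either direction, and reconsolidation reduces to uploading the newest version in full. A minimal sketch of that idea follows. The `("keep", n)` / `("swap", removed, added)` format and the function names are hypothetical, `difflib` again stands in for a binary differ, and whether librsync's own format carries enough information for this is exactly the open question.

```python
import difflib

def reversible_delta(old: bytes, new: bytes) -> list:
    """Hypothetical format: ("keep", n) or ("swap", removed_bytes, added_bytes)."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", i2 - i1))
        else:  # replace/delete/insert: record BOTH sides
            ops.append(("swap", old[i1:i2], new[j1:j2]))
    return ops

def apply_delta(src: bytes, ops: list, backward: bool = False) -> bytes:
    """Apply ops old->new, or new->old when backward=True."""
    out, pos = bytearray(), 0
    for op in ops:
        if op[0] == "keep":
            out += src[pos:pos + op[1]]
            pos += op[1]
        else:
            before, after = (op[2], op[1]) if backward else (op[1], op[2])
            if src[pos:pos + len(before)] != before:
                raise ValueError("delta does not match this version")
            out += after
            pos += len(before)
    return bytes(out)

def reconsolidate(v0: bytes, forward: list) -> tuple:
    """Only the newest full version is new; the stored deltas are reused as-is.
    (A real client already has the newest plaintext; rebuilding it here just
    shows the chain is consistent.)"""
    latest = v0
    for d in forward:
        latest = apply_delta(latest, d)
    return latest, forward

def restore(latest: bytes, deltas: list, i: int) -> bytes:
    """Walk back from the newest version, applying deltas i..n-1 backwards."""
    data = latest
    for d in reversed(deltas[i:]):
        data = apply_delta(data, d, backward=True)
    return data
```

The price of this design is that each delta also stores the deleted bytes, so it is larger than a copy-plus-literal delta. In exchange, no delta has to be downloaded, recomputed or re-uploaded during reconsolidation.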

	Shawn.

