[tahoe-dev] Thinking about building a P2P backup system
Shawn Willden
shawn-tahoe at willden.org
Fri Jan 9 07:15:56 PST 2009
On Friday 09 January 2009 01:00:44 am Drew Perttula wrote:
> In my own case, I can guarantee a frozen filesystem for you to back up
> because my files are under ZFS. Obviously not everyone can make such a
> snapshot, so your algorithm is still valuable.
Right. I use LVM myself, so I can also easily generate a snapshot for backup,
but I'm aiming at the lowest common denominator -- Windows.
> Even worse, I can't share my files by giving people tahoe readcaps! That
> was going to be the big bonus of using tahoe for backup, for me.
That's actually a goal of mine as well. I even want to go one step further
and be able to arrange for specific peers to get k shares so they have fast
local access to my files.
> For discussion purposes, here are some other layouts that store the
> current version intact:
>
> Whole-file history tree:
> current/a/b/c.txt (a link to version2, perhaps?)
> history/a/b/c.txt/version1
> history/a/b/c.txt/version2
That works, but it requires uploading full versions with every change. I
agree that it should be an option. Perhaps it should even be the first
option implemented, although I really want to get to delta-based versioning.
> Reverse deltas tree:
> current/a/b/c.txt
> history/a/b/c.txt/delta1
> history/a/b/c.txt/delta2
The problem with using reverse deltas is that, on an encrypted store, they're
even more bandwidth-intensive than full versions. If you have plaintext on
both ends and the stored version is mutable, you can use the rsync algorithm
to efficiently patch the remote version to current and then save the delta so
that you can reverse it to recover the previous version (assuming the delta
is reversible).
But if the remote store is encrypted, the delta can't be applied. All you can
do is store it as a forward delta. To get reverse deltas, you have to upload
a full copy of the new version, plus the delta.
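To make the encrypted-store case concrete, here is a rough Python sketch. It
does not use rsync or librsync; the delta format (copy/data ops built with
difflib), the helper names and the 8-byte cost per copy op are all assumptions
made up for illustration. It only shows what the client would have to upload
in each case.

```python
from difflib import SequenceMatcher

def make_delta(src: bytes, dst: bytes):
    """Return ops that rebuild dst from src (hypothetical toy format)."""
    ops = []
    sm = SequenceMatcher(None, src, dst, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))       # reuse bytes already in src
        elif tag in ("replace", "insert"):
            ops.append(("data", dst[j1:j2]))   # literal bytes only dst has
        # "delete" needs no op: the bytes are simply not copied
    return ops

def apply_delta(src: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += src[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

def delta_size(ops) -> int:
    # Assumed encoding: 8 bytes per copy op, literal length per data op.
    return sum(len(op[1]) if op[0] == "data" else 8 for op in ops)

old = b"".join(b"line %d of the file\n" % i for i in range(40))
new = old + b"one more line\n"

forward = make_delta(old, new)   # old -> new
reverse = make_delta(new, old)   # new -> old

# What has to cross the wire when the store cannot apply a delta itself:
forward_upload = delta_size(forward)             # just the forward delta
full_upload = len(new)                           # whole-file versioning
reverse_upload = len(new) + delta_size(reverse)  # new full copy plus delta
```

Under these assumptions reverse_upload is always larger than full_upload,
because the server never sees plaintext and so cannot patch the stored copy.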
That's the primary reason why I suggested maybe Tahoe should optionally allow
not encrypting the files, to facilitate reverse delta versioning. Zooko
doesn't want to go there, and I respect his reasoning.
This is really a tradeoff between three factors: Access time, storage space
and bandwidth.
o Full versioning minimizes access time to all versions, at the expense of
storage and bandwidth.
o Reverse deltas minimize access time to the current version and storage, at
the expense of even more bandwidth.
o Forward deltas minimize storage and bandwidth, at the expense of access
time.
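The three-way tradeoff above can be put into rough numbers. This is only a
back-of-the-envelope model, assuming a constant file size and delta size, an
encrypted store (so a reverse delta costs a full upload plus the delta), and
access time approximated by the bytes that must be fetched. The function name
and the example sizes are invented for illustration.

```python
def costs(scheme: str, file_size: int, delta_size: int, n_versions: int):
    """Approximate per-file costs in bytes for one versioning scheme."""
    n_deltas = n_versions - 1
    chain = file_size + n_deltas * delta_size  # one full copy plus all deltas
    if scheme == "full":
        return {"storage": n_versions * file_size,
                "upload_per_change": file_size,
                "fetch_current": file_size,
                "fetch_oldest": file_size}
    if scheme == "reverse":
        return {"storage": chain,
                "upload_per_change": file_size + delta_size,
                "fetch_current": file_size,
                "fetch_oldest": chain}
    if scheme == "forward":
        return {"storage": chain,
                "upload_per_change": delta_size,
                "fetch_current": chain,
                "fetch_oldest": file_size}
    raise ValueError("unknown scheme: %r" % scheme)

# Example: a 10 MB file, 100 kB deltas, 20 stored versions.
table = {s: costs(s, 10_000_000, 100_000, 20)
         for s in ("full", "reverse", "forward")}

for scheme, row in table.items():
    print(scheme, row)
```

With those example numbers, forward and reverse deltas both store about
11.9 MB against 200 MB for full versions, while the per-change upload ranges
from 100 kB (forward) to 10.1 MB (reverse).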
Ultimately, I envision using both forward and reverse deltas. Incremental
backups will use forward deltas, but occasionally the client will
reconsolidate the history of a file, making a current full version with a
series of reverse deltas, upon which forward deltas can be incrementally
added.
It just occurred to me that the reconsolidation process I described in my
previous post may be unnecessarily complex. If librsync deltas are
reversible (need to check), then reconsolidation may be as simple as
uploading a new full version, deleting the original full version and then
treating the already-uploaded forward deltas as reverse deltas.
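A sketch of that simpler reconsolidation, under the stated "if": it uses a
made-up reversible delta format that records both the removed and the
inserted bytes, so the same delta can be walked in either direction. Whether
librsync deltas have this property is exactly the open question above; the
format and function names here are hypothetical and say nothing about
librsync itself.

```python
from difflib import SequenceMatcher

def make_reversible_delta(old: bytes, new: bytes):
    """Ops that record both sides of every change (toy format)."""
    ops = []
    sm = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("keep", i2 - i1))
        else:
            # Store removed and inserted bytes so the op can be undone.
            ops.append(("swap", old[i1:i2], new[j1:j2]))
    return ops

def apply_delta(src: bytes, ops, reverse: bool = False) -> bytes:
    """Apply ops old->new, or new->old when reverse is True."""
    out, pos = bytearray(), 0
    for op in ops:
        if op[0] == "keep":
            out += src[pos:pos + op[1]]
            pos += op[1]
        else:
            removed, inserted = (op[2], op[1]) if reverse else (op[1], op[2])
            pos += len(removed)
            out += inserted
    return bytes(out)

v0 = b"alpha\nbeta\ngamma\n"
v1 = b"alpha\nbeta\ngamma\ndelta\n"
v2 = b"alpha\nBETA\ngamma\ndelta\n"

# Incremental backups: full v0 plus forward deltas.
d1 = make_reversible_delta(v0, v1)
d2 = make_reversible_delta(v1, v2)
current = apply_delta(apply_delta(v0, d1), d2)

# Reconsolidation: upload full v2, delete v0, keep d1 and d2 untouched
# and read them backwards to recover the older versions.
restored_v1 = apply_delta(v2, d2, reverse=True)
restored_v0 = apply_delta(restored_v1, d1, reverse=True)
```

Nothing but the new full version is uploaded during reconsolidation here; the
deltas already on the grid are reused as-is.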
Shawn.