[tahoe-dev] Thinking about building a P2P backup system

Fri Jan 9 22:00:46 PST 2009

On Friday 09 January 2009 01:33:39 pm Brian Warner wrote:
> The 'revlog' format that I'd like to use for our eventual "LDMF" (Large
> Distributed Mutable Files) design uses forward deltas and an append-only
> data structure, but writes out a full copy of the file whenever the amount
> of data you'd need to read to construct the latest version gets bigger than
> twice the size of the file. I think the Mercurial folks picked this 2x
> thing as a reasonable tradeoff between speed and storage.

Is there infrastructure in place for append-only files?  That would work much 
better for incremental backups than storing the forward deltas as separate 
files.  Actually, the LDMF design you're talking about would make handling 
this in the backup server unnecessary.  What's the status of that?  Does it 
makes sense to get that working first?

One suggestion:  If you compute an rsync-style rolling checksum on the client 
when you upload the original file, and then again with each delta, you won't 
need to download the whole file to compute the next forward delta.  All 
you'll have to download is the rolling checksum for the most recently 
uploaded version (full or delta), which is about 1% of the size of the full 
file.

> For our mutable files, this also gives us versioning: we create a read-cap
> that contains both a pointer to the file overall (i.e. the revlog) and a
> secure identifier for the particular revision that we care about. The
> question of how far back to keep the deltas becomes a matter of retention
> policy and GC, with the obvious limitation that the oldest thing in the
> revlog should be a full copy.

It may also be worth thinking about some support for reverse deltas.  If you 
add a little more work whenever you upload a new full copy, then you can 
reduce storage usage by avoiding the retention of multiple full copies.

Download the last full version plus all of the forward deltas and then as you 
go along patching the full version to bring it up to date, generate a trail 
of reverse deltas.  Then upload the new full copy as the head of the revlog 
and the series of reverse deltas as the older versions.  New versions become  
forward deltas.

GC of versions older than the last full copy is a simple matter of removing 
them from the revlog.  Of course, that breaks the "append only" semantics you 
mentioned.

> The format I've been sketching out would allow for efficient insertions and
> deletions of arbitrary byte spans, which would be pretty wild. The standard
> POSIX file io API doesn't give any way to express this, but an application
> that's speaking directly to Tahoe could take advantage of it. Imagine being
> able to insert a byte in the beginning of your file in constant time.

Cool.

> The limitation is actually a combination of encryption and erasure-coding
> (which, unless you're using the degenerate k=1 replication case, involves
> splitting the data up among multiple servers). Both make it hard or
> impossible to apply rsync-like algorithms on the server side.

Makes sense.  I don't know enough about erasure codes to know if 
they "randomize" the data in the pieces to make patching ineffective.  It 
doesn't surprise me that this is the case, though.

> > If librsync deltas are reversible (need to check)
>
> Incidentally, for deltas to be reversible, they have to contain roughly
> twice the information as an irreversible delta.

That also makes sense.  Incidentally, that doubling may be worthwhile for 
LMDF.  If you decide on an approach that does require downloading and 
reassembling version n-1 in order to calculate the delta from n-1 to n, you 
might also want to calculate the reverse delta and upload that as well.  That 
would enable you to drop an old full copy at will, since you'd have the 
reverse deltas in place to regenerate it at need.

	Shawn.