#78 new enhancement

Cater to rsync as a target Tahoe client.

Reported by: nejucomo
Owned by: somebody
Priority: minor
Milestone: undecided
Component: code
Version: 0.4.0
Keywords: enterprise backup encoding rsync
Cc: tahoe-lafs.org@…
Launchpad Bug:

Description (last modified by lpirl)

Imagine a sysadmin of a large enterprise network who performs routine backups by rsyncing from many clients to one large RAID storage device.

What if they could replace the single large RAID with a vdrive, run Tahoe storage nodes on each workstation, and have all of the client-side rsync automation work without change?

If this use case is as common and the Tahoe replacement as useful as I believe it to be, it would behoove Tahoe to cater to rsync for both publication and retrieval.

One sufficient support feature would be file-system emulation (FUSE, WebDAV?, ...), which rsync can already use. However, it may also be worthwhile to implement an rsync-specialized interface to Tahoe if the tradeoff between efficiency gains and development time is favorable.

Change History (7)

comment:1 Changed at 2007-07-25T03:06:50Z by warner

  • Keywords encoding added
  • Priority changed from major to minor

This means being able to efficiently modify files in place, right? And/or record rsync's coarse hashes somewhere so the update code could decide which blocks need to be modified without actually having to download them all?

To support this, we'd probably need to use something other than CHK.
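
As a rough illustration of the idea in the comment above, here is a minimal sketch of deciding which blocks changed by comparing stored per-block hashes against a new local copy, without fetching the old data. It is not Tahoe or rsync code: the function names, the fixed block alignment, and the use of `zlib.adler32` plus SHA-256 as stand-ins for rsync's weak and strong hashes are all assumptions made for the example.

```python
import hashlib
import zlib


def block_signatures(data: bytes, block_size: int):
    """Return one (weak, strong) hash pair per fixed-size block.

    Hypothetical helper: adler32 and SHA-256 stand in for rsync's
    weak and strong hashes.
    """
    sigs = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        sigs.append((zlib.adler32(block), hashlib.sha256(block).digest()))
    return sigs


def changed_blocks(stored_sigs, new_data: bytes, block_size: int):
    """Indices of blocks in new_data that differ from the stored signatures.

    Only the stored signatures of the old version are needed, not its contents.
    Blocks past the end of the old version count as changed.
    """
    new_sigs = block_signatures(new_data, block_size)
    changed = []
    for i, sig in enumerate(new_sigs):
        if i >= len(stored_sigs) or stored_sigs[i] != sig:
            changed.append(i)
    return changed
```

This sketch only compares blocks at matching offsets and does not handle a new version that is shorter than the old one; rsync itself also finds matching blocks at shifted offsets.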

comment:2 Changed at 2008-03-20T02:48:08Z by zooko

Nowadays, we have two things: Small Decentralized Mutable Files and Immutable Files (which Brian called "CHKs" in the previous message). The former might support rsync okay for sufficiently small files. There is currently a hard limit of 3.5 MB, which ought to be raised, but there will remain a couple of soft limits -- see #359 (eliminate hard limit on size of SDMFs).

comment:3 Changed at 2008-06-01T20:53:01Z by warner

  • Milestone changed from eventually to undecided

comment:4 Changed at 2009-11-23T03:34:17Z by davidsarah

rsync coarse hashes are documented at http://klubkev.org/rsync/. Note that they're not secure hashes (the algorithm uses MD4 and a variant of Adler-32), so they must be treated as confidential.
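
For readers unfamiliar with the weak hash mentioned above, the following is a minimal sketch of an Adler-32-style rolling checksum, following the formulation I recall from the rsync technical report (two 16-bit sums, updated in constant time as the window slides by one byte). The function names are made up for this example, and the real rsync implementation may differ in details.

```python
M = 1 << 16  # both running sums are kept modulo 2^16


def weak_checksum(block: bytes):
    """Compute the two sums (a, b) for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte is weighted by n, the last byte by 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int):
    """Slide a window of length n one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def combine(a: int, b: int) -> int:
    """Pack the two 16-bit sums into a single 32-bit value."""
    return a | (b << 16)
```

The constant-time update is what lets a sender test every byte offset of a file against a table of block checksums cheaply. The weak sum is easy to collide, which is why rsync pairs it with a stronger hash.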

comment:5 Changed at 2009-11-24T06:39:38Z by warner

hm, perhaps a more open-ended file format could have room for features like this in the UEB hash. For example, our current immutable UEB hash is specified to be a dictionary, in which e.g. the ["crypttext_hash"] key contains the flat SHA256d hash of the ciphertext. If the share format (or post-decode pre-decrypt ciphertext format) could also be expanded, we could add a section for ["encrypted_rsync_hashes"], covered by a UEB key named ["encrypted_rsync_hash_root"], ignored by older clients, but available for more advanced clients to use for .. whatever it is an rsync hash would be useful for.

(btw, of course we've discussed elsewhere the security implications of an extensible format and the possible benefits of explicitly *disallowing* extensions like this)
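
To make the extensible-dictionary idea above concrete, here is a small sketch of a UEB-like dictionary in which an older client simply ignores a key it does not know. This is an assumption-laden illustration, not the Tahoe share format: the helper names are hypothetical, the real UEB has other fields, and a flat double SHA-256 stands in both for SHA256d and for what would presumably be a Merkle root under the "hash_root" name.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, used here as a stand-in for the hash in the comment."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Keys that a hypothetical older client understands.
KNOWN_KEYS_OLD_CLIENT = {"crypttext_hash"}


def build_ueb(ciphertext: bytes, encrypted_rsync_hashes: bytes = None) -> dict:
    """Build a UEB-like dict, optionally covering an extra rsync-hash section."""
    ueb = {"crypttext_hash": sha256d(ciphertext)}
    if encrypted_rsync_hashes is not None:
        # Flat hash as a placeholder; a real design would likely use a tree root.
        ueb["encrypted_rsync_hash_root"] = sha256d(encrypted_rsync_hashes)
    return ueb


def old_client_view(ueb: dict) -> dict:
    """What an older client would use: unknown keys are dropped, not rejected."""
    return {k: v for k, v in ueb.items() if k in KNOWN_KEYS_OLD_CLIENT}
```

Silently ignoring unknown keys is exactly the behaviour whose security implications the parenthetical above alludes to; a stricter client could instead reject any UEB containing keys it does not recognize.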

comment:6 Changed at 2009-12-07T02:42:33Z by davidsarah

  • Keywords rsync added

comment:7 Changed at 2015-02-01T12:43:20Z by lpirl

  • Cc tahoe-lafs.org@… added
  • Description modified (diff)