Opened at 2009-03-10T20:41:01Z
Last modified at 2015-04-17T22:54:51Z
#658 new enhancement
"tahoe cp" should avoid full upload/download when the destination already exists (using backupdb and/or plaintext hashes)
Reported by: | warner | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | code-frontend-cli | Version: | 1.3.0 |
Keywords: | backupdb tahoe-cp usability newcaps performance | Cc: | tahoe-lafs.org@… |
Launchpad Bug: |
Description (last modified by lpirl)
Now that the backupdb seems to be working well for "tahoe backup", it's time to extend "tahoe cp" to use it too.
In the upload direction (tahoe cp LOCAL REMOTE), the backupdb should be used to let us skip a new upload of a file that's already been uploaded. The goal is to allow periodic "tahoe cp LOCAL REMOTE" (with fixed values of LOCAL and REMOTE) to do as little work as possible.
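The upload-side check could look roughly like the sketch below. It is only an illustration: the `uploaded` table, its columns, and the `upload` callable are hypothetical stand-ins and not the real backupdb schema or Tahoe upload API. Comparing the file size in addition to the mtime is also an assumption of this sketch.

```python
import os
import sqlite3

def open_upload_db(path=":memory:"):
    # Hypothetical table: one row per local path that has been uploaded.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS uploaded ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, uri TEXT)"
    )
    return db

def upload_if_changed(db, local_path, upload):
    """Return (uri, did_upload). `upload` is a caller-supplied callable
    taking a local path and returning a URI (a stand-in for the real upload)."""
    key = os.path.abspath(local_path)
    st = os.stat(local_path)
    row = db.execute(
        "SELECT size, mtime_ns, uri FROM uploaded WHERE path = ?", (key,)
    ).fetchone()
    if row is not None and row[:2] == (st.st_size, st.st_mtime_ns):
        # Same size and mtime as last time: reuse the recorded URI.
        return row[2], False
    uri = upload(local_path)
    db.execute(
        "INSERT OR REPLACE INTO uploaded VALUES (?, ?, ?, ?)",
        (key, st.st_size, st.st_mtime_ns, uri),
    )
    db.commit()
    return uri, True
```

With fixed LOCAL and REMOTE, a second run over unchanged files would then only stat each file and consult the db.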
In the download direction (tahoe cp REMOTE LOCAL), the backupdb should also be used, to let us skip a download of a file that's already been downloaded. When a Tahoe file is downloaded and written to local disk, a path+timestamps-to-URI entry should be added to the db. Before downloading a file to local disk, the disk should be checked for an existing file with the same timestamps: if present, and if the URI matches the URI that was going to be downloaded, the download should be skipped.
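A minimal sketch of the download-side bookkeeping described above, assuming a hypothetical `downloaded` table (not the real backupdb schema). The function names are invented for illustration, and recording the file size next to the mtime is an extra assumption of the sketch.

```python
import os
import sqlite3

def open_download_db(path=":memory:"):
    # Hypothetical table mapping local path + timestamps to the source URI.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS downloaded ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, uri TEXT)"
    )
    return db

def record_download(db, local_path, uri):
    # Call this after a file has been downloaded and written to disk.
    st = os.stat(local_path)
    db.execute(
        "INSERT OR REPLACE INTO downloaded VALUES (?, ?, ?, ?)",
        (os.path.abspath(local_path), st.st_size, st.st_mtime_ns, uri),
    )
    db.commit()

def can_skip_download(db, local_path, uri):
    # Skip only if the file exists, is unchanged since it was recorded,
    # and the recorded URI is the one about to be downloaded.
    try:
        st = os.stat(local_path)
    except FileNotFoundError:
        return False
    row = db.execute(
        "SELECT size, mtime_ns, uri FROM downloaded WHERE path = ?",
        (os.path.abspath(local_path),),
    ).fetchone()
    return row is not None and row == (st.st_size, st.st_mtime_ns, uri)
```

Any mismatch (missing file, changed size or mtime, different URI) falls back to a normal download, so a stale db entry costs time rather than correctness, as long as timestamps are trustworthy.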
Change History (7)
comment:1 Changed at 2009-12-07T02:49:43Z by davidsarah
- Keywords cp usability added
comment:2 Changed at 2009-12-07T03:08:46Z by davidsarah
- Keywords newcaps added
Plaintext hashes would be a more robust way of doing this than URI+timestamp (but dependent on #453).
IOW, for downloading a file:
- if the source cap is to an immutable file, the read cap might be sufficient to verify that the existing copy has the same plaintext hash.
- if the source cap is to a mutable file, cp would need to go to the servers to find the consensus value for the plaintext hash of the current version. Then it would proceed as for an immutable file.
If the existing file is the correct one, it should still be touched to update its mtime.
For uploading a file, if there is an existing copy then you would have to verify it.
The storage server protocol and webapi would need to be able to return a hash of the file first. (See http://www.usenix.org/events/nsdi04/tech/full_papers/mogul/mogul.pdf for a similar protocol with some relevant discussion of design issues.)
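For the download case, the local side of the plaintext-hash check might be sketched as follows. Plain SHA-256 is used here purely as a placeholder: the actual hash construction would depend on #453 and is not specified in this ticket. The function names are hypothetical, and how `expected_hash` is obtained (from the read cap or from the servers) is left out.

```python
import hashlib
import os

def plaintext_hash(path, chunk_size=1 << 16):
    # Placeholder hash: plain SHA-256 over the file contents, read in chunks.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def skip_if_same_plaintext(path, expected_hash):
    """Return True if the existing local file already has the expected
    plaintext hash, in which case the download can be skipped."""
    if not os.path.isfile(path):
        return False
    if plaintext_hash(path) != expected_hash:
        return False
    # The existing file is the correct one: touch it to update its mtime.
    os.utime(path, None)
    return True
```

Unlike the URI+timestamp approach, this reads the whole local file on every run, trading local I/O for independence from timestamps.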
comment:3 Changed at 2009-12-07T03:14:06Z by davidsarah
- Keywords performance added
comment:4 Changed at 2009-12-07T03:22:42Z by davidsarah
- Summary changed from '"tahoe cp" should use backupdb, in both directions' to '"tahoe cp" should avoid full upload/download when the destination already exists (using backupdb and/or plaintext hashes)'
comment:5 Changed at 2010-02-12T05:03:27Z by davidsarah
- Keywords tahoe-cp added; cp removed
comment:6 Changed at 2015-04-17T18:48:05Z by lpirl
- Cc tahoe-lafs.org@… added
- Description modified (diff)
comment:7 Changed at 2015-04-17T22:54:51Z by daira
This may interact with the planned magic folder db (see docs/proposed/magic-folder/filesystem-integration.rst).
I think this should be gated by an option that is not the default (or else make it the default for a new command called something other than cp). Otherwise, if anything goes wrong, it won't be obvious that the backupdb could be at fault; users are likely to consider tahoe cp to be a lower-level operation that copies files unconditionally, like Unix cp does.