#1029 new enhancement

download a subtree as an archive

Reported by: nejucomo
Owned by:
Priority: major
Milestone: undecided
Component: code-frontend-web
Version: 1.6.1
Keywords: usability docs test performance unicode i18n
Cc: tahoe-lafs.org@…
Launchpad Bug:

Description (last modified by lpirl)

For some use cases it may be useful to retrieve an entire directory tree as an archive. Perhaps the wapi call would look like:

GET /uri/$DIRCAP?t=archive&archive_format=tgz

to retrieve a gzipped tarball.
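As a rough illustration of the proposed call, the sketch below only builds the request URL. The t=archive action and the archive_format parameter are the proposal from this ticket and are not an implemented webapi operation; the function name archive_url and the gateway address used in the example are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode


def archive_url(gateway, dircap, archive_format="tgz"):
    """Build the URL for the *proposed* archive download (hypothetical API)."""
    # Parameter names follow the ticket description; they may change.
    query = urlencode({"t": "archive", "archive_format": archive_format})
    # Dircaps contain ':' characters, so percent-encode the whole cap.
    return "%s/uri/%s?%s" % (gateway.rstrip("/"), quote(dircap, safe=""), query)


if __name__ == "__main__":
    # Placeholder gateway address and dircap, for illustration only.
    print(archive_url("http://127.0.0.1:3456", "URI:DIR2:aaaa:bbbb"))
```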

Issues:

Should the action parameter be t= or some other name such as output= ?

How will the browser name this file?

What if the directory structure contains loops?

What if the full directory tree is huge?

Change History (5)

comment:1 Changed at 2010-05-01T23:46:15Z by davidsarah

  • Keywords usability docs test added
  • Priority changed from minor to major
  • Summary changed from Download a dircap as an archive. to download a subtree as an archive

My suggested answers:

The action parameter should be t=, because:

  • different GET ...?t= actions retrieve different kinds of information about the referenced object(s), which is the case here;
  • this better fits the existing structure of the webapi code, which dispatches on t= first.

The format parameter name doesn't need to be as long as archive_format; it could just be format or output.

The filename should be the last component of the path to the directory if given, otherwise the short base32 storage index (SI) of the directory. The filetype should be given by the format parameter. It should be possible to override the filename+type using @@named.
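The naming rule suggested above could be sketched roughly as follows. The function archive_filename and its arguments are hypothetical, and using the format value directly as the file extension is an assumption; the @@named override is not modelled here.

```python
def archive_filename(path_components, short_si, fmt="tgz"):
    """Pick a download filename for the archive (illustrative only).

    path_components: path segments after the dircap, possibly empty.
    short_si: short base32 storage index of the directory, used as fallback.
    fmt: value of the format parameter, assumed usable as the extension.
    """
    base = path_components[-1] if path_components else short_si
    return "%s.%s" % (base, fmt)
```

For example, archive_filename(["docs", "photos"], "abcd1234") gives "photos.tgz", while an empty path falls back to the SI.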

Loops should cause an error. Since the response may already have been started when the loop is detected, this can't be an HTTP error response -- see #822 for possible ways of dealing with that. The gateway will have to remember the SIs of already-seen directories in order to detect loops. (In theory it should be sufficient to remember only mutable directories. We should already be doing that for recursive operations, but I'm not sure we are.)
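A minimal sketch of the loop check, assuming a toy in-memory model rather than the real gateway code: tree maps a directory's SI to its children, and walk, LoopError and the tuple layout are all invented for illustration. This variant remembers only the directories on the current path, so a directory reachable by two different paths is not treated as a loop; remembering every directory seen so far would be the stricter reading of the suggestion above.

```python
class LoopError(Exception):
    """Raised when a directory is its own ancestor."""


def walk(tree, si, path=(), ancestors=frozenset()):
    """Yield (path, data) for every file below the directory with index `si`.

    tree maps SI -> {child_name: ("dir", child_si) or ("file", data)}.
    """
    if si in ancestors:
        raise LoopError("directory loop at /" + "/".join(path))
    ancestors = ancestors | {si}  # SIs of directories on the current path
    for name, (kind, value) in sorted(tree[si].items()):
        child_path = path + (name,)
        if kind == "dir":
            yield from walk(tree, value, child_path, ancestors)
        else:
            yield child_path, value
```

Because walk is a generator, files found before the loop have already been yielded when LoopError is raised, which mirrors the problem that the response may already have been started.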

The directory tree potentially being huge does not present any opportunities for malicious DoS that aren't already present. To avoid these, don't share a gateway with potential DoS-attackers. It does increase the risk of accidental DoS. OTOH, the client can always abort the HTTP request.

comment:2 Changed at 2010-05-02T02:48:15Z by davidsarah

  • Keywords performance added

#1030 is a CLI interface to this functionality.

Reasons to implement this ticket as a webapi operation rather than directly in the CLI:

  • it requires only one request to the gateway rather than many requests;
  • it allows the gateway to make storage protocol requests in parallel (the CLI could retrieve files in parallel, but all the existing CLI commands are implemented synchronously using a single thread);
  • if #204 ("virtual CDs") were implemented, this would be a more efficient way of obtaining all or part of the contents of a CD;
  • providing this function in the webapi also allows the WUI and any future JavaScript UI to use it.

comment:3 follow-up: Changed at 2010-05-02T18:07:15Z by nejucomo

On directory loops: Some formats, such as tar, allow symlinks. Would it be possible to translate directory loops into symlinks appropriately?

comment:4 in reply to: ↑ 3 Changed at 2010-05-02T19:45:03Z by davidsarah

  • Keywords unicode i18n added

Replying to nejucomo:

> On directory loops: Some formats, such as tar, allow symlinks. Would it be possible to translate directory loops into symlinks appropriately?

Yes, for those formats.

Python has built-in zipfile and tarfile modules to create .zip and .tar[.gz,.bz2] archives. The tarfile module appears to support writing an archive with symlinks (using a TarInfo object with .type = SYMTYPE and .linkname set).
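A small sketch of writing such a symlink entry with the standard tarfile module (shown with current Python 3). The member names and the choice of ".." as the link target are made up for illustration; how a real implementation would compute the target for a directory loop is not decided in this ticket.

```python
import io
import tarfile


def build_tar_with_symlink():
    """Return the bytes of a .tar.gz containing a file, a directory and a symlink."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("root/file.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

        subdir = tarfile.TarInfo("root/sub")
        subdir.type = tarfile.DIRTYPE
        subdir.mode = 0o755
        tar.addfile(subdir)

        # Stand-in for a child entry that points back at an ancestor directory.
        link = tarfile.TarInfo("root/sub/parent")
        link.type = tarfile.SYMTYPE
        link.linkname = ".."
        tar.addfile(link)
    return buf.getvalue()
```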

Another issue is the character encoding of file paths. For .zip files there is a bit in the local file header of each file that indicates the encoding is UTF-8 (see Appendix D of the zip format spec), although only a few recently updated zip extractors will recognize this; others will misinterpret the path as Cp437. For .tar files, the PAX format always stores paths as UTF-8. PAX might not be supported by as many extractors as the GNU tar format, although it should be fairly widely supported now.
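The two encoding options can be tried out with the standard library. This is a sketch assuming current Python 3, where zipfile sets the UTF-8 flag (bit 11, mask 0x800) for names that are not plain ASCII and tarfile can be asked for PAX explicitly; the helper names make_zip and make_pax_tar are invented here.

```python
import io
import tarfile
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: filename is UTF-8


def make_zip(name, data):
    """Return zip bytes holding one member; non-ASCII names get the UTF-8 flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def make_pax_tar(name, data):
    """Return uncompressed PAX-format tar bytes holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Reading the zip back and checking infolist()[0].flag_bits & UTF8_FLAG shows whether the flag was set for a given name.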

comment:5 Changed at 2016-01-27T04:38:51Z by lpirl

  • Cc tahoe-lafs.org@… added
  • Description modified (diff)