[tahoe-lafs-trac-stream] [tahoe-lafs] #994: support precompressed files
tahoe-lafs
trac at tahoe-lafs.org
Mon Oct 10 18:46:01 PDT 2011
#994: support precompressed files
---------------------------+------------------------------------------------
Reporter:      davidsarah  | Owner:     somebody
Type:          enhancement | Status:    new
Priority:      major       | Milestone: undecided
Component:     code        | Version:   1.6.0
Resolution:                | Keywords:  compression space-efficiency
Launchpad Bug:             |            performance bandwidth security
                           |            integrity backward-compatibility
---------------------------+------------------------------------------------
Changes (by davidsarah):
* keywords: compress performance integrity backward-compatibility =>
compression space-efficiency performance bandwidth security integrity
backward-compatibility
Comment:
Replying to [comment:3 jsgf]:
> Replying to [comment:2 davidsarah]:
> > Actually, we want old clients to fail to download these files (rather
> > than to misinterpret the compressed data as uncompressed).
>
> That seems like a pretty big semantic change for Tahoe. Thus far it is
> more or less a transparent container for arrays of bytes, with a bit of
> advisory metadata sprinkled on top. Changing that so that some byte arrays
> have an innate property which prevents some clients from being able to
> download them is a big change.
Making the file data (as an uncompressed sequence of bytes) depend on
metadata that is detached from the file URI would be an even bigger
semantic change. The file URI has to unambiguously determine the
file data.
One way of achieving that would be to put the bit that determines whether
a file has been stored compressed in the URI, for example
"{{{UCHK:gz:...}}}" could be the gzip-decompressed version of
"{{{CHK:...}}}".
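A minimal sketch of how a client might dispatch on such a URI. It assumes the hypothetical "UCHK:gz:" prefix from the example above (not an existing Tahoe-LAFS URI type), and a caller-supplied fetch_chk function standing in for an ordinary CHK download:

```python
import gzip

# Hypothetical prefix taken from the proposal above; not an existing URI type.
COMPRESSED_PREFIX = "UCHK:gz:"
PLAIN_PREFIX = "CHK:"


def read_file_data(uri, fetch_chk):
    """Return the file data that `uri` denotes.

    `fetch_chk` is an assumed callable that maps a "CHK:..." URI to the
    bytes stored under it.
    """
    if uri.startswith(COMPRESSED_PREFIX):
        # The URI itself says the stored bytes are gzip-compressed.
        stored = fetch_chk(PLAIN_PREFIX + uri[len(COMPRESSED_PREFIX):])
        return gzip.decompress(stored)
    if uri.startswith(PLAIN_PREFIX):
        return fetch_chk(uri)
    # A client that does not know the prefix ends up here and fails,
    # instead of returning compressed bytes as if they were the file.
    raise ValueError("unrecognised URI type: %r" % (uri,))
```

Under this sketch the same stored bytes are reachable through both URIs, but each URI denotes exactly one byte sequence, and a client that only understands "CHK:" rejects the "UCHK:gz:" form outright.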
> As I mention in ticket:992#comment:3, the same bits can be represented
> as either "foo.txt" "text/plain" "encoding: gzip" or "foo.txt.gz"
> "application/gzip". The former could be misinterpreted by an old client
> which fails to pay attention to content-encoding.
>
> But I don't think this is a huge problem; I suspect most webapi clients
> are already using a general-purpose HTTP library, which will already have
> to deal with content encoding.
We can't send {{{Content-Encoding: gzip}}} if the client hasn't sent an
{{{Accept-Encoding}}} that includes {{{gzip}}}; that would obviously be
incorrect and would not comply with RFC 2616. We can't do much about clients
that are sometimes unable to correctly decompress encodings that they
advertise they accept, such as
[http://schroepl.net/projekte/mod_gzip/browser.htm Netscape 4.x] (well, we
could blacklist such clients by {{{User-Agent}}}, but yuck).
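A minimal sketch of the check described above, assuming a hypothetical helper named accepts_gzip (this is not code from the Tahoe-LAFS webapi). It is deliberately conservative: a missing header is treated as "no gzip", and a "q=0" entry is treated as an explicit refusal, following RFC 2616 section 14.3:

```python
def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value permits gzip.

    `accept_encoding` is the raw header value, or None if the client
    sent no Accept-Encoding header at all.
    """
    if not accept_encoding:
        return False  # conservative choice: no header means no gzip
    qvalues = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0  # a coding listed without a q parameter has q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # unparsable q: treat as not acceptable
        qvalues[coding] = q
    if "gzip" in qvalues:
        return qvalues["gzip"] > 0
    # "*" matches any coding that is not listed explicitly.
    return qvalues.get("*", 0.0) > 0
```

A server would send {{{Content-Encoding: gzip}}} only when this returns True, and otherwise decompress before responding. It does nothing for clients that advertise gzip but mishandle it.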
> Given that the widespread convention is that content type and encoding
> are stored (to some extent) in the filename itself as extensions, making
> these properties more fully expanded in the directory entries has an
> internal consistency.
There's no usable consistency in file extensions.
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/994#comment:4>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage