[tahoe-lafs-trac-stream] [tahoe-lafs] #994: support precompressed files

tahoe-lafs trac at tahoe-lafs.org
Mon Oct 10 18:46:01 PDT 2011


#994: support precompressed files
-----------------------------+-------------------------------------------------
     Reporter:  davidsarah   |      Owner:  somebody
         Type:  enhancement  |     Status:  new
     Priority:  major        |  Milestone:  undecided
    Component:  code         |    Version:  1.6.0
   Resolution:               |   Keywords:  compression space-efficiency
Launchpad Bug:               |  performance bandwidth security integrity
                             |  backward-compatibility
-----------------------------+-------------------------------------------------
Changes (by davidsarah):

 * keywords:  compress performance integrity backward-compatibility =>
     compression space-efficiency performance bandwidth security integrity
     backward-compatibility


Comment:

 Replying to [comment:3 jsgf]:
 > Replying to [comment:2 davidsarah]:
 > > Actually, we want old clients to fail to download these files (rather
 > > than to misinterpret the compressed data as uncompressed).
 >
 > That seems like a pretty big semantic change for Tahoe.  Thus far it is
 > more or less a transparent container for arrays of bytes, with a bit of
 > advisory metadata sprinkled on top. Changing that so that some byte arrays
 > have an innate property which prevents some clients from being able to
 > download them is a big change.

 Making the file data (as an uncompressed sequence of bytes) depend on
 metadata that is detached from the file URI would be an even bigger
 semantic change. The file URI has to unambiguously determine the file
 data.

 One way of achieving that would be to put the bit that determines whether
 a file has been stored compressed in the URI, for example
 "{{{UCHK:gz:...}}}" could be the gzip-decompressed version of
 "{{{CHK:...}}}".
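
 A minimal Python sketch of that idea, under stated assumptions: the
 {{{UCHK:gz:}}} prefix is only the hypothetical form suggested above, not an
 existing Tahoe-LAFS cap type, and {{{fetch_chk}}} is a made-up stand-in for
 whatever retrieves the stored bytes of a {{{CHK:...}}} URI. The point is
 that the URI alone decides whether decompression happens, and that a reader
 which does not recognize the prefix fails instead of returning compressed
 bytes.

```python
import gzip

# Hypothetical prefix taken from the comment above; not a real cap format.
GZ_PREFIX = "UCHK:gz:"


def read_file(uri, fetch_chk):
    """Return the file data that `uri` denotes.

    `fetch_chk` is an assumed callable that maps a "CHK:..." URI to the
    bytes stored under it.
    """
    if uri.startswith(GZ_PREFIX):
        # The compression bit lives in the URI, so no detached metadata
        # is needed to decide how to interpret the stored bytes.
        stored = fetch_chk("CHK:" + uri[len(GZ_PREFIX):])
        return gzip.decompress(stored)
    if uri.startswith("CHK:"):
        return fetch_chk(uri)
    # A reader that does not know a URI type refuses it rather than
    # misinterpreting the data.
    raise ValueError("unrecognized URI type: %r" % (uri,))
```

 In this sketch, a client written before {{{UCHK:gz:}}} existed would take
 the last branch for such URIs, which is the "fail to download" behaviour
 asked for in comment:2.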

 > As I mention in ticket:992#comment:3, the same bits can be represented
 > as either "foo.txt" "text/plain" "encoding: gzip" or "foo.txt.gz"
 > "application/gzip". The former could be misinterpreted by an old client
 > which fails to pay attention to content-encoding.
 >
 > But I don't think this is a huge problem; I suspect most webapi clients
 > are already using a general-purpose HTTP library, which will already have
 > to deal with content encoding.

 We can't send {{{Content-Encoding: gzip}}} if the client hasn't sent an
 {{{Accept-Encoding}}} that includes {{{gzip}}}; that would obviously be
 incorrect and would not comply with RFC 2616. We can't do much about clients
 that are sometimes unable to correctly decompress encodings that they
 advertise they accept, such as
 [http://schroepl.net/projekte/mod_gzip/browser.htm Netscape 4.x] (well, we
 could blacklist such clients by {{{User-Agent}}}, but yuck).
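
 A rough Python sketch of that negotiation rule, with assumptions labelled:
 {{{client_accepts_gzip}}} and {{{choose_response}}} are illustrative names,
 not part of the Tahoe-LAFS webapi. The parser is deliberately conservative:
 it ignores the {{{*}}} wildcard and treats a missing header as "no gzip",
 so its only failure mode is sending uncompressed data to a client that
 could have taken gzip.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value lists gzip with q > 0."""
    if not accept_encoding:
        return False
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() not in ("gzip", "x-gzip"):
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed q value: treat as not acceptable
        return q > 0
    return False


def choose_response(stored_gzip, accept_encoding):
    """Return (headers, body) for a file that is stored gzip-compressed."""
    if client_accepts_gzip(accept_encoding):
        return {"Content-Encoding": "gzip"}, stored_gzip
    # The client did not advertise gzip, so decompress on the server side.
    return {}, gzip.decompress(stored_gzip)
```

 Handling clients that advertise gzip but decode it incorrectly would need
 something extra (such as the {{{User-Agent}}} check mentioned above), which
 this sketch does not attempt.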

 > Given that the widespread convention is that content type and encoding
 > are stored (to some extent) in the filename itself as extensions, making
 > these properties more fully expanded in the directory entries has an
 > internal consistency.

 There's no usable consistency in file extensions.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/994#comment:4>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

