[tahoe-lafs-trac-stream] [tahoe-lafs] #1354: compression (e.g. to efficiently store sparse files)
tahoe-lafs
trac at tahoe-lafs.org
Thu Feb 3 21:52:51 PST 2011
#1354: compression (e.g. to efficiently store sparse files)
-------------------------------+--------------------------------------------
     Reporter:  zooko          |       Owner:
         Type:  enhancement    |      Status:  new
     Priority:  major          |   Milestone:  undecided
    Component:  code-encoding  |     Version:  1.8.2
   Resolution:                 |    Keywords:
Launchpad Bug:                 |
-------------------------------+--------------------------------------------
Comment (by warner):
Hm. You could compress data a chunk at a time, watching the output size
until it grew above some min-segment-size threshold, then flush the
compression stream and declare end-of-segment. Then start again with the
remaining data and repeat. Now you've got a list of compressed segments and
a table with the amount of plaintext that went into each one. You encrypt
that plaintext-segment-size table with the readcap and store it in each
share. You also store (unencrypted) the table of ciphertext segment sizes.
(Unlike the uncompressed case, the plaintext-segment-size table will differ
significantly from the ciphertext-segment-size table.)
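
A rough sketch of that loop, using Python's zlib purely as a stand-in
compressor. The function name, the default threshold, and the chunk size
are all made up for illustration, and each segment gets a fresh compressor
so it can be decompressed on its own:

```python
import zlib

def compress_into_segments(data, min_segment_size=1024, chunk_size=256):
    """Return (compressed_segments, plaintext_sizes) for the given bytes.

    Hypothetical helper: feeds plaintext a chunk at a time and ends the
    segment once the compressed output seen so far reaches the threshold.
    """
    segments = []
    plaintext_sizes = []
    pos = 0
    while pos < len(data):
        compressor = zlib.compressobj()  # fresh stream per segment
        pieces = []
        emitted = 0
        start = pos
        while pos < len(data) and emitted < min_segment_size:
            chunk = data[pos:pos + chunk_size]
            pos += len(chunk)
            piece = compressor.compress(chunk)  # may be empty while zlib buffers
            pieces.append(piece)
            emitted += len(piece)
        pieces.append(compressor.flush())  # end-of-segment: drain the pipeline
        segments.append(b"".join(pieces))
        plaintext_sizes.append(pos - start)
    return segments, plaintext_sizes

# The two tables described above: plaintext sizes and compressed sizes.
example = b"tahoe " * 5000 + b"\0" * 100000
segments, plaintext_sizes = compress_into_segments(example)
compressed_sizes = [len(s) for s in segments]
```

Because zlib buffers input internally, compress() can return nothing for a
long stretch and then a large burst, so a segment built this way may end up
well above the threshold.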

Alacrity would suffer: before reading anything you'd have to download the
whole ciphertext-segment-size table (which is O(filesize), although the
multiplier is very small, something like 8 bytes per segment). There's
probably a clever O(log(N)) scheme lurking in there somewhere, but I expect
it'd involve adding roundtrips (you store multiple layers of offset tables:
first you fetch the coarse one that tells you which parts of the
next-finer-grained table you need to fetch, then the last table you fetch
has actual segment offsets).
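
To make the flat-table case concrete, here is a small sketch of how a
reader might turn the two size tables into a lookup. The 8-byte big-endian
encoding and both function names are assumptions for illustration, not an
existing Tahoe format:

```python
import bisect
import struct
from itertools import accumulate

def pack_size_table(sizes):
    """Serialize a size table at 8 bytes per segment (assumed encoding)."""
    return struct.pack(">%dQ" % len(sizes), *sizes)

def locate(plaintext_sizes, ciphertext_sizes, plaintext_offset):
    """Map a plaintext offset to (segment index, start of that segment in
    the ciphertext, offset inside the decompressed segment)."""
    plain_ends = list(accumulate(plaintext_sizes))
    total = plain_ends[-1] if plain_ends else 0
    if not 0 <= plaintext_offset < total:
        raise IndexError("offset outside file")
    # First segment whose plaintext range ends past the requested offset.
    seg = bisect.bisect_right(plain_ends, plaintext_offset)
    cipher_start = sum(ciphertext_sizes[:seg])
    plain_start = plain_ends[seg] - plaintext_sizes[seg]
    return seg, cipher_start, plaintext_offset - plain_start
```

A layered variant would presumably run the same kind of search on a coarse
table first, then fetch only the slice of the finer table it points at, at
the cost of one more roundtrip per layer.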

This scheme requires a compression library that either avoids deep
pipelining or is willing to tell you how much more compressed output would
be emitted if you did a flush() right now. I don't think most libraries
have this property. You declare a segment to be finished as soon as you've
emitted, say, 1MB of compressed data, then you tell the compressor to flush
the pipeline and add whatever else it gives you to the segment. The
concern is that you could wind up with a segment significantly larger
than 1MB if the pipeline is deep.
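
One possible workaround, at least with Python's zlib bindings, is to flush
a throwaway copy of the compressor and measure what comes out. This is only
a sketch of the idea: the helper name is invented, copying the compressor
state on every check has its own cost, and other libraries may offer
nothing comparable:

```python
import zlib

def pending_output_size(compressor):
    """Bytes a flush() would emit right now, measured on a copy so the
    real compressor's state is left untouched."""
    return len(compressor.copy().flush())

compressor = zlib.compressobj()
emitted = len(compressor.compress(b"\0" * 100000))

# How big would the segment be if we declared end-of-segment here?
pending = pending_output_size(compressor)
projected_segment_size = emitted + pending

# The real flush produces exactly what the copy predicted.
tail = compressor.flush()
actual_segment_size = emitted + len(tail)
```

With a check like this the segmenter could stop feeding input once the
projected size crosses the threshold, rather than discovering the overshoot
only after the flush.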
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1354#comment:3>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage