[tahoe-dev] [tahoe-lafs] #1354: compression (e.g. to efficiently store sparse files)

tahoe-lafs trac at tahoe-lafs.org
Thu Feb 3 10:38:37 PST 2011


#1354: compression (e.g. to efficiently store sparse files)
-------------------------------+--------------------------------------------
     Reporter:  zooko          |       Owner:           
         Type:  enhancement    |      Status:  new      
     Priority:  major          |   Milestone:  undecided
    Component:  code-encoding  |     Version:  1.8.2    
   Resolution:                 |    Keywords:           
Launchpad Bug:                 |  
-------------------------------+--------------------------------------------

Comment (by warner):

 Neat trick!

 No, from what I've seen, sparse files are not very common. The only things
 that come to mind are coredumps and database files, and I suspect that
 most modern (cross-platform compatible) DB files are not sparse anymore.

 It shouldn't be too hard to rig up a tool to test that claim: walk the
 tree with {{{os.walk}}}, {{{stat}}} each file to get the number of blocks
 actually allocated, and compare that against {{{st_size/blocksize}}}. If
 the two are far apart, you've probably got a sparse file.
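
 A rough sketch of such a tool, assuming a Unix-like platform where
 {{{os.stat}}} exposes {{{st_blocks}}} (it is not available on Windows)
 and assuming the POSIX convention that {{{st_blocks}}} counts 512-byte
 units. The names {{{looks_sparse}}} and {{{find_sparse_files}}} and the
 0.5 threshold are made up for illustration. This is only a heuristic:
 filesystems that compress data or store small files inline can also
 report fewer blocks than the file size suggests.

```python
import os
import stat

BLOCK_UNIT = 512  # assumption: st_blocks is reported in 512-byte units (POSIX)


def looks_sparse(size, blocks, threshold=0.5):
    """Return True if the allocated bytes are well below the apparent size."""
    if size <= 0:
        return False
    return blocks * BLOCK_UNIT < size * threshold


def find_sparse_files(root, threshold=0.5):
    """Yield (path, apparent_size, allocated_bytes) for likely-sparse files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files can be sparse
            if looks_sparse(st.st_size, st.st_blocks, threshold):
                yield path, st.st_size, st.st_blocks * BLOCK_UNIT


if __name__ == "__main__":
    import sys
    for path, size, allocated in find_sparse_files(sys.argv[1]):
        print(f"{path}: size={size} allocated={allocated}")
```

 Running it over a home directory or a backup set and counting the hits
 would give a first estimate of how common sparse files really are.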

 The question of compression is an interesting one. To retain our low
 alacrity, we'd want to compress each segment separately, which would then
 result in variable-sized segments, so we'd need a new share layout (with a
 start-of-each-block table). Compressing the whole file would let us
 squeeze it down further, of course, but you generally can't get random
 access that way. There may be some clever trick wherein we might save a
 copy of the compressor's internal state between segments to allow both
 random access *and* good whole-file compression, but I'd be afraid of the
 complexity/fragility of that approach.
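
 To make the per-segment idea concrete, here is a minimal sketch that
 compresses each fixed-size plaintext segment on its own with {{{zlib}}}
 and records where each compressed segment starts. It is not Tahoe's
 share layout: the function names, the in-memory offset list, and the
 choice of {{{zlib}}} are all assumptions made for illustration.

```python
import zlib


def compress_segments(data, segment_size):
    """Compress each plaintext segment independently.

    Returns (offsets, blob). offsets[i] is the start of compressed segment i
    within blob, and the final entry marks the end of the last segment.
    """
    offsets = [0]
    chunks = []
    for start in range(0, len(data), segment_size):
        compressed = zlib.compress(data[start:start + segment_size])
        chunks.append(compressed)
        offsets.append(offsets[-1] + len(compressed))
    return offsets, b"".join(chunks)


def read_segment(offsets, blob, index):
    """Decompress a single segment without touching any of the others."""
    return zlib.decompress(blob[offsets[index]:offsets[index + 1]])
```

 Because the compressed segments vary in length, a reader needs the
 offset table to find segment N, which is the role the
 start-of-each-block table would play. Each segment also starts with an
 empty compressor history, so the total is larger than compressing the
 whole file in one pass.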

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1354#comment:1>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

