[tahoe-dev] Is it useful to truncate mutable files?

Brian Warner warner-tahoe at allmydata.com
Wed Jun 17 00:19:25 PDT 2009


On Wed, 17 Jun 2009 00:17:15 -0600
Shawn Willden <shawn-tahoe at willden.org> wrote:

> However, it seems to me that it is possible to reduce the waste
> by "truncating" mutable files.  By that I mean overwriting them with
> zero-length content.  There will still be some space consumed by the
> truncated file, but it should be less -- ideally about size * n / k
> bytes less.
> 
> Is there any reason this is less effective than it may appear?  Or
> why it might potentially be a bad idea?

None that I can think of. ext3 (and probably everything except reiserfs4)
tends to consume at least one disk block per file, plus Tahoe stores shares
in a per-storage-index directory, so it's likely that the minimum consumption
will be about 2*4KiB*N per file (maybe 2*2KiB*N, depending on the partition
size). And, we tend to only use mutable files for directories, which tend to
be relatively small, so it might not win you enough to be worth the effort.
But if you've created a lot of large mutable files and then feel like
deleting them, then yeah, truncating them first will save you backend space.
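
To put rough numbers on this, here is a back-of-the-envelope sketch in Python.
It assumes the only costs are the size * n / k expansion and a floor of two
disk blocks per share (one for the share file, one for its per-storage-index
directory). The function names and the 3-of-10 encoding in the example are
illustrative assumptions, and real shares carry extra overhead that this
ignores.

```python
def expanded_size(file_size, k, n):
    """Approximate bytes consumed across all N shares: size * n / k."""
    return file_size * n / k


def min_footprint(n, block_size=4096):
    """Rough per-file floor: two disk blocks for each of the N shares."""
    return 2 * block_size * n


def truncation_savings(file_size, k, n, block_size=4096):
    """Estimated bytes reclaimed by truncating before deleting.

    Never negative: a file already at the floor has nothing to give back.
    """
    floor = min_footprint(n, block_size)
    return max(0.0, expanded_size(file_size, k, n) - floor)


if __name__ == "__main__":
    # A large mutable file: most of the expanded size is reclaimed.
    print(truncation_savings(3_000_000, k=3, n=10))
    # A small directory: already below the two-block floor, nothing to gain.
    print(truncation_savings(3_000, k=3, n=10))
```

With these assumptions, small directories gain nothing from truncation, while
a multi-megabyte mutable file gets back nearly all of its expanded size.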

Incidentally, the storage server protocol has a provision for letting clients
explicitly delete mutable-file shares: setting the share size to 0 tells the
server it should just delete the share. I added this to let a mutable-repair
operation get rid of superfluous shares.

(Note that this is not the same as setting the mutable file's contents to the
empty string: even an empty file gets encoded into non-zero-sized shares,
since there are hashes and keys and stuff. It's only when the server-side
container is set to zero-length that the share gets deleted.)
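
The difference between the two cases can be shown with a toy in-memory model.
This is only a sketch: ToyShareStore, encode_share, and the 64-byte header are
made-up stand-ins, not Tahoe's real classes, wire protocol, or share layout.

```python
HEADER_SIZE = 64  # arbitrary stand-in for the hashes, keys, etc. in a share


def encode_share(contents):
    """Hypothetical encoder: every share has a fixed header plus the data."""
    return b"\0" * HEADER_SIZE + contents


class ToyShareStore:
    """Toy model of a storage server's mutable-share containers."""

    def __init__(self):
        self.containers = {}  # (storage_index, sharenum) -> bytes

    def write(self, storage_index, sharenum, data, new_length=None):
        key = (storage_index, sharenum)
        if new_length == 0:
            # Zero-length container: the server drops the share entirely.
            self.containers.pop(key, None)
            return
        if new_length is not None:
            data = data[:new_length]
        self.containers[key] = data


if __name__ == "__main__":
    store = ToyShareStore()

    # Case 1: empty file contents still produce a non-empty share.
    store.write("si", 0, encode_share(b""))
    print(len(store.containers[("si", 0)]))

    # Case 2: setting the container size to 0 deletes the share.
    store.write("si", 0, b"", new_length=0)
    print(("si", 0) in store.containers)
```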

cheers,
 -Brian

