#85 closed defect (fixed)

store shares in single files, instead of 7 files and a directory

Reported by: warner
Owned by: warner
Priority: major
Milestone: 0.5.0
Component: code
Version: 0.4.0
Keywords: storage
Cc:
Launchpad Bug:


At the moment we have significant filesystem overhead in our share storage, mostly because common filesystems like ext3 consume a whole disk block (4096 bytes) even for 1-byte files. Since we use 7 files in a separate directory for each share, that means 8*4096=32768 bytes consumed per share, even for 1-byte shares. As a result, storage consumption hits a floor at a filesize of 102400 bytes: every file of 102400 bytes or smaller consumes the same 3276800 bytes (3.3MB) of storage.

Our storage format was chosen for simplicity and ease of implementation, but this represents a huge overhead. So the plan is to combine all 7 files into a single one, and to not put it in its own directory. That will reduce the minimum share size to one disk block (4096) instead of 8 (32768), and will bring the lower bound on storage to a filesize of 81250, at which point the storage consumed will be 409600 (410kB), an 8x improvement.
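The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration: the helper names are made up, the 4096-byte block size is the ext3 figure quoted above, and the count of 100 shares per file is an assumption inferred from 3276800 / 32768 rather than something stated in this ticket.

```python
import math

BLOCK = 4096   # filesystem block size quoted in the ticket (ext3)
SHARES = 100   # assumption: inferred from 3276800 / 32768, not stated here


def blocks_used(nbytes, block=BLOCK):
    """Disk bytes consumed by one non-empty file: size rounded up to whole blocks."""
    return max(1, math.ceil(nbytes / block)) * block


def min_share_footprint(entries_per_share, block=BLOCK):
    """Smallest possible footprint of one share stored as `entries_per_share`
    filesystem entries (files plus directory), each costing at least one block."""
    return entries_per_share * blocks_used(1, block)


old = min_share_footprint(8)  # 7 files + 1 directory
new = min_share_footprint(1)  # a single file, no directory

print(old, new, old // new)   # 32768 4096 8
print(SHARES * old)           # 3276800
print(SHARES * new)           # 409600
```

The 8x figure falls straight out of the number of filesystem entries per share; it applies only while each entry fits in a single block, which is why the improvement is quoted for small files.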

Reducing this filesystem-blocksize overhead below that would involve packing multiple shares (for different URIs) into a single file, which complicates the deletion and indexing of them. It might be useful, but hopefully we can avoid this step.

Also, we need to figure out a good place to put leases, once we implement them, but they can probably live in a separate database with different packing and access requirements.

Change History (2)

comment:1 Changed at 2007-07-13T03:08:52Z by warner

  • Owner changed from somebody to warner
  • Status changed from new to assigned

I'm actively working on this one right now. The basics are in place, but the interfaces between the new bucket-proxies and the rest of the system are not working yet. I'm hoping to finish it tomorrow.

comment:2 Changed at 2007-07-14T00:20:39Z by warner

  • Resolution set to fixed
  • Status changed from assigned to closed

Done, in changesets cd8648d39b897684, 1f8e407d9cda19ed, 7589a8ee82eb6531, 35117d77a0bb2177, and 4d868e6649c2c5d8. The new format increases the actual overhead slightly: the layout described in storageserver.py:WriteBucketProxy shows that we add about 36 bytes to allow the share to be self-describing. But the overhead caused by 4kB disk blocks is reduced by 8x (for small files).
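To give a rough idea of what "self-describing" means here, the following is a minimal sketch of a single-file container with a fixed-size offset table in front of the data. It is not the actual format: the authoritative layout is the one in storageserver.py:WriteBucketProxy, and the choice of nine 32-bit fields (three metadata values plus six section offsets), the names `pack_share` and `unpack_share`, and the section contents are all assumptions picked so that the header comes out at 36 bytes.

```python
import struct

# Hypothetical header: nine unsigned 32-bit big-endian fields = 36 bytes.
# Three metadata values followed by six section offsets. Illustrative only;
# the real layout is described in storageserver.py:WriteBucketProxy.
HEADER = struct.Struct(">9L")
N_META = 3
N_SECTIONS = 6


def pack_share(meta, sections):
    """Write all sections into one blob, preceded by a table of their offsets."""
    if len(meta) != N_META or len(sections) != N_SECTIONS:
        raise ValueError("expected 3 metadata values and 6 sections")
    offsets = []
    pos = HEADER.size
    for section in sections:
        offsets.append(pos)
        pos += len(section)
    return HEADER.pack(*meta, *offsets) + b"".join(sections)


def unpack_share(blob):
    """Recover the metadata and sections using only the header in the blob."""
    fields = HEADER.unpack_from(blob, 0)
    meta, offsets = fields[:N_META], fields[N_META:]
    ends = list(offsets[1:]) + [len(blob)]
    return meta, [blob[start:end] for start, end in zip(offsets, ends)]


sections = [b"share data", b"h1", b"h2", b"h3", b"h4", b"ext"]
blob = pack_share((1, 4096, 10), sections)
print(HEADER.size, len(blob))  # 36 59
```

Because every offset is stored in the file itself, a reader needs no side files or directory listing to find each piece, which is what lets the 7 files and the directory collapse into one.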
