#359 closed enhancement (duplicate)

eliminate hard limit on size of SDMFs

Reported by: zooko Owned by:
Priority: major Milestone: 1.5.0
Component: code-mutable Version: 0.9.0
Keywords: memory Cc:
Launchpad Bug:

Description (last modified by warner)

We currently impose a hard limit on SDMFs of 3.5 MB. (It was recently raised from the initial value of 1 MB in order to support directories with up to 10,000 entries.)

We could remove this artificial limit entirely. There would remain "soft limits":

  1. Creating or updating an SDMF would take approximately (1 + N/K) * filesize RAM.
  2. It would take approximately N/K * filesize upload bandwidth to change even just one byte of the file. (If/when we implement a mutable upload helper, the client-to-helper bandwidth will be equal to the filesize.)
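
To put rough numbers on the two soft limits above, here is a minimal sketch. It assumes the RAM estimate is read as (1 + N/K) * filesize (the plaintext plus the expanded shares). The function names are hypothetical and are not part of the Tahoe codebase.

```python
def sdmf_ram_estimate(filesize, k, n):
    """Approximate RAM needed to create or update an SDMF.

    Assumption: the whole plaintext plus all N/K-expanded shares are
    held in memory at once, i.e. (1 + N/K) * filesize.
    """
    return filesize + filesize * n // k


def sdmf_upload_estimate(filesize, k, n):
    """Approximate upload bandwidth to change any part of an SDMF.

    The whole file is re-encoded and re-uploaded, so the cost is
    N/K * filesize regardless of how many bytes changed.
    """
    return filesize * n // k


if __name__ == "__main__":
    # Example values only: a 3,000,000-byte file with 3-of-10 encoding.
    size, k, n = 3000000, 3, 10
    print("RAM estimate (bytes):", sdmf_ram_estimate(size, k, n))
    print("upload estimate (bytes):", sdmf_upload_estimate(size, k, n))
```

Both estimates grow linearly with filesize, which is why these act as soft limits even with the hard limit removed.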

Change History (6)

comment:1 Changed at 2008-04-23T18:27:30Z by warner

FYI, we don't have a mutable-file upload helper yet.

comment:2 Changed at 2008-04-24T23:46:34Z by warner

  • Component changed from code-encoding to code-mutable

comment:3 Changed at 2009-12-13T05:15:58Z by davidsarah

  • Keywords memory added

What's the limit on an immutable file?

comment:4 Changed at 2009-12-26T01:41:35Z by zooko

It was ticket #346 to raise it to an extremely high limit. The current limit is that there is a 64-bit unsigned field which holds the offset in bytes of the next data element that comes after the share contents on the storage server's disk. See the implementation and the in-line docs in src/allmydata/immutable/layout.py@3864.

This means that each individual share is limited to a few bytes less than 2^64^. Therefore the overall file is limited to k*2^64^. There might be some other limitation that I've forgotten about, but we haven't encountered it in practice, where people have many times uploaded files in excess of 12 GiB.

comment:5 Changed at 2009-12-26T03:52:24Z by warner

  • Description modified (diff)
  • Milestone changed from eventually to 1.5.0
  • Resolution set to duplicate
  • Status changed from new to closed

Note that zooko's recent comments are about immutable files and their shares, whereas this ticket is about mutable files and shares, which use a different layout. However the same general statements are true. Mutable files were designed after we had some experience with immutable files, but before I learned to always use 64-bit fields for everything. They've used somewhat larger offset fields since day 1, which are big enough to accommodate very large shares. The layout is described in source:src/allmydata/mutable/layout.py .

To be precise, they use 32-bit fields to hold the offsets of the signature, share_hash_chain, block_hash_tree, and share_data, then use a 64-bit field to hold the offset of the enc_privkey and EOF. So they can tolerate 2^64^-byte share_data sections, which is where the bulk of the share's data lives. The block_hash_tree section is smaller than the share_data section, but still scales linearly with filesize. Because of the 32-bit field for offset[share_data], the block_hash_tree must be somewhat shorter than 2^32^ bytes, limiting it to 2^27^ hashes, so 2^26^ segments, which at our default 128KiB (2^17^) segsize means 2^43^ bytes, which is the limiting factor. By raising the segsize to e.g. 4MB (2^22^) this limit grows to 2^48^ bytes.
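
The arithmetic in that paragraph can be checked with a short sketch. It assumes 32-byte hashes (implied by 2^32^ bytes holding 2^27^ hashes) and a binary hash tree with roughly two nodes per segment. The function name is hypothetical and is not taken from layout.py.

```python
def max_share_data_bytes(segsize, offset_bits=32, hash_size=32):
    """Upper bound on share_data implied by the offset[share_data] field.

    Assumptions: hashes are hash_size bytes each, and the
    block_hash_tree is a binary tree holding about two hashes per
    segment.
    """
    # The block_hash_tree must end before a 32-bit offset overflows.
    max_tree_bytes = 2 ** offset_bits
    max_hashes = max_tree_bytes // hash_size   # 2**27 for the defaults
    max_segments = max_hashes // 2             # 2**26 for the defaults
    return max_segments * segsize


if __name__ == "__main__":
    for segsize in (128 * 1024, 4 * 1024 * 1024):
        print(segsize, max_share_data_bytes(segsize))
```

These are slight overestimates, since the signature and share_hash_chain sections that precede the block_hash_tree also consume part of the 32-bit offset space.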

So, SDMF mutable files are limited by the share format to k*2^43^ bytes, or about 24TiB. Until we implement MDMF and can process mutable files one segment at a time (instead of holding the whole file in RAM), we'll be soft-limited by available memory, so practically speaking the limit is a couple of GB.

If we stick with the same share format for MDMF (which was our goal: old clients should be able to keep using their SDMF code to read MDMF-generated files, unless we really do need a separate salt for each segment: #393), then MDMF files will be limited to k*2^43^ bytes with a RAM footprint of about x*128KiB (where "x" is probably 2 or 3). An uploader-side max_segsize configuration change can scale those two values together up to a filesize limit of k*2^64^ bytes and a RAM footprint of x*256GiB.
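
As a rough sketch of how max_segsize scales the filesize limit and the RAM footprint together, the following follows the ticket's own arithmetic (2^26^ segments per share, capped by the 64-bit offsets). The constant names, the function name, and the default x=2 are assumptions for illustration only.

```python
# Segment count allowed by the 32-bit offset[share_data] field.
MAX_SEGMENTS = 2 ** 26
# Ceiling imposed by the 64-bit enc_privkey/EOF offsets.
MAX_SHARE_DATA = 2 ** 64


def mdmf_limits(max_segsize, k, x=2):
    """Return (filesize_limit, ram_footprint) in bytes.

    Assumptions: the per-share limit is 2**26 segments of max_segsize
    bytes, capped at 2**64, and about x segments are held in RAM at
    once.
    """
    share_limit = min(MAX_SEGMENTS * max_segsize, MAX_SHARE_DATA)
    return k * share_limit, x * max_segsize


if __name__ == "__main__":
    # Default 128KiB segments versus 256GiB segments, with k=3.
    print(mdmf_limits(2 ** 17, k=3))
    print(mdmf_limits(2 ** 38, k=3))
```

With k=3 and the default segsize this gives 3*2^43^ bytes, matching the "about 24TiB" figure above. The 64-bit ceiling is reached once max_segsize hits 2^38^ bytes (256GiB).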

If we *do* change the share format for MDMF, then we should of course use 64-bit fields everywhere and remove this 2^43^ limit.

Finally, it turns out that this ticket is actually a dupe of #694, which was closed when we removed the hard limit on SDMF files in db939750a8831c1e back in June 2009. I'd initially imposed the arbitrary 3.5MB limit to discourage people from using the (inefficient, memory-hungry) SDMF format in ways that would disappoint their hopes for high-performance behavior, but I was talked out of this and Kevan implemented the fix, which was first released in 1.5.0 .

comment:6 Changed at 2009-12-27T21:09:28Z by zooko

For the record, my comment:4 was about immutable files because David-Sarah asked about them in comment:3. :-) Thanks for the description of the mutable file size limits.
