[tahoe-dev] maximum file sizes: 2GiB with helper, 12GiB without

Brian Warner warner-tahoe at allmydata.com
Tue Mar 11 01:41:30 PDT 2008


Peter asked me what the largest file that Tahoe could upload was. I
flippantly replied "limited only by memory", but then I decided to find a
better answer.

When using an Upload Helper, we limit the file size to 2GiB, or 2**31-1, or
2147483647 bytes. This stems from the casual use of an 'int' constraint in
RIEncryptedUploadable.get_size(), which foolscap uses to mean a number that
can fit in a signed 32-bit quantity. In addition, the file being uploaded
must completely fit on the Helper's disk, and we only have 25GB available on
the prodnet helper's partition. Some minor shuffling would push this to about
950GB.
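
As a rough illustration of that first limit, the check below mimics a signed
32-bit bound in plain Python. The names MAX_SIGNED_32 and fits_signed_32 are
made up for this sketch; they are not part of foolscap or Tahoe.

```python
# Hypothetical names; this only illustrates the arithmetic, not foolscap's code.
MAX_SIGNED_32 = 2**31 - 1  # 2147483647, the largest signed 32-bit value


def fits_signed_32(size_in_bytes):
    """Return True if the size can be carried in a signed 32-bit integer."""
    return -2**31 <= size_in_bytes <= MAX_SIGNED_32


print(MAX_SIGNED_32)                    # 2147483647
print(fits_signed_32(2 * 1024**3 - 1))  # True: one byte under 2GiB
print(fits_signed_32(2 * 1024**3))      # False: exactly 2GiB is too big
```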

The storage server imposes a 4GiB limit on the size of any given share (since
it uses a four-byte field to store this length). At 3-of-10 encoding, this is
roughly a 12GiB limit on the file size (about 12.88GB). The immutable share
itself has a data size field, but that is 8 bytes long. The data size is also
stored in the URI, but in a printable and variable-length representation that
doesn't have a significant size limit.
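
A small sketch of where the 4GiB and 12GiB figures come from, using Python's
struct module to stand in for a four-byte length field. The function name and
the big-endian ">L" layout are assumptions for illustration, not the actual
sharefile code, and the estimate ignores the hash trees and other overhead
stored alongside the data in each share.

```python
import struct

K = 3  # shares needed to reconstruct the file, as in 3-of-10 encoding


def pack_share_length(length):
    """Pack a length into a four-byte unsigned field (assumed big-endian)."""
    return struct.pack(">L", length)


max_share = 2**32 - 1          # largest value a four-byte field can hold
pack_share_length(max_share)   # fits

try:
    pack_share_length(2**32)   # one more byte does not fit
except struct.error as e:
    print("overflow:", e)

# Each share holds roughly 1/K of the file, so the file limit is about K shares.
approx_file_limit = K * 2**32
print(approx_file_limit)       # 12884901888, i.e. 12GiB or about 12.88GB
```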

The hash trees that we use to validate the contents of immutable files
require filesize/segsize * 2 * 32 bytes each, and they use 4-byte offsets. So
this imposes a limit of 67M segments, which even for the new+smaller
128KiB-sized segments still allows files of up to about 8.8TB.
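
The 67M and 8.8TB figures can be reproduced as follows. This is only the
arithmetic from the paragraph above, assuming a tree of about 2 * N hashes of
32 bytes each must be addressable with a 4-byte offset; the variable names are
illustrative.

```python
HASH_SIZE = 32             # bytes per hash
OFFSET_SPACE = 2**32       # positions reachable with a 4-byte offset
SEGMENT_SIZE = 128 * 1024  # 128KiB segments

# A hash tree over N segments takes about N * 2 * HASH_SIZE bytes,
# so the largest N that still fits in the offset space is:
max_segments = OFFSET_SPACE // (2 * HASH_SIZE)
max_file_size = max_segments * SEGMENT_SIZE

print(max_segments)   # 67108864, about 67M segments
print(max_file_size)  # 8796093022208 bytes, about 8.8TB
```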

The RIBucketWriter.write protocol also uses 'int' for the offset field, which
limits the addressable positions within a share to 2GiB. This translates into
a limit of about 6GiB on the size of the file.

On mutable files, our current 'SDMF' implementation imposes a one-segment
limit on the file size (since we haven't written any code to handle multiple
segments yet). We recently raised the maximum segment size for mutable files
to 3.5MB specifically to enable the use of dirnodes with 10k children. So
3.5MB is currently the upper limit on SDMF mutable files.

Once we implement 'MDMF' files, the limiting factor will be the 8-byte fields
we use inside the storage format for sizes and offsets of all share data.
This will impose a limit of about 18 exabytes per file (16EiB, or 18.44e18).
However, the RIStorageServer.slot_readv/slot_testv_and_readv_and_writev
protocol also uses 'int' as the constraint for offsets, which will limit the
share sizes to 2GiB, and the overall mutable file size to about 6GiB.

So, one lesson is: foolscap's interpretation of 'int' as 2GiB imposes
surprising constraints on file size. Simply replacing this with an
IntegerConstraint(8) would raise the share-size limits to 16EiB and the
overall file size limit to 'k' times that (about 55e18).
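
To put the numbers in this message side by side, here is a short sketch that
turns a per-share limit into an approximate file limit at k=3. The entries
come from the paragraphs above; the function and its name are invented for
this example, and share overhead is ignored.

```python
K = 3  # 'k' in 3-of-10 encoding


def approx_file_limit(share_limit, k=K):
    """Approximate file-size limit given a per-share limit (overhead ignored)."""
    return k * share_limit


share_limits = {
    "foolscap 'int' offsets (2GiB)": 2**31,
    "4-byte share size field (4GiB)": 2**32,
    "8-byte constraint (16EiB)": 2**64,
}

for name, limit in share_limits.items():
    file_limit = approx_file_limit(limit)
    print(f"{name}: share {limit:.3e} bytes, file about {file_limit:.3e} bytes")
```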

I feel a bit embarrassed by this, as I was assuming that we had effectively
unlimited file sizes, and I certainly wasn't expecting foolscap (or rather
our use of it) to be the one that imposed a limit. I plan to fix this soon.
The storage server's share format (with the 4-byte share size) is a tougher
limit to raise, since it requires a new version number for the sharefiles
(and backwards-compatibility code). We're getting better at avoiding these
sorts of limitations, but it's obvious that the immutable sharefile format
was one of the first that we committed to code.

On the other hand, the smallest limit we have right now (the helper protocol,
which imposes a 2GiB limit on total file size) represents about 12
hours of full-speed upload on a really good DSL line for a single file.
Do people do this sort of thing a lot? I don't think I have any 2GB
files lying around. I regularly push 2GB of data to a machine in colo,
but it comes in the form of several hundred 10MB photos, rather than a
single giant file, and tahoe is excellent at 10MB files.
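
For a sense of scale, the upload rate implied by "2GiB in about 12 hours"
works out as follows. This is just arithmetic on the figures above, not a
measurement of any particular DSL line.

```python
FILE_SIZE = 2**31 - 1       # bytes, the helper limit
UPLOAD_SECONDS = 12 * 3600  # about 12 hours

bytes_per_second = FILE_SIZE / UPLOAD_SECONDS
kbit_per_second = bytes_per_second * 8 / 1000

print(round(bytes_per_second))  # 49710 bytes per second
print(round(kbit_per_second))   # 398 kbit/s upstream
```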

cheers,
 -Brian

