[tahoe-dev] File size problem

Brian Warner warner-tahoe at allmydata.com
Fri Nov 28 14:09:02 PST 2008


> As the next step, I tried to upload a file (32472771 bytes) through
> the web frontend, which resulted in two issues:

Yeah, as Zooko pointed out, the Tahoe default is to upload everything
as an immutable file, which can be as large as 12GB (and we're a few
code changes away from raising that limit into the exabyte range).

Mutable files exist mainly to support directories, so we haven't yet
finished the coding necessary to support large mutable files. The
current limit of 3.5MB is somewhat arbitrary, but it accomplishes a
couple of useful goals (reasonable alacrity, easy enough to implement
quickly). Some day we'll have larger mutable files, but since 3.5MB is
enough for a directory with tens of thousands of entries, it hasn't
been a high priority so far.
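To make the two limits concrete, here is a minimal Python sketch of the decision a client could make before uploading. The function name `choose_file_kind` and the exact byte values are hypothetical. The post only says "3.5MB" and "12GB", and this is not Tahoe's actual code.

```python
# Hypothetical byte values: the post says "3.5MB" and "12GB" without
# stating exact byte counts, so these constants are assumptions.
MUTABLE_SIZE_LIMIT = 3_500_000
IMMUTABLE_SIZE_LIMIT = 12 * 10**9


def choose_file_kind(size: int, want_mutable: bool) -> str:
    """Return "mutable" or "immutable", or raise ValueError if size won't fit."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if want_mutable:
        # Mutable files are capped at the (small) mutable-file limit.
        if size > MUTABLE_SIZE_LIMIT:
            raise ValueError(
                f"{size} bytes exceeds the mutable-file limit "
                f"of {MUTABLE_SIZE_LIMIT} bytes; upload it as immutable"
            )
        return "mutable"
    # Everything else goes in as an immutable file, the default.
    if size > IMMUTABLE_SIZE_LIMIT:
        raise ValueError(f"{size} bytes exceeds the immutable-file limit")
    return "immutable"
```

Under these assumptions, the 32472771-byte file from the question fits comfortably as an immutable file but is rejected as a mutable one.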

>   * When is `ps axflwww'ed the process' memory usage, I saw that the
>     python instance that ran the connection where I uploaded the 31MB
>     file grew beyond 300MB of VSZ.

That sounds like a bug in the code that's rejecting the too-large
mutable file. Which version were you running? (1.2.0 or current
trunk?). If it was current trunk, I'll look more closely at the
problem. We've had runaway processes happen before when some bit of
error-handling code got confused inside a loop. In fact, I think I
remember a ticket about this, so I suspect it's been fixed in trunk.
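One way to keep such a rejection cheap is to stop reading as soon as the input exceeds the limit, instead of buffering the whole upload first. The sketch below assumes the upload arrives as a file-like object. `read_bounded` and `FileTooLargeError` are hypothetical names, and this is not a description of how Tahoe's web frontend actually handles it.

```python
import io
from typing import BinaryIO


class FileTooLargeError(Exception):
    """Raised when the input is larger than the allowed limit (hypothetical)."""


def read_bounded(stream: BinaryIO, limit: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read at most `limit` bytes; raise as soon as one extra byte shows up.

    Never reads more than limit + 1 bytes, so memory use stays bounded
    no matter how large the incoming data is.
    """
    chunks = []
    total = 0
    while True:
        # Ask for no more than we need to detect an overflow.
        chunk = stream.read(min(chunk_size, limit + 1 - total))
        if not chunk:
            break  # EOF: the input fit within the limit
        total += len(chunk)
        if total > limit:
            raise FileTooLargeError(f"input exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)
```

For example, feeding it a 5,000,000-byte stream with a 3,500,000-byte limit raises after reading only 3,500,001 bytes.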

>     I better keep my brain away from
>     thinking about uploading a gigabyte sized file on a 32bit
>     system...

Oh, GB-sized *immutable* files work just fine. In fact we have over 800
files 1GB or larger on our production network right now, and 87 files
in the 3GB-10GB range. It took their owners a long time to upload them,
but as far as we can tell, the uploads succeeded.

> And during create-client and create-introducer, an empty directory is
> required. It would be nice to ignore "lost+found" while looking up
> directory contents...

Well, the idea is that 'tahoe create-client' creates a new directory
for you. That way we can be sure that the directory will be empty. The
Tahoe process wants to own its base directory: sometimes it will delete
things inside it, and various files inside that directory will control
the Tahoe node's configuration. By using a brand-new directory, we
don't have to worry about 1) accidentally deleting some existing file
that the user cares about, and 2) how some pre-existing file might
affect the node's behavior.
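A minimal sketch of that policy, assuming the rule is "create the base directory if it does not exist, accept an existing one only if it is completely empty, and refuse anything else". `prepare_basedir` and `BasedirNotEmptyError` are hypothetical, not the real `tahoe create-client` implementation.

```python
import os


class BasedirNotEmptyError(Exception):
    """Raised when the proposed base directory already has contents (hypothetical)."""


def prepare_basedir(path: str) -> str:
    """Create a fresh node base directory, or accept an existing empty one."""
    if os.path.exists(path):
        if not os.path.isdir(path):
            raise BasedirNotEmptyError(f"{path} exists and is not a directory")
        entries = os.listdir(path)
        # Any pre-existing entry, even "lost+found", could be deleted by the
        # node or be mistaken for a config file, so refuse to adopt it.
        if entries:
            raise BasedirNotEmptyError(f"{path} is not empty: {sorted(entries)}")
    else:
        os.makedirs(path)
    return os.path.abspath(path)
```

Note that a partition root containing only `lost+found` is rejected by this rule, while a new subdirectory on the same partition is accepted.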

In the current trunk, we've moved most configuration settings into a
single 'tahoe.cfg' INI-style file, but a number of discrete files are
still used for backwards compatibility. For example, a file named
'sizelimit' basically controls how much space the storage server is
allowed to use. We don't happen to use a config variable named
"lost+found", but if we did, then allowing some unrelated file or
directory by that name to be present in the tahoe basedir would change
the behavior of the tahoe node in surprising ways.
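To illustrate why a stray file can matter, here is a hypothetical lookup that prefers `tahoe.cfg` and falls back to a legacy `sizelimit` file. The `[storage]` section, the `sizelimit` key name, and the assumption that the legacy file holds a plain byte count are all illustrative, not Tahoe's documented format.

```python
import configparser
import os
from typing import Optional


def read_size_limit(basedir: str) -> Optional[int]:
    """Return a storage size limit in bytes, or None if none is configured.

    Hypothetical lookup order: tahoe.cfg first, then a legacy 'sizelimit' file.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files that don't exist.
    parser.read(os.path.join(basedir, "tahoe.cfg"))
    if parser.has_option("storage", "sizelimit"):
        return int(parser.get("storage", "sizelimit"))
    # Legacy fallback: the mere presence of this file changes the result,
    # which is why unrelated files in the basedir are risky.
    legacy_path = os.path.join(basedir, "sizelimit")
    if os.path.exists(legacy_path):
        with open(legacy_path) as f:
            return int(f.read().strip())
    return None
```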

So I'm inclined to continue to encourage users to have a dedicated
directory for their Tahoe node (by having 'tahoe create-client' create
a brand-new directory for them). It sounds like you have a dedicated
partition for your Tahoe node, which is great. Just use 'tahoe
create-client /newpartition/tahoe', and let the node have its own
dedicated directory as well.

cheers,
 -Brian

