#694 closed defect (fixed)

remove hard limit on mutable file size

Reported by: sigmonsays Owned by: kevan
Priority: major Milestone: undecided
Component: unknown Version: 1.4.1
Keywords: easy Cc: kevan
Launchpad Bug:

Description (last modified by zooko)

<sigmonsays> the tahoe put returns this after several seconds "error, got 413 
             Request Entity Too Large"                                  [09:06] 
<sigmonsays> allmydata.interfaces.FileTooLargeError: SDMF is limited to one 
             segment, and 3500419 > 3500000                             [09:07] 
<sigmonsays> the file i'm trying to send is only 9k

---- [configuration]------
3 node cluster each w/ 2G (about 75% full on each)

---- [the full exception]------
12148151_1141322388.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 16  7234   16  1232    0     0   5401      0  0:00:01 --:--:--  0:00:01  5401waiting for file data on stdin..
100  7234  100  7234    0     0  22420      0 --:--:-- --:--:-- --:--:-- 63851
error, got 413 Request Entity Too Large
Traceback (most recent call last):
  File "/usr/local/src/tahoe-files/allmydata-tahoe-1.4.1/Twisted-8.1.0-py2.4-linux-x86_64.egg/twisted/internet/defer.py", line 312, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/local/src/tahoe-files/allmydata-tahoe-1.4.1/Twisted-8.1.0-py2.4-linux-x86_64.egg/twisted/internet/defer.py", line 328, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/local/src/tahoe-files/allmydata-tahoe-1.4.1/Twisted-8.1.0-py2.4-linux-x86_64.egg/twisted/internet/defer.py", line 289, in _continue
    self.unpause()
  File "/usr/local/src/tahoe-files/allmydata-tahoe-1.4.1/Twisted-8.1.0-py2.4-linux-x86_64.egg/twisted/internet/defer.py", line 285, in unpause
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/src/tahoe-files/allmydata-tahoe-1.4.1/Twisted-8.1.0-py2.4-linux-x86_64.egg/twisted/internet/defer.py", line 328, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/local/src/tahoe-files/allmydata-tahoe-1.4.1/src/allmydata/mutable/filenode.py", line 403, in _apply
    return self._upload(new_contents, servermap)
  File "/usr/local/src/tahoe-files/allmydata-tahoe-1.4.1/src/allmydata/mutable/filenode.py", line 438, in _upload
    return p.publish(new_contents)
  File "/usr/local/src/tahoe-files/allmydata-tahoe-1.4.1/src/allmydata/mutable/publish.py", line 146, in publish
    raise FileTooLargeError("SDMF is limited to one segment, and "
allmydata.interfaces.FileTooLargeError: SDMF is limited to one segment, and 3500419 > 3500000

12062355_1141263102.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 20  5998   20  1232    0     0   5319      0  0:00:01 --:--:--  0:00:01  5319waiting for file data on stdin..
100  5998  100  5998    0     0  19168      0 --:--:-- --:--:-- --:--:-- 58839
./send-to-tahoe.sh: line 19: 20708 Broken pipe             cat files.lst

Attachments (2)

694.diff (15.5 KB) - added by kevan at 2009-06-20T02:53:41Z.
694.txt (15.9 KB) - added by kevan at 2009-06-20T21:55:37Z.


Change History (9)

comment:1 Changed at 2009-05-04T16:57:16Z by zooko

  • Description modified (diff)

Reformatting for Trac wiki escape.

comment:2 Changed at 2009-05-04T17:15:05Z by zooko

  • Summary changed from tahoe put returns "error, got 413 request entity too large" to remove hard limit on mutable file size

Thank you for the bug report, sigmonsays. There is a hardcoded limit on the maximum size of mutable files. Directories are stored inside mutable files. The directory into which you are linking your new file would grow beyond the limit by the addition of this link.

I believe the first thing to do is to remove the hardcoded limit, and I'm accordingly changing the title of this ticket to "remove hard limit on mutable file size". The line of code in question is publish.py line 145. Someone go fix it! Just remove the MAX_SEGMENT_SIZE hardcoded parameter and both places where it is used.

There is already a unit test in test_mutable.py line 359 that makes sure that Tahoe raises a failure when you try to create a mutable file that is bigger than 3,500,000 bytes. Change that test to make sure that Tahoe doesn't raise a failure, and that the file is created instead.
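For illustration, here is a minimal sketch of the kind of guard being removed. The names (`check_sdmf_size`, the standalone `FileTooLargeError`) are assumptions for this self-contained example, not the actual Tahoe-LAFS source; only the constant and the error message come from the traceback above.

```python
# Hypothetical sketch (not the real publish.py): SDMF stores the whole
# file in a single segment, so publish rejected anything over the limit.
MAX_SEGMENT_SIZE = 3500000  # the hardcoded SDMF limit from the traceback

class FileTooLargeError(Exception):
    pass

def check_sdmf_size(datalength):
    """Pre-fix behavior: raise if the mutable file exceeds one segment."""
    if datalength > MAX_SEGMENT_SIZE:
        raise FileTooLargeError("SDMF is limited to one segment, and "
                                "%d > %d" % (datalength, MAX_SEGMENT_SIZE))
```

The fix amounts to deleting the constant and this check, so that any size is accepted.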

After that, however, you might start to learn why we put that limit in -- modifying a mutable file requires downloading and re-uploading the entire file, and holding all of it in RAM while changing it. So the more links you keep in that directory of yours, the slower it will be to read or change the directory, and the more RAM will be used.
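A toy model (not Tahoe code) of that read-modify-write cost: every small change to a directory moves the entire serialized directory over the network in both directions. The per-entry size here is an assumed round figure for illustration.

```python
# Toy cost model: adding one link to an SDMF-backed directory downloads
# the whole directory, edits it in RAM, and re-uploads the whole thing.
ENTRY_SIZE = 300  # assumed average bytes per serialized directory entry

def bytes_moved_per_modification(num_entries, entry_size=ENTRY_SIZE):
    """Bytes transferred (download + re-upload) to change one directory."""
    dir_size = num_entries * entry_size
    return 2 * dir_size  # full download plus full re-upload
```

Under these assumptions, touching a 10,000-entry directory moves about 6 MB of data per modification, which is why large directories get slow once the hard limit is gone.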

Ultimately we need to implement efficient modification of mutable files without downloading and re-uploading the whole file -- that is the subject of #393 (mutable: implement MDMF).

In the meantime, there are also some tickets about optimizing the CPU usage when processing large directories. Fixing these would not fix the problem that the entire directory has to be downloaded and re-uploaded, but these tickets might also be important: #327 (performance measurement of directories), #329 (dirnodes could cache encrypted/serialized entries for speed), #383 (large directories take a long time to modify), #414 (profiling on directory unpacking).

comment:3 Changed at 2009-06-10T16:39:00Z by zooko

  • Keywords easy added

Changed at 2009-06-20T02:53:41Z by kevan

comment:4 Changed at 2009-06-20T02:53:53Z by kevan

I tried my hand at fixing this, and am attaching a patch. Comments?

comment:5 Changed at 2009-06-20T03:52:43Z by zooko

  • Owner changed from nobody to kevan

Looks good! Please change the doc:

    # this used to be in Publish, but we removed it there. Some of the
    # tests in here still use it, though, so here it is.

to something like:

    # this used to be in Publish, but we removed the limit. Some of 
    # these tests test whether the new code correctly allows files 
    # larger than the limit

You can use darcs unrecord to undo the effect of your previous darcs record and then use darcs record again to make a new patch and attach it to this ticket. By the way, please name it with a trailing '.txt' this time.

Changed at 2009-06-20T21:55:37Z by kevan

comment:6 Changed at 2009-06-20T21:56:01Z by kevan

  • Cc kevan added

Done. Thanks for the feedback.

comment:7 Changed at 2009-06-21T05:28:47Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

Fixed by db939750a8831c1e and efcc45951d3544ee. Thanks, Kevan!
