#783 closed defect (wontfix)
does it sometimes use 750 MB
Reported by: | terrell | Owned by: | terrell |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | code | Version: | 1.5.0 |
Keywords: | leak memory | Cc: | |
Launchpad Bug: | | | |
Description
My main machine, Intel iMac on 10.5.7, has been running a tahoe node on the volunteergrid for a while (a few weeks).
I noticed today when looking at the Activity Monitor that the node is using over 750MB of memory.
Please advise what I can do to catch it in the act.
Attachments (8)
Change History (21)
Changed at 2009-08-08T02:33:21Z by terrell
comment:1 Changed at 2009-08-08T02:38:40Z by terrell
[10:36:24:trel:~] ps -fA | grep tahoe
501 71020 1 0 9:04.72 ?? 33:17.14 /System/Library/Frameworks/Python.framework/Versions/2.5/Resources/Python.app/Contents/MacOS/Python /usr/bin/twistd -y tahoe-client.tac --logfile logs/twistd.log
comment:2 Changed at 2009-08-08T02:40:44Z by terrell
[10:36:26:trel:~] tahoe --version
allmydata-tahoe: 1.4.1-r3995, foolscap: 0.4.1, pycryptopp: 0.5.15, zfec: 1.4.2, Twisted: 2.5.0, Nevow: 0.9.32, zope.interface: 3.3.0, python: 2.5.1, platform: Darwin-9.7.0-i386-32bit, sqlite: 3.4.0, simplejson: 2.0.1, argparse: 0.8.0, pyOpenSSL: 0.6, pyutil: 1.3.28, zbase32: 1.1.1, setuptools: 0.6c12dev, pysqlite: 2.3.2
comment:3 Changed at 2009-08-08T02:41:32Z by terrell
- Summary changed from "long running tahoe process - appears to be a memory leak" to "long running tahoe process - appears to be a slow memory leak"
comment:4 Changed at 2009-08-08T15:28:45Z by zooko
What tool generated that "tahoesample.txt" sample file, and what is the meaning of the contents of that file?
Hm, let's see, how else can we figure out what's going on in there? The presence of 11 threads is a bit surprising to me. Oh! Look for incident report files. It would really be good if we made this better documented and even more automated. Anyway, look in $TAHOEBASEDIR/logs/incidents and attach the most recent ones to this ticket. Also try pressing the "Report an incident" button on the welcome page.
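(For anyone following along, here is a minimal sketch of how to list the most recent incident files from a script. It assumes the node's base dir is ~/.tahoe, so substitute your actual $TAHOEBASEDIR, and that incident files follow the usual incident-* naming foolscap uses; adjust both if your setup differs.)

```python
import glob
import os

# Assumption: the node base dir is ~/.tahoe; substitute your own $TAHOEBASEDIR.
basedir = os.path.expanduser("~/.tahoe")

# Assumption: incident files are named incident-*.flog or incident-*.flog.bz2.
pattern = os.path.join(basedir, "logs", "incidents", "incident-*")
incidents = sorted(glob.glob(pattern), key=os.path.getmtime)

# Print the five most recently written incident files, newest last.
for path in incidents[-5:]:
    print(path)
```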
comment:5 Changed at 2009-08-08T15:41:53Z by zooko
- Priority changed from major to critical
I'm elevating the priority to "critical" because a big memory leak like this could prevent Tahoe-LAFS from being used in some cases. By the way, we've always been careful about memory usage. We have graphs of the virtual memory usage of short-running programs which are automatically generated on each darcs commit:
http://allmydata.org/trac/tahoe/wiki/Performance
And allmydata.com has graphs of the virtual and resident memory of long-running storage server processes. Unfortunately those graphs aren't public. I'll ask Peter Secor if we could make the live view on those graphs public. I'll attach the graphs from one of those servers to this ticket.
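(The kind of data behind those graphs can be collected with a tiny poller. The following is only an illustrative sketch, not the code that produces the allmydata.org or allmydata.com graphs; it samples the resident set size of a given pid once a minute via ps(1).)

```python
import subprocess
import sys
import time

def sample_rss_kib(pid):
    """Return the resident set size of `pid` in KiB, as reported by ps(1)."""
    out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)])
    return int(out.strip())

if __name__ == "__main__":
    pid = int(sys.argv[1])  # e.g. the pid shown by `ps -fA | grep tahoe`
    while True:
        # Emit "unix-timestamp rss-in-KiB" lines, suitable for plotting later.
        print("%d %d" % (time.time(), sample_rss_kib(pid)))
        time.sleep(60)
```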
So it is interesting that Trel's is the first report of this kind. I've been running long-running Tahoe-LAFS nodes on my Intel Mac (10.4), and I've never seen anything like it.
Note: we *have* seen major memory problems, but never a long-running slow memory leak like this, only a catastrophic "Out Of Memory -- now I am totally confused and broken" -- #651. Trel: please look in your twistd.log for MemoryError.
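(A quick way to do that check, as a sketch rather than a built-in Tahoe-LAFS command, assuming the logs live under the node's base dir at ~/.tahoe/logs:)

```python
import glob
import os

# Assumption: logs live under ~/.tahoe/logs; adjust to your node's base dir.
basedir = os.path.expanduser("~/.tahoe")

# Scan twistd.log and any rotated copies for MemoryError mentions.
for logfile in sorted(glob.glob(os.path.join(basedir, "logs", "twistd.log*"))):
    with open(logfile) as f:
        for lineno, line in enumerate(f, 1):
            if "MemoryError" in line:
                print("%s:%d: %s" % (logfile, lineno, line.rstrip()))
```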
Changed at 2009-08-08T15:46:01Z by zooko
some random storage server at allmydata.com. Look: no major memory leak! (graph shows a day)
Changed at 2009-08-08T15:46:14Z by zooko
some random storage server at allmydata.com. Look: no major memory leak! (graph shows a week)
Changed at 2009-08-08T15:46:41Z by zooko
some random storage server at allmydata.com. Look: no major memory leak! (graph shows a month)
Changed at 2009-08-08T15:46:53Z by zooko
some random storage server at allmydata.com. Look: no major memory leak! (graph shows a year)
comment:6 Changed at 2009-08-08T15:48:15Z by zooko
I attached graphs of the memory usage of one of the allmydata.com storage servers. Note that these are (I think) running Tahoe-LAFS v1.3.0.
comment:7 Changed at 2009-08-08T21:11:17Z by terrell
That tahoesample.txt was generated by Activity Monitor in OS X, using the 'Sample Process' button at the top after selecting the running 'Python' process.
I'm attaching another screenshot, this time showing that the memory usage had dropped back to 57 MB. I then waited another 10 hours, and it still seems to be at 57 MB. So... now I'm even more confused. I'll look for incident files when I return; need to head out the door now.
comment:8 Changed at 2009-08-08T22:44:05Z by warner
FYI, I've seen unexpected memory usage in storage servers that are receiving shares, but not huge consumption (it felt like the 100KB-ish strings weren't being freed as quickly as I was expecting). I think we've also seen unexpected behavior in busy webapi servers; we should check the allmydata.com webapi2/webapi3 nodes to see what their memory-usage munin graphs look like.
Occasionally we've seen a node use up so much memory that it hits MemoryError, and then everything falls apart (because the reactor's unhandled-error handling code runs out of memory too; it would be great if MemoryError weren't catchable, or at least if the reactor didn't try to catch it). We haven't been able to figure out how it got into that state, though; there were no obvious smoking guns, just the fatal exit wound :).
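(To illustrate that idea, here is a hedged sketch, not Twisted or Tahoe-LAFS code, of a wrapper that aborts the process as soon as a callable raises MemoryError, so that a broad `except Exception:` higher up never tries to log or recover with no memory left.)

```python
import os
import sys

def abort_on_memoryerror(f):
    """Run f(), but treat MemoryError as fatal instead of recoverable.

    This is only an illustration of the comment above; the wrapper name
    and the decision to abort are assumptions, not project code.
    """
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except MemoryError:
            # With the heap exhausted, even error-reporting code can fail,
            # so skip cleanup entirely and exit immediately.
            sys.stderr.write("MemoryError: aborting\n")
            os._exit(1)
    return wrapper
```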
comment:9 Changed at 2009-10-27T06:05:25Z by zooko
- Priority changed from critical to major
- Summary changed from "long running tahoe process - appears to be a slow memory leak" to "does it sometimes use 750 MB"
comment:10 Changed at 2009-12-13T02:25:04Z by davidsarah
- Keywords memory added
comment:11 Changed at 2009-12-13T04:25:53Z by zooko
- Owner changed from somebody to terrell
I appreciate the bug report, Terrell, and I don't consider it acceptable for Tahoe-LAFS to occasionally use 750 MB, but I don't see how to make progress on this ticket unless you experience the problem again and this time have verbose logging turned on or it generates an incident report file. Let's close this as 'wontfix' for now so that the ticket doesn't sit here open waiting for the event to reoccur on your system. Maybe it has been fixed! But please do re-open this ticket if it reoccurs.
comment:12 Changed at 2009-12-13T04:26:01Z by zooko
- Resolution set to wontfix
- Status changed from new to closed
comment:13 Changed at 2009-12-13T04:47:06Z by terrell
haven't seen this since it happened four months ago. closed is fine.
screenshot of Activity Monitor