Opened at 2013-04-02T21:47:48Z
Last modified at 2014-12-03T22:11:02Z
#1939 new defect
memory leak (during check --repair --add-lease)
Reported by: | killyourtv | Owned by: | killyourtv |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | code | Version: | 1.9.2 |
Keywords: | | Cc: | killyourtv@… |
Launchpad Bug: | | | |
Description (last modified by zooko)
While repairing a list of files, I saw the memory usage of the tahoe process grow wildly until it led to OOM.
Attached are incident reports and screencaps of the webui's status page. I don't know whether the statuses will tell you what the problem is, but they make it obvious that there is a problem.
Edit:
Oops, I neglected to report the versions:
allmydata-tahoe: 1.9.2, foolscap: 0.6.4, pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958, zfec: 1.4.24, Twisted: 12.0.0, Nevow: 0.10.0, zope.interface: unknown, python: 2.7.3, platform: Linux-debian_7.0-x86_64-64bit_ELF, pyOpenSSL: 0.13, simplejson: 2.5.2, pycrypto: 2.6, pyasn1: unknown, mock: 0.8.0, sqlite3: 2.6.0 [sqlite 3.7.13], setuptools: 0.6 [distribute]
Attachments (3)
Change History (22)
Changed at 2013-04-02T21:48:57Z by killyourtv
comment:1 Changed at 2013-04-03T01:48:36Z by daira
Possibly a duplicate of #1824.
comment:2 Changed at 2013-04-03T02:19:18Z by kpreid
Dumped the flogs; I see no similarities to #1824.
comment:3 Changed at 2013-04-03T11:12:14Z by killyourtv
- Description modified (diff)
comment:4 follow-up: ↓ 8 Changed at 2013-04-03T17:54:51Z by zooko
Hey kytv! Good to hear from you again. By the way, have you seen the renewed interest in merging #68? Check it out. Lebek is away, so maybe you should take it over yourself.
Anyway, about this bug report: thank you for the bug report! Are you using exactly Tahoe-LAFS v1.9.2 as generated from our sources, e.g. https://tahoe-lafs.org/source/tahoe-lafs/releases/allmydata-tahoe-1.9.2.tar.bz2, or are you using your branch with the #68 patch, as described on OSPackages?
I have a request: change the APPNAME on line 42 of setup.py from allmydata-tahoe to something else like maybe tahoe-lafs-i2p. That way I'll never have to wonder again if someone else's "1.10.0" is the same as my "1.10.0" when someone reports a bug.
comment:5 Changed at 2013-04-03T18:00:24Z by zooko
comment:6 Changed at 2013-04-03T18:00:34Z by zooko
- Owner changed from davidsarah to killyourtv
comment:7 Changed at 2013-04-03T20:20:11Z by killyourtv
Yes, this is over I2P (the build from OSPackages)...and I'll be sure to change APPNAME in future revisions.
RE: #68 -- I need to try reintegrating the multiple introducer support into trunk. I've not been keeping up with its recent developments as much as I'd like to.
comment:8 in reply to: ↑ 4 ; follow-up: ↓ 9 Changed at 2013-04-04T04:52:01Z by daira
Replying to zooko:
I have a request: change the APPNAME on line 42 of setup.py from allmydata-tahoe to something else like maybe tahoe-lafs-i2p. That way I'll never have to wonder again if someone else's "1.10.0" is the same as my "1.10.0" when someone reports a bug.
That would break node directories created by earlier versions (or by the official branch), due to #1159. It would be nice to report the git branch name in the version info, but that could be done independently of changing the appname.
comment:9 in reply to: ↑ 8 Changed at 2013-04-23T19:44:42Z by zooko
comment:10 follow-ups: ↓ 11 ↓ 12 Changed at 2013-04-24T13:49:01Z by daira
I really think that changing the appname, in any branch, before fixing #1159 is going to cause far more problems than it can solve. I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.
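Reporting the branch name alongside the version, without touching the appname, could look roughly like the sketch below. It is only an illustration: the helper names `git_branch_name` and `format_version`, and the bracketed output format, are assumptions and not part of Tahoe-LAFS. It shells out to `git rev-parse --abbrev-ref HEAD` and falls back to no branch label when that fails.

```python
import subprocess


def git_branch_name(repo_dir="."):
    """Return the current git branch name, or None if it cannot be determined.

    Hypothetical helper: not an existing Tahoe-LAFS function.
    """
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=repo_dir,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, repo_dir does not exist, or it is not a git checkout.
        return None
    return out.decode("utf-8", "replace").strip() or None


def format_version(appname, version, branch=None):
    """Build a version string; the branch is appended only when known.

    The appname itself is left unchanged, so nothing keyed on it is affected.
    """
    base = "%s: %s" % (appname, version)
    if branch:
        return "%s [%s]" % (base, branch)
    return base
```

With a branch label in the version string, a report from a patched build would be distinguishable from an official release carrying the same version number, while the appname stays as it is.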
comment:11 in reply to: ↑ 10 Changed at 2013-04-24T13:59:19Z by daira
comment:12 in reply to: ↑ 10 ; follow-up: ↓ 13 Changed at 2013-04-24T22:03:52Z by zooko
Replying to daira:
I really think that changing the appname, in any branch, before fixing #1159 is going to cause far more problems than it can solve. I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.
Argh, fine but then this makes me feel like #1159 is more urgent. ☺
comment:13 in reply to: ↑ 12 Changed at 2013-04-25T01:48:46Z by daira
comment:14 Changed at 2013-05-21T20:32:53Z by zooko
- Description modified (diff)
- Priority changed from normal to major
This ticket is important to me. Seems like a bad bug.
comment:15 Changed at 2014-09-11T22:36:14Z by warner
- Component changed from unknown to code
comment:16 Changed at 2014-12-03T21:47:46Z by zooko
In contradiction to what kpreid said in comment:2, the flog that killyourtv attached (attachment:incident-2013-04-02--11-19-06Z-urhgqtq.flog.bz2) does have some evidence that points to a problem similar to the one in kpreid's original report. It has this:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ngqc) (during method=RIBucketReader:read)
]
and this:
2013-04-01_21:20:12.017651Z [110581]: UNUSUAL {'facility':'tahoe.encoder', 'failure':<foolscap.call.CopiedFailure allmydata.util.pipeline.PipelineError>, 'format':'error while sending %(method)s to shareholder=%(shnum)d', 'incarnation':('\xd0N\xca\xf7\xa7l\x8f\xce', None), 'parent':110314, 'shnum':5, ...}
FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: allmydata.util.pipeline.PipelineError: <PipelineError error=([Failure instance: Traceback: <class 'foolscap.ipb.DeadReferenceError'>: Calling Stale Broker
/usr/lib/pymodules/python2.7/allmydata/immutable/layout.py:237:_write
/usr/lib/pymodules/python2.7/allmydata/util/pipeline.py:89:add
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:134:maybeDeferred
/usr/lib/pymodules/python2.7/foolscap/referenceable.py:415:callRemote
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:134:maybeDeferred
/usr/lib/pymodules/python2.7/foolscap/referenceable.py:455:_callRemote
/usr/lib/pymodules/python2.7/foolscap/broker.py:477:newRequestID
])>
]
This is similar to the evidence in kpreid's log, which suggests that maybe a foolscap rref becoming None led to the problem.
comment:17 Changed at 2014-12-03T21:48:24Z by zooko
Oh wow, attachment:incident-2013-04-02--11-19-06Z-urhgqtq.flog.bz2 also has this:
2013-04-02_00:00:45.292535Z [190484]: WEIRD {'f_value':'AES.__init__() argument 1 must be string or read-only character buffer, not None', 'facility':'tahoe.mutable.mapupdate', 'failure':<foolscap.call.CopiedFailure exceptions.TypeError>, 'format':'error during privkey query: %(f_value)s', 'incarnation':('\xd0N\xca\xf7\xa7l\x8f\xce', None), 'parent':190483, ...}
FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 723, in _got_results
    d4.addCallback(lambda results, shnum=shnum:
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 301, in addCallback
    callbackKeywords=kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 290, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 724, in <lambda>
    self._try_to_validate_privkey(results, server, shnum, lp))
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 929, in _try_to_validate_privkey
    alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/filenode.py", line 168, in _decrypt_privkey
    enc = AES(self._writekey)
exceptions.TypeError: AES.__init__() argument 1 must be string or read-only character buffer, not None
]
I don't think I've seen that before!
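The last frame of that traceback is `enc = AES(self._writekey)` being handed None. A minimal sketch of that failure shape, using hypothetical stand-ins (`FakeAES` and `FakeNode` are not pycryptopp or Tahoe-LAFS code, and the error text is made up for the sketch), assuming only that the node's `_writekey` was never set. Why it was None in the real run is not established here.

```python
class FakeAES:
    """Hypothetical stand-in for a cipher class; it only mimics a key type check."""

    def __init__(self, key):
        if not isinstance(key, bytes):
            raise TypeError("key must be bytes, not %r" % (key,))
        self.key = key


class FakeNode:
    """Hypothetical stand-in for a mutable file node holding an optional writekey."""

    def __init__(self, writekey=None):
        self._writekey = writekey

    def decrypt_privkey(self, enc_privkey):
        # Same shape as the failing frame: the cipher is built from the writekey.
        FakeAES(self._writekey)
        # Real decryption is omitted; the sketch only exercises the constructor.
        return enc_privkey


def try_decrypt(node, enc_privkey):
    """Return 'ok' on success, or a short description of the TypeError raised."""
    try:
        node.decrypt_privkey(enc_privkey)
    except TypeError as e:
        return "TypeError: %s" % e
    return "ok"


if __name__ == "__main__":
    print(try_decrypt(FakeNode(writekey=b"k" * 16), b"blob"))
    print(try_decrypt(FakeNode(), b"blob"))
```

A node with a writekey passes, and a node without one fails inside the cipher constructor, which is the pattern visible in the log above.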
comment:18 follow-up: ↓ 19 Changed at 2014-12-03T22:03:48Z by zooko
killyourtv: I'd really like to investigate these issues, but I can't be sure that I'm looking at the right source code. Tahoe-LAFS v1.10.0 doesn't have alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) on line 929.
comment:19 in reply to: ↑ 18 Changed at 2014-12-03T22:11:02Z by zooko
Replying to zooko:
killyourtv: I'd really like to investigate these issues, but I can't be sure that I'm looking at the right source code. Tahoe-LAFS v1.10.0 doesn't have alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) on line 929.
Oh, sorry, the incident report file says that it is Tahoe-LAFS v1.9.2, not v1.10.0. Let's see… Yes! line 929 of Tahoe-LAFS v1.9.2 is alleged_privkey_s = self._node._decrypt_privkey(enc_privkey).
incident #1