#1939 new defect

memory leak (during check --repair --add-lease)

Reported by: killyourtv Owned by: killyourtv
Priority: major Milestone: undecided
Component: code Version: 1.9.2
Keywords: Cc: killyourtv@…
Launchpad Bug:

Description (last modified by zooko)

While repairing a list of files I've had a problem with the memory usage for the tahoe process growing wildly and leading to OOM.

Attached are incident reports and screencaps of the webui's status page. I don't know if the statuses will tell you what the problem is but it'll be obvious that there is a problem.

Edit:

Oops, I neglected to report the versions:

allmydata-tahoe: 1.9.2,
foolscap: 0.6.4,
pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958,
zfec: 1.4.24,
Twisted: 12.0.0,
Nevow: 0.10.0,
zope.interface: unknown,
python: 2.7.3,
platform: Linux-debian_7.0-x86_64-64bit_ELF,
pyOpenSSL: 0.13,
simplejson: 2.5.2,
pycrypto: 2.6,
pyasn1: unknown,
mock: 0.8.0,
sqlite3: 2.6.0 [sqlite 3.7.13],
setuptools: 0.6 [distribute]

Attachments (3)

incident-2013-04-02--11-19-06Z-urhgqtq.flog.bz2 (67.9 KB) - added by killyourtv at 2013-04-02T21:48:57Z.
incident #1
incident-2013-04-02--11-51-59Z-3lr26oy.flog.bz2 (36.3 KB) - added by killyourtv at 2013-04-02T21:49:29Z.
incident 2
o.png (503.9 KB) - added by killyourtv at 2013-04-02T21:50:44Z.
mutable file retrieve status

Download all attachments as: .zip

Change History (22)

Changed at 2013-04-02T21:48:57Z by killyourtv

incident #1

Changed at 2013-04-02T21:49:29Z by killyourtv

incident 2

Changed at 2013-04-02T21:50:44Z by killyourtv

mutable file retrieve status

comment:1 Changed at 2013-04-03T01:48:36Z by daira

Possibly a duplicate of #1824.

comment:2 Changed at 2013-04-03T02:19:18Z by kpreid

Dumped the flogs; I see no similarities to #1824.

comment:3 Changed at 2013-04-03T11:12:14Z by killyourtv

  • Description modified (diff)

comment:4 follow-up: Changed at 2013-04-03T17:54:51Z by zooko

Hey kytv! Good to hear from you again. By the way, have you seen the renewed interest in merging #68? Check it out. Lebek is away, so maybe you should take it over yourself.

Anyway, about this bug report: thank you for the bug report! Are you using exactly Tahoe-LAFS v1.9.2 as generated from our sources, e.g. https://tahoe-lafs.org/source/tahoe-lafs/releases/allmydata-tahoe-1.9.2.tar.bz2, or are you using your branch with the #68 patch, as described on OSPackages?

I have a request: change the APPNAME on line 42 of setup.py from allmydata-tahoe to something else like maybe tahoe-lafs-i2p. That way I'll never have to wonder again if someone else's "1.10.0" is the same as my "1.10.0" when someone reports a bug.

Version 0, edited at 2013-04-03T17:54:51Z by zooko (next)

comment:5 Changed at 2013-04-03T18:00:24Z by zooko

Having looked at o.png, I sure hope it is running over i2p, because then the 2 to 10 seconds round-trip times to storage servers are somewhat justifiable. ;-) Seriously, please confirm what specific version it is...

comment:6 Changed at 2013-04-03T18:00:34Z by zooko

  • Owner changed from davidsarah to killyourtv

comment:7 Changed at 2013-04-03T20:20:11Z by killyourtv

Yes, this is over I2P (the build from OSPackages)...and I'll be sure to change APPNAME in future revisions.

RE: #68 -- I need to try reintegrating the multiple introducer support into trunk. I've not been keeping up on it recent developments as much as I'd like to.

comment:8 in reply to: ↑ 4 ; follow-up: Changed at 2013-04-04T04:52:01Z by daira

Replying to zooko:

I have a request: change the APPNAME on line 42 of setup.py from allmydata-tahoe to something else like maybe tahoe-lafs-i2p. That way I'll never have to wonder again if someone else's "1.10.0" is the same as my "1.10.0" when someone reports a bug.

That would break node directories created by earlier versions (or by the official branch), due to #1159. It would be nice to report the git branch name in the version info, but that could be done independently of changing the appname.

Last edited at 2014-03-05T02:49:13Z by daira (previous) (diff)

comment:9 in reply to: ↑ 8 Changed at 2013-04-23T19:44:42Z by zooko

killyourtv: despite daira's comment:8, I would still appreciate it if you would change the APPNAME as described in comment:4.

comment:10 follow-ups: Changed at 2013-04-24T13:49:01Z by daira

I really think that changing the appname, in any branch, before fixing #1159 is going to cause far more problems than it can solve. I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.

comment:11 in reply to: ↑ 10 Changed at 2013-04-24T13:59:19Z by daira

Replying to daira:

I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.

Filed as #1953.

comment:12 in reply to: ↑ 10 ; follow-up: Changed at 2013-04-24T22:03:52Z by zooko

Replying to daira:

I really think that changing the appname, in any branch, before fixing #1159 is going to cause far more problems than it can solve. I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.

Argh, fine but then this makes me feel like #1159 is more urgent. ☺

comment:13 in reply to: ↑ 12 Changed at 2013-04-25T01:48:46Z by daira

Replying to zooko:

Argh, fine but then this makes me feel like #1159 is more urgent. ☺

I would have bumped its priority, but it already has priority 'major' and milestone 1.11.

comment:14 Changed at 2013-05-21T20:32:53Z by zooko

  • Description modified (diff)
  • Priority changed from normal to major

This ticket is important to me. Seems like a bad bug.

comment:15 Changed at 2014-09-11T22:36:14Z by warner

  • Component changed from unknown to code

comment:16 Changed at 2014-12-03T21:47:46Z by zooko

In contradiction to what kpreid said in comment:2, the flog that kv attached (attachment:incident-2013-04-02--11-19-06Z-urhgqtq.flog.bz2) does have some evidence that points to a similar problem as kpreid's original report. It has this:

[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ngqc) (during method=RIBucketReader:read)
]

and this:

2013-04-01_21:20:12.017651Z [110581]: UNUSUAL {'facility':'tahoe.encoder', 'failure':<foolscap.call.CopiedFailure allmydata.util.pipeline.PipelineError>, 'format':'error while sending %(method)s to shareholder=%(shnum)d', 'incarnation':('\xd0N\xca\xf7\xa7l\x8f\xce', None), 'parent':110314, 'shnum':5, ...} FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: allmydata.util.pipeline.PipelineError: <PipelineError error=([Failure instance: Traceback: <class 'foolscap.ipb.DeadReferenceError'>: Calling Stale Broker
/usr/lib/pymodules/python2.7/allmydata/immutable/layout.py:237:_write
/usr/lib/pymodules/python2.7/allmydata/util/pipeline.py:89:add
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:134:maybeDeferred
/usr/lib/pymodules/python2.7/foolscap/referenceable.py:415:callRemote
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:134:maybeDeferred
/usr/lib/pymodules/python2.7/foolscap/referenceable.py:455:_callRemote
/usr/lib/pymodules/python2.7/foolscap/broker.py:477:newRequestID
])>
]

This is similar to the evidence in kpreid's log, which suggests that maybe a foolscap rref becoming None led to the problem.

comment:17 Changed at 2014-12-03T21:48:24Z by zooko

Oh wow, attachment:2013-04-02--11-19-06Z-urhgqtq.flog.bz2​ also has this:

2013-04-02_00:00:45.292535Z [190484]: WEIRD {'f_value':'AES.__init__() argument 1 must be string or read-only character buffer, not None', 'facility':'tahoe.mutable.mapupdate', 'failure':<foolscap.call.CopiedFailure exceptions.TypeError>, 'format':'error during privkey query: %(f_value)s', 'incarnation':('\xd0N\xca\xf7\xa7l\x8f\xce', None), 'parent':190483, ...} FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 723, in _got_results
    d4.addCallback(lambda results, shnum=shnum:
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 301, in addCallback
    callbackKeywords=kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 290, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 724, in <lambda>
    self._try_to_validate_privkey(results, server, shnum, lp))
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 929, in _try_to_validate_privkey
    alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/filenode.py", line 168, in _decrypt_privkey
    enc = AES(self._writekey)
exceptions.TypeError: AES.__init__() argument 1 must be string or read-only character buffer, not None
]

I don't think I've seen that before!

comment:18 follow-up: Changed at 2014-12-03T22:03:48Z by zooko

killyourtv: I'd really like to investigate these issues, but I can't be sure that I'm looking at the right source code. Tahoe-LAFS v1.10.0 doesn't have alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) on line 929.

comment:19 in reply to: ↑ 18 Changed at 2014-12-03T22:11:02Z by zooko

Replying to zooko:

killyourtv: I'd really like to investigate these issues, but I can't be sure that I'm looking at the right source code. Tahoe-LAFS v1.10.0 doesn't have alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) on line 929.

Oh, sorry, the incident report file says that it is Tahoe-LAFS v1.9.2, not v1.10.0. Let's see… Yes! line 929 of Tahoe-LAFS v1.9.2 is alleged_privkey_s = self._node._decrypt_privkey(enc_privkey).

Note: See TracTickets for help on using tickets.