[tahoe-dev] Debugging memory leaks with guppy
Francois Deppierraz
francois at ctrlaltdel.ch
Fri Oct 22 00:13:07 UTC 2010
Hi,
I have recently been experiencing memory leaks in Tahoe-LAFS during
operations such as 'tahoe deep-check --add-lease --repair' or while
using the FTP/SFTP frontend¹.
While researching ways to debug memory leaks in Python, I learned
about Heapy, a heap analysis toolset that is part of the Guppy²
Programming Environment.
Using this toolset requires access to the Python interpreter under
which the Tahoe-LAFS node is running. Fortunately, the (undocumented)
manhole feature allows one to access the Python interpreter over SSH on
a running node. It is activated by the following configuration
snippet in tahoe.cfg:
[node]
ssh.port = 8020
ssh.authorized_keys_file = ~/.ssh/authorized_keys
Access to the Python interpreter is now possible by SSHing to
localhost on port 8020.
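For the record, connecting from the node's host looks something like
this (the exact invocation is my assumption; the manhole authenticates
against the keys listed in ssh.authorized_keys_file and then drops you
straight into a Python prompt):

$ ssh -p 8020 localhost
>>>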
Then I started the node, did a few operations over SFTP to warm the
system up a bit, loaded Heapy, and recorded the current heap state.
>>> from guppy import hpy
>>> hp = hpy()
>>> hp.setrelheap()
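For what it's worth, setrelheap() makes later hp.heap() calls report
only objects allocated after this point. An alternative, if you would
rather not move the reference point, is to diff two full snapshots;
just a rough sketch of what I mean:

>>> before = hp.heap()
>>> # ... exercise the node over SFTP ...
>>> after = hp.heap()
>>> (after - before).byrcs   # only what survived since the first snapshot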
Then, I deleted about 1000 very small files from a single directory and
had a look at how many objects of each type were created.
>>> hp.heap()
Partition of a set of 70211 objects. Total size = 75854072 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  23768  34 71651592  94  71651592  94 str
     1   2119   3  1693864   2  73345456  97 dict (no owner)
     2  19310  28  1540664   2  74886120  99 tuple
     3  21345  30   512280   1  75398400  99 float
     4    545   1   112392   0  75510792 100 list
     5     75   0    78600   0  75589392 100 dict of twisted.internet.base.DelayedCall
     6   2021   3    48504   0  75637896 100 int
     7    376   1    30080   0  75667976 100 types.MethodType
     8     20   0    20960   0  75688936 100 dict of allmydata.mutable.publish.PublishStatus
     9     20   0    20960   0  75709896 100 dict of allmydata.mutable.retrieve.RetrieveStatus
<77 more rows. Type e.g. '_.more' to view.>
Wow, that's a lot of strings! What size are those?
>>> hp.heap()[0].bysize
Partition of a set of 25137 objects. Total size = 71747672 bytes.
 Index  Count   %     Size   % Cumulative  % Individual Size
     0  18150  72 71148000  99  71148000  99      3920
     1   1203   5    86616   0  71234616  99        72
     2    881   4    70480   0  71305096  99        80
     3    502   2    60240   0  71365336  99       120
     4    882   4    56448   0  71421784 100        64
     5    480   2    53760   0  71475544 100       112
     6    564   2    49632   0  71525176 100        88
     7    437   2    45448   0  71570624 100       104
     8    748   3    41888   0  71612512 100        56
     9    392   2    37632   0  71650144 100        96
Interesting: strings of 3920 bytes account for 99% of the total size.
>>> hp.heap()[0].byrcs[0].bysize[0].sp[0]
'%s.i0_modules[\'twisted.internet.reactor\'].__dict__[\'_reads\'].keys()[0].__dict__[\'protocol\'].__dict__[\'avatar\'].__dict__[\'_root\'].__dict__[\'_node\'].__dict__[\'_cache\'].__dict__[\'cache\'][((80,
\'\\tnI\\xf8&\\xd..."H\\t\\xf6\\xdcN\',
\'\\xb6\\xe7\\xb1...c]Rm\\xad\\x8dW\', 3789, 3788, 3, ...), 0)].??[1]'
>>> hp.heap()[0].byrcs[0].bysize[0].rp
Reference Pattern by <[dict of] class>.
 0: _ --- [-] 18150 <size = 3920>: 0x63ce610, 0x63d18e0, 0x63d8b60, 0x63f4040...
 1: a      [-] 18150 tuple: 0x2726b90*3, 0x2d7ed20*3, 0x2f541e0*3...
 2: aa ---- [-] 10 __builtin__.set: 0x37bf138, 0x6793220, 0x67c4050...
 3: a3      [-] 1 allmydata.util.dictutil.DictOfSets: 0x675a0c0
 4: a4 ------ [-] 1 dict of allmydata.mutable.common.ResponseCache: 0x67b4440
 5: a5        [-] 1 allmydata.mutable.common.ResponseCache: 0x67b4440
 6: a6 -------- [-] 1 dict of allmydata.mutable.filenode.MutableFileNode: 0x6...
 7: a7          [+] 1 allmydata.mutable.filenode.MutableFileNode: 0x67b4b90
It looks like cache entries are not being released correctly. Does
anybody have a better idea of the actual culprit, code-wise? Any
advice is welcome!
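For anyone who wants to poke at this from the manhole as well, here is
roughly how one of the offending strings can be pulled out and traced
back to its owner. This is only a sketch; it relies on Heapy's byid
and theone accessors, nothing Tahoe-specific:

>>> big = hp.heap()[0].bysize[0]   # the 18150 strings of 3920 bytes
>>> s = big.byid[0].theone         # grab one concrete string object
>>> len(s)                         # should be 3920
>>> hp.iso(s).sp                   # shortest path from a root to this string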
I'll continue to work on this and post the findings here.
Cheers,
François
¹ http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1045
² http://guppy-pe.sourceforge.net/