[tahoe-dev] Debugging memory leaks with guppy
Francois Deppierraz
francois at ctrlaltdel.ch
Fri Oct 22 00:13:07 UTC 2010
Hi,
I have recently been experiencing memory leaks in Tahoe-LAFS during
operations such as 'tahoe deep-check --add-lease --repair' or while
using the FTP/SFTP frontend¹.
While researching ways to debug memory leaks in Python, I learned
about Heapy, a heap analysis toolset that is part of the Guppy²
Programming Environment.
Using this toolset requires access to the Python interpreter under
which the Tahoe-LAFS node is running. Fortunately, the (undocumented)
manhole feature allows one to access the Python interpreter over SSH on
a running node. It is activated by the following configuration
snippet in tahoe.cfg:
[node]
ssh.port = 8020
ssh.authorized_keys_file = ~/.ssh/authorized_keys
Access to the Python interpreter is now possible by SSHing to
localhost on port 8020.
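For the record, connecting from the node's host looks something like
this (the exact invocation is my assumption; the manhole authenticates
against the keys listed in ssh.authorized_keys_file and then drops you
straight into a Python prompt):

$ ssh -p 8020 localhost
>>>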
Then I started the node, did a few operations over SFTP to warm the
system up a bit, loaded Heapy, and recorded the current heap state.
>>> from guppy import hpy
>>> hp = hpy()
>>> hp.setrelheap()
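For what it's worth, setrelheap() makes later hp.heap() calls report
only objects allocated after this point. An alternative, if you would
rather not move the reference point, is to diff two full snapshots;
just a rough sketch of what I mean:

>>> before = hp.heap()
>>> # ... exercise the node over SFTP ...
>>> after = hp.heap()
>>> (after - before).byrcs   # only what survived since the first snapshot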
Then, I deleted about 1000 very small files from a single directory and
had a look at how many objects of each type were created.
>>> hp.heap()
Partition of a set of 70211 objects. Total size = 75854072 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  23768  34 71651592  94  71651592  94 str
     1   2119   3  1693864   2  73345456  97 dict (no owner)
     2  19310  28  1540664   2  74886120  99 tuple
     3  21345  30   512280   1  75398400  99 float
     4    545   1   112392   0  75510792 100 list
     5     75   0    78600   0  75589392 100 dict of twisted.internet.base.DelayedCall
     6   2021   3    48504   0  75637896 100 int
     7    376   1    30080   0  75667976 100 types.MethodType
     8     20   0    20960   0  75688936 100 dict of allmydata.mutable.publish.PublishStatus
     9     20   0    20960   0  75709896 100 dict of allmydata.mutable.retrieve.RetrieveStatus
<77 more rows. Type e.g. '_.more' to view.>
Wow, that's a lot of strings! What size are those?
>>> hp.heap()[0].bysize
Partition of a set of 25137 objects. Total size = 71747672 bytes.
 Index  Count   %     Size   % Cumulative  % Individual Size
     0  18150  72 71148000  99  71148000  99      3920
     1   1203   5    86616   0  71234616  99        72
     2    881   4    70480   0  71305096  99        80
     3    502   2    60240   0  71365336  99       120
     4    882   4    56448   0  71421784 100        64
     5    480   2    53760   0  71475544 100       112
     6    564   2    49632   0  71525176 100        88
     7    437   2    45448   0  71570624 100       104
     8    748   3    41888   0  71612512 100        56
     9    392   2    37632   0  71650144 100        96
Interesting: strings of 3920 bytes account for 99% of the total size.
>>> hp.heap()[0].byrcs[0].bysize[0].sp[0]
'%s.i0_modules[\'twisted.internet.reactor\'].__dict__[\'_reads\'].keys()[0].__dict__[\'protocol\'].__dict__[\'avatar\'].__dict__[\'_root\'].__dict__[\'_node\'].__dict__[\'_cache\'].__dict__[\'cache\'][((80,
\'\\tnI\\xf8&\\xd..."H\\t\\xf6\\xdcN\',
\'\\xb6\\xe7\\xb1...c]Rm\\xad\\x8dW\', 3789, 3788, 3, ...), 0)].??[1]'
>>> hp.heap()[0].byrcs[0].bysize[0].rp
Reference Pattern by <[dict of] class>.
 0: _ --- [-] 18150 <size = 3920>: 0x63ce610, 0x63d18e0, 0x63d8b60, 0x63f4040...
 1: a      [-] 18150 tuple: 0x2726b90*3, 0x2d7ed20*3, 0x2f541e0*3...
 2: aa ---- [-] 10 __builtin__.set: 0x37bf138, 0x6793220, 0x67c4050...
 3: a3      [-] 1 allmydata.util.dictutil.DictOfSets: 0x675a0c0
 4: a4 ------ [-] 1 dict of allmydata.mutable.common.ResponseCache: 0x67b4440
 5: a5        [-] 1 allmydata.mutable.common.ResponseCache: 0x67b4440
 6: a6 -------- [-] 1 dict of allmydata.mutable.filenode.MutableFileNode: 0x6...
 7: a7          [+] 1 allmydata.mutable.filenode.MutableFileNode: 0x67b4b90
It looks like cache entries are not being released correctly. Does
anybody have a better idea of the actual culprit, code-wise? Any
advice is welcome!
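For anyone who wants to poke at this from the manhole as well, here is
roughly how one of the offending strings can be pulled out and traced
back to its owner. This is only a sketch; it relies on Heapy's byid
and theone accessors, nothing Tahoe-specific:

>>> big = hp.heap()[0].bysize[0]   # the 18150 strings of 3920 bytes
>>> s = big.byid[0].theone         # grab one concrete string object
>>> len(s)                         # should be 3920
>>> hp.iso(s).sp                   # shortest path from a root to this string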
I'll continue to work on this and post the findings here.
Cheers,
François
¹ http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1045
² http://guppy-pe.sourceforge.net/