[tahoe-dev] Debugging memory leaks with guppy

Brian Warner warner at lothar.com
Fri Oct 22 07:23:09 UTC 2010


On 10/21/10 5:13 PM, Francois Deppierraz wrote:

> Then, I deleted about 1000 very small files from a single directory
> and had a look at how many objects of each type were created.

Wow, this is awesome.

>  3: a3       [-] 1 allmydata.util.dictutil.DictOfSets: 0x675a0c0
>  4: a4 ------ [-] 1 dict of allmydata.mutable.common.ResponseCache:

> It looks like cache entries are not being correctly released. Does
> somebody have more of a clue about the actual culprit, code-wise? Any
> advice from you guys is welcome!

Hmmm. Can you locate the ResponseCache instance and look at
'self.cache.keys()'? In particular, I'm interested in the number of
distinct 'verinfo' values in there:

 verinfos = set([verinfo for (verinfo, shnum) in cache.keys()])
 print len(verinfos)
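
If it's a pain to dig the instance out by hand, something like this
(just a rough sketch; it walks the gc-tracked objects, so it doesn't
even need guppy) should work from a python prompt inside the running
node:

 import gc
 from allmydata.mutable.common import ResponseCache

 caches = [o for o in gc.get_objects() if isinstance(o, ResponseCache)]
 print len(caches), "live ResponseCache instance(s)"
 for rc in caches:
     verinfos = set([verinfo for (verinfo, shnum) in rc.cache.keys()])
     print " entries:", len(rc.cache), "distinct verinfos:", len(verinfos)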

My suspicion is that the ResponseCache object is living a lot longer
than expected, and it's accumulating cached responses from lots and lots
of generations ("versions") of the mutable file that contains a
directory which is being modified heavily. When you say your test
"deleted about 1000 very small files from a single directory", you're
really making 1000 sequential changes to the same directory, right? So
there will be 1000 mutable file writes to the same file? If my suspicion
is right, there will be 1000 different 'verinfo' values (and N times as
many keys in self.cache, each of which may hold multiple strings in its
set, resulting in a large number of strings left around).
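
Just to make the shape of it concrete, here's a toy model with made-up
numbers (not the real code): if nothing ever evicts, the cache dict
ends up growing roughly like this:

 cache = {}
 for seqnum in range(1000):           # 1000 sequential modifications
     verinfo = ("fake-verinfo-%d" % seqnum,)   # stand-in for the real tuple
     for shnum in range(10):          # say 10 shares per version
         cache.setdefault((verinfo, shnum), set()).add("cached share data")
 print len(cache)   # 10000 keys, and nothing ever deletes them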

When I first wrote ResponseCache back during the original Big Mutable
File Rewrite (in April 2008), I expected that instances would only stick
around for the duration of a mutable-file operation and then be gc'ed,
so I didn't worry about ever removing old versions from its cache. I'm
not sure what's causing the MutableFileNode to stick around like this.
There's a WeakValueDictionary cache in NodeMaker._node_cache, which was
primarily added to make sure we never wound up with two MutableFileNode
instances that pointed to the same file (making it possible to collide
with yourself and get an UncoordinatedWriteError). But maybe that's
keeping the MutableFileNode (which owns the ResponseCache object) around
longer than expected, or maybe there's just a cycle somewhere and gc
isn't happening fast enough to drop the reference.
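
(To be clear about what that WeakValueDictionary can and can't do,
here's a standalone sketch, nothing Tahoe-specific: the weak dict only
holds an object while something else holds a strong reference to it,
but a reference cycle makes the object wait for the cyclic collector
instead of disappearing as soon as the last outside reference goes
away.)

 import gc, weakref

 class Node(object):
     pass

 cache = weakref.WeakValueDictionary()
 n = Node()
 n.self_ref = n          # a cycle: refcounting alone can't free this
 cache["key"] = n
 del n                   # drop the only outside strong reference
 print len(cache)        # 1: the cycle is still keeping the Node alive...
 gc.collect()            # ...until the cyclic collector runs
 print len(cache)        # 0: the weak dict itself never kept it alive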

I'd still expect that object to go away once the webapi operations have
stopped and the node goes idle. Maybe Guppy could tell us who's keeping
the MutableFileNode alive?
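
Short of a full guppy session, plain gc.get_referrers() would at least
name the suspects; a rough sketch, assuming you've already got the
stuck MutableFileNode in hand as 'node':

 import gc
 # 'node' is assumed to be the MutableFileNode you located earlier
 for referrer in gc.get_referrers(node):
     # most referrers are dicts (somebody's __dict__ or a cache); the
     # type plus a short repr is usually enough to tell which, and you
     # can run gc.get_referrers() on the dict to go one level further up
     print type(referrer), repr(referrer)[:80]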

The ResponseCache really never needs to retain cached responses for more
than the current version. We probably shouldn't be doing
version-comparison inside ResponseCache (that's not really the right
place to do it), but maybe we should add a function to remove all cached
entries except for a certain version, and then MutableFileNode could
call that (with what it thinks is the most recent version) every once in
a while. Or, maybe ResponseCache should include (serverid, shnum) as an
index, and a new mapupdate operation should clear out all the old data
with that index. Or, maybe we should get rid of ResponseCache
altogether: I don't remember what performance improvement it tries to
provide.
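
For the "keep only the current version" idea, I'm imagining something
along these lines (entirely hypothetical name, and it glosses over the
DictOfSets details):

 def prune_to_version(cache, current_verinfo):
     # hypothetical helper: drop every cached response that isn't for
     # the version we currently believe to be the newest one
     for (verinfo, shnum) in cache.keys():   # .keys() copies, so deleting is safe
         if verinfo != current_verinfo:
             del cache[(verinfo, shnum)]

MutableFileNode would then call that with whatever verinfo its last
mapupdate reported, every once in a while.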

Awesome work, Francois! Thanks so much for tracking this down!

cheers,
 -Brian

