[tahoe-lafs-trac-stream] [Tahoe-LAFS] #227: our automated memory measurements might be measuring the wrong thing
Tahoe-LAFS
trac at tahoe-lafs.org
Fri Mar 20 20:26:59 UTC 2015
#227: our automated memory measurements might be measuring the wrong thing
------------------------------------+-------------------------------------
Reporter: zooko | Owner: zooko
Type: defect | Status: assigned
Priority: major | Milestone: eventually
Component: dev-infrastructure | Version: 0.7.0
Resolution: | Keywords: memory performance unix
Launchpad Bug: |
------------------------------------+-------------------------------------
New description:
As visible in [http://allmydata.org/tahoe-figleaf-graph/hanford.allmydata.com-tahoe_memstats.html the memory usage graphs],
pycryptopp increased the static memory footprint by about 6 MiB when we
added it in early November (I think it was November 6, although
[wiki:Performance the Performance page] says November 9), and removing
pycrypto on 2007-12-03 seems to have had almost no benefit in reducing
memory footprint.

This reminds me of the weirdness about the 64-bit version using way more
memory than we expected.

Hm. I think maybe we are erring by using "VmSize" (from /proc/*/status) as
our proxy for memory usage. That number is the total size of the virtual
address space requested by the process, if I understand correctly. So for
example, mmap'ing a file adds the file's size to your VmSize, although it
does not (by itself) use any memory.
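
Here's a rough sketch of the effect I mean (untested, and it assumes a
Linux-style /proc): mmap'ing a big region makes VmSize jump even though no
pages are actually resident:

{{{
import mmap

def read_vmsize_kb():
    # VmSize as reported in /proc/self/status, in kB
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1])

before = read_vmsize_kb()
m = mmap.mmap(-1, 512 * 1024 * 1024)  # 512 MiB, anonymous, never touched
after = read_vmsize_kb()
print("VmSize grew by %d kB without touching a byte" % (after - before))
m.close()
}}}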

Linux kernel hackers seem to be in universal agreement that it is a bad
idea to use VmSize for anything:

http://bmaurer.blogspot.com/2006/03/memory-usage-with-smaps.html
http://lwn.net/Articles/230975/

But what's the alternative? We could read "smaps" and see if we can get a
better metric out of that.
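
Something like this sketch (assuming the usual /proc/<pid>/smaps field
names, with per-mapping values in kB) would give us Pss and Private_Dirty
totals to graph instead:

{{{
def smaps_totals(pid="self"):
    # Sum a few interesting fields across all mappings in /proc/<pid>/smaps.
    totals = {"Rss": 0, "Pss": 0, "Private_Dirty": 0}
    with open("/proc/%s/smaps" % pid) as f:
        for line in f:
            fields = line.split()
            key = fields[0].rstrip(":")
            if key in totals:
                totals[key] += int(fields[1])  # kB
    return totals

print(smaps_totals())
}}}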

By the way, if anyone wants to investigate the memory usage more closely,
the valgrind tool named massif has been rewritten, so maybe it will work
this time.
--
Comment (by warner):

I took a quick look at smem today; it seems pretty nice. I think the "USS"
(Unique Set Size) might be a good thing to track: it's the amount of memory
you'd get back by killing the process. For Tahoe, the main thing we care
about is that the client process isn't leaking or over-allocating the
memory used to hold files during the upload/download process, and that
memory isn't going to be shared with any other process. So even if it
doesn't answer the "can I fit this tahoe node/workload on my NN-MB
computer?" question, it *does* answer the question of whether we're meeting
our memory-complexity design goals.
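
For reference, USS is basically the private (unshared) portion of the
process: roughly the sum of Private_Clean + Private_Dirty over
/proc/<pid>/smaps, if I've got smem's definition right. A sketch:

{{{
def uss_kb(pid):
    # Sum the private (unshared) pages of the process, in kB,
    # by walking /proc/<pid>/smaps.
    total = 0
    with open("/proc/%d/smaps" % pid) as f:
        for line in f:
            if line.startswith(("Private_Clean:", "Private_Dirty:")):
                total += int(line.split()[1])
    return total
}}}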

Installing `smem` requires a bunch of other stuff (python-gtk2, python-tk,
matplotlib), since it has a graphical mode that we don't care about, but
that's not a big deal. There's a process-filter option which I can't find
documentation on, which we'd need in order to limit the output to the tahoe
client's own PID. And then the main downside I can think of is that you
have to shell out to a not-small python program for each sample (vs.
reading /proc/self/status, which is basically free), so somebody might be
worried about the performance impact.
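
Per-sample, the shell-out would look roughly like this; I'm assuming smem's
-H (no header), -P (command-name filter), and -c (column list) options
behave the way I think they do, so double-check against whatever version
actually gets installed:

{{{
import subprocess

def sample_uss_via_smem(name_regex="tahoe"):
    # Ask smem for "pid uss" rows (USS in kB) for matching processes.
    out = subprocess.check_output(
        ["smem", "-H", "-P", name_regex, "-c", "pid uss"])
    samples = {}
    for line in out.splitlines():
        if not line.strip():
            continue
        pid, uss = line.split()
        samples[int(pid)] = int(uss)
    return samples
}}}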
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/227#comment:10>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage