#419 closed defect

pycryptopp uses up too much RAM — at Version 12

Reported by: zooko Owned by: nobody
Priority: minor Milestone: undecided
Component: code-encoding Version: 1.0.0
Keywords: memory pycryptopp performance Cc:
Launchpad Bug:

Description (last modified by warner)

Brian pointed out that the graph which shows the time to create a mutable ("SSK") file recently showed a dramatic reduction in time:

http://allmydata.org/tahoe-figleaf-graph/hanford.allmydata.com-tahoe_speedstats_SSK_creation.html

The time to create an SSK is supposed to be dominated by the time to create a new RSA public/private keypair, which is a randomized process involving iteratively creating a big number and testing it for primality.
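To illustrate why keypair creation dominates: the expensive part is a loop that draws random candidates and runs a probabilistic primality test on each. A minimal Python sketch of that pattern (illustrative only; this is not Crypto++'s actual implementation):

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test (sketch)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):      # cheap trial division first
        if n % p == 0:
            return n == p
    # write n-1 as d * 2^r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                # definitely composite
    return True                         # probably prime

def random_prime(bits):
    """The candidate-and-test loop that dominates RSA keygen time."""
    while True:
        # force the top bit (so the prime has the right size) and the
        # low bit (so the candidate is odd)
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate
```

Because the number of candidates tried before hitting a prime is random, keypair-creation time is expected to be both large and noisy, which is what makes a sudden 10x speedup suspicious.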

Brian writes:

so, the munin performance graphs show a considerable speedup in mutable file creation time that occurred last Tuesday around noon. It affected both the colo and DSL tests, so I assume it was a code change rather than a buildslave getting moved or something 

in addition, our 32-bit initial memory footprint went up by 6MB. The only code change that appears relevant was the new reliance upon pycryptopp >= 0.5 

I can somewhat believe that the memory increase is due to the inclusion of the EC-DSA code, but can you think of any reason why RSA key generation might have sped up by nearly a factor of 10?

Change History (12)

comment:1 Changed at 2008-05-14T19:32:50Z by warner

So if I understand what Zooko told me correctly, then pycryptopp was recently changed to include a full copy of the underlying Crypto++ library, and to use it in preference to any version that's installed in /usr/lib/ .

I'm still investigating, but I'm suspicious that this approach is responsible for the increased memory usage, for two reasons. The obvious one is that the .so files can't be shared between multiple processes: if there are multiple Tahoe processes on a single box, or a Tahoe process and something else that uses Crypto++, then they won't be able to share Crypto++ code pages unless that code comes from the same place.

The other reason is that, lacking a stable shared library to reference, the pycryptopp glue .so files must each statically link against the Crypto++ libraries. My not-fully-investigated evidence for this is that cipher/aes.so is 12MB in size, and /usr/bin/size reports that it includes 2.7MB of .text (i.e. code pages). Since AES can be implemented in far less code than this, I suspect that a whole bunch of extra C++ baggage has been copied into that file. The other glue .so files (sha256.so, rsa.so) have similar sizes, so I think that there may be multiple copies of that C++ baggage.

When the system's Crypto++ was used from /usr/lib/libcrypto++.so, these glue .so's would reference that file instead, and any of the C++ overhead would be shared between the different python modules. So that might be a reason for the 6MB increase in memory footprint.

I'm not sure what I think about pycryptopp insisting upon using its own code in preference to the version installed on the system. This isn't a new topic of discussion; we've talked about this one a lot. So perhaps this is just another datapoint in that discussion: using private copies of libraries instead of linking against a system library increases both disk usage *and* memory footprint. Disk usage may not be a very compelling argument these days, but memory size still might be.

comment:2 Changed at 2008-05-14T19:44:44Z by warner

More investigation: aes.so contains symbols for RSA, EC-DSA, SHA-512, and a whole bunch of other stuff.

So I believe that each of our glue .so files contains a complete copy of Crypto++, and that they differ only in the tiny amount of python-to-C++ glue code added on top.

The whole .so (including the full copy of Crypto++) gets mapped into the Tahoe node's memory space for each module that gets imported (aes.so, sha256.so, rsa.so). That's why the vmsize is so large: duplicate copies of the Crypto++ code.

Obviously, most of that code will never get used: there is no way for the aes.so glue code to provide access to, say, the Blowfish cipher. But the linker doesn't know that, so it has to include the whole thing in the .so. I believe that the runtime linker is not able to selectively map pieces of an .so into memory, so it is forced to map the whole thing too, which might be why the RSS size grows as well. It's C++, so there are static constructors and other things that consume memory when the code is loaded, so memory consumption gets pretty complicated.

The answer is probably to just ignore this and regretfully accept that a Tahoe node will consume more memory (or at least appear to consume more memory) than it used to. We'll add more things like this over time, and Tahoe's memory usage will grow and grow. I hope this doesn't happen.

Some other possibilities:

  • pycryptopp could have a single .so file, containing all of the glue code for all algorithms (RSA, DSA, AES, SHA256, etc). Then it could have separate .py modules that provide access to this glue, perhaps as simple as "from _pycryptopp import AES" or something, just a mapping of the names. This would get us a single copy of Crypto++ instead of multiple ones.
  • pycryptopp could link statically against its copy of Crypto++ instead of dynamically. This would probably result in just a minimal subset of Crypto++ being copied into each glue .so file: i.e. only the AES code (and necessary support) would wind up in aes.so, nothing else. This would give end-users the smallest memory footprint: they would not pay the memory penalty for RSA unless they actually needed it.
  • using a system Crypto++ instead of a private copy might help, because then aes.so would reference /usr/lib/libcrypto++.so rather than incorporating a full copy, so aes.so and rsa.so could reference the same thing, and a python process which used both aes.so and rsa.so would only get one copy of Crypto++ instead of two.

comment:3 Changed at 2008-05-14T20:46:48Z by zooko

Thanks for investigating this. I'll try to do something to reduce this size at some point. Your three ideas of how to reduce it are three good ones.

comment:4 Changed at 2008-05-15T12:40:46Z by zooko

  • Status changed from new to assigned
  • Summary changed from mysterious speed-up in creation of SSK files to pycryptopp uses up too much space

comment:5 Changed at 2008-05-15T18:17:28Z by warner

  • Summary changed from pycryptopp uses up too much space to pycryptopp uses up too much RAM

comment:6 Changed at 2008-05-29T22:19:41Z by warner

  • Milestone changed from 1.1.0 to 1.2.0

comment:7 Changed at 2009-05-04T17:47:46Z by zooko

If you love this ticket (#419), then you might like tickets #54 (port memory usage tests to windows), #227 (our automated memory measurements might be measuring the wrong thing), #478 (add memory-usage to stats-provider numbers), and #97 (reducing memory footprint in share reception).

comment:8 Changed at 2009-06-15T20:02:35Z by zooko

  • Milestone changed from 1.5.0 to undecided
  • Owner changed from zooko to nobody
  • Priority changed from major to minor
  • Status changed from assigned to new
  • Type changed from task to defect

http://allmydata.org/trac/pycryptopp/ticket/9 (link against existing (system) libcrypto++.so) has been fixed in pycryptopp. I think that this implements the first and the third of Brian's suggestions above, if you pass --disable-embedded-cryptopp. If you don't, then pycryptopp has always implemented Brian's second suggestion -- to link statically against its own copy of libcryptopp. Once the buildslaves that run the memory measurements (on The Dev Page) are upgraded to pycryptopp >= 0.5.13, and if whoever builds the pycryptopp package uses --disable-embedded-cryptopp, we'll see whether this changes those measurements.
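A hedged sketch of exercising that flag at build time (the flag name comes from this ticket; the exact setup.py invocation may differ between pycryptopp versions):

```shell
# Build pycryptopp against the system Crypto++ rather than the
# embedded copy, so the glue .so files reference the shared
# /usr/lib/libcrypto++.so instead of each carrying a private copy.
python setup.py build --disable-embedded-cryptopp
python setup.py install
```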

Remember, though, that I don't think the number produced by those measurements correlates with any behavior that anyone cares about -- see #227 (our automated memory measurements might be measuring the wrong thing). If we measured resident set size (and, even better, turned off swap to prevent the resident set size measurement from dipping down randomly), then the number would correlate with something we care about: how many Tahoe nodes you can keep resident in RAM, doing this sort of task, simultaneously.
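Resident set size can be read directly from the process itself; a minimal sketch, assuming a Unix host (note the unit caveat in the comment):

```python
import resource

def peak_rss():
    """Peak resident set size of this process.

    Caveat: ru_maxrss is reported in KiB on Linux but in bytes on
    macOS, so interpret the number per-platform.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(peak_rss())
```

A memory buildslave could log this figure alongside (or instead of) vmsize to get a number closer to "how many nodes fit in RAM".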

I'm moving this from Milestone 1.5 to Milestone undecided, but I don't know if we should instead close it as "fixed" or "invalid" or "wont-fix".

comment:9 Changed at 2009-06-17T20:04:00Z by zooko

Oh, and I forgot that Matt Mackall has invented "smem" which provides measurements of memory usage that are actually useful: http://lwn.net/Articles/329458/ .

comment:10 Changed at 2009-12-04T05:13:08Z by davidsarah

  • Keywords memory pycryptopp performance added

comment:11 Changed at 2017-06-05T09:37:37Z by Brian Warner <warner@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 42d8a79/trunk:

Merge PR419: update docs: OpenSolaris -> Illumos

closes #419

comment:12 Changed at 2017-06-05T09:39:52Z by warner

  • Description modified (diff)
  • Resolution fixed deleted
  • Status changed from closed to reopened

Oops, I forgot to use the don't-close-Trac style of github commit message. This ticket might not be worth keeping around, but it shouldn't have been closed like that. Sorry!
