#85 closed enhancement (fixed)

disable asm

Reported by: zooko Owned by: daira
Priority: major Milestone: 0.7.0
Version: 0.5.29 Keywords: design-review-needed
Cc: Launchpad Bug:

Description

When building the bundled copy of Crypto++, we should just disable ASM always. This will result in a performance penalty on platforms that formerly were able to use the asm implementation, but it would also avoid occasional weird bugs (most recently #84) and would make it easier for us (especially me, who doesn't really know ASM) to understand what it is doing.

Change History (30)

comment:1 Changed at 2013-04-09T22:00:52Z by sneves

Unless Crypto++ speed is a bottleneck of pycryptopp's applications, I see nothing wrong with this.

The main loss here seems to be AES-NI support, which increases AES speed by quite a lot.

comment:2 Changed at 2013-04-10T02:22:17Z by daira

Where is the code that selects AES-NI? I don't see it in any obvious place under https://tahoe-lafs.org/trac/pycryptopp/browser/git/src-cryptopp (rijndael.cpp/h, aes.h, cpu.cpp).

comment:3 Changed at 2013-04-10T02:27:19Z by sneves

The bits of rijndael.cpp that use AES-NI are guarded by CRYPTOPP_BOOL_AESNI_INTRINSICS_AVAILABLE. This macro itself is defined in config.h, line 293.

comment:4 Changed at 2013-04-11T06:04:16Z by zooko

See also #76.

comment:5 Changed at 2013-04-11T07:27:44Z by zooko

I pushed a patch that defined CRYPTOPP_DISABLE_ASM to the debug-netbsd-rsa-2 branch:

https://github.com/zooko/pycryptopp/commit/2a38bfd4b4ab714d5f624a4e505993b6aec4418b

And triggered a build of it on all builders:

https://tahoe-lafs.org/buildbot-pycryptopp/waterfall?last_time=1365665120

As well as telling us if this changes any behavior of unit tests, this also will show the difference in benchmarks between the previous version and the CRYPTOPP_DISABLE_ASM version. (To see this you have to manually click and visually compare the previous and current benchmarks on each builder.)

comment:6 Changed at 2013-04-12T08:28:40Z by zooko

Hm... So with CRYPTOPP_DISABLE_ASM, the quick self-test of SHA256 segfaults at startup, on the "atlas" ubuntu box and on midnightmagic's netbsd box:

https://tahoe-lafs.org/buildbot-pycryptopp/builders/atlas1%20natty%20syslib/builds/133/steps/version/logs/valgrind

But only when they are linking against the system-provided libcryptopp instead of the embedded cryptopp. I guess it is really necessary that the CRYPTOPP_DISABLE_ASM macro be set the same when we build pycryptopp as when the libcryptopp.so that we're linking against was built. It is a safe bet (I guess) that it is never set for any system-provided libcryptopp.so, so I'll just make it so that our setup.py sets CRYPTOPP_DISABLE_ASM only when it is building the embedded Crypto++.

comment:9 Changed at 2013-04-12T09:03:45Z by zooko

  • Keywords review-needed added

comment:10 Changed at 2013-04-12T16:47:25Z by zooko

I posted an update on the Crypto++ mailing list: ​https://groups.google.com/forum/?fromgroups=#!topic/cryptopp-users/qGIdqp3MIgg

#85 appears to be on the way to fixing this bug, but it is somewhat dissatisfying since I don't know _exactly_ why #85 fixes it.

comment:11 Changed at 2013-04-12T19:20:58Z by daira

  • Owner set to daira
  • Status changed from new to assigned

I will review the branch.

comment:12 Changed at 2013-04-12T19:31:14Z by daira

  • Keywords review-needed removed
  • Owner changed from daira to zooko
  • Status changed from assigned to new

comment:14 Changed at 2013-04-13T03:35:31Z by zooko

  • Keywords design-review-needed added
  • Owner changed from zooko to warner

comment:15 Changed at 2013-04-14T22:56:18Z by sneves

comment:17 Changed at 2013-04-19T22:02:27Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

comment:18 Changed at 2013-07-17T16:29:02Z by zooko

  • Resolution fixed deleted
  • Status changed from closed to reopened

This hasn't been applied to trunk yet.

comment:19 Changed at 2013-07-17T16:29:12Z by zooko

  • Status changed from reopened to new

comment:20 Changed at 2014-07-22T19:22:15Z by zooko

We had already agreed to disable assembly optimizations in pycryptopp, because there seem to have been a lot of bugs in the optimized assembly code in the past, and because the added speed really makes no difference to our uses, as far as I know.

However, in order to explain and justify to other people (e.g. Debian packagers) why we are doing this, and why they should consider doing the same thing themselves, I just read through the entire history of issues in pycryptopp and classified whether they were runtime errors (and therefore potential security bugs) or build-time errors (therefore probably not), and whether they would have been avoided if we had been disabling assembly optimizations all along. Here are the results. They clearly show that we should disable the optimized assembly! About half of all the security-threatening bugs we've had would never have been an issue if we'd avoided assembly from the beginning.

By the way, in my opinion the author of Crypto++, Wei Dai, is an exceptionally skilled, careful, and experienced coder, and I would assume that if Crypto++ has had this many security-threatening bugs in its optimized assembly code, then other crypto libraries that also use optimized assembly code have also had at least as many.

bugs that cause run-time failures

(These bugs are potential security issues.)

  • would have been avoided by DISABLE_ASM:
  • unclear if it would have been avoided if we'd used DISABLE_ASM:
  • would not have been avoided by DISABLE_ASM:
  • would not have been avoided by DISABLE_ASM (but would have been avoided by using cffi instead of CPython API)
  • would have been avoided if we *didn't* use DISABLE_ASM! (A bug only in the non-ASM version!)

bugs that cause deterministic build or compilation failures

(These bugs are *typically* not potential security issues but they can be, and in any case they are engineering/deployment issues.)

  • would have been avoided by DISABLE_ASM:
Last edited at 2014-07-22T19:22:46Z by zooko (previous) (diff)

comment:21 follow-up: Changed at 2014-10-02T01:12:47Z by daira

What is the status of this ticket and https://github.com/tahoe-lafs/pycryptopp/pull/23 ?

comment:22 Changed at 2014-10-02T01:17:47Z by daira

  • Owner changed from warner to zooko

comment:23 in reply to: ↑ 21 Changed at 2014-10-03T15:35:09Z by zooko

Replying to daira:

What is the status of this ticket and https://github.com/tahoe-lafs/pycryptopp/pull/23 ?

I'm confused. I apparently thought that the patch had been merged into pycryptopp trunk. Obviously, it hasn't.

comment:24 Changed at 2014-10-17T10:12:57Z by daira

Let's fix this before the Tahoe-LAFS 1.11 1.10.1 release. It's needed for Tahoe-LAFS to work on FreeBSD (https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2269), and it's a clear auditability improvement.

Last edited at 2015-04-14T07:04:52Z by daira (previous) (diff)

comment:25 Changed at 2014-12-01T16:31:11Z by zooko

  • Owner changed from zooko to daira

I agree we should fix this right away. What's the next step?

comment:26 Changed at 2014-12-01T22:53:24Z by daira

I suspect it is to separate out the changes that are relevant to this ticket from https://github.com/tahoe-lafs/pycryptopp/pull/23 , and file another pull request.

comment:28 Changed at 2016-01-03T22:27:40Z by zooko

We applied the first two patches from https://github.com/tahoe-lafs/pycryptopp/pull/26 (https://github.com/twopir/pycryptopp/commit/b4cb7086cba8ebfdc90fe63b1da879a7ce84c03f and https://github.com/twopir/pycryptopp/commit/cd7a209d10992b87bbbfd70846ee561b6bb218b9), asked Jeromy "whyrusleeping" Johnson to manually verify that this made the build go from failing to working on his Mac OS X 10.10.5 (see https://github.com/tahoe-lafs/pycryptopp/pull/27), and then ran benchmarks before and after the patch on my Google Chromebook Pixel 1 (Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz). The benchmarks (shown below) showed as expected: no change for ECDSA, Ed25519, RSA, or hashlib, and small change in performance for AES, XSalsa20, and SHA256.

benchmarks of current trunk with asm enabled:

zooko@spark ~/playground/pycryptopp $ time python setup.py bench
running bench
<class 'pycryptopp.bench.bench_sigs.ECDSA256'>
generate key
best: 4.350e-02,   1th-best: 4.350e-02, mean: 4.350e-02,   1th-worst: 4.350e-02, worst: 4.350e-02 (of      1)
sign
best: 9.089e-01,   1th-best: 9.089e-01, mean: 9.089e-01,   1th-worst: 9.089e-01, worst: 9.089e-01 (of      1)
verify
best: 3.479e+00,   1th-best: 3.479e+00, mean: 3.479e+00,   1th-worst: 3.479e+00, worst: 3.479e+00 (of      1)

<class 'pycryptopp.bench.bench_sigs.Ed25519'>
generate key
best: 1.825e+00,   1th-best: 1.825e+00, mean: 1.825e+00,   1th-worst: 1.825e+00, worst: 1.825e+00 (of      1)
sign
best: 1.830e+00,   1th-best: 1.830e+00, mean: 1.830e+00,   1th-worst: 1.830e+00, worst: 1.830e+00 (of      1)
verify
best: 6.249e+00,   1th-best: 6.249e+00, mean: 6.249e+00,   1th-worst: 6.249e+00, worst: 6.249e+00 (of      1)

<class 'pycryptopp.bench.bench_sigs.RSA2048'>
generate key
best: 9.093e+01,   1th-best: 9.093e+01, mean: 9.093e+01,   1th-worst: 9.093e+01, worst: 9.093e+01 (of      1)
sign
best: 1.963e+00,   1th-best: 1.963e+00, mean: 1.963e+00,   1th-worst: 1.963e+00, worst: 1.963e+00 (of      1)
verify
best: 4.617e-02,   1th-best: 4.617e-02, mean: 4.617e-02,   1th-worst: 4.617e-02, worst: 4.617e-02 (of      1)

<class 'pycryptopp.bench.bench_sigs.RSA3248'>
generate key
best: 5.065e+02,   1th-best: 5.065e+02, mean: 5.065e+02,   1th-worst: 5.065e+02, worst: 5.065e+02 (of      1)
sign
best: 1.006e+01,   1th-best: 1.006e+01, mean: 1.006e+01,   1th-worst: 1.006e+01, worst: 1.006e+01 (of      1)
verify
best: 1.068e-01,   1th-best: 1.068e-01, mean: 1.068e-01,   1th-worst: 1.068e-01, worst: 1.068e-01 (of      1)

milliseconds per operation

<AES-128>
large (10000000 B)
best: 3.982e+00,   1th-best: 3.982e+00, mean: 3.982e+00,   1th-worst: 3.982e+00, worst: 3.982e+00 (of      1)

<AES-256>
large (10000000 B)
best: 5.422e+00,   1th-best: 5.422e+00, mean: 5.422e+00,   1th-worst: 5.422e+00, worst: 5.422e+00 (of      1)

<XSalsa20-256>
large (10000000 B)
best: 1.387e+00,   1th-best: 1.387e+00, mean: 1.387e+00,   1th-worst: 1.387e+00, worst: 1.387e+00 (of      1)

nanoseconds per byte crypted

<class 'pycryptopp.bench.bench_hashes.SHA256'>
large (10000000 B)
best: 5.136e+00,   1th-best: 5.136e+00, mean: 5.136e+00,   1th-worst: 5.136e+00, worst: 5.136e+00 (of      1)

<class 'pycryptopp.bench.bench_hashes.hashlibSHA256'>
large (10000000 B)
best: 4.798e+00,   1th-best: 4.798e+00, mean: 4.798e+00,   1th-worst: 4.798e+00, worst: 4.798e+00 (of      1)

nanoseconds per byte hashed

benchmarks with these two patches that disable asm:

running bench
<class 'pycryptopp.bench.bench_sigs.ECDSA256'>
generate key
best: 4.385e-02,   1th-best: 4.385e-02, mean: 4.385e-02,   1th-worst: 4.385e-02, worst: 4.385e-02 (of      1)
sign
best: 9.948e-01,   1th-best: 9.948e-01, mean: 9.948e-01,   1th-worst: 9.948e-01, worst: 9.948e-01 (of      1)
verify
best: 3.936e+00,   1th-best: 3.936e+00, mean: 3.936e+00,   1th-worst: 3.936e+00, worst: 3.936e+00 (of      1)

<class 'pycryptopp.bench.bench_sigs.Ed25519'>
generate key
best: 1.818e+00,   1th-best: 1.818e+00, mean: 1.818e+00,   1th-worst: 1.818e+00, worst: 1.818e+00 (of      1)
sign
best: 1.823e+00,   1th-best: 1.823e+00, mean: 1.823e+00,   1th-worst: 1.823e+00, worst: 1.823e+00 (of      1)
verify
best: 6.178e+00,   1th-best: 6.178e+00, mean: 6.178e+00,   1th-worst: 6.178e+00, worst: 6.178e+00 (of      1)

<class 'pycryptopp.bench.bench_sigs.RSA2048'>
generate key
best: 6.325e+01,   1th-best: 6.325e+01, mean: 6.325e+01,   1th-worst: 6.325e+01, worst: 6.325e+01 (of      1)
sign
best: 2.136e+00,   1th-best: 2.136e+00, mean: 2.136e+00,   1th-worst: 2.136e+00, worst: 2.136e+00 (of      1)
verify
best: 4.965e-02,   1th-best: 4.965e-02, mean: 4.965e-02,   1th-worst: 4.965e-02, worst: 4.965e-02 (of      1)

<class 'pycryptopp.bench.bench_sigs.RSA3248'>
generate key
best: 7.379e+02,   1th-best: 7.379e+02, mean: 7.379e+02,   1th-worst: 7.379e+02, worst: 7.379e+02 (of      1)
sign
best: 1.139e+01,   1th-best: 1.139e+01, mean: 1.139e+01,   1th-worst: 1.139e+01, worst: 1.139e+01 (of      1)
verify
best: 1.213e-01,   1th-best: 1.213e-01, mean: 1.213e-01,   1th-worst: 1.213e-01, worst: 1.213e-01 (of      1)

milliseconds per operation

<AES-128>
large (10000000 B)
best: 7.006e+00,   1th-best: 7.006e+00, mean: 7.006e+00,   1th-worst: 7.006e+00, worst: 7.006e+00 (of      1)

<AES-256>
large (10000000 B)
best: 9.376e+00,   1th-best: 9.376e+00, mean: 9.376e+00,   1th-worst: 9.376e+00, worst: 9.376e+00 (of      1)

<XSalsa20-256>
large (10000000 B)
best: 5.388e+00,   1th-best: 5.388e+00, mean: 5.388e+00,   1th-worst: 5.388e+00, worst: 5.388e+00 (of      1)

nanoseconds per byte crypted

<class 'pycryptopp.bench.bench_hashes.SHA256'>
large (10000000 B)
best: 6.213e+00,   1th-best: 6.213e+00, mean: 6.213e+00,   1th-worst: 6.213e+00, worst: 6.213e+00 (of      1)

<class 'pycryptopp.bench.bench_hashes.hashlibSHA256'>
large (10000000 B)
best: 4.839e+00,   1th-best: 4.839e+00, mean: 4.839e+00,   1th-worst: 4.839e+00, worst: 4.839e+00 (of      1)

nanoseconds per byte hashed

comment:29 Changed at 2016-01-03T22:53:05Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

comment:30 Changed at 2016-01-14T17:52:11Z by daira

  • Milestone set to 0.7.0
Note: See TracTickets for help on using tickets.