Understanding Tahoe's position on ASM and Intrinsics
Jeffrey Walton
noloader at gmail.com
Tue Nov 17 20:31:33 UTC 2015
Hi Everyone,
I work with the Crypto++ project. It was written by Wei Dai, and you
guys (and gals) know what it is...
I was perusing some old bug reports (like
https://tahoe-lafs.org/trac/pycryptopp/ticket/85), and came across
Comment 15. Comment 15 discusses PYCRYPTOPP_DISABLE_EMBEDDED_CRYPTOPP.
What are the expectations when PYCRYPTOPP_DISABLE_EMBEDDED_CRYPTOPP is
in effect? I think I understand it means "no hand-written assembly
routines". That includes inline ASM, and out-of-line S files. But how
about intrinsics?
**********
Here's the background information... I tend to view intrinsics as
distinct from assembly language routines. I even view inline and
out-of-line ASM as distinct. I found inline ASM integration is spotty at
times, and it's not optimized well at times. I also found out-of-line
ASM in an S file (with a clear line of demarcation) was optimized
remarkably well under GCC 5. The out-of-line ASM was a leaf function,
and it did not spill over into the red zone.
As an example, we are getting ready to cut in RDRAND and RDSEED
support. The high-level design logic is:
(1) If the compiler supports RDRAND and RDSEED intrinsics, then use them
    - C-like functions
    - optimized well
(2) If the compiler *does not* support RDRAND and RDSEED intrinsics, then
    use ASM routines
    - the ASM routines emit the byte codes for the instructions
    - even down-level compilers and assemblers support it. I
      recently tested Fedora 1 with GCC 3.2
(3) Item (2) honors CRYPTOPP_DISABLE_ASM
    - we disable RDRAND and RDSEED ASM in response to CRYPTOPP_DISABLE_ASM
    - intrinsics are still available, however
(4) If neither (1) nor (2) is available, then use a lazy throw strategy.
    - throw NotImplemented() if GenerateBlock() is called
    - otherwise, let folks create the objects without complaining
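The four items above could be sketched roughly as follows. This is only an illustration of the decision logic, not Crypto++'s actual code: Backend, SelectBackend and HardwareRng are hypothetical names, NotImplemented here is a local stand-in for the library's exception, and the selection is written as a plain function so each case can be exercised, whereas a real build would presumably decide with preprocessor checks at compile time.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical names throughout; none of this is Crypto++'s actual API.
enum class Backend { Intrinsics, Asm, None };

// Local stand-in for the library's NotImplemented exception.
class NotImplemented : public std::runtime_error {
public:
    explicit NotImplemented(const char* what) : std::runtime_error(what) {}
};

// Items (1)-(4) as a pure function. A real build would make this choice
// at compile time; the three flags are assumptions for the sketch.
inline Backend SelectBackend(bool haveIntrinsics, bool haveAsm, bool disableAsm) {
    if (haveIntrinsics) return Backend::Intrinsics;   // (1): not affected by (3)
    if (haveAsm && !disableAsm) return Backend::Asm;  // (2): gated by (3)
    return Backend::None;                             // (4): lazy throw later
}

class HardwareRng {
public:
    // Construction never throws, even when no backend is usable.
    explicit HardwareRng(Backend backend) : m_backend(backend) {}

    void GenerateBlock(std::uint8_t* output, std::size_t size) {
        if (m_backend == Backend::None)
            throw NotImplemented("RDRAND/RDSEED: no backend available");
        // Placeholder only, NOT random. A real backend would call the
        // compiler intrinsic, or an ASM routine that emits the opcode
        // bytes (for example, 0x0F 0xC7 0xF0 encodes "rdrand eax").
        std::memset(output, 0, size);
    }

private:
    Backend m_backend;
};
```

With this shape, setting the disable-ASM flag only removes the ASM fallback. A compiler with intrinsics still selects Backend::Intrinsics, which is exactly the behavior the question about expectations is asking about.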
I want to ensure it's meeting expectations like
PYCRYPTOPP_DISABLE_EMBEDDED_CRYPTOPP.
We are also happy to detect PYCRYPTOPP_DISABLE_EMBEDDED_CRYPTOPP, and
set flags internally to ensure Tahoe's expectations are being met. I
think it's a little easier for us to do it because Crypto++'s
distinction between ASM and Intrinsics is kind of blurry at times.
Unraveling the distinction between SSE, AES and AES-NI is a bear, and
it was painful (for me) when porting to X32 platforms (e.g.,
https://wiki.debian.org/X32Port).
**********
My apologies if I am speaking out of turn.
Jeff
More information about the tahoe-dev mailing list