[tahoe-dev] [pycryptopp] #2: choose+implement EC-DSA KDF: deterministic generation of private key from small seed

Sun Feb 28 15:31:41 PST 2010

#2: choose+implement EC-DSA KDF: deterministic generation of private key from
small seed
------------------------+---------------------------------------------------
Reporter:  zooko        |           Owner:  zooko
    Type:  enhancement  |          Status:  new  
Priority:  major        |         Version:  0.4.0
Keywords:               |   Launchpad_bug:       
------------------------+---------------------------------------------------

Comment(by warner):

 We chatted on IRC the other week about what the next step should be.
 Deciding
 upon and implementing the KDF seems like the best one. Testing the KDF is
 a
 challenge. Here's my current favorite approach.

  * the C function (in {{{ecdsamodule.cpp}}}) to construct an ECDSA signing
    key should take a longint argument named {{{exponent=}}}, and raise a
    {{{ValueError}}} (or maybe {{{TypeError}}}) if its numeric value is
    outside the acceptable range (I think this is 1..q-2, where "q" is the
    size of the prime subgroup, but I may be wrong).
  * the C module should also export a constant, next to the constructor
    function, which allows Python code to learn what "q" is.
  * pycryptopp must also expose Poly1305 and/or XSalsa20, or whatever hash
    function we plan to use for the KDF.
  * the Python module should provide an {{{ECDSASigningKey}}} class, whose
    constructor takes an argument named seed=, which accepts a variable
 length
    string.
   * internally, this constructor should call a python function named
     {{{KDF}}}, which takes the seed and "q", and invokes whatever hash
     functions, expansions, modulus computations, and try-try-again loops
 as
     necessary until it comes up with an exponent in the correct range.
 Then
     it should call the C-based constructor.
   * we might like to say that omitting {{{seed=}}} causes the python code
 to
     use {{{os.urandom(ceil(log2(q))))}}} as a seed.

 This moves the KDF logic out of C and into Python, where we can test it
 more
 easily.

 If we get concerned about speed, we could implement the KDF logic in C as
 well, but retain {{{exponent=}}} as an option (callers would either pass
 {{{exponent=}}} or {{{seed=}}} but not both), and then have tests which
 assert equality of keys generated with the C {{{seed=}}} argument against
 keys generated with {{{exponent=python.KDF(seed)}}}, to confirm that the
 C-based KDF code behaves the same way as the sample Python implementation.

 As to choice of KDF (specifically the "turn N random bits into a suitable
 random secret exponent" sort of derivation function), I think the four
 basic
 variants we discussed (ignoring choice of hash function) were:

  * 1 (try-try-again): guess=HASH(seed+counter), truncate to ceil(log2(q))
    bits, test against limit, if out of bounds then increment counter and
 try
    again
  * 2 (bigmodulo): expand the seed to perhaps 2*log2(q) bits, reduce it
 modulo
    the limit, done. (actually it'd be more like e=1+(largesecret%(q-3)) or
    something)
  * 3 (truncate): expand the seed to floor(log2(q)) bits, done.
  * 4 (bigmodulo-and-try-try-again): expand the seed to 2*log2(q) bits, if
    that is larger than q*int(guess/q) (i.e. in the "tail"), try again,
 else
    reduce modulo q and return

 The try-try-again choices (1 and 4) have the unfortunate property that,
 depending upon how close the group size is to a power of two, they might
 be
 very hard to test (if q=2**128-9 then the chances of triggering the loop
 are
 negligible, so writing a test to exercise that code would be hard). The
 opposite extreme is occasional cases of looping a large number of times
 (if
 q=2**128+9 then nearly half of the 129-bit keys would be out-of range,
 causing a loop, and although the average number of loops should be two,
 the
 exponential distribution of loop counts means that sometimes you'll get 10
 or
 100 or 1000 loops).

 The other choices (2 and 3) have the unfortunate property that there will
 be
 some amount of bias in the resulting key, although at least David-Sarah
 believes that the 2-bigmodulo variant can reduce the bias down to a
 negligible size (less than a bit). 2-bigmodulo needs an efficient bigint
 modulo operation, which is probably not a big deal in the C code (and
 certainly not a problem up at the Python level). 3-truncate has up to one
 bit
 of bias, reducing a 128-bit key to more like a 127-bit key, but is
 super-simple and super-fast.

-- 
Ticket URL: <http://allmydata.org/trac/pycryptopp/ticket/2#comment:13>
pycryptopp <http://allmydata.org/trac/pycryptopp>
Python bindings for the Crypto++ library