[tahoe-dev] #217: DSA-based mutable files -- small URLs, fast file creation

Wed Feb 4 16:29:33 PST 2009

On Wednesday 04 February 2009 02:21:29 pm Shawn Willden wrote:
> If X and Y are uniform random variables (a reasonable assumption since
> secure hash functions are designed to simulate uniform random variables),
> then the PMF of XY mod q has a kind of a fractal-ish sawtooth wave shape. 
> I haven't found a way to characterize it mathematically

Now I have :)

If n is the number of bits in q (also the hash length, etc.), then the 
probability that Z=XY takes a particular value z is:

	(log(gcd(z,q))+1) / 2**(n+1)		(1)

where log is base 2 and gcd is the greatest common divisor function.  Oh, one 
caveat -- if z=0, the above needs another 1/2**(n+1) added to it.

The useful thing to note there is that gcd(z, q) is always 1 when z is odd, 
because q=2^n.  That means for odd values of z, the above collapses to:

	1/2**n+1

Which means the odd-valued z's have a uniform distribution.  So, if you have a 
96-bit key space but throw out all the even values, then you have a 95-bit 
key space that is uniformly distributed.  Thus, ensuring that xy is odd -- 
maybe by just trying different values of x until you get one that gives an 
odd result for xH(g**x mod q) mod q -- would give you a 95-bit keyspace for 
the intermediate key.  I haven't fiddled with it, but I suspect you'd have to 
throw out about half of the values of x, so your secret key would also have a 
95-bit space.

Another thing I can see here is that the worst case scenario is when 
z=2**(n-1).  In that case gcd(z,q) is 2**(n-1), so the probability of getting 
that value is:

	n / 2**(n+1)

As n gets large, that's a pretty small probability, and that's the single most 
likely value of z.  This puts an upper bound on the loss of security of log n 
bits.

So the effective key space of your scheme is between 2**(n-log n) and 
2**(n-1).  For n=96, that's between 2**89 and 2**95.  Some more work could 
obviously increase the bottom of that range, but there's not much point.

My conclusion:  As long as DSA has a margin of security of a few bits and if H 
truly simulates a uniform random variable then the scheme is secure.

Oh -- should mention, though, that I have no idea how to prove equation (1).  
I computed examples for n=10 and noticed patterns in the probabilities.  I 
constructed the denominator of (1) by noticing that all of the probabilities 
were multiples of that value and the numerator by looking at what the 
multiples were.  After I had constructed (1), I verified that it holds for 
all non-zero values of z, for all 2 <= n <= 14.  Above n=14 the calculations 
just took too long.

The equation is too neat and makes too much intuitive sense for it to be an 
accident, and the fact that it holds for 32764 values checked is a pretty 
strong indicator that it holds for all values.  I'm sure there's a way to 
prove it, I just haven't really thought about it.  I think figuring out why 
z=0 needs the additional 1/2**(n+1) would probably give some insight.

I, however, have to take my scout troop snowboarding, so I'll leave it as an 
exercise for the reader :)

	Shawn.