# [tahoe-dev] #217: DSA-based mutable files -- small URLs, fast file creation

Shawn Willden shawn-tahoe at willden.org
Wed Feb 4 16:29:33 PST 2009

```On Wednesday 04 February 2009 02:21:29 pm Shawn Willden wrote:
> If X and Y are uniform random variables (a reasonable assumption since
> secure hash functions are designed to simulate uniform random variables),
> then the PMF of XY mod q has a kind of a fractal-ish sawtooth wave shape.
> I haven't found a way to characterize it mathematically

Now I have :)

If n is the number of bits in q (also the hash length, etc.), then the
probability that Z=XY takes a particular value z is:

(log(gcd(z,q))+1) / 2**(n+1)		(1)

where log is base 2 and gcd is the greatest common divisor function.  Oh, one
caveat -- if z=0, the above needs another 1/2**(n+1) added to it.

The useful thing to note there is that gcd(z, q) is always 1 when z is odd,
because q=2^n.  That means for odd values of z, the above collapses to:

1/2**n+1

Which means the odd-valued z's have a uniform distribution.  So, if you have a
96-bit key space but throw out all the even values, then you have a 95-bit
key space that is uniformly distributed.  Thus, ensuring that xy is odd --
maybe by just trying different values of x until you get one that gives an
odd result for xH(g**x mod q) mod q -- would give you a 95-bit keyspace for
the intermediate key.  I haven't fiddled with it, but I suspect you'd have to
throw out about half of the values of x, so your secret key would also have a
95-bit space.

Another thing I can see here is that the worst case scenario is when
z=2**(n-1).  In that case gcd(z,q) is 2**(n-1), so the probability of getting
that value is:

n / 2**(n+1)

As n gets large, that's a pretty small probability, and that's the single most
likely value of z.  This puts an upper bound on the loss of security of log n
bits.

So the effective key space of your scheme is between 2**(n-log n) and
2**(n-1).  For n=96, that's between 2**89 and 2**95.  Some more work could
obviously increase the bottom of that range, but there's not much point.

My conclusion:  As long as DSA has a margin of security of a few bits and if H
truly simulates a uniform random variable then the scheme is secure.

Oh -- should mention, though, that I have no idea how to prove equation (1).
I computed examples for n=10 and noticed patterns in the probabilities.  I
constructed the denominator of (1) by noticing that all of the probabilities
were multiples of that value and the numerator by looking at what the
multiples were.  After I had constructed (1), I verified that it holds for
all non-zero values of z, for all 2 <= n <= 14.  Above n=14 the calculations
just took too long.

The equation is too neat and makes too much intuitive sense for it to be an
accident, and the fact that it holds for 32764 values checked is a pretty
strong indicator that it holds for all values.  I'm sure there's a way to
prove it, I just haven't really thought about it.  I think figuring out why
z=0 needs the additional 1/2**(n+1) would probably give some insight.

I, however, have to take my scout troop snowboarding, so I'll leave it as an