[tahoe-dev] note about hash-based digital signatures

Brian Warner warner at lothar.com
Wed Jun 23 12:01:11 PDT 2010

On 6/23/10 7:18 AM, David-Sarah Hopwood wrote:
> David-Sarah Hopwood wrote:
> Ah, but it will work for a multi-layer Merkle tree scheme, such as
> GMSS: if keys are generated deterministically from a seed, then the
> signatures certifying keys at upper layers are also deterministic, so
> there's no key-reuse problem for those.

Right, exactly. The basic scheme would look something like this:

 * set K=1024
 * build a K-ary tree with 2^M=2^256 leaves. Each leaf corresponds to a
   possible message.
 * define a deterministic keyed function from leaf number to keypair
 * define a deterministic keyed function from node number to keypair
 * publish the root pubkey as the readcap. Retain the secret key (used
   as an input to the above deterministic functions) as the writecap.
 * to sign a given message:
   * identify the DEPTH=ln(base=K, 2^256) =26ish tree nodes along the
     path from root to leaf
   * generate the one privkey for the message leaf, use it to sign the
     message itself
   * for all nodes on the path, from bottom to top:
    * generate all K pubkeys for the node's children, concatenate them
      into a string, treat that as a message
    * sign the message with the parent node's privkey

As zooko pointed out, the leaf signature can be optimized away: since
each message gets a unique pubkey, it suffices to reveal the
preimage/privkey for that pubkey. This reduces the size of the leaf
signature down to a single hash.

Assuming a 256-bit Merkle-signature scheme (with a "0" and a "1"
preimage for each bit), it takes 512 hashes to build all the
privkey/preimages, and then 512 hashes to build the pubkeys. Each
signature requires computing and publishing (revealing) 256 preimage

Generating the readcap is pretty easy: you build the K pubkeys for the
top node (4*256 small hashes each), hash each one down (K large hashes
of 512*32 bytes each), then hash those lists into a single value (one
large hash of K*32 bytes). So 1M small hashes, 1024 hashes of 16KiB
each, and one hash of 32KiB bytes.

For K=1024 and M=256, the message signature consists of the leaf
signature (one hash), and 26 nodes. Each node contains one signature
(256 preimages), one pubkey (512 hashes), and 1023 hashes-of-pubkeys. So
the overall message signature requires you publish about 46567 hashes,
or 1.5MB.

To compute that signature, you have to actually generate all those
pubkeys (even though you only have to publish the hashes of most of
them). You must generate DEPTH*K (about 26624) pubkeys. Each one
requires generating 512 preimages (one hash each, H(secret+nodenum+bit))
and then hashing each one to get the post-image, which is 1024 hashes
per pubkey, or about 27M hashes (these are all tiny hashes: 256 bytes
input). You have to hash each pubkey down (DEPTH*K large hashes of
512*32 bytes each), and then for each node, hash those lists into a
single value for the parent to sign (DEPTH large hashes of 1024*32

So starting with:

 K: branching factor, e.g. 1024
 M: size of message in bits, num-leaves=2^M, e.g. 256
 DEPTH: depth of tree, ln(K,2^M)
 H: size of a hash in bytes, e.g. 32

building the readcap requires computation:
 K*4*M tiny hashes (of H bytes each)
 K large hashes (of 2*M*H bytes each)
 1 large hash (of K*H bytes)

signing a message requires computation:
 DEPTH*K*4*M tiny hashes (of H bytes each)
 DEPTH larger hashes (of K*H bytes each)

signing a message requires publication:
 DEPTH*(3*M+(K-1))*H bytes of hashes

Here's the optimization table: the smallest publish size occurs at K=256
(1MB of data, 8.4M hashes), while the fastest computation (for both the
readcap generation and per-write signature generation) occurs at K=2
(523k hashes), but K=4 is almost as good and drops the publish size to
3.16MB of data. Generation script is attached; please check my math!

% python /tmp/hashsigs.py
k=branch       readcap      |               signature
  factor     small   pubkey |    small    large    large     publish
            hashes   hashes |   hashes   hashes  hashsize      size
========    ======   ======    ========  ======  ========    =======
       2     2.05k        2     524.29k    256         64     6.30MB
       4     4.10k        4     524.29k    128        128     3.16MB <--
       8     8.19k        8     704.51k     86        256     2.13MB
      16    16.38k       16       1.05M     64        512     1.60MB
      32    32.77k       32       1.70M     52       1024     1.33MB
      64    65.54k       64       2.82M     43       2048     1.14MB
     128   131.07k      128       4.85M     37       4096     1.06MB
     256   262.14k      256       8.39M     32       8192     1.05MB <--
     512   524.29k      512      15.20M     29      16384     1.19MB
    1024     1.05M    1.02k      27.26M     26      32768     1.49MB
    2048     2.10M    2.05k      50.33M     24      65536     2.16MB
    4096     4.19M    4.10k      92.27M     22     131072     3.42MB
    8192     8.39M    8.19k     167.77M     20     262144     5.73MB
   16384    16.78M   16.38k     318.77M     19     524288    10.43MB
   32768    33.55M   32.77k     603.98M     18    1048576    19.32MB
   65536    67.11M   65.54k       1.07G     16    2097152    33.95MB
  131072   134.22M  131.07k       2.15G     16    4194304    67.50MB
  262144   268.44M  262.14k       4.03G     15    8388608   126.20MB
  524288   536.87M  524.29k       7.52G     14   16777216   235.22MB

(this doesn't take account the leaf-signature-optimization, but that
only saves you a few hundred hashes).

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hashsigs.py
URL: <http://allmydata.org/pipermail/tahoe-dev/attachments/20100623/b1c4d075/attachment.ksh>

More information about the tahoe-dev mailing list