[tahoe-dev] Merkle-Winternitz-HORS signature scheme for Tahoe-LAFS
David-Sarah Hopwood
david-sarah at jacaranda.org
Sun Jul 4 23:12:11 PDT 2010
[cc:d to cryptography list from the tahoe-dev list.
See <http://allmydata.org/pipermail/tahoe-dev/2010-June/004439.html>,
<http://allmydata.org/pipermail/tahoe-dev/2010-June/004502.html> and
http://allmydata.org/pipermail/tahoe-dev/2010-June/004496.html> for context.]
Brian Warner wrote:
> On 6/23/10 7:18 AM, David-Sarah Hopwood wrote:
>
>> Ah, but it will work for a multi-layer Merkle tree scheme, such as
>> GMSS: if keys are generated deterministically from a seed, then the
>> signatures certifying keys at upper layers are also deterministic, so
>> there's no key-reuse problem for those.
>
> Right, exactly. The basic scheme would look something like this:
>
> * set K=1024
> * build a K-ary tree with 2^M=2^256 leaves. Each leaf corresponds to a
> possible message.
> * define a deterministic keyed function from leaf number to keypair
> * define a deterministic keyed function from node number to keypair
> * publish the root pubkey as the readcap. Retain the secret key (used
> as an input to the above deterministic functions) as the writecap.
> * to sign a given message:
> * identify the DEPTH=ln(base=K, 2^256) =26ish tree nodes along the
> path from root to leaf
> * generate the one privkey for the message leaf, use it to sign the
> message itself
> * for all nodes on the path, from bottom to top:
> [
> * generate all K pubkeys for the node's children, concatenate them
> into a string, treat that as a message
> * sign the message with the parent node's privkey
> ]
>
> As zooko pointed out, the leaf signature can be optimized away: since
> each message gets a unique pubkey, it suffices to reveal the
> preimage/privkey for that pubkey. This reduces the size of the leaf
> signature down to a single hash.
>
> Assuming a 256-bit Merkle-signature scheme (with a "0" and a "1"
> preimage for each bit), it takes 512 hashes to build all the
> privkey/preimages, and then 512 hashes to build the pubkeys.
> Each signature requires computing and publishing (revealing) 256 preimage
> hashes.
Note that this is a Lamport-Diffie signature, not a Merkle one-time
signature. The Merkle one-time signature scheme (described in the second
paragraph under "Signing a several bit message" in [Merkle1987]) publishes
only preimage hashes corresponding to "1" bits, and uses a checksum.
> Generating the readcap is pretty easy: you build the K pubkeys for the
> top node (4*256 small hashes each), hash each one down (K large hashes
> of 512*32 bytes each), then hash those lists into a single value (one
> large hash of K*32 bytes). So 1M small hashes, 1024 hashes of 16KiB
> each, and one hash of 32KiB bytes.
>
> For K=1024 and M=256, the message signature consists of the leaf
> signature (one hash), and 26 nodes. Each node contains one signature
> (256 preimages), one pubkey (512 hashes), and 1023 hashes-of-pubkeys.
>
> So the overall message signature requires you publish about 46567 hashes,
> or 1.5MB.
The scheme that I'm currently considering has the following five
improvements over the one above:
1. For the signatures on non-leaf public keys, use the Winternitz one-time
signature scheme. This was first publically described in [Merkle1987],
but a clearer description is given in [BDS2008].
The Winternitz scheme (unlike the Lamport-Diffie or Merkle schemes) has
the property that the full public key can be derived from a signature.
Therefore it's not necessary to explicitly include the pubkey that is
being signed at each node, since it can be derived from the signature
on the next-lower node. More precisely, the lower signature gives a
claimed public key, which can be authenticated using the upper signature.
Winternitz signatures also allow a trade-off between signature size and
the number of hash computations needed to sign, depending on the base.
(Typically the scheme is described with a base that is a power of 2,
i.e. the message representative and checksum are expressed as base
2^w numbers. Actually it works for an arbitrary base >= 2, and using
a base that is not a power of two can sometimes save an extra few
percent in signature cost for a given signing size.)
The signing cost is linear in the base B, while the size of the
signature is only divided by lg(B). Nevertheless, choices of B from
4 up to about 32 are useful.
In the example above, the 256 + 512 = 768 hashes for the signature and
pubkey, are reduced to 133 hashes for base 4; 90 hashes for base 8; and
67 hashes for base 16.
Note that the optimal choices of K are typically smaller than 1024, so
the one-time signature/pubkey makes up a greater proportion of the
published size for each layer than in the example above. For instance,
if K = 16, B = 16, and M = 256, then the number of hashes published per
layer drops to 67 + K-1 = 82. Without any of the other improvements
below, at least 64 layers would be needed, so that would be 5248 hashes,
or 164 KiB. (If Zooko's optimization is used for the leaf signatures
then it is 63*82 + 15 + 1 = 5171 hashes, or ~161.6 KiB.)
2. It is possible to sign the root of a small Merkle tree of the child
pubkeys, rather than a flat hash of them. This saves on the signature
size whenever the authentication path is smaller than the full list
of sibling keys. The saving is not spectacular for optimal choices of
K, but is still worth having.
For example, if we use a Merkle tree of arity A = 2 and height h = 4
(excluding the root), then for K = 16, instead of publishing 15 hashes
of sibling pubkeys at each layer, we can publish an authentication path
consisting of (A-1)*h = 4 hashes. That cuts the total signature size
for K = 16, B = 16, and M = 256, with Zooko's optimization, to
4478 hashes, or ~139.9 KiB.
3. For the signatures on messages, we can use the following generalization
of Zooko's idea of revealing a private key indexed by a hash of the
message:
Instead of revealing one privkey, we reveal q privkeys, indexed by q
independent hashes of the message. The probability that *all* of these
hashes will already have been revealed (allowing a forgery), can be made
much lower than the probability of one hash already having been revealed.
If the privkeys were from different parts of the certification tree,
then it would be necessary to include all of the signature chains, which
would negate any improvement in signature size. Instead, we'll have the
certification tree authenticate S "buckets" each containing K2 public
hashes (where K2 will typically be larger than K), and we'll reveal q
of the private keys from the *same* bucket for each signature. The
signature includes the certification chain for this bucket, plus q keys
and the corresponding q Merkle authentication paths (omitting duplicated
nodes).
This is essentially replacing the scheme used for the last signature
in the certification chain with an instance of the HORS signature scheme
[RR2002]. HORS is a stateless scheme that can sign multiple messages,
with security that degrades depending on how many messages have been
signed. Here, we're relying on its ability to sign more than one message
only when there is a collision that results in one of the S leaves being
used more than once.
Zooko suggested in
<http://allmydata.org/pipermail/tahoe-dev/2010-June/004496.html> that
the number of leaf hashes would only need to be large enough to prevent
accidental collisions. For the original scheme, this is not correct:
the attacker does not need to wait for two signatures to collide
accidentally; it can do an off-line search for a message that hashes
to any already-revealed key. So the number of leaf hashes would need to
be at least 2^k times the maximum number of signed messages in order to
achieve a k-bit security level against forgery. For example, for
10^16 ~= 2^53 signatures and 128-bit security, there would need to be
2^181 leaves. For similar security against a quantum computer running
the multiobject version of Grover's algorithm [CFS2000], we would have
to set k to 256, i.e. there would need to be 2^309 leaves.
Now let's analyse the security of the HORS-like variant. Suppose that
m signatures have been made. The number of times X that a given bucket
has been chosen follows a binomial distribution B(m, p) where p = 1/S.
I.e.
Pr(X = x) = C(m, x) * p^x * (1-p)^(m-x)
where C(m, x) is the binomial coefficient 'm choose x'.
If an attacker picks a random seed and message that falls into a bucket
that has been chosen x times, then at most q*x private values in that
bucket will have been revealed. In that case (ignoring the possibility
of guessing private keys, which is negligable) the attacker's success
probability for a forgery using the revealed values is at most
(q*x / K2)^q. If we model the hash as a random oracle (for simplicity),
then each query will choose a random bucket, and so the probability of
a query to the hash oracle allowing a forgery is:
Pr(forgery) = sum_{x = 1..j}(Pr(X = x) * (q*x / K2)^q) + Pr(x > j)
where j = floor(K2/q)
We can choose S, q and K2 such that, up to a given maximum number of
signatures M, this probability is less than the probability of guessing
a hash preimage. It is possible to meet this constraint even if S is
less than M^2, i.e. even if we expect there to be some accidental
collisions after M signatures (although making S much less than M^2
does not result in optimal signature sizes).
Note that while this improvement allows us to decrease the number of
leaves in the certification tree, it does *not* by itself allow decreasing
the hash output size; at this point the hash output still needs to be
long enough to be collision-resistant.
It is possible to use variations of HORS such as HORS++ [PWX2004],
which depends on a weaker security assumption, or Schemes 1 and 2
from [PC2006], which may allow slightly smaller signatures for a given
security level (although note that the HORS signature is a relatively
small part of the overall signature chain).
4. In order to prevent rollback attacks, the scheme needs to be stateless,
in the sense that a storage client can always sign given the write cap,
without knowing which keys have previously been used.
However, this doesn't preclude a storage client from cacheing temporary
state as an optimization, provided that it can throw that state away
without loss of security. Storage clients already cache MutableFileNode
objects temporarily, so it would be an advantage if we could sign more
cheaply when a MutableFileNode is already cached. We can assume that
operations on a given MutableFileNode are serialized. (Note that if we
accidentally have two cache entries for the same file, that won't break
security, because it is the same as having two independent signers.)
To do this we add another certification layer below the one that uses HORS.
When a cache entry for a mutable file is created, we generate R Winternitz
keypairs and put the pubkeys in a Merkle tree. Then we sign the root of
that Merkle tree (instead of the message-hash) as above. That signature
chain and the R privkeys (or seeds to regenerate them) are cached with the
MutableFileNode. To sign a message, we sign it using one of the R cached
privkeys, delete that privkey, and append the signature to the previously
cached chain. If we run out of privkeys, we repeat the process as though
the node was not cached.
For reasonably small R (say R ~= K), generating the R keypairs will take
little time compared to regenerating and signing the all of the keys at
higher layers. In the cached case, the cost of signing is only making
one Winternitz signature.
The R keys can be used in a random order (different from the order they
appear in the HORS signature), to avoid leaking any information about
whether and for how long the filenode had been cached.
Another advantage of this optimization is that it allows us to assume
that each message signed by the HORS scheme is random and cannot be
influenced by an attacker (since it is the hash of a set of keys that
does not depend on any file contents), which simplifies the security
analysis.
5. By using seeded hashes, we can halve the size of hash output required for
any given security level. This is important because -- all other parameters
being the same -- it results in a fourfold reduction in signature size, as
well as halving the number of hash compressions.
[The literature refers to "keyed" or "dedicated-key" hashes. But the seeds
are not really keys, since they're made public immediately after a given
hash operation. Also, there are usually other keys involved in a protocol,
so I think it is unnecessarily confusing to refer to these seeds as keys.]
To minimize the strength of security property needed from the hash
function, we'll use seeds that the verifier can authenticate as having
been chosen by the file creator or signer. It is possible to use
unauthenticated seeds, but that would require a stronger property of
the hash function -- 's-eSec' [RSM2010] instead of 'eSec' [RS2004].
[The 'eSec' property, which stands for "everywhere second preimage
resistant", is also known by the names TCR (Target Collision Resistant),
or being a UOWHF (Universal One-Way Hash Family). I prefer the name 'eSec'
because it is closer to being a variant of second preimage resistance,
than it is to collision resistance. In particular, generic birthday
attacks cannot be used to break eSec.]
We need the following seeds:
- The creator of a mutable file chooses a random "file seed", in such a
way that it can be authenticated by any holder of a read or write cap.
For example, it could be generated as an unkeyed hash of the write
secret; in that case the write cap contains sufficient information to
sign without waiting for any round-trips to the storage servers.
- For each Winternitz keypair generated for the caching optimization,
we add another private key element called the "signature seed".
This is revealed in the signature, but treated as private until then.
The signature seed is included as input to the hash that derives the
corresponding Winternitz public key.
The file seed is used to key all hash applications for a given mutable
file, *except* when hashing a message to produce a message representative.
The latter is keyed by the signature seed from the signing key.
Each position in the certification tree should use a hash that can be
treated as independent. There are several ways to do this:
a) Generate independent seeds for each hash position. We can do this
by expanding the file seed using another hash or pseudorandom bit
generator. The result can be proven secure when the seed expander is
modelled as a random oracle.
(This is a fairly inoccuous use of the random oracle model. We are
*not* modelling the hash that is used for the signatures and Merkle
trees as a random oracle, which would be unreasonable because its
output may be too short. Note that we can't model the seed expander
as a PRF, because we would be revealing its key.)
b) Use one of the constructions for hash trees based on XORing the
hash inputs with masks, such as XOR trees [BR1997] or their
improvements [Mironov2001], or SPR-Merkle trees [DOTV2008].
c) Use a tree hashing mode with a compression function that is designed
to be secure in that mode. Typically the compression function will
have a label input that is unique for each position.
Zooko asked in
<http://allmydata.org/pipermail/tahoe-dev/2010-June/004439.html>,
why the original Merkle tree construction, [Merkle1987], can't be
proven secure (that is, to preserve the second preimage resistance
of the hash it is based on) without assuming that the hash is collision
resistant.
First, note that the applicable security property is the eSec variant
of second preimage resistance, since the input is not random as it
would have to be for the Sec or aSec variants. (In general, the
attacker chooses the inputs at the bottom level of the tree. Also,
the output of the hash is not random because that's not implied by
any of the properties studied in [RS2004], so neither is the input to
the next-higher level.)
So, we are looking for a tree hash that preserves eSec/TCR. Section 5.1
of [BR1997] explains why the Merkle-Damgård construction does not
preserve this property in the case of a linear hash. Basically, it's
because iterating the hash might result in violating the assumption
that the input to later iterations is independent of the seed. The
counterexample used to show this is completely contrived, but it's
enough to show that security is not preserved for an arbitrary eSec
compression function. The same counterexample also shows why the
Merkle tree construction (with the same seed used for all of the hash
applications) does not preserve eSec.
Note that eSec *is* preserved if we use independent seeds for each
level of the Merkle tree. All the hashes at the same level can use
the same key -- that gives the "TH" construction in section 5.4 of
[BR1997]. However, that would effectively multiply the seed size
by the number of levels. That wouldn't be fatal -- we could still
authenticate the larger seed with the same size of read and write
caps. However we'd have to include the seed in the signature (it would
effectively double the length of the Merkle authentication paths), and
obviously we don't want to increase the signature size if we don't
have to. I think that options a), b) and c) above are better.
(If you did use the TH construction, you'd still want to include the
position within a layer in the input to the hash, in order to prevent
multi-target attacks within a layer, as discussed below.)
Sigh, this post is far too long, but we have a way to go yet :-)
Let's explain some more of the security motivation.
Without the file seed, a multi-target attack would be possible: a k-bit
preimage can be found on any of 2^s targets with only 2^(k-s) work. An
attacker could use this to find a preimage for any of the hashes used
in the certification trees of several files, without having to first pick
which file to attack. Also, if the same hash function were used at each
position, there would be a multi-target attack against all of the hashes
in a particular certification tree (which would reduce the security by
a factor equal to the number of hashes in the tree).
A multi-target attack is still possible against the write secret, which
must be long enough to resist it. I.e. if there can be 2^s files, then
for a security level of k bits the write secret (and its hash in the
readcap) must be at least k+s bits. However, this length does not affect
the size of signatures.
Without the signature seed, a hash collision would allow chosen-message
attacks: the attacker finds two colliding messages, submits one as a
chosen-message query, and obtains a signature that is also valid on the
other ([BR1997] page 26). Hashing the message with the signature seed
prevents this because the seed is not known to the attacker when it is
choosing a message (the file seed cannot be used because it would be
known at this point). A given signature seed will be known after the
signature is revealed, but then that Winternitz key will not be reused.
The verifier knows that the file seed is the one chosen by the file
creator, because it is authenticated by the readcap. It knows that the
signature seed is the one chosen by the storage client, because it is
authenticated by a Winternitz public key, that is certified by the rest
of the certification chain.
Some dicussion of patents:
The basic idea used in point 5 is that we commit to a seed for a eSec/TCR
hash function that will be used in some future signature. We do that by
including the future seed with other elements of the future key, applying
another instance of the hash to it using a previous seed, and signing
the result with a previously authenticated signature key. There could be
some concern that this idea is similar to one patented by Rohatgi in
U.S. patents 6701434, filed in 1999, and/or 6826687, filed in 2000
(although I came up with it independently, consider it to be obvious,
and don't concede the validity or legality of *any* algorithm patent).
In any case, this approach has solid prior art; it was described ten
years earlier by Naor and Yung. In section 4.3 of [NY1989] they
describe it for Lamport-Diffie signatures used linearly, and then
in section 4.5 they generalize it to a tree-based scheme similar to
what I described above.
There could also be a concern that point 4 above is similar to
on-line/off-line signatures as patented by Even, Goldreich and Micali
(U.S. patent 5016274, filed in 1988; expires on 14 May 2011). Again
there is prior art, this time by Merkle in U.S. patent 4881264,
filed in July 1987, and expired in July 2007. (The body of this patent
is essentially a version of the [Merkle1987] paper, but it includes
some details not in the paper -- for example it mentions the
possibility of using the nodes of the certification tree in random
order.)
The EGM patent is particularly obnoxious because it tries to patent
several ideas invented/discovered by other people. We hates patents.
We hates them forever.
This is now *way* too long, so I'll discuss performance in another post.
[Merkle1987] Ralph Merkle,
"A Digital Signature Based on a Conventional Encryption
Function,"
In proceedings of CRYPTO '87:
Lecture Notes In Computer Science Vol. 293, pp 369-378,
Springer-Verlag 1988.
<http://systems.cs.colorado.edu/grunwald/Classes/Fall2003-InformationStorage/Papers/merkle-tree.pdf>
[NY1989] Moni Naor and Moti Yung,
"Universal One-Way Hash Functions and their Cryptographic
Applications,"
Proceedings of the 21st Annual ACM Symposium on Theory of
Computing, held on May 14-17 1989.
<http://portal.acm.org/citation.cfm?id=73011>
Revised version, 13 March 1995:
<http://www.wisdom.weizmann.ac.il/~naor/PAPERS/uowhf_abs.html>
[BR1997] Mihir Bellare and Phillip Rogaway,
"Collision-Resistant Hashing: Towards Making UOWHFs Practical,"
July 17, 1997.
<http://cseweb.ucsd.edu/~mihir/papers/tcr-hash.html>
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.6148>
[CFS2000] Goong Chen, Stephen A. Fulling and Marlan O. Scully,
"Grover's Algorithm for Multiobject search in Quantum
Computing,"
arXiv:quant-ph/0007123v1 and 0007124v1, 31 July 2000.
Part I: <http://arxiv.org/abs/quant-ph/0007123v1>
Part II: <http://arxiv.org/abs/quant-ph/0007124v1>
[Mironov2001] Ilya Mironov,
"Hash Functions: From Merkle-Damgård to Shoup,"
Proceedings of Eurocrypt 2001, pp. 166-181.
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103.4039>
[RR2002] Leonid Reyzin and Natan Reyzin,
"Better than BiBa: Short One-time Signatures with
Fast Signing and Verifying,"
Cryptology ePrint Archive: Report 2002/014
(clarified 17 October 2007).
<http://eprint.iacr.org/2002/014>
[PWX2004] Josef Pieprzyk, Huaxiong Wang, and Chaoping Xing,
"Multiple-Time Signature Schemes against Adaptive Chosen
Message Attacks,"
In Selected Areas in Cryptography,
Lecture Notes in Computer Science Vol. 3006, pp. 88-100,
Springer-Verlag 2004.
<http://web.science.mq.edu.au/~hwang/SAC-PWX.ps>
[RS2004] Phillip Rogaway and Thomas Shrimpton,
"Cryptographic Hash-Function Basics: Definitions, Implications,
and Separations for Preimage Resistance, Second-Preimage
Resistance, and Collision Resistance,"
Full version, 2004.
<http://eprint.iacr.org/2004/035/>
[PC2006] Yongsu Park and Yookun Cho,
"Efficient One-time Signature Schemes for Stream
Authentication,"
Journal of Information Science and Engineering 22,
pp. 611-624, 2006.
<http://www.iis.sinica.edu.tw/JISE/2006/200605_09.pdf>
[BDS2008] Johannes Buchmann, Erik Dahmen, and Michael Szydlo,
"Hash-based Digital Signature Schemes,"
29 October 2008.
<http://www.cdc.informatik.tu-darmstadt.de/~dahmen/papers/hashbasedcrypto.pdf>
[DOTV2008] Erik Dahmen, Katsuyuki OKEYA, Tsuyoshi TAKAGI,
and Camille Vuillaume,
"Digital Signatures Out of Second-Preimage Resistant
Hash Functions,"
In Post-Quantum Cryptography,
Lecture Notes in Computer Science Vol. 5299, pp. 109-123,
Springer-Verlag 2008.
<http://pubgrid.tahoe-lafs.org/file/URI%3ACHK%3Ajfh3rs3lfzc6bocahmany5vteu%3A4v3wk352d5bjrnwa4roxvwg5iav3bvs2hqxlhp5tylqmg2rfiqlq%3A3%3A10%3A309460/@@named=/Dahmen_et_al.pdf>
[RSM2010] Mohammad Reza Reyhanitabar, Willy Susilo, and Yi Mu
"Enhanced Security Notions for Dedicated-Key Hash Functions:
Definitions and Relationships,"
Accepted at Fast Software Encryption 2010.
14 January 2010.
<http://eprint.iacr.org/2010/022/>
--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 292 bytes
Desc: OpenPGP digital signature
URL: <http://allmydata.org/pipermail/tahoe-dev/attachments/20100705/fe240c94/attachment-0001.pgp>
More information about the tahoe-dev
mailing list