#217 (Ed25519-based mutable files -- fast file creation, possibly smaller URLs) – Tahoe-LAFS

comment:1 Changed at 2007-11-30T04:12:27Z by zooko

Hm. I actually don't know which is better: DSA where my private key is the sha-256 hash of my 128-bit seed, or DSA where my private key is merely my 128-bit seed.

For best long-term security I would really rather that my private key is 256-bits, but those bits would be generated from a deterministic random number generator anyway, so they would basically be the SHA-256 of something, and I'm skeptical that the practical entropy of the something can ever really exceed 128 bits. So the former is definitely okay, but the latter is a tad simpler, and makes it clear to the cryptologist that the DSA private key has at most 128-bits of entropy in it...

comment:2 Changed at 2007-11-30T04:40:49Z by zooko

I take that back -- maybe x really ought to be 256-bits when it is used as the exponent. So no problem -- just use SHA-256 to make it into 256-bits.

comment:3 Changed at 2007-11-30T06:51:57Z by zooko

Summary changed from even cleverer public key crypto for mutable files to better crypto for mutable files -- small URLs, fast file creation

Okay, I couldn't sleep. I realized, while trying to fall asleep, that my idea had indeed had a hole in it -- I'd neglected verify-cap. The design I posted in the initial description of this ticket didn't allow us to give someone the ability to verify ciphertext without also giving them the ability to decrypt.

I've run up against this problem before when trying to figure out how to go down to one crypto-value in caps instead of two crypto-values, and I was chagrined to have forgotten about it when I posted this idea.

But then, just as I started falling asleep, I had a brainstorm about how you can have separate read-write, read-only, and verify caps, while having only a single crypto-value in the read-write cap and a single crypto-value in the read-only cap. So I got up to write it down lest I forget.

Starting with the design described in the initial description of this ticket, consider the problem that knowledge of the public key -- the verification key which is necessary for verification, implies knowledge of the symmetric secret key. This means that we can't give verification authority without also giving read authority. To fix this, we want the symmetric key to be something which isn't derivable from the verification key. This suggests that we have to include this symmetric key, or a crypto-value which can be used to generate the symmetric key, in the read-cap. But the read-cap also requires a proof that the verification key is correct. In our current implementation (0.7.0) (see mutable.txt, the read-cap contains both the symmetric key and the hash of the verification key.

Okay, so this was as far as I had gotten before -- I couldn't think of a way for there to be a single crypto-value which served both to prove the correctness of the verification key and to provide the symmetric data-decryption key, without being derivable from the verification key alone.

But just now I thought of this:

Create a new mutable file as described above, except do not set the symmetric encryption key k equal to the secure hash of the verification key. So now you have your initial seed r which is also the read-write-cap, and you have the public key, let's call it u. (For concreteness, imagine that we're using DSA and u = g^H(r)%p, where g and p are system-wide values and H() is SHA-256 .)

To generate the verify-cap v, let v = H(u).

To generate the read-only cap, generate a 128-bit salt, s with a secure hash of r. Let the read-cap be H(s, v). Now encrypt s using the read-cap and store the ciphertext of s on the storage servers.

Now given only the read-cap, you download the verification key v and the salt s, decrypt them as needed and check that your read-cap = H(s, v). This proves that v is correct, and it makes the read-cap unguessable, even to someone who knows v and the encryption of s.

So now the read-only-cap can be a single crypto-value, and the read-write-cap can be a single crypto-value, and we can separately permit read-write, read-only, and verify.

Note that the read-only-cap can be generated from the read-write-cap purely locally -- without having to fetch any information from the storage servers -- which means that the storage index can be derived from the read-only-cap.

The verify-cap will have to include the storage index as well as v. Therefore, these caps when embedded into tahoe hyperlinks can look like this:

http://localhost:8123/MW_upyf5nwrpccqw4f53hiidug96h

http://localhost:8123/MR_o859qp5btcftknqdppt66y1rxy

http://localhost:8123/MV_7j97kjz7zm3mdwsbr8n35oafpr4gjsn9665marzgunpf43nnzz8y

(That's with 128 bits for the storage index in the verify-cap -- I don't imagine verify-caps are important to pass around among users the way read-only caps and read-write caps are.)

This is very satisfying! We get the "http://localhost:8123" trick, the "MW_"/"MR_" tags, the human-friendly base-32 encoding, and we still have URLs that are small enough to be not quite so "intimidating" to users who are considering sharing them with text tools.

Note that since the crypto-values are 128-bits long, and 26 chars of base-32 encoding holds 130 bits, we have two extra bits to play with. It wouldn't hurt to redundantly encode the type tags, in case users lose or mangle the "MW_"/"MR_" tags. (For example when I double-click on the cap in XEmacs it selects only the base-32 portion -- it treats the underscore as a word separator.)

comment:4 Changed at 2007-11-30T14:30:37Z by zooko

Hm. Waking up this morning I remember one other issue. I couldn't find it in the trac tickets, but Brian has mentioned that he wants to make the storage index be derived from the public key so that storage servers can protect against a DoS attack in which someone sees a storage index that you are using, and then goes and uploads a different public key into that storage index on storage servers.

The scheme described in this ticket has the storage index derivable from the read-write-cap (the private key a.k.a. the signing key), and also derivable from the read-only-cap (which is derived from a combination of the public key and s -- the secret read-key salt). So the storage index cannot be derived from purely public information.

comment:5 Changed at 2007-12-31T21:04:07Z by warner

Summary changed from better crypto for mutable files -- small URLs, fast file creation to DSA-based mutable files -- small URLs, fast file creation

We have a protocol designed for this.. I'll attach it here. I've started the conversion work, it's about 50% complete. It requires new code in pycryptopp to expose the DSA library, of course.

Changed at 2007-12-31T21:04:43Z by warner

Attachment mutable.txt added

new DSA-based mutable-file protocol

comment:6 Changed at 2008-01-05T03:52:50Z by warner

Milestone changed from undecided to 0.8.0

comment:7 Changed at 2008-01-09T01:09:59Z by warner

Milestone changed from 0.8.0 (Allmydata 3.0 Beta) to 0.10.0

Changed at 2008-01-09T02:12:41Z by warner

Attachment mutable-DSA.png added

quick sketch of the new crypto scheme

comment:8 Changed at 2008-01-10T20:23:49Z by zooko

See discussion about key lengths on the tahoe-dev list.

comment:9 Changed at 2008-02-08T03:30:14Z by warner

When we do this, visit #306 and take advantage of the compatibility break to clean up hash tag names.

comment:10 Changed at 2008-02-09T00:03:44Z by warner

also visit #308 (maybe add "directory traversal caps" for deep-verify), which would probably require the addition of another layer of read-cap into the mutable file structure.

comment:11 Changed at 2008-02-12T22:40:44Z by warner

so, I think I see where the #308 "traversal cap" would fit. We define another cap in between the read-cap and the storage-index. The first 192 bits of the traversal cap is the hash of the first 192 bits of the read-cap (which is itself derived from the hash of the salt and the pubkey hash). The last 64 bits of the traversal cap is the hash of the last 64 bits of the read-cap (which is derived from the hash of the pubkey hash). The storage index is then composed of the 64-bit hash of the first 192 bits of the traversal cap, and the 64-bit hash of the last 64 bits of the traversal cap. The verify cap remains the same: the first 64 bits of the storage index, plus the 256-bit pubkey hash.

Then we define three separate AES encryption keys. The most powerful one is the hash of the write-cap, and is used in dirnodes to encrypt the read-write child caps. The middle one is the hash of the read-cap, and is used to encrypt the read-only child cap and the filename. The weakest one is the hash of the traversal cap, and is used to encrypt the traversal-cap (for directories) or the verify-cap (for files).

We'd need to enhance the MutableFileNode interface to expose these keys:

get_writekey()
get_readkey()
get_traversalkey() (or maybe get_deepverifykey())
get_readcap()
get_deepverifycap()
get_verifycap()

We could actually define an arbitrary number of intermediate capabilities, although of course there's no point in doing so unless we define how they should be used ahead of time.

The main downsides to this scheme are complexity and a slight performance hit (we need an extra two hashes to get from the write-cap or read-cap to the storage index).

Changed at 2008-03-05T22:47:13Z by warner

Attachment mutable-DSA.2.png added

new scheme, with deep-traversal cap

comment:12 Changed at 2008-03-05T22:53:44Z by warner

I added a picture of the enhanced scheme, with deep-verify/traversal caps.

One lingering question.. I don't remember why I wanted the last 64 bits of the read-cap/deep-verify-cap/storage-index to be a hash of the last 64 bits of the stronger cap, rather than simply a direct copy. If it were a copy, then the tail of the read-cap would just be the last 64 bits of the pubkey hash, and the exact same 64 bits would be put in the other caps. This would save some CPU time when deriving the subordinate caps.

I don't *think* there were any security properties that resulted from this being a hash instead of a copy.. the public key is public knowledge, so the pubkey hash is as well. We aren't mixing in any other material for those hashes (otherwise the storage server could not check that the slot has the right pubkey, by hashing the pubkey it finds there and comparing it against the tail of the SI).

comment:13 Changed at 2008-03-28T19:08:46Z by warner

At hackfest4 last night, Zooko, Ping, and I came up with a new scheme that takes advantage of the smaller keys (mainly the small public key) available with elliptic-curve DSA. In discrete-log DSA, the private key is 256 bits, but the public key is the full length of the modulus, probably 2048 bits. In EC-DSA (as I understand it), the two keys are the same length, and 192 bits would be plenty .

So in the new scheme:

the write-cap is just a 192-bit DSA signing key, randomly generated
- zooko says that with sufficient coaxing of the crypto++ PRNG seed, we could generate this safely and repeatably from a smaller seed, say 96 bits, and then get smaller write-caps.
the write-cap is hashed to form a 96-bit encryption secret. This provides the confidentiality
the read-cap is the 96-bit encryption secret and the 192-bit DSA verifying key. This is 288 bits long.
whatever subordinate caps we want are formed by hashing the encryption secret and concatenating the result to the verifying key
the storage index is the 128-bit hash of the verifying key

The data encryption keys are:

write-key: hash of the write-cap
read-key: hash of the read-cap
deep-verify key: hash of a subordinate cap
shallow-verify key: hash of a subordinate cap

There are a small number of caps that are meant to be shared by humans over text links (IM, email, etc). These are the ones that we want to keep small. Since we only really need maybe 3 or 4 of these, we assign each of these a single-letter prefix, like:

"D": read-write directory
"d": read-only directory
"F": immutable file (still longer than we want)

The human-shared-text caps then look like:

write-cap: http://127.0.0.1:8123/D18GeGYBSLAodrPuS4mUzWgrQkt2tlkxsZ
read-cap: http://127.0.0.1:8123/d2mR5OSt7yuf1ymLgvPxQdJFBGPUEU7uRzBL9CTM5lYwh0KCwG

We still need to store a copy of the verifying key in the share, so that the storage server can use it (optionally) to verify the signatures. The server can hash the verifying key and confirm that it matches the storage index.

I'm starting to think that sharing files and directories should be done at a slightly higher level than a raw read-cap. More details in #152.

comment:14 Changed at 2008-03-28T19:18:07Z by warner

Oops, that got formatted a bit weird:

write-cap (192b): http://127.0.0.1:8123/D18GeGYBSLAodrPuS4mUzWgrQkt2tlkxsZ
read-cap (288b): http://127.0.0.1:8123/d2mR5OSt7yuf1ymLgvPxQdJFBGPUEU7uRzBL9CTM5lYwh0KCwG

The read-cap is 72 characters long. The write-cap is 56 characters long (and could potentially be way shorter if we use the PRNG trick, like 40 characters (with prefix) or 16 characters (without).

Also note that this still depends upon having shared parameters. We're still trying to work out what these are in EC-DSA: zooko tnow tells me that they are independent of the curve (or at least, you pick a curve, then you pick the parameters, then you pick the key). We'll either need to choose one set of parameters for all Tahoe installations, or somehow build them into the hypothetical "grid identifier", or have the introducer tell everybody about them, or something.

comment:15 Changed at 2008-03-28T21:51:29Z by zooko

I thought Ping had suggested that capital letter meant "write authority", so it would be:

D: read-write directory
d: read-only directory
f: immutable file

Also potentially:

F: read-write mutable file

?

Leaving open how to spell "read-only mutable file"...

comment:16 Changed at 2008-04-03T00:18:47Z by warner

Zooko told me today that all the parameters are built-in to the curve that we select. NIST has been kind enough to generate and publish a couple of relevant ones: 128, 160, 192, 224 bits.

So we don't have to try and figure out how to distribute any "shared parameters": these are all specified by the curve. The private key is indeed simply a random number between 1 and 2^{n-1. There are still a few things we need in pycryptopp before we can complete this work:}

serialize private key to just the private exponent (and not the parameters)
deserialize private key
serialize public key to just the public point

And an item which will let us shrink the write-caps even further:

deterministic generation of private key from small seed

Zooko has put some tickets on the pycryptopp trac instance at http://allmydata.org/trac/pycryptopp/report/1

comment:17 Changed at 2008-04-03T15:51:51Z by zooko

Owner changed from somebody to zooko
Status changed from new to assigned

pycryptopp #3 (serialize ecdsa keys without the fluff)
pycryptopp #2 (deterministic generation of private key from small seed)

Note that deterministic generation of private key from small seed also enables an unspeakable "Password Based Public Key Encryption" idea. We shall not speak of it.

comment:18 Changed at 2008-04-16T01:02:50Z by warner

Jed Donnelley (from the cap-talk list) suggested that it would be useful to have shallow read-only caps on dirnodes, such that the holder could modify any children, but could not modify the dirnode itself. To accomplish this, we'd want another layer of key, in between the write-cap and the read-cap. I'm not sure if this will fit into our new DSA design as well as it would have in the RSA design, but I suspect there is room for it, especially if zooko's "shmublic" key idea works out.

comment:19 Changed at 2008-04-16T02:18:44Z by warner

Jed says:

> Why would I want a shallow read-only directory capability?  One example
> is to manage a project with other colleagues who I trust with write
> access to some of the underlying objects.  I can manage the project by
> choosing what to put into the shallow read-only directory (including
> whether some of the pieces are writable, shallow read-only, or deep
> read-only capabilities to directories) - nobody who I give it to can
> modify it - but everybody who I give the shallow read-only capability
> to can extract what's in it and write to that which I choose to share
> write access.

comment:20 Changed at 2008-04-16T03:55:44Z by zooko

My concern about this is not the implementation costs but the cost of increasing the cognitive load for users.

In Tahoe currently, the number of access control concepts that you have to learn in order to be confident that you can understand the possibilities is somewhat small: caps (identifiers which are also access control tokens), immutable files, mutable files, directories, deep-read-only access to directories.

If we added shallow-read-only-caps-to-directories then this would reduce the number of people who become sufficiently familiar with Tahoe that they feel confident predicting what can and cannot happen with various uses of it. This is a high cost to pay, so I would support it only if the payoff were similarly high. I don't yet understand why Jed's use case can't be solved pretty well with the current access control tools.

This sounds like the kind of use case that Zandr's dad has. I sure would like to see some documentation of their needs...

comment:21 Changed at 2008-04-17T18:10:55Z by warner

Yeah. I suppose we could implement the cap internally, but not make it particularly obvious in the UI, and then make it more accessible later when we figure out how to explain it safely.

comment:22 Changed at 2008-04-17T19:28:28Z by zooko

Hm... This is interesting. I'm not sure that this approach would work to make more people confident about understanding the possible consequences of their actions. Indeed it might undermine their confidence or lead them to be falsely confident!

Consider, for example, if you are writing a program using the current Tahoe API, and someone is going to hand your program a capability to a directory. You write your program so that it queries the cap to determine whether it is read-write or read-only, and then your program takes different actions about how to share this directory with others depending on what sort of cap it is.

Now, if we release a new version of Tahoe with shallow-read-only caps to directories, then what should a shallow-read-only cap answer when queried about whether it is a read-write cap? Obviously it is not a read-write cap, but the program might be inferring that by answering "false" that it is claiming to be a deep-read-only cap.

It seems like any way we do it could cause a program that worked and was secure with Tahoe v1 to work and have a security hole with that new version of Tahoe. So, perhaps such directories would have to be a new type, so that programs written to the Tahoe v1 API cannot use the new directories at all. Then a program that worked and was secure with Tahoe v1 would fail safely if someone passes it a new dir.

In general, I doubt that we can deploy additional access control semantics without raising the amount of study necessary to become a confident programmer. A programmer should want to understand the whole access control mechanism, so undocumented or optional features come with a cost.

This is not to say that we shouldn't add shallow-read-only directories. Perhaps they are so useful that they are worth the cost.

comment:23 Changed at 2008-04-24T23:27:14Z by warner

Component changed from code to code-mutable

comment:24 Changed at 2008-05-01T11:54:44Z by zooko

See also #102 (smaller and prettier directory URIs).

comment:25 Changed at 2008-05-09T00:11:05Z by warner

Milestone changed from 1.1.0 to undecided

this isn't going to happen for 1.1.0

comment:26 Changed at 2008-07-31T13:50:37Z by zooko

Note that it is elliptic curve cryptography which allows us to have public keys that are a mere 192-bits in size and are still secure.

comment:27 Changed at 2008-08-12T20:35:16Z by zooko

I've been blogging with tiddly-wiki-on-top-of-tahoe. here's my blog. Almost everytime I give someone the URL to my blog, they say something about awful the URL is. :-(

I'm getting sick of hearing about it.

<zooko> Please read my blog: 
        http://tahoebs1.allmydata.com:8123/uri/URI:DIR2-RO:hgvn7nhforxhfxbx3nbej53qoi:yhbnnuxl4o2hr4sxuocoi735t6lcosdin72axkrcboulfslwbfwq/wiki.html 
                                                                        [13:12] 
<wiqd> er, no ?                                                         [13:17] 
<zooko> Okay. 
<PenguinOfDoom> zooko: That URL is utterly terrifying                   [13:30] 
<PenguinOfDoom> zooko: is it a thing that you made up with tahoe?       [13:31] 
<zooko> PoD: I know.  -(                                                [13:32] 
<zooko> http://allmydata.org/trac/tahoe/ticket/217 # DSA-based mutable files 
        -- small URLs, fast file creation                               [13:33] 
<arkanes_> that's "small" is it?                                        [13:34]

comment:28 Changed at 2008-08-14T19:45:18Z by zooko

Here's today's mockery of my blog's URL, from Wes Felter:

zooko wrote:

Hi WMF! I read your blog today. Here is my new one: http://tahoebs1.allmydata.com:8123/uri/URI:DIR2RO:hgvn7nhforxhfxbx3nbej53qoi:yhbnnuxl4o2hr4sxuocoi735t6lcosdin72axkrcboulfslwbfwq/wiki.html

Dude, that URL is crazy; the price of being a cypherpunk I guess.

I really want to implement this ticket. I intend to prioritize #331 (add DSA to pycryptopp - serialize pubkeys with less fluff) right after my StorageSS08 paper and new checker.

comment:29 Changed at 2008-09-24T13:45:53Z by zooko

Please see http://allmydata.org/~zooko/lafs.pdf -- Figure 2 shows the current mutable file crypto structure. Figure 3 shows what it would look like to use ECDSA and semi-private keys, as described in this ticket. Figure 3 is much simpler than Figure 2.

comment:30 Changed at 2008-09-24T13:51:58Z by zooko

I mentioned this ticket as one of the most important-to-me improvements that we could make in the Tahoe code: http://allmydata.org/pipermail/tahoe-dev/2008-September/000809.html

comment:31 Changed at 2008-09-26T00:10:18Z by zooko

added the diagrams from my paper: docs/mut.svg and docs/proposed/mutsemi.svg

comment:32 Changed at 2009-01-15T21:32:30Z by zooko

Here's the latest in my collection of mockery and suspicion for having such a long, ugly URL:

<zooko> Here's my blog which mentions it: 
<zooko> 
        http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/w\
iki.html 
<TigZ> erm 
<Defiant-> longest url award 
<TigZ> is it just me or is that URL a little odd 
<CVirus> LOL                                                            [13:30] 
<TigZ> smells a bit spammy 
<cjb> zooko: yeah, what's up with that?  :)

comment:33 Changed at 2009-02-03T23:29:46Z by zooko

The next step on this ticket is to write up a proof of security of the scheme. George Danezis and Ian Goldberg's recent work on "Sphinx" might be a good model to follow, as they used a nearly identical construction to achieve rather different security properties :-) http://eprint.iacr.org/2008/475.pdf Loosely speaking, Sphinx is about encrypting where semi-private keys is about signing. I think.

Also perhaps vaguely relevant is Dan Brown's recent publication on "The One-Up Problem in (EC)DSA" http://eprint.iacr.org/2008/286.ps .

I would be extremely grateful if a real cryptographer who has experience writing such papers were to volunteer to help.

However, I've resolved to stop being a scaredy-cat about it and just do my best. It really shouldn't be that hard to do.

comment:34 Changed at 2009-02-03T23:30:15Z by zooko

Cc tahoe-dev@… added

adding Cc: tahoe-dev@…, and then I'm going to re-post my previous comment.

comment:35 Changed at 2009-02-03T23:33:39Z by zooko

The next step on this ticket is to write up a proof of security of the scheme. George Danezis and Ian Goldberg's recent work on "Sphinx" might be a good model to follow, as they used a nearly identical construction to achieve rather different security properties :-) http://eprint.iacr.org/2008/475.pdf Loosely speaking, Sphinx is about encrypting where semi-private keys is about signing. I think.

Also perhaps vaguely relevant is Dan Brown's recent publication on "The One-Up Problem in (EC)DSA" http://eprint.iacr.org/2008/286.ps .

I would be extremely grateful if a real cryptographer who has experience writing such papers were to volunteer to help.

However, I've resolved to stop being a scaredy-cat about it and just do my best. It really shouldn't be that hard to do.

comment:36 Changed at 2009-02-04T02:19:26Z by warner

Oh, here is a message I wrote last fall about a proposed API for the semi-private keys.

http://allmydata.org/pipermail/tahoe-dev/2008-October/000828.html

I'll also open a pycryptopp ticket for semi-private keys: pycryptopp#13

comment:37 Changed at 2009-02-09T19:54:57Z by zooko

The ECDSA-based mutable file approach was diagrammed and documented in http://allmydata.org/~zooko/lafs.pdf .

comment:38 Changed at 2009-02-22T21:07:46Z by zooko

Add to the collection of mockery, contempt and suspicion:

<zooko> On a nearly completely unrelated topic, please check out my awesome blog and the great flamewar that I've spawned on the Open Source Initiative's mailing list
<zooko> http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html 
<mwhudson> zooko: that's one mighty url

File under "mockery".

comment:39 Changed at 2009-04-13T19:13:01Z by zooko

Add to the collection of mockery, contempt and suspicion:

<zooko> Here's my blog:
	http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html
<glyph> whee ridiculous URLs
<glyph> zooko: is that number swiss? :
<glyph> :)

file under mockery

comment:40 Changed at 2009-04-13T22:39:34Z by zooko

<zooko> I blogged:
	http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html
<edcba> your url is really shitty

comment:41 Changed at 2009-04-28T20:10:32Z by zooko

I met Wayne Radinsky and one of the first things he said to me was:

Oh, cool, and you have a blog but that's the whackiest blog URL I've ever seen -- I guess it's temporary.

comment:42 Changed at 2009-04-28T22:48:01Z by zooko

rootard is one of the creators of the Nexenta distribution:

<zooko> Laptop Versus Axe:
	http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html
<zooko> Yes, just a Python userland application, packaged up in the
	distributions.  Exactly.
<rootard> you really need tinyurl for these things :)
<zooko> Duly noted.

comment:43 follow-up: ↓ 44 Changed at 2009-05-10T22:53:29Z by zooko

I have realized that embedding an ECDSA public key directly into the capability doesn't allow for caps to be as short and secure as embedding a secure hash of an ECDSA key into the capability. That's because ECDSA keys have a crypto strength in bits which is half of their size in bits, but secure hash functions have a crypto strength in bits against second-pre-image attacks which is equal to their size in bits.

Now, traditionally in Tahoe we don't rely on a K-bit hash function for more than K/2 bits of security, because that way we don't have to think about the situation that collisions are feasible even though second-pre-images aren't. (Collision-resistance can't be better than K/2 because of the "Birthday Paradox".)

Now when we're talking about immutable file caps, we have to keep doing that, because the user-visible requirement on an immutable file cap is that there is exactly one file that can match it. So immutable file caps in the new crypto cap scheme will still need to be sufficiently long, let's say 256-bits long, that nobody can find a collision. They would look like this:

iaUVPT93Sw1xp9u1MX6JTrv5IMse29V34ZaZ2U8JK91E

(256 bits) (See http://allmydata.org/trac/tahoe/ticket/102 for some other details about formatting of caps.)

But for mutable files, the user-visible requirement is that an unauthorized writer can't create files that would match, which corresponds to second-pre-image resistance instead of collision-resistance, so the caps can look like this:

raIUXhc56d4U18EAlpxZph

(125 bits)

I think it would be valuable to have the latter kind of caps that are that much shorter. The smaller the caps are, the more uses people will adopt them for. The short 125-bit caps are within striking distance of "tiny urls". Here's the first tiny url that I found on twitter just now: http://bit.ly/ossLb .

There is a trade-off to this, however -- you can't do off-line diminish from a write-cap to a read-cap (or a verify-cap) in this scheme. On the other hand, the caps are small enough that you can carry around both the write-cap and the read-cap in the same space that would hold just the write-cap in the other scheme, which is just as good as doing off-line diminish from write-cap to read-cap, if you thought to keep the read-cap around.

Another trade-off is that this makes us vulnerable to weaknesses in a secure hash function in addition to weaknesses in the digital signature scheme.

(By the way, in the future, using this scheme would make it easier to use a digital signature scheme which has even more than 2K bits in its public or private keys. Of particular interest to me are schemes for post-quantum cryptography (http://cr.yp.to/talks/2008.10.18/slides.pdf ) such as "multivariate quadratic" signatures e.g. sflashv2. Here are benchmarks: http://bench.cr.yp.to/results-sign.html .)

comment:44 in reply to: ↑ 43 Changed at 2009-05-12T14:21:01Z by swillden

Replying to zooko:

I have realized that embedding an ECDSA public key directly into the capability doesn't allow for caps to be as short and secure as embedding a secure hash of an ECDSA key into the capability. That's because ECDSA keys have a crypto strength in bits which is half of their size in bits

In your semi-private key scheme, they're a little weaker than that, because the keyspace is not flat. This slight weakening is probably irrelevant (and can certainly be addressed by adding a few extra bits of key size), but it's probably worth thinking about. Also, it occurs to me that perhaps there are other unidentified weaknesses in the semi-private key scheme which could be masked by putting hashes of keys in caps, rather than keys (though I confess I haven't read/thought enough to understand how hashes of keys are useful).

http://allmydata.org/pipermail/tahoe-dev/2009-February/001106.html

comment:45 follow-up: ↓ 46 Changed at 2009-05-12T22:17:21Z by zooko

Shawn: thank you very much for this analysis. I agree that it is an issue. Ian Goldberg also pointed out this issue to me in private communication. I made a mistake in the paper by saying "let y = H(g^x)". Instead, when you're generating a "random" or unguessable ECC point, you should choose an exponent in the interval [0..n-1] uniformly (where n is the order of the group).

A typical way to do that is instead of taking the result mod n, you instead check whether the result is >= n, and if it is then "re-roll" for example by incrementing the input and trying again. :-) There is another way to do it which involves generating an extra 80 bits or so of your result and taking the whole thing mod n, in order to avoid the theoretically unbounded problem of re-rolling over and over, but that seems unnecessary to me, especially considering that the n's that we are talking about often start with lots of leading 1 bits. E.g., here is the order of the NIST 256-bit randomly-generated curve in hex: 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551.

Anyway, we should amend the semi-private keys proposal which says "let y = H(g^x)" to define H as being "secure hash and then re-roll until it falls within the interval of [0..n-1]" instead opf being "secure hash and then mod n".

That completely solves the weakness that you've identified, right Shawn?

Thanks!

comment:46 in reply to: ↑ 45 Changed at 2009-05-12T22:46:18Z by swillden

Replying to zooko:

That completely solves the weakness that you've identified, right Shawn?

I may be missing something, but I don't think it does.

The issue I referred to has to do not with the generation of y, but of the multiplication of x by y (mod q), and the subsequent use of xy as the signing key. The problem is that the distribution of xy mod q values is not uniform.

I should mention that it's been years since I studied ECDSA and I don't at present understand anything about how the signing key xy is used to perform a signing operation. I'm just noting that your method for constructing the signing key results in some signing keys being more likely than others.

comment:47 Changed at 2009-05-12T23:55:24Z by warner

hrm, I think we should move the discussion about semi-private keys to the pycryptopp ticket http://allmydata.org/trac/pycryptopp/ticket/13 , and leave this ticket to talk about how we might use semiprivate keys for DSA-based mutable files.

Shawn: I have a response to your comment, which I'll put on pycryptopp ticket 13.

comment:48 Changed at 2009-05-14T21:51:26Z by zooko

Adding idnar to the roll-call of people who spontaneously complain about the current Tahoe URLs:

<secorp> Your blog is viewable by me -
	 http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html
<idnar> man, IRC sucks							
<idnar> (specifically, because I have to stare at that big ugly URI)

comment:49 Changed at 2009-05-18T23:01:11Z by warner

It occurred to me the other night that, if we can make pycryptopp#13 semi-private DSA keys work, then we could have a super-simple mutable-file cap scheme as follows:

assume K=128 bits (might be comfortable with 96 bits), this is the security parameter
create K-bit random seed, this is the writecap (128 bits)
derive 2K-bit semi-private DSA key: this is the readcap (256 bits)
- hash semi-private key to get the symmetric data-protection key (or rather a value that is used to derive it.. SDMF has a per-version salt, MDMF has a per-segment-per-version salt)
derive 2K-bit verifying key: this is the verifycap (256 bits)
either use the verifying key as a storage-index, or hash it, or truncate it. Store a copy of the verifying key in the share for the benefit of server-side validation.

For #308 deep-traversal dircaps, insert another semi-private key step between the readcap and the verifycap.

This would give us i.e. 128-bit writecaps, 256-bit readcaps, offline attenuation, full server-side verification of every bit of the share, and minimal roundtrips (no need to fetch an encrypted private key before creating new shares).

comment:50 Changed at 2009-05-19T04:57:15Z by zooko

So up until the last bullet point concerning the storage-index, I think what you describe is what I diagrammed in Figure 3 of lafs.pdf. Does that look right?

I agree this scheme has many good properties. I do still have a concern that 256-bit read-caps (e.g. http://127.0.0.1:8234/c/r_FRPG24yB7Amho6NoWaaJlBrU7lON7AyiChWRcaQZ1pH or http://127.0.0.1:8234/c/D_FRPG24yB7Amho6NoWaaJlBrU7lON7AyiChWRcaQZ1pH/path/to/sub/file.txt) might be long enough to exclude Tahoe from some interesting uses where 125-bit read-caps (e.g. http://127.0.0.1:8234/c/r_FMK3eUypHbj6uLocF0496 or http://127.0.0.1:8234/c/D_FMK3eUypHbj6uLocF0496/path/to/sub/file.txt would fit.

Have you ever looked at http://bench.cr.yp.to ? In particular, this page -- http://bench.cr.yp.to/results-sign.html -- shows measurements of a bunch of digital signature schemes, including key-generation time, public-key size, and private-key size. Compare "ecdonaldp256" (which is ecdsa with a 256-bit curve) with the one named "hector" (which is a hyperelliptic curve scheme).

Hector has estimated 113-bit security (compared to ecdonaldp256's 128-bit security), verifies signatures in about half as many CPU cycles, generates signatures in about one eighth as many CPU cycles, generates key pairs in about one eighth as many CPU cycles. In theory (I think) hyperelliptic curve public keys can be compressed down to the size of the curve, in this case 113 bits, just like elliptic curve pubkeys can be compressed down to the size of the curve, although implementations of both measured here don't do that sort of compression.

Here's the source code of the hyperelliptic curve implementation:

http://cryptojedi.org/crypto/index.shtml

113 bits should be enough for now in my opinion.

The fatal flaw of the hector algorithm is that it isn't mature. The only known implementation is described as a "demo", it doesn't work at all unless your CPU has SSE2 instructions, and it doesn't compile out of the box on my Ubuntu Jaunty amd64. Figuring out how to compress the public keys and finding or creating a portable implementation sounds like way too hard of a job for us. Hopefully in a few years other people who know more about implementing such things than us will have done so and we can rely on their implementations, but for now I think we have to reluctantly pass up the opportunity to be the first ever serious users of hyperelliptic curve cryptography. :-)

Lacking hyperelliptic curve cryptography, we have to make a trade-off between the larger size of having a full elliptic curve point in the cap and the disadvantages of having the key stored on the servers instead of in the caps.

I'm not entirely sure about those disadvantages. We've previously talked about several, but on closer inspection I'm not sure if they are actual disadvantages. You (Brian) nicely summarized some of those at the end of your note:

offline attenuation (By the way, let's call this action "diminishing" a capability. "Attenuating" is something that you do to authority, and it is a very general and flexible notion -- you can imagine writing arbitrary code or even having a human in the loop making the decisions which result in the authority being attenuated. "Diminishing" is something that you do to a capability, and it only goes from one specific thing to another specific thing. I called this operation "diminishing" in lafs.pdf in order to follow the terminology of Jonathan Shapiro's thesis about EROS (http://www.eros-os.org/papers/shap-thesis.ps), where he defined the "Diminish-Take" access model as an extension of the standard "Take-Grant" access model. The addition he added was an operation named "diminish", the effect of which was to produce a capability which offered transitive read-only access to whatever the original capability could access. The first kind of "diminishing" that we wanted in Tahoe was for precisely this same purpose, so that's why I used that word. Of course, the next kind of diminishing that we wanted was for something else -- producing a verify cap from a read cap. Oh well. Anyway, since I've already committed to "diminish" in publishing lafs.pdf, and since it might be useful to have the word "attenuation" separately as being what you do to authority in general, let's call this operation "diminish".)

Oh dear it is way past my bedtime. I'll continue this tomorrow.

comment:51 Changed at 2009-05-19T17:31:25Z by warner

Yes, it's the same scheme as in your paper. I must have been misremembering that diagram.. somehow I thought there was still an encrypted private key in there somewhere.

It's a shame that "diminishing"/"diminishment"/("dimunition"?) isn't as easy to say as "attenuation" :). I'll have to read shap's thesis.. I haven't heard "diminishing" from anybody else, whereas I hear "attenuation" all the time. But I'll try to follow your lead.

I guess I'll wait for the rest of your response before trying to follow up.

comment:52 Changed at 2009-05-19T17:43:24Z by zooko

I haven't heard "diminishing" from anybody else, whereas I hear "attenuation" all the time.

That makes sense. I think there's a good reason for that, which is that the folks you are talking to (the modern obj-cap crowd) are working with the more general, programmable, dynamic kind of authority-reduction, since they are working at the programming language level. I would be interested to see if the people who also have a foot in the operating system level use this or that terminology. I'm not married to it, so maybe we could bring it up at the next friam that includes MarkM (terminology clarifier extraordinaire). Currently I think it is useful to have two words -- I refer to "diminishing" a write cap to produce a read cap, and I also refer to "attenuation" of a storage authority in the accounting scheme (where there is more variety of what sorts of limitations can be combined with one another). If that distinction isn't sensible or useful then I don't mind switching to "attenuation" from now on and letting that detail of lafs.pdf terminology become obsolete.

I guess basically I think of diminishing a cap as one particular way of implementing attenuation of authority. It is a specific way that I am particularly interested in, and I hope to attract the interest of cryptographers who will write papers about "efficient off-line diminishing of capabilities using pairing-based cryptography in hyperelliptic curves of 2-rank one", or whatever brain-busting gobbledygook those cryptographers are always coming up with. :-)

comment:53 Changed at 2009-08-24T16:20:10Z by zooko

This guy I don't previously know named Dhananjay Nene, dnene on twitter, wrote: "@zooko have you ever documented what the long URL on your klog is for ? Spooks me every time .. and I always wonder.".

I added a note to the NewCapDesign web page specifying short-and-sweet as a separate desideratum from cut-and-pastable.

comment:54 Changed at 2009-08-24T16:59:39Z by swillden

The entropy required for high security precludes truly "short and sweet" URLs as long as the key is embedded in the URL.

I think this is a strong argument for variable-security aliases, and perhaps even user-selectable aliases.

comment:55 Changed at 2009-10-28T03:32:13Z by davidsarah

Keywords newcaps added

Tagging issues relevant to new cap protocol design.

comment:56 Changed at 2010-01-05T17:12:04Z by zooko

Argh! I just encountered a new example of how the current Tahoe-LAFS caps are too long to be acceptable to most users.

I had commented on the blog of (good security researcher) Nate Lawson -- http://rdist.root.org/2009/12/30/side-channel-attacks-on-cryptographic-software/ -- and included a link to my klog, namely this link:

http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html

He edited my comment and replaced that link with this:

http://allmydata.org/pipermail/tahoe-dev/2010-January/003476.html

Thus changing my comment from linking to my blog to linking to my mailing list message, which is not what I had intended.

I assume that Nate Lawson did this because the URL to my blog is too long and ugly.

So this makes me feel renewed motivation to invent new Tahoe-LAFS caps which are substantially shorter than the current ones.

comment:57 Changed at 2010-01-06T07:40:50Z by warner

hooray! renewed motivation! so maybe ECDSA will happen soon? :-)

comment:58 Changed at 2010-01-06T16:21:38Z by zooko

:-) Well, my current priorities are the Tahoe-LAFS v1.6 release (and the RSA 2010 conference). Here is what we need to do next for ECDSA implementation in pycryptopp. Others can help! One of these items is simply to do a code review and apply a patch!

http://allmydata.org/trac/pycryptopp/ticket/30# release EC-DSA http://allmydata.org/trac/pycryptopp/ticket/3# serialize ecdsa keys without the fluff http://allmydata.org/trac/pycryptopp/ticket/2# choose+implement EC-DSA KDF: deterministic generation of private key from small seed

If we are going to us HKDF-Poly1305: http://allmydata.org/trac/pycryptopp/ticket/33# implement Poly1305 or VMAC

If we are going to use semi-private keys: http://allmydata.org/trac/pycryptopp/ticket/13# DSA "semi-private"/intermediate keys

comment:59 follow-up: ↓ 60 Changed at 2010-01-06T18:18:32Z by davidsarah

I'm confused. Shortening the read caps does not depend on ECDSA. The semi-private key approach depends on ECDSA (to get ~2K bit read caps), but that approach doesn't give the shortest read caps. The mutable read caps in the Elk Point designs would be about K+50 bits, i.e. nearly a 1/3rd shorter, regardless of public key algorithm, and without depending on semi-private keys. (The immutable read caps are also 1/3rd shorter -- 2K instead of 3K.)

[This is the size of the cryptovalues; there is obviously some fixed overhead in the URL encoding.]

comment:60 in reply to: ↑ 59 Changed at 2010-01-06T18:22:11Z by davidsarah

Replying to davidsarah:

... Shortening the read caps does not depend on ECDSA.

I should clarify that ECDSA would help with fast mutable file creation.

comment:61 Changed at 2010-01-06T20:27:52Z by zooko

Nate Lawson pointed out on his blog that my comments kept hitting his spam filter due to the long URL.

comment:62 follow-up: ↓ 65 Changed at 2010-01-06T21:13:34Z by warner

Zooko: interesting! A spam filter that keys off the length of URL! I wonder if the assumption is that it takes a human being to come up with short names, and that robots are only capable of coming up with long random ones? That seems to be the thinking behind some other comments you've transcribed, from humans saying they distrust the tahoe URLs because they "smell spammy".

David-Sarah: I was mainly thinking of speed: ECDSA (and other schemes for which key generation does not involve the creation of new prime numbers) will be way faster than our current RSA approach. I'm also assuming that we'll end up using the simplest approaches we've discussed so far, which mostly involve ECDSA (with or without semi-private keys). There are schemes we can use that reduce cap length without improving generation speed, but I'd rather make one incompatible change for two simultaneous benefits than make two separate incompatible changes for one benefit each. (but I'll take this as motivation to review and comment upon the latest Elk Point designs you've done, and I look forward to hearing more about its successor).

Also, I've been hounding Zooko (usually offline) to finish ECDSA for years, since there are lots of other projects that have been waiting on it. So I'll take any opportunity to encourage him that I can get :).

comment:63 Changed at 2010-01-06T22:05:33Z by zooko

Yeah, not speaking for Brian but for myself I want to have as few crypto formats to support as possible, so I would like to jump straight from the current cyrpto structure to the best crypto structure I can. Unfortunately this seems to have paralyzed my forward progress for a couple of years now as I am always learning about new improved crypto primitives and structures (like Elk Point 2, HKDF-Poly1305-XSalsa20, hash-function-combiners, cipher-combiners...) that are EVEN BETTER than the ones I had previously thought of.

Along these lines, I'm currently feeling a bit polarized about Brian's preference for simplicity vs. the features of Elk Point 2. I highly value short URLs and long-lived crypto, and at least to some degree I would be willing to accept complexity in return for those values.

I think polarization is what I get when people express value judgments without a lot of technical detail. When I get into comparing technical details then even if I still disagree with someone else's preference, at least I don't feel as frustrated about it -- I can look at a table of tradeoffs and say "Okay I can live with this and that tradeoff". I think the way forward on that issue is to make comparable documentation for the three current candidates (Elk Point 2, Simple, Semi-Private-Keys), or maybe for David-Sarah to divulge their latest ideas.

By the way, the reason I keep posting on this ticket about people who complain about Tahoe-LAFS URLs, bots that ban Tahoe-LAFS URLs, etc. etc. is to show that the issue with long URLs is not just my personal preference. There seems to be plenty of evidence that long URLs are unacceptable to a significant, perhaps overwhelming, fraction of users. One of the data points that isn't already recorded on this ticket is that as soon as allmydata.com had paid Brian and me to invent Tahoe-LAFS, they then immediately paid someone else to invent a tiny-url-central-database to hide Tahoe-LAFS URLS.

If anyone has any evidence that users are okay using Tahoe-LAFS-sized URLs, please post it to this ticket! As far as I know, I'm the only human in the universe who doesn't mind using Tahoe-LAFS URLs on the Web. (Note: I don't mean putting Tahoe-LAFS caps in your aliases files or whatever, I mean on the Web. Sharing the URLs with other people, posting them on blogs, etc. etc.) Of course, I am not a representative data point for this issue since I am not only a hacker but also a Tahoe-LAFS hacker. If you are a hacker and you don't mind using Tahoe-LAFS URLs, I would like to know it, but I would be even more interested if your mom is okay using Tahoe-LAFS URLs. But I'll take whatever data points I can get, because I think making a major technical decision about something like URL size without considering real world observations of user preferences is a sin (akin to optimizing without measuring). :-)

comment:64 Changed at 2010-01-07T05:45:28Z by davidsarah

I am a hacker and I do mind using Tahoe URLs, primarily because they wrap. That usually requires manual fiddling to get a web browser to accept a Tahoe gateway URL that is embedded in email, rather than a single click. If they were less than 75 characters, it'd be fine.

However, this ticket is about ECDSA! I will copy all the comments about URLs from here to a new ticket, #882.

comment:65 in reply to: ↑ 62 Changed at 2010-01-07T07:44:19Z by davidsarah

Summary changed from DSA-based mutable files -- small URLs, fast file creation to ECDSA-based mutable files -- fast file creation, possibly smaller URLs

Replying to warner:

David-Sarah: I was mainly thinking of speed: ECDSA (and other schemes for which key generation does not involve the creation of new prime numbers) will be way faster than our current RSA approach.

Absolutely -- I have no objection at all to switching to ECDSA for that reason, or to doing so at the same time as other changes in the cap protocol. I just think we should be clear that cap length is not a particularly significant reason for doing so.

I'm also assuming that we'll end up using the simplest approaches we've discussed so far, which mostly involve ECDSA (with or without semi-private keys).

The semi-private key scheme depends on ECDSA. The other scheme in NewMutableEncodingDesign doesn't; it is IMHO simpler, and certainly much less dependent on novel public-key crypto that is difficult to analyse.

comment:66 Changed at 2010-01-07T07:44:29Z by davidsarah

Keywords performance added

comment:67 Changed at 2010-02-23T03:08:50Z by zooko

Milestone changed from eventually to 2.0.0

comment:68 Changed at 2012-02-21T23:46:33Z by davidsarah

Summary changed from ECDSA-based mutable files -- fast file creation, possibly smaller URLs to Ed25519-based mutable files -- fast file creation, possibly smaller URLs

During the second Tahoe-LAFS summit (the long conversation in the restaurant :-) we settled on Ed25519 rather than ECDSA for signing introducer messages, and I think we also acknowledged that this would be likely to steer us toward choosing Ed25519 for mutable files. It's not strictly necessary to use the same algorithm, but I argued for using the same one on complexity grounds (and for Ed25519 being a good choice, if we haven't found a hash-based algorithm with a better performance/signature size tradeoff).

comment:69 Changed at 2013-01-22T14:09:18Z by zooko

Keywords research added

comment:70 Changed at 2021-03-30T18:40:46Z by meejah

Milestone 2.0.0 deleted

Ticket retargeted after milestone closed (editing milestones)

Reported by:	zooko	Owned by:	zooko
Priority:	major	Milestone:
Component:	code-mutable	Version:	0.7.0
Keywords:	mutable crypto newcaps performance research	Cc:	tahoe-dev@…
Launchpad Bug:

Context Navigation

#217 assigned enhancement

Ed25519-based mutable files -- fast file creation, possibly smaller URLs

Description

Attachments (3)

Change History (73)