[tahoe-dev] a crypto puzzle about digital signatures and future compatibility

Thu Aug 27 09:02:42 PDT 2009

On Wednesday, 2009-08-26, at 19:49, Brian Warner wrote:

> Attack B is where Alice uploads a file, Bob gets the filecap and  
> downloads it, Carol gets the same filecap and downloads it, and  
> Carol desires to see the same file that Bob saw. ... The attackers  
> (who may be Alice and/or other parties) get to craft the filecap  
> and the shares however they like. The attackers win if Bob and  
> Carol accept different documents.

Right, and if we add algorithm agility then this attack is possible  
even if both SHA-2 and SHA-3 are perfectly secure!

Consider this variation of the scenario: Alice generates a filecap  
and gives it to Bob.  Bob uses it to fetch a file, reads the file and  
sends the filecap to Carol along with a note saying that he approves  
this file.  Carol uses the filecap to fetch the file.  The Bob-and- 
Carol team loses if she gets a different file than the one he got.

Now suppose Alice is malicious and knows how to produce output which  
appears to come from Tahoe-SHA2-SHA3, suppose Bob uses Tahoe-SHA3,  
and suppose Carol uses Tahoe-SHA2.  Then Alice can generate two  
files, one which shows Bob what Alice wants him to see and the other  
which shows Carol what Alice wants her to see.

So by adding an optional new hash algorithm intended to strengthen  
Tahoe-LAFS against the possibility that someone can break SHA2, we  
might (if we're not careful) open up a hole that can be exploited  
even by someone who can't break SHA2.
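
Here is a rough sketch of that attack in Python.  It is only a toy
model: the dict-shaped "agile filecap", the names malicious_cap and
fetch, and the list of served documents are all made up for
illustration and are not how Tahoe-LAFS represents caps or shares.  It
assumes a Python whose hashlib provides sha3_256.

```python
import hashlib

# Hypothetical "agile" filecap: a dict with one digest slot per algorithm.
# Real Tahoe-LAFS filecaps do not look like this; it is a toy model.
HASHES = {
    "sha2": lambda data: hashlib.sha256(data).hexdigest(),
    "sha3": lambda data: hashlib.sha3_256(data).hexdigest(),
}

def malicious_cap(doc_for_sha2_clients: bytes, doc_for_sha3_clients: bytes) -> dict:
    """Alice fills each slot with the digest of a *different* document."""
    return {
        "sha2": HASHES["sha2"](doc_for_sha2_clients),
        "sha3": HASHES["sha3"](doc_for_sha3_clients),
    }

def fetch(cap: dict, known_alg: str, served_docs: list) -> bytes:
    """A single-algorithm client accepts whichever served document matches
    the one digest it knows how to check, and ignores the other slot."""
    for doc in served_docs:
        if HASHES[known_alg](doc) == cap[known_alg]:
            return doc
    raise ValueError("no served document matches the filecap")

doc_a = b"the contract Bob approves"
doc_b = b"the contract Carol receives"
cap = malicious_cap(doc_for_sha2_clients=doc_b, doc_for_sha3_clients=doc_a)

bob_sees = fetch(cap, "sha3", [doc_a, doc_b])    # Bob runs Tahoe-SHA3
carol_sees = fetch(cap, "sha2", [doc_a, doc_b])  # Carol runs Tahoe-SHA2
print(bob_sees != carol_sees)  # both accepted, yet the documents differ
```

Note that nothing here requires a collision in either hash function;
each client's check succeeds honestly against its own slot.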

One defense against the attack above would be to make sure that, as  
long as you might want to share files with someone who might still  
use Tahoe-SHA2, then you don't upgrade to Tahoe-SHA3 -- instead you  
have to stick with the intermediate bi-lingual version, Tahoe-SHA2- 
SHA3, which produces both kinds of hashes and checks both kinds of  
hashes.  But how can you tell whether there are still some Tahoe-SHA2  
users out there somewhere that you might eventually want to share a  
file with?  Also, might this approach somehow accidentally prolong  
Tahoe-LAFS's vulnerability to a flaw in SHA2?

So to use this defense, Bob would use Tahoe-SHA2-SHA3, and he would  
always verify both hashes before approving the file.  If one hash  
matched but the other didn't, his Tahoe-LAFS software would warn him  
that something is very wrong with Alice or her Tahoe-LAFS software.   
(This means that we have to spend the CPU cycles verifying old- 
fashioned hashes, and worse that we have to make file capabilities  
twice as big in order to hold both hashes, which could negatively  
impact the user experience.)
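
A minimal sketch of that both-hashes check, again assuming a
hypothetical dict-shaped cap with "sha2" and "sha3" slots (the names
verify_both, honest_cap and InconsistentCapError are mine, not
Tahoe-LAFS's):

```python
import hashlib

class InconsistentCapError(Exception):
    """One hash matched and the other did not: the cap or its creator is suspect."""

def honest_cap(data: bytes) -> dict:
    """What a well-behaved bilingual uploader would produce (toy model)."""
    return {
        "sha2": hashlib.sha256(data).hexdigest(),
        "sha3": hashlib.sha3_256(data).hexdigest(),
    }

def verify_both(cap: dict, data: bytes) -> bool:
    """Check both digests; refuse loudly if they disagree with each other."""
    sha2_ok = hashlib.sha256(data).hexdigest() == cap["sha2"]
    sha3_ok = hashlib.sha3_256(data).hexdigest() == cap["sha3"]
    if sha2_ok != sha3_ok:
        raise InconsistentCapError(
            f"sha2 matched: {sha2_ok}, sha3 matched: {sha3_ok}"
        )
    return sha2_ok  # at this point sha2_ok == sha3_ok

good = honest_cap(b"a file")
print(verify_both(good, b"a file"))       # both digests match
print(verify_both(good, b"another file")) # neither digest matches
```

A cap whose two slots were computed over different documents makes
verify_both raise instead of returning, which is the warning Bob
needs before he vouches for the file.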

Another, complementary, defense against this sort of attack would be  
that if you receive a filecap which has a hash in it that you don't  
know how to check, then you should *erase* that hash from the filecap  
before you pass that filecap on to your friend.  Then if Alice has a  
malicious Tahoe-SHA2-SHA3, Bob has Tahoe-SHA2, and Carol has Tahoe- 
SHA3, Bob will give Carol a filecap with only a SHA2 hash in it,  
which Carol will not know how to check, thus defeating Alice's evil  
scheme.
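
The erase-before-forwarding rule could look something like this.  The
dict-shaped cap, the placeholder digest strings, and the function
names are assumptions for the sake of the example:

```python
def strip_unknown_hashes(cap: dict, known_algs: set) -> dict:
    """Keep only the digests this client was actually able to verify."""
    return {alg: digest for alg, digest in cap.items() if alg in known_algs}

def can_check(cap: dict, known_algs: set) -> bool:
    """A recipient can use a cap only if it holds a digest they understand."""
    return any(alg in known_algs for alg in cap)

# Placeholder digests; in the attack they would cover two different documents.
cap_from_alice = {"sha2": "digest-of-file-shown-to-bob",
                  "sha3": "digest-of-file-meant-for-carol"}

# Bob runs Tahoe-SHA2, so he forwards only the slot he verified.
cap_for_carol = strip_unknown_hashes(cap_from_alice, {"sha2"})

# Carol runs Tahoe-SHA3 and finds nothing she can verify, so she rejects
# the cap instead of silently accepting a different file.
carol_can_check = can_check(cap_for_carol, {"sha3"})
print(cap_for_carol, carol_can_check)
```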

The bottom line is that the whole idea of adding algorithm agility  
and an optional hash algorithm seems to entail complication and  
danger, and Tahoe-LAFS is very likely going to take the alternate  
route: a future version of Tahoe-LAFS will probably define a  
completely different type of immutable file capability which is  
syntactically distinct from the current type (i.e. it starts with a  
distinct leading character or it is a different length so that it  
cannot be confused with the old kind by a program and hopefully not  
by a human either), and which uses only SHA3.  Then you will not be  
able to produce a single filecap which can be verified with both SHA2  
and SHA3.  You can, of course, produce two different filecaps, one in  
the old format and one in the new format.  This sounds good to me  
because if Alice sends a pair of filecaps to Bob then it will be  
obvious to Bob that the two could point to different files, at  
Alice's discretion.
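
To illustrate the idea of one algorithm per cap type, here is a
sketch in which the leading tag of the cap string selects exactly one
hash function.  The "v1:" and "v2:" prefixes are invented for this
example; they are not the current Tahoe-LAFS cap syntax, nor a
proposal for the future one.

```python
import hashlib

# Hypothetical cap types: each tag is bound to exactly one hash function.
VERIFIERS = {
    "v1": hashlib.sha256,    # stands in for the old, SHA2-only cap type
    "v2": hashlib.sha3_256,  # stands in for a new, SHA3-only cap type
}

def make_cap(cap_type: str, data: bytes) -> str:
    return f"{cap_type}:{VERIFIERS[cap_type](data).hexdigest()}"

def verify(cap: str, data: bytes) -> bool:
    """The cap's own tag, not the client's preference, picks the algorithm."""
    cap_type, _, digest = cap.partition(":")
    if cap_type not in VERIFIERS:
        raise ValueError(f"unknown cap type: {cap_type!r}")
    return VERIFIERS[cap_type](data).hexdigest() == digest

data = b"some immutable file"
old_cap = make_cap("v1", data)
new_cap = make_cap("v2", data)
print(verify(old_cap, data), verify(new_cap, data))
```

Since a cap carries a single digest and a single tag, every client
that can read it at all checks the same hash, and a client that does
not know the tag fails outright rather than checking something else.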

> I always get confused about the difference between first-preimage  
> and second-preimage, but I think there's a correspondence here. In  
> Attack A, the attacker doesn't get to choose the filecap (i.e. the  
> hash of the message): they've got to create shares to match a  
> specific pre-determined cap. In Attack B, Alice can craft an  
> arbitrarily complex message, taking advantage of a known collision  
> or whatever.

Pre-image is figuring out the input x that someone used to compute y  
= H(x), when they give you only y.  Second-pre-image is when someone  
else chooses an x and tells you x and then you find a different x2 !=  
x such that H(x) = H(x2).  Collision is when you come up with any two  
values, x and x2 != x such that H(x) = H(x2).
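
A toy experiment may make the difference between the last two more
concrete.  The sketch below truncates SHA-256 to 16 bits (toy_hash is
my own deliberately weak construction, chosen only so the searches
finish quickly) and then runs a collision search and a
second-pre-image search against it:

```python
import hashlib
from itertools import count

def toy_hash(data: bytes) -> bytes:
    """SHA-256 truncated to 16 bits: deliberately weak so searches finish."""
    return hashlib.sha256(data).digest()[:2]

def find_collision():
    """Collision: the attacker picks BOTH inputs, so remember every digest seen."""
    seen = {}
    for i in count():
        x = str(i).encode()
        h = toy_hash(x)
        if h in seen:
            return seen[h], x, i + 1
        seen[h] = x

def find_second_preimage(x: bytes):
    """Second pre-image: x is fixed by someone else; only x2 is ours to pick."""
    target = toy_hash(x)
    for i in count():
        x2 = b"candidate-" + str(i).encode()
        if x2 != x and toy_hash(x2) == target:
            return x2, i + 1

a, b, collision_tries = find_collision()
x = b"document chosen by someone else"
x2, second_preimage_tries = find_second_preimage(x)
print(collision_tries, second_preimage_tries)
```

For an ideal n-bit hash the generic collision search costs on the
order of 2^(n/2) evaluations while the generic second-pre-image search
costs on the order of 2^n, so on this 16-bit toy the first search
typically needs hundreds of tries and the second tens of thousands.
This shows only the generic gap; it says nothing about structural
attacks on any particular hash function.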

Tahoe-LAFS's semantics of immutable file caps is that the cap is an  
*identifier* of the file, not just a digital signature or message  
authentication code on the file, as demonstrated in the
Alice->Bob->Carol scenario above.  Therefore, Tahoe-LAFS requires collision  
resistance from its secure hash algorithm and not just second- 
preimage-resistance.  It is too bad we can't make do with second- 
preimage-resistance, because we have much greater confidence in the  
second-preimage-resistance of our hash functions than in collision- 
resistance.  SHA1, for example, has second-preimage-resistance (as  
far as we know) but not collision-resistance.  (By the way, I believe  
that git has the same semantics for its hashes that Tahoe-LAFS has  
for its immutable file caps and that, contrary to what Linus Torvalds,  
Perry Metzger, et al. have said, git users are vulnerable to exploitation  
by collisions.  I'll try to write up my reasoning at some point.)

Regards,

Zooko