[tahoe-dev] Interesting hashing result

Wed Mar 4 14:41:02 PST 2009

[Folks: I'm replying to old mailing list posts that I didn't have  
time to reply to when they were new because I was preparing the  
tahoe-1.3.0 release.  Beware of time travel culture shock.]

On Feb 16, 2009, at 1:51 AM, Shawn Willden wrote:

> I don't think this is a problem.  Or at least, it's not a problem  
> that doesn't exist even without the weak hash.  If the attacker  
> knows the storage ID of your file, he can replace it in the grid --  
> he doesn't need to be able to generate another file that hashes to  
> the same value.

Currently we address this problem by having storage servers never
overwrite immutable files with different contents.  Only the first
client to begin uploading an immutable file gets to choose its
storage index.  If another client tries to use the same storage
index while the upload is in progress, the server tells it that the
file is already in progress (or maybe it says "the file is already
there", which wouldn't be quite right...).  Once the uploader closes
the upload, the mapping between that storage index and that share,
in the mind of that storage server, is set in stone.
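
To make that first-writer-wins rule concrete, here is a minimal
in-memory sketch.  It is not Tahoe's storage server code: the class
name, method names, and status strings are all made up for
illustration.

```python
class ImmutableShareStore:
    """Toy model of first-writer-wins immutable storage (hypothetical API)."""

    def __init__(self):
        self._in_progress = {}  # storage_index -> bytearray being uploaded
        self._finalized = {}    # storage_index -> bytes, never overwritten

    def begin_upload(self, storage_index):
        # Once a share is finalized, its mapping is set in stone.
        if storage_index in self._finalized:
            return "already-there"
        # A second client using the same index mid-upload is turned away.
        if storage_index in self._in_progress:
            return "in-progress"
        self._in_progress[storage_index] = bytearray()
        return "accepted"

    def write(self, storage_index, data):
        self._in_progress[storage_index].extend(data)

    def close_upload(self, storage_index):
        # Closing the upload freezes the contents under this index.
        share = bytes(self._in_progress.pop(storage_index))
        self._finalized[storage_index] = share

    def read(self, storage_index):
        return self._finalized[storage_index]
```

Note that nothing in this model ties the storage index to the
contents, which is exactly why the first uploader's choice has to be
trusted.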

Now, we're about to introduce garbage collection in Tahoe-1.4 or so,
which raises a question: what happens if the share gets garbage
collected, then someone uploads a different file with the same
storage index, and then someone who didn't know about either of those
events tries to re-upload the original one?

In the long run I think a better solution is to make the storage
index be equal to the verifier cap.  This requires different
semantics for uploads-in-progress, because the verifier cap isn't
known to the uploader when it starts the upload, only when it
finishes.  So the uploader will have to tell the storage server that
it is about to start uploading something, and the server will bind
the ongoing upload to the current connection or else to a temporary
"upload in progress" token instead of to the ultimate storage index.
Then, once the upload is finished, the storage server moves the share
from the temp "incoming" directory to the final location indexed by
its storage index, which is its verify cap.  The storage server can
therefore also *check* that the share matches the verify cap (because
anyone can check that a share fits a given verify cap), which makes
all of the aforementioned issues simpler and more obviously right.
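
A rough sketch of that flow follows, under loud assumptions: the
SHA-256 of the share bytes stands in for the verify cap (the real
verify cap is derived differently), and the class, method names, and
directory layout are hypothetical.

```python
import hashlib
import os
import secrets
import tempfile


class VerifyCapStore:
    """Toy store whose storage index is a content-derived 'verify cap'."""

    def __init__(self, root):
        self.incoming = os.path.join(root, "incoming")
        self.shares = os.path.join(root, "shares")
        os.makedirs(self.incoming, exist_ok=True)
        os.makedirs(self.shares, exist_ok=True)

    def begin_upload(self):
        # The final index is unknown at this point, so the upload is
        # bound to a temporary "upload in progress" token instead.
        token = secrets.token_hex(16)
        open(os.path.join(self.incoming, token), "wb").close()
        return token

    def write(self, token, data):
        with open(os.path.join(self.incoming, token), "ab") as f:
            f.write(data)

    def finish_upload(self, token, claimed_verify_cap):
        temp_path = os.path.join(self.incoming, token)
        with open(temp_path, "rb") as f:
            # Assumption: SHA-256 of the share plays the verify cap's role.
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != claimed_verify_cap:
            os.remove(temp_path)
            raise ValueError("share does not match claimed verify cap")
        # Move from "incoming" to the location indexed by the verify cap.
        final_path = os.path.join(self.shares, actual)
        os.replace(temp_path, final_path)
        return final_path
```

Because the server checks the share before filing it, a later
re-upload after garbage collection can only ever land the same bytes
under the same index.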

As an added benefit, this might facilitate better restart of  
interrupted uploads and such.

I think Brian might know of some other problems or complications with
that proposal, so hopefully he'll follow up on this post.

> Another use case that I plan to try in the near future is to attach  
> a big USB drive to a Linksys router running custom firmware, and  
> use that as a Tahoe node.

:-)

David Reid and Zandr Milewski are both interested in experimenting  
with Tahoe on those sorts of embedded NAS/router/whatsit boxes.   
Exciting!

>> In the year 2012 (hey, we're living in the future!), the new SHA-3  
>> hash function will be chosen.  That function will also, I hope,  
>> require about 1/3 as many CPU cycles as SHA-256 does while being a  
>> safer long-term bet.
>
> If the result parallels the success of the AES selection process,  
> it may be even faster than that.

I wish!  The very fastest not-yet-broken candidates right now take  
about 1/3 as many CPU cycles as SHA-256 (according to [1]), and the  
thrust of NIST's management of the contest seems to be to get a hash  
function which isn't slower than SHA-256, but which is safer.

So, even after SHA-3 is final, we'll need either as many CPU cycles
as SHA-2 or perhaps 1/2 or 1/3 or 1/4 as many.  By comparison MD5
takes about 1/4 as many cycles as SHA-256.  (And by the way, it
matters a lot what CPU architecture you're using and how long the
messages you want to hash are.)
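
If you want to see the message-length effect on your own machine,
something like the following rough timing sketch will do.  It uses
Python's hashlib and measures wall-clock time rather than CPU cycles,
and the function names and default lengths are just my choices.

```python
import hashlib
import time


def seconds_per_byte(name, message_len, repeats=50):
    """Average wall-clock seconds per byte hashed for one algorithm."""
    data = b"\x00" * message_len
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * message_len)


def compare(names=("md5", "sha256"), lengths=(64, 4096, 1 << 20)):
    """Return {(name, length): seconds_per_byte} for each combination."""
    results = {}
    for length in lengths:
        for name in names:
            results[(name, length)] = seconds_per_byte(name, length)
    return results


if __name__ == "__main__":
    for (name, length), spb in sorted(compare().items()):
        print(f"{name:8s} {length:8d} bytes  {spb * 1e9:8.2f} ns/byte")
```

Short messages are dominated by per-call overhead, so expect the
ratio between the two functions to shift as the length grows.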

Regards,

Zooko

[1] http://bench.cr.yp.to/results-hash.html