[tahoe-dev] change to storage layout

Thu Feb 14 14:09:33 PST 2008

Jim:

Thanks for the reminder -- I'd been concentrating on Unixy  
filesystems for servers and had forgotten about case-insensitive  
filesystems until Greg Hazel and now you reminded me.

I don't think any harm will result.  There are two potential problems  
to think about 1.  security, 2.  accidental collisions.  The  
following letter explains why neither of these are fatal flaws in our  
system, but after writing this letter I think we should switch back  
to base-32, or base-36, for filesystem usage.  Part of my motivation  
is the "Caesar's Wife" rule: it isn't enough that our design be  
correct, it must also be *obviously* correct.  If it takes this much  
thinking on my part to assure myself that it is correct, then we  
should switch back to a simpler design.

First of all, the security issue: I don't think it is a problem,  
because we don't rely on the uniqueness of storage indexes for  
security.  If you are a storage server, then when a storage client  
comes to you and asks to access storage using a given storage index,  
you have no way to verify whether that storage index was properly  
generated.  In fact, the storage index is supposed to be a secure  
hash of the encryption key, but since you, the storage server, aren't  
allowed to know the encryption key, you can't verify that.

So what we already do is, implement a first-come-first-served rule on  
storage indexes.  The first person who asks to allocate a given  
storage index on a given storage server gets it.  If it is an  
immutable file, then once they finish uploading the share, the share  
gets moved into place, and after that nobody can write anything else  
under that storage index.  If it is a mutable file, then the first  
client to ask to allocate a storage index on a storage server gets to  
register a secret "write enabler" with that storage server.   
Possession of that write enabler is necessary to change the contents  
of the data stored under that storage index.

Okay, now what about collisions?  It would be unfortunate if there  
were accidental collisions such that two (legitimately derived)  
storage indexes couldn't both be stored on one storage server.   
Currently storage indexes are 128 bits.  If we encode that in base-62  
then that's 22 chars.  If those chars get mashed into case- 
insensitivity, then that's effectively 22 chars of base-36, or in  
other words there are 36^22 possible strings.  The birthday paradox  
says once we have approximately the square root of that many thrown  
together into one bucket, we have a 50% chance of collision.  The  
square root of 36^22 is about 1.3 * 10^17.  So we're already safe  
from collision, and if we want to be more sure, then we can stop  
truncating storage indexes from SHA-256 size (256 bits) down to 128- 
bits and instead start truncating down to, say, 196-bits, or even not- 
truncating.

The only reason to keep using base-62 instead of switching to base-36  
is that on filesystems which are case-sensitive and which are ext3  
and therefore can't have more than 32000 entries in a directory, we  
could have only 32^2 (== 1024) or 36^2 (== 1296) different top-level  
directories, which means either a. you can have no more than about 40  
M total storage indexes (on ext3), or b. we have to have more than  
one layer of indirectories.

Sigh.  Okay, thanks a lot for the heads-up, Jim.  We'll probably just  
switch to base-36 real quick, because that is the easiest thing that  
is obviously correct and can handle enough storage indexes on ext3.   
(Base-36 makes it so that you can be assured of at least 40 M  
different storage indexes before your ext3 filesystem poops out.   
Base-32 would guarantee at least 30 M.  (Based on my simulations.))

Regards,

Zooko