[tahoe-dev] change to storage layout
zooko
zooko at zooko.com
Thu Feb 14 14:09:33 PST 2008
Jim:
Thanks for the reminder -- I'd been concentrating on Unixy
filesystems for servers and had forgotten about case-insensitive
filesystems until Greg Hazel and now you reminded me.
I don't think any harm will result. There are two potential problems
to think about 1. security, 2. accidental collisions. The
following letter explains why neither of these are fatal flaws in our
system, but after writing this letter I think we should switch back
to base-32, or base-36, for filesystem usage. Part of my motivation
is the "Caesar's Wife" rule: it isn't enough that our design be
correct, it must also be *obviously* correct. If it takes this much
thinking on my part to assure myself that it is correct, then we
should switch back to a simpler design.
First of all, the security issue: I don't think it is a problem,
because we don't rely on the uniqueness of storage indexes for
security. If you are a storage server, then when a storage client
comes to you and asks to access storage using a given storage index,
you have no way to verify whether that storage index was properly
generated. In fact, the storage index is supposed to be a secure
hash of the encryption key, but since you, the storage server, aren't
allowed to know the encryption key, you can't verify that.
So what we already do is, implement a first-come-first-served rule on
storage indexes. The first person who asks to allocate a given
storage index on a given storage server gets it. If it is an
immutable file, then once they finish uploading the share, the share
gets moved into place, and after that nobody can write anything else
under that storage index. If it is a mutable file, then the first
client to ask to allocate a storage index on a storage server gets to
register a secret "write enabler" with that storage server.
Possession of that write enabler is necessary to change the contents
of the data stored under that storage index.
Okay, now what about collisions? It would be unfortunate if there
were accidental collisions such that two (legitimately derived)
storage indexes couldn't both be stored on one storage server.
Currently storage indexes are 128 bits. If we encode that in base-62
then that's 22 chars. If those chars get mashed into case-
insensitivity, then that's effectively 22 chars of base-36, or in
other words there are 36^22 possible strings. The birthday paradox
says once we have approximately the square root of that many thrown
together into one bucket, we have a 50% chance of collision. The
square root of 36^22 is about 1.3 * 10^17. So we're already safe
from collision, and if we want to be more sure, then we can stop
truncating storage indexes from SHA-256 size (256 bits) down to 128-
bits and instead start truncating down to, say, 196-bits, or even not-
truncating.
The only reason to keep using base-62 instead of switching to base-36
is that on filesystems which are case-sensitive and which are ext3
and therefore can't have more than 32000 entries in a directory, we
could have only 32^2 (== 1024) or 36^2 (== 1296) different top-level
directories, which means either a. you can have no more than about 40
M total storage indexes (on ext3), or b. we have to have more than
one layer of indirectories.
Sigh. Okay, thanks a lot for the heads-up, Jim. We'll probably just
switch to base-36 real quick, because that is the easiest thing that
is obviously correct and can handle enough storage indexes on ext3.
(Base-36 makes it so that you can be assured of at least 40 M
different storage indexes before your ext3 filesystem poops out.
Base-32 would guarantee at least 30 M. (Based on my simulations.))
Regards,
Zooko
More information about the tahoe-dev
mailing list