[tahoe-dev] Tahoe Lock Files
zooko
zooko at zooko.com
Mon Mar 3 16:20:47 PST 2008
Folks:
We had a huge discussion on IRC and voice channels today about
UncoordinatedWriteError.
When we designed Tahoe's mutable files, we explicitly decided to punt
on the fearsome coordination/consistency problem in favor of offering
excellent availability. (Later, we were somewhat reassured when we
saw the Amazon Dynamo paper, which advocated something similar.)
We wrote "The Prime Directive of Uncoordinated Writes: Don't Do
That!" [1], instructing programmers who use the Tahoe API to figure
out some way to make sure that they never have two separate,
uncoordinated processes trying to write to the same mutable file or
directory.
But of course, Tahoe is gaining more use cases -- namely the MacFUSE
layer thanks to Rob Kinninmont and the Windows SMB layer thanks to
Mike Booker -- and we need a better solution than telling the upper
layers to deal with it.
We sketched out three possible solutions on IRC and on the phone today:
1. Make the handling of colliding writes more robust (we need to do
that anyway -- ticket #272), and rely on that for write coordination,
but don't rely on it "too hard" -- warn the application programmer
that it should not be used at high frequency (more than, say, once
every 30 seconds) or with many uncoordinated writers (more than,
say, 3).
2. Implement lock servers -- give out a furl to a server that you
can talk to in order to acquire a lock on a mutable file. Include such
furls in dirnodes, or otherwise make sure they are available when
needed. Decide what to do when you can't reach that server.
3. Use Tahoe storage servers as lock servers. On the plus side, you
know that enough of them are available (if you can write files at
all), and using a bunch of them in a decentralized algorithm can help
solve the availability problem with lock servers. On the downside,
this sounds complicated. How would it work exactly?
Here is an idea that does #3 -- use Tahoe storage servers as lock
servers -- but in a nice simple way, by re-using Tahoe mutable files as
black boxes. I fleshed it out in my own mind after getting off the
phone with Brian and Mike just now. I like this one! I propose to
implement this, or something like it, and make the Tahoe client use it
so that the coder working at the next layer up (MacFUSE or Windows
SMB) can simply rely on magic coordination at the cost of an
occasional delay if his client has to wait for other clients to
finish.
TAHOE LOCK FILES
When you give someone a write-cap to a mutable file-or-directory, M1,
which you yourself are also intending to write into in the future,
you also give them a write-cap to a mutable Tahoe lockfile, L1.
Thereafter, whenever you want to write to M1, you first read L1 to
see if it is currently locked. If L1 is empty (zero length), then M1
is currently unlocked.
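As a minimal Python sketch of the lock-state convention just described: an empty L1 means M1 is unlocked, and anything else is the current holder's lock string. `FakeMutableFile` and `current_lock_string` are hypothetical names invented for illustration; the class is an in-memory stand-in, not Tahoe's real mutable-file API.

```python
from typing import Optional


class FakeMutableFile:
    """Hypothetical in-memory stand-in for a Tahoe mutable file such as L1.

    Not Tahoe's real client API; it only models read-whole / write-whole.
    """

    def __init__(self, contents: bytes = b"") -> None:
        self.contents = contents

    def read(self) -> bytes:
        return self.contents

    def write(self, data: bytes) -> None:
        self.contents = data


def current_lock_string(lockfile: FakeMutableFile) -> Optional[bytes]:
    """Return the lock string stored in L1, or None if M1 is unlocked."""
    contents = lockfile.read()
    # Convention from the proposal: a zero-length L1 means M1 is unlocked.
    if len(contents) == 0:
        return None
    return contents
```

A client would call `current_lock_string` before every write to M1 and only proceed to the locking step when it returns `None`.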
To lock M1, you pick a random 32-byte string and write that string
into L1. If you get an UncoordinatedWriteError, then you read L1 to
see if your string was the winner of the write collision, and if not
then do an exponential back-off and then re-read L1 to see if it is
locked. If you don't get an UncoordinatedWriteError, or if you do
but then it turns out that your lock string was the winner, then that
means (modulo certain assumptions about the Tahoe storage servers
that will be more carefully documented later) that you are the only
one who has a write lock on M1. Go ahead and write your new M1, and
then write the empty string into L1 to unlock it. You are not
allowed to hold a lock for more than 300 seconds, but fortunately it
almost never takes more than 300 seconds to write a mutable file.
(If it looks like it is going to take more than 300 seconds to write
M1, then you need to acquire a new lock by writing a newly generated
32-byte lock string into L1.)
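The locking procedure above can be sketched in Python roughly as follows, under several assumptions. `UncoordinatedWriteError` here is a local stand-in class, and `FakeMutableFile` is a hypothetical in-memory lock file that can simulate write collisions. The helper names (`acquire_lock`, `release_lock`, `must_relock`) are also hypothetical. The back-off parameters (0.5 s initial delay, doubling up to 60 s, 20 attempts) are illustrative choices; the proposal does not specify them.

```python
import secrets
import time
from typing import Callable, List, Optional

LOCK_STRING_BYTES = 32  # size of the random lock string
MAX_HOLD_SECONDS = 300  # a lock may not be held longer than this


class UncoordinatedWriteError(Exception):
    """Local stand-in for Tahoe's exception of the same name."""


class FakeMutableFile:
    """Hypothetical in-memory lock file L1 (not Tahoe's real API).

    Each entry in `collisions` makes one write() collide: None means our
    write wins the collision, a bytes value means a rival's string wins.
    """

    def __init__(self, collisions: Optional[List[Optional[bytes]]] = None) -> None:
        self.contents = b""
        self.collisions = list(collisions or [])

    def read(self) -> bytes:
        return self.contents

    def write(self, data: bytes) -> None:
        if self.collisions:
            winner = self.collisions.pop(0)
            self.contents = data if winner is None else winner
            raise UncoordinatedWriteError("simulated write collision")
        self.contents = data


def acquire_lock(
    lockfile: FakeMutableFile,
    sleep: Callable[[float], None] = time.sleep,
    initial_delay: float = 0.5,
    max_delay: float = 60.0,
    max_attempts: int = 20,
) -> Optional[bytes]:
    """Try to lock M1 by writing a fresh random lock string into L1.

    Returns our lock string on success, or None after max_attempts.
    """
    delay = initial_delay
    for _ in range(max_attempts):
        if lockfile.read() == b"":  # L1 empty: M1 is unlocked
            token = secrets.token_bytes(LOCK_STRING_BYTES)
            try:
                lockfile.write(token)
                return token  # no collision: the lock is ours
            except UncoordinatedWriteError:
                # Collision: re-read L1 to see whose lock string won.
                if lockfile.read() == token:
                    return token
        # Locked by someone else, or we lost the collision: back off.
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return None


def release_lock(lockfile: FakeMutableFile, token: bytes) -> None:
    """Unlock M1 by writing the empty string into L1."""
    # Best-effort guard (not atomic): skip the unlock if our lock string is
    # gone, e.g. because another client broke the lock after 300 seconds.
    if lockfile.read() == token:
        lockfile.write(b"")


def must_relock(acquired_at: float, expected_remaining: float, now: float) -> bool:
    """True if finishing the write of M1 would exceed MAX_HOLD_SECONDS."""
    return (now - acquired_at) + expected_remaining > MAX_HOLD_SECONDS
```

A caller would record `time.monotonic()` when `acquire_lock` returns, write M1, and check `must_relock` before any long step. Injecting `sleep` keeps the back-off testable without real delays.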
If you read L1 and find it locked, you remember the random lock
string that was in it and set a 300 second timer, and then re-read
L1. If it still has the same random string in it, then this means
that the client who locked it has failed, and you are allowed to break
the lock by overwriting it with your own random lock string.
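The stale-lock rule can be sketched in Python as below. `FakeMutableFile` is again a hypothetical in-memory stand-in for L1, and `try_break_stale_lock` is an illustrative name, not part of Tahoe. The sketch remembers the observed lock string, waits 300 seconds, and overwrites L1 only if the same string is still there. A holder that re-locked with a freshly generated string, as the 300-second rule requires, is therefore not treated as failed.

```python
import secrets
from typing import Callable, Optional

STALE_LOCK_SECONDS = 300  # how long a lock string may sit unchanged in L1
LOCK_STRING_BYTES = 32


class FakeMutableFile:
    """Hypothetical in-memory lock file L1 (not Tahoe's real API)."""

    def __init__(self, contents: bytes = b"") -> None:
        self.contents = contents

    def read(self) -> bytes:
        return self.contents

    def write(self, data: bytes) -> None:
        self.contents = data


def try_break_stale_lock(
    lockfile: FakeMutableFile, sleep: Callable[[float], None]
) -> Optional[bytes]:
    """Break the lock in L1 if its holder appears to have failed.

    Returns our new lock string if we broke the lock, otherwise None.
    """
    observed = lockfile.read()
    if observed == b"":
        return None  # not locked at all; take the normal locking path
    sleep(STALE_LOCK_SECONDS)  # remember the string and wait 300 seconds
    if lockfile.read() != observed:
        # The holder unlocked or re-locked with a new string: not stale.
        return None
    # Same string after 300 seconds: assume the holder failed.
    token = secrets.token_bytes(LOCK_STRING_BYTES)
    lockfile.write(token)
    return token
```

The overwriting write can itself collide with other waiters breaking the same lock. A fuller version would handle `UncoordinatedWriteError` there exactly as in normal lock acquisition, re-reading L1 to see whose string won.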
That's it.
There are interesting robustness details that I know of, and no doubt
Rob or Brian can come up with more interesting robustness details,
but as a solution to the write-coordination problem, Tahoe Lock Files
are simple and modular enough that it just might work.
Regards,
Zooko
[1] http://allmydata.org/trac/tahoe/browser/docs/mutable.txt?rev=2145#L415
Tickets mentioned in this e-mail:
http://allmydata.org/trac/tahoe/ticket/272