[tahoe-dev] mutable file and directory safety in allmydata.org "tahoe" 0.9
zooko
zooko at zooko.com
Wed Mar 12 10:15:56 PDT 2008
Brian:
The following is very brief, because I want to get on the phone with
Mike Booker and learn how the Windows client actually works.
I feel a strong sense of urgency about the schedule of allmydata.org
"Tahoe" 0.9.0 and of allmydata.com 3.0.
I'm replying on tahoe-dev, instead of over a private, faster voice
channel, only in order to preserve the public record.
On Mar 12, 2008, at 10:45 AM, Brian Warner wrote:
> I propose to fix this (today) in the following way:
The fix you suggest was indeed the one that I started on when I
discovered the two new ways to lose data (described below).
Except for:
> I recommend that the dirnode delta operations (add_children() and friends)
> *not* attempt to perform retry at this time. We can make writes safe from
> this blind overwrite bug by implementing update(), but continue to treat UCW
> as a user error and not feel an urgent need to protect the user from it. I
> believe that UCW will be rare enough for the next month that we don't need to
> go out of our way to hide them.
I don't understand -- what would the Windows user interface do if it
got a UCWError exception?
>> Argh. Folks: I just went to implement "robust application of
>> set_children", as per #1 above, and discovered *two* previously
>> unknown ways that multiple uncoordinated writes to a directory can
>> cause silent data loss.
>
> Could you describe these two new problems?
Okay, but just to be clear, I am *not* saying "We should not ship
uncoordinated multiple writers in Allmydata 3.0 because of these two
problems." I am saying "We should not ship uncoordinated multiple
writers in Allmydata 3.0 because there are an unknown number of
problems, as demonstrated by the fact that I just found two without
even trying."
So, problem 1 with the "update" method that you described (in the
letter to which this is a reply) is this: suppose you read directory
version N and then write back directory version N+1, and someone
else has also (previously -- not even at the same time as you!)
written a directory version N+1. If the "root hash" of your N+1
happens to sort lexicographically higher than the root hash of their
N+1, then you will not get any indication that their version N+1
ever existed -- instead your version N+1 will silently overwrite
theirs.
Problem 1 is the one that we could fix with a couple of days of work
(including redoing some of the manual testing that Peter and others
have been doing for the last week). You and I have previously
discussed how we could use < instead of <= in the testv-and-setv in
order to detect collisions of this kind. I'm not sure what other
changes we would need to make to the mutable file write protocol.
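
To make problem 1 concrete, here is a toy Python model of the collision.
It is only a sketch under stated assumptions: ToyShareServer,
write_permissive and write_strict are hypothetical names, and the two
comparison rules are one possible reading of the "< instead of <=" idea,
not the actual test vectors used by the mutable file write protocol.

```python
class UncoordinatedWrite(Exception):
    """Toy stand-in for the UCW error a writer would see."""


class ToyShareServer:
    """Holds one share version as a (seqnum, root_hash) pair (hypothetical model)."""

    def __init__(self, seqnum, root_hash):
        self.version = (seqnum, root_hash)

    def write_permissive(self, seqnum, root_hash):
        # Accept anything that sorts higher as a (seqnum, root_hash) pair,
        # so an equal seqnum with a higher root hash wins silently.
        if self.version < (seqnum, root_hash):
            self.version = (seqnum, root_hash)
        else:
            raise UncoordinatedWrite(self.version)

    def write_strict(self, seqnum, root_hash):
        # Accept only a strictly newer seqnum; an equal seqnum is a collision.
        if self.version[0] < seqnum:
            self.version = (seqnum, root_hash)
        else:
            raise UncoordinatedWrite(self.version)


def demo(write_method_name):
    """Both writers read version N=5; 'theirs' lands first, then 'mine'."""
    server = ToyShareServer(5, b"\x10")
    getattr(server, write_method_name)(6, b"\x20")      # their N+1
    try:
        getattr(server, write_method_name)(6, b"\x80")  # my N+1, higher root hash
    except UncoordinatedWrite:
        return "collision detected"
    return "silently overwrote theirs"


print(demo("write_permissive"))
print(demo("write_strict"))
```

In this model the permissive rule lets the second N+1 replace the first
without any error, while the strict rule reports the collision to the
second writer.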
Problem 2 is that when you do the read-back after detecting a UCW, if
the first 3 servers that you talk to all have your version N+1, then
you will treat that as the "current" version N+1, generate a version
N+2, and then upload your N+2, which was made without knowledge of the
other person's N+1. Problem 2 is the one that I think would take a
few weeks to do right. Ideally, your version N+2 should probably
come with a set of root hashes of predecessors, and the test-and-set
should say "This is the set of versions that I have already seen and
am intending to supersede -- if your current version is one of them
then please overwrite it with my new version. If your current
version is not one of them then please do not overwrite it, and
return an error."
You and I tried to solve this one before and did not yet come up with
a satisfactory solution.
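
The predecessor-set idea for problem 2 can be sketched as a toy model.
Everything named here (ToyServer, test_and_set, upload, the string
"root hashes") is hypothetical and for illustration only. The sketch
also ignores the hard part, namely what to do when some servers accept
the write and others refuse it.

```python
class PredecessorMismatch(Exception):
    """Toy error: the server's current version was never seen by the writer."""


class ToyServer:
    def __init__(self, current_root_hash):
        self.current = current_root_hash

    def test_and_set(self, seen_predecessors, new_root_hash):
        # Overwrite only if the writer has seen the version we currently hold.
        if self.current not in seen_predecessors:
            raise PredecessorMismatch(self.current)
        self.current = new_root_hash


def upload(servers, seen_predecessors, new_root_hash):
    """Try every server; return the root hashes that blocked the write."""
    unseen = set()
    for server in servers:
        try:
            server.test_and_set(seen_predecessors, new_root_hash)
        except PredecessorMismatch as e:
            unseen.add(e.args[0])
    return unseen


# Three servers hold my N+1; one still holds the other writer's N+1.
servers = [ToyServer("mine-N+1") for _ in range(3)] + [ToyServer("theirs-N+1")]

# My read-back only saw my own N+1, so the fourth server refuses the write
# and reveals the version I never saw.
blocked = upload(servers, {"mine-N+1"}, "mine-N+2")
print(blocked)

# After merging, I name both predecessors and every server accepts.
blocked_again = upload(servers, {"mine-N+2", "theirs-N+1"}, "merged-N+3")
print(blocked_again)
```

The point of the sketch is only that a server holding an unseen version
returns an error instead of being overwritten, which is the behaviour
described in the paragraph above.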
> For the benefit of the non-allmydata folks: we haven't yet implemented
> directory sharing in the .com product (and when we do, we're planning to use
> directed pairwise one-reader-one-writer directories, which doesn't suffer
> from this concern because it doesn't give a write-cap to the recipient). So
> the main concern right now is a user who has an automated backup process
> writing a lot of data into a directory, at the same time that this user using
> a web browser (on a different tahoe node) to modify those backup directories.
>
> As long as we continue in this approach (i.e. *not* taking advantage of
> tahoe's easily-shareable directory capabilities), then a per-account lock
> (respected by both the FUSE plugin and all web frontends) will be sufficient
> to completely avoid UCWs.
Good summary. This is what I want to work out in detail with Mike
Booker and you on the phone now.
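
As a rough illustration of the per-account lock idea, here is a minimal
in-process sketch. It is an assumption-laden toy: account_lock and
add_child are hypothetical names, the directory is a plain dict, and a
real lock would have to be honoured across separate nodes and frontends,
which a threading.Lock inside one process does not provide.

```python
import threading
from collections import defaultdict

_registry_guard = threading.Lock()
_account_locks = defaultdict(threading.Lock)  # one lock per account id


def account_lock(account_id):
    """Return the single lock that every frontend shares for this account."""
    with _registry_guard:
        return _account_locks[account_id]


def add_child(directory, account_id, name, cap):
    # Read-modify-write of the whole directory, serialized per account.
    with account_lock(account_id):
        children = dict(directory["children"])  # read version N
        children[name] = cap                    # apply the delta
        directory["children"] = children        # write version N+1
        directory["seqnum"] += 1


directory = {"seqnum": 0, "children": {}}
threads = [
    threading.Thread(target=add_child,
                     args=(directory, "alice", f"file{i}", f"cap{i}"))
    for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(directory["seqnum"], len(directory["children"]))
```

Because every writer for the account takes the same lock around its
read-modify-write, no update is based on a stale copy of the directory.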
Regards,
Zooko
P.S.
> Or maybe even "laughfs" :).
:-)
I suspect that "laughfs" might be pronounced "laugh eff ess", but
that "laugfs" will always be pronounced "laughs" (unless pronounced
"l ow gufs"). ;-)