[tahoe-dev] two-phase-commit for Tahoe-LAFS, and Dropbox-like functionality

Zooko Wilcox-O'Hearn zooko at zooko.com
Wed Nov 21 00:32:27 UTC 2012


Folks:

I posted some thoughts to #1755. If you're interested in distributed
systems, please read and comment!

I posted an argument for why distributed, end-to-end, two-phase commit
will probably work fine for LAFS's purposes even though it has gained
a well-deserved reputation for "not scaling up to the Internet" in
other contexts. (Hint: the answer is, of course, that we're demanding
less of it than most systems do.)

Unfortunately I haven't yet gotten around to explaining what we actually
want it for. I remember there being at least two different reasons why
I really wanted end-to-end two-phase-commit in the LAFS storage
protocol. One reason has to do with uploading large mutables and
making modifications to large mutables, without asking any computer to
"buffer up" all the changes so that it can apply them all quickly, and
without opening a large window of time in which a failure in any of
several places will leave a corrupted mutable share. The other reason,
which I remember less precisely, has to do with multiple writers
sharing write-access to the same mutable file or directory. LAFS
currently handles that use case very badly. I think e2e 2PC can do
better: in almost all cases it would handle write-collisions with a
clean failure ("no can do!") instead of, as LAFS currently does, with
potential data loss.
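
To make the "clean failure" idea concrete, here is a toy sketch of a
two-phase write across several share holders. It is not the LAFS
storage protocol or the design in the ticket: ToyShareServer,
two_phase_write, WriteCollision, and the version-number check are all
hypothetical names and assumptions made up for illustration. Every
server stores the same bytes here (real shares differ per server), and
crashes between the two phases are ignored.

```python
class WriteCollision(Exception):
    """Raised when a write is refused cleanly instead of clobbering data."""


class ToyShareServer:
    """Hypothetical in-memory stand-in for one server holding one share."""

    def __init__(self):
        self.version = 0
        self.data = b""
        self.pending = None  # (writer_id, new_data) staged by phase one

    def prepare(self, writer_id, expected_version, new_data):
        # Phase one: stage the write only if nobody else has staged one
        # and the writer based its change on the current version.
        if self.pending is not None or self.version != expected_version:
            return False
        self.pending = (writer_id, new_data)
        return True

    def commit(self, writer_id):
        # Phase two: make this writer's staged write visible.
        if self.pending is not None and self.pending[0] == writer_id:
            self.data = self.pending[1]
            self.version += 1
            self.pending = None

    def abort(self, writer_id):
        # Drop this writer's staged write; the stored share is untouched.
        if self.pending is not None and self.pending[0] == writer_id:
            self.pending = None


def two_phase_write(servers, writer_id, expected_version, new_data):
    """Write to all servers, or to none of them."""
    prepared = []
    for server in servers:
        if server.prepare(writer_id, expected_version, new_data):
            prepared.append(server)
        else:
            # Someone else got there first: undo our staging and refuse.
            for p in prepared:
                p.abort(writer_id)
            raise WriteCollision("no can do!")
    for server in servers:
        server.commit(writer_id)
```

In this sketch a second writer working from a stale version gets
WriteCollision and every share keeps the first writer's data, which is
the behaviour described above: a refusal rather than silent loss.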

But even so, uncoordinated writes to the same resource still have to
be kept infrequent, and the number of uncoordinated writers has to be
kept small.

A key insight into all this is that you should use shared access to
LAFS's mutables as sparingly as possible, and instead manage almost
all of your state with immutables and with single-writer mutables. A
great example of this design pattern is the new design we came up with
for Dropbox-like functionality on top of Tahoe-LAFS. That design
synthesizes a "magic folder" that looks like Dropbox from the
perspective of the user, but does so without *any* concurrent writes to
a shared mutable. Instead, every writer has their own single-writer
mutable, and each client is responsible for reading all the mutables and
synthesizing the result as those mutables get changed by their
respective writers.
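
The read-side synthesis can be sketched roughly as follows. The details
of the actual design are not given in this message, so everything here
is an assumption for illustration: the function name synthesize_folder,
the per-writer layout of {filename: (version, content_ref)}, and the
merge rule (highest version wins, ties go to the alphabetically first
writer) are all hypothetical.

```python
def synthesize_folder(writer_views):
    """Merge per-writer views into one folder view.

    writer_views maps a writer name to that writer's own single-writer
    listing: {filename: (version, content_ref)}. Only that writer ever
    modifies its listing; readers just read all of them.

    Returns {filename: (version, content_ref, writer)}.
    """
    merged = {}
    # Sorting the writers makes the tie-break deterministic: with the
    # strict ">" below, the alphabetically first writer wins a tie.
    for writer in sorted(writer_views):
        for filename, (version, content_ref) in writer_views[writer].items():
            current = merged.get(filename)
            if current is None or version > current[0]:
                merged[filename] = (version, content_ref, writer)
    return merged


views = {
    "alice": {"a.txt": (1, "ref-a1"), "b.txt": (2, "ref-b2")},
    "bob": {"a.txt": (3, "ref-a3")},
}
folder = synthesize_folder(views)
```

No writer ever touches another writer's listing, so no shared mutable
is written concurrently; all of the merging happens on the reader.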

Unfortunately the details of that design are sitting in a queue of
"Notes From the Tahoe-LAFS Weekly Dev Call" that I am supposed to
write up and post to this list ASAP...

Regards,

Zooko

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1755# 2-phase commit

