[tahoe-lafs-trac-stream] [tahoe-lafs] #1755: 2-phase commit
tahoe-lafs
trac at tahoe-lafs.org
Wed Nov 21 00:16:13 UTC 2012
#1755: 2-phase commit
-------------------------+-------------------------------------------------
 Reporter: davidsarah  | Owner: davidsarah
                       | Status: assigned
Type: defect | Milestone: soon
Priority: normal | Version: 1.9.2
Component: code | Keywords: 2pc mutable reliability consistency
Resolution: |
Launchpad Bug: |
-------------------------+-------------------------------------------------
Comment (by zooko):
The difficulty of distributed two-phase commit in general is that if the
Transaction Manager fails after telling some of the Resource Managers to
prepare, but before telling them either to commit or to roll back, then
they are stuck in this prepared state (i.e. locked).
(See Gray and Reuter's book "Transaction Processing", and see also Gray
and Lamport's "Consensus on Transaction Commit".)
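To make the stuck-prepared problem concrete, here is a minimal sketch (in
Python, with illustrative names; none of this is Tahoe-LAFS code) of the
states a Resource Manager moves through, showing that once prepared it can
do nothing on its own until the Transaction Manager decides:

```python
from enum import Enum, auto


class RMState(Enum):
    """States a Resource Manager passes through in classic 2PC."""
    IDLE = auto()         # no transaction in progress
    PREPARED = auto()     # resources locked, awaiting the TM's decision
    COMMITTED = auto()
    ROLLED_BACK = auto()


class ResourceManager:
    """Minimal RM: once PREPARED, it is stuck until commit() or rollback()
    arrives from the Transaction Manager."""

    def __init__(self):
        self.state = RMState.IDLE

    def prepare(self):
        assert self.state is RMState.IDLE
        self.state = RMState.PREPARED   # lock is held from here on

    def commit(self):
        assert self.state is RMState.PREPARED
        self.state = RMState.COMMITTED  # lock released

    def rollback(self):
        assert self.state is RMState.PREPARED
        self.state = RMState.ROLLED_BACK  # lock released
```

If the TM crashes between prepare() and the second message, no transition
out of PREPARED is ever legal, which is exactly the blocking problem Gray
and Reuter describe.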
The role of Transaction Manager in this future extension of Tahoe-LAFS
would be filled by the LAFS storage client (i.e. the LAFS gateway) and the
roles of Resource Managers would be filled by LAFS storage servers.
There is, of course, no way for a Resource Manager to tell the difference
between its Transaction Manager having failed and its being merely slow or
temporarily disconnected from the network, other than the passage of time
without a new message (either "commit" or "rollback") from the Transaction
Manager.
In general, this can become intractable for large distributed systems with
many resources being locked, many Transaction Managers that need to fail
over to one another (using Paxos to elect a new leader, I suppose), and
frequent write-contention.
But in practical terms, I expect Tahoe-LAFS will be able to use 2-phase-
commit ("2PC") nicely, because typically the scope of what is locked, who
is doing the locking, and how much write-contention we have to support,
are all relatively narrow. That is, for the use cases that we expect to be
asked to handle, only a single mutable file/dir is locked at a time, and
only one or a small number of computers have the write cap to a single
mutable file/dir.
I think we intend to support the use case where a small number of writers
have shared write access to a mutable file/dir and may ''occasionally''
write at the same time as each other, but we do not intend to support the
use case where a large or dynamic set of writers has write access to the
same resources, with continuous write collisions that never pause long
enough for the distributed system to stabilize.
(I think this is sufficient because I think people who use Tahoe-LAFS will
typically use immutables and single-writer-mutables for most of their
state management, and rely on shared-writer-mutables only for the sort of
"last link in the chain" that can't be managed any other way.)
Another way that Tahoe-LAFS is less fragile than most distributed 2-phase-
commit systems is that we've already long since accepted that
inconsistency can happen (different storage servers have different
versions of a mutable file), and we have mechanisms (repair) in place to
recover from that.
So unlike traditional 2PC, 2PC-for-LAFS doesn't have to bear the burden of
preventing inconsistency from ever occurring in the distributed system.
2PC for us is just to help multiple writers coordinate with one another
more efficiently, and to help reduce the rate of inconsistency arising
''within'' a single storage server: that is, to allow upload or modification
of a mutable share, which may require multiple messages from LAFS storage
client to LAFS storage server, without opening a large time window in
which a failure of either end or of the connection between them would
leave an inconsistent share on that server.
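One plausible way for a storage server to keep that window small is to
stage the prepared version beside the live share and make commit a single
atomic rename. A sketch, assuming a share is just a file on disk (the
function names here are hypothetical, not the actual storage-server API):

```python
import os
import tempfile


def prepare_share(share_path, new_contents):
    """Phase 1: stage the new share version next to the live one.

    The live share at share_path is untouched; a crash here leaves only
    a stray temp file, never a half-written share.
    """
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(share_path) or ".", prefix=".prepared-")
    with os.fdopen(fd, "wb") as f:
        f.write(new_contents)
        f.flush()
        os.fsync(f.fileno())  # make sure the staged bytes are durable
    return tmp_path


def commit_share(share_path, tmp_path):
    """Phase 2a: atomically replace the old share with the staged one."""
    os.replace(tmp_path, share_path)


def rollback_share(tmp_path):
    """Phase 2b: discard the staged version; the old share survives."""
    os.unlink(tmp_path)
```

However many messages the client needs to send to build up new_contents,
the server's visible state only ever changes at the single os.replace()
call, so a failure of either side leaves either the old version or the new
version intact, never a mixture.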
So anyway, we have to come up with a plan for how storage servers (playing
the role of Resource Manager) handle the case where the storage client
(LAFS gateway, playing the role of Transaction Manager) has told them to
prepare, hasn't yet told them whether to commit or roll back, and a lot of
time passes. As a first strawman, I propose a simple hardcoded, fixed,
long timeout; let's say one hour. If your LAFS client hasn't told you
whether to commit or to roll back within an hour of asking you to prepare,
then you unilaterally roll back.
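That strawman could look something like this on the server side (a sketch
with made-up names; a real server would persist this state and hook the
reaping into its existing periodic maintenance):

```python
import time

PREPARE_TIMEOUT = 3600.0  # strawman: a hardcoded, fixed one-hour timeout


class PreparedTxn:
    """A prepare the server has accepted but not yet resolved."""

    def __init__(self, staged_path, now=time.monotonic):
        self.staged_path = staged_path
        self._now = now              # injectable clock, for testing
        self.prepared_at = now()

    def expired(self):
        return self._now() - self.prepared_at > PREPARE_TIMEOUT


def reap_expired(prepared, rollback):
    """Unilaterally roll back any prepare the client has abandoned.

    prepared: dict mapping transaction id -> PreparedTxn
    rollback: callable invoked with each expired PreparedTxn
    """
    for txn_id, txn in list(prepared.items()):
        if txn.expired():
            rollback(txn)
            del prepared[txn_id]
```

With this policy a crashed or partitioned gateway can hold a server's
staged share for at most an hour before the server returns to its
pre-prepare state on its own.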
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1755#comment:5>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage