[tahoe-lafs-trac-stream] [tahoe-lafs] #1755: 2-phase commit
tahoe-lafs
trac at tahoe-lafs.org
Wed Nov 21 00:16:13 UTC 2012
#1755: 2-phase commit
-------------------------+-------------------------------------------------
 Reporter: davidsarah  | Owner: davidsarah
                       | Status: assigned
Type: defect | Milestone: soon
Priority: normal | Version: 1.9.2
Component: code | Keywords: 2pc mutable reliability consistency
Resolution: |
Launchpad Bug: |
-------------------------+-------------------------------------------------
Comment (by zooko):
The difficulty of distributed two-phase commit in general is that if the
Transaction Manager fails after telling some of the Resource Managers to
prepare, but before telling them either to commit or to roll back, then
they are stuck in this prepared state (i.e. locked).
(See Gray and Reuter's book "Transaction Processing", and see also Gray
and Lamport's "Consensus on Transaction Commit".)
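To make the stuck-prepared problem concrete, here is a minimal sketch (in
Python, with illustrative names; none of this is Tahoe-LAFS code) of the
states a Resource Manager moves through, showing that once prepared it can
do nothing on its own until the Transaction Manager decides:

```python
from enum import Enum, auto


class RMState(Enum):
    """States a Resource Manager passes through in classic 2PC."""
    IDLE = auto()         # no transaction in progress
    PREPARED = auto()     # resources locked, awaiting the TM's decision
    COMMITTED = auto()
    ROLLED_BACK = auto()


class ResourceManager:
    """Minimal RM: once PREPARED, it is stuck until commit() or rollback()
    arrives from the Transaction Manager."""

    def __init__(self):
        self.state = RMState.IDLE

    def prepare(self):
        assert self.state is RMState.IDLE
        self.state = RMState.PREPARED   # lock is held from here on

    def commit(self):
        assert self.state is RMState.PREPARED
        self.state = RMState.COMMITTED  # lock released

    def rollback(self):
        assert self.state is RMState.PREPARED
        self.state = RMState.ROLLED_BACK  # lock released
```

If the TM crashes between prepare() and the second message, no transition
out of PREPARED is ever legal, which is exactly the blocking problem Gray
and Reuter describe.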
The role of Transaction Manager in this future extension of Tahoe-LAFS
would be filled by the LAFS storage client (i.e. the LAFS gateway) and the
roles of Resource Managers would be filled by LAFS storage servers.
There is, of course, no way for a Resource Manager to tell the difference
between its Transaction Manager having failed and its being merely slow or
temporarily disconnected from the network, other than the passage of time
without a new message (either "commit" or "rollback") from the Transaction
Manager.
In general, this can become intractable for large distributed systems with
many resources being locked, many Transaction Managers that need to fail
over to one another (using Paxos to elect a new leader, I suppose), and
frequent write-contention.
But in practical terms, I expect Tahoe-LAFS will be able to use 2-phase-
commit ("2PC") nicely, because typically the scope of what is locked, who
is doing the locking, and how much write-contention we have to support,
are all relatively narrow. That is, for the use cases that we expect to be
asked to handle, only a single mutable file/dir is locked at a time, and
only one or a small number of computers have the write cap to a single
mutable file/dir.
I think we intend to support the use case where a small number of writers
have shared write access to a mutable file/dir and may ''occasionally''
write at the same time as each other, but we do not intend to support the
use case where a large or dynamic set of writers has write access to the
same resources, with continuous write collisions that never pause long
enough for the distributed system to stabilize.
(I think this is sufficient because I think people who use Tahoe-LAFS will
typically use immutables and single-writer-mutables for most of their
state management, and rely on shared-writer-mutables only for the sort of
"last link in the chain" that can't be managed any other way.)
Another way that Tahoe-LAFS is less fragile than most distributed 2-phase-
commit systems is that we've already long since accepted that
inconsistency can happen (different storage servers have different
versions of a mutable file), and we have mechanisms (repair) in place to
recover from that.
So unlike traditional 2PC, 2PC-for-LAFS doesn't have to bear the burden of
preventing inconsistency from ever occurring in the distributed system.
2PC for us is just to help multiple writers coordinate with one another
more efficiently, and to help reduce the rate of inconsistency arising
''within'' a single storage server: that is, to allow upload or modification
of a mutable share, which may require multiple messages from LAFS storage
client to LAFS storage server, without opening a large time window in
which a failure of either end or of the connection between them would
leave an inconsistent share on that server.
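One plausible way for a storage server to keep that window small is to
stage the prepared version beside the live share and make commit a single
atomic rename. A sketch, assuming a share is just a file on disk (the
function names here are hypothetical, not the actual storage-server API):

```python
import os
import tempfile


def prepare_share(share_path, new_contents):
    """Phase 1: stage the new share version next to the live one.

    The live share at share_path is untouched; a crash here leaves only
    a stray temp file, never a half-written share.
    """
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(share_path) or ".", prefix=".prepared-")
    with os.fdopen(fd, "wb") as f:
        f.write(new_contents)
        f.flush()
        os.fsync(f.fileno())  # make sure the staged bytes are durable
    return tmp_path


def commit_share(share_path, tmp_path):
    """Phase 2a: atomically replace the old share with the staged one."""
    os.replace(tmp_path, share_path)


def rollback_share(tmp_path):
    """Phase 2b: discard the staged version; the old share survives."""
    os.unlink(tmp_path)
```

However many messages the client needs to send to build up new_contents,
the server's visible state only ever changes at the single os.replace()
call, so a failure of either side leaves either the old version or the new
version intact, never a mixture.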
So anyway, we have to come up with a plan for how storage servers (playing
the role of Resource Manager) handle the case where the storage client
(LAFS gateway, playing the role of Transaction Manager) has told them to
prepare, hasn't yet told them whether to commit or roll back, and a lot of
time passes. As a first strawman, I propose a simple hardcoded, fixed,
long timeout; let's say one hour. If your LAFS client hasn't told you
whether to commit or to roll back within an hour of asking you to prepare,
then you unilaterally roll back.
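That strawman could look something like this on the server side (a sketch
with made-up names; a real server would persist this state and hook the
reaping into its existing periodic maintenance):

```python
import time

PREPARE_TIMEOUT = 3600.0  # strawman: a hardcoded, fixed one-hour timeout


class PreparedTxn:
    """A prepare the server has accepted but not yet resolved."""

    def __init__(self, staged_path, now=time.monotonic):
        self.staged_path = staged_path
        self._now = now              # injectable clock, for testing
        self.prepared_at = now()

    def expired(self):
        return self._now() - self.prepared_at > PREPARE_TIMEOUT


def reap_expired(prepared, rollback):
    """Unilaterally roll back any prepare the client has abandoned.

    prepared: dict mapping transaction id -> PreparedTxn
    rollback: callable invoked with each expired PreparedTxn
    """
    for txn_id, txn in list(prepared.items()):
        if txn.expired():
            rollback(txn)
            del prepared[txn_id]
```

With this policy a crashed or partitioned gateway can hold a server's
staged share for at most an hour before the server returns to its
pre-prepare state on its own.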
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1755#comment:5>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage