#2034 closed defect

Update for mutable files is sensitive to the number of servers — at Version 6

Reported by: markberger Owned by: markberger
Priority: normal Milestone: 1.10.1
Component: code-mutable Version: 1.10.0
Keywords: mutable publish review-needed Cc:
Launchpad Bug:

Description (last modified by markberger)

When the number of servers used in a grid is greater than or equal to N + k, there is a race condition when mutable files are updated. While waiting for responses, ServermapUpdater? will hit a threshold after EPSILON servers do not have any shares (currently EPSILON = k). This causes the process to finish and reject any remaining responses.

However, this can occur before the ServermapUpdater? has added the server to the servermap, causing the client to reject the server's response. Since the server is not added to the servermap, a writer for that server is not created. This causes a key error when the updater attempts to retrieve the corresponding writer for each share.

This race condition can be replicated by setting the number of servers for allmydata.test.test_mutable.Update to 13.

===============================================================================
[FAIL]
Traceback (most recent call last):
  File "/Users/markberger/Code/tahoe-lafs/src/allmydata/test/test_mutable.py", line 3663, in <lambda>
    self.failUnlessEqual(results, new_data))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/trial/_synctest.py", line 356, in assertEqual
    % (msg, pformat(first), pformat(second)))
twisted.trial.unittest.FailTest: not equal:
a = 'test datatest datatest datatest datatest datatest datatest datatest datatest datatest data'
b = 'test datatest datatest datatest datatest datatest datatest datatest datatest datatest dataappended'


allmydata.test.test_mutable.Update.test_update_sdmf
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ("Publish ran out of good servers, last failure was: [Failure instance: Traceback: <type 'exceptions.KeyError'>: 0\n/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py:642:_push\n/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py:662:push_segment\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py:304:addCallback\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py:293:addCallbacks\n--- <exception caught here> ---\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py:575:_runCallbacks\n/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py:767:_push_segment\n]", None)

allmydata.test.test_mutable.Update.test_append
allmydata.test.test_mutable.Update.test_append_power_of_two
allmydata.test.test_mutable.Update.test_multiple_segment_replace
allmydata.test.test_mutable.Update.test_replace_beginning
allmydata.test.test_mutable.Update.test_replace_segstart1
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ("Publish ran out of good servers, last failure was: [Failure instance: Traceback: <type 'exceptions.KeyError'>: 1\n/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py:642:_push\n/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py:662:push_segment\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py:304:addCallback\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py:293:addCallbacks\n--- <exception caught here> ---\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py:575:_runCallbacks\n/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py:767:_push_segment\n]", None)

allmydata.test.test_mutable.Update.test_replace_and_extend
allmydata.test.test_mutable.Update.test_replace_in_last_segment
allmydata.test.test_mutable.Update.test_replace_locations
allmydata.test.test_mutable.Update.test_replace_middle
allmydata.test.test_mutable.Update.test_replace_zero_length_middle
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/filenode.py", line 1177, in _build_uploadable_and_finish
    return p.update(u, offset, segments_and_bht[2], self._version)
  File "/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py", line 341, in update
    self._push()
  File "/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py", line 642, in _push
    return self.push_segment(self._current_segment)
  File "/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py", line 659, in push_segment
    return self._push()
  File "/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py", line 645, in _push
    return self.push_everything_else()
  File "/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py", line 778, in push_everything_else
    self.push_blockhashes()
  File "/Users/markberger/Code/tahoe-lafs/src/allmydata/mutable/publish.py", line 806, in push_blockhashes
    writers = self.writers[shnum]
exceptions.KeyError: 0

allmydata.test.test_mutable.Update.test_replace_zero_length_beginning
allmydata.test.test_mutable.Update.test_replace_zero_length_segstart1
-------------------------------------------------------------------------------
Ran 14 tests in 58.570s

Change History (6)

comment:1 Changed at 2013-07-23T13:22:21Z by markberger

  • Description modified (diff)

comment:2 Changed at 2013-07-23T16:20:37Z by markberger

  • Description modified (diff)
  • Summary changed from Mutable update tests are sensitive to the number of servers to Update for mutable files is sensitive to the number of servers

comment:3 Changed at 2013-07-23T16:26:49Z by markberger

Possible related tickets: #549, #1042

Last edited at 2013-07-23T16:27:04Z by markberger (previous) (diff)

comment:4 Changed at 2013-07-23T17:14:25Z by markberger

  • Keywords review-needed added
  • Milestone undecided deleted
  • Owner set to markberger

comment:5 Changed at 2013-07-23T22:55:32Z by markberger

This patch is necessary in order to have tests for #1057 with a reasonably large number of servers.

Last edited at 2013-07-24T15:06:39Z by markberger (previous) (diff)

comment:6 Changed at 2013-08-05T23:13:46Z by markberger

  • Description modified (diff)
Note: See TracTickets for help on using tickets.