Opened at 2012-05-23T04:14:47Z
Closed at 2012-06-16T18:20:42Z
#1749 closed defect (fixed)
bug in mutable publish that could cause an IndexError when a writer is removed in Publish._connection_problem
| Reported by: | davidsarah | Owned by: | warner |
|---|---|---|---|
| Priority: | critical | Milestone: | 1.9.2 |
| Component: | code-mutable | Version: | 1.9.1 |
| Keywords: | publish regression test-needed review-needed | Cc: | |
| Launchpad Bug: |
Description (last modified by davidsarah)
"Traceback (most recent call last): Failure: allmydata.mutable.common.NotEnoughServersError: (\"Publish ran out of good servers, last failure was: [Failure instance: Traceback: <type 'exceptions.IndexError'>: list index out of range /home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/base.py:800:runUntilCurrent /home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/foolscap-0.6.3-py2.6.egg/foolscap/eventual.py:26:_turn /home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:368:callback /home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:464:_startRunCallbacks --- <exception caught here> ---\\n/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:551:_runCallbacks /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:634:_push /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:651:push_segment /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:637:_push /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:773:push_everything_else /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:878:finish_publishing /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:886:_record_verinfo
I can reproduce this, at least on the cloud-branch, when I do a tahoe put --mutable shortly after the gateway has started.
Attachments (1)
Change History (11)
comment:1 Changed at 2012-05-23T04:16:23Z by davidsarah
- Description modified (diff)
- Owner set to zooko
comment:2 Changed at 2012-05-23T04:25:29Z by warner
sounds like a job for dictutil.DictOfSets!
comment:3 Changed at 2012-05-23T04:42:26Z by warner
Seriously though, yes, DictOfSets was made for this. It deletes the key from the dict when the last value is removed. I skimmed through publish.py: the only place that populates self.writers does so from an unordered set, so self.writers does not need to retain order, so DictOfSets is sufficient, and DictOfLists is not necessary.
comment:4 Changed at 2012-05-23T15:36:26Z by davidsarah
+1 for using DictOfSets.
comment:5 Changed at 2012-06-12T16:56:00Z by davidsarah
- Owner changed from zooko to davidsarah
- Status changed from new to assigned
comment:6 Changed at 2012-06-15T03:30:48Z by davidsarah
- Owner changed from davidsarah to kevan
- Status changed from assigned to new
I looked at changing it to use DictOfSets, but there are several instances of self.writers.values()[0][0]. Currently, that will always pick the first writer added for some arbitrary shnum. If I use list(self.writers.values()[0])[0], that does pass tests, but it is using an arbitrary writer for an arbitrary shnum. Kevan (who wrote this code), is that change ok?
comment:7 Changed at 2012-06-15T03:33:34Z by davidsarah
- Owner changed from kevan to davidsarah
Oh, warner is right, the writers are populated based on self.goal which is unordered.
comment:8 Changed at 2012-06-15T03:44:13Z by davidsarah
Note that there are two places where self.writers is initialized, and the diff in comment:1 only changes one of them. (In general there's a lot of duplicated code between the update and publish methods of class Publish.)
Changed at 2012-06-15T03:47:23Z by davidsarah
Fix a bug in mutable publish that could cause an IndexError? when a writer is removed in Publish._connection_problem. This version uses DictOfSets? as suggested by warner. fixes #1749
comment:9 Changed at 2012-06-15T03:49:34Z by davidsarah
- Keywords review-needed added
- Owner changed from davidsarah to warner
The existing tests should be sufficient to check that the changed code works, but we probably should have a test for the original issue. Review needed for the fix.
comment:10 Changed at 2012-06-16T18:20:42Z by david-sarah@…
- Resolution set to fixed
- Status changed from new to closed
In 635e87bd7b460324:
(The changeset message doesn't reference this ticket)

zooko suggests this fix, which I reviewed and approved of:
--- old-dw/src/allmydata/mutable/publish.py 2012-05-22 21:48:48.764939441 -0600 +++ new-dw/src/allmydata/mutable/publish.py 2012-05-22 21:48:50.641598788 -0600 @@ -253,6 +253,10 @@ # updating, we ignore damaged and missing shares -- callers must # do a repair to repair and recreate these. self.goal = set(self._servermap.get_known_shares()) + + # k: shnum, v: [ instance of IMutableSlotWriter ] + # The value is required to always be non-empty if the item is present + # in the dict at all. self.writers = {} # SDMF files are updated differently. @@ -891,8 +895,10 @@ """ self.log("found problem: %s" % str(f)) self._last_failure = f - self.writers[writer.shnum].remove(writer) - + writers = self.writers[writer.shnum] + writers.remove(writer) + if len(writers) == 0: + del self.writers[writer.shnum] def log_goal(self, goal, message=""): logmsg = [message]