#1749 closed defect (fixed)

bug in mutable publish that could cause an IndexError when a writer is removed in Publish._connection_problem

Reported by: davidsarah Owned by: warner
Priority: critical Milestone: 1.9.2
Component: code-mutable Version: 1.9.1
Keywords: publish regression test-needed review-needed Cc:
Launchpad Bug:

Description (last modified by davidsarah)

"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: (\"Publish ran out of good servers, last failure was:
[Failure instance: Traceback: <type 'exceptions.IndexError'>: list index out of range
/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/base.py:800:runUntilCurrent
/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/foolscap-0.6.3-py2.6.egg/foolscap/eventual.py:26:_turn
/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:368:callback
/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:464:_startRunCallbacks
--- <exception caught here> ---\\n/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:551:_runCallbacks
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:634:_push
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:651:push_segment
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:637:_push
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:773:push_everything_else
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:878:finish_publishing
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:886:_record_verinfo

I can reproduce this, at least on the cloud-branch, when I do a tahoe put --mutable shortly after the gateway has started.

Attachments (1)

fix-1749.darcs.patch (129.3 KB) - added by davidsarah at 2012-06-15T03:47:23Z.
Fix a bug in mutable publish that could cause an IndexError? when a writer is removed in Publish._connection_problem. This version uses DictOfSets? as suggested by warner. fixes #1749

Download all attachments as: .zip

Change History (11)

comment:1 Changed at 2012-05-23T04:16:23Z by davidsarah

  • Description modified (diff)
  • Owner set to zooko

zooko suggests this fix, which I reviewed and approved of:

--- old-dw/src/allmydata/mutable/publish.py     2012-05-22 21:48:48.764939441 -0600
+++ new-dw/src/allmydata/mutable/publish.py     2012-05-22 21:48:50.641598788 -0600
@@ -253,6 +253,10 @@
         # updating, we ignore damaged and missing shares -- callers must
         # do a repair to repair and recreate these.
         self.goal = set(self._servermap.get_known_shares())
+
+        # k: shnum, v: [ instance of IMutableSlotWriter ]
+        # The value is required to always be non-empty if the item is present
+        # in the dict at all.
         self.writers = {}
 
         # SDMF files are updated differently.
@@ -891,8 +895,10 @@
         """
         self.log("found problem: %s" % str(f))
         self._last_failure = f
-        self.writers[writer.shnum].remove(writer)
-
+        writers = self.writers[writer.shnum]
+        writers.remove(writer)
+        if len(writers) == 0:
+            del self.writers[writer.shnum]
 
     def log_goal(self, goal, message=""):
         logmsg = [message]

comment:2 Changed at 2012-05-23T04:25:29Z by warner

sounds like a job for dictutil.DictOfSets!

comment:3 Changed at 2012-05-23T04:42:26Z by warner

Seriously though, yes, DictOfSets was made for this. It deletes the key from the dict when the last value is removed. I skimmed through publish.py: the only place that populates self.writers does so from an unordered set, so self.writers does not need to retain order, so DictOfSets is sufficient, and DictOfLists is not necessary.

comment:4 Changed at 2012-05-23T15:36:26Z by davidsarah

+1 for using DictOfSets.

comment:5 Changed at 2012-06-12T16:56:00Z by davidsarah

  • Owner changed from zooko to davidsarah
  • Status changed from new to assigned

comment:6 Changed at 2012-06-15T03:30:48Z by davidsarah

  • Owner changed from davidsarah to kevan
  • Status changed from assigned to new

I looked at changing it to use DictOfSets, but there are several instances of self.writers.values()[0][0]. Currently, that will always pick the first writer added for some arbitrary shnum. If I use list(self.writers.values()[0])[0], that does pass tests, but it is using an arbitrary writer for an arbitrary shnum. Kevan (who wrote this code), is that change ok?

comment:7 Changed at 2012-06-15T03:33:34Z by davidsarah

  • Owner changed from kevan to davidsarah

Oh, warner is right, the writers are populated based on self.goal which is unordered.

comment:8 Changed at 2012-06-15T03:44:13Z by davidsarah

Note that there are two places where self.writers is initialized, and the diff in comment:1 only changes one of them. (In general there's a lot of duplicated code between the update and publish methods of class Publish.)

Changed at 2012-06-15T03:47:23Z by davidsarah

Fix a bug in mutable publish that could cause an IndexError? when a writer is removed in Publish._connection_problem. This version uses DictOfSets? as suggested by warner. fixes #1749

comment:9 Changed at 2012-06-15T03:49:34Z by davidsarah

  • Keywords review-needed added
  • Owner changed from davidsarah to warner

The existing tests should be sufficient to check that the changed code works, but we probably should have a test for the original issue. Review needed for the fix.

comment:10 Changed at 2012-06-16T18:20:42Z by david-sarah@…

  • Resolution set to fixed
  • Status changed from new to closed

In 635e87bd7b460324:

Fix a bug in mutable publish that could cause an IndexError? when a writer is removed in Publish._connection_problem. This version uses DictOfSets? as suggested by warner. fixes #1749

Note: See TracTickets for help on using tickets.