#1739 closed defect (fixed)

servers-responding list is wrong

Reported by: warner Owned by: warner
Priority: normal Milestone: 1.9.2
Component: code Version: 1.9.1
Keywords: repair Cc:
Launchpad Bug:

Description

I ran into a bug, hunted it down, and squashed it. Here's the description and the patch:

If a server did not respond to the pre-repair filecheck, but did respond to the repair, that server was not correctly added to the RepairResults.data["servers-responding"] list. (This resulted from a buggy usage of DictOfSets.union() in filenode.py).

In addition, servers to which filecheck queries were sent, but did not respond, were incorrectly added to the servers-responding list anyawys. (This resulted from code in the checker.py not paying attention to the 'responded' flag).

The first bug was neatly masked by the second: it's pretty rare to have a server suddenly start responding in the one-second window between a filecheck and a subsequent repair, and if the server was around for the filecheck, you'd never notice the problem. I only spotted the smelly code while I was changing it for IServer cleanup purposes.

I added coverage to test_repairer.py for this. Trying to get that test to fail before fixing the first bug is what led me to discover the second bug. I also had to update test_corrupt_file_verno, since it was incorrectly asserting that 10 servers responded, when in fact one of them throws an error (but the second bug was causing it to be reported anyways).

Change History (7)

comment:1 Changed at 2012-05-16T22:46:01Z by warner

ugh, the nginx bug (#1581) prevented me from uploading a diff. Look at the github pull-request in https://github.com/tahoe-lafs/tahoe-lafs/pull/9 instead.

comment:2 Changed at 2012-05-16T23:06:15Z by zooko

  • Keywords review-needed added
  • Milestone changed from soon to 1.9.2
  • Owner changed from warner to zooko
  • Status changed from new to assigned

will review

comment:3 follow-up: Changed at 2012-05-16T23:36:01Z by davidsarah

  • Keywords repair added; review-needed removed
  • Owner changed from zooko to warner
  • Priority changed from minor to normal
  • Status changed from assigned to new

+1 on removing DictOfSets.union; I was looking at using DictOfSets for something and also found that method confusing. I confirmed that the patches remove all uses.

Would d['servers-responding'] = sorted(list(servers_responding)) be better, for determinism? (I know the previous code also wasn't deterministic here.)

I think the overloading of .broken as an int or boolean will not work as intended for the boolean case, because:

$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> broken = True
>>> isinstance(broken, int)
True
>>> broken -= 1
>>> broken
0

Otherwise looks good.

comment:4 Changed at 2012-05-16T23:41:43Z by warner

wow, good catch, I thought booleans were distinct from ints for sure. Thanks!

Yeah, determinism is helpful. I had to add similar sorted() calls in the webapi code (in particular in places where we switch from a set, to a JSON list) so the tests can match against something stable. I'll move that up to the code that creates servers-responding.

comment:5 in reply to: ↑ 3 Changed at 2012-05-16T23:48:43Z by davidsarah

Would d['servers-responding'] = sorted(list(servers_responding)) be better, for determinism?

Actually the list() isn't necessary; sorted(servers_responding) does the same thing.

comment:6 Changed at 2012-05-17T00:02:43Z by warner

  • Resolution set to fixed
  • Status changed from new to closed

Landed in cc366903 and 9acf5bee.

comment:7 Changed at 2012-05-17T00:19:31Z by davidsarah

Also applied to 1.9.2: [5500/1.9.2], [5499/1.9.2]

Note: See TracTickets for help on using tickets.