#875 closed defect (fixed)

errors during add-lease cause checker false-negatives

Reported by: warner Owned by: warner
Priority: major Milestone: 1.6.0
Component: code-encoding Version: 1.5.0
Keywords: repair leases reliability Cc:
Launchpad Bug:

Description

Francois observed failures on the 25c3 grid (looking very much like the exceptions in #786) when running "check --repair --add-lease" on a directory whose shares were all stored on an old (tahoe-1.2.0) server.

The mutable checker's add-lease code was written such that any unexpected error in the add-lease call caused the checker to ignore any shares reported by the simultaneous readv call. tahoe-1.2.0 had a bug in its latency-measure code that made add-lease always throw a KeyError (I'd mis-analyzed the relative levels of support in my notes in the NEWS file).

So when he ran "check --add-lease --repair", the add-lease bug made the checker think no shares were available. It then attempted a repair, and the lack of any shares caused the repair to fail with the weird TypeError that is the focus of #786.
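The failure mode can be illustrated with a minimal sketch. This is not the actual checker code: Tahoe uses Twisted Deferreds and remote calls, whereas this uses stdlib asyncio, and the names readv, add_lease, coupled_query and needs_repair are hypothetical stand-ins. It only shows how a shared error path lets an add-lease failure hide shares that readv actually found.

```python
import asyncio

async def readv(server):
    # Hypothetical stand-in for the remote readv call: the server holds one share.
    return {0: b"share-data"}

async def add_lease(server):
    # Mimics the tahoe-1.2.0 behaviour described above: add-lease always
    # raises KeyError.
    raise KeyError("add-lease failed on old server")

async def coupled_query(server):
    # Both calls share one error path, so an add-lease failure discards
    # the result that readv produced.
    try:
        shares, _ = await asyncio.gather(readv(server), add_lease(server))
    except Exception:
        return {}
    return shares

def needs_repair(shares):
    # Simplified assumption: the checker asks for repair when it saw no shares.
    return len(shares) == 0
```

With this coupling, a server that really holds a share is reported as holding none, and the checker goes on to request a repair that has nothing to work from.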

The fix is to separate out the add-lease response/errback path from the readv path. I want to record some errors but ignore the ones that I think are harmless and noisy, like the known limitations of older tahoe versions.
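A rough sketch of that separation, under stated assumptions: it uses stdlib asyncio rather than Twisted Deferreds, the function names and server labels are hypothetical, and treating KeyError as the "harmless" class is an illustrative choice, not a statement of what the real patch ignores. The point is only that add-lease gets its own error handler, so the readv result survives no matter what add-lease does.

```python
import asyncio

# Assumption: errors in this tuple are known, noisy limitations of old servers.
HARMLESS_ERRORS = (KeyError,)

async def readv(server):
    # Hypothetical stand-in for the remote readv call.
    return {0: b"share-data"}

async def add_lease(server):
    # Hypothetical stand-in for the remote add-lease call.
    if server == "old-1.2.0":
        raise KeyError("add-lease failed on old server")
    if server == "broken":
        raise RuntimeError("unexpected add-lease failure")
    return None

async def add_lease_isolated(server, problems):
    # add-lease has its own error path: harmless errors are dropped,
    # anything else is recorded, and nothing propagates to the caller.
    try:
        await add_lease(server)
    except HARMLESS_ERRORS:
        pass
    except Exception as e:
        problems.append((server, e))

async def query_server(server, problems):
    # The two requests are still issued together, but only readv
    # determines which shares are reported.
    shares, _ = await asyncio.gather(
        readv(server), add_lease_isolated(server, problems)
    )
    return shares
```

Calling query_server("old-1.2.0", problems) in this sketch returns the share and leaves problems empty, while query_server("broken", problems) returns the same share and records one entry for later inspection.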

The immutable checker code uses the same pipelined add-lease/DYHB queries, and also needs to be fixed.

Change History (2)

comment:1 Changed at 2009-12-29T23:11:58Z by warner

  • Milestone changed from undecided to 1.6.0
  • Resolution set to fixed
  • Status changed from new to closed

Ok, 794e32738fc654ae should fix this one (both mutable and immutable).

comment:2 Changed at 2009-12-30T00:38:19Z by davidsarah

  • Keywords repair leases reliability added