Opened at 2010-10-07T21:19:41Z
Last modified at 2010-10-31T02:57:23Z
#1223 closed defect
got 'WrongSegmentError' during repair — at Version 6
Reported by: | francois | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | 1.8.1 |
Component: | code-encoding | Version: | 1.8.0 |
Keywords: | regression repair performance news-needed | Cc: | francois@… |
Launchpad Bug: |
Description (last modified by francois)
As I was working to improve the logging of 'tahoe deep-check' and 'tahoe check' (another ticket coming soon), I deleted shares by hand from 22 different tahoe nodes to trigger a repair.
Encoding parameters of this file were N=66 and K=22.
The complete debug log as extracted by 'flogtool' is attached to this ticket.
$ tahoe check --repair URI:CHK:XXXXX
ERROR: 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/foolscap/eventual.py", line 26, in _turn
    cb(*args, **kwargs)
  File "/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/node.py", line 472, in _deliver
    d.callback(result) # might actually be an errback
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 280, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 354, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/segmentation.py", line 116, in _got_segment
    raise WrongSegmentError("I was given the wrong data.")
allmydata.immutable.downloader.common.WrongSegmentError: I was given the wrong data.
Change History (7)
Changed at 2010-10-07T21:20:02Z by francois
comment:1 Changed at 2010-10-07T21:43:54Z by warner
- Description modified (diff)
comment:2 Changed at 2010-10-07T21:44:37Z by warner
Francois notes that the filesize was 135 bytes.
comment:3 Changed at 2010-10-07T22:10:22Z by warner
gleaned so far: the file has one segment. The repairer starts with a get_segsize(), which is currently lazily-implemented as get_segment(0). Log messages up through 2864211 are the get_segment(0), at which point the upload process starts, and spends through 2864212 performing upload-share-placement.
The weird bit starts on message 2864212, where the repairer performs a 7-byte read. It's as if the repairer is confused about the segment size (or the repairer's uploader is confused about what a good chunksize should be), and does a bunch of tiny reads instead of one whole segment. That's the first problem, but it's merely a performance issue, not fatal.
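To put a number on the performance cost, here is a minimal sketch of how a 135-byte segment splits into 7-byte reads. `chunk_ranges` is a hypothetical helper written for this illustration, not a function from the Tahoe source, and it is written for a current Python rather than the 2.6 used in the report. It also shows the termination one would expect from a reader: the last range is clamped to the segment end and the loop stops there.

```python
def chunk_ranges(total_size, chunk_size):
    """Yield half-open (start, end) ranges that cover [0, total_size).

    Hypothetical helper for illustration only; not Tahoe code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total_size:
        # Clamp the final range so it never runs past the end of the data.
        end = min(start + chunk_size, total_size)
        yield (start, end)
        start = end


# 7-byte reads over a 135-byte segment, versus a single whole-segment read.
ranges = list(chunk_ranges(135, 7))
whole = list(chunk_ranges(135, 135))
```

Under these assumptions `ranges` holds 20 entries ending with `(133, 135)`, while `whole` holds a single `(0, 135)` entry, so the tiny reads cost roughly twenty round trips through the segmentation layer where one would do.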
The fatal problem is some sort of fencepost error. Grepping for "Segmentation got data" shows a series of 7-byte reads that ends badly (remembering that this is a 135-byte file):
22:47:17.975 L20 []#2864526 Segmentation got data: want [0-7), given [0-135), for segnum=0
22:47:18.088 L20 []#2864841 Segmentation got data: want [7-14), given [0-135), for segnum=0
...
22:47:30.757 L20 []#2869881 Segmentation got data: want [119-126), given [0-135), for segnum=0
22:47:31.694 L20 []#2870196 Segmentation got data: want [126-133), given [0-135), for segnum=0
22:47:32.807 L20 []#2870511 Segmentation got data: want [133-135), given [0-135), for segnum=0
22:47:32.953 L20 []#2870826 Segmentation got data: want [140-135), given [0-135), for segnum=0
The [133-135) should have been the last read, but for some reason it went further and did that bogus [140-135) read. The "140" offset is beyond the end of the file, and of course having a negative size is also a problem.
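For anyone scanning a longer flog dump for the same symptom, the following is a rough sketch of a checker for "Segmentation got data" lines. The regular expression is inferred only from the log excerpt above, and `check_line` is a hypothetical name; neither comes from Tahoe or foolscap.

```python
import re

# Pattern inferred from the log lines quoted in this ticket (an assumption,
# not a documented log format).
WANT_GIVEN = re.compile(
    r"want \[(\d+)-(\d+)\), given \[(\d+)-(\d+)\), for segnum=(\d+)"
)


def check_line(line):
    """Return a list of problems with the 'want' range, or None if no match."""
    m = WANT_GIVEN.search(line)
    if m is None:
        return None
    want_start, want_end, given_start, given_end, _segnum = map(int, m.groups())
    problems = []
    if want_end < want_start:
        problems.append("negative length")
    if want_start < given_start or want_start >= given_end:
        problems.append("start outside given data")
    if want_end > given_end:
        problems.append("end outside given data")
    return problems


good = check_line("Segmentation got data: want [133-135), given [0-135), for segnum=0")
bad = check_line("Segmentation got data: want [140-135), given [0-135), for segnum=0")
```

With the two lines from the excerpt, `good` is an empty list and `bad` reports both the negative length and the start offset lying outside the given data.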
comment:4 Changed at 2010-10-07T22:17:04Z by francois
- Keywords regression repair added
- Milestone changed from undecided to 1.8.1
The same file repair worked perfectly well with 1.7.1.
comment:5 Changed at 2010-10-07T22:20:50Z by francois
- Description modified (diff)
comment:6 Changed at 2010-10-07T22:21:14Z by francois
- Description modified (diff)
(reformatted the 'tahoe check' output a bit for easier display)