Context Navigation

← Previous Ticket
Next Ticket →

Opened at 2010-10-07T21:19:41Z

Closed at 2010-10-31T02:57:23Z

#1223 closed defect (fixed)

got 'WrongSegmentError' during repair

Reported by:	francois	Owned by:	davidsarah
Priority:	major	Milestone:	1.8.1
Component:	code-encoding	Version:	1.8.0
Keywords:	regression repair performance news-needed	Cc:	francois@…
Launchpad Bug:

Description (last modified by francois)

As I was working to improve the logging of 'tahoe deep-check' and 'tahoe check' (another ticket coming soon), I manually deleted shares from 22 different tahoe nodes to manually trigger a repair.

Encoding parameters of this file were N=66 and K=22.

The complete debug log as extracted by 'flogtool' is attached to this ticket.

$ tahoe check --repair URI:CHK:XXXXX
ERROR: 500 Internal Server Error
Traceback (most recent call last):
  File \"/usr/lib/pymodules/python2.6/foolscap/eventual.py\", line 26, in _turn
    cb(*args, **kwargs)
  File \"/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/node.py\", line 472, in _deliver
    d.callback(result) # might actually be an errback
  File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 280, in callback
    self._startRunCallbacks(result)
  File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 354, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File \"/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/segmentation.py\", line 116, in _got_segment
    raise WrongSegmentError(\"I was given the wrong data.\")
allmydata.immutable.downloader.common.WrongSegmentError: I was given the wrong data.

Attachments (3)

debug.log (726.2 KB) - added by francois at 2010-10-07T21:20:02Z.
1223.diff (9.5 KB) - added by warner at 2010-10-18T19:09:22Z.: potential fix
news-1223.dpatch (11.2 KB) - added by francois at 2010-10-30T11:36:52Z.

Download all attachments as: .zip

Change History (26)

Changed at 2010-10-07T21:20:02Z by francois

Attachment debug.log added

comment:1 Changed at 2010-10-07T21:43:54Z by warner

Description modified (diff)

(reformatted the 'tahoe check' output a bit for easier display)

comment:2 Changed at 2010-10-07T21:44:37Z by warner

Francois notes that the filesize was 135 bytes.

comment:3 Changed at 2010-10-07T22:10:22Z by warner

gleaned so far: the file has one segment. The repairer starts with a get_segsize(), which is currently lazily-implemented as get_segment(0). Log messages up through 2864211 are the get_segment(0), at which point the upload process starts, and spends through 2864212 performing upload-share-placement.

The weird bit starts on message 2864212, where the repairer performs a 7-byte read. It's as if the repairer is confused about the segment size (or the repairer's uploader is confused about what a good chunksize should be), and does a bunch of tiny reads instead of one whole segment. That's the first problem, but it's merely a performance issue, not fatal.

The fatal problem is some sort of fencepost error. Grepping for "Segmentation got data" shows a series of 7-byte reads that ends badly (remembering that this is a 135-byte file):

22:47:17.975 L20 []#2864526 Segmentation got data: want [0-7), given [0-135), for segnum=0
22:47:18.088 L20 []#2864841 Segmentation got data: want [7-14), given [0-135), for segnum=0
...
22:47:30.757 L20 []#2869881 Segmentation got data: want [119-126), given [0-135), for segnum=0
22:47:31.694 L20 []#2870196 Segmentation got data: want [126-133), given [0-135), for segnum=0
22:47:32.807 L20 []#2870511 Segmentation got data: want [133-135), given [0-135), for segnum=0
22:47:32.953 L20 []#2870826 Segmentation got data: want [140-135), given [0-135), for segnum=0

The [133-135) should have been the last read, but for some reason it went further and did that bogus [140-135) read. The "140" offset is beyond the end of the file, and of course having a negative size is also a problem.

comment:4 Changed at 2010-10-07T22:17:04Z by francois

Keywords regression repair added
Milestone changed from undecided to 1.8.1

The same file repair worked perfectly well with 1.7.1.

comment:5 Changed at 2010-10-07T22:20:50Z by francois

Description modified (diff)

comment:6 Changed at 2010-10-07T22:21:14Z by francois

Description modified (diff)

comment:7 follow-up: ↓ 10 Changed at 2010-10-07T22:57:30Z by warner

next observations: the Encoder (immutable/encode.py) is doing input_piece_size = codec.get_block_size(), which is then used as the read-size passed to uploadable.read_encrypted. I think this is much too small (by a factor of 'k').. we should probably be reading a full segment at a time.

The docs for CRSEncoder.encode say that it takes a list of 'k' strings, each of length X where X*k=data_size, which should be the same as codec.get_block_size. The code in Encoder._gather_data is doing a batch of 'k' reads, each of that same size. So even though we're doing a bunch of tiny reads, that code is doing the right thing.

(idea for speed improvement: change _gather_data to do larger reads, merge the arbitrary-sized response chunks into one big string, then split into get_block_size() strings. If each read_encrypted is expensive, this will help a lot, although I don't think there are any places where read_encrypted turns into a network call, since the Helper transfers the whole file to disk first before uploading)

So the next step is to investigate the upload process, to see why it appears to be overrunning the input file. Did it not invoke the tail-segment gathering code? Does it always overread, and we're only seeing a problem because the new-downloader's uploadable doesn't tolerate overreads?

comment:8 Changed at 2010-10-07T23:26:23Z by warner

oh, and a note about performance: each of those 7-byte reads is causing a whole (154-byte) segment to be fetched (and then, after decoding, most of the segment is discarded). The new-downloader caches the hash trees, so fetching a segment requires one round-trip. On Francois' grid, which seems pretty fast, it takes about 100ms for each segment. Repairing the whole 135-byte file with those tiny reads then takes about 5 seconds overall, which is horrible.

We could either add a one-segment cache to the downloader, so that you can do crazily-short sequential reads without so much overhead, or change the uploader to do larger (full-segment) reads, or both.

comment:9 Changed at 2010-10-09T14:41:16Z by davidsarah

Component changed from code to code-encoding
Keywords performance added

comment:10 in reply to: ↑ 7 Changed at 2010-10-14T07:49:35Z by zooko

Replying to warner:

next observations: the Encoder (immutable/encode.py) is doing input_piece_size = codec.get_block_size(),

in _encode_segment()

which is then used as the read-size passed to uploadable.read_encrypted.

in _gather_data()

I think this is much too small (by a factor of 'k').. we should probably be reading a full segment at a time.

Agreed—it looks like:

        d = self._uploadable.read_encrypted(input_chunk_size, False)

should have been:

        d = self._uploadable.read_encrypted(input_chunk_size*num_chunks, False)

also the tests for correctness below need to be changed:

                # non-tail segments should be the full segment size
                if length != input_chunk_size:

should be:

                # non-tail segments should be the full segment size
                if length != input_chunk_size*num_chunks:

Or even better I guess we should put at the top of the function:

        segment_size = input_chunk_size*num_chunks

and then the test code would say:

                # non-tail segments should be the full segment size
                if length != segment_size:

Okay it is my bed-time so I'm stopping here for now. Still need to inspect the rest of Brian's notes and see if I can do the next step: investigate the upload process.

comment:11 Changed at 2010-10-15T08:16:22Z by warner

so I think there are two problems: one performance-harming, one crashing:

the read_encrypted(input_chunk_size) will read about segsize/k bytes at a time. When k=3 this isn't too bad, but when k=22, it starts to hurt (especially for a small file, or a short tail segment of a multiple-segment file). zfec wants data in pieces of that size (input_chunk_size comes from zfec), but that doesn't mean we have to do read_encrypted() in chunks of that size.
- we should read full segments at a time, and then split them into input_chunk_size pieces after hashing. If we do it right, I don't think this will increase the memory footprint, although it will add another brief window where there's an extra 1*segsize in use (during the split and before we free the original segsize-sized buffer).
- this performance problem is more significant with new-downloader than with old-downloader. I think old-downloader would have cached the ciphertext onto disk (the old CacheFileManager or something). A client doing sequential reads of small ranges would trigger a full download and then read tiny chunks off disk (really out of the kernel's dirty FS buffers) at RAM speeds. But new-downloader, which doesn't use a disk cache, responds to each read() call by fetching a single segment. But since it doesn't cache those segments anywhere, a client which does a lot of tiny reads will trigger a lot of segment fetches, taking a round-trip each.

the Repairer has probably been reading past the end of the input file all along. The old-downloader tolerated this (because it was really just reading the cachefile from disk). But the new-downloader does not, and in fact crashes when you ask it to read() with a starting offset that is beyond the end of the file
- we should first fix Repairer to not do this, then we should fix new-downloader to tolerate it or at least raise a sensible exception

Changed at 2010-10-18T19:09:22Z by warner

Attachment 1223.diff added

potential fix

comment:12 Changed at 2010-10-18T19:13:40Z by warner

this patch adds a test and fixes the issue. I'd like another pair of eyeballs on it, specifically because it changes the way that read_encrypted is called (segment-at-a-time instead of inputchunksize-at-a-time), and I'm concerned about inducing a fencepost error. But I think this new approach is a lot better: fewer reads, about the same memory footprint, remove some recursion. The only tiny drawback is that Uploadables (and other things that provide read_encrypted(), like the Repairer, and the Offloaded uploadable) are now constrained to act more like regular disk files: if you do a read_encrypted(123), they are obligated to give you exactly 123 bytes, unless you're at EOF, in which case they can give you less than you asked for exactly once. Previously we weren't imposing this requirement, allowing Uploadables to return random amounts of data on each read_encrypted() call.

comment:13 Changed at 2010-10-19T06:58:09Z by warner

Keywords review-needed added

comment:14 Changed at 2010-10-27T23:08:09Z by kevan

Your patch contains the performance improvements discussed in the first part of comment:11, and makes the new downloader more able to handle read requests beyond EOF than it was before, as mentioned in the second part of comment:11, but doesn't contain any fixes for whatever repairer code was responsible for the weird offsets to begin with, unless I'm missing something obvious.

Other than that, it looks good to me; I didn't see any newly-introduced fencepost errors, nor any other issues.

comment:15 Changed at 2010-10-27T23:08:37Z by kevan

Keywords review-needed removed
Owner changed from somebody to warner

comment:16 Changed at 2010-10-28T06:00:01Z by zooko

Brian, is the idea to apply this patch before actually fixing the repairer code? Or shall we wait?

comment:17 Changed at 2010-10-29T08:33:43Z by warner

we can go ahead and apply this patch. The overrun (which is actually a behavior of the uploader, in the way it calls read_encrypted, rather than the repairer, which just glues an uploader to a downloader) is ok, now that the interface has changed to require that Uploadables tolerate overrun. We could change the uploader to carefully track how much data they've requested and reduce their readsize when they get close to the end, but I don't think it's worth it (it would make the code more complex, and now it only overruns once, at the very end).

Kevan: thanks for the review! I'll land it now.

comment:18 Changed at 2010-10-29T09:00:26Z by Brian Warner <warner@…>

Resolution set to fixed
Status changed from new to closed

In c18953c169fe9e7c:

fix #1223, crash+inefficiency during repair due to read overrun

repairer (really the uploader) reads beyond end of input file (Uploadable)
new-downloader does not tolerate overreads
uploader does lots of tiny reads (inefficient)

This fixes the last two. The uploader still does a single overread at the end
of the input file, but now that's ok so we can leave it in place. The
uploader now expects the Uploadable to behave like a normal disk
file (reading beyond EOF will return less data than was asked for), and now
the new-downloadable behaves that way.