got 'WrongSegmentError' during repair #1223

New issue

Closed

opened 2010-10-07 21:19:41 +00:00 by francois · 19 comments

francois commented

2010-10-07 21:19:41 +00:00

Owner

As I was working to improve the logging of 'tahoe deep-check' and 'tahoe check' (another ticket coming soon), I manually deleted shares from 22 different tahoe nodes to manually trigger a repair.

Encoding parameters of this file were N=66 and K=22.

The complete debug log as extracted by 'flogtool' is attached to this ticket.

$ tahoe check --repair URI:CHK:XXXXX
ERROR: 500 Internal Server Error
Traceback (most recent call last):
  File \"/usr/lib/pymodules/python2.6/foolscap/eventual.py\", line 26, in _turn
    cb(*args, **kwargs)
  File \"/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/node.py\", line 472, in _deliver
    d.callback(result) # might actually be an errback
  File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 280, in callback
    self._startRunCallbacks(result)
  File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 354, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File \"/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/segmentation.py\", line 116, in _got_segment
    raise WrongSegmentError(\"I was given the wrong data.\")
allmydata.immutable.downloader.common.WrongSegmentError: I was given the wrong data.

As I was working to improve the logging of 'tahoe deep-check' and 'tahoe check' (another ticket coming soon), I manually deleted shares from 22 different tahoe nodes to manually trigger a repair. Encoding parameters of this file were N=66 and K=22. The complete debug log as extracted by 'flogtool' is attached to this ticket. ``` $ tahoe check --repair URI:CHK:XXXXX ERROR: 500 Internal Server Error Traceback (most recent call last): File \"/usr/lib/pymodules/python2.6/foolscap/eventual.py\", line 26, in _turn cb(*args, **kwargs) File \"/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/node.py\", line 472, in _deliver d.callback(result) # might actually be an errback File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 280, in callback self._startRunCallbacks(result) File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 354, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"/usr/lib/python2.6/dist-packages/twisted/internet/defer.py\", line 371, in _runCallbacks self.result = callback(self.result, *args, **kw) File \"/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/segmentation.py\", line 116, in _got_segment raise WrongSegmentError(\"I was given the wrong data.\") allmydata.immutable.downloader.common.WrongSegmentError: I was given the wrong data. ```

tahoe-lafs added the

labels 2010-10-07 21:19:41 +00:00

tahoe-lafs added this to the undecided milestone 2010-10-07 21:19:41 +00:00

francois commented

2010-10-07 21:20:02 +00:00

Author

Owner

Attachment debug.log (743646 bytes) added

**Attachment** debug.log (743646 bytes) added

debug.log

726 KiB

warner commented

2010-10-07 21:43:54 +00:00

(reformatted the 'tahoe check' output a bit for easier display)

warner commented

2010-10-07 21:44:37 +00:00

Francois notes that the filesize was 135 bytes.

warner commented

2010-10-07 22:10:22 +00:00

gleaned so far: the file has one segment. The repairer starts with a
get_segsize(), which is currently lazily-implemented as
get_segment(0). Log messages up through 2864211 are the
get_segment(0), at which point the upload process starts, and spends
through 2864212 performing upload-share-placement.

The weird bit starts on message 2864212, where the repairer performs a 7-byte
read. It's as if the repairer is confused about the segment size (or the
repairer's uploader is confused about what a good chunksize should be), and
does a bunch of tiny reads instead of one whole segment. That's the first
problem, but it's merely a performance issue, not fatal.

The fatal problem is some sort of fencepost error. Grepping for "Segmentation
got data" shows a series of 7-byte reads that ends badly (remembering that
this is a 135-byte file):

22:47:17.975 L20 []#2864526 Segmentation got data: want [0-7), given [0-135), for segnum=0
22:47:18.088 L20 []#2864841 Segmentation got data: want [7-14), given [0-135), for segnum=0
...
22:47:30.757 L20 []#2869881 Segmentation got data: want [119-126), given [0-135), for segnum=0
22:47:31.694 L20 []#2870196 Segmentation got data: want [126-133), given [0-135), for segnum=0
22:47:32.807 L20 []#2870511 Segmentation got data: want [133-135), given [0-135), for segnum=0
22:47:32.953 L20 []#2870826 Segmentation got data: want [140-135), given [0-135), for segnum=0

The [133-135) should have been the last read, but for some reason it
went further and did that bogus [140-135) read. The "140" offset is
beyond the end of the file, and of course having a negative size is also a
problem.

gleaned so far: the file has one segment. The repairer starts with a `get_segsize()`, which is currently lazily-implemented as `get_segment(0)`. Log messages up through 2864211 are the `get_segment(0)`, at which point the upload process starts, and spends through 2864212 performing upload-share-placement. The weird bit starts on message 2864212, where the repairer performs a 7-byte read. It's as if the repairer is confused about the segment size (or the repairer's uploader is confused about what a good chunksize should be), and does a bunch of tiny reads instead of one whole segment. That's the first problem, but it's merely a performance issue, not fatal. The fatal problem is some sort of fencepost error. Grepping for "Segmentation got data" shows a series of 7-byte reads that ends badly (remembering that this is a 135-byte file): ``` 22:47:17.975 L20 []#2864526 Segmentation got data: want [0-7), given [0-135), for segnum=0 22:47:18.088 L20 []#2864841 Segmentation got data: want [7-14), given [0-135), for segnum=0 ... 22:47:30.757 L20 []#2869881 Segmentation got data: want [119-126), given [0-135), for segnum=0 22:47:31.694 L20 []#2870196 Segmentation got data: want [126-133), given [0-135), for segnum=0 22:47:32.807 L20 []#2870511 Segmentation got data: want [133-135), given [0-135), for segnum=0 22:47:32.953 L20 []#2870826 Segmentation got data: want [140-135), given [0-135), for segnum=0 ``` The `[133-135)` should have been the last read, but for some reason it went further and did that bogus `[140-135)` read. The "140" offset is beyond the end of the file, and of course having a negative size is also a problem.

francois commented

2010-10-07 22:17:04 +00:00

Author

Owner

The same file repair worked perfectly well with 1.7.1.

tahoe-lafs modified the milestone from undecided to 1.8.1

2010-10-07 22:17:04 +00:00

warner commented

2010-10-07 22:57:30 +00:00

next observations: the Encoder (immutable/encode.py) is doing input_piece_size = codec.get_block_size(), which is then used as the read-size passed to uploadable.read_encrypted. I think this is much too small (by a factor of 'k').. we should probably be reading a full segment at a time.

The docs for CRSEncoder.encode say that it takes a list of 'k' strings, each of length X where X*k=data_size, which should be the same as codec.get_block_size. The code in Encoder._gather_data is doing a batch of 'k' reads, each of that same size. So even though we're doing a bunch of tiny reads, that code is doing the right thing.

(idea for speed improvement: change _gather_data to do larger reads, merge the arbitrary-sized response chunks into one big string, then split into get_block_size() strings. If each read_encrypted is expensive, this will help a lot, although I don't think there are any places where read_encrypted turns into a network call, since the Helper transfers the whole file to disk first before uploading)

So the next step is to investigate the upload process, to see why it appears to be overrunning the input file. Did it not invoke the tail-segment gathering code? Does it always overread, and we're only seeing a problem because the new-downloader's uploadable doesn't tolerate overreads?

next observations: the Encoder (immutable/encode.py) is doing `input_piece_size = codec.get_block_size()`, which is then used as the read-size passed to `uploadable.read_encrypted`. I think this is much too small (by a factor of 'k').. we should probably be reading a full segment at a time. The docs for `CRSEncoder.encode` say that it takes a list of 'k' strings, each of length X where X*k=data_size, which should be the same as `codec.get_block_size`. The code in `Encoder._gather_data` is doing a batch of 'k' reads, each of that same size. So even though we're doing a bunch of tiny reads, that code is doing the right thing. (idea for speed improvement: change `_gather_data` to do larger reads, merge the arbitrary-sized response chunks into one big string, then split into `get_block_size()` strings. If each `read_encrypted` is expensive, this will help a lot, although I don't think there are any places where `read_encrypted` turns into a network call, since the Helper transfers the whole file to disk first before uploading) So the next step is to investigate the upload process, to see why it appears to be overrunning the input file. Did it not invoke the tail-segment gathering code? Does it always overread, and we're only seeing a problem because the new-downloader's uploadable doesn't tolerate overreads?

warner commented

2010-10-07 23:26:23 +00:00

We could either add a one-segment cache to the downloader, so that you can do crazily-short sequential reads without so much overhead, or change the uploader to do larger (full-segment) reads, or both.

oh, and a note about performance: each of those 7-byte reads is causing a whole (154-byte) segment to be fetched (and then, after decoding, most of the segment is discarded). The new-downloader caches the hash trees, so fetching a segment requires one round-trip. On Francois' grid, which seems pretty fast, it takes about 100ms for each segment. Repairing the whole 135-byte file with those tiny reads then takes about 5 seconds overall, which is horrible. We could either add a one-segment cache to the downloader, so that you can do crazily-short sequential reads without so much overhead, or change the uploader to do larger (full-segment) reads, or both.

daira added

c/code-encoding

and removed

c/code

labels 2010-10-09 14:41:16 +00:00

zooko commented

2010-10-14 07:49:35 +00:00

Replying to warner:

next observations: the Encoder (immutable/encode.py) is doing input_piece_size = codec.get_block_size(),

in [_encode_segment()]source:trunk/src/allmydata/immutable/encode.py?rev=4308&annotate=blame#L299

which is then used as the read-size passed to uploadable.read_encrypted.

in [_gather_data()]source:trunk/src/allmydata/immutable/encode.py?rev=4308&annotate=blame#L370

I think this is much too small (by a factor of 'k').. we should probably be reading a full segment at a time.

Agreed—it looks like:

        d = self._uploadable.read_encrypted(input_chunk_size, False)

should have been:

        d = self._uploadable.read_encrypted(input_chunk_size*num_chunks, False)

also the tests for correctness below need to be changed:

                # non-tail segments should be the full segment size
                if length != input_chunk_size:

should be:

                # non-tail segments should be the full segment size
                if length != input_chunk_size*num_chunks:

Or even better I guess we should put at the top of the function:

        segment_size = input_chunk_size*num_chunks

and then the test code would say:

                # non-tail segments should be the full segment size
                if length != segment_size:

Okay it is my bed-time so I'm stopping here for now. Still need to inspect the rest of Brian's notes and see if I can do the next step: investigate the upload process.

Replying to [warner](/tahoe-lafs/trac/issues/1223#issuecomment-381494): > next observations: the Encoder (immutable/encode.py) is doing `input_piece_size = codec.get_block_size()`, in [_encode_segment()]source:trunk/src/allmydata/immutable/encode.py?rev=4308&annotate=blame#L299 > which is then used as the read-size passed to `uploadable.read_encrypted`. in [_gather_data()]source:trunk/src/allmydata/immutable/encode.py?rev=4308&annotate=blame#L370 > I think this is much too small (by a factor of 'k').. we should probably be reading a full segment at a time. Agreed—it looks like: ``` d = self._uploadable.read_encrypted(input_chunk_size, False) ``` should have been: ``` d = self._uploadable.read_encrypted(input_chunk_size*num_chunks, False) ``` also the tests for correctness below need to be changed: ``` # non-tail segments should be the full segment size if length != input_chunk_size: ``` should be: ``` # non-tail segments should be the full segment size if length != input_chunk_size*num_chunks: ``` Or even better I guess we should put at the top of the function: ``` segment_size = input_chunk_size*num_chunks ``` and then the test code would say: ``` # non-tail segments should be the full segment size if length != segment_size: ``` Okay it is my bed-time so I'm stopping here for now. Still need to inspect the rest of Brian's notes and see if I can do the next step: investigate the upload process.

warner commented

2010-10-15 08:16:22 +00:00

so I think there are two problems: one performance-harming, one crashing:

the read_encrypted(input_chunk_size) will read about segsize/k bytes at a time. When k=3 this isn't too bad, but when k=22, it starts to hurt (especially for a small file, or a short tail segment of a multiple-segment file). zfec wants data in pieces of that size (input_chunk_size comes from zfec), but that doesn't mean we have to do read_encrypted() in chunks of that size.
we should read full segments at a time, and then split them into input_chunk_size pieces after hashing. If we do it right, I don't think this will increase the memory footprint, although it will add another brief window where there's an extra 1*segsize in use (during the split and before we free the original segsize-sized buffer).
this performance problem is more significant with new-downloader than with old-downloader. I think old-downloader would have cached the ciphertext onto disk (the old CacheFileManager or something). A client doing sequential reads of small ranges would trigger a full download and then read tiny chunks off disk (really out of the kernel's dirty FS buffers) at RAM speeds. But new-downloader, which doesn't use a disk cache, responds to each read() call by fetching a single segment. But since it doesn't cache those segments anywhere, a client which does a lot of tiny reads will trigger a lot of segment fetches, taking a round-trip each.
the Repairer has probably been reading past the end of the input file all along. The old-downloader tolerated this (because it was really just reading the cachefile from disk). But the new-downloader does not, and in fact crashes when you ask it to read() with a starting offset that is beyond the end of the file
we should first fix Repairer to not do this, then we should fix new-downloader to tolerate it or at least raise a sensible exception

so I think there are two problems: one performance-harming, one crashing: * the read_encrypted(input_chunk_size) will read about segsize/k bytes at a time. When k=3 this isn't too bad, but when k=22, it starts to hurt (especially for a small file, or a short tail segment of a multiple-segment file). zfec wants data in pieces of that size (input_chunk_size comes from zfec), but that doesn't mean we have to do `read_encrypted()` in chunks of that size. * we should read full segments at a time, and then split them into `input_chunk_size` pieces after hashing. If we do it right, I don't think this will increase the memory footprint, although it will add another brief window where there's an extra 1*segsize in use (during the split and before we free the original segsize-sized buffer). * this performance problem is more significant with new-downloader than with old-downloader. I think old-downloader would have cached the ciphertext onto disk (the old `CacheFileManager` or something). A client doing sequential reads of small ranges would trigger a full download and then read tiny chunks off disk (really out of the kernel's dirty FS buffers) at RAM speeds. But new-downloader, which doesn't use a disk cache, responds to each read() call by fetching a single segment. But since it doesn't cache those segments anywhere, a client which does a lot of tiny reads will trigger a lot of segment fetches, taking a round-trip each. * the Repairer has probably been reading past the end of the input file all along. The old-downloader tolerated this (because it was really just reading the cachefile from disk). But the new-downloader does not, and in fact crashes when you ask it to `read()` with a starting offset that is beyond the end of the file * we should first fix Repairer to not do this, then we should fix new-downloader to tolerate it or at least raise a sensible exception

warner commented

2010-10-18 19:09:22 +00:00

Attachment 1223.diff (9769 bytes) added

potential fix

**Attachment** 1223.diff (9769 bytes) added potential fix

1223.diff

9.5 KiB

warner commented

2010-10-18 19:13:40 +00:00

this patch adds a test and fixes the issue. I'd like another pair of eyeballs on it, specifically because it changes the way that read_encrypted is called (segment-at-a-time instead of inputchunksize-at-a-time), and I'm concerned about inducing a fencepost error. But I think this new approach is a lot better: fewer reads, about the same memory footprint, remove some recursion. The only tiny drawback is that Uploadables (and other things that provide read_encrypted(), like the Repairer, and the Offloaded uploadable) are now constrained to act more like regular disk files: if you do a read_encrypted(123), they are obligated to give you exactly 123 bytes, unless you're at EOF, in which case they can give you less than you asked for exactly once. Previously we weren't imposing this requirement, allowing Uploadables to return random amounts of data on each read_encrypted() call.

this patch adds a test and fixes the issue. I'd like another pair of eyeballs on it, specifically because it changes the way that `read_encrypted` is called (segment-at-a-time instead of inputchunksize-at-a-time), and I'm concerned about inducing a fencepost error. But I think this new approach is a lot better: fewer reads, about the same memory footprint, remove some recursion. The only tiny drawback is that Uploadables (and other things that provide `read_encrypted()`, like the Repairer, and the Offloaded uploadable) are now constrained to act more like regular disk files: if you do a `read_encrypted(123)`, they are obligated to give you exactly 123 bytes, unless you're at EOF, in which case they can give you less than you asked for exactly once. Previously we weren't imposing this requirement, allowing Uploadables to return random amounts of data on each `read_encrypted()` call.

kevan commented

2010-10-27 23:08:09 +00:00

Author

Owner

Your patch contains the performance improvements discussed in the first part of comment:381499, and makes the new downloader more able to handle read requests beyond EOF than it was before, as mentioned in the second part of comment:381499, but doesn't contain any fixes for whatever repairer code was responsible for the weird offsets to begin with, unless I'm missing something obvious.

Other than that, it looks good to me; I didn't see any newly-introduced fencepost errors, nor any other issues.

Your patch contains the performance improvements discussed in the first part of [comment:381499](/tahoe-lafs/trac/issues/1223#issuecomment-381499), and makes the new downloader more able to handle read requests beyond EOF than it was before, as mentioned in the second part of [comment:381499](/tahoe-lafs/trac/issues/1223#issuecomment-381499), but doesn't contain any fixes for whatever repairer code was responsible for the weird offsets to begin with, unless I'm missing something obvious. Other than that, it looks good to me; I didn't see any newly-introduced fencepost errors, nor any other issues.

zooko commented

2010-10-28 06:00:01 +00:00

Brian, is the idea to apply this patch before actually fixing the repairer code? Or shall we wait?

warner commented

2010-10-29 08:33:43 +00:00

we can go ahead and apply this patch. The overrun (which is actually a behavior of the uploader, in the way it calls read_encrypted, rather than the repairer, which just glues an uploader to a downloader) is ok, now that the interface has changed to require that Uploadables tolerate overrun. We could change the uploader to carefully track how much data they've requested and reduce their readsize when they get close to the end, but I don't think it's worth it (it would make the code more complex, and now it only overruns once, at the very end).

Kevan: thanks for the review! I'll land it now.

we can go ahead and apply this patch. The overrun (which is actually a behavior of the uploader, in the way it calls `read_encrypted`, rather than the repairer, which just glues an uploader to a downloader) is ok, now that the interface has changed to require that Uploadables tolerate overrun. We could change the uploader to carefully track how much data they've requested and reduce their readsize when they get close to the end, but I don't think it's worth it (it would make the code more complex, and now it only overruns once, at the very end). Kevan: thanks for the review! I'll land it now.

Brian Warner <warner@lothar.com> commented

2010-10-29 09:00:26 +00:00

Author

Owner

In changeset:c18953c169fe9e7c:

fix #1223, crash+inefficiency during repair due to read overrun

* repairer (really the uploader) reads beyond end of input file (Uploadable)
* new-downloader does not tolerate overreads
* uploader does lots of tiny reads (inefficient)

This fixes the last two. The uploader still does a single overread at the end
of the input file, but now that's ok so we can leave it in place. The
uploader now expects the Uploadable to behave like a normal disk
file (reading beyond EOF will return less data than was asked for), and now
the new-downloadable behaves that way.

In changeset:c18953c169fe9e7c: ``` fix #1223, crash+inefficiency during repair due to read overrun * repairer (really the uploader) reads beyond end of input file (Uploadable) * new-downloader does not tolerate overreads * uploader does lots of tiny reads (inefficient) This fixes the last two. The uploader still does a single overread at the end of the input file, but now that's ok so we can leave it in place. The uploader now expects the Uploadable to behave like a normal disk file (reading beyond EOF will return less data than was asked for), and now the new-downloadable behaves that way. ```

tahoe-lafs added the

r/fixed

label 2010-10-29 09:00:26 +00:00

Brian Warner <warner@lothar.com> closed this issue

2010-10-29 09:00:26 +00:00

zooko commented

2010-10-29 12:26:59 +00:00

This needs a source:NEWS entry.

francois commented

2010-10-30 11:36:52 +00:00

Author

Owner

Attachment news-1223.dpatch (11424 bytes) added

**Attachment** news-1223.dpatch (11424 bytes) added

news-1223.dpatch

11 KiB

francois commented

2010-10-30 11:38:18 +00:00

Author

Owner

Replying to zooko:

This needs a source:NEWS entry.

Here is it, please note that the patch bundle also contains the NEWS entry for #1045 because of darcs dependency handling.

Replying to [zooko](/tahoe-lafs/trac/issues/1223#issuecomment-381507): > This needs a source:NEWS entry. Here is it, please note that the patch bundle also contains the NEWS entry for #1045 because of darcs dependency handling.

Rows
Columns