webapi doesn't handle Range header correctly #2459

Closed
opened 2015-07-04 15:07:36 +00:00 by spreitzer · 11 comments
spreitzer commented 2015-07-04 15:07:36 +00:00
Owner

The web-API hangs if a GET request with a Range header is sent and the end of the byte range is equal to the file size or size-1, if the cap is an unterminated string.

Given that the filecap contains 'hi':

130 sspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-0 http://127.0.0.1:3456/uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa                                                             
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 3456 (#0)
> GET /uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa HTTP/1.1
> Range: bytes=0-0
> User-Agent: curl/7.37.0
> Host: 127.0.0.1:3456
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Content-Length: 1
< Accept-Ranges: bytes
* Server TwistedWeb/14.0.2 is not blacklisted
< Server: TwistedWeb/14.0.2
< Content-Range: bytes 0-0/2
< Date: Sat, 04 Jul 2015 15:02:39 GMT
< Content-Type: text/plain
< 
* Connection #0 to host 127.0.0.1 left intact
hsspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-1 http://127.0.0.1:3456/uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa                                                                
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 3456 (#0)
> GET /uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa HTTP/1.1
> Range: bytes=0-1
> User-Agent: curl/7.37.0
> Host: 127.0.0.1:3456
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Content-Length: 2
< Accept-Ranges: bytes
* Server TwistedWeb/14.0.2 is not blacklisted
< Server: TwistedWeb/14.0.2
< Content-Range: bytes 0-1/2
< Date: Sat, 04 Jul 2015 15:02:50 GMT
< Content-Type: text/plain
< 
^C
130 sspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-2 http://127.0.0.1:3456/uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa                                                             
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 3456 (#0)
> GET /uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa HTTP/1.1
> Range: bytes=0-2
> User-Agent: curl/7.37.0
> Host: 127.0.0.1:3456
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Content-Length: 2
< Accept-Ranges: bytes
* Server TwistedWeb/14.0.2 is not blacklisted
< Server: TwistedWeb/14.0.2
< Content-Range: bytes 0-1/2
< Date: Sat, 04 Jul 2015 15:02:56 GMT
< Content-Type: text/plain
< 
^C

Apache httpd returns:

sspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-0 http://people.redhat.com/sspreitz/hi                                                                                                                                            
* Hostname was NOT found in DNS cache
*   Trying 10.5.19.28...
* Connected to people.redhat.com (10.5.19.28) port 80 (#0)
> GET /sspreitz/hi HTTP/1.1
> Range: bytes=0-0
> User-Agent: curl/7.37.0
> Host: people.redhat.com
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Date: Sat, 04 Jul 2015 15:06:32 GMT
* Server Apache is not blacklisted
< Server: Apache
< Last-Modified: Sat, 04 Jul 2015 15:06:06 GMT
< ETag: "2-51a0e030688e9"
< Accept-Ranges: bytes
< Content-Length: 1
< Content-Range: bytes 0-0/2
< Connection: close
< 
* Closing connection 0
hsspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-1 http://people.redhat.com/sspreitz/hi                                                                                                                                             
* Hostname was NOT found in DNS cache
*   Trying 10.5.19.28...
* Connected to people.redhat.com (10.5.19.28) port 80 (#0)
> GET /sspreitz/hi HTTP/1.1
> Range: bytes=0-1
> User-Agent: curl/7.37.0
> Host: people.redhat.com
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Date: Sat, 04 Jul 2015 15:06:36 GMT
* Server Apache is not blacklisted
< Server: Apache
< Last-Modified: Sat, 04 Jul 2015 15:06:06 GMT
< ETag: "2-51a0e030688e9"
< Accept-Ranges: bytes
< Content-Length: 2
< Content-Range: bytes 0-1/2
< Connection: close
< 
* Closing connection 0
hisspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-2 http://people.redhat.com/sspreitz/hi
* Hostname was NOT found in DNS cache
*   Trying 10.5.19.28...
* Connected to people.redhat.com (10.5.19.28) port 80 (#0)
> GET /sspreitz/hi HTTP/1.1
> Range: bytes=0-2
> User-Agent: curl/7.37.0
> Host: people.redhat.com
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Date: Sat, 04 Jul 2015 15:06:42 GMT
* Server Apache is not blacklisted
< Server: Apache
< Last-Modified: Sat, 04 Jul 2015 15:06:06 GMT
< ETag: "2-51a0e030688e9"
< Accept-Ranges: bytes
< Content-Length: 2
< Content-Range: bytes 0-1/2
< Connection: close
< 
* Closing connection 0
tahoe-lafs added the
code-frontend-web
major
defect
1.10.0
labels 2015-07-04 15:07:36 +00:00
tahoe-lafs added this to the undecided milestone 2015-07-04 15:07:36 +00:00
daira commented 2015-07-07 16:45:23 +00:00
Author
Owner

We can reproduce this problem with cURL and HTTPie. The command line for the latter is:

http -v http://127.0.0.1:3456/uri/${URI} 'Range:bytes=0-1'

for example. (This suggests that it is not a cURL-specific problem.)
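
The same check can be scripted. The following is a minimal sketch using only the Python standard library; the function name `fetch_range` and the default timeout are illustrative assumptions, not part of Tahoe-LAFS. It turns the hang into a detectable timeout instead of requiring ^C.

```python
import http.client
import socket


def fetch_range(host, port, path, first, last, timeout=5.0):
    """Send a GET with 'Range: bytes=first-last' and read the body.

    Returns (status, body). body is None if the server sent headers but
    never delivered the promised Content-Length bytes within `timeout`
    seconds (the symptom described in this ticket).
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path,
                     headers={"Range": "bytes=%d-%d" % (first, last)})
        resp = conn.getresponse()
        try:
            body = resp.read()
        except socket.timeout:
            # Headers arrived, but the body stalled: report the hang.
            return resp.status, None
        return resp.status, body
    finally:
        conn.close()
```

Against a local node this would be called as `fetch_range("127.0.0.1", 3456, "/uri/" + cap, 0, 1)`, where `cap` is the SDMF filecap.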

daira commented 2015-07-07 17:25:54 +00:00
Author
Owner

This seems to be specific to SDMF, which is strange. (Also tried LIT, MDMF, and 100-byte immutable files.)

daira commented 2015-07-07 20:39:04 +00:00
Author
Owner

See https://tools.ietf.org/html/rfc2616#section-14.16 for the HTTP/1.1 spec. The correct response for the Range: bytes=0-2 case is the whole file. (A '416 Requested Range not satisfiable' error should not be returned in that case, because the requested range does have an overlap with the file contents. The web-API seems to correctly return a 416 error if the starting byte offset is past the end of the file.)
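
As a rough sketch of the rule described above (not the web-API's actual parser), a single `bytes=first-last` range could be resolved against the file size like this. The helper names are hypothetical, and the sketch assumes `first <= last` has already been checked.

```python
def resolve_byte_range(first, last, size):
    """Resolve one 'bytes=first-last' range against a file of `size` bytes.

    Returns (status, first, last) with `last` inclusive, or
    (416, None, None) when the range starts at or past the end of the file.
    Assumes first <= last (or last is None for an open-ended range).
    """
    if first >= size:
        # No overlap with the file contents: not satisfiable.
        return (416, None, None)
    if last is None or last >= size:
        # The range overlaps the file; clamp the end to the last byte.
        last = size - 1
    return (206, first, last)


def content_range_header(first, last, size):
    """Format the Content-Range value for a satisfiable range."""
    return "bytes %d-%d/%d" % (first, last, size)
```

For the two-byte file in this ticket, `bytes=0-2` resolves to bytes 0-1 of 2, which matches what Apache returned above.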

There is currently no test for Range requests for SDMF in test_mutable.py; only for MDMF. I have added one for SDMF on the https://github.com/tahoe-lafs/tahoe-lafs/commits/2659.test-sdmf-version-partial-read.0 branch.

The behaviour of this test (Version.test_partial_read_sdmf_*) is confusing; it works when the data is 100 bytes or 2 bytes, but not if it is 90 bytes. Perhaps there is an off-by-one error that only triggers when the size of the data is a multiple of k bytes? (See ticket:2462#comment:-1 for why that might happen.) But the case that hangs in this ticket is 2 bytes, which suggests that the test is not finding the same bug.

tahoe-lafs modified the milestone from undecided to 1.10.2 2015-07-17 21:53:22 +00:00
tahoe-lafs changed title from webapi doesnt handle Range header correctly to webapi doesn't handle Range header correctly 2015-07-18 01:19:26 +00:00
warner commented 2015-07-28 17:41:56 +00:00
Author
Owner

I wasn't able to reproduce this with 1.10.1 when my encoding parameters were 3/3/10. I was able to reproduce it with k=1/H=1/N=1.

This suggests something more than just an off-by-one error in the mutable retrieve code. Do we round the segsize up to be a multiple of 'k'?

Brian Warner <warner@lothar.com> commented 2015-07-28 18:04:16 +00:00
Author
Owner

In /tahoe-lafs/trac-2024-07-25/commit/a7e1dac27f0bc2b25b143f1be6f79d29c33ff41b:

Add tests for SDMF partial reads. refs #2459

Signed-off-by: Daira Hopwood <daira@jacaranda.org>
Brian Warner <warner@lothar.com> commented 2015-07-28 18:04:18 +00:00
Author
Owner

In /tahoe-lafs/trac-2024-07-25/commit/89e9076c41420a4145ae9a1db236dc2a1eb41259:

mutable/retrieve.py: rewrite partial-read handling

This should tolerate offset/size combinations that read the last byte of
the file, something which was broken before. It quits early in the case
of zero-byte reads, to simplify the resulting "which segments do I need"
logic. Probably addresses ticket:2459.
warner commented 2015-07-28 19:56:38 +00:00
Author
Owner

So, it's useful to know that SDMF files, even though they only have a single segment, still round up their recorded segsize value to be a multiple of shares.needed. So if you upload a 2-byte file, and your tahoe.cfg holds the default k of 3, then you'll wind up with segsize=3. If you've changed k=N=1, you'll wind up with segsize=2.
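
The rounding can be illustrated with a small sketch. The helper name `round_up_to_multiple` is made up for this example and is not claimed to be the function Tahoe-LAFS uses.

```python
def round_up_to_multiple(n, k):
    """Return the smallest multiple of k that is >= n (k must be positive)."""
    return ((n + k - 1) // k) * k


# A 2-byte SDMF file with the default k=3 records segsize=3.
segsize_default = round_up_to_multiple(2, 3)

# The same file uploaded with k=1 records segsize=2.
segsize_k1 = round_up_to_multiple(2, 1)
```

With k=1 the recorded segsize equals the file length, which is the configuration in which the hang was reproduced.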

segsize is used by mutable/retrieve.py to decide which segments we're going to download. This only really makes sense for MDMF (which can have multiple segments), but when MDMF landed, SDMF got the same logic. It is also used in _set_segment() to figure out how much of each segment should be delivered to the consumer. This last function had several bugs, and one failure case was to read with offset=0 and size=(some multiple of the segsize). In this case, if you're only reading one segment, the data would be truncated completely, and nothing would be written to the consumer. web/filenode.py has already returned a Content-Length header by this point, so the HTTP client is expecting to see all the data it asked for. If the client is using a persistent connection, then they won't notice that the request has finished, and the client will hang.

It looks like _set_segment() would also have had problems if you set the offset= to something non-zero: I think it would have returned the wrong number of bytes. The problem didn't show up in the two-byte file when it was uploaded with k=3, because then the two-byte read wasn't a multiple of k, and the modulo bug wasn't triggered.
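
To make the failure mode concrete, here is a hypothetical illustration of a modulo-based trim. It is not the code that was in `mutable/retrieve.py`; the function names and the single-segment simplification are assumptions made for the example.

```python
def buggy_trim(segment, offset, read_length, segsize):
    """Illustrative only: compute the end of the read with a modulo.

    When offset + read_length is an exact multiple of segsize, the
    modulo yields 0 and the slice comes back empty.
    """
    end = (offset + read_length) % segsize
    return segment[offset:end]


def fixed_trim(segment, offset, read_length):
    """Single-segment version without the modulo: slice the bytes asked for."""
    return segment[offset:offset + read_length]


data = b"hi"
# k=1: segsize=2, so a 2-byte read is a multiple of segsize and is lost.
lost = buggy_trim(data, 0, 2, 2)
# k=3: segsize=3, so the same read is not a multiple and survives.
kept = buggy_trim(data, 0, 2, 3)
```

In this sketch the `bytes=0-0` read (length 1) still works with segsize=2, while the length-2 read delivers nothing even though Content-Length: 2 has already been sent, which is consistent with the curl output at the top of the ticket.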

We rewrote _set_segment(), and I think it should now handle all inputs correctly.

warner commented 2015-07-28 19:57:53 +00:00
Author
Owner

It'd be nice to add a test_web.py case for this, but it needs to use a real SDMF file (uploaded with k=1). Most of the web tests are using fake file objects so they'll run faster.

warner commented 2015-07-28 23:36:44 +00:00
Author
Owner

Actually, I'm ok with not adding a test. The test_mutable.py tests exercise the IReadable.read() offset/range arguments pretty well, and I don't think we've observed any problems in the HTTP Range header parser. Would anyone object if I closed this?

daira commented 2015-07-29 00:19:02 +00:00
Author
Owner

Replying to warner:

> Actually, I'm ok with not adding a test. The test_mutable.py tests exercise the IReadable.read() offset/range arguments pretty well, and I don't think we've observed any problems in the HTTP Range header parser. Would anyone object if I closed this?

I'm ok with not adding a specific test of the HTTP layer, given that we already smoke-tested that, and the bug wasn't in that layer.

warner commented 2015-07-29 00:42:21 +00:00
Author
Owner

Ok, great, closing this one.

tahoe-lafs added the
fixed
label 2015-07-29 00:42:21 +00:00
warner closed this issue 2015-07-29 00:42:21 +00:00
Reference: tahoe-lafs/trac-2024-07-25#2459