#2459 closed defect (fixed)

webapi doesn't handle Range header correctly

Reported by: spreitzer
Owned by: daira
Priority: major
Milestone: 1.10.2
Component: code-frontend-web
Version: 1.10.0
Keywords: webapi reliability availability mutable retrieve Range http standards
Cc:
Launchpad Bug:

Description (last modified by daira)

The web-API hangs when a GET request is sent with a Range header whose end byte position equals the file size or size-1, if the cap is an unterminated string.

Given that the filecap contains 'hi':

130 sspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-0 http://127.0.0.1:3456/uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa                                                             
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 3456 (#0)
> GET /uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa HTTP/1.1
> Range: bytes=0-0
> User-Agent: curl/7.37.0
> Host: 127.0.0.1:3456
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Content-Length: 1
< Accept-Ranges: bytes
* Server TwistedWeb/14.0.2 is not blacklisted
< Server: TwistedWeb/14.0.2
< Content-Range: bytes 0-0/2
< Date: Sat, 04 Jul 2015 15:02:39 GMT
< Content-Type: text/plain
< 
* Connection #0 to host 127.0.0.1 left intact
hsspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-1 http://127.0.0.1:3456/uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa                                                                
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 3456 (#0)
> GET /uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa HTTP/1.1
> Range: bytes=0-1
> User-Agent: curl/7.37.0
> Host: 127.0.0.1:3456
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Content-Length: 2
< Accept-Ranges: bytes
* Server TwistedWeb/14.0.2 is not blacklisted
< Server: TwistedWeb/14.0.2
< Content-Range: bytes 0-1/2
< Date: Sat, 04 Jul 2015 15:02:50 GMT
< Content-Type: text/plain
< 
^C
130 sspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-2 http://127.0.0.1:3456/uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa                                                             
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 3456 (#0)
> GET /uri/URI:SSK:rei5f5wqbqycmq2mwtjygglyii:ho4xwwvgbvhwp6xyleafclcjikszuvilb2yzyap4slwgsnmz6joa HTTP/1.1
> Range: bytes=0-2
> User-Agent: curl/7.37.0
> Host: 127.0.0.1:3456
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Content-Length: 2
< Accept-Ranges: bytes
* Server TwistedWeb/14.0.2 is not blacklisted
< Server: TwistedWeb/14.0.2
< Content-Range: bytes 0-1/2
< Date: Sat, 04 Jul 2015 15:02:56 GMT
< Content-Type: text/plain
< 
^C

Apache httpd returns:

sspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-0 http://people.redhat.com/sspreitz/hi                                                                                                                                            
* Hostname was NOT found in DNS cache
*   Trying 10.5.19.28...
* Connected to people.redhat.com (10.5.19.28) port 80 (#0)
> GET /sspreitz/hi HTTP/1.1
> Range: bytes=0-0
> User-Agent: curl/7.37.0
> Host: people.redhat.com
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Date: Sat, 04 Jul 2015 15:06:32 GMT
* Server Apache is not blacklisted
< Server: Apache
< Last-Modified: Sat, 04 Jul 2015 15:06:06 GMT
< ETag: "2-51a0e030688e9"
< Accept-Ranges: bytes
< Content-Length: 1
< Content-Range: bytes 0-0/2
< Connection: close
< 
* Closing connection 0
hsspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-1 http://people.redhat.com/sspreitz/hi                                                                                                                                             
* Hostname was NOT found in DNS cache
*   Trying 10.5.19.28...
* Connected to people.redhat.com (10.5.19.28) port 80 (#0)
> GET /sspreitz/hi HTTP/1.1
> Range: bytes=0-1
> User-Agent: curl/7.37.0
> Host: people.redhat.com
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Date: Sat, 04 Jul 2015 15:06:36 GMT
* Server Apache is not blacklisted
< Server: Apache
< Last-Modified: Sat, 04 Jul 2015 15:06:06 GMT
< ETag: "2-51a0e030688e9"
< Accept-Ranges: bytes
< Content-Length: 2
< Content-Range: bytes 0-1/2
< Connection: close
< 
* Closing connection 0
hisspreitz@sspreitz:~/workspace/tahoefuse/src$ curl -v -r 0-2 http://people.redhat.com/sspreitz/hi
* Hostname was NOT found in DNS cache
*   Trying 10.5.19.28...
* Connected to people.redhat.com (10.5.19.28) port 80 (#0)
> GET /sspreitz/hi HTTP/1.1
> Range: bytes=0-2
> User-Agent: curl/7.37.0
> Host: people.redhat.com
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Date: Sat, 04 Jul 2015 15:06:42 GMT
* Server Apache is not blacklisted
< Server: Apache
< Last-Modified: Sat, 04 Jul 2015 15:06:06 GMT
< ETag: "2-51a0e030688e9"
< Accept-Ranges: bytes
< Content-Length: 2
< Content-Range: bytes 0-1/2
< Connection: close
< 
* Closing connection 0

Change History (13)

comment:1 Changed at 2015-07-07T16:45:23Z by daira

We can reproduce this problem with cURL and HTTPie. The command line for the latter is:

http -v http://127.0.0.1:3456/uri/${URI} 'Range:bytes=0-1'

for example. (This suggests that it is not a cURL-specific problem.)

Last edited at 2015-07-07T16:45:57Z by daira

comment:2 Changed at 2015-07-07T17:25:54Z by daira

This seems to be specific to SDMF, which is strange. (Also tried LIT, MDMF, and 100-byte immutable files.)

comment:3 Changed at 2015-07-07T20:39:04Z by daira

  • Description modified

See https://tools.ietf.org/html/rfc2616#section-14.16 for the HTTP/1.1 spec. The correct response for the Range: bytes=0-2 case is the whole file. (A '416 Requested Range not satisfiable' error should not be returned in that case, because the requested range does have an overlap with the file contents. The web-API seems to correctly return a 416 error if the starting byte offset is past the end of the file.)
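The clamping rule described above can be sketched as follows. This is a minimal illustration of the RFC 2616 behaviour for a single `bytes=first-last` range, not the web-API's actual parser; the function names `resolve_byte_range` and `content_range_header` are hypothetical.

```python
def resolve_byte_range(first, last, size):
    """Resolve one 'bytes=first-last' range against an entity of `size` bytes.

    Returns (status, first, last) with inclusive byte positions.
    Hypothetical helper for illustration; not taken from the Tahoe-LAFS code.
    """
    if first > last:
        # RFC 2616 treats this as a syntactically invalid range spec.
        raise ValueError("last-byte-pos is smaller than first-byte-pos")
    if first >= size:
        # No overlap with the entity at all: 416 is appropriate.
        return (416, None, None)
    # A last-byte-pos at or past the end is clamped to the final byte.
    return (206, first, min(last, size - 1))


def content_range_header(first, last, size):
    """Format the Content-Range value for a satisfiable range."""
    return "bytes %d-%d/%d" % (first, last, size)


if __name__ == "__main__":
    # The three requests from the description, against the 2-byte file 'hi'.
    for first, last in [(0, 0), (0, 1), (0, 2)]:
        print((first, last), resolve_byte_range(first, last, 2))
```

Under this rule, `bytes=0-1` and `bytes=0-2` both resolve to bytes 0-1 of the 2-byte file, which matches the Content-Range values in the transcripts in the description.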

There is currently no test for Range requests for SDMF in test_mutable.py; only for MDMF. I have added one for SDMF on the https://github.com/tahoe-lafs/tahoe-lafs/commits/2659.test-sdmf-version-partial-read.0 branch.

The behaviour of this test (Version.test_partial_read_sdmf_*) is confusing; it works when the data is 100 bytes or 2 bytes, but not if it is 90 bytes. Perhaps there is an off-by-one error that only triggers when the size of the data is a multiple of k bytes? (See 2462#comment:3 for why that might happen.) But the case that hangs in this ticket is 2 bytes, which suggests that the test is not finding the same bug.

comment:4 Changed at 2015-07-17T21:53:22Z by daira

  • Milestone changed from undecided to 1.10.2

comment:5 Changed at 2015-07-18T01:19:26Z by daira

  • Keywords webapi reliability availability mutable retrieve Range http standards added
  • Owner set to daira
  • Status changed from new to assigned
  • Summary changed from webapi doesnt handle Range header correctly to webapi doesn't handle Range header correctly

comment:6 Changed at 2015-07-28T17:41:56Z by warner

I wasn't able to reproduce this with 1.10.1 when my encoding parameters were 3/3/10. I *was* able to reproduce it with k=1/H=1/N=1.

This suggests something more than just an off-by-one error in the mutable retrieve code. Do we round the segsize up to be a multiple of 'k'?

comment:7 Changed at 2015-07-28T18:04:16Z by Brian Warner <warner@…>

In a7e1dac27f0bc2b25b143f1be6f79d29c33ff41b/trunk:

Add tests for SDMF partial reads. refs #2459

Signed-off-by: Daira Hopwood <daira@…>

comment:8 Changed at 2015-07-28T18:04:18Z by Brian Warner <warner@…>

In 89e9076c41420a4145ae9a1db236dc2a1eb41259/trunk:

mutable/retrieve.py: rewrite partial-read handling

This should tolerate offset/size combinations that read the last byte of
the file, something which was broken before. It quits early in the case
of zero-byte reads, to simplify the resulting "which segments do I need"
logic. Probably addresses ticket:2459.

comment:9 Changed at 2015-07-28T19:56:38Z by warner

So, it's useful to know that SDMF files, even though they only have a single segment, still round up their recorded segsize value to be a multiple of shares.needed. So if you upload a 2-byte file, and your tahoe.cfg holds the default k of 3, then you'll wind up with segsize=3. If you've changed k=N=1, you'll wind up with segsize=2.
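The rounding described above can be checked with a small sketch. The helper name `round_up_to_multiple` is an assumption made for illustration and is not claimed to be the function Tahoe-LAFS uses.

```python
def round_up_to_multiple(n, k):
    """Return the smallest multiple of k that is >= n (k must be positive)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return ((n + k - 1) // k) * k


if __name__ == "__main__":
    # (file size, k) pairs: the 2-byte file with k=3 and with k=1.
    for size, k in [(2, 3), (2, 1)]:
        print("size=%d k=%d -> segsize=%d" % (size, k, round_up_to_multiple(size, k)))
```

With k=3 this gives segsize=3 for the 2-byte file, and with k=1 it gives segsize=2, as stated above. It also gives 90 for a 90-byte file and 102 for a 100-byte file when k=3, which may be why the 90-byte case in comment:3 behaved differently from the 100-byte and 2-byte cases.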

segsize is used by mutable/retrieve.py to decide which segments we're going to download. This only really makes sense for MDMF (which can have multiple segments), but when MDMF landed, SDMF got the same logic. It is also used in _set_segment() to figure out how much of each segment should be delivered to the consumer. This last function had several bugs, and one failure case was to read with offset=0 and size=(some multiple of the segsize). In this case, if you're only reading one segment, the data would be truncated completely, and nothing would be written to the consumer. web/filenode.py has already returned a Content-Length header by this point, so the HTTP client is expecting to see all the data it asked for. If the client is using a persistent connection, then they won't notice that the request has finished, and the client will hang.

It looks like _set_segment() would also have had problems if you set the offset= to something non-zero: I think it would have returned the wrong number of bytes. The problem didn't show up in the two-byte file when it was uploaded with k=3, because then the two-byte read wasn't a multiple of k, and the modulo bug wasn't triggered.
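To make the failure mode concrete, here is a hypothetical illustration of this class of modulo bug for a single-segment file. It is not the code that was in mutable/retrieve.py; the function names and the exact arithmetic are assumptions chosen only to reproduce the symptom described above (an empty result when the read ends exactly on a segment boundary).

```python
def deliver_buggy(segment, offset, size, segsize):
    """Hypothetical buggy slice: the end position wraps to 0 on a boundary."""
    end = (offset + size) % segsize  # 0 when offset+size is a multiple of segsize
    return segment[offset:end]


def deliver_fixed(segment, offset, size, segsize):
    """Hypothetical fix: clamp the end position instead of wrapping it."""
    end = min(offset + size, len(segment))
    return segment[offset:end]


if __name__ == "__main__":
    data = b"hi"
    # k=1 gives segsize=2, so reading 2 bytes ends exactly on the boundary.
    print(deliver_buggy(data, 0, 2, segsize=2))
    # k=3 gives segsize=3, so the same read does not trigger the wrap.
    print(deliver_buggy(data, 0, 2, segsize=3))
    print(deliver_fixed(data, 0, 2, segsize=2))
```

In this sketch the k=1 case delivers nothing to the consumer while the k=3 case delivers both bytes, mirroring the difference between the two encodings reported in comment:6.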

We rewrote _set_segment(), and I think it should now handle all inputs correctly.

comment:10 Changed at 2015-07-28T19:57:53Z by warner

It'd be nice to add a test_web.py case for this, but it needs to use a real SDMF file (uploaded with k=1). Most of the web tests are using fake file objects so they'll run faster.

comment:11 follow-up: Changed at 2015-07-28T23:36:44Z by warner

Actually, I'm ok with not adding a test. The test_mutable.py tests exercise the IReadable.read() offset/range arguments pretty well, and I don't think we've observed any problems in the HTTP Range header parser. Would anyone object if I closed this?

comment:12 in reply to: ↑ 11 Changed at 2015-07-29T00:19:02Z by daira

Replying to warner:

Actually, I'm ok with not adding a test. The test_mutable.py tests exercise the IReadable.read() offset/range arguments pretty well, and I don't think we've observed any problems in the HTTP Range header parser. Would anyone object if I closed this?

I'm ok with not adding a specific test of the HTTP layer, given that we already smoke-tested that, and the bug wasn't in that layer.

comment:13 Changed at 2015-07-29T00:42:21Z by warner

  • Resolution set to fixed
  • Status changed from assigned to closed

Ok, great, closing this one.
