#443 closed enhancement (fixed)

set ETag on immutable directories, short-circuit on cache hit

Reported by: warner Owned by: warner
Priority: major Milestone: 1.10.0
Component: code-frontend-web Version: 1.0.0
Keywords: etag performance news-needed Cc: jeremy@…, zooko
Launchpad Bug:

Description

When retrieving an immutable file, the ETag header should be set to the file's URI. When retrieving a mutable file, it should be set to the current roothash value. When retrieving a directory, it should also be set to the roothash value.

The server should also use the right sort of If-Modified-Since dance, using the ETag to let browser caches do the right thing.

Attachments (2)

imm-dir-etag.diff.txt (2.9 KB) - added by jsgf at 2010-03-12T00:00:24Z.
Add ETag support for immutable directories
short-circuit-imm-etag.diff.txt (2.7 KB) - added by jsgf at 2010-03-12T00:33:12Z.
Short circuit GET when client already has ETag

Download all attachments as: .zip

Change History (29)

comment:1 Changed at 2008-11-06T19:43:22Z by warner

I added code recently to set an ETag on immutable files (using the storage index.. I'm not sure whether to use this or the URI). It needs review, however, because the twisted.web req.setETag method needs to be called at a specific point in the response process (to make it interact with the if-etag-equals HTTP header), and I'm not sure we're doing it quite right.

We still don't have etags on mutable files.

comment:2 Changed at 2010-01-16T00:22:01Z by davidsarah

  • Keywords etag performance test added

#844 references http://allmydata.org/pipermail/tahoe-dev/2009-November/003221.html which claims that the immutable-file ETag support wasn't doing the right thing for (at least) Squid to recognize immutable files as unchanged. Let's make it the goal of this ticket to make sure that we have this support works properly for a representative sample of web proxies and browsers, and open another ticket to add ETag support for mutable files.

comment:3 Changed at 2010-01-16T00:22:51Z by davidsarah

  • Summary changed from webapi should provide an ETag to Check that webapi support for ETags on immutable files works properly

comment:4 follow-up: Changed at 2010-02-27T06:34:22Z by zooko

Can we add into this file to also check that it does the right thing for immutable directories? Or is that sufficiently different that it should be a separate ticket?

comment:5 in reply to: ↑ 4 Changed at 2010-02-27T08:31:08Z by davidsarah

  • Milestone changed from undecided to 1.7.0
  • Summary changed from Check that webapi support for ETags on immutable files works properly to Check that webapi support for ETags on immutable files/directories works properly

Replying to zooko:

Can we add into this file to also check that it does the right thing for immutable directories?

We can indeed.

comment:6 Changed at 2010-03-11T18:16:46Z by jsgf

  • Cc jeremy@… added
  • Owner set to jsgf

I'll have a look at this, since I'm mucking around in that area anyway (at least for immutable; I'll need someone to explain the situation with mutable files to me).

comment:7 Changed at 2010-03-11T23:12:11Z by jsgf

For immutable files:

Basic If-Match appears to be completely ignored; it always returns 200 rather than 412 (precondition failed), regardless of the provided etag.

If-None-Match will return "304 not modified" and no body, as it is supposed.

This is consistent with twisted web's server, which only checks for if-none-match. I'm not sure quite how the body is getting suppressed. We should be checking the result of setETag() to see if we have a etag hit or not; it may be its doing too much work.

Will check immutable dirs next.

Changed at 2010-03-12T00:00:24Z by jsgf

Add ETag support for immutable directories

Changed at 2010-03-12T00:33:12Z by jsgf

Short circuit GET when client already has ETag

comment:8 Changed at 2010-06-16T04:01:19Z by davidsarah

  • Keywords review-needed added
  • Milestone changed from 1.7.0 to 1.7.1

comment:9 Changed at 2010-07-08T05:16:13Z by zooko

Did anyone figure out why it was thought that the ETag mechanism wasn't working for Squid per comment:2? Was the explanation something to do with the thing about If-Match that jsgf mentioned in comment:7? I don't understand what If-Match is for.

comment:10 Changed at 2010-07-08T08:42:51Z by jsgf

I think if-match is for PUT/POST operations to make sure the entity at the specified URI is still in the expected state.

I'm not sure why squid might have problems with Tahoe. The posting referred to in comment:2 is pretty vague about what might be going wrong; it would be nice to see some actual analysis (or even just an HTTP protocol dump).

comment:11 Changed at 2010-07-08T13:40:00Z by zooko

  • Owner changed from jsgf to zooko
  • Status changed from new to assigned
  • Summary changed from Check that webapi support for ETags on immutable files/directories works properly to set ETag on immutable directories, short-circuit on cache hit

comment:12 Changed at 2010-07-08T14:24:53Z by zooko

  • Cc zooko added
  • Keywords review-needed removed
  • Owner changed from zooko to jsgf
  • Status changed from assigned to new

Well, in handling a filenode these patches say:

+            if si and req.setETag(base32.b2a(si)):

but in handling a dirnode these patches say:

+            if si and req.setETag('DIR:%s-%s' % (base32.b2a(si), t or "")):

Shouldn't the filenode also have t appended to its ETag so that people who request somefile?t=json don't get the same thing as the previous person who requested somefile? jsgf: please reply to this and, if appropriate, re-add the "review-needed" flag on this ticket.

comment:13 Changed at 2010-07-08T18:41:50Z by jsgf

Yes, that looks pretty dubious. The ETag should be computed by a single function everywhere, rather than redone in a bespoke fashion at each point. But let me double-check that there isn't something else going on there...

comment:14 Changed at 2010-07-08T18:43:48Z by jsgf

No, I see. The first one is for plain immutable files which only have one representation - their actual binary contents. The second is for directories, which are interpreted by Tahoe itself and can have multiple representations. If you access the directory as a just the raw immutable file then you'll end up with an ETag of the first form.

comment:15 Changed at 2010-07-08T23:54:52Z by zooko

But won't the current patch make the following thing go wrong, causing the second request to cache-hit when instead the second request needs to returning something entirely different.

(If I'm right, then this probably goes to show that our docs/frontends/webapi.txt needs to be reorganized so that people like jsgf can easily find this stuff.)

$ wget http://127.0.0.1:3456/uri/URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124
--2010-07-08 17:39:41--  http://127.0.0.1:3456/uri/URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124
Connecting to 127.0.0.1:3456... connected.
HTTP request sent, awaiting response... 200 OK
Length: 124 [text/plain]
Saving to: `URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124.1'

100%[================================================================================================================================================================>] 124         --.-K/s   in 0s      

2010-07-08 17:39:42 (7.88 MB/s) - `URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124.1' saved [124/124]

$ hexdump -C URI\:CHK\:u7xfz46l7lpwaxw6yjeqnotqk4\:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za\:3\:10\:124 
00000000  47 49 46 38 39 61 0a 00  3c 00 b3 ff 00 c0 c0 c0  |GIF89a..<.......|
00000010  cb cb e5 b1 b1 d8 f8 f8  fb 85 85 c2 e9 e9 f4 b5  |................|
00000020  b5 da dd dd ee ff ff ff  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 21 f9 04  |.............!..|
00000040  01 00 00 00 00 2c 00 00  00 00 0a 00 3c 00 00 04  |.....,......<...|
00000050  29 10 c9 59 0c 00 13 0d  72 ef 3c 5d 27 09 a1 18  |)..Y....r.<]'...|
00000060  94 22 da 9d 6a eb be 70  2c cf 74 6d df 78 ae ef  |."..j..p,.tm.x..|
00000070  7c ef ff c0 a0 70 48 2c  12 23 00 3b              ||....pH,.#.;|
0000007c
$ wget http://127.0.0.1:3456/uri/URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124?t=json
--2010-07-08 17:39:51--  http://127.0.0.1:3456/uri/URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124?t=json
Connecting to 127.0.0.1:3456... connected.
HTTP request sent, awaiting response... 200 OK
Length: 298 [text/plain]
Saving to: `URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124?t=json.1'

100%[================================================================================================================================================================>] 298         --.-K/s   in 0s      

2010-07-08 17:39:51 (18.9 MB/s) - `URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124?t=json.1' saved [298/298]

$ hexdump -C URI\:CHK\:u7xfz46l7lpwaxw6yjeqnotqk4\:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za\:3\:10\:124\?t\=json 
00000000  5b 0a 20 22 66 69 6c 65  6e 6f 64 65 22 2c 20 0a  |[. "filenode", .|
00000010  20 7b 0a 20 20 22 6d 75  74 61 62 6c 65 22 3a 20  | {.  "mutable": |
00000020  66 61 6c 73 65 2c 20 0a  20 20 22 76 65 72 69 66  |false, .  "verif|
00000030  79 5f 75 72 69 22 3a 20  22 55 52 49 3a 43 48 4b  |y_uri": "URI:CHK|
00000040  2d 56 65 72 69 66 69 65  72 3a 79 62 70 6f 64 6e  |-Verifier:ybpodn|
00000050  62 35 69 62 67 71 62 6b  61 74 6b 6d 68 64 65 6f  |b5ibgqbkatkmhdeo|
00000060  76 33 36 6d 3a 69 35 35  71 64 65 79 37 63 77 61  |v36m:i55qdey7cwa|
00000070  71 62 66 75 72 61 66 78  37 33 66 34 66 6c 78 6d  |qbfurafx73f4flxm|
00000080  79 6a 36 6c 68 71 37 6c  79 6e 62 6b 66 63 78 35  |yj6lhq7lynbkfcx5|
00000090  75 73 34 67 37 6c 35 7a  61 3a 33 3a 31 30 3a 31  |us4g7l5za:3:10:1|
000000a0  32 34 22 2c 20 0a 20 20  22 72 6f 5f 75 72 69 22  |24", .  "ro_uri"|
000000b0  3a 20 22 55 52 49 3a 43  48 4b 3a 75 37 78 66 7a  |: "URI:CHK:u7xfz|
000000c0  34 36 6c 37 6c 70 77 61  78 77 36 79 6a 65 71 6e  |46l7lpwaxw6yjeqn|
000000d0  6f 74 71 6b 34 3a 69 35  35 71 64 65 79 37 63 77  |otqk4:i55qdey7cw|
000000e0  61 71 62 66 75 72 61 66  78 37 33 66 34 66 6c 78  |aqbfurafx73f4flx|
000000f0  6d 79 6a 36 6c 68 71 37  6c 79 6e 62 6b 66 63 78  |myj6lhq7lynbkfcx|
00000100  35 75 73 34 67 37 6c 35  7a 61 3a 33 3a 31 30 3a  |5us4g7l5za:3:10:|
00000110  31 32 34 22 2c 20 0a 20  20 22 73 69 7a 65 22 3a  |124", .  "size":|
00000120  20 31 32 34 0a 20 7d 0a  5d 0a                    | 124. }.].|
0000012a

comment:16 Changed at 2010-07-10T23:38:03Z by zooko

  • Keywords test-needed added

I asked jsgf about this in email. He replied:

Yes the etag should distinguish different representations.

Which I believe means that comment:15 is right that the current patch is wrong. I guess this shows that we need a unit test which would be red with the current patch--a unit test that loads a file with one representation (GET /uri/$FILEREADCAP) and then loads it again with a different representation (GET /uri/$FILEREADCAP?t=json) and flunks the code under test if it gave back the same thing on the second query.

comment:17 Changed at 2010-07-17T07:33:35Z by zooko

  • Milestone changed from 1.7.1 to 1.8β

This probably just needs one more unit test (to demonstrate the problem mentioned in comment:16) and one small tweak and it will be ready.

comment:18 Changed at 2010-07-20T04:57:58Z by zooko

  • Keywords news-needed added; test removed

Oh and a NEWS snippet.

comment:19 Changed at 2010-09-11T01:02:12Z by davidsarah

  • Milestone changed from 1.8β to 1.9.0

Out of time for 1.8.

comment:20 Changed at 2011-07-16T21:23:50Z by davidsarah

If anyone wants to add a unit test and fix the issue in comment:15, they should do so in the next week in order for this improvement to get into 1.9.

comment:21 Changed at 2011-07-27T18:23:10Z by zooko

  • Milestone changed from 1.9.0 to soon

comment:22 Changed at 2012-03-31T20:53:23Z by amiller

I added unit tests and a minor tweak.

In jsgf's original patches, the short-circuit behavior was added to the FileDownloader class and would therefore only apply to the "t=None" case. So the problem in comment:15 does not occur.

However, we might as well short-circuit when an etag matches for "t=json" etc., in which case the tags do need to be distinguished.

https://github.com/amiller/tahoe-lafs/pull/2.patch

Last edited at 2012-04-01T00:04:19Z by zooko (previous) (diff)

comment:23 Changed at 2012-03-31T20:53:58Z by amiller

  • Keywords review-needed added; test-needed removed

comment:24 Changed at 2012-04-01T01:12:30Z by davidsarah

  • Milestone changed from soon to 1.10.0
  • Owner changed from jsgf to davidsarah
  • Status changed from new to assigned

comment:25 Changed at 2012-05-13T07:44:53Z by warner

  • Keywords review-needed removed
  • Owner changed from davidsarah to warner
  • Status changed from assigned to new

I've merged together jsgf's changes with amiller's fixes, and added a couple of new tests. I also fixed the implementation to not use an etag for t=info or t=rename-form, since both actually provide variable output. "", "json", "uri", "readonly-uri" are all eligible.

I'll push the changes in a moment.

comment:26 Changed at 2012-05-13T07:47:23Z by Brian Warner <warner@…>

In 5d404db898e1e6dc:

webapi: don't allow ETags in t=info or t=rename-form, both are variable

t=info contains randomly-generated ophandles, and t=rename-form contains the
name of the child being renamed, so neither is eligible for a
short-circuiting ETag. Enhanced test_web to exercise this. Had to improve
FakeCHKFileNode slightly to let it participate. Refs #443.

comment:27 Changed at 2012-05-13T07:50:58Z by warner

  • Resolution set to fixed
  • Status changed from new to closed

This landed in a series of patches ending with 5d404db89.

Note: See TracTickets for help on using tickets.