[tahoe-dev] 'Client' caching?

Nathan Eisenberg nathan at atlasnetworks.us
Thu Nov 26 11:26:07 PST 2009



> -----Original Message-----
> From: Nathan Eisenberg
> Sent: Thursday, November 26, 2009 11:19 AM
> To: 'tahoe-dev at allmydata.org'
> Subject: RE: [tahoe-dev] 'Client' caching?
> 
> > From: tahoe-dev-bounces at allmydata.org [mailto:tahoe-dev-bounces at allmydata.org] On Behalf Of Brian Warner
> > Sent: Thursday, November 26, 2009 11:09 AM
> > To: tahoe-dev at allmydata.org
> > Subject: Re: [tahoe-dev] 'Client' caching?
> >
> >
> > Welcome! Yes, this is the perfect place for questions like this.
> >
> > It's useful to note the distinction between mutable and immutable files
> > here. Caching immutable files is perfectly safe: it's a simple tradeoff
> > between local storage consumption and performance, assuming a given
> > locality-of-reference / reader behavior. Caching *mutable* files (or
> > directories), on the other hand, is not safe: the tradeoff includes a
> > correctness aspect, since someone else might change the contents and
> > you might use your (now stale) cached copy. In general, we try to avoid
> > putting any sorts of heuristics about correctness into tahoe itself, so
> > any caching layer that requires a decision on a
> > correctness-vs-performance policy would need to be placed above the
> > tahoe node.
> >
> > Hm, I'm surprised that didn't work. Do you know what caused Squid to
> > believe the file was changing each time? Maybe a quick peek at the
> > returned headers would be informative.
> >
> > Basically, any URL that starts with /uri/URI:CHK: should be immutable
> > and an ideal choice for caching. We'd planned (although I can't check
> > right now to see whether we got around to implementing it or not) to
> > add an ETag: header with the file's UEB header, which is basically a
> > hash of the contents, and thus an ideal etag. And I know we aren't
> > intentionally adding any Cache-Control or date headers that might make
> > the file look uncacheable.
> >
> > (for mutable files, there is a similar value called the "roothash"
> > which covers the file contents, and allows If-ETag-Differs: -type
> > queries to do the right thing, but I don't know if we've actually
> > implemented that either).
> >
> > hope that helps,
> >  -Brian
> 
> Hello Brian,
> 
> Yep, I'm currently only interested in immutable files.  I might be
> missing out on functionality by doing so, but I've been trying to get
> up to speed on Tahoe rapidly, which means picking things to hold off on
> attempting to wrap my brain around.  I can see why caching mutable
> files would be bad, though!
> 
> I'm not very familiar with debugging Squid, but here's what I saw in
> the access logs:
> 
> 
> and the store logs:
> 
> 1259221695.460 RELEASE 00 00002F1D 687E491AEFFD986129E9BC7FFF6EF9D2  200 1259221693        -1        -1 text/plain 620888/620888 GET http://x.x.10.44/uri/(URI)
> 1259221695.477 RELEASE 00 00002F1E 5593E0D36660EBADEE217CE11710366E  200 1259221693        -1        -1 text/plain 620888/620888 GET http://x.x.10.44/uri/(URI)
> 1259221695.517 RELEASE 00 00002F1F EABE7B59FF7FCC4197BB11E7135F136C  200 1259221693        -1        -1 text/plain 620888/620888 GET http://x.x.10.44/uri/(URI)
> 1259221695.532 SWAPOUT 00 00002F20 72B9FDA4BB757341E001D528A8E6DE56  200 1259221693        -1        -1 text/plain 620888/620888 GET http://x.x.10.44/uri/(URI)

Buggery Outlook shortcuts, sorry for the double post... Whatever I hit sent my message before I was done typing!

Here's what was in the access logs.

x.x.10.13 - - [26/Nov/2009:07:48:15 +0000] "GET http://x.x.10.44/uri/(URI) HTTP/1.0" 200 621198 "-" "ApacheBench/2  " TCP_REFRESH_MISS:FIRST_UP_PARENT
x.x.10.13 - - [26/Nov/2009:07:48:15 +0000] "GET http://x.x.10.44/uri/(URI) HTTP/1.0" 200 621198 "-" "ApacheBench/2  " TCP_REFRESH_MISS:FIRST_UP_PARENT
x.x.10.13 - - [26/Nov/2009:07:48:15 +0000] "GET http://x.x.10.44/uri/(URI) HTTP/1.0" 200 621198 "-" "ApacheBench/2  " TCP_REFRESH_MISS:FIRST_UP_PARENT
x.x.10.13 - - [26/Nov/2009:07:48:15 +0000] "GET http://x.x.10.44/uri/(URI) HTTP/1.0" 200 621198 "-" "ApacheBench/2  " TCP_REFRESH_MISS:FIRST_UP_PARENT

And the store logs, again:

1259221695.460 RELEASE 00 00002F1D 687E491AEFFD986129E9BC7FFF6EF9D2  200 1259221693        -1        -1 text/plain 620888/620888 GET http://x.x.10.44/uri/(URI)
1259221695.477 RELEASE 00 00002F1E 5593E0D36660EBADEE217CE11710366E  200 1259221693        -1        -1 text/plain 620888/620888 GET http://x.x.10.44/uri/(URI)
1259221695.517 RELEASE 00 00002F1F EABE7B59FF7FCC4197BB11E7135F136C  200 1259221693        -1        -1 text/plain 620888/620888 GET http://x.x.10.44/uri/(URI)
1259221695.532 SWAPOUT 00 00002F20 72B9FDA4BB757341E001D528A8E6DE56  200 1259221693        -1        -1 text/plain 620888/620888 GET http://x.x.10.44/uri/(URI)
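
For anyone who wants to pick these store log lines apart, here is a minimal Python sketch that splits one line into named fields. The field names are an assumption based on the store.log layout described in the Squid documentation (time, action, dir number, file number, hash, status, datehdr, lastmod, expires, type, sizes, method, key). The names `StoreLogEntry` and `parse_store_log_line` are made up for this example.

```python
from typing import NamedTuple


class StoreLogEntry(NamedTuple):
    """One Squid store.log line; field names are assumed, not authoritative."""
    timestamp: float
    action: str
    dir_number: str
    file_number: str
    key_hash: str
    status: int
    date_header: int
    last_modified: int
    expires: int
    content_type: str
    expected_size: int
    actual_size: int
    method: str
    url: str


def parse_store_log_line(line: str) -> StoreLogEntry:
    """Split a whitespace-separated store.log line into typed fields."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}")
    (ts, action, dirno, fileno, key_hash, status,
     date_hdr, lastmod, expires, ctype, sizes, method, url) = fields
    # The sizes column looks like "expected/actual".
    expected, actual = sizes.split("/")
    return StoreLogEntry(
        float(ts), action, dirno, fileno, key_hash, int(status),
        int(date_hdr), int(lastmod), int(expires), ctype,
        int(expected), int(actual), method, url,
    )


sample = (
    "1259221695.460 RELEASE 00 00002F1D 687E491AEFFD986129E9BC7FFF6EF9D2  "
    "200 1259221693        -1        -1 text/plain 620888/620888 "
    "GET http://x.x.10.44/uri/(URI)"
)
entry = parse_store_log_line(sample)
print(entry.action, entry.last_modified, entry.expires)
```

If that field layout is right, the two -1 values in every line are the lastmod and expires columns, which would suggest the responses carried neither a Last-Modified nor an Expires header.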

I presume that the 5th column is some sort of version hash that identifies when a file was modified.  More than likely, there's a way of telling Squid to 'shut up and cache the file for x seconds', but I couldn't work out what it was.
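
As a rough aid for the "peek at the returned headers" step, the sketch below checks a set of response headers for the things an HTTP cache generally looks at: explicit freshness information (Expires, or max-age / s-maxage in Cache-Control), a validator (ETag or Last-Modified), and directives that restrict caching. It is a simplified reading of HTTP/1.1 caching rules, not a reproduction of Squid's actual decision logic, and the function name `cacheability_hints` is hypothetical.

```python
def cacheability_hints(headers: dict) -> dict:
    """Summarise cache-relevant response headers (simplified, assumed rules)."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_control = lowered.get("cache-control", "").lower()
    directives = [d.strip() for d in cache_control.split(",") if d.strip()]
    # Directive names without any "=value" part, e.g. "max-age".
    names = {d.split("=", 1)[0] for d in directives}
    return {
        "explicit_freshness": "expires" in lowered
        or bool(names & {"max-age", "s-maxage"}),
        "validator": "etag" in lowered or "last-modified" in lowered,
        "restricted": bool(names & {"no-store", "no-cache", "private"}),
    }


# Illustrative header sets only; these are not captured from a Tahoe node.
bare = {"Content-Type": "text/plain", "Date": "Thu, 26 Nov 2009 07:48:13 GMT"}
tagged = {"Content-Type": "text/plain", "ETag": '"abc"',
          "Cache-Control": "max-age=3600"}
print(cacheability_hints(bare))
print(cacheability_hints(tagged))
```

A response with no freshness information and no validator gives a cache very little to work with, which would be consistent with every request being treated as a miss.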

In any event, I'm pretty happy with the performance of Apache with mod_proxy and mod_cache.  It also lets me filter out the 'root' Tahoe interface, so that I can present things to the end user differently (I'm thinking of building a very basic user interface, since the existing one is really too technical for Joe Customer).

Best Regards,
Nathan Eisenberg

