Opened at 2008-02-06T00:25:50Z

Last modified at 2020-01-16T19:39:16Z

#300 closed defect

macfuse: need some sort of caching — at Version 2

Reported by:	robk	Owned by:	robk
Priority:	major	Milestone:	eventually
Component:	code-frontend	Version:	0.7.0
Keywords:	fuse mac cache performance	Cc:
Launchpad Bug:

Description (last modified by zooko)

In some initial experiments with MacFUSE and the Python FUSE bindings, it seems that the simple act of viewing a directory in Finder generates a large number of calls through the FUSE API.

I ran a stub (loopback) fs with instrumentation of each FUSE call, and opened a directory or two containing only a few files. (I also tested a much larger directory and saw correspondingly larger numbers of calls.) The tool inserted a 100ms delay before answering each call, which explains the spacing of calls over time.
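For anyone wanting to reproduce this kind of trace, the instrumentation can be sketched roughly as below. This is not the tool that produced the attached log: `instrumented`, `CALL_LOG`, and `getattr_op` are hypothetical names, and the handler is a placeholder rather than a real FUSE binding call.

```python
import functools
import time

CALL_LOG = []  # (timestamp, operation name, positional args), in call order


def instrumented(delay=0.1, sleep=time.sleep, clock=time.time):
    """Wrap a filesystem operation so each call is logged, then delayed.

    delay=0.1 mirrors the 100ms per-call delay described above.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            CALL_LOG.append((clock(), func.__name__, args))
            sleep(delay)  # spread the calls out so they are easy to read in the log
            return func(*args, **kwargs)
        return wrapper
    return decorate


@instrumented()
def getattr_op(path):
    # Placeholder for the real loopback handler (hypothetical).
    return {"path": path}
```

Each handler of the stub fs would be wrapped the same way, and `CALL_LOG` dumped to a file afterwards.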

see attached log.

Change History (3)

Changed at 2008-02-06T00:26:39Z by robk

Attachment tfuse.log added

log of fuse calls

comment:1 Changed at 2008-02-12T02:51:20Z by warner

So if I'm reading that log right, when the finder looks in a directory, it makes the following calls:

about 102 calls to access(DIR)
14 calls to getattr(DIR)
3 calls to getattr(.DS_Store)
1 call to getattr(.hidden)
1 call to readdir(DIR)
21 calls to statfs()

for FILE in DIR:

    24 calls to access(FILE)
    6 calls to getattr(FILE)
    12 calls to access(FILE.swp)
    3 calls to getattr(FILE.swp)

And displaying that 5-file directory resulted in about 330 system calls. Impressive! :-)

It sounds like everything except statfs() can be handled with the data from a single dirnode, so an important goal is to cache that dirnode long enough that this whole batch of 330-ish calls can be fed from a single Tahoe dirnode fetch. We have a few numbers to suggest how long this fetch takes: http://allmydata.org/tahoe-figleaf-graph/hanford.allmydata.com-tahoe_speedstats_delay_SSK.html suggests that it takes about 70ms for a Tahoe node to retrieve a small mutable file over a DSL line. There will be some extra delay if we include web API time, or more servers than those used on our speed-net test, but I believe that any given directory should be fully retrievable in under a second.

So we'll need to choose a caching policy based upon the following criteria:

displaying a directory requires several hundred system calls that refer to the same dirnode contents, in rapid succession
fetching the dirnode contents probably takes less than a second, closer to 100ms
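As a back-of-the-envelope illustration of why these two criteria matter together, the arithmetic below uses the ~330 calls and ~70ms figures quoted above. The "uncached" case is a pessimistic assumption, not a measurement: it supposes every call would trigger its own dirnode fetch.

```python
calls_per_listing = 330   # approximate call count for the 5-file directory
fetch_seconds = 0.070     # ~70ms to retrieve a small mutable file over DSL

# Assumption: with no cache, every call performs its own fetch.
uncached = calls_per_listing * fetch_seconds
# With a cache that outlives the burst, one fetch serves all the calls.
cached = fetch_seconds

print(f"uncached: {uncached:.1f} s, cached: {cached:.2f} s")
```

Under that assumption, displaying the directory costs on the order of 23 seconds of fetching without a cache, versus a single fetch with one.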

The cache entries should expire after some reasonable period of time. Longer expiration times will produce surprises and frustration when a user updates a directory on one machine and then fails to see the updates on a different machine.

If the expiration time is more than a few seconds, the implementation will require some sort of forced expiration or local update in the face of locally-caused changes to the directory, to make sure you can see the changes you just made. (If we didn't have caching, we wouldn't need this relatively complicated feature.)

My straw-man suggestion is the following:

index the cache by the URI of the directory
expire the cache entries 10 seconds after they are retrieved
expire the cache entries immediately if the directory is modified
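A minimal sketch of that straw-man policy, to make the three rules concrete. `DirnodeCache`, `fetch`, and `invalidate` are hypothetical names; `fetch` stands in for whatever actually retrieves a dirnode from Tahoe, and nothing here is existing frontend code.

```python
import time


class DirnodeCache:
    """Cache directory contents by URI, with a fixed time-to-live."""

    def __init__(self, fetch, ttl=10.0, clock=time.monotonic):
        self._fetch = fetch    # callable: dir_uri -> directory contents
        self._ttl = ttl        # seconds an entry stays valid after retrieval
        self._clock = clock    # injectable so the policy can be tested
        self._entries = {}     # dir_uri -> (retrieved_at, contents)

    def get(self, dir_uri):
        entry = self._entries.get(dir_uri)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]    # still fresh: no fetch needed
        contents = self._fetch(dir_uri)
        # Timestamp taken after the fetch: expiry counts from retrieval.
        self._entries[dir_uri] = (self._clock(), contents)
        return contents

    def invalidate(self, dir_uri):
        """Drop the entry at once, e.g. after a local modification."""
        self._entries.pop(dir_uri, None)
```

Any local operation that modifies a directory would call `invalidate()` on its URI, so the next lookup refetches. Updating the cached entry in place instead is the alternative that the write-path traces requested below would help decide.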

More data (specifically system-call traces) would be useful on the following cases:

opening a child directory directly (perhaps through a symlink). Does the finder do a lot of calls for the ancestor directories? If so, that will increase the pressure to retain cached entries longer.
when writing to a file in a directory, how much (and when) is the directory re-read? That will influence the modify-the-cache vs. expire-the-cache design decisions.

comment:2 Changed at 2008-03-08T02:52:06Z by zooko

Description modified (diff)
Keywords macintosh added
Milestone changed from 0.9.0 (Allmydata 3.0 final) to 0.10.0
