#300 closed defect (wontfix)

macfuse: need some sort of caching

Reported by: robk
Owned by: robk
Priority: major
Milestone: eventually
Component: code-frontend
Version: 0.7.0
Keywords: fuse mac cache performance
Cc:
Launchpad Bug:

Description (last modified by exarkun)

From some initial experiments with MacFUSE and the Python FUSE bindings, it seems that simply viewing a directory in the Finder generates a large number of calls through the FUSE API.

I ran a stub (loopback) filesystem that instrumented each FUSE call, then opened a directory or two containing only a few files. (I had also tested a much larger directory and seen a correspondingly larger number of calls.) The tool inserted a 100ms delay before answering each call, which explains the spacing of the calls over time.

See the attached log.
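A minimal sketch of the kind of instrumentation described above, assuming the filesystem's operations are plain Python methods. The `instrumented` decorator and `LoopbackFS` class are hypothetical: the method names mirror FUSE operation names, but this is not the python-fuse API, and wiring it to real bindings is left out. Each call is logged, counted per (operation, path), and answered only after a fixed delay.

```python
import os
import time
from collections import Counter
from functools import wraps

CALL_DELAY = 0.1  # seconds; the delay inserted before answering each call


def instrumented(delay=CALL_DELAY):
    """Hypothetical class decorator: wrap each public method so every call is
    logged, counted per (operation, path), and answered after `delay` seconds."""
    def decorate(cls):
        counts = Counter()
        for name, method in list(vars(cls).items()):
            if name.startswith("_") or not callable(method):
                continue  # leave __init__ and private helpers alone

            def make_wrapper(op, fn):
                @wraps(fn)
                def wrapper(self, path, *args, **kwargs):
                    counts[(op, path)] += 1
                    print(f"{time.time():.3f} {op}({path})")
                    time.sleep(delay)  # simulate a slow backend
                    return fn(self, path, *args, **kwargs)
                return wrapper

            setattr(cls, name, make_wrapper(name, method))
        cls.call_counts = counts  # shared tally for all instances
        return cls
    return decorate


@instrumented()
class LoopbackFS:
    """Hypothetical stub that answers each operation from a real local directory."""

    def __init__(self, root):
        self.root = root

    def _real(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def getattr(self, path):
        return os.lstat(self._real(path))

    def access(self, path, mode):
        return os.access(self._real(path), mode)

    def readdir(self, path):
        return sorted(os.listdir(self._real(path)))
```

Dumping `LoopbackFS.call_counts` after browsing a directory gives a per-operation, per-path tally of the calls.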

Attachments (1)

tfuse.log (37.1 KB) - added by robk at 2008-02-06T00:26:39Z.
log of fuse calls


Change History (11)

Changed at 2008-02-06T00:26:39Z by robk

log of fuse calls

comment:1 Changed at 2008-02-12T02:51:20Z by warner

So if I'm reading that log right, when the Finder looks in a directory, it makes the following calls:

  • about 102 calls to access(DIR)
  • 14 calls to getattr(DIR)
  • 3 calls to getattr(.DS_Store)
  • 1 call to getattr(.hidden)
  • 1 call to readdir(DIR)
  • 21 calls to statfs()

and, for each FILE in DIR:

  • 24 calls to access(FILE)
  • 6 calls to getattr(FILE)
  • 12 calls to access(FILE.swp)
  • 3 calls to getattr(FILE.swp)

And displaying that 5-file directory resulted in about 330 system calls. Impressive! :-)

It sounds like everything except statfs() can be handled with the data from a single dirnode, so an important goal is to cache that data long enough that this whole batch of ~330 calls can be fed by a single Tahoe dirnode fetch. We have a few numbers that suggest how long this fetch takes: http://allmydata.org/tahoe-figleaf-graph/hanford.allmydata.com-tahoe_speedstats_delay_SSK.html suggests that it takes about 70ms for a Tahoe node to retrieve a small mutable file over a DSL line. There will be some extra delay if we include web API time, or use more servers than those in our speed-net test, but I believe that any given directory should be fully retrievable in under a second.

So we'll need to choose a caching policy based upon the following criteria:

  • displaying a directory requires several hundred system calls that refer to the same dirnode contents, in rapid succession
  • fetching the dirnode contents probably takes less than a second, closer to 100ms
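Plugging these two criteria into a back-of-the-envelope calculation shows why caching matters. This is a sketch under a hypothetical worst case in which every call triggers its own ~100ms dirnode fetch and the calls are answered one at a time; the constant and function names are illustrative.

```python
# Hypothetical worst case: without a cache, every FUSE call triggers its own
# dirnode fetch, and the calls are answered one after another.

CALLS_PER_LISTING = 330  # approximate count observed for one small directory
FETCH_SECONDS = 0.1      # ~100ms per dirnode fetch, per the estimate above


def listing_time(calls, fetch_seconds, cached):
    """Seconds spent fetching dirnodes to service one directory listing."""
    fetches = 1 if cached else calls  # a cache turns the burst into one fetch
    return fetches * fetch_seconds


uncached = listing_time(CALLS_PER_LISTING, FETCH_SECONDS, cached=False)
cached = listing_time(CALLS_PER_LISTING, FETCH_SECONDS, cached=True)
print(f"uncached: {uncached:.1f}s, cached: {cached:.1f}s")
# prints "uncached: 33.0s, cached: 0.1s"
```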

The cache entries should expire after some reasonable period of time. Longer expiration times will produce surprises and frustration when a user updates a directory on one machine and then fails to see the updates on a different machine.

If the expiration time is more than a few seconds, the implementation will require some sort of forced expiration or local update in response to locally caused changes to the directory, to make sure you can see the changes you just made. (If we didn't have caching, we wouldn't need this relatively complicated feature.)

My straw-man suggestion is the following:

  • index the cache by the URI of the directory
  • expire the cache entries 10 seconds after they are retrieved
  • expire the cache entries immediately if the directory is modified
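The straw-man policy could look roughly like the following sketch. `DirnodeCache` is hypothetical, and `fetch` stands in for whatever actually retrieves a dirnode from Tahoe; the clock is injectable so the 10-second expiry can be exercised without waiting. `"URI:DIR2:example"` is a made-up placeholder, not a real directory cap.

```python
import time


class DirnodeCache:
    """Hypothetical sketch: entries keyed by directory URI, expired `ttl`
    seconds after retrieval, and dropped immediately on local modification."""

    def __init__(self, fetch, ttl=10.0, clock=time.monotonic):
        self._fetch = fetch    # callable: uri -> dirnode contents (assumed)
        self._ttl = ttl
        self._clock = clock
        self._entries = {}     # uri -> (retrieved_at, contents)

    def get(self, uri):
        entry = self._entries.get(uri)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]                        # fresh: serve from cache
        contents = self._fetch(uri)                # missing or expired: refetch
        self._entries[uri] = (self._clock(), contents)
        return contents

    def invalidate(self, uri):
        """Call after the directory has been modified locally."""
        self._entries.pop(uri, None)


# Usage with a fake fetcher and a controllable clock (both hypothetical).
fetches = []
now = [0.0]

def fake_fetch(uri):
    fetches.append(uri)
    return {"a.txt": {"type": "file", "size": 3}}

cache = DirnodeCache(fake_fetch, ttl=10.0, clock=lambda: now[0])
for _ in range(330):                   # one listing's worth of lookups
    cache.get("URI:DIR2:example")      # only the first call fetches
now[0] = 11.0                          # past the 10 s expiry
cache.get("URI:DIR2:example")          # expired: second fetch
cache.invalidate("URI:DIR2:example")   # e.g. after a local write
cache.get("URI:DIR2:example")          # invalidated: third fetch
print(len(fetches))                    # prints 3
```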

More data (specifically system-call traces) would be useful on the following cases:

  • opening a child directory directly (perhaps through a symlink). Does the Finder make many calls on the ancestor directories? If so, that will increase the pressure to retain cached entries longer.
  • when writing to a file in a directory, how much (and when) is the directory re-read? That will influence the modify-the-cache vs. expire-the-cache design decisions.
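To make the modify-the-cache vs. expire-the-cache choice concrete, here is a hypothetical sketch that simulates both on a plain dict cache. `CountingFetcher`, `write_child`, and `read_dir` are illustrative names; the fetcher's `remote` dict stands in for the server-side dirnode, and the write is simulated by updating it directly.

```python
class CountingFetcher:
    """Stand-in for a Tahoe dirnode fetch; counts how often it is called."""

    def __init__(self, remote):
        self.remote = remote   # uri -> {child name: metadata} ("server" copy)
        self.calls = 0

    def __call__(self, uri):
        self.calls += 1
        return dict(self.remote[uri])  # hand back a copy, like a fresh fetch


def write_child(cache, fetch, uri, name, metadata, policy):
    """Simulate a local write, then keep `cache` consistent per `policy`."""
    fetch.remote[uri][name] = metadata   # the write itself (simulated)
    if policy == "expire":
        cache.pop(uri, None)             # next read refetches
    elif policy == "modify" and uri in cache:
        cache[uri][name] = metadata      # patch the cached copy in place


def read_dir(cache, fetch, uri):
    if uri not in cache:
        cache[uri] = fetch(uri)
    return cache[uri]


results = {}
for policy in ("expire", "modify"):
    fetch = CountingFetcher({"URI:DIR2:example": {"a.txt": {"size": 3}}})
    cache = {}
    read_dir(cache, fetch, "URI:DIR2:example")
    write_child(cache, fetch, "URI:DIR2:example", "b.txt", {"size": 5}, policy)
    listing = read_dir(cache, fetch, "URI:DIR2:example")
    results[policy] = (sorted(listing), fetch.calls)
    print(policy, *results[policy])
# prints:
# expire ['a.txt', 'b.txt'] 2
# modify ['a.txt', 'b.txt'] 1
```

Modifying the cache saves a round trip but assumes the local patch matches what the server now holds; expiring costs one extra fetch but also picks up any changes made elsewhere in the meantime.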

comment:2 Changed at 2008-03-08T02:52:06Z by zooko

  • Description modified (diff)
  • Keywords macintosh added
  • Milestone changed from 0.9.0 (Allmydata 3.0 final) to 0.10.0

comment:3 Changed at 2008-05-29T22:21:00Z by warner

  • Milestone changed from 1.1.0 to 1.2.0

comment:4 Changed at 2009-06-21T19:36:00Z by zooko

  • Keywords changed from fuse mac macintosh to fuse,mac,macintosh

comment:5 Changed at 2009-06-30T12:38:59Z by zooko

  • Milestone changed from 1.5.0 to eventually

comment:6 Changed at 2009-09-24T05:54:33Z by zooko

If you like this ticket, you might also like #606 (backupdb: add directory cache), #465 (add a mutable-file cache), and #316 (add caching to tahoe proper?).

comment:7 Changed at 2009-12-13T04:34:59Z by terrell

  • Keywords changed from fuse,mac,macintosh to fuse mac macintosh

comment:8 Changed at 2010-02-13T00:44:16Z by davidsarah

  • Keywords cache performance added; macintosh removed

comment:9 Changed at 2020-01-16T19:39:10Z by exarkun

  • Description modified (diff)

The direct FUSE support in Tahoe-LAFS was removed in 4f8e3e5ae8fefc01df3177e737d8ce148edd60b9 (2011). The preferred route to a native filesystem-like interface is the SFTP frontend combined with something like sshfs.

comment:10 Changed at 2020-01-16T19:39:16Z by exarkun

  • Resolution set to wontfix
  • Status changed from new to closed