[tahoe-lafs-trac-stream] [Tahoe-LAFS] #935: zandr's FUSE/NAS idea

Tahoe-LAFS trac at tahoe-lafs.org
Sun Sep 8 22:39:58 UTC 2019


#935: zandr's FUSE/NAS idea
-------------------------------+-------------------------------------------------
     Reporter:  warner         |      Owner:
         Type:  enhancement    |     Status:  new
     Priority:  major          |  Milestone:  eventually
    Component:  code-frontend  |    Version:  1.5.0
   Resolution:                 |   Keywords:  fuse smb sftp sshfs webdav cache
Launchpad Bug:                 |              preservation gsoc
-------------------------------+-------------------------------------------------
Changes (by amontero):

 * cc: amontero@… (added)


New description:

 At lunch today, Zandr and I were talking about an interesting approach to
 a tahoe frontend.

 Imagine, if you will, a NAS box, to which your client connects via webdav
 or some other convenient protocol. On this box sits a specialized webdav
 server, a Tahoe node, and a bunch of (real) disk.

 The server maintains a database. For each pathname visible to the client,
 the database records two things: "file present on disk?" and "filecap in
 grid?". When the client reads a file, the server checks to see if a real
 file is present on disk, and if so, it satisfies the read with that file.
 If not, it uses the filecap to satisfy whatever piece of the data the
 client requested (e.g. with a Range: header), returns it to the client,
 writes it to local disk, then (in the background) fills the rest of the
 local disk file with data from the grid.
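
 Very roughly, and just to make the bookkeeping concrete, the database and
 the read path might look like the sketch below (the schema and the grid
 object's read_range()/download_to() methods are made-up placeholders
 standing in for webapi GETs with a Range: header, not an existing Tahoe
 API):

{{{#!python
# Sketch only: schema and `grid` methods are illustrative placeholders.
import os
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path       TEXT PRIMARY KEY,  -- pathname as the NAS client sees it
    on_disk    INTEGER NOT NULL,  -- "file present on disk?"
    filecap    TEXT,              -- "filecap in grid?" (NULL = not uploaded yet)
    dirty      INTEGER NOT NULL DEFAULT 0,
    last_write REAL,
    last_read  REAL
)
"""

def open_db(path):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db

def read(db, grid, cache_dir, path, offset, length):
    """Satisfy a client read: local copy first, otherwise pull from the grid."""
    row = db.execute("SELECT on_disk, filecap FROM files WHERE path=?",
                     (path,)).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    on_disk, filecap = row
    local = os.path.join(cache_dir, path.lstrip("/"))
    if on_disk:
        with open(local, "rb") as f:          # real file on disk: serve from it
            f.seek(offset)
            data = f.read(length)
    elif filecap:
        data = grid.read_range(filecap, offset, length)  # hypothetical Range: GET
        grid.download_to(filecap, local)      # hypothetical background fill
        db.execute("UPDATE files SET on_disk=1 WHERE path=?", (path,))
    else:
        raise FileNotFoundError(path)
    db.execute("UPDATE files SET last_read=? WHERE path=?", (time.time(), path))
    db.commit()
    return data
}}}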

 On write, the server writes data to a real local file. Later, when the
 file has stopped twitching, the server uploads the file into the grid and
 updates the database to reflect the filecap.
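
 Against that same made-up table, the write path could be mostly
 bookkeeping, something like:

{{{#!python
import os
import time

def write(db, cache_dir, path, offset, data):
    """Client write: always lands in the real local file; upload comes later."""
    local = os.path.join(cache_dir, path.lstrip("/"))
    os.makedirs(os.path.dirname(local), exist_ok=True)
    mode = "r+b" if os.path.exists(local) else "wb"
    with open(local, mode) as f:
        f.seek(offset)
        f.write(data)
    # mark the entry dirty: any previously recorded filecap no longer matches
    db.execute(
        "INSERT INTO files (path, on_disk, dirty, last_write)"
        " VALUES (?, 1, 1, ?)"
        " ON CONFLICT(path) DO UPDATE SET"
        "   on_disk=1, dirty=1, last_write=excluded.last_write",
        (path, time.time()))
    db.commit()
}}}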

 Much later, when the server concludes that this file is no longer "hot",
 it removes the local disk copy. There are two separate timers: one to
 decide when the contents are stable, another to decide when the file is no
 longer interesting enough to spend local disk space on. The latter timer
 is likely to be related to the amount of disk space available.
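
 The two timers could then be a single periodic sweep over the table; the
 thresholds and grid.upload() below are placeholders, and a real version
 would tie eviction to free disk space rather than a fixed age:

{{{#!python
import os
import time

STABLE_AFTER = 5 * 60        # timer 1: contents considered stable after 5 quiet min
EVICT_AFTER = 24 * 60 * 60   # timer 2: local copy no longer "hot" after a day
                             # (really this should track free disk space)

def sweep(db, grid, cache_dir, now=None):
    """Periodic pass: upload files that have stopped twitching, evict cold ones."""
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT path, dirty, filecap, last_write, last_read FROM files"
        " WHERE on_disk=1").fetchall()
    for path, dirty, filecap, last_write, last_read in rows:
        local = os.path.join(cache_dir, path.lstrip("/"))
        if dirty and last_write and now - last_write > STABLE_AFTER:
            # timer 1 fired: upload into the grid, record the resulting filecap
            cap = grid.upload(local)                   # hypothetical webapi PUT
            db.execute("UPDATE files SET filecap=?, dirty=0 WHERE path=?",
                       (cap, path))
        elif not dirty and filecap and \
                now - (last_read or last_write or 0) > EVICT_AFTER:
            # timer 2 fired, and the data is safely in the grid: reclaim disk
            os.remove(local)
            db.execute("UPDATE files SET on_disk=0 WHERE path=?", (path,))
    db.commit()
}}}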

 From the client's point of view, this is just a NAS box that occasionally
 suffers from higher-than-normal latency, but all of its contents
 eventually show up on a tahoe backup grid.

 Shared directories must be tolerated somehow. I imagine that the server
 maintains a cache of dirnode contents (so that the client sees directories
 load quickly), but when a client references a given path, the cached
 dirnodes on that path are refreshed more quickly than the others. And of
 course any UCWE surprises are cause for refreshing a lot of dirnodes. With
 a real on-disk copy of the file, the server could deal with collisions by
 presenting the old version, the new local version, and the new upstream
 version, and letting the user sort it out.
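
 For the dirnode cache, one possible shape (all names made up;
 fetch_listing() stands in for a ?t=json GET on the dircap) is a per-dircap
 cache with a short freshness window for directories on a path the client
 just referenced, and a much lazier one for everything else; a UCWE
 surprise just dumps the whole cache:

{{{#!python
import time

FRESH_IF_HOT = 30      # dirnodes on a freshly-referenced path: re-read quickly
FRESH_IF_COLD = 600    # everything else can stay cached much longer

class DirnodeCache:
    def __init__(self, fetch_listing):
        # fetch_listing(dircap) -> {childname: childcap}; hypothetical, would
        # wrap a Tahoe webapi directory GET with t=json
        self.fetch_listing = fetch_listing
        self.entries = {}  # dircap -> (fetched_at, children)

    def listing(self, dircap, hot=False):
        """Return cached children, re-fetching quickly for 'hot' dirnodes."""
        max_age = FRESH_IF_HOT if hot else FRESH_IF_COLD
        fetched_at, children = self.entries.get(dircap, (0.0, None))
        if children is None or time.time() - fetched_at > max_age:
            children = self.fetch_listing(dircap)
            self.entries[dircap] = (time.time(), children)
        return children

    def invalidate_all(self):
        # a UCWE surprise means our picture is stale: refresh a lot of dirnodes
        self.entries.clear()
}}}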

 This idea has been partially explored before, both by the Windows
 FUSE-like code that shipped with the allmydata.com client, and by the OS-X
 FUSE code ("blackmatch") written by Rob Kinninmont. But neither of these
 is particularly general or available for widespread use.

--

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/935#comment:10>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list