Tahoe as git remote
Sean Lynch
seanl at literati.org
Fri Oct 23 22:16:54 UTC 2015
On Fri, Oct 23, 2015 at 6:21 AM Greg Troxel <gdt at ir.bbn.com> wrote:
>
> The other is to implement a FUSE interface for tahoe. This could be a
> program in python that does tahoe ops using the existing code and takes
> requests from FUSE. This will run into the same issues that sshfs has,
> and that other distributed filesystems have, which is that the posix
> interface allows arbitrary writes to a file, which turns into a need for
> read-modify-write. But, it's fairly normal to only write to the cloud
> when the close system call happens, which means a usage pattern of
> open/write/write/write/close can result in a put without having to get,
> and with only a single put. coda does this, and it worked reasonably
> well.
>
Coda... now there's a name I haven't heard in a while. I'm pretty sure git
never modifies files, so it seems like "write on close" is just fine. This
is what I did in an S3 FUSE filesystem I implemented at Linden Lab as a
cheap way to integrate offsite backups into Bacula.
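To make "write on close" concrete, here is a minimal sketch in Python. It is
not the Tahoe API or the code from that S3 filesystem: DictStore, its put()
method and the example path are hypothetical stand-ins, and the sketch only
covers newly created files (modifying an existing file would still need a get
first).

```python
import io


class DictStore:
    """Hypothetical stand-in for a remote store; it just counts puts."""

    def __init__(self):
        self.objects = {}
        self.puts = 0

    def put(self, path, data):
        self.objects[path] = data
        self.puts += 1


class WriteOnCloseFile:
    """Buffers every write locally and uploads once, when close() is called."""

    def __init__(self, store, path):
        self.store = store
        self.path = path
        self.buf = io.BytesIO()
        self.closed = False

    def write(self, data, offset=None):
        # Arbitrary-offset writes only touch the local buffer.
        if offset is not None:
            self.buf.seek(offset)
        return self.buf.write(data)

    def close(self):
        # The single put happens here, no matter how many writes came before.
        if not self.closed:
            self.store.put(self.path, self.buf.getvalue())
            self.closed = True


store = DictStore()
f = WriteOnCloseFile(store, "objects/ab/cdef")  # hypothetical path
f.write(b"hello ")
f.write(b"world")
f.close()
```

The open/write/write/close sequence ends in one put and no get, which is the
pattern described in the quoted text above.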
> All that said, one of the things missing in tahoe is caching, where
> copies of files from the grid are kept locally to make reads more
> efficient. In coda's case, there is write-back caching, so
> open/write/close is fast, and then the changes are put back to the
> servers. But all of this raises the spectre of locking and conflicts -
> which are quite avoidable if you only use the distributed fs from one
> place at a time.
>
It seems like all one needs for avoiding write conflicts is a
compare-and-swap (CAS) operation. Then you need a way to notify the user
when the swap fails and allow them to resolve the conflict. If you only
write on close, though, that makes solutions like renaming one or both
conflicting versions possible.
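A rough sketch of that idea in Python, under assumptions: VersionedStore,
cas_put() and the ".conflict-<client>" naming are all made up for
illustration and say nothing about how Tahoe tracks versions. The point is
only that a put conditioned on the version you started from lets the loser of
a race keep its data under a renamed path.

```python
class Conflict(Exception):
    """Raised when the stored version is not the one the writer started from."""


class VersionedStore:
    """Hypothetical store that keeps a version counter per path."""

    def __init__(self):
        self.data = {}  # path -> (version, bytes)

    def get(self, path):
        # A path that was never written has version 0 and empty content.
        return self.data.get(path, (0, b""))

    def cas_put(self, path, expected_version, new_bytes):
        current_version, _ = self.get(path)
        if current_version != expected_version:
            raise Conflict(path)
        self.data[path] = (current_version + 1, new_bytes)
        return current_version + 1


def close_and_upload(store, path, base_version, new_bytes, client_id):
    """Upload on close; on conflict, save under a renamed path instead."""
    try:
        store.cas_put(path, base_version, new_bytes)
        return path
    except Conflict:
        conflict_path = f"{path}.conflict-{client_id}"
        version, _ = store.get(conflict_path)
        store.cas_put(conflict_path, version, new_bytes)
        return conflict_path


store = VersionedStore()
base, _ = store.get("notes.txt")  # both clients start from the same version
path_a = close_and_upload(store, "notes.txt", base, b"from A", "A")
path_b = close_and_upload(store, "notes.txt", base, b"from B", "B")
```

Here client A wins and client B ends up with its own copy next to the
original, which the user can then merge or delete by hand.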
> It might be that caching should be a layered FUSE filesystem that
> presents a cache to the user while using an uncached fs. I think there
> are read-only caches. But this is tricky because once you have caching
> you more or less need cache invalidation. Coda has all this - when a
> user on one system opens a file for write it gets a write lock from the
> servers, and when it's written the other servers get notified and
> invalidate their local caches (details fuzzy, but the point is right).
>
The requirement for cache invalidation depends on the cost of having a
stale entry in the cache. I suspect most uses of Tahoe don't care very much
about it, and where they do, a "readthrough" operation should suffice,
either to the underlying uncached fs or directly to Tahoe.
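As a sketch of what I mean by "readthrough", assuming a hypothetical Backend
object with a read(path) method standing in for the uncached layer (none of
these names come from Tahoe or FUSE): the cache serves possibly stale data by
default, and a caller that cares can ask for a read that bypasses it.

```python
class Backend:
    """Hypothetical uncached layer; counts how often it is actually read."""

    def __init__(self):
        self.files = {}
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]


class ReadCache:
    """Read cache with no invalidation, plus an explicit readthrough option."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def read(self, path, readthrough=False):
        # A readthrough always goes to the backend and refreshes the entry.
        if readthrough or path not in self.cache:
            self.cache[path] = self.backend.read(path)
        return self.cache[path]


backend = Backend()
backend.files["a"] = b"v1"
cache = ReadCache(backend)

first = cache.read("a")                    # miss: fetched from the backend
backend.files["a"] = b"v2"                 # changed behind the cache's back
stale = cache.read("a")                    # hit: still the old content
fresh = cache.read("a", readthrough=True)  # bypasses the stale entry
```

Whether the stale read in the middle is acceptable is exactly the cost
question above.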
I'm not sure how well FUSE works in a layered setup. This is the sort of
thing GlusterFS does, but it doesn't interact with the kernel filesystem
machinery between layers. 9p is also an option, which has the advantage of
working over any kind of stream connection, including ssh, and has
extensible authentication support.