<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Fri, Oct 23, 2015 at 6:21 AM Greg Troxel &lt;<a href="mailto:gdt@ir.bbn.com">gdt@ir.bbn.com</a>&gt; wrote:</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

The other is to implement a FUSE interface for tahoe.   This could be a<br>

program in python that does tahoe ops using the existing code and takes<br>

requests from FUSE.   This will run into the same issues that sshfs has,<br>

and that other distributed filesystems have, which is that the posix<br>

interface allows arbitrary writes to a file, which turns into a need for<br>

read-modify-write.  But, it's fairly normal to only write to the cloud<br>

when the close system call happens, which means a usage pattern of<br>

open/write/write/write/close can result in a put without having to get,<br>

and with only a single put.   Coda does this, and it worked reasonably<br>

well.<br></blockquote><div><br></div><div>Coda... now there's a name I haven't heard in a while. I'm pretty sure git never modifies files in place, so it seems like "write on close" is just fine. This is what I did in an S3 FUSE filesystem I implemented at Linden Lab as a cheap way to integrate offsite backups into Bacula.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

All that said, one of the things missing in tahoe is caching, where<br>

copies of files from the grid are kept locally to make reads more<br>

efficient.  In Coda's case, there is write-back caching, so<br>

open/write/close is fast, and then the changes are put back to the<br>

servers.  But all of this raises the spectre of locking and conflicts -<br>

which are quite avoidable if you only use the distributed fs from one<br>

place at a time.<br></blockquote><div><br></div><div>It seems like all one needs to avoid write conflicts is a compare-and-swap (CAS) operation. Then you need a way to notify the user when a conflict occurs and allow them to resolve it. If you only write on close, though, that makes solutions like renaming one or both conflicting versions possible.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

It might be that caching should be a layered FUSE filesystem that<br>

presents a cache to the user while using an uncached fs.  I think there<br>

are read-only caches.  But this is tricky because once you have caching<br>

you more or less need cache invalidation.  Coda has all this - when a<br>

user on one system opens a file for write it gets a write lock from the<br>

servers, and when it's written the other servers get notified and<br>

invalidate their local caches (details fuzzy, but the point is right).<br></blockquote><div><br></div><div>Whether cache invalidation is required depends on the cost of serving a stale cache entry. I suspect most uses of Tahoe don't care very much about it, and where they do, a "read-through" operation should suffice, either to the underlying uncached filesystem or directly to Tahoe.</div><div><br></div><div>I'm not sure how well FUSE works in a layered setup. This is the sort of thing GlusterFS does, but it doesn't interact with the kernel filesystem machinery between layers. 9p is also an option; it has the advantage of working over any kind of stream connection, including ssh, and has extensible authentication support.</div></div></div>
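<div dir="ltr"><div>To make the write-on-close plus CAS idea from earlier in this message a bit more concrete, here is a rough sketch. Everything in it is hypothetical: <code>Store</code> is an in-memory stand-in for the grid, not Tahoe's API, and the <code>.conflict</code> suffix is just one possible renaming policy. It only shows the shape of the idea: buffer writes locally, do a single conditional put on close, and keep the losing version under another name if the file changed in the meantime.</div>
<pre>
```python
class ConflictError(Exception):
    """Raised when the stored version is not the one the writer expected."""


class Store:
    """Hypothetical in-memory stand-in for a remote store with compare-and-swap."""

    def __init__(self):
        self._data = {}  # name maps to (version, blob)

    def version(self, name):
        # Version 0 means "does not exist yet".
        return self._data.get(name, (0, b""))[0]

    def get(self, name):
        entry = self._data.get(name)
        return None if entry is None else entry[1]

    def put_if_version(self, name, blob, expected):
        # The compare-and-swap step: only write if nobody else wrote first.
        current = self.version(name)
        if current != expected:
            raise ConflictError(name)
        self._data[name] = (current + 1, blob)


class WriteOnCloseFile:
    """Buffers writes locally and performs a single put when closed."""

    def __init__(self, store, name):
        self.store = store
        self.name = name
        self.base_version = store.version(name)  # remembered at open time
        self.buffer = bytearray()

    def write(self, chunk):
        self.buffer.extend(chunk)  # no network traffic until close

    def close(self):
        blob = bytes(self.buffer)
        try:
            self.store.put_if_version(self.name, blob, self.base_version)
            return self.name
        except ConflictError:
            # Assumed policy: keep the losing version under a new name
            # so the user can resolve the conflict by hand.
            conflict_name = self.name + ".conflict"
            self.store.put_if_version(
                conflict_name, blob, self.store.version(conflict_name)
            )
            return conflict_name
```
</pre>
<div>With two writers opening the same name, the first <code>close()</code> returns the original name and the second returns the <code>.conflict</code> name, which is where a real implementation would notify the user. A real filesystem would also need read-modify-write for files opened without truncation, which this sketch ignores.</div></div>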