[tahoe-dev] webapps on top of tahoe-lafs: howto - newbie questions

Fri Oct 8 16:45:25 UTC 2010

dear tahoe-dev,

(first of all, i'd like to apologize for asking the following questions about
concepts of which I don't have a solid grasp. please point me to concepts or
docs you feel I should review more in-depth before addressing these things)

I'm very interested in tahoe-lafs, from a "cypherpunk" perspective. I've
successfully installed and used it in a small grid. I believe it has the design
features that we were looking for (in terms of resilient, secure, self-managed 
infrastructure).

But I feel sort of stuck thinking about how to build applications on top
of it. Reading through the fs API, it makes sense, but it is still a magical 
black box for me. My programming background is with webapps, so I find myself mostly 
lost when handling concepts related to crypto + p2p (although I *think* I understand
them at a high level). I'm trying to change this :). Maybe I'm too used to
the classical single-server mvc way of doing things.

If I understand it correctly, tahoe-lafs is "only" a *distributed, secure
filesystem*. By reading this thread [0] I assume that if I want to design any 
kind of "distributed webapp", I would need a layer on top of tahoe-lafs that would 
take care of:

- managing the write- and read-caps. if I don't trust the server on which the app is
  running, I should manage my own node to upload files and be responsible for
  the management of my own keys.
- implementing ACLs: "distributing" read-caps to whoever should be able to read
  them.

(Is this correct so far?)
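
To make the second point concrete: if I read the webapi docs correctly, asking
a node for `GET /uri/$CAP?t=json` returns a two-element JSON list whose second
element carries `rw_uri` (when you hold write authority) and `ro_uri`. A minimal
sketch of picking out the read-cap I would hand to a friend; the function name
and the sample caps are made up for illustration, not real ones:

```python
import json

def read_cap_from_metadata(metadata_json):
    """Return the read-only cap from a `?t=json` response body."""
    node_type, info = json.loads(metadata_json)
    if node_type not in ("filenode", "dirnode"):
        raise ValueError("unexpected node type: %r" % (node_type,))
    # ro_uri grants read access only, so it is the one to share with readers.
    return info["ro_uri"]

# Placeholder response; the cap strings are invented, not valid caps.
sample = json.dumps(["dirnode", {
    "rw_uri": "URI:DIR2:writekey:fingerprint",
    "ro_uri": "URI:DIR2-RO:readkey:fingerprint",
    "mutable": True,
    "children": {},
}])
print(read_cap_from_metadata(sample))
```

The write-cap stays on my own node; only the `ro_uri` value ever leaves it.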

So I assume the logical way to have something running "on top" of tahoe would be
using a traditional database in any framework (looking at the django canopy
implementation for instance), and delegating the storage of *files* to the grid. 
ie, I upload a file to my traditional app, and it stores it on the
grid, storing the caps in the database (or in another file on the grid). If I
want to share my file with a friend, I share the read-cap by any means I can
think of (using ostatus protocols, or xmpp, for instance).
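
A rough sketch of the flow I have in mind, assuming a local gateway on the
default webapi port (3456) and that `PUT /uri` answers with the cap of the
uploaded file in the response body. The table layout and all function names
are my own invention; sqlite stands in for whatever database the framework uses:

```python
import sqlite3
import urllib.request

GATEWAY = "http://127.0.0.1:3456"  # assumption: local node, default webapi port

def upload_to_grid(data, gateway=GATEWAY):
    """PUT the bytes to /uri on the gateway; the response body is the cap."""
    req = urllib.request.Request(gateway + "/uri", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("ascii").strip()

def open_db(path=":memory:"):
    """Open the app's traditional database and make sure the table exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, cap TEXT)"
    )
    return db

def store_file(db, name, data, upload=upload_to_grid):
    """Upload `data` to the grid, then record name -> cap in the database."""
    cap = upload(data)
    db.execute("INSERT INTO files (name, cap) VALUES (?, ?)", (name, cap))
    db.commit()
    return cap

def lookup_cap(db, name):
    """Return the stored cap for `name`, or None if it was never uploaded."""
    row = db.execute(
        "SELECT cap FROM files WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

Sharing with a friend is then just `lookup_cap(db, "photo.jpg")` plus whatever
channel carries the resulting string. The `upload` parameter is there so the
database side can be tried without a running node.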

This dual scheme (single-server database for data, distributed fs for "media") is what
I had in mind. But thinking about this, I was wondering:

- how could the rest of the data, ie, all or part of the app's database, also be
  stored in the grid (I guess the obvious answer would be "serializing and uploading
  to grid", and then deserializing + syncing on the read side ?) and shared only
  with selected end-users?
- Do I need to come up with an extra communication layer to share the read-caps with 
  "friend" nodes, or could I somehow make use of the underlying DHT?

..

on the other hand, i've been playing a little with alternatives to rdbms, like rdf
triplestores (having them on the grid with proper acls sounds really good) or
document-oriented dbs (some mongo and couch), and have been delighted with their
practical advantages (schema-free) for building stuff (although again I must be
lacking the background to fully appreciate their implications).

I say this because I was a bit confused when I discovered [1][2] that the fs
layer on tahoe-lafs is in fact built upon a distributed key-value store layer.
Could this key-value store be used for purposes other than the top fs
abstraction? (thinking about indexing and querying data chunks).
I guess the answer might be no, since the keys are not human-meaningful?

I came to these questions thinking again about how something like diaspora could
be ported to work on top of tahoe, and again, I see some conceptual barriers
from my limited webdev perspective: a key-value or document store is something I can
readily query and filter on the fly, while a "file" is an abstraction I have to
write/read as a unit, and process before building any complex app that needs to be able
to filter/sort data in an efficient way.
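
To show what I mean by that barrier, here is a small sketch of "querying" when
the whole collection lives in one file: serialize everything, store it as a
unit, then fetch the whole thing back and filter/sort in memory. The `posts`
records and both helper names are invented for illustration, and the actual
upload/download step is left out:

```python
import json

def serialize(records):
    """Turn a list of dicts into the bytes that would be uploaded as one file."""
    return json.dumps(records, sort_keys=True).encode("utf-8")

def query(blob, author):
    """Decode the entire file, then filter and sort in memory."""
    records = json.loads(blob.decode("utf-8"))
    matching = [r for r in records if r["author"] == author]
    return sorted(matching, key=lambda r: r["posted"])

posts = [
    {"author": "alice", "posted": "2010-10-08", "text": "second"},
    {"author": "bob", "posted": "2010-10-07", "text": "hello"},
    {"author": "alice", "posted": "2010-10-06", "text": "first"},
]
blob = serialize(posts)
print([p["text"] for p in query(blob, "alice")])
```

Every query pays for downloading and decoding the full blob, which is exactly
the cost a database index would normally hide from me.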

besides, the couchapp diaspora port [3] seems very interesting because of couchdb's builtin
features for selective replication. I'd really like to contribute towards seeing
something similar based on strong crypto and distributedness, but as I explain
in the lines above, I'm completely lost just starting to think about how to connect
the html+js frontend to the storage grid, and what the role of the database
in between should be (hmm something along the lines of what's discussed here [4]... is
html/js <--> storage grid the only possible answer? )

Again, I'd like to apologize if any of the above is complete nonsense;
I'd be grateful if you could point out any errors in my understanding and anything
I should assimilate before firing off this kind of question :)

thanks for your time and any thoughts, 

cal.

[0] http://www.mail-archive.com/cryptography@metzdowd.com/msg10782.html
[1] http://events.ccc.de/congress/2009/wiki/Tahoe-LAFS_Workshop
[2] http://tahoe-lafs.org/trac/tahoe-lafs/ticket/869
[3] http://github.com/maxogden/couchappspora
[4] http://identi.ca/conversation/54300294#notice-54798703