[tahoe-dev] Idea for a Publish/Subscribe Message System on Tahoe-LAFS
Nathan
nejucomo at gmail.com
Tue Feb 21 19:37:38 UTC 2012
On Sat, Feb 11, 2012 at 11:08 AM, darrob <darrob at i2pmail.org> wrote:
> Hi everybody,
>
> I'd like to discuss my idea for a messaging system on Tahoe with you. As
> far as I know there is currently no such thing.
I'm not aware of anyone using Tahoe-LAFS like this, but I'd like to
see more applications of this kind.
This kind of application presupposes that all users have access to the
same grid (as do many multi-user Tahoe-LAFS-based applications). So I
want to "bump" the priority of the idea that there should be one
universal Tahoe-LAFS grid, so that a user can publish a capability
through Twitter, which then gets picked up by a search engine, which
then gets broadcast by shortwave radio, and a person completely unknown
to the publisher can install Tahoe-LAFS and retrieve that content.
If that could work at large scale, then applications like the
messaging system you propose could grow to the scale of popular
webapps, like Twitter.
>
> The following is a very basic idea which sounds promising to me but
> might still have major flaws. It kind of sounds too good and simple to
> be true, so I'd love to read some comments.
>
> So, I believe the following setup would implement a publish/
> subscribe-style message system on Tahoe. It would be more like a
> microblogging than a message board system because users would publish
> "feeds" and others would need to "subscribe" to those feeds first. This
> isn't perfect but seems to complement Tahoe's existing access
> permissions and linking capabilities the most.
>
> Publishing and Subscribing on Tahoe
> -----------------------------------
>
> Users will have one or more directories (i.e. "feeds" and/or
> "identities") on Tahoe into which they will save their messages. Those
> directories can either be made public somewhere or only be shared with
> selected people. "Following" users means linking their feed directories
> in one's own directory [1].
>
> Surprisingly, that's basically it. You'll end up with a list of
> directories containing the messages of people you care for and a
> directory containing your own messages for others.
>
Yes, I like this. Because the capability system is simple to
understand, this design feels natural.
> Reading
> -------
>
> To make this actually usable I imagine that a program will mirror the
> subscription directory to the local filesystem and rearrange the
> individual messages [2] into a Maildir [3]. That's enough to be able to
> read those messages in an MUA, threaded even.
A wonderful feature of basing an application on a LAFS directory /
file tree that follows proper conventions is that any number of user
interfaces would work. For example, one user might use this MUA
scheme, while another might use a browser-based JavaScript client.
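Here is a rough sketch of the "rearrange into a Maildir" step, just to
make the idea concrete. It assumes the mirror is laid out as
<mirror_root>/<feed directory>/<message file> and that message
filenames are unique; the function name deliver_to_maildir and both
path arguments are my own inventions, not part of the proposal. It
does not talk to Tahoe at all, it only works on the local mirror.

```python
import os
import shutil
from pathlib import Path


def deliver_to_maildir(mirror_root: Path, maildir: Path) -> list[str]:
    """Copy not-yet-delivered messages from the local mirror into maildir/new."""
    for sub in ("tmp", "new", "cur"):
        (maildir / sub).mkdir(parents=True, exist_ok=True)

    # Names already present. Files in cur/ carry a ":2,<flags>" suffix
    # (for example ":2,S" for "seen"), which is stripped before comparing.
    seen = {
        entry.name.split(":2,")[0]
        for sub in ("new", "cur")
        for entry in (maildir / sub).iterdir()
    }

    delivered = []
    # Assumed mirror layout: <mirror_root>/<feed directory>/<message file>
    for msg in sorted(mirror_root.glob("*/*")):
        if not msg.is_file() or msg.name in seen:
            continue
        tmp_path = maildir / "tmp" / msg.name
        shutil.copyfile(msg, tmp_path)
        # Write into tmp/ first, then rename into new/, so a reader never
        # sees a half-written message.
        os.rename(tmp_path, maildir / "new" / msg.name)
        seen.add(msg.name)
        delivered.append(msg.name)
    return delivered
```

Running it repeatedly should be harmless: anything the MUA has already
moved to cur/ is recognized by its base name and skipped.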
>
> As long as the local copy exists the Maildir format will keep track of
> messages' read status. That information could even be mirrored back onto
> Tahoe, resembling something like OfflineIMAP.
>
> Posting
> -------
>
> A fake SMTP server [4] will have to listen for new messages and simply
> write them to disk. From there they can get uploaded into the user's
> public directory on Tahoe.
>
> That's *really* it.
>
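To illustrate what the fake SMTP server would do once it has received
a message, here is a minimal sketch of the "write it to disk" part
only. It is not an SMTP listener. The name spool_message, the feeds
mapping (sender address to local feed directory) and the choice of
From and Date as required headers are all assumptions of mine; picking
the feed by parsing "From:" is the approach hinted at in the
requirements list further down.

```python
from email import message_from_bytes
from email.utils import parseaddr
from pathlib import Path

# Assumption: which headers count as "valid enough" is not specified in
# the proposal; From and Date are used here purely as an example.
REQUIRED_HEADERS = ("From", "Date")


def spool_message(raw: bytes, feeds: dict[str, Path], filename: str) -> Path:
    """Validate a received message and write it into the sender's feed directory."""
    msg = message_from_bytes(raw)
    missing = [h for h in REQUIRED_HEADERS if msg[h] is None]
    if missing:
        raise ValueError("missing headers: " + ", ".join(missing))

    # Pick the identity/feed from the "From:" address.
    _, address = parseaddr(msg["From"])
    try:
        feed_dir = feeds[address.lower()]
    except KeyError:
        raise ValueError("no feed for sender " + address) from None

    feed_dir.mkdir(parents=True, exist_ok=True)
    target = feed_dir / filename
    target.write_bytes(raw)
    return target
```

Uploading the spooled file to the public Tahoe directory would be a
separate step and is left out here.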
> Side Effect: Aggregators
> ------------------------
>
> Users will obviously be free to publish their own subscription directory
> as well. This would turn them into one of possibly many Twitters or
> maybe something of a Tumblr blog. Somebody could also run scripts to
> publish messages in HTML format or something. These are just some random
> ideas to illustrate some interesting, additional possibilities. They
> aren't relevant for this proposal at all.
>
> Outline of the Directory Structure
> ----------------------------------
>
> This is an outline of the directory structure that I try to describe
> above.
>
> The file naming scheme here is <username>-<user_ID>-<timestamp>
> -<message_hash>. The user_ID could be the directory URI or a shorter
> hash of it etc. The message hash at the end should probably not be MD5.
> I used that here to keep the lines short.
>
> This is just an initial idea and can probably be improved a lot.
>
> Tahoe filesystem
>   alice (public)
>     alice-<user_ID>-20120101140532-c192ced31c7b9cb49589de9133a6b94a
>     alice-<user_ID>-20120105101001-753287bc3b8e11a66063a8dbf763644c
>     alice-<user_ID>-20120201055612-aa16718790c37e719417b71b6cc24e2e
>   bob (public)
>     bob-<user_ID>-20120103233000-8296213e63b1e307ea68a379b26c32a0
>   charlie (public)
>     charlie-<user_ID>-20120201032055-59ce00d489e18528be2ef97d9dc4d4ba
>     charlie-<user_ID>-20120201033510-3b7692ca707d3ae5decbf18754b46d96
>     charlie-<user_ID>-20120202064449-bdc6de4f577303369bec8cc9c9c99250
>   dave (public)
>     dave-<user_ID>-20120204183947-3d057a61f0a54f578f714e4c0daf7ca1
>   eric (public)
>     eric-<user_ID>-20120118021802-a16d8f6411d9aa6ee1b3a1e6eee8a735
>     eric-<user_ID>-20120208021032-bf6385faa2ac42fa38040191ea0f58f9
>   alice's subscriptions (private)
>     alice-<Tahoe_DIR_URI> (self)
>       alice-<user_ID>-20120101140532-c192ced31c7b9cb49589de9133a6b94a
>       alice-<user_ID>-20120105101001-753287bc3b8e11a66063a8dbf763644c
>       alice-<user_ID>-20120201055612-aa16718790c37e719417b71b6cc24e2e
>     bob-<Tahoe_DIR_URI> (include URI to allow multiple people called bob)
>       bob-<user_ID>-20120103233000-8296213e63b1e307ea68a379b26c32a0
>     eric-<Tahoe_DIR_URI>
>       eric-<user_ID>-20120118021802-a16d8f6411d9aa6ee1b3a1e6eee8a735
>       eric-<user_ID>-20120208021032-bf6385faa2ac42fa38040191ea0f58f9
> local filesystem
>   mirror of alice's subscriptions on Tahoe
>     alice-<Tahoe_DIR_URI>
>       alice-<user_ID>-20120101140532-c192ced31c7b9cb49589de9133a6b94a
>       alice-<user_ID>-20120105101001-753287bc3b8e11a66063a8dbf763644c
>       alice-<user_ID>-20120201055612-aa16718790c37e719417b71b6cc24e2e
>     bob-<Tahoe_DIR_URI>
>       bob-<user_ID>-20120103233000-8296213e63b1e307ea68a379b26c32a0
>     eric-<Tahoe_DIR_URI>
>       eric-<user_ID>-20120118021802-a16d8f6411d9aa6ee1b3a1e6eee8a735
>       eric-<user_ID>-20120208021032-bf6385faa2ac42fa38040191ea0f58f9
>   maildir (rearranged copies of mirrored messages)
>     cur
>       alice-<user_ID>-20120101140532-c192ced31c7b9cb49589de9133a6b94a:2,S
>       alice-<user_ID>-20120105101001-753287bc3b8e11a66063a8dbf763644c:2,S
>       alice-<user_ID>-20120201055612-aa16718790c37e719417b71b6cc24e2e:2,S
>       bob-<user_ID>-20120103233000-8296213e63b1e307ea68a379b26c32a0:2,S
>       eric-<user_ID>-20120118021802-a16d8f6411d9aa6ee1b3a1e6eee8a735:2,S
>       eric-<user_ID>-20120208021032-bf6385faa2ac42fa38040191ea0f58f9:2,S
>     new
>     tmp
>   alice's feed (messages to be published on Tahoe; written by fake SMTP server)
>     alice-<user_ID>-20120101140532-c192ced31c7b9cb49589de9133a6b94a
>     alice-<user_ID>-20120105101001-753287bc3b8e11a66063a8dbf763644c
>     alice-<user_ID>-20120201055612-aa16718790c37e719417b71b6cc24e2e
>
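For what it's worth, the <username>-<user_ID>-<timestamp>-<message_hash>
naming scheme is easy to sketch. Below, SHA-256 stands in for the
message hash (the proposal only says it should probably not be MD5, so
that choice is mine), timestamps are rendered in UTC, and the helper
names are hypothetical. Parsing splits from the right, which assumes
that the user ID, timestamp and hash contain no hyphens, while the
username may.

```python
import hashlib
from datetime import datetime, timezone


def message_filename(username: str, user_id: str, when: datetime, body: bytes) -> str:
    """Build <username>-<user_ID>-<timestamp>-<message_hash> for one message."""
    # "when" must be timezone-aware; it is converted to UTC.
    timestamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    digest = hashlib.sha256(body).hexdigest()  # assumption: SHA-256, not MD5
    return f"{username}-{user_id}-{timestamp}-{digest}"


def parse_message_filename(name: str) -> dict[str, str]:
    """Split a filename back into its four fields (raises ValueError if malformed)."""
    username, user_id, timestamp, digest = name.rsplit("-", 3)
    return {
        "username": username,
        "user_id": user_id,
        "timestamp": timestamp,
        "message_hash": digest,
    }
```

Using the full directory URI as the user ID would need care here, since
this parsing relies on the ID containing no hyphens; a short hash of
the URI avoids that.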
> Requirements of the Client Software
> -----------------------------------
>
> Necessities
> * Download/mirror subscriptions from Tahoe
> - skip existing messages
> - Check validity of message headers (and maybe even the feed
> directory layout)
> - Rearrange messages into Maildir format
> * Provide a fake SMTP server
> - write "sent" messages to local files and upload them to the
> user's public Tahoe directory
> - check for valid mail headers etc.
> - in case of multiple feed directories/identities we need a way
> to upload messages to the correct directory (parse "From: "
> address?)
> * Create new identities
> - basically `tahoe mkdir`
> - maybe more complicated if we come up with fancy feed metadata
> like a title/description file
> - create globally unique user IDs
> * alice@<Tahoe_DIR_URI>.tahoepsm?
We should probably require all users to use their "real name"... Just kidding!
I recommend being very careful about authentication. If a user sees
"LadyGaga@<Tahoe_DIR_URI>.tahoepsm" they might believe that's somehow
authoritatively related to the famous persona, when in reality, all
they know is that whatever appears there comes from someone with the
write cap.
An alternative scheme is to require users to pick their own nickname
in the UI when they add a subscription. For example if you send me a
dir cap over OTR, I'd put in my configuration:
"<Tahoe_DIR_URI>.tahoepsm: darrob from the tahoe-dev mailing list"
> Luxuries
> * when downloading, check that message IDs match the directory they
> originated from
> - if this is ensured, messages could basically be considered
> signed (right?)
> * we could add a X-TAHOEPSM-VERIFIEDAUTHOR header or
> something
> - once messages have been moved into the Maildir this cannot be
> verified anymore
> - offline archives (see below) won't be able to have this
> checked either
> - it's easy to use PGP with MUAs though
> * allow import of offline archives
> - think sneakernet; bridging of different Tahoe grids
> - X-TAHOEPSM-VERIFIEDAUTHOR: no
> * list subscriptions
> - basically a `tahoe ls`
> - maybe more complicated if we come up with fancy feed metadata
> like a title/description file
> * easy adding of new subscriptions
> - basically `tahoe ln`
> * upload read/flag statuses to private Tahoe dir
> * IMAP-like functionality; allows reading on multiple computers
>
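The "message IDs match the directory they originated from" check could
look roughly like this. It assumes the filename scheme from the outline
above (user ID as the second field from the left of the last three
hyphen-separated fields) and that the client knows the user ID of the
feed directory it is downloading from. The header name comes from the
proposal; the value "yes", the function name and the offline flag are
my own assumptions.

```python
def verified_author_header(
    filename: str, feed_user_id: str, from_offline_archive: bool = False
) -> str:
    """Return the X-TAHOEPSM-VERIFIEDAUTHOR header line for one downloaded message."""
    # Expected shape: <username>-<user_ID>-<timestamp>-<message_hash>
    parts = filename.rsplit("-", 3)
    matches = len(parts) == 4 and parts[1] == feed_user_id
    # Messages imported from offline archives cannot be checked against a
    # live feed directory, so they are never marked as verified.
    verified = matches and not from_offline_archive
    return "X-TAHOEPSM-VERIFIEDAUTHOR: " + ("yes" if verified else "no")
```

The header would have to be added at download time, before the message
is moved into the Maildir, since the link to the originating directory
is lost after that.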
> Conclusion
> ----------
>
> This system would allow for safe and secret communication between groups
> or individuals without any extra effort as the files and directories on
> Tahoe cannot be guessed, discovered or tampered with.
>
> -----
>
> [1]: A text file containing a list of URIs would work, too, I suppose. I
> don't know which will be more efficient.
I can't say for certain that directories are more efficient, but I
assume that if they are *not* now, they will become so over time,
since they are an integral use case.
>
> [2]: I chose a system of individual message files because I hope that it
> will make mirroring the Tahoe directories faster. We could just compare
> filenames (unique due to message hash etc.) and skip existing messages
> instead of constantly redownloading them.
I advocated a while ago for exposing a cryptographic hash of a file's
plain text: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/280
There is an optimization during file publication: the same file
content gets encoded in the same way, so if a storage server already
holds a given share, uploading it is skipped. My understanding of this
may be out of date, though. Perhaps the optimization has been pushed
closer to the client.
It's always possible for a particular application to include the
file's hash in its name or, if you must not mangle the name, to keep
a separate namespace (in a separate Tahoe-LAFS directory, for
example) mapping hash to contents.
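As a small sketch of the hash-in-the-name approach: comparing names is
enough to decide what to download, and the hash lets the client check
the content afterwards. This assumes the hash is the last
hyphen-separated field of the name and is a SHA-256 hex digest (both
assumptions of mine); the function names are made up, and how the list
of remote names is obtained from Tahoe is left out.

```python
import hashlib
from collections.abc import Iterable


def names_to_fetch(remote_names: Iterable[str], local_names: Iterable[str]) -> list[str]:
    """Return the remote message names that are not in the local mirror yet."""
    return sorted(set(remote_names) - set(local_names))


def content_matches_name(name: str, content: bytes) -> bool:
    """Check that the hash at the end of the filename matches the content."""
    claimed = name.rsplit("-", 1)[-1]
    return hashlib.sha256(content).hexdigest() == claimed
```

A message whose content does not match its name could simply be
dropped by the mirroring step.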
>
> [3]: I chose the email format over actual feed formats like RSS for a
> few reasons: a) efficiency of synchronization, see [2]; b) plain text
> files with a bunch of email headers are easy to create; c) they can be
> read in existing MUAs and d) can keep references and be displayed nicely
> (threaded) in those MUAs.
>
> [4]: A quick Google search brought up
> http://muffinresearch.co.uk/archives/2010/10/15/fake-smtp-server-with-python/
>
Nejucomo