[tahoe-dev] Google Summer of Code chooses to sponsor Tahoe-LAFS!

Zooko O'Whielacronx zookog at gmail.com
Fri Mar 19 22:12:52 PDT 2010


An anonymous hacker offered to help out with any GSoC projects about
integrating Tahoe-LAFS and Tor.  Here is my reply:

Thanks! Right now we don't have a Tahoe-LAFS+Tor project on our
GSoCIdeas page [1]. It would be nice to add one, but it needs to have
enough detail that a student can get started on the right track from
it.

Also, it needs to have enough "meat" to keep a student busy all
summer. I think the right way to do that is to make the project
#467/#573 (allow the user to specify which servers are used for
uploads). Once #467/#573 is done, we can configure Tahoe-LAFS to
upload a few shares to Tor hidden servers (K shares -- just enough to
download) and the rest of the shares to non-hidden servers. Content
would then be downloaded exclusively from the non-hidden servers
unless they are unavailable, in which case the Tahoe-LAFS downloader
will automatically fall back to downloading from the Tor hidden
servers.
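To make the proposed configuration concrete, here is a minimal Python sketch of the placement rule. It assumes one share per server and uses a hypothetical `Server` record with a `hidden` flag; `place_shares` is illustrative only and is not part of Tahoe-LAFS's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical server record; "hidden" marks a Tor hidden server.
    name: str
    hidden: bool


def place_shares(servers, k, n):
    """Assign share numbers 0..n-1 to servers, one share per server.

    Exactly k shares (just enough to reconstruct the file) go to Tor
    hidden servers; the remaining n - k go to non-hidden servers.
    """
    hidden = [s for s in servers if s.hidden]
    public = [s for s in servers if not s.hidden]
    if len(hidden) < k or len(public) < n - k:
        raise ValueError("not enough servers of each kind")
    chosen = hidden[:k] + public[: n - k]
    return {sharenum: server for sharenum, server in enumerate(chosen)}
```

Because only K shares land on hidden servers, the bulk of each upload stays off Tor, yet the hidden servers alone still hold enough shares to recover the file.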

I think, based mostly on what Harold Gonzales told me, that this is
the best way to structure this because:

(a) It minimizes the load on Tor when the content is not under attack,
which is important because Tor doesn't handle bulk data transfers
well, and bulk transfers destroy the latency of interactive traffic
(like ssh sessions or interactive web sessions).

(b) It optimizes the performance experienced by users when the content
they are viewing is not under attack. This is important so that we are
not asking users to endure a performance penalty for all of their
normal web browsing just so that they can use an attack-resistant
service. The goal is that this would perform well enough and be
convenient enough to serve as a normal way to host static files.

(c) But if the non-hidden servers *were* to disappear, or start
serving up corrupted data, or just become really, really slow, or
something, then the Tahoe-LAFS storage client would automatically use
shares served by the hidden servers. The result should be that the
content is very attack-resistant.
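The fallback behavior in (c) could look roughly like the following sketch. `fetch_shares`, the `fetch_share` callback, and the per-share `expected_hashes` table are all hypothetical: a plain SHA-256 comparison stands in for Tahoe-LAFS's real integrity checks, and a server that has disappeared or is too slow is modelled as raising `ConnectionError` or `TimeoutError`.

```python
import hashlib
from collections import namedtuple

# Hypothetical server record; "hidden" marks a Tor hidden server.
Server = namedtuple("Server", ["name", "hidden"])


def fetch_shares(placement, k, expected_hashes, fetch_share):
    """Collect k intact shares, preferring non-hidden servers.

    placement: dict sharenum -> Server
    expected_hashes: dict sharenum -> SHA-256 hex digest of that share
    fetch_share: callable(server, sharenum) -> bytes
    """
    # sorted() is stable and False < True, so every non-hidden server is
    # tried before any hidden one.
    ordered = sorted(placement.items(), key=lambda item: item[1].hidden)
    good = {}
    for sharenum, server in ordered:
        try:
            data = fetch_share(server, sharenum)
        except (ConnectionError, TimeoutError):
            continue  # server gone or too slow: try the next one
        if hashlib.sha256(data).hexdigest() != expected_hashes[sharenum]:
            continue  # corrupted share: ignore it
        good[sharenum] = data
        if len(good) == k:
            return good
    raise RuntimeError("fewer than k intact shares could be retrieved")
```

When the non-hidden servers behave, the loop returns before it ever reaches a hidden server, which is the property that points (a) and (b) rely on.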

Another big advantage of doing it this way is that #467/#573 is also
wanted by other users with completely different use cases. #467/#573
is what distributed database folks call "rack awareness", meaning that
they want to ensure that shares get spread across multiple racks and
not just across multiple servers that might happen to be in the same
rack, because a major unfortunate event (usually power-related, it
seems) could disconnect or even damage multiple servers in the same
rack. The generalization of this, of course, is "location awareness"
or even more generally "correlated-failure awareness". You don't want
all your shares stored on servers that all sit above the San Andreas
Fault, you don't want all your shares stored on servers that are
operated by the same sysadmin team (even if the servers are isolated
from one another in physical and geographical dimensions), etc. etc.

#467/#573, if done right, should simultaneously satisfy the
Tahoe-LAFS-over-Tor use case and the "rack awareness" use case, as
well as others. See the tickets for a list of requests we've received
for different use cases. One of them is the "Shawn Willden's mom" use
case, which requires that family photos have at least K shares stored
on Shawn's mom's home computer so that she can view them instantly. :-)
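The "rack awareness" and "Shawn Willden's mom" requirements can both be phrased as checks on a proposed placement. The sketch below assumes a placement is a dict from share number to server name (a server may hold several shares) and that each server's failure domain (rack, site, admin team, ...) is known; both function names are hypothetical.

```python
from collections import Counter


def survives_loss_of_any_group(placement, group_of, k):
    """True if losing every server in any single failure domain still
    leaves at least k shares, i.e. the file stays recoverable.

    placement: dict sharenum -> server name
    group_of:  dict server name -> failure domain
    """
    n = len(placement)
    per_group = Counter(group_of[server] for server in placement.values())
    return all(n - count >= k for count in per_group.values())


def has_local_copy(placement, server, k):
    """True if `server` alone holds at least k shares, so the file can be
    read from that one machine (the "Shawn Willden's mom" case)."""
    return sum(1 for s in placement.values() if s == server) >= k
```

A server-selection policy would then only accept placements for which all of the checks the user asked for return True.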

Regards,

Zooko

[1] http://tahoe-lafs.org/trac/tahoe-lafs/wiki/GSoCIdeas

#467: allow the user to specify which servers are used for uploads
http://allmydata.org/trac/tahoe-lafs/ticket/467

#573: Allow client to control which storage servers receive shares
http://allmydata.org/trac/tahoe-lafs/ticket/573

