[tahoe-dev] Fwd: LAFS performance

Zooko O'Whielacronx zooko at zooko.com
Fri Jul 16 18:29:35 UTC 2010


---------- Forwarded message ----------
From: Zooko O'Whielacronx <zooko at zooko.com>
Date: Tue, Jul 6, 2010 at 3:17 PM
Subject: Re: LAFS performance
To: Nikita Borisov <nikita at illinois.edu>


Hi Nikita! Good to hear from you!

On Tue, Jul 6, 2010 at 1:04 PM, Nikita Borisov <nikita at illinois.edu> wrote:
>
> I'm working on a project building decentralized social networks, and
> I'm trying to think about how to design a storage layer for it.

Yay! I love the idea of decentralized social networks. I liked the
idea (which I think might have come from a student of yours) of
transparently encrypting your data as you type it into Facebook. My
brother Nathan has a really well thought-out design in his head which
he calls "Grapevine". Then there is, of course, Diaspora. Then there
is an idea I have which is: write your decentralized social network
entirely in JavaScript (ideally in a secure subset: Cajita, Jacaranda,
AdSafe or Secure EcmaScript), and deploy it on Tahoe-LAFS. Then you
have a fully decentralized thing which is also persistent, globally
reachable and identifiable, and fault-tolerant. (See below for caveats
about fault tolerance.) Also Tahoe-LAFS has excellent access control
features which you can probably leverage.

So far there are two known instances of this pattern of deploying
JavaScript on top of Tahoe-LAFS. One is my blog:

http://pubgrid.tahoe-lafs.org/uri/URI:DIR2-RO:ixqhc4kdbjxc7o65xjnveoewym:5x6lwoxghrd5rxhwunzavft2qygfkt27oj3fbxlq4c6p45z5uneq/blog.html

The other is a Google Summer of Code project:

http://tahoe-music-player.tumblr.com/

>  We
> took a brief look at Tahoe-LAFS, but one concern I have about the
> implementation is that, to read a single file, you have to pick up a
> bunch of pieces that are scattered around the world on different
> nodes, and that seems like it would significantly impact performance.

Is the concern that the pieces are remote from the computation, or
that there are multiple pieces for a single file?

As far as the latter goes, Tahoe-LAFS as an architecture is pretty
good. There are M different pieces out there on M different servers,
and you need any K of them in order to get the whole file. Each piece
is 1/K of the size of the whole file. Therefore, you can download the
fastest K of them and have your file.
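
(To make that arithmetic concrete, here is a little Python sketch of
the K-of-M trade-off. The server names and response times in it are
made up for illustration, and it is not Tahoe-LAFS's actual downloader
code.)

    # Illustration only: with K-of-M erasure coding each share is about
    # 1/K of the file, the grid stores M/K times the original in total,
    # and a downloader only needs to fetch the fastest K shares.
    import math

    def share_size(filesize, k):
        return math.ceil(filesize / k)

    filesize = 309460            # the file in the timing report below
    k, m = 3, 10                 # Tahoe-LAFS's default encoding parameters

    print(share_size(filesize, k))      # 103154 bytes per share
    print(m * share_size(filesize, k))  # 1031540 bytes stored grid-wide
    print(k * share_size(filesize, k))  # 309462 bytes fetched per download

    # "Download the fastest K": given measured response times per server
    # (hypothetical numbers), sort by time and take the first K.
    times = {"server-a": 1.36, "server-b": 0.40, "server-c": 0.78,
             "server-d": 0.26, "server-e": 0.51, "server-f": 0.93}
    fastest_k = sorted(times, key=times.get)[:k]
    print(fastest_k)             # ['server-d', 'server-b', 'server-e']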

This is also what the current Tahoe-LAFS implementation actually does,
although there may be other slowdowns in the current implementation
having to do with pipelining successive reads of more and more data
for the same file.

For files smaller than 128 KiB that doesn't matter--you'll get the
entire file in the first read.
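
(Again just back-of-the-envelope arithmetic, not Tahoe-LAFS code: 128
KiB is the default maximum segment size, so the number of successive
segment fetches a download needs looks roughly like this.)

    # Rough count of segment fetches per download, assuming the default
    # 128 KiB maximum segment size.
    SEGMENT_SIZE = 128 * 1024

    def num_segments(filesize):
        return max(1, -(-filesize // SEGMENT_SIZE))   # ceiling division

    print(num_segments(100 * 1024))   # 1 -- whole file in the first read
    print(num_segments(309460))       # 3 -- matches the three per-server
                                      #      fetch times in the report below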

There are also other slowdowns, mostly about how many round trips you
have to take between the downloader and each server before you can
download the first byte of file data. There is a patch named "Brian's
New Downloader" which is mostly complete and will very likely be
released in August or September of this year; it also optimizes that
a bit.

So all that I just said should give you reason to believe that
something with a Tahoe-LAFS-like architecture could be fast, and that
future releases of Tahoe-LAFS could be fast, but whether the *current*
version of Tahoe-LAFS is actually fast *enough* for your particular
use case should be empirically verified. :-)

Here is a quick test. Go to http://tahoe-lafs.org . Click on "Try
Tahoe-LAFS now!". Then click on some file that you see there which is
approximately of the size you care about. Then click on "Return to
Welcome page" and then click on "Recent Uploads and Downloads". One of
the recent downloads should be the one that you just did (maybe check
timestamps and file sizes to identify it). Click on that one's
"status" entry which is named "Finished". That will take you to a page
that shows the timings of that downloads. For example, the one that I
just did reports:

# Timings:

   * File Size: 309460 bytes
   * Total: 2.54s (121.8kBps)
         o Peer Selection: 54ms
         o UEB Fetch: 137ms
         o Hashtree Fetch: 46ms
         o Segment Fetch: 2.30s (134.5kBps)
               + Cumulative Fetching: 2.29s (135.1kBps)
               + Cumulative Decoding: 144us (2134.82MBps)
               + Cumulative Decrypting: 9.1ms (34.10MBps)
         o Paused by client: 1.7ms
   * Per-Server Segment Fetch Response Times:
         o [fp3xjndg]: 1.36s, 513ms, 264ms
         o [lv3fqmev]: 400ms, 269ms, 94ms
         o [tavrk54e]: 785ms, 663ms, 135ms

Hm, one of the three servers took 1.36 *seconds* to respond to the
first request! That's pretty slow. Maybe that particular server is an
overloaded virtual machine or something. However, even with that 1.36s
delay, the overall transfer time for the 300 KB file was 2.5s, which
is reasonable.
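
(As a sanity check on those numbers, the arithmetic behind the
headline rate is just this; nothing Tahoe-specific.)

    # 309460 bytes in 2.54 s gives the reported ~121.8 kB/s.
    print(309460 / 2.54 / 1000)               # 121.8...
    # Nearly all of that time is the segment fetches (2.30 s); peer
    # selection, UEB fetch, hashtree fetch, decoding and decrypting add
    # up to only about a quarter of a second.
    print(0.054 + 0.137 + 0.046 + 0.000144 + 0.0091)   # ~0.246 s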

> I didn't see any measurements in the papers I read on the website,
> have you guys run up against this / looked into potential solutions?
> Do you have any studies of caching or load balancing in Tahoe?

Caching: no, although you can certainly find some detailed discussion
of technical issues with possible caching strategies in the issue
tracker. Load balancing: it depends on what you mean. The download just
chose the fastest 3 servers out of 6. (It was supposed to be 10 but
apparently we don't have 10 working servers on the demo grid right
now.)

> Also, have you looked at the problem of denial-of-service attacks
> where I try to erase your files by overwriting the data at the right
> location in the hash table?  You won't be able to create anything
> authentic, but it still seems like a problem.  Is that what the Verify
> caps are for?

We have thought about that sort of thing. In a sense, that is what the
erasure coding is for. You would have to overwrite the data in any 8
out of 10 servers in order to delete it. Perhaps you could be cleverer
and do things like trick us into uploading to only 6 servers when we
meant to upload to 10. Then you would have to overwrite the data in
any 4 of those 6 to delete the data.

This is all assuming the default (and widely used) erasure coding
parameters of K=3, M=10. However, you can also choose any parameters
with 1 <= K <= M <= 256. The hard part is getting enough reliable servers
connected to your grid. :-)
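
(The general rule, to spell it out: with K-of-M encoding an attacker
has to destroy M - K + 1 shares before the file becomes unrecoverable.
A one-liner to play with:)

    # Shares an attacker must destroy to make a K-of-M file unrecoverable.
    def shares_to_destroy(k, m):
        return m - k + 1

    print(shares_to_destroy(3, 10))   # 8 -- the default 3-of-10 case
    print(shares_to_destroy(3, 6))    # 4 -- if only 6 shares were placed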

I'm not saying that our story about that is complete. There are some
known flaws involving that sort of trickery where you cause all the
good servers to lose their TCP connections right when I am uploading
and therefore I upload only to bad servers. Some of those we will
hopefully fix this summer. But I would say that our story about this
is probably a damn sight better than anyone else's. :-) (As far as
actual working code that you can rely on.)

Regards,

Zooko

P.S. See also http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/performance.txt

P.P.S. Oh I think I misunderstood your question. I was telling you
about our attempts to defend against a harder problem: a conspiracy of
servers which collude to delete your data. Preventing unauthorized
*other clients* from deleting your data is solved with cryptography:
the key in the key-value store is not free but is instead constrained
to be either the secure hash of the value (for immutable data) or the
public key used to sign the data (for mutable data).
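
(A minimal sketch of the immutable case, with SHA-256 standing in for
the real Tahoe-LAFS capability format: the key is derived from the
value, so a reader can check what it gets back before trusting it.)

    # Toy self-authenticating store; SHA-256 stands in for Tahoe-LAFS's
    # actual hash/capability construction.
    import hashlib

    store = {}

    def put_immutable(value):
        key = hashlib.sha256(value).hexdigest()   # key derived from value
        store[key] = value
        return key

    def get_immutable(key):
        value = store[key]
        # A server (or another client) can withhold or corrupt the value,
        # but it cannot substitute different data without this check failing.
        if hashlib.sha256(value).hexdigest() != key:
            raise ValueError("retrieved value does not match its key")
        return value

    cap = put_immutable(b"hello, grid")
    assert get_immutable(cap) == b"hello, grid"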

