[tahoe-dev] Tahoe as glue filesystem

zooko zooko at zooko.com
Thu Jul 3 12:23:14 PDT 2008


Dear Valentino Volonghi:

I'm sorry it has taken me so long to reply to your inquiry. The delay
was because I am not confident that I understand what you want to do,
or whether Tahoe would be a good tool for doing it.

Here I will do my best to answer your questions.


On May 19, 2008, at 3:40 PM, Valentino Volonghi wrote:

 > Hi all, I'm using Twisted Matrix to write a process pool (Ampoule on
 > launchpad). One of the things I'd like to achieve is the possibility
 > to run processes on remote machines and of course here comes the
 > problem of letting them talk to each other.

 > a) does tahoe make any sense for this kind of usage? Seeing that it
 > has very strong read performance my impression is that it does, it
 > might not be tailored for this if subprocesses write too much in it.

The best way to find out if this is a good fit is to try some
experiments.  Here are some general measurements of latency and
throughput for Tahoe reads and writes on LAN and on DSL:

http://allmydata.org/trac/tahoe/wiki/Performance

However, instead of extrapolating from these measurements to what the
performance impact would be in your system, you should try some reads
and writes of the kind that your system will need and measure those.
Fortunately, it is easy to build Tahoe and easy to script reads and
writes (either in Python or with RESTful HTTP calls), and Tahoe
automatically produces detailed performance measurements for you on
each read or write.
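A minimal sketch of such a script in Python, using only the standard
library. It assumes a Tahoe client node whose web API is reachable at
http://127.0.0.1:3456 (an assumption; use your own node's web port) and
relies on two web API calls: `PUT /uri`, which uploads an immutable
file without linking it into any directory and returns the filecap as
the response body, and `GET /uri/$FILECAP`, which downloads it. The
helper names `tahoe_put`, `tahoe_get` and `time_round_trip` are
hypothetical. It only measures client-side wall-clock time, so it
complements the detailed per-operation report that Tahoe produces.

```python
import time
import urllib.parse
import urllib.request

# Assumption: a local Tahoe client node serves its web API here.
# Adjust to match your node's configured web port.
TAHOE_BASE = "http://127.0.0.1:3456"


def tahoe_put(data: bytes, base: str = TAHOE_BASE) -> str:
    """Upload bytes as an unlinked immutable file; the body is the filecap."""
    req = urllib.request.Request(base + "/uri", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("ascii").strip()


def tahoe_get(cap: str, base: str = TAHOE_BASE) -> bytes:
    """Download the file named by a filecap (the cap is URL-encoded)."""
    url = base + "/uri/" + urllib.parse.quote(cap, safe="")
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def time_round_trip(size: int, base: str = TAHOE_BASE) -> dict:
    """Write `size` bytes, read them back, and return wall-clock timings."""
    payload = bytes(i % 251 for i in range(size))  # hypothetical test payload
    t0 = time.perf_counter()
    cap = tahoe_put(payload, base)
    t1 = time.perf_counter()
    data = tahoe_get(cap, base)
    t2 = time.perf_counter()
    if data != payload:
        raise RuntimeError("round trip returned different bytes")
    return {"size": size, "cap": cap, "put_s": t1 - t0, "get_s": t2 - t1}
```

For example, calling `time_round_trip(n)` for a few values of n that
resemble the messages your subprocesses would actually exchange gives
numbers that apply directly to your use case.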

For example, I just uploaded a 41 MB mp3 file (an econtalk.org podcast,
Robin Hanson on signalling [1]) to the Tahoe Test Grid [2] from my
MacBook Pro over my home DSL line, and got these performance
measurements:

  * File Size: 40980688 bytes
  * Total: 1456.18s (28.1kBps)
     o Storage Index: 684ms (59.86MBps)
     o Peer Selection: 1.42s
     o Encode And Push: 1454.08s (28.3kBps)
        + Cumulative Encoding: 5.19s (7.90MBps)
        + Cumulative Pushing: 1441.24s (28.4kBps)
        + Send Hashes And Close: 7.38s

So "encoding" (which includes both encryption and erasure coding) ran
at almost 8 MBps, while "pushing" (uploading) ran at about 28 kBps; the
DSL uplink, not the encoding, accounts for nearly all of the total time.
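Each rate in the report is just the file size divided by that step's
elapsed time. A quick check in Python reproduces the figures; note that
the use of decimal prefixes (1 kB = 1000 bytes, 1 MB = 10^6 bytes) is
inferred from the numbers, not taken from Tahoe's documentation.

```python
# Figures copied from the upload report above (bytes and seconds).
FILE_SIZE = 40_980_688
TIMINGS = {
    "Total": 1456.18,
    "Cumulative Encoding": 5.19,
    "Cumulative Pushing": 1441.24,
}


def rate(nbytes: int, seconds: float) -> float:
    """Throughput in bytes per second."""
    return nbytes / seconds


def fmt_rate(bps: float) -> str:
    """Format with decimal prefixes, matching the report's style."""
    if bps >= 1e6:
        return f"{bps / 1e6:.2f}MBps"
    return f"{bps / 1e3:.1f}kBps"


for step, seconds in TIMINGS.items():
    print(step, fmt_rate(rate(FILE_SIZE, seconds)))
# Total 28.1kBps
# Cumulative Encoding 7.90MBps
# Cumulative Pushing 28.4kBps
```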


 > b) is it possible to disable parts of the protocol for speed
 > reasons?  I'm sure that all or most of the features inside tahoe are
 > extremely useful when storing sensible data or third party data. But
 > this is not true for subprocesses.  Essentially something very
 > transparent and that could be inspected would be great (so basically
 > limiting capabilities and encryption a bit or any other feature for
 > different cases).

There is no option to disable encryption. You could always hack the
software to add one, but if you did, I'll bet you couldn't measure any
improvement in performance, because the encryption is already quite
efficient and imposes very little overhead.
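The upload above gives a rough upper bound. Its "Cumulative Encoding"
figure covers encryption and erasure coding together, so even if both
cost nothing, that upload could not have finished more than 5.19
seconds sooner. A back-of-the-envelope check (a sketch using only the
reported numbers; on a fast LAN the push time shrinks, so the same
fraction would be larger there, which is another reason to measure your
own workload):

```python
# Figures from the upload report above, in seconds.
TOTAL_S = 1456.18     # "Total"
ENCODING_S = 5.19     # "Cumulative Encoding" (encryption + erasure coding)

# Removing encryption cannot save more wall-clock time than encryption
# and erasure coding took together, so this is a generous upper bound.
max_saving_fraction = ENCODING_S / TOTAL_S
best_case_total_s = TOTAL_S - ENCODING_S

print(f"encoding share of total: {max_saving_fraction:.2%}")  # 0.36%
print(f"best-case total: {best_case_total_s:.2f}s")           # 1450.99s
```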

If your measurements show that the encryption layer's overhead *is*
significant for your intended use cases, then I would like to hear
about it, because I plan to further optimize that layer at some
point...


 > c) would it be a bad idea to add a 'temporary storage' in tahoe that
 > would simply keep data sent in memory (and distribute it between all
 > introducer's clients) in order to be able to access it from inside
 > this cluster of servers.

That's an interesting idea, but you probably have 1/1000th as much RAM
as hard-drive space. (Unless you are thinking of letting the operating
system swap the storage to disk in virtual memory, which is another
interesting idea.)


 > Basically I think I see some potential for tahoe to compete against
 > hbase and similar distributed and failsafe filesystems, and I would
 > like to know if that's impossible to obtain or if it requires
 > countless hours and deaths.

Honestly, I don't know, and I think the only way we'll really come to
understand this is for some people to try it and post what they learn
to the public.


Thank you for your note.

Regards,

Zooko

[1] http://tahoebs1.allmydata.com:8123/file/URI%3ACHK%3Ay46pv66g7wdly7xkp3adko464y%3A3lmidhdhbkqtve3bwlfntqr2y3ljzz54abxrldcjh6pxyz344rra%3A3%3A10%3A40980688/@@named=/Hansonsignalling.mp3
[2] http://allmydata.org/trac/tahoe/wiki/TestGrid


