Performance – Tahoe-LAFS

Version 16 (modified by warner, at 2007-10-31T18:38:44Z)
explain the sometimes-slow DSL results

Some basic notes on performance:

Memory Footprint

We try to keep the Tahoe memory footprint low by continuously monitoring the memory consumed by common operations like upload and download.

For each currently active upload or download, we never handle more than a single segment of data at a time. This serves to keep the data-driven footprint down to something like 4MB or 5MB per active upload/download.

Some other notes on memory footprint:

Importing sqlite (for the share-lease database) raised the static footprint by about 7MB, going from 24.3MB to 31.5MB (as evidenced by the munin graph from 2007-08-29 to 2007-09-02).

Importing nevow and twisted.web (for the web interface) raises the static footprint by about 3MB (from 12.8MB to 15.7MB).

The 32-bit memory usage graph shows our static memory footprint on a 32-bit machine (starting a node but not doing anything with it) to be about 24MB. Uploading one file at a time gets the node to about 29MB. We only process one segment at a time, so peak memory consumption is reached once the file is a few MB in size and does not grow beyond that. Uploading multiple files at once would increase this.

We also have a 64-bit memory usage graph, which currently shows a disturbingly large static footprint. We've determined that simply importing a few of our support libraries (such as Twisted) results in most of this expansion, before the node is ever even started. The cause for this is still being investigated: we can think of plenty of reasons for it to be 2x, but the results show something closer to 6x.

Network Speed

Test Results

With a 3-server testnet in colo and an uploading node at home (on a DSL line that gets about 78kBps upstream and has a 14ms ping time to colo), release 0.5.1-34 takes 820ms-900ms per 1kB file uploaded (80-90s for 100 files, 819s for 1000 files). The DSL results are occasionally worse than usual, because the owner of the DSL line sometimes uses it for other purposes while a test is running.

'scp' of 3.3kB files (simulating the expanded size of a 1kB file) takes 8.3s for 100 files and 79s for 1000 files, or about 80ms each.

Doing the same uploads locally on my laptop (both the uploading node and the storage nodes are local) takes 46s for 100 1kB files and 369s for 1000 files.
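As a quick cross-check of the figures above, the totals can be reduced to a per-file time. This is only arithmetic on the numbers already quoted; the dictionary labels and the function name are made up for illustration.

```python
# Totals copied from the measurements above, as (number of files, total seconds).
measurements = {
    "tahoe over DSL, 1kB files": [(100, 80.0), (100, 90.0), (1000, 819.0)],
    "scp over DSL, 3.3kB files": [(100, 8.3), (1000, 79.0)],
    "tahoe on one laptop, 1kB files": [(100, 46.0), (1000, 369.0)],
}


def per_file_ms(files, total_seconds):
    """Average wall-clock time per file, in milliseconds."""
    return 1000.0 * total_seconds / files


for label, runs in measurements.items():
    times = ", ".join("%.0fms" % per_file_ms(n, s) for n, s in runs)
    print("%s: %s" % (label, times))
```

Even with no network in the way, the local runs still cost several hundred milliseconds per file, far more than 'scp' needs over the DSL line.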

Small files seem to be limited by a per-file overhead. Large files are limited by the link speed.

The munin delay graph and rate graph show the two parameters of this linear model (a fixed per-file delay and a transfer rate) for a node in colo and a node behind a DSL line.

The delay*RTT graph shows this per-file delay as a multiple of the average round-trip time between the client node and the testnet. Much of the work done to upload a file involves waiting for messages to make a round trip, so expressing the per-file delay in units of RTT helps to compare the observed performance against the predicted value.
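To make the unit conversion concrete, here is a small sketch that expresses the DSL per-file delays quoted earlier (820ms-900ms) as multiples of the 14ms ping time. The function name is illustrative and is not taken from the munin plugin.

```python
RTT_MS = 14.0  # ping time from the DSL node to colo, as quoted in the test setup


def delay_in_rtts(per_file_delay_ms, rtt_ms=RTT_MS):
    """Express a per-file delay as a multiple of the round-trip time."""
    return per_file_delay_ms / rtt_ms


for delay_ms in (820.0, 900.0):
    print("%.0fms per file = %.1f RTTs" % (delay_ms, delay_in_rtts(delay_ms)))
```

On these numbers the observed delay is roughly 59 to 64 round trips per file.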

Roundtrips

The 0.5.1 release requires about 9 roundtrips for each share it uploads. The upload algorithm sends data to all shareholders in parallel, but these 9 phases are done sequentially. The phases are:

allocate_buckets
send_subshare (once per segment)
send_plaintext_hash_tree
send_crypttext_hash_tree
send_subshare_hash_trees
send_share_hash_trees
send_UEB
close
dirnode update

We need to keep the send_subshare calls sequential (to keep our memory footprint down), and we need a barrier between the close and the dirnode update (for robustness and clarity), but the others could be pipelined. 9*14ms=126ms, which accounts for about 15% of the measured upload time.
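The latency budget above can be written out as a short calculation. This sketch assumes a single-segment file (so send_subshare happens once) and uses the 14ms ping time and the 820ms-900ms measurements from the test results; the function names are illustrative.

```python
PHASES = [
    "allocate_buckets",
    "send_subshare",  # once per segment; one segment assumed here
    "send_plaintext_hash_tree",
    "send_crypttext_hash_tree",
    "send_subshare_hash_trees",
    "send_share_hash_trees",
    "send_UEB",
    "close",
    "dirnode update",
]
RTT_MS = 14.0


def roundtrip_cost_ms(num_phases, rtt_ms=RTT_MS):
    """Latency spent waiting if every phase costs one sequential round trip."""
    return num_phases * rtt_ms


def share_of_upload(cost_ms, measured_ms):
    """Fraction of a measured per-file upload time explained by that latency."""
    return cost_ms / measured_ms


cost = roundtrip_cost_ms(len(PHASES))
for measured in (820.0, 900.0):
    print("%.0fms of %.0fms = %.0f%%"
          % (cost, measured, 100 * share_of_upload(cost, measured)))
```

Round trips therefore explain only about a seventh of the measured time; the rest of the per-file overhead must come from somewhere else.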

Doing steps 2-8 in parallel (using the attached pipeline-sends.diff patch) does indeed seem to bring the time-per-file down from 900ms to about 800ms, although the results aren't conclusive.
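A rough latency-only model gives a feel for how much pipelining could save. It assumes a single-segment file and that all messages sent in one stage complete within one round trip; both are simplifications of mine, not a description of what the patch does internally.

```python
RTT_MS = 14.0

# Each inner list is one stage: messages sent back to back without waiting
# for each other's responses. In this model a stage costs one round trip.
SEQUENTIAL = [
    ["allocate_buckets"],
    ["send_subshare"],
    ["send_plaintext_hash_tree"],
    ["send_crypttext_hash_tree"],
    ["send_subshare_hash_trees"],
    ["send_share_hash_trees"],
    ["send_UEB"],
    ["close"],
    ["dirnode update"],
]
PIPELINED = [
    ["allocate_buckets"],
    ["send_subshare", "send_plaintext_hash_tree", "send_crypttext_hash_tree",
     "send_subshare_hash_trees", "send_share_hash_trees", "send_UEB", "close"],
    ["dirnode update"],  # still waits for close to finish (the barrier)
]


def latency_ms(stages, rtt_ms=RTT_MS):
    """Total waiting time when each stage costs one round trip."""
    return len(stages) * rtt_ms


saved = latency_ms(SEQUENTIAL) - latency_ms(PIPELINED)
print("model predicts a saving of %.0fms per file" % saved)
```

The model predicts a saving of 84ms per file, which is in the same range as the observed drop from 900ms to about 800ms. Given the noise in the measurements, this should not be read as confirmation.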

With the pipeline-sends patch, my uploads take A+B*size time, where A is 790ms and B is 1/23.4kBps. 3.3/B gives the same speed that basic 'scp' gets, which ought to be my upstream bandwidth. This suggests that the main limitations to upload speed are the constant per-file overhead and the FEC expansion factor.
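The A+B*size model is easy to evaluate directly. The sketch below uses the fitted values quoted above and takes 3.3 as the expansion factor; the function names and the break-even calculation are additions for illustration.

```python
A_SECONDS = 0.790  # constant per-file overhead
RATE_KBPS = 23.4   # 1/B: plaintext kB uploaded per second
EXPANSION = 3.3    # bytes sent on the wire per byte of plaintext


def upload_seconds(size_kb):
    """Predicted upload time for a file of size_kb kilobytes: A + B*size."""
    return A_SECONDS + size_kb / RATE_KBPS


def wire_rate_kbps():
    """Rate on the wire implied by the fit, i.e. 3.3/B."""
    return EXPANSION * RATE_KBPS


def break_even_kb():
    """File size at which transfer time equals the per-file overhead."""
    return A_SECONDS * RATE_KBPS


for size in (1.0, 100.0, 1000.0):
    print("%6.0fkB -> %.2fs" % (size, upload_seconds(size)))
print("wire rate: %.1fkBps" % wire_rate_kbps())
print("break-even size: %.1fkB" % break_even_kb())
```

The implied wire rate of about 77kBps is close to the roughly 78kBps upstream figure for the DSL line. Under this fit, files smaller than about 18kB spend more time on per-file overhead than on moving data.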

Storage Servers

ext3 (on tahoebs1) refuses to create more than 32000 subdirectories in a single parent directory. In 0.5.1, this appears as a limit on the number of buckets (one per storage index) that any StorageServer can hold. A simple nested directory structure will work around this. The following code would let us manage 33.5G shares (see #150):

  import os
  from idlib import b2a
  # two levels of prefix directories, then one bucket directory per storage index (si)
  os.path.join(b2a(si[:2]), b2a(si[2:4]), b2a(si))
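The following is a self-contained sketch of the same layout, plus the capacity arithmetic. The `b2a` here is a stand-in built on the standard library's base32 encoder, on the assumption that idlib.b2a behaves similarly; `bucket_path` and `capacity` are hypothetical helper names. The 33.5G figure equals 1024 * 1024 * 32000, which would correspond to 1024 entries at each of the two prefix levels. That reading is an inference, not something stated on this page. A full two-byte prefix can take 65536 values, so the fan-out may need to be trimmed to stay under the ext3 limit.

```python
import base64
import os

EXT3_SUBDIR_LIMIT = 32000


def b2a(data):
    """Stand-in for idlib.b2a: lowercase base32 without padding (an assumption)."""
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def bucket_path(si):
    """Relative path for a storage index: two prefix levels, then the bucket."""
    return os.path.join(b2a(si[:2]), b2a(si[2:4]), b2a(si))


def capacity(fanout_per_level, levels=2, leaf_limit=EXT3_SUBDIR_LIMIT):
    """Buckets addressable when every leaf directory holds leaf_limit entries."""
    return fanout_per_level ** levels * leaf_limit


si = bytes(range(16))  # made-up 16-byte storage index
print(bucket_path(si))
print(capacity(1024))
```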

This limitation is independent of problems of memory use and lookup speed. Once the number of buckets is large, the filesystem may take a long time (and multiple disk seeks) to determine if a bucket is present or not. The provisioning page suggests how frequently these lookups will take place, and we can compare this against the time each one will take to see if we can keep up or not. If and when necessary, we'll move to a more sophisticated storage server design (perhaps with a database to locate shares).
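The "can we keep up" comparison amounts to multiplying a lookup rate by a lookup time. The sketch below shows that check with invented numbers; the real rate would come from the provisioning page and the real lookup time from measurement. The 50% headroom threshold is an arbitrary choice for illustration.

```python
def disk_utilization(lookups_per_second, seconds_per_lookup):
    """Fraction of each second the disk spends answering bucket lookups."""
    return lookups_per_second * seconds_per_lookup


def can_keep_up(lookups_per_second, seconds_per_lookup, headroom=0.5):
    """True if lookups use no more than `headroom` of the disk's time."""
    return disk_utilization(lookups_per_second, seconds_per_lookup) <= headroom


# Hypothetical load: 20 lookups per second, at one seek (10ms) or
# three seeks (30ms) per lookup.
print(can_keep_up(20, 0.010))
print(can_keep_up(20, 0.030))
```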

I was unable to measure a consistent slowdown resulting from having 30000 buckets in a single storage server.
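For anyone wanting to repeat this kind of check, a minimal timing harness might look like the following. It is not the measurement described above. It runs in a temporary directory, and the operating system's cache will hide most disk seeks, so it can only show gross slowdowns. The function name and directory naming are made up.

```python
import os
import tempfile
import time


def average_lookup_seconds(num_buckets, num_lookups=1000):
    """Create num_buckets (>= 1) subdirectories, then time existence checks."""
    with tempfile.TemporaryDirectory() as parent:
        for i in range(num_buckets):
            os.mkdir(os.path.join(parent, "bucket-%06d" % i))
        start = time.perf_counter()
        for i in range(num_lookups):
            # Cycle through names so that about half exist and half do not.
            name = "bucket-%06d" % (i % (2 * num_buckets))
            os.path.isdir(os.path.join(parent, name))
        elapsed = time.perf_counter() - start
    return elapsed / num_lookups


for n in (100, 1000):
    print("%5d buckets: %.1f microseconds per lookup"
          % (n, 1e6 * average_lookup_seconds(n)))
```

Raising `n` toward 30000 would approach the case described above, at the cost of a slower setup phase.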

Attachments (1)

pipeline-sends.diff (1.5 KB) - added by warner at 2007-09-09T00:10:23Z. patch to pipeline the hash-sends during upload
