[tahoe-dev] Tahoe benchmarking data

Kyle Markley kyle at arbyte.us
Sun Jul 25 20:02:20 UTC 2010


Hello developers,

I collected benchmark data while varying file size, wired vs. wireless
network, pipeline_size (immutable/layout.py:97), and max_segment_size
(client.py:118).  I've included the results as measured by /usr/bin/time,
plus tcpdump logs for one typical experiment.
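
Both tunables live in the source rather than in tahoe.cfg, so I edited the
code and rebuilt between runs (see the procedure below).  If you want to
find them yourself, something like this should work against the 1.7.1 tree
(a sketch; the exact strings in the source may differ slightly):

  # locate the write-pipeline depth and the default segment size
  grep -n "Pipeline(" src/allmydata/immutable/layout.py
  grep -n "max_segment_size" src/allmydata/client.py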

Background:

This was a two-node benchmark with no helper running.  The introducer ran
on the client machine.  The client node was {OpenBSD 4.6 / Core i7 940
2.93GHz} and the storage node was {Windows Vista SP2 / Pentium T3400
2.16GHz}.  The client node was always connected via 100Mbps ethernet to an
Actiontec MI424WR router.  The storage node was connected to the same
router, either via 100Mbps ethernet or by wireless.

These are my versions:
allmydata-tahoe: 1.7.1, foolscap: 0.5.1, pycryptopp: 0.5.19, zfec: 1.4.7,
Twisted: 10.0.0, Nevow: 0.10.0, zope.interface: 3.6.1, python: 2.6.2,
platform: OpenBSD-4.6-amd64-Genuine_Intel-R-_CPU_000_@_2.93GHz-64bit-ELF,
sqlite: 3.6.13, simplejson: 2.1.1, argparse: 1.1, pycrypto: 2.1.0,
pyOpenSSL: 0.10, pyutil: 1.7.7, zbase32: 1.1.1, setuptools: 0.6c15dev,
pyasn1: 0.0.11a, pysqlite: 2.4.1

All data was collected on the client node.  In the tcpdump data:
  192.168.1.12 - the client node
  192.168.1.13 - the storage node's wired interface
  192.168.1.14 - the storage node's wireless interface
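
The captures were ordinary tcpdump runs writing to per-experiment files.
A sketch of the kind of invocation involved (the interface name em0 and
the output filename here are assumptions, not the exact command I ran):

  # capture all traffic to/from the storage node's wired interface
  tcpdump -i em0 -w wired-large.tcp host 192.168.1.13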

For these tests I used (in the [client] section of tahoe.cfg):
shares.needed = 1
shares.happy = 1
shares.total = 1

I measured two operations for each experiment:
  1) tahoe backup of    4 16MiB files (64MiB total)
  2) tahoe backup of 1024 64KiB files (64MiB total)

Attached is the generate_data.sh script that created the data (in /tmp).
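
In case the attachment doesn't survive, the script boils down to something
like this (a sketch -- the real script may differ in details such as the
data source):

  #!/bin/sh
  # Create the benchmark inputs: 4 x 16MiB and 1024 x 64KiB, 64MiB per set.
  # Random data, so no two files are identical.
  mkdir -p /tmp/largefiles /tmp/smallfiles
  i=0
  while [ $i -lt 4 ]; do
      dd if=/dev/urandom of=/tmp/largefiles/$i bs=1m count=16 2>/dev/null
      i=$((i + 1))
  done
  i=0
  while [ $i -lt 1024 ]; do
      dd if=/dev/urandom of=/tmp/smallfiles/$i bs=64k count=1 2>/dev/null
      i=$((i + 1))
  done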


The client node's directory was $PWD/client.
Between every experiment I did (the client-side steps are consolidated in
the sketch after this list):
 [C] allmydata-tahoe-1.7.1/bin/tahoe stop $PWD/client
 [S] Stop the storage node
 [S] Remove the storage/shares/* directories
 [S] Start the storage node
 [C] If needed, edit the source code and re-run setup.py build
 [C] allmydata-tahoe-1.7.1/bin/tahoe start $PWD/client
 [C] rm $PWD/client/private/{aliases,backupdb.sqlite}
 [C] allmydata-tahoe-1.7.1/bin/tahoe create-alias -d $PWD/client bench
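
The [C] steps as one script (a sketch; the [S] steps were done by hand on
the Windows box, and the rebuild step is omitted):

  #!/bin/sh
  # Reset the client node between experiments.
  TAHOE=allmydata-tahoe-1.7.1/bin/tahoe
  $TAHOE stop $PWD/client
  # ... perform the [S] steps on the storage node here ...
  $TAHOE start $PWD/client
  rm $PWD/client/private/aliases $PWD/client/private/backupdb.sqlite
  $TAHOE create-alias -d $PWD/client bench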

The experiment commands themselves looked like this (-p gives POSIX-format
timing; -l adds resource usage):

/usr/bin/time -lp allmydata-tahoe-1.7.1/bin/tahoe backup -d $PWD/client \
    /tmp/largefiles bench:largefiles 2> large.time
/usr/bin/time -lp allmydata-tahoe-1.7.1/bin/tahoe backup -d $PWD/client \
    /tmp/smallfiles bench:smallfiles 2> small.time


Results (in the pipeline_size headers, 50000 is the default value in
bytes; the other columns are in KiB; apologies if you're not using a
monospace font):

Wired LAN, large files, in seconds:
                      pipeline_size
               10KiB  50000 192KiB 256KiB 320KiB
segment_size   
        8KiB   47.65
       16KiB                 18.74
       32KiB                 17.55
       64KiB                 15.71
      128KiB          31.26  14.80  14.61  14.30
      256KiB                               14.35


Wired LAN, small files, in seconds:
                      pipeline_size
               10KiB  50000 192KiB 256KiB 320KiB
segment_size
        8KiB  265.79
       16KiB                234.98
       32KiB                144.92
       64KiB                231.61
      128KiB          68.59 228.16 225.54 223.25
      256KiB                              218.85


Wireless, large files, in seconds:
                      pipeline_size
               10KiB  50000 192KiB 256KiB 320KiB
segment_size
        8KiB   62.00
       16KiB                 35.69
       32KiB                 34.22
       64KiB                 43.34
      128KiB         140.75  34.83  35.94  33.90
      256KiB                               98.33


Wireless, small files, in seconds:
                      pipeline_size
               10KiB  50000 192KiB 256KiB 320KiB
segment_size
        8KiB  312.90
       16KiB                341.78
       32KiB                193.91
       64KiB                 96.68
      128KiB         101.76  98.40  98.27  95.98
      256KiB                              147.71


For large files, tuning the parameters makes little difference.
Increasing pipeline_size helps, but the benefit quickly levels off, and
once it has, changes in segment_size don't seem to matter much either.

For small files, the results are surprising!  The default settings
(pipeline 50000, segsize 128KiB) give significantly better performance on
the wired LAN than anything else I tried -- a huge outlier compared to the
other data points.  (68.59s over 1024 files is about 67ms per file, versus
roughly 220ms per file for the neighboring wired points.)  What explains
this?  Why does the wireless network perform so much better than the wired
network for small files (except for that single outlier)?  Why are small
segments so much worse on wireless for small files but not for large
files?

Hopefully someone can get some answers out of the .tcp files in the
attached archive.

-- 
Kyle Markley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tahoe-benchmark.tgz
Type: application/octet-stream
Size: 3659470 bytes
Desc: not available
URL: <http://tahoe-lafs.org/pipermail/tahoe-dev/attachments/20100725/0f0820ec/attachment-0001.obj>

