
== System Load ==

The source:src/allmydata/test/check_load.py tool can be used to generate
random upload/download traffic, to see how much load a Tahoe grid imposes on
its hosts.

Preliminary results on the Allmydata test grid (14 storage servers spread
across four machines, each a roughly 3GHz P4, plus two web servers): we ran
three check_load.py clients with a 100ms delay between requests, an
80%-download/20%-upload traffic mix, and file sizes distributed exponentially
with a mean of 10kB. These three clients saw about 8-15kBps of download and
2.5kBps of upload traffic, doing about one download per second and 0.25
uploads per second. These rates were higher at the beginning of the test,
when the directories were smaller and thus faster to traverse.

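For reference, the traffic mix described above can be reproduced with a loop
like the following. This is only an illustrative sketch of the kind of
request loop check_load.py runs, not its actual code; the do_download() and
do_upload() helpers are hypothetical stand-ins for whatever HTTP calls the
real tool makes against the web servers.

{{{#!python
import random
import time

MEAN_FILE_SIZE = 10 * 1000      # bytes; file sizes are exponential, mean 10kB
DOWNLOAD_FRACTION = 0.80        # 80% downloads, 20% uploads
DELAY = 0.100                   # 100ms pause between requests

def generate_traffic(do_download, do_upload):
    """Issue a random 80/20 download/upload mix, pausing 100ms between requests.

    do_download() and do_upload(data) are hypothetical callables that talk to
    a Tahoe web server; they are placeholders, not part of check_load.py.
    """
    while True:
        if random.random() < DOWNLOAD_FRACTION:
            do_download()
        else:
            # draw an exponentially distributed file size with the given mean
            size = int(random.expovariate(1.0 / MEAN_FILE_SIZE))
            do_upload(b"\x00" * size)
        time.sleep(DELAY)
}}}
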
The storage servers were minimally loaded. Each storage node was consuming
about 9% of its CPU at the start of the test, 5% at the end. These nodes were
receiving about 50kbps throughout, and sending 50kbps initially (increasing
to 150kbps as the dirnodes got larger). Memory usage was trivial, about 35MB
VmSize per node, 25MB RSS. The load average on a 4-node box was about 0.3.

The two machines serving as web servers (performing all encryption, hashing,
and erasure-coding) were the most heavily loaded. The clients distributed
their requests randomly between the two web servers. Each server averaged
60%-80% CPU usage. Memory consumption was minor: 37MB VmSize and 29MB RSS on
one server, 45MB/33MB on the other. Load average grew from about 0.6 at the
start of the test to about 0.8 at the end. Outbound network traffic
(including both client-side plaintext and server-side shares) was about
600kbps for the whole test, while inbound traffic started at 200kbps and rose
to about 1Mbps by the end.

=== initial conclusions ===

So far, Tahoe is scaling as designed: the client nodes are the ones doing
most of the work, since these are the easiest to scale. In a deployment where
central machines are doing the encoding work, CPU on those machines will be
the first bottleneck. Profiling can be used to determine how the upload
process might be optimized: we don't yet know whether encryption, hashing, or
erasure coding is the primary CPU consumer. We can change the upload/download
ratio to examine uploads and downloads separately.

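As a starting point for that profiling, something like the following could be
wrapped around an upload call on one of the web/encoding servers. This is a
generic sketch using Python's standard cProfile module; upload_file() is a
hypothetical stand-in for whatever entry point actually performs the
encryption, hashing, and encoding, not a real Tahoe API.

{{{#!python
import cProfile
import pstats

def profile_upload(upload_file, data):
    """Profile a single upload and print the functions that used the most CPU.

    upload_file is a hypothetical callable standing in for the real
    encryption/hashing/encoding path; replace it with the actual entry point.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    upload_file(data)
    profiler.disable()

    # show the 20 most expensive functions by cumulative time
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(20)
}}}
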
Deploying large networks in which clients do not do their own encoding will
require provisioning enough CPU on the central web/encoding machines. Since
storage servers use minimal CPU, having every storage server also act as a
web/encoding server is a natural approach.