#962 closed defect (cannot reproduce)

Performance problems testing on a 45-disk, dual-core 3.3 GHz, 4 GB memory box

Reported by: stott Owned by: stott
Priority: major Milestone: undecided
Component: code Version: 1.5.0
Keywords: performance memory response-needed Cc:
Launchpad Bug:

Description

Performance problems testing on a 45-disk, dual-core 3.3 GHz, 4 GB memory box.

When uploading a 200 MB file (generating 10 shares, 3 needed to recover), the system load spiked to 100% for ~20 minutes. During this time the system was also swapping.

Performance was extremely slow.

Attachments (4)

node1.pid.9394 (4.8 MB) - added by stott at 2010-02-24T06:27:47Z.
gateway strace of a tahoe put 106mb file.
node2.pid.3225 (169.8 KB) - added by stott at 2010-02-24T06:28:18Z.
server node strace file.
codepad-tahoeget.html (7.3 KB) - added by stott at 2010-02-26T05:01:20Z.
recent stats of the get
1.6DL.html (70.2 KB) - added by stott at 2010-03-04T06:05:26Z.

Change History (29)

comment:1 Changed at 2010-02-20T19:10:01Z by zooko

So did you have 45 different Tahoe-LAFS storage server processes? Could you check how much RAM each one is using with top?

(Or, for bonus points, using smem instead: http://www.selenic.com/smem/ )

comment:2 Changed at 2010-02-22T16:20:21Z by imhavoc

It would be helpful to know how you have this configured.

comment:3 Changed at 2010-02-23T01:08:26Z by stott

System Configuration (http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage)

  • 1 Intel(R) Core(TM)2 Duo CPU E8600 @ 3.33GHz
  • 1 80G boot drive mounted as /
  • 45 Seagate 1.5T drives mounted as /tahoe/node1-45
  • 4G memory

eth0 is at 100MB/FULL

Default tahoe configuration.

SMEM : root@firetruck1:~/app/smem-0.9# ./smem |egrep "PID|python"

PID    User      Command                      Swap   USS    PSS    RSS
13025  root      egrep --color=auto PID|pyth  0      260    276    860
13024  root      /usr/bin/python ./smem       0      4560   4609   6224
12402  www-data  /usr/bin/python /usr/bin/tw  23028  33356  33391  35504
12546  www-data  /usr/bin/python /usr/bin/tw  14796  47700  47754  50764
12537  www-data  /usr/bin/python /usr/bin/tw  9412   53300  53351  56228
12590  www-data  /usr/bin/python /usr/bin/tw  12     60732  60786  63796
12634  www-data  /usr/bin/python /usr/bin/tw  0      60792  60846  63856
12645  www-data  /usr/bin/python /usr/bin/tw  0      60796  60854  63868
12612  www-data  /usr/bin/python /usr/bin/tw  8      60904  60958  63968
12601  www-data  /usr/bin/python /usr/bin/tw  4      61192  61246  64256
12425  www-data  /usr/bin/python /usr/bin/tw  668    61860  61911  64788
12447  www-data  /usr/bin/python /usr/bin/tw  728    62240  62291  65168
12557  www-data  /usr/bin/python /usr/bin/tw  12     62368  62422  65432
12480  www-data  /usr/bin/python /usr/bin/tw  0      62384  62438  65448
12524  www-data  /usr/bin/python /usr/bin/tw  156    62500  62552  65492
12623  www-data  /usr/bin/python /usr/bin/tw  4      62500  62554  65564
12579  www-data  /usr/bin/python /usr/bin/tw  8      62564  62616  65556
12513  www-data  /usr/bin/python /usr/bin/tw  52     62632  62686  65696
12700  www-data  /usr/bin/python /usr/bin/tw  0      62644  62698  65708
12491  www-data  /usr/bin/python /usr/bin/tw  40     62772  62824  65764
12469  www-data  /usr/bin/python /usr/bin/tw  264    62796  62850  65860
12656  www-data  /usr/bin/python /usr/bin/tw  0      63004  63057  66060
12502  www-data  /usr/bin/python /usr/bin/tw  0      63184  63238  66248
12667  www-data  /usr/bin/python /usr/bin/tw  0      63264  63318  66328
12436  www-data  /usr/bin/python /usr/bin/tw  92     63392  63446  66456
12458  www-data  /usr/bin/python /usr/bin/tw  164    63780  63834  66844
12744  www-data  /usr/bin/python /usr/bin/tw  0      64016  64070  67080
12788  www-data  /usr/bin/python /usr/bin/tw  0      65400  65451  68328
12722  www-data  /usr/bin/python /usr/bin/tw  0      66472  66526  69536
12678  www-data  /usr/bin/python /usr/bin/tw  0      67036  67090  70100
12689  www-data  /usr/bin/python /usr/bin/tw  0      67112  67166  70176
12568  www-data  /usr/bin/python /usr/bin/tw  0      70172  70224  73164
12832  www-data  /usr/bin/python /usr/bin/tw  0      71800  71854  74864
12898  www-data  /usr/bin/python /usr/bin/tw  0      71760  71949  75168
12777  www-data  /usr/bin/python /usr/bin/tw  0      72432  72486  75496
12854  www-data  /usr/bin/python /usr/bin/tw  0      73000  73050  75920
12735  www-data  /usr/bin/python /usr/bin/tw  0      73096  73150  76160
12711  www-data  /usr/bin/python /usr/bin/tw  0      73308  73362  76372
12414  www-data  /usr/bin/python /usr/bin/tw  908    73256  73449  76680
12799  www-data  /usr/bin/python /usr/bin/tw  0      73652  73706  76716
12843  www-data  /usr/bin/python /usr/bin/tw  0      74660  74714  77724
12865  www-data  /usr/bin/python /usr/bin/tw  0      74888  74946  77960
12821  www-data  /usr/bin/python /usr/bin/tw  0      76804  76858  79868
12887  www-data  /usr/bin/python /usr/bin/tw  0      79180  79233  82236
12755  www-data  /usr/bin/python /usr/bin/tw  0      79892  79946  82956
12766  www-data  /usr/bin/python /usr/bin/tw  0      80548  80599  83532
12810  www-data  /usr/bin/python /usr/bin/tw  0      86512  86563  89440
12876  www-data  /usr/bin/python /usr/bin/tw  0      87592  87645  90648

================

Note,

After switching to

shares.needed = 32
shares.happy = 35
shares.total = 40

Performance has greatly increased. The 3.3x expansion ("blowup") of the old 3-of-10 encoding may have been the largest contributing factor.
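For illustration, a minimal Python sketch of the expansion-factor arithmetic behind the two encodings discussed here (the 200 MB figure is the file size from the original report; per-share overhead is ignored):

# Rough expansion-factor arithmetic for the two encodings discussed above.
# Expansion = shares.total / shares.needed: every uploaded byte is stored
# roughly that many times across the grid (ignoring per-share overhead).

def expansion(needed, total):
    return total / needed

for needed, total in [(3, 10), (32, 40)]:
    factor = expansion(needed, total)
    stored = 200 * factor  # MB stored grid-wide for a 200 MB upload
    print(f"{needed}-of-{total}: {factor:.2f}x expansion, "
          f"~{stored:.0f} MB written for a 200 MB file")

# 3-of-10:  3.33x expansion, ~667 MB written for a 200 MB file
# 32-of-40: 1.25x expansion, ~250 MB written for a 200 MB file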

I'm still concerned about memory utilization per node as the number of nodes in the grid increases, but that may be a separate ticket.

Performance is still much less than I would like to see.

bash-3.2$ time curl -T 524stottlemyer.zip http://10.20.0.151:3470/uri/URI:DIR2:whzhc6kxh5mfem4basdoioewji:4qpwpea2s46tg3lnd5znwsyqqenwjoddjxpsvr733sk7n4on6uua/
URI:CHK:fsmf5rn6me7mrueq2ri4zvboay:sd5lnugn2gnisrvodxtzp7zvkx4f2skyk6ri5ncs65rmq6qkdoqa:32:40:106554122
real 1m17.962s

bash-3.2$ ls -larth 524stottlemyer.zip
-rw-r--r-- 1 justin staff 102M Nov 20 15:44 524stottlemyer.zip

I'll have to collect more stats during an upload to present what is going on. A snapshot indicated heavy CPU usage, nearly no swap in use at this time, and the disks were much better able to handle the load, with good distribution.

Here is a brief network snapshot, though time spent on internal operations was much greater.

eth0: 0.003 In, 0.000 Out Mbps | 4 In, 0 Out pps
eth0: 40.860 In, 0.963 Out Mbps | 3709 In, 1912 Out pps
eth0: 92.249 In, 2.226 Out Mbps | 8409 In, 4354 Out pps
eth0: 37.314 In, 0.893 Out Mbps | 3430 In, 1761 Out pps
eth0: 0.006 In, 0.015 Out Mbps | 8 In, 3 Out pps

comment:4 Changed at 2010-02-23T06:53:43Z by davidsarah

  • Keywords performance memory added; Performance removed
  • Type changed from enhancement to defect

comment:5 Changed at 2010-02-24T04:17:59Z by warner

Is there just one upload happening at a time? Or are you doing many in parallel?

One of the "recent uploads and downloads" stats that you pasted in IRC showed a mutable upload, rather than the normal immutable upload (which the curl command above would use). We've documented once or twice how memory usage for mutable files is pretty bad (N/k*filesize, as opposed to the well-limited N/k*128KiB for immutable files). source:docs/performance.txt mentions about half the story.. I guess we should probably improve that section.

If you had a lot of simultaneous large-memory uploads going on, I can easily see how the overall memory usage would spike, and you'd get a lot of swapping and stalls. It sounds like there were a couple different kinds of tests taking place, though, so I don't want to misdiagnose problem A if the symptoms actually occurred during test B.

Changed at 2010-02-24T06:27:47Z by stott

gateway strace of a tahoe put 106mb file.

Changed at 2010-02-24T06:28:18Z by stott

server node strace file.

comment:7 Changed at 2010-02-26T04:00:08Z by zooko

Are node1.pid.9394 and node2.pid.3225 from the same upload of a 106 MB file?

comment:8 follow-up: Changed at 2010-02-26T04:14:05Z by stott

I should have specified, this latest operation was a Tahoe get of that 106MB file.

Yes, these two straces are from the same get, just watching it from two places: the gateway and a storage server, during the same operation.

Sorry for the confusion caused by the bad note in the attachment. I also have the strace from the get, where you can see the 24 s wait with no correlating hangs in the other processes.

The get is in the testgrid link above; it was too large for an attachment to Trac.

comment:9 in reply to: ↑ 6 Changed at 2010-02-26T04:32:48Z by zooko

Replying to stott:

Updating with attached straces

http://testgrid.allmydata.org:3567/uri/URI%3ADIR2RO%3Axqm7hlme3oncrzpoqr5thvsh7a%3A64dg3rttsh6nqujoyr62nt7kssmmqaf5rv5nozyuvqocpi4mzzba/

What is this URL? It appears to be a broken link.

comment:10 in reply to: ↑ 8 Changed at 2010-02-26T04:53:16Z by zooko

Replying to stott:

I should have specified, this latest operation was a Tahoe get of that 106MB file.

Sorry, I'm still confused. So node1.pid.9394 is strace of the gateway during a 106 MB GET and node2.pid.3225 is the strace of a random server during that same GET? And the encoding parameters were: shares.needed = 32 shares.happy = 35 shares.total = 40 ?

comment:11 Changed at 2010-02-26T04:57:39Z by zooko

And, when you did that GET, how well did it perform? Do you know about the "Recent Uploads and Downloads" page? From there you can find timing information about the transfer (as observed by the Tahoe-LAFS gateway itself) that might be useful. Could you attach the timing results from that particular GET (if it is still there on the Recent Uploads and Downloads) or from a new similar GET?

comment:12 Changed at 2010-02-26T05:00:17Z by stott

http://codepad.org/cxuaFwz2 here is the relevant recent get for that download. I'll attempt to attach the html

Changed at 2010-02-26T05:01:20Z by stott

recent stats of the get

comment:13 Changed at 2010-02-26T05:34:56Z by zooko

Okay, thank you for the timings in codepad-tahoeget.html. I see that it was a mutable file of size 106 MB, with K=32, N=40 encoding. Therefore I would expect the gateway to (temporarily) use up to 2 times 106 MB of RAM when downloading the file.

(Aside: hmm, docs/performance.txt says that it will take N/K*A. I don't think that's right—I think it will take just A (possibly times a factor of 2 or so of memory usage because it does a certain amount of copying the data in RAM). I'll change the doc to say just A there.)

Okay, and the timings reported by the gateway node for each server will just have a single data point because shares of a mutable file are transferred in one big block of size A/K. (As contrasted with shares of an immutable file, which are transferred in a succession of blocks each of size 128 KiB/K.)

Each of these blocks should be about 106MB/32≅3.33MB.

The timing results are that one of the servers—cvpcsjio—sent its 3.33MB block in 442ms and every other server sent its 3.33MB block in 22sec.

(Aside: I would really like to know if that was exactly 22,000ms for every one of them or what. The display of these measurements should include a fixed and appropriate amount of precision instead of rounding off to seconds.)
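A minimal Python sketch of the arithmetic behind these numbers (file size taken from the CHK cap above; "implied throughput" is just block size divided by the reported fetch time):

# Arithmetic behind the numbers above: the block size for a K-of-N mutable
# file is roughly filesize / K, and the implied per-server throughput is
# that block size divided by the reported fetch time.

filesize = 106_554_122        # bytes, from the CHK cap above
k = 32

block = filesize / k
print(f"block size ~ {block / 1e6:.2f} MB")          # ~3.33 MB

for name, seconds in [("cvpcsjio", 0.442), ("every other server", 22.0)]:
    print(f"{name}: ~{block / seconds / 1e6:.2f} MB/s implied")

# block size ~ 3.33 MB
# cvpcsjio: ~7.53 MB/s implied
# every other server: ~0.15 MB/s implied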

What is special about cvpcsjio?

Okay, so does this make sense so far? The next question in my mind is how much of that 22 seconds was spent doing what. That 22 seconds encompasses the following sequence of actions:

  1. The gateway serializes and sends a request to the server to read the block.
  2. The server looks up the requested share in its filesystem.
  3. The server reads the share data from the filesystem.
  4. The server sends the share data back over the TCP connection to the gateway.

Looking at the strace of one of the servers (which one was it? Was it cvpcsjio?) might help explain how much of the 22 seconds was spent in which of those steps. Also turning on verbose logging (per docs/logging.txt) on one of the servers would give information, but it would also make that server use a lot more CPU and possibly respond slower.
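As a rough way to break down the 22 seconds, a minimal Python sketch that tallies per-syscall time from an strace log, assuming the trace was captured with strace -T so each line ends with the syscall's elapsed time in angle brackets (whether these attachments were captured that way is not stated, so this is only a suggestion):

# Tally where an strace'd process spends its time, assuming `strace -T`
# output where each line ends with the syscall's elapsed time, e.g.
# "read(5, \"...\", 65536) = 65536 <0.000123>". An optional -tt timestamp
# prefix is tolerated.

import re
import sys
from collections import defaultdict

line_re = re.compile(r"^\s*(?:\d+:\d+:\d+\.\d+\s+)?(\w+)\(.*<([\d.]+)>\s*$")

totals = defaultdict(float)
with open(sys.argv[1]) as f:
    for line in f:
        m = line_re.match(line)
        if m:
            totals[m.group(1)] += float(m.group(2))

# Print the ten most expensive syscalls by total elapsed time.
for syscall, seconds in sorted(totals.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{syscall:15s} {seconds:8.3f} s")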

comment:14 follow-up: Changed at 2010-02-26T05:52:35Z by zooko

Hm... It might be worth adding to docs/performance.txt a note about the fact that when downloading a mutable file you first have to wait for the entire file data to transfer from the storage servers to the gateway, and then the data streams from the gateway to your client. By "streams" here, I mean that your client can receive the first part of the data before the gateway has sent the last part of the data. Observe that the data of a mutable file does not "stream" from the storage servers all the way to your client—your client won't see the first part of the mutable file until after the gateway has received the last part of the mutable file from the storage servers.

By contrast, if you download an immutable file it will "stream" all the way from the storage servers to your client—your client can see the first part of the immutable file as soon as the gateway receives the first block of each share from the storage servers, long before the gateway has seen the last part of the immutable file.

This is potentially another reason to prefer immutable files over mutable files for performance when you're using Tahoe-LAFS v1.6. (How big of a performance issue this is depends on the details of your network and servers and the size of the file.)
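As a rough illustration of what the lack of streaming can mean for time-to-first-byte, a minimal Python sketch (the 90 Mbit/s aggregate rate is an assumption based on the 100 Mb link mentioned earlier, not a measurement):

# Rough time-to-first-byte comparison for the streaming difference
# described above. Assumes the gateway can pull ~90 Mbit/s aggregate from
# the servers (an assumption, not a measurement).

MIB = 1024 * 1024
rate = 90e6 / 8               # bytes/sec aggregate into the gateway

filesize = 106 * MIB
segment = 128 * 1024          # immutable segment size

# Mutable: the client sees nothing until the gateway has the whole file.
ttfb_mutable = filesize / rate

# Immutable: the client can see data once the first segment's blocks arrive.
ttfb_immutable = segment / rate

print(f"mutable   TTFB ~ {ttfb_mutable:.1f} s")    # ~9.9 s
print(f"immutable TTFB ~ {ttfb_immutable:.3f} s")  # ~0.012 s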

comment:15 in reply to: ↑ 14 Changed at 2010-02-26T05:55:33Z by zooko

Replying to zooko:

This is potentially another reason to prefer immutable files over mutable files for performance when you're using Tahoe-LAFS v1.6. (How big of a performance issue this is depends on the details of your network and servers and the size of the file.)

It would be interesting to see the timing results of downloading a 106 MB immutable file on stott's petabyte-on-a-budget box. It would also be interesting to know the total time from the perspective of the client (which is different from the perspective of the Tahoe-LAFS gateway). For example, if you are downloading the file with "tahoe get" then use the bash "time" builtin to tell you how long from when you pressed ENTER on the "tahoe get" command to when that command finished. Or if you are downloading the file with wget or curl then capture the output that those tools can emit to show elapsed time and observe throughput of their download.
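For the client-side measurement, a minimal Python sketch that times a GET through the gateway's web API (the gateway URL and file cap below are placeholders, not values from this grid):

# Time a download from the client's perspective via the gateway's web API
# (GET /uri/$FILECAP). The gateway address and cap are placeholders.

import time
import urllib.request

GATEWAY = "http://127.0.0.1:3456"          # placeholder gateway address
CAP = "URI:CHK:xxxx:yyyy:3:10:111149056"   # placeholder file cap

url = f"{GATEWAY}/uri/{CAP}"
start = time.monotonic()
nbytes = 0
with urllib.request.urlopen(url) as resp:
    while True:
        chunk = resp.read(1024 * 1024)
        if not chunk:
            break
        nbytes += len(chunk)
elapsed = time.monotonic() - start
print(f"{nbytes} bytes in {elapsed:.1f} s ({nbytes / elapsed / 1e6:.2f} MB/s)")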

comment:16 Changed at 2010-02-26T06:42:48Z by stott

I was collecting those stats as well; the curl and the tahoe get both took the same amount of time as indicated by the gateway.

If you read the strace output, you can see that each server actually sends its data in ~450-600 ms; however, the measurement of 22 s appears to be the total time across all servers, not the actual time per server. There is nothing special about cvpcsjio except that I believe it actually gave an accurate measurement.

I'll test again tomorrow with an immutable file and provide more details.

comment:17 Changed at 2010-02-26T15:50:45Z by zooko

Oh, I see! Yes, this is suggestive of either a bug in mutable file download performance or a bug in the timings report for mutable file downloads, or both.

comment:18 Changed at 2010-03-04T05:58:38Z by zooko

I'll examine the source code for the timing reports for mutable file downloads.

comment:19 Changed at 2010-03-04T06:03:36Z by stott

I have upgraded to 1.6.0. The speeds upload/download are the same.

Recent stats no longer show uploads; they now show downloads, as the newest attachment shows.

Changed at 2010-03-04T06:05:26Z by stott

comment:20 Changed at 2010-03-09T04:45:32Z by zooko

Okay what's the status of this issue? I think it is time to open new tickets for more specific issues now that we can tell them apart.

The first issue that began the ticket was that upload of a large mutable file caused high CPU load on the storage servers. I'll open a new ticket for that issue.

comment:21 Changed at 2010-03-09T06:36:08Z by zooko

Okay I've created #983 (high CPU load on storage servers when uploading large mutable file) to track that issue.

comment:22 Changed at 2010-08-06T22:19:44Z by warner

  • Component changed from website to code

comment:23 Changed at 2010-09-03T18:07:56Z by zooko

Stott: I'm not sure what the next step is to make progress on this ticket or to close this ticket. How about: you rerun your performance tests using the shiny new Tahoe-LAFS v1.8.0c3. :-)

It should be no worse than the previous version for mutable files or for uploading immutable files, and better for downloading immutable files. Downloads might be a little better if your servers are all similarly performant and your shares are well distributed, or a lot better if there is more heterogeneity in server performance, or if some servers are overloaded compared to others.

comment:24 Changed at 2011-03-17T18:07:23Z by zooko

  • Keywords response-needed added
  • Owner changed from somebody to stott

Stott: would you please run your performance tests using Tahoe-LAFS v1.8.2? If you're not going to do that, let's just close this ticket because it is old and because I don't know what to do to make progress towards closing it. If you have some other request or can explain more specifically one particular issue that you had then we could use that as a starting point for making progress.

comment:25 Changed at 2012-05-26T05:53:39Z by zooko

  • Resolution set to cannot reproduce
  • Status changed from new to closed

Reading back over this ticket, I don't see a lot of evidence about a performance issue specifically in immutable files. Most of the problems turned out to be with mutables, including ticket #983, which has been fixed. We don't know for sure that there aren't still performance issues in the current version when used on a 45-drive box, but until we get more evidence we'll leave this issue closed.

Note: See TracTickets for help on using tickets.