#1870 new defect

leasedb: performance regression

Reported by: davidsarah Owned by: daira
Priority: normal Milestone: soon
Component: code-storage Version: 1.9.2
Keywords: leasedb performance regression sqlite Cc:
Launchpad Bug:

Description (last modified by daira)

The 1818-leasedb branch has a performance regression that shows up when running the test suite -- in fact, the test suite currently cannot pass, purely because of timeouts.

Since the regression does not show up when using make tmpfstest (which uses a memory-based tmpfs filesystem rather than disk), my tentative conclusion is that it is due to the latency of leasedb database syncs. There are currently many redundant syncs due to every SQL query/update being in a separate transaction, and due to there being more SQL queries and updates than necessary per storage API request. We could also use a more relaxed consistency mode, if that is safe.
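To illustrate the "separate transaction per update" point, here is a minimal sketch using Python's standard sqlite3 module. The table layout and the function names are hypothetical and are not taken from leasedb.py; the sketch only contrasts committing after every statement with grouping the same statements into one transaction, which needs a single commit (and so, on disk, far fewer syncs).

```python
import sqlite3

# Hypothetical schema, for illustration only; NOT the real leasedb schema.
SCHEMA = ("CREATE TABLE leases "
          "(storage_index TEXT, shnum INTEGER, expiration_time INTEGER)")
INSERT = "INSERT INTO leases VALUES (?, ?, ?)"

def add_leases_one_commit_each(conn, rows):
    # One transaction per row: every commit can force its own sync to disk.
    for row in rows:
        conn.execute(INSERT, row)
        conn.commit()

def add_leases_single_transaction(conn, rows):
    # "with conn" commits once on success and rolls back on an exception,
    # so all rows share a single commit.
    with conn:
        conn.executemany(INSERT, rows)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
rows = [("si1", shnum, 1356998400) for shnum in range(10)]
add_leases_one_commit_each(conn, rows)
add_leases_single_transaction(conn, rows)
lease_count = conn.execute("SELECT COUNT(*) FROM leases").fetchone()[0]
```

An in-memory database is used here only to keep the sketch self-contained; the latency difference between the two functions would only be expected to show up with a database file on a real disk.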

Change History (14)

comment:1 Changed at 2012-11-20T01:38:50Z by davidsarah

  • Description modified (diff)
  • Status changed from new to assigned

comment:2 Changed at 2012-11-20T01:39:56Z by davidsarah

  • Milestone changed from undecided to 1.11.0

comment:3 Changed at 2012-11-20T01:40:11Z by davidsarah

  • Component changed from unknown to code-storage

comment:4 follow-up: Changed at 2013-02-28T00:02:30Z by zooko

Here are my notes about this:

https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html

Bottom line: I believe we should turn on sqlite's synchronous = NORMAL, journal mode = WAL.
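As a rough sketch of what that setting would look like from Python's standard sqlite3 module (the function name open_leasedb and the file name are hypothetical, not taken from leasedb.py):

```python
import os
import sqlite3
import tempfile

def open_leasedb(path):
    # Hypothetical helper, not the actual leasedb.py code.
    conn = sqlite3.connect(path)
    # The journal_mode pragma returns the mode actually in effect, so the
    # caller can check whether WAL was really enabled.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    # synchronous = NORMAL syncs less often than the default FULL.
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn, mode

# WAL needs a real file, so use a temporary directory rather than :memory:.
db_path = os.path.join(tempfile.mkdtemp(), "leasedb.sqlite")
conn, journal_mode = open_leasedb(db_path)
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
```

The synchronous setting applies per connection, so it would have to be issued each time the database is opened; the WAL journal mode, once set, persists in the database file.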

Also, that we should fix ticket #1893, which would reduce this load.

comment:5 Changed at 2013-02-28T04:41:17Z by davidsarah

I thought that mode caused a problem with file handle leakage? Or am I misremembering?

comment:6 Changed at 2013-02-28T04:43:53Z by davidsarah

No, I remembered correctly. In current leasedb.py:

# synchronous = OFF is necessary for leasedb to pass tests for the time being,
# since using synchronous = NORMAL causes failures that are apparently due to
# a file descriptor leak, and the default synchronous = FULL causes the tests
# to time out.

comment:7 Changed at 2013-07-04T19:19:25Z by daira

  • Description modified (diff)
  • Keywords blocks-cloud-merge added
  • Owner changed from davidsarah to markberger
  • Status changed from assigned to new

comment:8 in reply to: ↑ 4 ; follow-up: Changed at 2013-07-05T11:49:38Z by daira

Replying to zooko:

Also, that we should fix ticket #1893, which would reduce this load.

I'm skeptical. We don't even know whether most of the increased latency is for mutable or immutable operations, and #1893 would only affect mutable writes. In any case, that can't be the cause of the regression, since trunk has always renewed leases on mutable writes.

comment:9 in reply to: ↑ 8 ; follow-up: Changed at 2013-07-05T13:12:28Z by zooko

Replying to daira:

Replying to zooko:

Also, that we should fix ticket #1893, which would reduce this load.

I'm skeptical. We don't even know whether most of the increased latency is for mutable or immutable operations, and #1893 would only affect mutable writes. In any case, that can't be the cause of the regression, since trunk has always renewed leases on mutable writes.

I didn't say that #1893 could be the cause of the regression!

But judging from my comments in https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html, it sounds like calls to leasedb might be happening 3X as often as they need to or even more, so that might have a significant effect on performance.

comment:10 Changed at 2013-07-05T16:05:24Z by daira

I've split the file descriptor issue out to #2015.

comment:11 in reply to: ↑ 9 Changed at 2013-07-05T16:06:22Z by daira

Replying to zooko:

But judging from my comments in https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html, it sounds like calls to leasedb might be happening 3X as often as they need to or even more, so that might have a significant effect on performance.

Yes, but most of those calls would happen regardless of #1893, I think.

Last edited at 2013-07-05T16:06:58Z by daira

comment:12 Changed at 2013-07-22T19:55:32Z by daira

  • Owner changed from markberger to daira
  • Status changed from new to assigned

comment:13 Changed at 2013-07-22T19:58:24Z by daira

  • Keywords blocks-cloud-merge removed

Now that #2015 is fixed, I think this no longer blocks merging the cloud branch, even though I would very much like to reduce or eliminate the remaining performance regression before merging.

comment:14 Changed at 2013-07-22T20:02:01Z by daira

  • Description modified (diff)