Opened at 2012-11-20T01:37:31Z
Last modified at 2021-03-30T18:40:19Z
#1870 new defect
leasedb: performance regression — at Version 14
Reported by: | davidsarah | Owned by: | daira |
---|---|---|---|
Priority: | normal | Milestone: | soon |
Component: | code-storage | Version: | 1.9.2 |
Keywords: | leasedb performance regression sqlite | Cc: | |
Launchpad Bug: | | | |
Description (last modified by daira)
The 1818-leasedb branch has a performance regression that shows up when running the test suite -- in fact, the test suite is not able to pass at the moment purely due to timeouts.
Since the regression does not show up when using make tmpfstest (which uses a memory-based tmpfs filesystem rather than disk), my tentative conclusion is that it is due to the latency of leasedb database syncs. There are currently many redundant syncs, because every SQL query/update runs in its own transaction and because more SQL queries and updates are issued per storage API request than necessary. We could also use a more relaxed consistency mode, if that is safe.
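To illustrate the transaction point, here is a minimal sketch (not the actual leasedb code; the table and column names are invented for illustration) of grouping several lease updates into one transaction, so that sqlite performs a single sync for the whole batch instead of one per statement:

```python
# Hypothetical sketch, not the actual leasedb code: the "leases" table and its
# column names are made up for illustration.
import sqlite3

def renew_leases_batched(db_path, lease_rows):
    """lease_rows: iterable of (storage_index, account_id, new_expiration_time)."""
    conn = sqlite3.connect(db_path)
    try:
        # "with conn" opens one transaction and commits it (one sync) at the
        # end, rather than paying a sync for every individual UPDATE.
        with conn:
            conn.executemany(
                "UPDATE leases SET expiration_time = ?"
                " WHERE storage_index = ? AND account_id = ?",
                [(exp, si, acct) for (si, acct, exp) in lease_rows],
            )
    finally:
        conn.close()
```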
Change History (14)
comment:1 Changed at 2012-11-20T01:38:50Z by davidsarah
- Description modified (diff)
- Status changed from new to assigned
comment:2 Changed at 2012-11-20T01:39:56Z by davidsarah
- Milestone changed from undecided to 1.11.0
comment:3 Changed at 2012-11-20T01:40:11Z by davidsarah
- Component changed from unknown to code-storage
comment:4 follow-up: ↓ 8 Changed at 2013-02-28T00:02:30Z by zooko
Here are my notes about this:
https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html
Bottom line: I believe we should turn on sqlite's synchronous = NORMAL, journal mode = WAL.
Also, that we should fix ticket #1893, which would reduce this load.
comment:5 Changed at 2013-02-28T04:41:17Z by davidsarah
I thought that mode caused a problem with file handle leakage? Or am I misremembering?
comment:6 Changed at 2013-02-28T04:43:53Z by davidsarah
No, I remembered correctly. In current leasedb.py:
```python
# synchronous = OFF is necessary for leasedb to pass tests for the time being,
# since using synchronous = NORMAL causes failures that are apparently due to
# a file descriptor leak, and the default synchronous = FULL causes the tests
# to time out.
```
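For reference, here is a minimal sketch of how the pragmas discussed in this ticket could be applied when opening the lease database; this is an assumption about the setup, not the actual leasedb.py code:

```python
# Hypothetical sketch of applying the pragmas discussed above; the real
# leasedb.py may set these up differently.
import sqlite3

def open_leasedb(db_path):
    conn = sqlite3.connect(db_path)
    # WAL journaling avoids rewriting the main database file on every commit,
    # and with WAL, synchronous = NORMAL means a crash can lose the most
    # recent commits but should not corrupt the database.
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn
```

The trade-off is roughly: FULL syncs on every commit (slowest, most durable), NORMAL syncs less often, and OFF leaves syncing entirely to the operating system (fastest, but a crash at the wrong moment can corrupt the database).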
comment:7 Changed at 2013-07-04T19:19:25Z by daira
- Description modified (diff)
- Keywords blocks-cloud-merge added
- Owner changed from davidsarah to markberger
- Status changed from assigned to new
comment:8 in reply to: ↑ 4 ; follow-up: ↓ 9 Changed at 2013-07-05T11:49:38Z by daira
Replying to zooko:

> Also, that we should fix ticket #1893, which would reduce this load.

I'm skeptical. We don't even know whether most of the increased latency is for mutable or immutable operations, and #1893 would only affect mutable writes. In any case, that can't be the cause of the regression, since trunk has always renewed leases on mutable writes.
comment:9 in reply to: ↑ 8 ; follow-up: ↓ 11 Changed at 2013-07-05T13:12:28Z by zooko
Replying to daira:

> Replying to zooko:
>
> > Also, that we should fix ticket #1893, which would reduce this load.
>
> I'm skeptical. We don't even know whether most of the increased latency is for mutable or immutable operations, and #1893 would only affect mutable writes. In any case, that can't be the cause of the regression, since trunk has always renewed leases on mutable writes.

I didn't say that #1893 could be the cause of the regression!
But judging from my comments in https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html, it sounds like calls to leasedb might be happening 3X as often as they need to or even more, so that might have a significant effect on performance.
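One way to check that claim empirically would be to count the SQL statements issued against the lease database while serving a single storage API request and compare that with the minimum the request actually needs. A hypothetical instrumentation sketch (not existing code):

```python
# Hypothetical instrumentation sketch: wrap the leasedb connection and count
# execute()/executemany() calls made while one storage API request is served.
import sqlite3

class CountingConnection(object):
    def __init__(self, conn):
        self._conn = conn
        self.statements = 0

    def execute(self, sql, params=()):
        self.statements += 1
        return self._conn.execute(sql, params)

    def executemany(self, sql, seq_of_params):
        self.statements += 1
        return self._conn.executemany(sql, seq_of_params)

    def __getattr__(self, name):
        # Delegate everything else (commit, close, ...) to the real connection.
        return getattr(self._conn, name)

# Usage sketch: wrap the connection before a request, then log .statements
# afterwards to see how many queries/updates that one request triggered.
```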
comment:10 Changed at 2013-07-05T16:05:24Z by daira
I've split the file descriptor issue out to #2015.
comment:11 in reply to: ↑ 9 Changed at 2013-07-05T16:06:22Z by daira
Replying to zooko:

> But judging from my comments in https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html, it sounds like calls to leasedb might be happening 3X as often as they need to or even more, so that might have a significant effect on performance.

Yes, but most of those calls would happen regardless of #1893, I think.
comment:12 Changed at 2013-07-22T19:55:32Z by daira
- Owner changed from markberger to daira
- Status changed from new to assigned
comment:13 Changed at 2013-07-22T19:58:24Z by daira
- Keywords blocks-cloud-merge removed
Now that #2015 is fixed, I think this no longer blocks merging the cloud branch, even though I would very much like to reduce or eliminate the remaining performance regression before merging.
comment:14 Changed at 2013-07-22T20:02:01Z by daira
- Description modified (diff)