#1870 new defect

leasedb: performance regression

Reported by: davidsarah Owned by: daira
Priority: normal Milestone: 1.15.0
Component: code-storage Version: 1.9.2
Keywords: leasedb performance regression sqlite Cc:
Launchpad Bug:

Description (last modified by daira)

The 1818-leasedb branch has a performance regression that shows up when running the test suite -- in fact, the test suite is not able to pass at the moment purely due to timeouts.

Since the regression does not show up when using make tmpfstest (which uses a memory-based tmpfs filesystem rather than disk), my tentative conclusion is that it is due to the latency of leasedb database syncs. There are currently many redundant syncs due to every SQL query/update being in a separate transaction, and due to there being more SQL queries and updates than necessary per storage API request. We could also use a more relaxed consistency mode, if that is safe.

Change History (40)

comment:1 Changed at 2012-11-20T01:38:50Z by davidsarah

  • Description modified (diff)
  • Status changed from new to assigned

comment:2 Changed at 2012-11-20T01:39:56Z by davidsarah

  • Milestone changed from undecided to 1.11.0

comment:3 Changed at 2012-11-20T01:40:11Z by davidsarah

  • Component changed from unknown to code-storage

comment:4 follow-up: Changed at 2013-02-28T00:02:30Z by zooko

Here are my notes about this:

https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html

Bottom line: I believe we should turn on sqlite's synchronous = NORMAL, journal mode = WAL.

Also, that we should fix ticket #1893, which would reduce this load.

comment:5 Changed at 2013-02-28T04:41:17Z by davidsarah

I thought that mode caused a problem with file handle leakage? Or am I misremembering?

comment:6 Changed at 2013-02-28T04:43:53Z by davidsarah

No, I remembered correctly. In current leasedb.py:

# synchronous = OFF is necessary for leasedb to pass tests for the time being,
# since using synchronous = NORMAL causes failures that are apparently due to
# a file descriptor leak, and the default synchronous = FULL causes the tests
# to time out.

comment:7 Changed at 2013-07-04T19:19:25Z by daira

  • Description modified (diff)
  • Keywords blocks-cloud-merge added
  • Owner changed from davidsarah to markberger
  • Status changed from assigned to new

comment:8 in reply to: ↑ 4 ; follow-up: Changed at 2013-07-05T11:49:38Z by daira

Replying to zooko:

Also, that we should fix ticket #1893, which would reduce this load.

I'm skeptical. We don't even know whether most of the increased latency is for mutable or immutable operations, and #1893 would only affect mutable writes. In any case, that can't be the cause of the regression, since trunk has always renewed leases on mutable writes.

comment:9 in reply to: ↑ 8 ; follow-up: Changed at 2013-07-05T13:12:28Z by zooko

Replying to daira:

Replying to zooko:

Also, that we should fix ticket #1893, which would reduce this load.

I'm skeptical. We don't even know whether most of the increased latency is for mutable or immutable operations, and #1893 would only affect mutable writes. In any case, that can't be the cause of the regression, since trunk has always renewed leases on mutable writes.

I didn't say that #1893 could be the cause of the regression!

But judging from my comments in https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html, it sounds like calls to leasedb might be happening 3X as often as they need to or even more, so that might have a significant effect on performance.

comment:10 Changed at 2013-07-05T16:05:24Z by daira

I've split the file descriptor issue out to #2015.

comment:11 in reply to: ↑ 9 Changed at 2013-07-05T16:06:22Z by daira

Replying to zooko:

But judging from my comments in https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html, it sounds like calls to leasedb might be happening 3X as often as they need to or even more, so that might have a significant effect on performance.

Yes, but most of those calls would happen regardless of #1893, I think.

Last edited at 2013-07-05T16:06:58Z by daira (previous) (diff)

comment:12 Changed at 2013-07-22T19:55:32Z by daira

  • Owner changed from markberger to daira
  • Status changed from new to assigned

comment:13 Changed at 2013-07-22T19:58:24Z by daira

  • Keywords blocks-cloud-merge removed

Now that #2015 is fixed, I think this no longer blocks merging the cloud branch, even though I would very much like to reduce or eliminate the remaining performance regression before merging.

comment:14 Changed at 2013-07-22T20:02:01Z by daira

  • Description modified (diff)

comment:15 Changed at 2013-07-23T01:07:49Z by zooko

I just ran all unit tests of trunk, and of 1819-cloud-merge-opensource, and the latter took a lot longer:

zooko@spark ~/playground/tahoe-lafs $ cat time-trial-trunk.txt 
Ran 1139 tests in 361.801s

PASSED (skips=6, expectedFailures=3, successes=1130)

real    6m23.092s
user    4m13.752s
sys     0m22.308s
zooko@spark ~/playground/tahoe-lafs $ cat time-trial-1819-opensource.txt 
Ran 1185 tests in 917.104s

PASSED (skips=6, expectedFailures=4, successes=1175)

real    15m28.045s
user    6m36.288s
sys     0m59.712s
zooko@spark ~/playground/tahoe-lafs $ cat time-trial-trunk-2.txt 
Ran 1139 tests in 353.694s

PASSED (skips=6, expectedFailures=3, successes=1130)

real    6m6.877s
user    4m14.344s
sys     0m22.152s

comment:16 follow-up: Changed at 2013-07-23T05:19:22Z by zooko

My next question was: is the 1819-cloud-merge-opensource branch taking longer to do the same tests as the master branch? Or is it that the extra tests on the new branch (1185 vs. 1139 on master) are what is making the test suite take longer? The answer is that it is at least partially that existing tests are slower, for example:

zooko@spark ~/playground/tahoe-lafs $ cat log.1819-cloud-merge-opensource-timings-2.txt
allmydata.test.test_cli.Backup.test_backup ... [OK]
(17.674 secs)

-------------------------------------------------------------------------------
Ran 1 tests in 17.675s

PASSED (successes=1)
zooko@spark ~/playground/tahoe-lafs $ cat log.master-timings-2.txt
allmydata.test.test_cli.Backup.test_backup ... [OK]
(13.620 secs)

-------------------------------------------------------------------------------
Ran 1 tests in 13.622s

PASSED (successes=1)

comment:17 follow-up: Changed at 2013-07-23T05:28:29Z by zooko

Next, I extended fdleakfinder to print out a histogram of how many times each filename was opened and ran that on allmydata.test.test_cli.Backup.test_backup. This showed that 1819 branch is not opening sqlite db files very often in this test (just once each), but that it is opening the share files a lot more often:

zooko@spark ~/playground/tahoe-lafs $ tail log.master-fdleakfinderout-2.txt
  73),
 ('"cli/Backup/backup/servers/k6vb2bpd/storage/shares/eq/eqbt53qswmzlvr5r6tfupcfyam/3", O_RDONLY',
  73),
 ('"cli/Backup/backup/servers/rvsry4kn/storage/shares/eq/eqbt53qswmzlvr5r6tfupcfyam/6", O_RDONLY',
  73),
 ('"cli/Backup/backup/servers/xgru5adv/storage/shares/eq/eqbt53qswmzlvr5r6tfupcfyam/9", O_RDONLY',
  73),
 ('"cli/Backup/backup/servers/ysbz4st7/storage/shares/eq/eqbt53qswmzlvr5r6tfupcfyam/5", O_RDONLY',
  73),
 ('"/dev/urandom", O_RDONLY', 236)]
zooko@spark ~/playground/tahoe-lafs $ tail log.1819-cloud-merge-opensource-fdleakfinderout-2.txt
  108),
 ('"cli/Backup/backup/servers/k6vb2bpd/storage/shares/ef/efe5w4eitpntre3p4zmbbqy7cy/0", O_RDONLY',
  108),
 ('"cli/Backup/backup/servers/rvsry4kn/storage/shares/ef/efe5w4eitpntre3p4zmbbqy7cy/6", O_RDONLY',
  108),
 ('"cli/Backup/backup/servers/xgru5adv/storage/shares/ef/efe5w4eitpntre3p4zmbbqy7cy/2", O_RDONLY',
  108),
 ('"cli/Backup/backup/servers/ysbz4st7/storage/shares/ef/efe5w4eitpntre3p4zmbbqy7cy/4", O_RDONLY',
  108),
 ('"/dev/urandom", O_RDONLY', 136)]

comment:18 Changed at 2013-07-24T04:56:02Z by daira

Interesting. Is it possible to identify where in the code the extra file opens are?

comment:19 Changed at 2013-08-01T20:33:54Z by markberger

I'm not sure if this will help, but I ran the unit tests on master and 1819-cloud-merge-opensource and calculated the difference in time for all shared tests. Here are all the tests which are more than five seconds slower on the current cloud branch:

                              allmydata.test.test_cli.Cp.test_copy_using_filecap        26.734000s
            allmydata.test.test_mutable.Update.test_replace_locations_max_shares	16.467000s
               allmydata.test.test_upload.EncodingParameters.test_query_counting	8.411000s
                   allmydata.test.test_mutable.Problems.test_privkey_query_error	7.963000s
                     allmydata.test.test_mutable.Update.test_append_power_of_two	6.575000s
                 allmydata.test.test_mutable.Problems.test_privkey_query_missing	6.236000s
            allmydata.test.test_mutable.Problems.test_block_and_hash_query_error	6.175000s
                           allmydata.test.test_download.DownloadV2.test_download	5.764000s
   allmydata.test.test_upload.EncodingParameters.test_dropped_servers_in_encoder	5.594000s
allmydata.test.test_upload.EncodingParameters.test_exception_messages_during_...	5.115000s
                         allmydata.test.test_cli.CreateAlias.test_create_unicode	5.045000s
                             allmydata.test.test_cli.Rm.test_unlink_without_path	5.028000s

You can view the rest of the data I collected here: https://gist.github.com/markberger/136765c5e87480b8017e

comment:20 in reply to: ↑ 17 Changed at 2013-08-02T14:31:37Z by markberger

Replying to zooko:

Next, I extended fdleakfinder to print out a histogram of how many times each filename was opened and ran that on allmydata.test.test_cli.Backup.test_backup. This showed that 1819 branch is not opening sqlite db files very often in this test (just once each), but that it is opening the share files a lot more often:

zooko@spark ~/playground/tahoe-lafs $ tail log.master-fdleakfinderout-2.txt
  73),
 ('"cli/Backup/backup/servers/k6vb2bpd/storage/shares/eq/eqbt53qswmzlvr5r6tfupcfyam/3", O_RDONLY',
  73),
 ('"cli/Backup/backup/servers/rvsry4kn/storage/shares/eq/eqbt53qswmzlvr5r6tfupcfyam/6", O_RDONLY',
  73),
 ('"cli/Backup/backup/servers/xgru5adv/storage/shares/eq/eqbt53qswmzlvr5r6tfupcfyam/9", O_RDONLY',
  73),
 ('"cli/Backup/backup/servers/ysbz4st7/storage/shares/eq/eqbt53qswmzlvr5r6tfupcfyam/5", O_RDONLY',
  73),
 ('"/dev/urandom", O_RDONLY', 236)]
zooko@spark ~/playground/tahoe-lafs $ tail log.1819-cloud-merge-opensource-fdleakfinderout-2.txt
  108),
 ('"cli/Backup/backup/servers/k6vb2bpd/storage/shares/ef/efe5w4eitpntre3p4zmbbqy7cy/0", O_RDONLY',
  108),
 ('"cli/Backup/backup/servers/rvsry4kn/storage/shares/ef/efe5w4eitpntre3p4zmbbqy7cy/6", O_RDONLY',
  108),
 ('"cli/Backup/backup/servers/xgru5adv/storage/shares/ef/efe5w4eitpntre3p4zmbbqy7cy/2", O_RDONLY',
  108),
 ('"cli/Backup/backup/servers/ysbz4st7/storage/shares/ef/efe5w4eitpntre3p4zmbbqy7cy/4", O_RDONLY',
  108),
 ('"/dev/urandom", O_RDONLY', 136)]

I'm not sure if this is relevant, but that test runs significantly faster on the cloud branch:

allmydata.test.test_cli.Backup.test_backup	-5.986000s

Edit: I posted the wrong test data.

Last edited at 2013-08-02T14:32:23Z by markberger (previous) (diff)

comment:21 in reply to: ↑ 16 Changed at 2013-08-02T14:41:54Z by markberger

Replying to zooko:

My next question was: is the 1819-cloud-merge-opensource branch taking longer to do the same tests as the master branch? Or is it that the extra tests on the new branch (1185 vs. 1139 on master) are what is making the test suite take longer? The answer is that it is at least partially that existing tests are slower, for example:

zooko@spark ~/playground/tahoe-lafs $ cat log.1819-cloud-merge-opensource-timings-2.txt
allmydata.test.test_cli.Backup.test_backup ... [OK]
(17.674 secs)

-------------------------------------------------------------------------------
Ran 1 tests in 17.675s

PASSED (successes=1)
zooko@spark ~/playground/tahoe-lafs $ cat log.master-timings-2.txt
allmydata.test.test_cli.Backup.test_backup ... [OK]
(13.620 secs)

-------------------------------------------------------------------------------
Ran 1 tests in 13.622s

PASSED (successes=1)

According to data I collected, the total time to execute the extra tests on 1819-cloud-merge-opensource is 232.95s. There is a difference of about 628s with the tests that both branches have in common.

comment:22 follow-up: Changed at 2013-08-03T00:23:18Z by daira

markberger: can you test the hypothesis that the main problem is with an increase in latency of filesystem operations? To do that, repeat the comparison of trunk with 1819-cloud-merge-opensource, but using make tmpfstest for both.

comment:23 in reply to: ↑ 22 Changed at 2013-08-05T23:31:32Z by markberger

After running the tests again using make tmpfstest, the increased latency seems to be the largest factor of the regression.

Total time to run tests on master: 320s
Total time to run tests on cloud branch: 452s
Total time to run tests only on cloud branch: 110s

Total difference between shared tests: 22s slower on cloud branch

The 22s regression appears to occur in mutable update:

          allmydata.test.test_mutable.Update.test_replace_locations_max_shares	15.051000s
                              allmydata.test.test_cli.Cp.test_copy_using_filecap	5.158000s
                     allmydata.test.test_mutable.Update.test_append_power_of_two	2.624000s
                allmydata.test.test_mutable.Update.test_multiple_segment_replace	2.449000s
                 allmydata.test.test_mutable.Update.test_replace_in_last_segment	2.378000s
                                  allmydata.test.test_mutable.Update.test_append	2.349000s
              allmydata.test.test_mutable.Update.test_replace_zero_length_middle	2.036000s
           allmydata.test.test_mutable.Update.test_replace_zero_length_beginning	1.937000s
                       allmydata.test.test_mutable.Update.test_replace_segstart1	1.931000s
                       allmydata.test.test_mutable.Update.test_replace_beginning	1.915000s
                          allmydata.test.test_mutable.Update.test_replace_middle	1.900000s
                      allmydata.test.test_mutable.Update.test_replace_and_extend	1.885000s
           allmydata.test.test_mutable.Update.test_replace_zero_length_segstart1	1.864000s

The rest of the tests have insignificant differences.

comment:24 Changed at 2013-08-06T01:38:49Z by daira

Thanks, that's very helpful.

I'm not too worried about the 22s regression. It's the rest of the difference shown in comment:15 that we need to focus on, by reducing the number of syncs.

Last edited at 2013-08-06T01:50:54Z by daira (previous) (diff)

comment:25 Changed at 2013-08-06T07:41:54Z by zooko

I was getting confused by the different measurements (mine and markberger's) of the timings of different test runs (comment:15, comment:16, comment:19, comment:20, comment:21, comment:23).

So, I tried to reproduce markberger's observation from comment:19 that the test case which takes the most additional time on the cloud branch is allmydata.test.test_cli.Cp.test_copy_using_filecap.

I also observe that this test takes about twice as long on cloud branch as on trunk:

zooko@spark ~/playground/LAFS/tahoe-lafs-trunk $ ./bin/tahoe debug trial allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
  Cp
    test_copy_using_filecap ...                                            [OK]

-------------------------------------------------------------------------------
Ran 1 tests in 12.245s

PASSED (successes=1)
zooko@spark ~/playground/LAFS/tahoe-lafs-trunk $ ./bin/tahoe debug trial allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
  Cp
    test_copy_using_filecap ...                                            [OK]

-------------------------------------------------------------------------------
Ran 1 tests in 12.560s

PASSED (successes=1)
zooko@spark ~/playground/LAFS/tahoe-lafs-trunk $ ./bin/tahoe debug trial allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
  Cp
    test_copy_using_filecap ...                                            [OK]

-------------------------------------------------------------------------------
Ran 1 tests in 12.610s

PASSED (successes=1)
zooko@spark ~/playground/LAFS/tahoe-lafs $ ./bin/tahoe debug trial allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
  Cp
    test_copy_using_filecap ...                                            [OK]

-------------------------------------------------------------------------------
Ran 1 tests in 23.842s

PASSED (successes=1)
zooko@spark ~/playground/LAFS/tahoe-lafs $ ./bin/tahoe debug trial allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
  Cp
    test_copy_using_filecap ...                                            [OK]

-------------------------------------------------------------------------------
Ran 1 tests in 23.473s

PASSED (successes=1)
zooko@spark ~/playground/LAFS/tahoe-lafs $ ./bin/tahoe debug trial allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
  Cp
    test_copy_using_filecap ...                                            [OK]

-------------------------------------------------------------------------------
Ran 1 tests in 24.055s

PASSED (successes=1)

So then I ran it under strace, as instructed by the fdleakfinder docs. It took about 64 seconds to run this test under strace on the branch:

zooko@spark ~/playground/LAFS/tahoe-lafs $ strace -q -a1 -s0 -ff -e trace=desc -tttT -ostrace.file ./bin/tahoe debug trial allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
  Cp
    test_copy_using_filecap ...                                            [OK]

-------------------------------------------------------------------------------
Ran 1 tests in 63.853s

PASSED (successes=1)

… and about 41 seconds to do it on trunk:

zooko@spark ~/playground/LAFS/tahoe-lafs-trunk $ strace -q -a1 -s0 -ff -e trace=desc -tttT -ostrace.file ./bin/tahoe debug trial allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
  Cp
    test_copy_using_filecap ...                                            [OK]

-------------------------------------------------------------------------------
Ran 1 tests in 41.643s

PASSED (successes=1)

I ran fdleakfinder on the strace files. It says that on trunk 6499 files were opened during the course of the test, and on branch 13,999 files were opened.

From eyeballing the output, I see one kind of file that is used a lot on branch and not at all on trunk. There are 6520 uniquely-named tempfiles, located in /var/tmp used on branch and 0 on trunk. Each one is opened only once on branch. Here's an excerpt from fdleakfinder's report:

 ('/var/tmp/etilqs_zrkh9HqWCQ5aDGN', 1),
 ('/var/tmp/etilqs_zsOeiwtD2Vt0LJ7', 1),
 ('/var/tmp/etilqs_zsr0p0ohZWFxMW3', 1),
 ('/var/tmp/etilqs_zsv4OH643Kak6XZ', 1),
 ('/var/tmp/etilqs_zu7fBV4rMZW8BdB', 1),
 ('/var/tmp/etilqs_zvZD3WtYAcZ75kD', 1),
 ('/var/tmp/etilqs_zwFqpNGPbuP1wCt', 1),
 ('/var/tmp/etilqs_zwKxVFd1uODwkfh', 1),

Here's the count:

zooko@spark ~/playground/LAFS $ grep var/tmp/ tahoe-lafs/fdl-openedfiles.txt  | wc -l
6520
zooko@spark ~/playground/LAFS $ grep var/tmp/ tahoe-lafs-trunk/fdl-openedfiles.txt  | wc -l
0

A searching engine query for "sqlite /var/tmp/" took me to this thread, in which sqlite inventor Richard Hipp seems to be saying that sqlite uses these temp files for computing a sql query which could require too much RAM, and that the files in /var/tmp are for holding a temporary table that sqlite generates to compute this query:

http://www.mail-archive.com/sqlite-users@sqlite.org/msg73481.html

The example in the sqlite mailing list discussion is the "UNION ALL" query, which we don't use. But apparently some sql query that we are using is causing sqlite to generate a temporary table.

comment:26 Changed at 2013-08-06T20:27:37Z by daira

  • Status changed from assigned to new

Eek. If there are queries using space linear in the number of shares stored, that won't scale at all.

I wonder which queries are creating the temporary tables? I'll see if I can find out.

comment:27 Changed at 2013-08-06T20:46:59Z by daira

Only the following queries are used in that test:

INSERT OR REPLACE INTO `leases` VALUES (?,?,?,?,?)
INSERT OR REPLACE INTO `shares` VALUES (?,?,?,?,?,?,?)
SELECT `storage_index`, `shnum` FROM `shares`
       WHERE `storage_index`=? AND `shnum`=?
UPDATE `shares` SET `state`=?, `used_space`=?, `backend_key`=?
       WHERE `storage_index`=? AND `shnum`=? AND `state`!=?'

(found by instrumenting leasedb.py and piping the output through |sort |uniq).

These are not queries that I would expect to use temporary tables, given that (storage_index, shnum) is the primary key on shares.

comment:28 Changed at 2013-08-06T21:06:42Z by daira

The temporary files are opened during INSERT OR REPLACE INTO and UPDATE queries. They are not opened during SELECT queries.

comment:30 Changed at 2013-08-06T21:21:14Z by daira

Aha, I believe these are Statement Journal Files (section 2.5 of http://www.sqlite.org/tempfiles.html).

What's confusing is that "A statement journal is only created for an UPDATE or INSERT statement that might change multiple rows of a database [...]". INSERT OR REPLACE INTO queries by definition never affect multiple rows. The UPDATE statement in comment:27 cannot update multiple rows because the WHERE clause specifies exact values for storage_index and shnum, which together form the primary key, although I can see why sqlite might not "know" that in advance.

comment:31 Changed at 2013-08-06T22:54:54Z by daira

I added this to dbutil.py, and the strace output no longer shows any files in /var/tmp being opened:

    # Store temporary files not associated with transaction control in memory
    # where possible, provided that sqlite has been compiled to allow that. See
    # <http://www.sqlite.org/tempfiles.html> and
    # <http://www.sqlite.org/pragma.html#pragma_temp_store> for details.
    c.execute("PRAGMA temp_store = MEMORY;")

comment:32 Changed at 2013-08-06T23:16:06Z by daira

1819-cloud-merge branch with PRAGMA temp_store = MEMORY;:

PASSED (skips=13, expectedFailures=4, successes=1216)

real	19m37.547s
user	5m1.051s
sys	0m34.726s

and without:

PASSED (skips=13, expectedFailures=4, successes=1216)

real	20m34.003s
user	5m0.095s
sys	0m37.058s

So a small improvement (1 minute on the full test suite) but nothing spectacular :-(

comment:33 follow-up: Changed at 2013-08-07T16:29:49Z by warner

As we noticed in the dev-chat today, Cp.test_copy_using_filecap is mistakenly copying its entire basedir into the tahoe filesystem (which includes all 10 server basedirs, and their shares, which are being created during the cp by the files being copied in.. it's remarkable that the process terminates at all). This resulted in something like 174 shares being created (per server), instead of 4. Fixing this doesn't address the overall regression, but does explain why this one test was taking 17 seconds even on trunk.

OTOH, leaving that test broken while we figure out the performance regression provides us with a convenient lots-of-shares stress test, that's highly responsive to changes in the leasedb times. I'd much rather iterate on a 30-ish second test than on the full test suite.

comment:34 in reply to: ↑ 33 Changed at 2013-08-11T13:16:43Z by daira

Replying to warner:

As we noticed in the dev-chat today, Cp.test_copy_using_filecap is mistakenly copying its entire basedir into the tahoe filesystem (which includes all 10 server basedirs, and their shares, which are being created during the cp by the files being copied in.. it's remarkable that the process terminates at all). This resulted in something like 174 shares being created (per server), instead of 4. Fixing this doesn't address the overall regression, but does explain why this one test was taking 17 seconds even on trunk.

Fixed in #2048.

OTOH, leaving that test broken while we figure out the performance regression provides us with a convenient lots-of-shares stress test, that's highly responsive to changes in the leasedb times. I'd much rather iterate on a 30-ish second test than on the full test suite.

That's fine, we'll just not pull that fix onto 1819-cloud-merge[-opensource] yet.

Last edited at 2013-08-11T13:17:04Z by daira (previous) (diff)

comment:35 Changed at 2013-08-28T15:37:05Z by daira

  • Milestone changed from soon to 1.12.0

comment:36 Changed at 2014-12-06T15:12:55Z by daira

Actually we shouldn't pay any attention to the unfixed version of Cp.test_copy_using_filecap, because it would be expected to have different behaviour on trunk vs the cloud branch anyway due to the different number of files in the base directory. So that was a red herring, although I'm glad we fixed it.

comment:37 Changed at 2014-12-06T15:15:07Z by daira

If we want a stress test, allmydata.test.test_mutable.Update.test_replace_locations_max_shares is pretty stressful :-p

comment:38 Changed at 2016-03-22T05:02:25Z by warner

  • Milestone changed from 1.12.0 to 1.13.0

Milestone renamed

comment:39 Changed at 2016-06-28T18:17:14Z by warner

  • Milestone changed from 1.13.0 to 1.14.0

renaming milestone

comment:40 Changed at 2020-06-30T14:45:13Z by exarkun

  • Milestone changed from 1.14.0 to 1.15.0

Moving open issues out of closed milestones.

Note: See TracTickets for help on using tickets.