[tahoe-lafs-trac-stream] [tahoe-lafs] #1870: leasedb: performance regression
tahoe-lafs
trac at tahoe-lafs.org
Tue Aug 6 07:41:54 UTC 2013
#1870: leasedb: performance regression
-------------------------+-------------------------------------------------
Reporter: | Owner: daira
davidsarah | Status: assigned
Type: defect | Milestone: 1.11.0
Priority: normal | Version: 1.9.2
Component: code- | Keywords: leasedb performance regression
storage | sqlite
Resolution: |
Launchpad Bug: |
-------------------------+-------------------------------------------------
Comment (by zooko):
I was getting confused by the different measurements (mine and
markberger's) of the timings of different test runs (comment:15,
comment:16, comment:19, comment:20, comment:21, comment:23).
So, I tried to reproduce markberger's observation from comment:19 that the
test case which takes the most additional time on the cloud branch is
{{{allmydata.test.test_cli.Cp.test_copy_using_filecap}}}.
I also observe that this test takes about twice as long on cloud branch as
on trunk:
{{{
zooko at spark ~/playground/LAFS/tahoe-lafs-trunk $ ./bin/tahoe debug trial
allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
Cp
test_copy_using_filecap ...
[OK]
-------------------------------------------------------------------------------
Ran 1 tests in 12.245s
PASSED (successes=1)
zooko at spark ~/playground/LAFS/tahoe-lafs-trunk $ ./bin/tahoe debug trial
allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
Cp
test_copy_using_filecap ...
[OK]
-------------------------------------------------------------------------------
Ran 1 tests in 12.560s
PASSED (successes=1)
zooko at spark ~/playground/LAFS/tahoe-lafs-trunk $ ./bin/tahoe debug trial
allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
Cp
test_copy_using_filecap ...
[OK]
-------------------------------------------------------------------------------
Ran 1 tests in 12.610s
PASSED (successes=1)
}}}
{{{
zooko at spark ~/playground/LAFS/tahoe-lafs $ ./bin/tahoe debug trial
allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
Cp
test_copy_using_filecap ...
[OK]
-------------------------------------------------------------------------------
Ran 1 tests in 23.842s
PASSED (successes=1)
zooko at spark ~/playground/LAFS/tahoe-lafs $ ./bin/tahoe debug trial
allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
Cp
test_copy_using_filecap ...
[OK]
-------------------------------------------------------------------------------
Ran 1 tests in 23.473s
PASSED (successes=1)
zooko at spark ~/playground/LAFS/tahoe-lafs $ ./bin/tahoe debug trial
allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
Cp
test_copy_using_filecap ...
[OK]
-------------------------------------------------------------------------------
Ran 1 tests in 24.055s
PASSED (successes=1)
}}}
So then I ran it under strace, as instructed by
[https://pypi.python.org/pypi/fdleakfinder the fdleakfinder docs]. It took
about 64 seconds to run this test under strace on the branch:
{{{
zooko at spark ~/playground/LAFS/tahoe-lafs $ strace -q -a1 -s0 -ff -e
trace=desc -tttT -ostrace.file ./bin/tahoe debug trial
allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
Cp
test_copy_using_filecap ...
[OK]
-------------------------------------------------------------------------------
Ran 1 tests in 63.853s
PASSED (successes=1)
}}}
… and about 41 seconds to do it on trunk:
{{{
zooko at spark ~/playground/LAFS/tahoe-lafs-trunk $ strace -q -a1 -s0 -ff -e
trace=desc -tttT -ostrace.file ./bin/tahoe debug trial
allmydata.test.test_cli.Cp.test_copy_using_filecap
allmydata.test.test_cli
Cp
test_copy_using_filecap ...
[OK]
-------------------------------------------------------------------------------
Ran 1 tests in 41.643s
PASSED (successes=1)
}}}
I ran fdleakfinder on the strace files. It says that on trunk 6499 files
were opened during the course of the test, and on branch 13,999 files were
opened.
From eyeballing the output, I see one kind of file that is used a lot on
branch and not at all on trunk. There are 6520 uniquely-named tempfiles,
located in {{{/var/tmp}}} used on branch and 0 on trunk. Each one is
opened only once on branch. Here's an excerpt from fdleakfinder's report:
{{{
('/var/tmp/etilqs_zrkh9HqWCQ5aDGN', 1),
('/var/tmp/etilqs_zsOeiwtD2Vt0LJ7', 1),
('/var/tmp/etilqs_zsr0p0ohZWFxMW3', 1),
('/var/tmp/etilqs_zsv4OH643Kak6XZ', 1),
('/var/tmp/etilqs_zu7fBV4rMZW8BdB', 1),
('/var/tmp/etilqs_zvZD3WtYAcZ75kD', 1),
('/var/tmp/etilqs_zwFqpNGPbuP1wCt', 1),
('/var/tmp/etilqs_zwKxVFd1uODwkfh', 1),
}}}
Here's the count:
{{{
zooko at spark ~/playground/LAFS $ grep var/tmp/ tahoe-lafs/fdl-
openedfiles.txt | wc -l
6520
zooko at spark ~/playground/LAFS $ grep var/tmp/ tahoe-lafs-trunk/fdl-
openedfiles.txt | wc -l
0
}}}
A searching engine query for "sqlite /var/tmp/" took me to this thread, in
which sqlite inventor Richard Hipp seems to be saying that sqlite uses
these temp files for computing a sql query which could require too much
RAM, and that the files in {{{/var/tmp}}} are for holding a temporary
table that sqlite generates to compute this query:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg73481.html
The example in the sqlite mailing list discussion is the "UNION ALL"
query, which we don't use. But apparently some sql query that we are using
is causing sqlite to generate a temporary table.
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1870#comment:25>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage
More information about the tahoe-lafs-trac-stream
mailing list