Context Navigation

dirnodes could cache encrypted/serialized entries for speed

Reported by:	warner	Owned by:	kevan
Priority:	minor	Milestone:	1.5.0
Component:	code-dirnodes	Version:	0.8.0
Keywords:	dirnode	Cc:
Launchpad Bug:

Description

We found that for large directories (353 entries), it takes a non-trivial amount of time to serialize the child entries into a single string: 500ms. This currently represents about 5% of the time necessary to update the directory, and might grow to 10% once we make some other performance improvements.

I suspect (but do not have confirmation) that most of this time is spent encrypting the child write-caps. We do a couple of hashes, an HMAC, and a short AES encryption for each one.

Since most updates only modify one row, we could consider caching the serialized forms and re-using them during update. Basically _unpack_contents would return a dict mapping childname to (name, rocap, rwcap, metadata, serialized), and all update functions would put a None in the 'serialized' slot. _pack_contents would use the 'serialized' slot if present, otherwise it would re-serialize the other parameters.

This is only a minor improvement, so I expect it will be a long time before we decide to implement it, but I wanted to get the ideas down before we forgot.

Attachments (8)

get_dirnode_strings.py (1.3 KB) - added by kevan at 2009-07-02T02:51:56Z.: a first shot at making a test program
tests.txt (26.1 KB) - added by kevan at 2009-07-04T03:59:26Z.: tests for _unpack_contents, _pack_contents, and CachingDict?
optimizations.txt (22.3 KB) - added by kevan at 2009-07-04T04:00:08Z.: changing dirnode.py to use CachingDict?
dict.2.txt (19.5 KB) - added by kevan at 2009-07-05T01:14:09Z.
dict.txt (19.5 KB) - added by kevan at 2009-07-05T22:34:09Z.: CachingDict? implementation
bench.txt (19.9 KB) - added by kevan at 2009-07-05T22:34:41Z.: benchmark modifications for testing optimized code
modified_benchmark (1.9 KB) - added by kevan at 2009-07-05T22:34:58Z.: Benchmarks for optimized code
unmodified_benchmark (1.9 KB) - added by kevan at 2009-07-05T22:35:14Z.: benchmarks for unoptimized code

Download all attachments as: .zip

Change History (35)

comment:1 Changed at 2008-04-24T23:50:35Z by warner

Component changed from code-encoding to code-dirnodes

comment:2 Changed at 2008-06-01T20:57:56Z by warner

Milestone changed from eventually to undecided

comment:3 Changed at 2009-05-04T16:43:45Z by zooko

See also #414 (profiling on directory unpacking) and #383 (large directories take a long time to modify).

comment:4 Changed at 2009-05-04T16:44:44Z by zooko

See also #327 (performance measurement of directories).

comment:5 Changed at 2009-06-25T15:53:12Z by zooko

Owner set to kevan

Kevan Carstensen expressed interest in working on this.

comment:6 Changed at 2009-07-02T02:39:28Z by kevan

So, summarizing discussions with zooko and warner...

We first need to handle testing for backwards compatibility. This means that we'll want to capture a base32-encoded representation of the output of _pack_contents as it sits now when run against some small, known directory structure. We'll then use that to test _unpack_contents (to make sure that the unpacked directory structure is what we're expecting), and _pack contents (to make sure that packing the same known directory structure results in the same packed representation).

Then we can do the optimizations that this ticket suggests.

(that didn't really need to be a list, huh.)

comment:7 Changed at 2009-07-02T02:51:12Z by kevan

I've started work on the first point by writing a little program to try to get the base32 representation of the output of _pack_contents. Unfortunately, I run into path issues when I try to run it.

At the top of my script, I have

import sys

base = "/path-to-tahoe/tahoe/support/lib/python2.6/site-packages"
sys.path.append(base)

to tell the Python interpreter about tahoe's support directory. The goal was to have a somewhat less robust version of the detection that the tahoe executable does when it first starts (since I know where the directory is on my system, and don't imagine that the script is going to be run elsewhere, since it's kind of a one-time deal to make tests work).

When I run the script, and try to import something from the tests directory, I get

Traceback (most recent call last):
  File "get_dirnode_strings.py", line 7, in <module>
    from test.test_dirnode import FakeClient
  File "/path-to-tahoe/tahoe/src/allmydata/test/__init__.py", line 2, in <module>
    from foolscap.logging.incident import IncidentQualifier
ImportError: No module named foolscap.logging.incident

which would seem to suggest that my path hacking doesn't do what I want it to.

Is there something easy that I'm missing that would make this work?

(also, my approach for getting this information is based on what of tahoe's code I've read, which isn't much. if there's a more obvious way to get the strings that we need, or if my program is flawed in some other way, please let me know)

Changed at 2009-07-02T02:51:56Z by kevan

Attachment get_dirnode_strings.py added

a first shot at making a test program

comment:8 Changed at 2009-07-02T03:15:08Z by zooko

The way to do that is to set the PYTHONPATH environment variable before invoking the python interpreter. There are some hacks (mostly due to setuptools, I think) that need to run at interpreter startup time and learn about the available Python packages, so adding to your sys.path after interpreter startup won't work.

It was a good idea though.

If you can't or don't want to set your PYTHONPATH, then you'll need to find the foolscap package, probably in /path-to-tahoe/tahoe/support/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg, and add that whole path (including the thing.egg part) to your sys.path, and then do likewise with the other packages.

comment:9 Changed at 2009-07-02T04:28:27Z by kevan

That fixed the issue -- thanks.

Just so I'm clear on the next part:

Currently, _unpack_contents returns a dictionary mapping the name of each child node to a pair (child node instance, metadata), where metadata is a dictionary of metadata. _pack_contents gets the rocap and rwcap (if present) from the child node instance. What we want to do is

Modify _unpack_contents to return a dict mapping the name of each child node to a triple (child node instance, metadata, serialized), where child node instance and metadata are as they are now, and serialized is the serialized from of the other data.
Modify _pack_contents to deal with this new representation; specifically, to use serialized if present instead of serializing the other data, which we think is expensive.
Modify Adder, MetadataSetter and Deleter to work with these changes; specifically, they should, when modifying a node, set the corresponding serialized value to None, which will tell _pack_contents that it needs to re-serialize that node.
Write tests to make sure that all of this works (probably before doing the first 3 items). In particular,
- If we pass contents packed with an unmodified _pack_contents to the new _unpack_contents, we should get the same underlying directory structure back (i.e.: all expected child node names are present, and metadata values are as they should be. Is there anything else we'd want to check?)
- The output of the new _pack_contents should be the same as the output from the old one, given the same directory structure.
- We should be able to interact with Adder, Deleter, and MetadataSetter in the same way that we do now. Are there already tests to adequately verify this, or should I plan on writing those, too?

(apologies if this is pedantic, but I'm still getting the hang of tahoe, and don't want to go off on an irrelevant tangent because I've made a bad assumption somewhere)

comment:10 Changed at 2009-07-02T04:39:17Z by zooko

That sounds good to me!

comment:11 Changed at 2009-07-02T18:27:29Z by warner

I usually write code like this by adding a new unit test. You can run a specific unit test with a minimum of fuss (and startup time and extraneous noise) by doing e.g. make quicktest TEST=allmydata.test.test_dirnode.DeepStats.test_stats . That will get all the PYTHONPATH stuff set up for you. Anything you print from the unit test will get displayed, so I use this for one-off tools all the time.

Also, you can use misc/run-with-pythonpath.py, which will set up the right environment and then run the command of your choice. For example, if you wrote a foo.py to do the stuff you just described, then you could invoke python misc/run-with-pythonpath.py python foo.py.

I'm disappointed that the sys.path technique you tried didn't work.. it used to.

Your approach to _pack_contents sounds great! Note that the serialized form should contain the whole name+rocap+encrwcap+metadata string (i.e. be sure to include the name). That way, if _pack_contents sees that it has a pre-serialized string available, it can just append that to the list of entries that it's building, and doesn't need to re-serialize the childname either. At the end of that loop, it should do a single "".join(entries) to perform the final assembly (rather than doing incremental += operations, which would be a lot slower).

If we pass contents packed with an unmodified _pack_contents to the new _unpack_contents, we should get the same underlying directory structure back (i.e.: all expected child node names are present, and metadata values are as they should be. Is there anything else we'd want to check?)

We should make sure that the rocap/rwcap is the same too. Doing child.get_uri() and comparing it against a constant is plenty.

We should be able to interact with Adder, Deleter, and MetadataSetter? in the same way that we do now. Are there already tests to adequately verify this, or should I plan on writing those, too?

test_dirnode.py should already have test coverage for the modifiers. It exercises all the NewDirectoryNode methods, like set_uri and list. If they pass, then you've updated the modifier classes successfully.

Also, if it works on your platform (it's somewhat touchy right now), use our test-coverage tools: make quicktest-figleaf figleaf-output, and check to see that all the code you've touched is actually being run. The buildbot has a link to the current coverage data (generated under py2.4; it seems that py2.5 gives slightly different answers). I think it should be pretty easy to make these changes and achieve the same or better coverage than before.

Also, I don't know if it'd be worth it, but you could get fancy and instead of changing the data structure from a list of two-tuples to a list of three-tuples, you could:

create a subclass of list
- have it store an auxilliary value (the unmodified serialized string) for each key
- override __setitem__ to remove that auxilliary value
- add a get_both_items method, which takes a key and returns a two-tuple of (aux-serialized, value) (which is really (aux-serialized, (child,metadata)))
- add a set_both_items method which takes (key, aux-serialized, (child,metadata))
then the Modifier classes like Adder and Deleter don't have to change: they'll set the entries that have changed and leave the rest alone
now _pack_contents uses get_both_items and prefers the pre-serialized form if it's still there

Since NewDirectoryNode.list returns whatever _unpack_contents returns, you might manage to write less code if you use this technique. Otherwise you'll need to find all the callers of _read and update them to tolerate the new three-tuple (and add a wrapper to list to convert it into two-tuple form for external callers, so that it continues to honor the same interface defined in allmydata.interfaces.IDirectoryNode).

Oh, and of course, it would be a good idea to actually measure the time that serialization/deserialization takes before doing any of this work. Create 10/100/1k/10k/100k entries at random, call _pack_contents under timeit.py or some other sort of how-long-does-it-take loop, and figure out an A+Bx curve (I'd expect the serialization time to be some constant A plus some per-entry time B, let's figure out what A and B are). Let's make sure that serialization is the culprit, it might be that call to _create_node in unpack that we should focus on. And let's set a reasonable goal.. maybe we should be able to unpack+modify+pack 10k entries in less than a second, or something.

comment:12 Changed at 2009-07-02T18:34:53Z by zooko

The measuring part is my job -- #327 (performance measurement of directories).

comment:13 Changed at 2009-07-04T00:12:54Z by kevan

Okay, I've got tests in place -- I'll attach them in a bit.

One minor nit: I was working on the assumption that _pack_contents and _unpack_contents were inverses -- i.e., if I do _pack_contents(_unpack_contents(string)), I should get string back. I made this a condition in my test function, but it failed -- _pack_contents would return a different string, even when fed with the unmodified dict returned from _unpack_contents. But all of the data inside the dict seem the same -- at least the rocaps, rwcaps, and metadata. Is this normal?

Also, the hardcoding of the base32-encoded representation of that directory tree is kind of horrific -- any ideas for a better way of doing that?

comment:14 Changed at 2009-07-04T03:14:51Z by zooko

src/allmydata/dirnode.py#L201

The reason is that the encryption of the write-cap uses a random IV. There isn't an easy way to make that deterministic and still secure right now, so we can't rely on it for testing.

(Hm, for future reference -- i.e. after Tahoe v1.5.0 -- maybe we could generate the random IV as the secure hash of the write cap itself. That would be make it deterministic without, as far as I can currently see, losing security.)

As to the base32-encoded directory tree, that looks okay to me.

comment:15 Changed at 2009-07-04T03:58:46Z by kevan

Okay. Given that, do you think that the other tests in test_unpack_and_pack_behavior() are going to be enough?

I've finished doing a few things.

I've changed the tests for _pack_contents and _unpack_contents in test_dirnode.py to not check the output of _pack_contents for equality, since that will always fail at the moment. I've also moved a misplaced comment.
I've added tests for CachingDict, the data structure described by warner above (cool idea, by the way).
I've added an implementation of CachingDict to dirnode.py. It basically does what warner suggested that it do.
I've changed _pack_contents, _unpack_contents, and _create to use CachingDicts instead of dicts.

With these done (tests.txt, dict.txt, optimizations.txt applied), python setup.py test passes on my machine.

Changed at 2009-07-04T03:59:26Z by kevan

Attachment tests.txt added

tests for _unpack_contents, _pack_contents, and CachingDict?

Changed at 2009-07-04T04:00:08Z by kevan

Attachment optimizations.txt added

changing dirnode.py to use CachingDict?

comment:16 Changed at 2009-07-04T14:04:16Z by zooko

I've looked over your patches and it looks good! I'll work on my benchmarks of dirnode now so I can see what difference it makes to CPU usage.

comment:17 Changed at 2009-07-05T01:13:38Z by kevan

I noticed when running a test script that zooko had that I was throwing away *args and kwargs in my CachingDict? subclass. I'm attaching a patch that fixes that issue.

Changed at 2009-07-05T01:14:09Z by kevan

Attachment dict.2.txt added

comment:18 Changed at 2009-07-05T01:24:48Z by kevan

(erm, not kwargs -- typo)

comment:19 Changed at 2009-07-05T17:38:03Z by zooko

e414c73877ba6337 adds benchmarking of unpack-followed-by-repack, which is the functionality that this ticket and Kevan's patch is about.

comment:20 Changed at 2009-07-05T22:33:37Z by kevan

I've run the benchmarks, and am attaching the results.

To get the unmodifed_results file, I simply ran bench_dirnode.py against an unmodified checkout from trunk.

To get the modified_results file, I needed to change a call to dict in bench_dirnode.py to refer to dirnode.CachingDict (otherwise _pack_contents raises an AssertionError). I'm attaching a patch that duplicates that change. I then applied dict.txt, optimizations.txt, and tests.txt to an otherwise unmodified source tree, and ran bench_dirnode.py against it.

The results seem to favor the optimized implementation.

I'm also updating dict.txt again -- hopefully for the last time -- to fix another *args related issue that I found.

Changed at 2009-07-05T22:34:09Z by kevan

Attachment dict.txt added

CachingDict? implementation

Changed at 2009-07-05T22:34:41Z by kevan

Attachment bench.txt added

benchmark modifications for testing optimized code

Changed at 2009-07-05T22:34:58Z by kevan

Attachment modified_benchmark added

Benchmarks for optimized code

Changed at 2009-07-05T22:35:14Z by kevan

Attachment unmodified_benchmark added

benchmarks for unoptimized code

comment:21 Changed at 2009-07-07T09:09:22Z by warner

could you summarize the benchmark numbers in terms of A+Bx factors? Also, it's not clear to me what those numbers point to as being the expensive part: is it the netstring parsing? or the encryption?

comment:22 Changed at 2009-07-07T14:38:32Z by zooko

Summarizing into A+Bx is not the way I do it. Instead I look at a handful of data points, like this:

benchmarking <function unpack at 0x32c0970>
N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.09,   2th-worst:    0.09, worst:    0.10 (of      5), reps/s:     11, ave rate:      732
N:     256, time: best:    0.35,   2th-best:    0.35, ave:    0.36,   2th-worst:    0.35, worst:    0.39 (of      5), reps/s:      2, ave rate:      714
N:    1024, time: best:    1.51,   2th-best:    1.51, ave:    1.55,   2th-worst:    1.51, worst:    1.72 (of      5), reps/s:      0, ave rate:      659
N:    4096, time: best:   10.48,   2th-best:   10.49, ave:   10.66,   2th-worst:   10.53, worst:   11.29 (of      5), reps/s:      0, ave rate:      384
benchmarking <function pack at 0x32c0930>
N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.09,   2th-worst:    0.09, worst:    0.09 (of      5), reps/s:     11, ave rate:      747
N:     256, time: best:    0.35,   2th-best:    0.35, ave:    0.35,   2th-worst:    0.35, worst:    0.35 (of      5), reps/s:      2, ave rate:      728
N:    1024, time: best:    1.52,   2th-best:    1.53, ave:    1.53,   2th-worst:    1.53, worst:    1.53 (of      5), reps/s:      0, ave rate:      670
N:    4096, time: best:   10.48,   2th-best:   10.49, ave:   10.50,   2th-worst:   10.52, worst:   10.53 (of      5), reps/s:      0, ave rate:      390
benchmarking <function unpack_and_repack at 0x32c09b0>
N:      64, time: best:    0.08,   2th-best:    0.09, ave:    0.09,   2th-worst:    0.09, worst:    0.09 (of      5), reps/s:     11, ave rate:      744
N:     256, time: best:    0.35,   2th-best:    0.35, ave:    0.35,   2th-worst:    0.35, worst:    0.35 (of      5), reps/s:      2, ave rate:      727
N:    1024, time: best:    1.52,   2th-best:    1.52, ave:    1.54,   2th-worst:    1.53, worst:    1.60 (of      5), reps/s:      0, ave rate:      665
N:    4096, time: best:   10.49,   2th-best:   10.50, ave:   10.51,   2th-worst:   10.51, worst:   10.52 (of      5), reps/s:      0, ave rate:      390

This is a good example of why I do it this way: because it isn't linear! So any A+Bx summary would be off. I already have a patch which fixes the observable non-linearity there. I guess I might as well commit that one so that everyone can see it. There -- efafcfb91a09b4de, entitled "directories: keep track of your position as you decode netstring after netstring from an input buffer instead of copying the trailing part | This makes decoding linear in the number of netstrings instead of O(N²).". After this patch, the same benchmark on the same machine says:

benchmarking <function unpack at 0x33429b0>
N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.09,   2th-worst:    0.08, worst:    0.10 (of      5), reps/s:     11, ave rate:      750
N:     256, time: best:    0.33,   2th-best:    0.33, ave:    0.34,   2th-worst:    0.33, worst:    0.37 (of      5), reps/s:      2, ave rate:      755
N:    1024, time: best:    1.28,   2th-best:    1.28, ave:    1.32,   2th-worst:    1.28, worst:    1.48 (of      5), reps/s:      0, ave rate:      776
N:    4096, time: best:    4.83,   2th-best:    4.83, ave:    4.97,   2th-worst:    4.83, worst:    5.51 (of      5), reps/s:      0, ave rate:      824
benchmarking <function pack at 0x3342970>
N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.08,   2th-worst:    0.08, worst:    0.09 (of      5), reps/s:     11, ave rate:      760
N:     256, time: best:    0.33,   2th-best:    0.33, ave:    0.33,   2th-worst:    0.33, worst:    0.33 (of      5), reps/s:      2, ave rate:      767
N:    1024, time: best:    1.29,   2th-best:    1.29, ave:    1.29,   2th-worst:    1.29, worst:    1.29 (of      5), reps/s:      0, ave rate:      794
N:    4096, time: best:    4.83,   2th-best:    4.83, ave:    4.83,   2th-worst:    4.83, worst:    4.84 (of      5), reps/s:      0, ave rate:      847
benchmarking <function unpack_and_repack at 0x33429f0>
N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.08,   2th-worst:    0.08, worst:    0.09 (of      5), reps/s:     11, ave rate:      759
N:     256, time: best:    0.33,   2th-best:    0.33, ave:    0.33,   2th-worst:    0.33, worst:    0.33 (of      5), reps/s:      2, ave rate:      767
N:    1024, time: best:    1.29,   2th-best:    1.29, ave:    1.30,   2th-worst:    1.29, worst:    1.36 (of      5), reps/s:      0, ave rate:      786
N:    4096, time: best:    4.82,   2th-best:    4.83, ave:    4.85,   2th-worst:    4.87, worst:    4.90 (of      5), reps/s:      0, ave rate:      844

As to your question about what's the expensive part, after the optimization patch from Kevan plus a few that I have here in my sandbox, benchmarking shows:

benchmarking <function unpack at 0x33003f0>
N:      64, time: best:    0.07,   2th-best:    0.07, ave:    0.08,   2th-worst:    0.07, worst:    0.08 (of      5), reps/s:     13, ave rate:      849
N:     256, time: best:    0.29,   2th-best:    0.30, ave:    0.31,   2th-worst:    0.32, worst:    0.32 (of      5), reps/s:      3, ave rate:      832
N:    1024, time: best:    1.24,   2th-best:    1.24, ave:    1.30,   2th-worst:    1.27, worst:    1.52 (of      5), reps/s:      0, ave rate:      787
N:    4096, time: best:    4.28,   2th-best:    4.29, ave:    4.40,   2th-worst:    4.31, worst:    4.82 (of      5), reps/s:      0, ave rate:      931
benchmarking <function pack at 0x33003b0>
N:      64, time: best:    0.07,   2th-best:    0.07, ave:    0.08,   2th-worst:    0.08, worst:    0.08 (of      5), reps/s:     13, ave rate:      843
N:     256, time: best:    0.29,   2th-best:    0.29, ave:    0.30,   2th-worst:    0.31, worst:    0.32 (of      5), reps/s:      3, ave rate:      840
N:    1024, time: best:    1.14,   2th-best:    1.14, ave:    1.16,   2th-worst:    1.17, worst:    1.20 (of      5), reps/s:      0, ave rate:      883
N:    4096, time: best:    4.30,   2th-best:    4.31, ave:    4.53,   2th-worst:    4.64, worst:    4.99 (of      5), reps/s:      0, ave rate:      904
benchmarking <function unpack_and_repack at 0x3300430>
N:      64, time: best:    0.07,   2th-best:    0.07, ave:    0.07,   2th-worst:    0.07, worst:    0.08 (of      5), reps/s:     13, ave rate:      860
N:     256, time: best:    0.29,   2th-best:    0.29, ave:    0.29,   2th-worst:    0.29, worst:    0.29 (of      5), reps/s:      3, ave rate:      880
N:    1024, time: best:    1.14,   2th-best:    1.14, ave:    1.15,   2th-worst:    1.14, worst:    1.21 (of      5), reps/s:      0, ave rate:      889
N:    4096, time: best:    4.28,   2th-best:    4.28, ave:    4.29,   2th-worst:    4.29, worst:    4.30 (of      5), reps/s:      0, ave rate:      955

and profiling shows:

benchmarking <function unpack at 0x3278430>
N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.09,   2th-worst:    0.08, worst:    0.10 (of      5), reps/s:     11, ave rate:      738
N:     256, time: best:    0.34,   2th-best:    0.34, ave:    0.34,   2th-worst:    0.34, worst:    0.38 (of      5), reps/s:      2, ave rate:      744
N:    1024, time: best:    1.29,   2th-best:    1.29, ave:    1.48,   2th-worst:    1.46, worst:    2.04 (of      5), reps/s:      0, ave rate:      693
N:    4096, time: best:    4.84,   2th-best:    4.85, ave:    4.97,   2th-worst:    4.87, worst:    5.45 (of      5), reps/s:      0, ave rate:      824
benchmarking <function pack at 0x32783f0>
N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.10,   2th-worst:    0.09, worst:    0.18 (of      5), reps/s:      9, ave rate:      611
N:     256, time: best:    0.33,   2th-best:    0.34, ave:    0.34,   2th-worst:    0.34, worst:    0.34 (of      5), reps/s:      2, ave rate:      764
N:    1024, time: best:    1.29,   2th-best:    1.29, ave:    1.30,   2th-worst:    1.30, worst:    1.30 (of      5), reps/s:      0, ave rate:      790
N:    4096, time: best:    4.85,   2th-best:    4.85, ave:    4.88,   2th-worst:    4.87, worst:    4.96 (of      5), reps/s:      0, ave rate:      840
benchmarking <function unpack_and_repack at 0x3278470>
N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.08,   2th-worst:    0.08, worst:    0.09 (of      5), reps/s:     11, ave rate:      754
N:     256, time: best:    0.33,   2th-best:    0.34, ave:    0.34,   2th-worst:    0.34, worst:    0.34 (of      5), reps/s:      2, ave rate:      764
N:    1024, time: best:    1.30,   2th-best:    1.30, ave:    1.31,   2th-worst:    1.31, worst:    1.37 (of      5), reps/s:      0, ave rate:      779
N:    4096, time: best:    4.85,   2th-best:    4.87, ave:    4.89,   2th-worst:    4.91, worst:    4.96 (of      5), reps/s:      0, ave rate:      837
         674783 function calls in 4.909 CPU seconds

   Ordered by: internal time
   List reduced from 75 to 32 due to restriction <32>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)   
    36830    2.422    0.000    2.591    0.000 base32.py:60(b2a_l)
     7366    0.527    0.000    1.606    0.000 base32.py:208(a2b_l)
    54026    0.154    0.000    0.154    0.000 netstring.py:3(netstring)   
    20879    0.149    0.000    0.149    0.000 hashutil.py:26(digest)
     7366    0.111    0.000    0.111    0.000 hashutil.py:163(_xor)
    36830    0.093    0.000    0.093    0.000 string.py:306(join)
     3683    0.088    0.000    0.088    0.000 encoder.py:205(iterencode)  
    58928    0.080    0.000    0.080    0.000 string.py:480(translate)
        1    0.076    0.076    1.905    1.905 dirnode.py:248(_pack_contents)
     7366    0.066    0.000    0.066    0.000 netstring.py:7(split_netstring)
    29464    0.060    0.000    2.141    0.000 base32.py:48(b2a)
    49124    0.058    0.000    0.058    0.000 hashutil.py:23(update)
     3683    0.056    0.000    0.386    0.000 dirnode.py:196(_encrypt_rwcap)
     7366    0.055    0.000    0.199    0.000 hashutil.py:48(tagged_pair_hash)
    13513    0.054    0.000    0.134    0.000 hashutil.py:38(tagged_hasher)
     3683    0.050    0.000    0.050    0.000 decoder.py:341(raw_decode)  
        1    0.046    0.046    2.999    2.999 dirnode.py:218(_unpack_contents)
     3683    0.042    0.000    0.155    0.000 dirnode.py:207(_decrypt_rwcapdata)
     7366    0.040    0.000    1.697    0.000 base32.py:199(a2b)
   125222    0.040    0.000    0.040    0.000 assertutil.py:30(precondition)
     3683    0.040    0.000    0.151    0.000 hashutil.py:166(hmac)
    13513    0.037    0.000    0.286    0.000 hashutil.py:43(tagged_hash) 
    20879    0.035    0.000    0.035    0.000 hashutil.py:19(__init__)

Whoops, time for me to go to work! I hope to measure and commit the rest of my optimization patches today after work. There is one that I really want other people's input on before I commit it... That one generates the per-entry IV with a secure hash of the writecap instead of with os.urandom(16).

comment:23 Changed at 2009-07-09T04:35:11Z by zooko

Resolution set to fixed
Status changed from new to closed

Fixed by 903005a52830ba96 but see also the new #752 (speed up directories more). Thanks, Kevan!

comment:24 Changed at 2009-07-10T01:59:41Z by kevan

Note that r3971 didn't actually commit the optimization code -- just the dictionary that it uses. If you want that, you should also apply the optimizations.txt patch and the tests.txt patch (which also contains tests for CachingDict?).

comment:25 Changed at 2009-07-10T02:21:13Z by zooko

Oh how embarassing that I thought that was the optimization patch. I was actually going to ask you what you thought about the mysterious fact that 903005a52830ba96 seems to moderately speed up pack and unpack but not so much unpack-and-repack!

comment:26 Changed at 2009-07-10T03:24:34Z by zooko

Now that I've applied your actual optimization patch, time to unpack and repack a 4096-entry directory dropped substantially. However, time to pack and time to unpack also changed, which suggests that my benchmark script is accidentally using your cache, or that extraneous factors on my system are screwing up the results or something.

Before optimizations.txt:

benchmarking <function unpack at 0x1b78320>
N:      64, time: best:    0.08,   2th-best:    0.09, ave:    0.10,   2th-worst:    0.10, worst:    0.15 (of      5), reps/s:      9, ave rate:      632
N:     256, time: best:    0.35,   2th-best:    0.37, ave:    0.38,   2th-worst:    0.39, worst:    0.40 (of      5), reps/s:      2, ave rate:      675
N:    1024, time: best:    1.40,   2th-best:    1.42, ave:    1.63,   2th-worst:    1.74, worst:    1.88 (of      5), reps/s:      0, ave rate:      629
N:    4096, time: best:    5.29,   2th-best:    5.90, ave:    6.16,   2th-worst:    6.36, worst:    7.15 (of      5), reps/s:      0, ave rate:      665
benchmarking <function pack at 0x1b782a8>
N:      64, time: best:    0.09,   2th-best:    0.09, ave:    0.09,   2th-worst:    0.09, worst:    0.09 (of      5), reps/s:     11, ave rate:      705
N:     256, time: best:    0.36,   2th-best:    0.36, ave:    0.39,   2th-worst:    0.37, worst:    0.51 (of      5), reps/s:      2, ave rate:      650
N:    1024, time: best:    1.38,   2th-best:    1.43, ave:    1.51,   2th-worst:    1.51, worst:    1.80 (of      5), reps/s:      0, ave rate:      676
N:    4096, time: best:    5.22,   2th-best:    5.77, ave:    6.08,   2th-worst:    6.33, worst:    6.79 (of      5), reps/s:      0, ave rate:      674
benchmarking <function unpack_and_repack at 0x1b78398>
N:      64, time: best:    0.10,   2th-best:    0.10, ave:    0.14,   2th-worst:    0.18, worst:    0.19 (of      5), reps/s:      7, ave rate:      473
N:     256, time: best:    0.36,   2th-best:    0.37, ave:    0.39,   2th-worst:    0.40, worst:    0.42 (of      5), reps/s:      2, ave rate:      664
N:    1024, time: best:    1.41,   2th-best:    1.42, ave:    1.46,   2th-worst:    1.48, worst:    1.52 (of      5), reps/s:      0, ave rate:      702
N:    4096, time: best:    5.22,   2th-best:    5.58, ave:    5.81,   2th-worst:    6.09, worst:    6.20 (of      5), reps/s:      0, ave rate:      705

after optimizations.txt:

benchmarking <function unpack at 0x1b78398>
N:      64, time: best:    0.05,   2th-best:    0.05, ave:    0.06,   2th-worst:    0.06, worst:    0.07 (of      5), reps/s:     17, ave rate:     1091
N:     256, time: best:    0.21,   2th-best:    0.21, ave:    0.23,   2th-worst:    0.24, worst:    0.27 (of      5), reps/s:      4, ave rate:     1093
N:    1024, time: best:    0.83,   2th-best:    0.84, ave:    0.88,   2th-worst:    0.87, worst:    1.00 (of      5), reps/s:      1, ave rate:     1170
N:    4096, time: best:    3.29,   2th-best:    3.35, ave:    3.54,   2th-worst:    3.58, worst:    4.09 (of      5), reps/s:      0, ave rate:     1156
benchmarking <function pack at 0x1b78320>
N:      64, time: best:    0.06,   2th-best:    0.06, ave:    0.06,   2th-worst:    0.06, worst:    0.06 (of      5), reps/s:     17, ave rate:     1106
N:     256, time: best:    0.22,   2th-best:    0.22, ave:    0.22,   2th-worst:    0.22, worst:    0.23 (of      5), reps/s:      4, ave rate:     1158
N:    1024, time: best:    0.85,   2th-best:    0.85, ave:    0.86,   2th-worst:    0.87, worst:    0.88 (of      5), reps/s:      1, ave rate:     1188
N:    4096, time: best:    3.27,   2th-best:    3.28, ave:    3.36,   2th-worst:    3.38, worst:    3.60 (of      5), reps/s:      0, ave rate:     1218
benchmarking <function unpack_and_repack at 0x1b78410>
N:      64, time: best:    0.06,   2th-best:    0.06, ave:    0.07,   2th-worst:    0.09, worst:    0.10 (of      5), reps/s:     13, ave rate:      881
N:     256, time: best:    0.22,   2th-best:    0.22, ave:    0.28,   2th-worst:    0.26, worst:    0.46 (of      5), reps/s:      3, ave rate:      929
N:    1024, time: best:    0.85,   2th-best:    0.90, ave:    1.00,   2th-worst:    1.02, worst:    1.34 (of      5), reps/s:      0, ave rate:     1022
N:    4096, time: best:    3.25,   2th-best:    3.33, ave:    3.49,   2th-worst:    3.67, worst:    3.74 (of      5), reps/s:      0, ave rate:     1172

I've applied your patches plus one from me to fix a conflict between one of my optimizations and one of yours.

comment:27 Changed at 2009-09-30T19:06:28Z by zooko

Milestone changed from undecided to 1.5.0

The patch to cache entries was and required follow-up merge patch 34213cd2c70246f3.

At around the same time other patches to optimize directory processing were also committed: efafcfb91a09b4de (the big one that makes processing linear instead of O(N²)), c0d1e7deaec145d6, 786ed012b3510135 (which required urgent security fix follow-up c1d5717cf0ecd68f.

Note: See TracTickets for help on using tickets.

Download in other formats: