[tahoe-dev] [tahoe-lafs] #798: improve random-access download to retrieve/decrypt less data
tahoe-lafs
trac at tahoe-lafs.org
Mon Aug 9 05:12:53 UTC 2010
#798: improve random-access download to retrieve/decrypt less data
------------------------------+---------------------------------------------
Reporter: warner | Owner: davidsarah
Type: enhancement | Status: assigned
Priority: major | Milestone: 1.8.0
Component: code-network | Version: 1.5.0
Resolution: | Keywords: performance pycryptopp random-access download large confidentiality review-needed
Launchpad Bug: |
------------------------------+---------------------------------------------
Comment (by davidsarah):
Reviewing [4652]:
* the {{{(start, length)}}} form of the {{{[Simple]Spans}}} constructor
is not used outside test code (and the test code can be changed to pass a
{{{[(start, length)]}}} array). Removing this would slightly simplify the
constructor and avoid a possibly error-prone overloading.
* in the {{{Spans}}} class comment, ", frequently used to represent
.newsrc contents" is out-of-context and not needed.
* in the {{{_check}}} method of {{{Spans}}}, if assertions are switched
on then the {{{self._spans}}} array is re-sorted in order to check whether
it is ordered. This is unnecessary: if you add an {{{assert length > 0,
length}}} in the loop, then the loop will be checking a condition that is
stronger than the array being ordered, given that the starts and lengths
are numbers. (The sorting actually takes O(n) time rather than O(n log n)
on each call to {{{_check}}}, because
[http://bugs.python.org/file4451/timsort.txt Timsort] will detect the
ordered input, but it's still unnecessary overhead.)
* the assert statements should include the variables they use in the
exception message, e.g. {{{assert start > prev_end, (start, prev_end)}}}.
* "{{{overlap(s_start, s_length, start, length) or adjacent(s_start,
s_length, start, length)}}}" is equivalent to "{{{overlap(s_start,
s_length, start-1, length+2)}}}".
* in the only other use of {{{adjacent}}} (in {{{DataSpans.add}}}), only
the {{{start0 < start1}}} case should be able to occur. Inline and
simplify.
* in the loop over {{{enumerate(self._spans)}}}, you could exit early
when {{{s_start > start+length}}}. At that point you know where to insert
the {{{(start, length)}}} span without having to re-sort.
* a {{{Spans}}} object behaves somewhat like a set of the elements in all
of its spans, but the {{{__contains__}}} and {{{__iter__}}} methods are
not consistent with that (instead viewing it as a set of {{{(start,
length)}}} pairs). I realize this may allow for more convenient use of
{{{in}}} and iteration, but it should at least be documented.
* {{{_check}}} and {{{assert_invariants}}} do similar things; give them
the same name.
* {{{DataSpans._dump}}} is poorly named.
* {{{DataSpans.assert_invariants}}} should check that none of the data
strings are zero-length.
* is it intentional that {{{DataSpans.add}}} calls
{{{self.assert_invariants()}}} but {{{remove}}} (and {{{pop}}}, although
that's much simpler) don't?
* {{{if s_start <= start < s_end:}}} I find this Python construct too
clever by half. Anyway, at this point {{{s_start <= start}}} is always
true (otherwise we won't get past the first {{{continue}}}), so I would
write this as
{{{
assert s_start <= start, (s_start, start)
if start < s_end:
}}}
* Perhaps rename {{{s_*}}} to {{{old_*}}}.
* {{{DataSpans.add}}}: if I understand correctly, case A also covers:
{{{
OLD OLD OLD OLD
NEW NEW NEWW NEEWW
}}}
This isn't immediately clear from the comment. Depicting it as
{{{
OLD
NEW.....
}}}
might help. Also, use uppercase consistently for the cases.
* {{{suffix_len}}} in case D has a different meaning to {{{suffix_len}}}
in case E. Maybe rename it to {{{replace_len}}} for case D.
Tests:
* The {{{Spans}}} class is tested by {{{ByteSpans}}}, but it just stores
spans of integers, not necessarily byte offsets. I would suggest
{{{s/ByteSpans/TestSpans/}}} and {{{s/StringSpans/TestDataSpans/}}}.
* I could be wrong, but I don't think the deterministic tests are
covering all cases in {{{add}}} and {{{remove}}}. I'd prefer to see those
tests have full coverage rather than relying on the randomized tests to
make up the gap.
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/798#comment:18>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage
More information about the tahoe-dev
mailing list