[tahoe-dev] [tahoe-lafs] #798: improve random-access download to retrieve/decrypt less data

tahoe-lafs trac at tahoe-lafs.org
Mon Aug 9 05:12:53 UTC 2010


#798: improve random-access download to retrieve/decrypt less data
------------------------------+---------------------------------------------
     Reporter:  warner        |       Owner:  davidsarah                                                                       
         Type:  enhancement   |      Status:  assigned                                                                         
     Priority:  major         |   Milestone:  1.8.0                                                                            
    Component:  code-network  |     Version:  1.5.0                                                                            
   Resolution:                |    Keywords:  performance pycryptopp random-access download large confidentiality review-needed
Launchpad Bug:                |  
------------------------------+---------------------------------------------

Comment (by davidsarah):

 Reviewing [4652]:

  * the {{{(start, length)}}} form of the {{{[Simple]Spans}}} constructor
 is not used outside test code (and the test code can be changed to pass a
 {{{[(start, length)]}}} array). Removing this would slightly simplify the
 constructor and avoid a possibly error-prone overloading.

  * in the {{{Spans}}} class comment, ", frequently used to represent
 .newsrc contents" is out-of-context and not needed.

  * in the {{{_check}}} method of {{{Spans}}}, if assertions are switched
 on then the {{{self._spans}}} array is re-sorted in order to check whether
 it is ordered. This is unnecessary: if you add an {{{assert length > 0,
 length}}} in the loop, then the loop will be checking a condition that is
 stronger than the array being ordered, given that the starts and lengths
 are numbers. (The sorting actually takes O(n) time rather than O(n log n)
 on each call to {{{_check}}}, because
 [http://bugs.python.org/file4451/timsort.txt Timsort] will detect the
 ordered input, but it's still unnecessary overhead.)

  * the assert statements should include the variables they use in the
 exception message, e.g. {{{assert start > prev_end, (start, prev_end)}}}.

  * "{{{overlap(s_start, s_length, start, length) or adjacent(s_start,
 s_length, start, length)}}}" is equivalent to "{{{overlap(s_start,
 s_length, start-1, length+2)}}}".

  * in the only other use of {{{adjacent}}} (in {{{DataSpans.add}}}), only
 the {{{start0 < start1}}} case should be able to occur. Inline and
 simplify.

  * in the loop over {{{enumerate(self._spans)}}}, you could exit early
 when {{{s_start > start+length}}}. At that point you know where to insert
 the {{{(start, length)}}} span without having to re-sort.

  * a {{{Spans}}} object behaves somewhat like a set of the elements in all
 of its spans, but the {{{__contains__}}} and {{{__iter__}}} methods are
 not consistent with that (instead viewing it as a set of {{{(start,
 length)}}} pairs). I realize this may allow for more convenient use of
 {{{in}}} and iteration, but it should at least be documented.

  * {{{_check}}} and {{{assert_invariants}}} do similar things; give them
 the same name.

  * {{{DataSpans._dump}}} is poorly named.

  * {{{DataSpans.assert_invariants}}} should check that none of the data
 strings are zero-length.

  * is it intentional that {{{DataSpans.add}}} calls
 {{{self.assert_invariants()}}} but {{{remove}}} (and {{{pop}}}, although
 that's much simpler) don't?

  * {{{if s_start <= start < s_end:}}} I find this Python construct too
 clever by half. Anyway, at this point {{{s_start <= start}}} is always
 true (otherwise we won't get past the first {{{continue}}}), so I would
 write this as
 {{{
 assert s_start <= start, (s_start, start)
 if start < s_end:
 }}}

  * Perhaps rename {{{s_*}}} to {{{old_*}}}.

  * {{{DataSpans.add}}}: if I understand correctly, case A also covers:
 {{{
     OLD      OLD    OLD    OLD
 NEW       NEW      NEWW   NEEWW
 }}}
    This isn't immediately clear from the comment. Depicting it as
 {{{
     OLD
 NEW.....
 }}}
    might help. Also, use uppercase consistently for the cases.

  * {{{suffix_len}}} in case D has a different meaning to {{{suffix_len}}}
 in case E. Maybe rename it to {{{replace_len}}} for case D.

 Tests:

  * The {{{Spans}}} class is tested by {{{ByteSpans}}}, but it just stores
 spans of integers, not necessarily byte offsets. I would suggest
 {{{s/ByteSpans/TestSpans/}}} and {{{s/StringSpans/TestDataSpans/}}}.

  * I could be wrong, but I don't think the deterministic tests are
 covering all cases in {{{add}}} and {{{remove}}}. I'd prefer to see those
 tests have full coverage rather than relying on the randomized tests to
 make up the gap.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/798#comment:18>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-dev mailing list