[tahoe-lafs-trac-stream] [tahoe-lafs] #393: mutable: implement MDMF

tahoe-lafs trac at tahoe-lafs.org
Sat Aug 13 14:10:07 PDT 2011


#393: mutable: implement MDMF
-----------------------------+-------------------------------------------------
     Reporter:  warner       |      Owner:  zooko
         Type:  enhancement  |     Status:  new
     Priority:  major        |  Milestone:  1.9.0
    Component:  code-mutable |    Version:  1.0.0
   Resolution:               |   Keywords:  newcaps performance random-access
Launchpad Bug:               |              privacy gsoc mdmf mutable
                             |              backward-compatibility
                             |              forward-compatibility review-needed
-----------------------------+-------------------------------------------------

Comment (by kevan):

 Replying to [comment:124 davidsarah]:
 > Notes on [5124/ticket393-MDMF-2]:
 > * This:
 > {{{
 > self._need_privkey = fetch_privkey or verify
 > if self._node.get_privkey() and not verify:
 >     self._need_privkey = False
 > }}}
 >   could be simplified to
 > {{{
 > self._need_privkey = verify or (fetch_privkey
 >                                 and not self._node.get_privkey())
 > }}}
 > * This:
 > {{{
 > self._verify = False
 > if verify:
 >     self._verify = True
 > }}}
 >   could be
 > {{{
 > self._verify = verify
 > }}}
 >
 > * {{{self._paused_deferred = None}}} is a typo, it should be
 {{{_pause_deferred}}}.
 >
 > * {{{self._paused}}} seems to be {{{True}}} whenever
 {{{self._pause_deferred is not None}}}, except that there is one place in
 the {{{download}}} method that sets {{{self._paused}}} to {{{False}}}
 without setting {{{self._pause_deferred}}} to {{{None}}}.
 >   I don't think that the {{{self._paused = False}}} there is actually
 necessary; everything will work if the download is started while paused.
 That would allow the code to be simplified by removing {{{self._paused}}}.

 Good observations.
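
 Just to illustrate the suggested simplification, here is a minimal sketch
 in which the pause state is carried entirely by {{{_pause_deferred}}}.
 This is only a sketch, assuming Twisted's {{{Deferred}}} and a
 {{{_check_for_paused}}}-style hook that in-flight work waits on; it is not
 the exact retrieve.py code:

 {{{
 from twisted.internet import defer

 class PausableDownload:
     """Sketch: the pause state lives entirely in _pause_deferred."""
     def __init__(self):
         self._pause_deferred = None

     def pause(self):
         # the mere existence of _pause_deferred marks the download as paused
         if self._pause_deferred is None:
             self._pause_deferred = defer.Deferred()

     def resume(self):
         # clear the marker first, then let any waiting work continue
         if self._pause_deferred is not None:
             p, self._pause_deferred = self._pause_deferred, None
             p.callback(None)

     def _check_for_paused(self, res):
         # while paused, return a Deferred that fires with the passed-through
         # result once resume() is called; otherwise pass the result straight on
         if self._pause_deferred is not None:
             d = defer.Deferred()
             self._pause_deferred.addCallback(lambda ignored: d.callback(res))
             return d
         return res
 }}}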

 >
 > * Is this calculation correct?
 > {{{
 > # our start segment is the first segment containing the
 > # offset we were given.
 > start = mathutil.div_ceil(self._offset,
 >                           self._segment_size)
 > # this gets us the first segment after self._offset. Then
 > # our start segment is the one before it.
 > start -= 1
 > }}}
 >   I would have thought that the segment containing {{{offset}}} was the
 floor of {{{offset}}} / {{{segment_size}}}.

 Indeed. Both this and the similar end segment calculation are wrong.

 This code gets exercised for partial reads, and doesn't behave correctly
 for reads that start at an offset that is evenly divisible by the segment
 size. For example, suppose that the segment size is 131073, and that we
 want to read starting at offset 131073. Since bytes are zero-indexed, we
 actually want to read the 131074th byte, which is the first byte in the
 second segment (or segment 1, since segments are also zero-indexed).
 {{{div_ceil(131073, 131073) = 1}}}, but the unconditional {{{start -= 1}}}
 means that we'll actually start reading from segment 0, which is wrong.
 Floor division handles this case correctly.
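
 To make the arithmetic concrete, here is a tiny sketch of the corrected
 start-segment calculation (using the 131073-byte segment size from the
 example above; the helper name is just for illustration):

 {{{
 segment_size = 131073

 def start_segment(offset):
     # the segment containing `offset` is simply offset // segment_size
     return offset // segment_size

 assert start_segment(0) == 0          # first byte of segment 0
 assert start_segment(131072) == 0     # last byte of segment 0
 assert start_segment(131073) == 1     # first byte of segment 1
 # the old div_ceil(131073, 131073) - 1 gives 0 here: one segment too early
 }}}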

 The end segment calculation is wrong in a similar way, but manages to work
 anyway because of a different bug. Specifically, it expects to read one
 more byte of data than is actually necessary to fulfill the read. For
 example, suppose that we want to read 5 bytes starting at offset 131073.
 Then we'll read 131073, 131074, 131075, 131076, and 131077, or the range
 131073-131077 inclusive. The code doesn't calculate this range correctly,
 because it uses {{{offset + read_size}}} to calculate the endpoint; in
 this case, it produces the range 131073 - 131078.
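
 The matching end-segment calculation, again only as a sketch: the last byte
 touched by the read is {{{offset + read_size - 1}}}, and the segment
 containing it also falls out of floor division.

 {{{
 segment_size = 131073

 def end_segment(offset, read_size):
     # inclusive index of the last byte the read touches
     last_byte = offset + read_size - 1
     return last_byte // segment_size

 # 5 bytes at offset 131073 touch bytes 131073..131077, all in segment 1
 assert end_segment(131073, 5) == 1
 # using offset + read_size as the endpoint would also pull in byte 131078
 }}}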

 I've attached a patch that adds tests for both of these issues. The test
 for a read that starts on a segment boundary fails on the unmodified
 ticket393-MDMF-2 branch. To see the test for a read that ends on a segment
 boundary fail, you first need to fix the off-by-one error described above
 by adding {{{end_data -= 1}}} after {{{end_data}}} is first calculated,
 around line 400 of {{{retrieve.py}}}.

 Both tests pass with floor division. I've also fixed similar equations for
 partial updates of MDMF files in {{{filenode.py}}}; these already used
 floor division, but still read more data than necessary to perform an
 update.

 > * {{{Unusued}}} -> {{{Unused}}} at retrieve.py line 305
 >
 > * set {{{self._block_hash_trees}}} to {{{None}}} in {{{decode()}}}, and
 make its initialization in {{{_setup_encoding_parameters}}} conditional on
 {{{if self._block_hash_trees is not None:}}}.

 Fixed in the updated patch.
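
 In outline, the change looks something like this (a sketch only; the class
 context and constructor arguments are invented for illustration and method
 signatures are trimmed, but {{{hashtree.IncompleteHashTree}}} is the real
 helper used in retrieve.py):

 {{{
 from allmydata import hashtree

 class DecoderSketch:
     def __init__(self, total_shares, num_segments):
         self._total_shares = total_shares
         self._num_segments = num_segments
         self._block_hash_trees = {}

     def decode(self):
         # the decode/verify path has no use for block hash trees
         self._block_hash_trees = None
         self._setup_encoding_parameters()

     def _setup_encoding_parameters(self):
         # only build per-share block hash trees when a caller wants them
         if self._block_hash_trees is not None:
             for i in range(self._total_shares):
                 self._block_hash_trees[i] = \
                     hashtree.IncompleteHashTree(self._num_segments)
 }}}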

 > My preferred whitespace conventions:
 > * one blank line before a block comment if the preceding line is code at
 the same indent level
 >   (this makes it harder to mistake the code as part of the comment, and
 makes the comment
 >   easier to read IMHO). examples are lines 235 and 285 of retrieve.py
 > * two blank lines between top-level classes, or groups of methods
 > * otherwise only one blank line between methods

 Noted; I'll start fixing this.

 Replying to [comment:116 warner]:
 > So I think this ought to be:
 >
 > {{{
 > SHARE_HASH_CHAIN_SIZE = (HASH_SIZE+2)*mathutil.log_ceil(256, 2)
 > }}}
 >
 > at least that gives me (32+2)*8, which feels right.

 Good catch; thanks.
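
 For the record, a quick sanity check of that value (assuming, as I read the
 layout, that each share hash chain entry is a 2-byte node number plus a
 32-byte hash, and that there are at most 256 shares, so the chain is at
 most log2(256) = 8 entries long):

 {{{
 from allmydata.util import mathutil

 HASH_SIZE = 32   # SHA-256d digest length in bytes

 SHARE_HASH_CHAIN_SIZE = (HASH_SIZE+2)*mathutil.log_ceil(256, 2)

 assert mathutil.log_ceil(256, 2) == 8
 assert SHARE_HASH_CHAIN_SIZE == (32 + 2) * 8 == 272
 }}}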

 > Kevan: could you update this (and add a new patch with both this and the
 merge
 > conflicts resolved)?

 Is there a compelling reason to keep posting patches that encompass the
 entire MDMF feature instead of using the ticket393-MDMF-2 branch? I find
 the latter much easier to work with.

 Also, would it be worthwhile to make it easy for the buildbots to run the
 tests with 2048-bit keys, or to add a Makefile target that does that? That
 might make these sorts of issues easier to catch in the future.

 I'll attach a patch against ticket393-MDMF-2 that addresses davidsarah's
 review comments (it looks like warner and davidsarah have addressed
 warner's comments in comment:116, at least in that branch). If anyone
 wants one, I can also make a big patch bundle against trunk that contains
 the whole MDMF feature.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/393#comment:126>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

