[tahoe-lafs-trac-stream] [tahoe-lafs] #1526: make sure the new MDMF extension field is forward-compatible and safe

tahoe-lafs trac at tahoe-lafs.org
Sun Sep 4 15:09:23 PDT 2011


#1526: make sure the new MDMF extension field is forward-compatible and safe
-------------------------+-------------------------------------------------
     Reporter:  zooko    |      Owner:
         Type:  defect   |     Status:  new
     Priority:           |  Milestone:  1.9.0
  critical               |    Version:  1.9.0a1
    Component:  code-    |   Keywords:  forward-compatibility mdmf design-
  mutable                |  review-needed
   Resolution:           |
Launchpad Bug:           |
-------------------------+-------------------------------------------------

Old description:

> In #393 we've added "extension fields" to the MDMF caps. If I recall
> correctly, the original motivation was that future writers of MDMFs might
> want to include the {{{K}}} and {{{segsize}}} parameters in the cap
> itself (in addition to inside the share data) and future readers might
> want to take advantage of that information so that they don't need to do
> an extra round trip to learn that information by fetching some of the
> share data.
>
> It is important that such future writers of MDMFs don't exclude older
> (v1.9) readers of MDMFs from being able to read those caps or even (gah!)
> cause them to get the wrong data or to incur an error when they try to
> read those caps. Therefore we need a way to include data in the caps
> which older (v1.9) readers will reliably and safely ignore (and still be
> able to read the file data correctly) but future readers can use if they
> want.
>
> I thought we had decided to make a generic field for "extensions" in the
> MDMF caps, and not to make the current (1.9) reader or writer actually
> use this extension field yet. But the current code in trunk constrains
> that field instead of allowing it to be generically extensible, and it
> seems to try to use the numbers contained therein for its {{{K}}} and
> {{{segsize}}} values in some (?) cases.
>
> As for the constraints, first it constrains the extension field to
> [source:trunk/src/allmydata/uri.py?annotate=blame&rev=5220#L31 contain
> only the characters 0-9 and :], and then it requires it to be
> [source:trunk/src/allmydata/mutable/filenode.py?annotate=blame&rev=5227#L120
> exactly two elements long].
>
> Future versions of MDMF writers can't use the extension field then, to
> communicate anything which isn't made up of {{{0-9}}} characters and
> exactly one {{{:}}} character. It might allow some future use if it is
> instead a less constrained field which can have a larger set of
> characters, and which has space in it for messages that Tahoe-LAFS v1.9
> readers will parse and then ignore.
>
> As for the use of that field to initialize the {{{K}}} and {{{segsize}}}
> values, I haven't read through the code carefully enough to see if it
> does that correctly and if it has good tests. If we're going to keep the
> code in there that uses those values for download, then it probably makes
> sense to add the code which writes those values into the cap on upload!
>
> That latter part—putting the {{{K}}} and {{{segsize}}} into the cap when
> generating the cap when writing—is very simple to do, whereas the former
> part is potentially complicated.
>
> What, for example, happens if the {{{segsize}}} indicated in the cap and
> the {{{segsize}}} indicated
> [source:trunk/src/allmydata/mutable/filenode.py?annotate=blame&rev=5227#L1107
> in the version info] differ? Can the {{{segsize}}} or the {{{K}}} change
> in different versions of the same MDMF? (I'm pretty sure it can't, but if
> it can't then maybe the value in the cap should be the ''only'' place
> that {{{K}}} or {{{segsize}}} exist.) Does the current trunk MDMF reader
> actually really use this value? Scanning through the code, I don't think
> so but I'm not 100% sure yet.
>
> There are three possibilities that I think we should consider for v1.9.0:
>
> Proposal 1 (extension field for future use—currently unused)
>
> * 1. a. Loosen the constraint-checking on the extension field in MDMF
> caps to allow a larger character class and have almost no constraints
> except those necessary for safe and easy parsing to find where the field
> begins and ends.
> * 1. b. Eliminate all code which uses the contents of the extension field
> when reading.
> * 1. c. [OPTIONAL] Write code (if it isn't already there) to populate the
> contents of that extension field with {{{K:segsize}}} when generating a
> URL. (The way it encodes {{{K}}} and {{{segsize}}} into the extension
> field has, of course, to fit into the constraints of the extension field.
> In addition to that, it should ''not'' consume the entire extension
> field, but should allow a safe and easy way for other fields to be added
> into the extension field such that they can be unambiguously parsed apart
> from the {{{K}}} and {{{segsize}}} fields.)
> * 1. d. Think about whether this proposal will lead to
> unsafety/insecurity or forward-compatibility problems.
>
> Proposal 2 ({{{K}}} and {{{segsize}}} in cap):
>
> * 2. a. Define part of the MDMF cap to hold {{{K}}} and {{{segsize}}}.
> This is in fact exactly the same as the "extension field" in the current
> trunk, but we stop calling it the "extension field" and start calling it
> {{{K}}} and {{{segsize}}}.
> * 2. b. Eliminate all code which uses {{{K}}} and {{{segsize}}} values
> from anywhere ''other'' than the cap when reading.
> * 2. c. Write code (if it isn't already there) to populate the contents
> of the {{{K}}} and {{{segsize}}} parts of the MDMF cap when generating a
> cap.
> * 2. d. Think about whether this proposal will lead to
> unsafety/insecurity or forward-compatibility problems.
>
> Proposal 3 ({{{K}}} and {{{segsize}}} in cap, plus an extension field):
>
> * Do all of Proposal 2 to encode {{{K}}} and {{{segsize}}} into the cap,
> and then also do proposal 1 (except for 1.c. of course) to provide an
> extension field for future use.
>
> I think that Proposal 1 is the least likely to delay or destabilize
> Tahoe-LAFS v1.9, especially if we leave out the optional 1.c. step. If
> v1.9 does not attempt to use the extension field in any way other than
> telling where it begins and ends, then future MDMF users will not have to
> worry that what they put in there will cause problems for old v1.9 users.
> By removing all the code that does anything with the extension field
> (aside from the regex which allows the extension field to be present in
> an MDMF cap), we can simplify the current 1.9 alpha code for easier
> review.
>
> I'd like to hear your opinion about this! (Especially if you are Kevan,
> David-Sarah, or Brian.)

New description:

 In #393 we've added "extension fields" to the MDMF caps. If I recall
 correctly, the original motivation was that future writers of MDMFs might
 want to include the {{{K}}} and {{{segsize}}} parameters in the cap itself
 (in addition to inside the share data) and future readers might want to
 take advantage of that information so that they don't need to do an extra
 round trip to learn that information by fetching some of the share data.

 It is important that such future writers of MDMFs don't exclude older
 (v1.9) readers of MDMFs from being able to read those caps or even (gah!)
 cause them to get the wrong data or to incur an error when they try to
 read those caps. Therefore we need a way to include data in the caps which
 older (v1.9) readers will reliably and safely ignore (and still be able to
 read the file data correctly) but future readers can use if they want.

 I thought we had decided to make a generic field for "extensions" in the
 MDMF caps, and not to make the current (1.9) reader or writer actually use
 this extension field yet. But the current code in trunk constrains that
 field instead of allowing it to be generically extensible, and it seems to
 try to use the numbers contained therein for its {{{K}}} and {{{segsize}}}
 values in some (?) cases.

 The MDMF filecap format is currently defined as:

 {{{
   URI:MDMF:$writekey:$fingerprint[(:$extension)*]
 }}}

 and all existing caps are created with two extension fields: the first is
 K (as an integer), the second is segsize (also as an integer). The
 intention is to allow additional extension fields to be added in the
 future, perhaps hints that could speed up a download.

 But the current code first constrains each extension field to
 [source:trunk/src/allmydata/uri.py?annotate=blame&rev=5220#L31 contain
 only the characters 0-9 and :], and then it requires there to be
 [source:trunk/src/allmydata/mutable/filenode.py?annotate=blame&rev=5227#L120
 exactly two such fields].

 Future versions of MDMF writers can't use the extensions then, to
 communicate anything other than the currently defined K and segsize
 fields. They might allow some future use if each field can have a larger
 set of characters, and if there is space for messages that Tahoe-LAFS v1.9
 readers will parse and then ignore.

 I'd like to have this in 1.9 because then 1.9 will be tolerant of caps
 generated by future versions that have a different number of extension
 fields.

 As for the use of that field to initialize the {{{K}}} and {{{segsize}}}
 values, I haven't read through the code carefully enough to see if it does
 that correctly and if it has good tests. This is potentially complicated.

 What, for example, happens if the {{{segsize}}} indicated in the cap and
 the {{{segsize}}} indicated
 [source:trunk/src/allmydata/mutable/filenode.py?annotate=blame&rev=5227#L1107
 in the version info] differ? Can the {{{segsize}}} or the {{{K}}} change
 in different versions of the same MDMF? (I'm pretty sure it can't, but if
 it can't then maybe the value in the cap should be the ''only'' place that
 {{{K}}} or {{{segsize}}} exist.) Does the current trunk MDMF reader
 actually really use this value? Scanning through the code, I don't think
 so but I'm not 100% sure yet.

 There are three possibilities that I think we should consider for v1.9.0:

 Proposal 1 (extension field for future use—currently unused)

 * 1. a. Loosen the constraint-checking on the extension field in MDMF caps
 to allow a larger character class and have almost no constraints except
 those necessary for safe and easy parsing to find where the field begins
 and ends.
 * 1. b. Eliminate all code which uses the contents of the extension field
 when reading.
 * 1. c. Reconsider whether to populate the contents of that extension
 field with {{{K:segsize}}} when generating a URL. (The way {{{K}}} and
 {{{segsize}}} are encoded into the extension field has, of course, to fit
 into the constraints of the extension field. In addition to that, it
 should ''not'' consume the entire extension field, but should allow a safe
 and easy way for other fields to be added into the extension field such
 that they can be unambiguously parsed apart from the {{{K}}} and
 {{{segsize}}} fields.)
 * 1. d. Think about whether this proposal will lead to unsafety/insecurity
 or forward-compatibility problems.

 Proposal 2 ({{{K}}} and {{{segsize}}} in cap):

 * 2. a. Define part of the MDMF cap to hold {{{K}}} and {{{segsize}}}.
 This is in fact exactly the same as the "extension field" in the current
 trunk, but we stop calling it the "extension field" and start calling it
 {{{K}}} and {{{segsize}}}.
 * 2. b. Eliminate all code which uses {{{K}}} and {{{segsize}}} values
 from anywhere ''other'' than the cap when reading.
 * 2. c. Think about whether this proposal will lead to unsafety/insecurity
 or forward-compatibility problems.

 Proposal 3 ({{{K}}} and {{{segsize}}} in cap, plus an extension field):

 * Do all of Proposal 2 to encode {{{K}}} and {{{segsize}}} into the cap,
 and then also do proposal 1.a. to provide an extension field for future
 use.

 [Note: Zooko was talking below about slightly different versions of the
 proposals -- he wasn't sure whether the existing code included K and
 segsize, which in fact it does.]

 I think that Proposal 1 is the least likely to delay or destabilize Tahoe-
 LAFS v1.9 ~~especially if we leave out the optional 1.c. step~~. If v1.9
 does not attempt to use the extension field in any way other than telling
 where it begins and ends, then future MDMF users will not have to worry
 that what they put in there will cause problems for old v1.9 users. By
 removing all the code that does anything with the extension field (aside
 from the regex which allows the extension field to be present in an MDMF
 cap), we can simplify the current 1.9 alpha code for easier review.

 I'd like to hear your opinion about this! (Especially if you are Kevan,
 David-Sarah, or Brian.)

--

Comment (by davidsarah):

 [Description edited to merge in comments from the duplicate #1509, and
 take into account comment:1.]

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1526#comment:5>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list