[tahoe-lafs-trac-stream] [tahoe-lafs] #753: use longer storage index / cap for collision resistance

tahoe-lafs trac at tahoe-lafs.org
Thu Jul 28 14:51:53 PDT 2011


#753: use longer storage index / cap for collision resistance
-------------------------------+------------------------------
     Reporter:  warner         |      Owner:
         Type:  defect         |     Status:  new
     Priority:  major          |  Milestone:  undecided
    Component:  code-encoding  |    Version:  1.4.1
   Resolution:                 |   Keywords:  newcaps security
Launchpad Bug:                 |
-------------------------------+------------------------------

Old description:

> As we work on new encoding schemes (like DSA-based mutable files), I'm
> thinking that we want to put a lower bound on the cap/SI length to
> maintain
> reasonable margin against collisions. 256 bits would be more than enough.
> 128
> is ok, but a bit tight. 92 bits would make me nervous.
>
> robk's friend Sylvan expressed concerns about Tahoe (and mnet/mojonation
> before it) because, for something that is meant as a backup system, even
> the
> slightest possibility of the CHK-based indexing scheme mapping two
> documents
> to the same storage index was too high for him. (I believe he would be
> more satisfied with a scheme that used centrally-allocated guaranteed-
> unique
> storage indices, which we could do but would require more coordination
> and
> longer caps, since we could no longer use a randomly-generated readkey to
> derive the storage index. In exchange for a controllable but non-zero
> probability of collision, we get to avoid central coordination and use
> smaller caps).
>
> The specific places where collisions could occur are:
>
>  * mapping from file contents to CHK-derived readkey
>  * mapping from readkey (CHK-derived or randomly generated) to storage
> index
>  * mapping from randomly-generated mutable writekey to storage index
>
> The "birthday paradox" determines the chance of collision. If I'm doing
> my
> math right, if you want less than '{{{p}}}' chance of getting any
> collisions
> when selecting items out of a namespace of size '{{{N}}}', then you can't
> select more than {{{C = sqrt(2*N*p)}}} items. This is called a "paradox"
> (where "surprise" would be a better term) because that square root causes
> C
> to be surprisingly low: for birthdays (in which N=365), p=0.5 leads to
> C=19.
> In the Tahoe context, {{{C}}} is the number of files you can add to the
> grid.
>
> In the current case, our 128-bit storage index (N=2¹²⁸) means that p=0.5
> gets us a nice large 2⁶⁴ number of files, except that p=0.5 is
> insufficient
> margin: we'd much prefer a vanishingly small chance of collision, like
> p=2⁻⁶⁴. Fortunately we get two bits of margin for every one bit we reduce
> from C. The table looks like:
>
> ||N||numfiles|| ||prob(collision)||
> ||96||2⁴⁸||->||2⁻¹ (0.5)||
> ||96||2⁴⁰||->||2⁻¹⁷||
> ||96||2³²||->||2⁻³³||
> ||96||2²⁴||->||2⁻⁴⁹||
> ||128||2⁶⁴||->||2⁻¹ (0.5)||
> ||128||2⁵⁶||->||2⁻¹⁷||
> ||128||2⁴⁸||->||2⁻³³||
> ||128||2³²||->||2⁻⁶⁵||
> ||192||2⁹⁶||->||2⁻¹||
> ||192||2⁸⁰||->||2⁻³³||
> ||192||2⁶⁴||->||2⁻⁶⁵||
> ||256||2¹²⁸||->||2⁻¹ (0.5)||
> ||256||2⁹⁶||->||2⁻⁶⁵||
>

> Note that our {{{N}}} is the minimum of the storage-index size and the
> top-most cap value (i.e. the readkey for immutable files, or the writekey
> for
> mutable files). So a DSA-based mutable file with a 92-bit writecap gives
> us
> an {{{N}}} of 2⁹², even if it is expanded into a storage-index of 128 or
> 256 bits.
>
> Also note that the allmydata.com grid currently has something like 10M
> objects in it, about C=2²³.
>
> So, I'm thinking that as much as a nice short 96-bit DSA mutable writecap
> would be nice, it's too short to provide enough collision margin. I want
> to
> be able to put trillions of files into a grid, and I want a the chance of
> collision to be so small that I don't ever need to worry about it, and 96
> bits isn't really there. 128 bits is probably good enough, but doesn't
> have
> enough margin to be obviously and unquestionably safe (C=2³² is a lot of
> files but you can imagine people wanting more, p=2⁻⁶⁴ is a tiny
> probability
> but you can imagine people wanting a bit better). 256 would be plenty
> (but of
> course I want my filecaps to be shorter than that).

New description:

 As we work on new encoding schemes (like DSA-based mutable files), I'm
 thinking that we want to put a lower bound on the cap/SI length to
 maintain
 reasonable margin against collisions. 256 bits would be more than enough.
 128
 is ok, but a bit tight. 92 bits would make me nervous.

 robk's friend Sylvan expressed concerns about Tahoe (and mnet/mojonation
 before it) because, for something that is meant as a backup system, even
 the
 slightest possibility of the CHK-based indexing scheme mapping two
 documents
 to the same storage index was too high for him. (I believe he would be
 more satisfied with a scheme that used centrally-allocated guaranteed-
 unique
 storage indices, which we could do but would require more coordination and
 longer caps, since we could no longer use a randomly-generated readkey to
 derive the storage index. In exchange for a controllable but non-zero
 probability of collision, we get to avoid central coordination and use
 smaller caps).

 The specific places where collisions could occur are:

  * mapping from file contents to CHK-derived readkey
  * mapping from readkey (CHK-derived or randomly generated) to storage
 index
  * mapping from randomly-generated mutable writekey to storage index

 The "birthday paradox" determines the chance of collision. If I'm doing my
 math right, if you want less than '{{{p}}}' chance of getting any
 collisions
 when selecting items out of a namespace of size '{{{N}}}', then you can't
 select more than {{{C = sqrt(2*N*p)}}} items. This is called a "paradox"
 (where "surprise" would be a better term) because that square root causes
 C
 to be surprisingly low: for birthdays (in which N=365), p=0.5 leads to
 C=19.
 In the Tahoe context, {{{C}}} is the number of files you can add to the
 grid.

 In the current case, our 128-bit storage index (N=2^128^) means that p=0.5
 gets us a nice large 2^64^ number of files, except that p=0.5 is
 insufficient
 margin: we'd much prefer a vanishingly small chance of collision, like
 p=2^-64^. Fortunately we get two bits of margin for every one bit we
 reduce
 from C. The table looks like:

 ||N||numfiles|| ||prob(collision)||
 ||96||2^48^||->||2^-1^ (0.5)||
 ||96||2^48^||->||2^-17^||
 ||96||2^32^||->||2^-33^||
 ||96||2^24^||->||2^-49^||
 ||128||2^64^||->||2^-1^ (0.5)||
 ||128||2^56^||->||2^-17^||
 ||128||2^48^||->||2^-33^||
 ||128||2^32^||->||2^-65^||
 ||192||2^96^||->||2^-1^||
 ||192||2^80^||->||2^-33^||
 ||192||2^64^||->||2^-65^||
 ||256||2^128^||->||2^-1^ (0.5)||
 ||256||2^96^||->||2^-65^||


 Note that our {{{N}}} is the minimum of the storage-index size and the
 top-most cap value (i.e. the readkey for immutable files, or the writekey
 for
 mutable files). So a DSA-based mutable file with a 92-bit writecap gives
 us
 an {{{N}}} of 2^92^, even if it is expanded into a storage-index of 128 or
 256 bits.

 Also note that the allmydata.com grid currently has something like 10M
 objects in it, about C=2^23^.

 So, I'm thinking that as much as a nice short 96-bit DSA mutable writecap
 would be nice, it's too short to provide enough collision margin. I want
 to
 be able to put trillions of files into a grid, and I want a the chance of
 collision to be so small that I don't ever need to worry about it, and 96
 bits isn't really there. 128 bits is probably good enough, but doesn't
 have
 enough margin to be obviously and unquestionably safe (C=2^32^ is a lot of
 files but you can imagine people wanting more, p=2^-64^ is a tiny
 probability
 but you can imagine people wanting a bit better). 256 would be plenty (but
 of
 course I want my filecaps to be shorter than that).

--

Comment (by davidsarah):

 Use trac markup for superscripts instead of Unicode; it's usually rendered
 much more clearly, and doesn't depend on font support.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/753#comment:8>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list