[tahoe-lafs-trac-stream] [tahoe-lafs] #1824: Tahoe process gone wild

tahoe-lafs trac at tahoe-lafs.org
Thu Apr 25 19:53:13 UTC 2013


#1824: Tahoe process gone wild
--------------------------+--------------------------------
     Reporter:  kpreid    |      Owner:  daira
         Type:  defect    |     Status:  new
     Priority:  critical  |  Milestone:  1.11.0
    Component:  code      |    Version:  1.9.2
   Resolution:            |   Keywords:  hang repair memory
Launchpad Bug:            |
--------------------------+--------------------------------

Comment (by zooko):

 I looked at the code a bit and I didn't notice a way for a loop to happen.
 The stack trace from the original report features this line:

 {{{
   File "/External/Projects/tahoe/src/allmydata/mutable/retrieve.py", line 736, in _validation_or_decoding_failed
 }}}

 But in [source:1.9.2/src/allmydata/mutable/retrieve.py?rev=5480#L725 the
 1.9.2 version of retrieve.py] and in the
 [source:git/src/allmydata/mutable/retrieve.py?annotate=blame&rev=d8c536847b1ea577f9ac5b6aa98c7ce5d1961c8c
 current version of retrieve.py], that line doesn't contain the same code.

 Hm... let's see. I can't see any way that
 {{{_validation_or_decoding_failed}}} could lead to a loop
 [source:git/src/allmydata/mutable/retrieve.py?annotate=blame&rev=d9c1064d42a322b58d3243923d95fd56235d7c89#L720
 in 1.9.0]. Could something else be looping, calling out to
 {{{_validation_or_decoding_failed}}} over and over? Well,
 [source:git/src/allmydata/mutable/retrieve.py?annotate=blame&rev=d9c1064d42a322b58d3243923d95fd56235d7c89#L612
 this stuff here] is complicated. Could it have a loop? No, I don't see one.
 Any errback from the call to {{{_validation_or_decoding_failed}}}
 should... Wait, what happens when you get an exception from your errback?
 I assume it just errbacks again? Hm, this code is very confusing because
 there are two variables in different scopes named {{{dl}}}. Anyway, I
 think the //intent// is that the errback goes here, where it terminates
 and doesn't loop:
 [source:git/src/allmydata/mutable/retrieve.py?annotate=blame&rev=d9c1064d42a322b58d3243923d95fd56235d7c89#L252
 loop()].
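 In Twisted, an exception raised by an errback is passed on as a Failure to the //next// errback in the chain. Processing moves only forward, so it never goes back to the handler that raised. The sketch below is a minimal toy model of that behavior. It is not Twisted: {{{ToyDeferred}}}, {{{add_callbacks}}}, {{{fire}}} and the handler names are hypothetical, and real code would use {{{Deferred.addCallbacks}}} and {{{Deferred.errback}}}.

```python
# A toy model of a Twisted Deferred's callback/errback chain. This is NOT
# Twisted; ToyDeferred, add_callbacks and fire are hypothetical names.


class ToyDeferred:
    def __init__(self):
        self._chain = []  # (callback, errback) pairs, run in order

    def add_callbacks(self, callback, errback):
        self._chain.append((callback, errback))
        return self

    def fire(self, result):
        # In this model an Exception instance as the current result means
        # "failure", so the errback side of the next pair handles it.
        for callback, errback in self._chain:
            handler = errback if isinstance(result, Exception) else callback
            if handler is None:
                continue  # no handler on this side: pass the result through
            try:
                result = handler(result)
            except Exception as exc:
                # An exception raised by a callback OR an errback becomes the
                # new result and goes to the NEXT errback. The chain only
                # moves forward, so each handler runs at most once.
                result = exc
        return result


seen = []


def failing_errback(err):
    seen.append(f"first errback got {err!r}")
    raise RuntimeError("errback itself failed")


def final_errback(err):
    seen.append(f"final errback got {err!r}")
    return "handled"  # returning a plain value switches back to success


d = ToyDeferred()
d.add_callbacks(None, failing_errback)
d.add_callbacks(None, final_errback)
result = d.fire(ValueError("bad share"))
print(result)  # handled
print(seen)
```

 In this model, an errback that raises cannot cause a loop by itself. A repeated failure would need code that explicitly schedules new work, which the toy does not model.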

 The
 [source:git/src/allmydata/mutable/retrieve.py?annotate=blame&rev=d8c536847b1ea577f9ac5b6aa98c7ce5d1961c8c#L614
 modern code] that replaced this doesn't have the two variables named
 {{{dl}}}. It still looks like you'll get an exception from the errback
 (named {{{_handle_bad_shares}}} in this version). Hm, in the current
 version that exception will then get ignored by
 [source:git/src/allmydata/mutable/retrieve.py?annotate=blame&rev=d8c536847b1ea577f9ac5b6aa98c7ce5d1961c8c#L642
 line 642] {{{dl = deferredutil.gatherResults(ds)}}} since
 {{{deferredutil.gatherResults()}}} always sets {{{consumeErrors=True}}}.
 Could that lead to a loop?
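 One detail is worth separating out. In Twisted's {{{DeferredList}}}, {{{consumeErrors=True}}} only replaces each failure with {{{None}}} on the //input// Deferred's own chain, which is what stops the "Unhandled error in Deferred" log. Whether the gathered Deferred itself fails with a {{{FirstError}}} is controlled separately by {{{fireOnOneErrback}}}. So whether the exception is really ignored depends on which flags {{{deferredutil.gatherResults()}}} passes, and that should be checked in the source. The sketch below is a toy model of both settings. It is not Twisted, and {{{toy_gather}}} and {{{ToyFirstError}}} are hypothetical names.

```python
# A toy model of Twisted's DeferredList over inputs that have already fired.
# This is NOT Twisted; toy_gather and ToyFirstError are hypothetical names.


class ToyFirstError(Exception):
    """Stand-in for twisted.internet.defer.FirstError."""

    def __init__(self, error, index):
        super().__init__(error, index)
        self.error = error
        self.index = index


def toy_gather(results, consume_errors, fire_on_one_errback):
    """Return (gathered_outcome, leftovers).

    `results` holds plain values or Exception instances (failures), in the
    order the inputs fired. `leftovers` is what each input's own chain
    carries on with after the DeferredList has observed it.
    """
    gathered = []
    leftovers = []
    first_error = None
    for index, r in enumerate(results):
        failed = isinstance(r, Exception)
        gathered.append((not failed, r))
        # consumeErrors=True swaps a failure for None on the input's own
        # chain, so it is not passed on (or logged as unhandled) there.
        leftovers.append(None if failed and consume_errors else r)
        if failed and fire_on_one_errback and first_error is None:
            first_error = ToyFirstError(r, index)
    if first_error is not None:
        # fireOnOneErrback=True: the gathered result is itself a failure.
        return first_error, leftovers
    # Otherwise: a list of (success, result) pairs, failures included.
    return gathered, leftovers


inputs = ["share-0", ValueError("bad share")]

outcome, leftovers = toy_gather(inputs, consume_errors=True, fire_on_one_errback=True)
print(type(outcome).__name__, outcome.index)  # ToyFirstError 1
print(leftovers)  # ['share-0', None]

outcome2, _ = toy_gather(inputs, consume_errors=True, fire_on_one_errback=False)
print(outcome2)
```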

 Daira: help!

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1824#comment:24>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

