[tahoe-dev] [tahoe-lafs] #698: corrupted file displayed to user after failure to download followed by retry

tahoe-lafs trac at allmydata.org
Thu May 7 12:47:53 PDT 2009


#698: corrupted file displayed to user after failure to download followed by
retry
--------------------------+-------------------------------------------------
 Reporter:  zooko         |           Owner:       
     Type:  defect        |          Status:  new  
 Priority:  critical      |       Milestone:  1.5.0
Component:  code-network  |         Version:  1.4.1
 Keywords:  integrity     |   Launchpad_bug:       
--------------------------+-------------------------------------------------

Comment(by warner):

 That's pretty weird. The first thing that comes to mind is that a server
 connection could have been lost in the middle of the download (in this
 case, after we've retrieved the UEB and some of the hashes, but before
 we've retrieved the first data block). The web server has to commit to
 success (200) or failure (404 or 500 or something) before it starts
 sending any of the plaintext, but it doesn't want to store the entire
 file either. So it bases the HTTP response code on the initial
 availability of k servers, and hopes they'll stick around for the whole
 download.
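 The commit-before-streaming decision described above could be sketched
 roughly like this (a hypothetical function, not the actual webapi code;
 the specific failure status code is an assumption, and k is the
 erasure-coding threshold):

```python
def choose_status(available_servers, k):
    """Pick the HTTP status code before any plaintext is streamed.

    Hypothetical sketch of the idea described above: commit to 200 if
    at least k servers initially hold shares, otherwise fail early
    with an error code, because the status line cannot be changed once
    the body has started.
    """
    if len(available_servers) >= k:
        return 200  # commit to success and start streaming
    return 500  # not enough shares up front; fail before sending data
```

 The weakness, of course, is exactly the one this ticket describes: the
 200 is a bet that those k servers stay connected for the whole download.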

 When we get a "late failure" (i.e. one of the servers disconnects in the
 middle), the webapi doesn't have a lot of choices. At the moment, it
 emits a brief error message (attached to whatever partial content has
 already been written out), then drops the HTTP connection, and hopes
 that the client is observant enough to notice that the number of
 received bytes does not match the previously-sent Content-Length
 header, and to announce an error on the client side.

 If the application doing the fetch (perhaps the browser, perhaps
 TiddlyWiki itself?) doesn't strictly check the Content-Length header,
 then it could get partial content without an error message.
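 A client that wants to catch these truncated downloads can compare the
 received byte count against the Content-Length header itself; a minimal
 sketch (hypothetical helper, not part of any browser or of Tahoe-LAFS):

```python
def length_matches(body, declared_length):
    """Return True iff the body is as long as Content-Length claims.

    declared_length is the raw header value (a string) or None when
    the server sent no Content-Length at all.
    """
    if declared_length is None:
        return True  # no header sent; nothing to check against
    return len(body) == int(declared_length)

# A strict client would treat a mismatch as a failed download:
#   if not length_matches(body, resp.headers.get("Content-Length")):
#       raise IOError("truncated download; discard partial content")
```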

 There are two directions to fix this:

  * change the webapi to use "Chunked Encoding", basically delivering
    data one segment at a time, possibly giving the server a chance to
    emit an error header in between segments: this would let us respond
    better to these errors
  * fix the other download-should-be-better tickets (#193, #287) to
    tolerate lost servers better, which might reduce the rate at which
    these errors occur
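 On the first option: HTTP/1.1 chunked transfer encoding frames each
 piece of the body with its own length, and a zero-length chunk marks a
 clean end of the body, so a connection dropped mid-stream is
 unambiguous to the client even without a Content-Length header. A
 sketch of the wire framing (illustrative helpers, not the webapi
 implementation):

```python
def encode_chunk(segment):
    """Frame one segment as an HTTP/1.1 chunk:
    hex length, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(segment), segment)

def final_chunk():
    """The zero-length chunk that cleanly terminates a chunked body.

    If the client never receives this, it knows the transfer was cut
    short, which is exactly the late-failure signal we want.
    """
    return b"0\r\n\r\n"
```

 For example, encode_chunk(b"hello") produces b"5\r\nhello\r\n".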

 I'm not sure what's up with the happens-again-after-retry part of this.
 For the benefit of partial-range fetches, we sometimes cache the file's
 contents locally, and I don't know how that would interact with
 lost-server errors. It's at least conceivable that the caching
 mechanism doesn't realize that an error occurred, and tries to pass
 partial data to the second download attempt. But most browsers don't
 send a Range: header at all (it's mostly streaming media players which
 do that), and I believe that the webapi will skip this whole caching
 thing unless it sees a Range: header.
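 The cache-bypass behavior described here amounts to a simple gate on
 the request headers, something like the following (a hypothetical
 sketch assuming a dict-like header mapping; the actual webapi code may
 differ):

```python
def should_consult_cache(request_headers):
    """Only partial-range fetches go through the local cache.

    A plain GET (no Range: header) streams straight from the grid, so
    a stale or partial cache entry should never reach an ordinary
    browser download -- which is why the retry behavior in this ticket
    is puzzling.
    """
    return any(name.lower() == "range" for name in request_headers)
```

 For example, a request carrying "Range: bytes=0-99" would consult the
 cache, while a plain browser GET would not.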

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/698#comment:1>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid


More information about the tahoe-dev mailing list