[tahoe-dev] [tahoe-lafs] #698: corrupted file displayed to user after failure to download followed by retry
tahoe-lafs
trac at allmydata.org
Thu May 7 12:47:53 PDT 2009
#698: corrupted file displayed to user after failure to download followed by
retry
--------------------------+-------------------------------------------------
Reporter: zooko | Owner:
Type: defect | Status: new
Priority: critical | Milestone: 1.5.0
Component: code-network | Version: 1.4.1
Keywords: integrity | Launchpad_bug:
--------------------------+-------------------------------------------------
Comment(by warner):
That's pretty weird. The first thing that comes to mind is that a server
connection could have been lost in the middle of the download (in this case,
after we've retrieved the UEB and some of the hashes, but before we've
retrieved the first data block). The web server has to commit to success
(200) or failure (404 or 500 or something) before it starts sending any of
the plaintext, but it doesn't want to store the entire file either. So it
bases the HTTP response code upon the initial availability of k servers, and
hopes they'll stick around for the whole download.
When we get a "late failure" (i.e. one of the servers disconnects in the
middle), the webapi doesn't have a lot of choices. At the moment, it emits a
brief error message (attached to whatever partial content has already been
written out), then drops the HTTP connection, and hopes that the client is
observant enough to notice that the number of received bytes does not match
the previously-sent Content-Length header, and then announce an error on the
client side.
If the application doing the fetch (perhaps the browser, perhaps TiddlyWiki
itself?) doesn't strictly check the Content-Length header, then it could get
partial content without an error message.
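To illustrate the client-side check described above, here is a minimal sketch
in Python. The names verify_content_length and TruncatedDownloadError are
hypothetical (they are not part of Tahoe or of any browser), and the headers
are assumed to arrive as a plain dict keyed by the exact header name.

```python
class TruncatedDownloadError(Exception):
    """The body size does not match the declared Content-Length."""


def verify_content_length(headers, body):
    """Return body unchanged if its size matches the Content-Length header.

    `headers` is assumed to be a plain dict keyed by the exact header name;
    a real HTTP client would normalise header case first.
    """
    declared = headers.get("Content-Length")
    if declared is None:
        # No declared length, so truncation cannot be detected this way.
        return body
    expected = int(declared)
    if len(body) != expected:
        raise TruncatedDownloadError(
            "expected %d bytes, received %d" % (expected, len(body))
        )
    return body
```

A client that skips this comparison would hand the short body (plus the
appended error text) to the application as if it were the whole file.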
There are two directions to fix this:
* change the webapi to use "Chunked Encoding", basically delivering data one
  segment at a time, possibly giving the server a chance to emit an error
  header in between segments: this would let us respond better to these
  errors
* fix the other download-should-be-better tickets (#193, #287) to tolerate
  lost servers better, which might reduce the rate at which these errors
  occur
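As a rough sketch of the first direction: with HTTP/1.1 chunked transfer
coding the body carries no Content-Length, and its end is marked by a
zero-length chunk. If the server drops the connection after a late failure
without sending that final chunk, a conforming client can tell the body is
incomplete. The Python below only models the framing; encode_chunked,
decode_chunked and IncompleteBodyError are hypothetical names, not the webapi's
actual code, and `segments` stands in for whatever yields decoded file
segments.

```python
class IncompleteBodyError(Exception):
    """The chunked stream ended before the zero-length terminator chunk."""


def encode_chunked(segments):
    """Yield HTTP/1.1 chunk frames, one per non-empty segment.

    If iterating `segments` raises (e.g. a server was lost), the exception
    propagates and the terminating chunk is never emitted.
    """
    for seg in segments:
        if seg:  # a zero-length chunk would signal end-of-body
            yield b"%x\r\n" % len(seg) + seg + b"\r\n"
    yield b"0\r\n\r\n"


def decode_chunked(raw):
    """Decode a complete chunked body, or raise IncompleteBodyError."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol < 0:
            raise IncompleteBodyError("stream ended before a chunk-size line")
        # Ignore any chunk extensions after ';' on the size line.
        size = int(raw[pos:eol].split(b";")[0].decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)  # terminator seen: body is complete
        chunk = raw[pos:pos + size]
        if len(chunk) < size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise IncompleteBodyError("chunk truncated")
        body += chunk
        pos += size + 2
```

The point of the design is that success is signalled at the end of the
transfer rather than committed to up front, so a lost server no longer looks
like a short but otherwise valid response.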
I'm not sure what's up with the happens-again-after-retry part of this. For
the benefit of partial-range fetches, we sometimes cache the file's contents
locally, and I don't know how that would interact with lost-server errors.
It's at least conceivable that the caching mechanism doesn't realize that an
error occurred, and tries to pass partial data to the second download
attempt. But most browsers don't send a Range: header at all (it's mostly
streaming media players which do that), and I believe that the webapi will
skip this whole caching thing unless it sees a Range: header.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/698#comment:1>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid