#719 new defect

Making requests too soon after startup can fail

Reported by: bewst
Owned by:
Priority: major
Milestone: soon
Component: code-frontend
Version: 1.4.1
Keywords: download upload check repair usability error wui availability reliability
Cc:
Launchpad Bug:

Description (last modified by daira)

$ tahoe start
STARTING /export/home/dave/.tahoe
client node probably started
$ tahoe ls
Error during GET: 410 Gone UnrecoverableFileError: the directory (or mutable file) could not be retrieved, because there were insufficient good shares. This might indicate that no servers were connected, insufficient servers were connected, the URI was corrupt, or that shares have been lost due to server departure, hard drive failure, or disk corruption. You should perform a filecheck on this object to learn more.
$ tahoe ls
Welcome_to_Allmydata.pdf
_My Shared Files_
_Recycle bin_
bak
c++std2003.pdf
$

Change History (10)

comment:1 Changed at 2009-05-31T21:11:52Z by warner

  • Component changed from unknown to code-network
  • Owner nobody deleted

This is an issue with hidden depths: how should the client node know that it has connected to every server that it's ever going to need?

But it should be easy to improve the situation somewhat. To start with, there should be some internal function that keeps track of "progress towards full connection":

  • have we connected to the introducer? how long ago did we connect? do we even have an introducer.furl?
  • how many storage servers have we been told about? how many are connected? how many are left? how long have we been trying to connect to them?

Then, when a directory retrieval or a file download fails due to insufficient shares, this function could provide additional human-useful data, like saying "we couldn't retrieve that directory right now, but since it looks like we've only been connected to the introducer for two seconds, maybe we just don't know about enough servers yet. You should try again in ten seconds."
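A minimal sketch of such a progress tracker, to make the idea concrete. Everything here is hypothetical: the class name, its methods, and the message wording are illustrative assumptions, not part of the Tahoe-LAFS codebase. It only reports facts (introducer state, server counts, elapsed time) and makes no attempt to classify a failure as transient or permanent.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConnectionProgress:
    """Hypothetical tracker of 'progress towards full connection'.

    None of these names exist in Tahoe-LAFS; they are assumptions
    made for illustration only.
    """
    introducer_furl: Optional[str] = None
    introducer_connected_at: Optional[float] = None
    started_at: float = field(default_factory=time.time)
    # Maps a server id to True if we are currently connected to it.
    servers: Dict[str, bool] = field(default_factory=dict)

    def introducer_connected(self, now: Optional[float] = None) -> None:
        self.introducer_connected_at = time.time() if now is None else now

    def server_announced(self, server_id: str) -> None:
        # Record a server we have been told about, without
        # overwriting an existing connected state.
        self.servers.setdefault(server_id, False)

    def server_connected(self, server_id: str) -> None:
        self.servers[server_id] = True

    def describe(self, now: Optional[float] = None) -> str:
        """Return advisory text for a human; it makes no decisions."""
        now = time.time() if now is None else now
        if not self.introducer_furl:
            return "No introducer.furl is configured."
        if self.introducer_connected_at is None:
            return ("Not yet connected to the introducer after "
                    "%.0f seconds of trying." % (now - self.started_at))
        known = len(self.servers)
        connected = sum(1 for up in self.servers.values() if up)
        age = now - self.introducer_connected_at
        return ("Connected to the introducer %.0f seconds ago; "
                "%d of %d known storage servers are connected "
                "(%d pending)." % (age, connected, known, known - connected))
```

The text returned by describe() could be appended to an "insufficient good shares" error so that whoever reads it can judge whether retrying is worthwhile.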

I'm not sure how to deliver that extra information. Specifically, the tahoe node should not try to guess whether this is a transient failure or a permanent one: we don't want to resort to heuristics or fixed timeouts. So this extra data is advisory and should be interpreted by a human rather than a piece of code.

So from the webapi point of view, 410 still seems like the right response code, but maybe we can add the text to the response body, and make sure that the CLI tools will deliver this body to stderr.
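As a rough illustration of the CLI side, the sketch below shows a client that writes the body of an HTTP error response to stderr alongside the status line. It is not the tahoe CLI's actual code: the function names are hypothetical, and the "Error during GET" prefix simply mirrors the transcript in the description. It relies only on the standard library, where urllib.error.HTTPError exposes the error body through read().

```python
import sys
import urllib.error
import urllib.request


def format_http_error(status, reason, body):
    """Combine the status line and the response body into one message.

    Hypothetical helper; the wording of the prefix is an assumption
    modelled on the transcript in the ticket description.
    """
    text = body.decode("utf-8", errors="replace").strip()
    head = "Error during GET: %d %s" % (status, reason)
    return head + ("\n" + text if text else "")


def get_or_report(url, stderr=None):
    """Fetch url; on an HTTP error, print status and body to stderr."""
    stderr = sys.stderr if stderr is None else stderr
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        # HTTPError is file-like: read() returns the error response body,
        # which is where the node's advisory text would be carried.
        print(format_http_error(e.code, e.reason, e.read()), file=stderr)
        return None
```

With this shape, any advisory text the node adds to a 410 response body reaches the user without the CLI having to interpret it.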

We have similar issues in a browser. I don't know when browsers will show the response body for things like 410 GONE, but maybe we can use the same technique.

comment:2 Changed at 2010-04-04T17:10:48Z by davidsarah

  • Keywords usability error wui added
  • Milestone changed from undecided to 1.7.0

This issue also affects the WUI. Some browsers (in particular IE) hide response bodies for HTTP errors by default, but that doesn't mean the response body is the wrong place to put human-readable info about the error; the HTTP spec specifically says that browsers SHOULD display the entity body for errors (see the end of RFC 2616 section 6.1.1).

Last edited at 2012-04-23T18:42:12Z by davidsarah

comment:3 Changed at 2010-06-18T23:28:17Z by zooko

  • Milestone changed from 1.7.0 to eventually

comment:4 Changed at 2010-07-24T00:43:37Z by davidsarah

  • Component changed from code-network to code-frontend-web
  • Milestone changed from eventually to soon

comment:5 Changed at 2010-08-16T20:53:33Z by davidsarah

  • Component changed from code-frontend-web to code-frontend
  • Keywords download upload check repair added

This issue affects all operations including check and repair, and all frontends.

comment:6 Changed at 2012-04-23T18:40:35Z by davidsarah

See also #1596 for the error-reporting aspect (not just on start-up).

comment:7 Changed at 2013-05-30T19:22:03Z by daira

  • Description modified
  • Keywords availability reliability added

comment:8 Changed at 2013-08-02T04:27:41Z by daira

#2043 was a duplicate.

comment:9 Changed at 2015-04-07T00:02:44Z by dawuud

Daira and I are working on the related ticket #1449. Can we also satisfy this ticket?

Last edited at 2015-04-13T23:00:34Z by daira

comment:10 Changed at 2017-09-19T17:20:49Z by Brian Warner <warner@…>

In 04b34b6/trunk:

Merge PR417: rewrite tahoe start/stop/daemonize
