#974 closed defect (fixed)

provide user-friendly error message if the CLI can't connect to the gateway

Reported by: zooko Owned by: warner
Priority: major Milestone: 1.10.0
Component: code-frontend-cli Version: 1.6.0
Keywords: usability easy error Cc:
Launchpad Bug:

Description

As reported in: http://allmydata.org/pipermail/tahoe-dev/2010-February/004013.html , if the CLI can't connect to the gateway it emits a Python stack trace.

To fix this ticket, make it so this emits a nice user-friendly error message saying that it couldn't connect to the Tahoe-LAFS gateway, which host and port it tried to connect to, and from which configuration file it got the idea that it should use that host+port.

Attachments (1)

ticket-974.patch (354 bytes) - added by guyzmo at 2010-08-03T18:53:43Z.

Download all attachments as: .zip

Change History (25)

comment:1 Changed at 2010-03-02T01:19:32Z by davidsarah

  • Milestone changed from undecided to 1.7.0
  • Owner set to davidsarah
  • Status changed from new to assigned

comment:2 Changed at 2010-06-12T21:10:13Z by davidsarah

  • Milestone changed from 1.7.0 to 1.7.1

comment:3 Changed at 2010-07-18T00:40:16Z by davidsarah

  • Keywords easy error added
  • Milestone changed from 1.7.1 to 1.8.0

comment:4 Changed at 2010-07-23T08:35:17Z by davidsarah

<p-static> don't suppose you know why I'd get connection failures when I try to use the tahoe CLI? :(

<p-static> I could swear this worked in 1.6 but downgrading doesn't seem to help XD

<zooko> The cli connects to a tahoe-lafs gateway node.

<p-static> the node is running fine, the WUI is accessible

<zooko> I guess then that the cli must be using a different host+port number than your web browser is! :-)

<zooko> Or coming from a different IP addr.

<p-static> it's on the same box so I'm not sure how that would happen XD

<p-static> also, looking at twistd.log, it looks like it opens and then immediately closes a UDP socket when it starts up

<p-static> any way to get more verbose logging?

<zooko> docs/logging.txt

<davidsarah> there's a ticket about printing out which port and address the CLI tried to connect to when it fails to connect

<davidsarah> we should increase the priority of that

<zooko> but, I'm pretty sure that cli, i.e. "tahoe ls" is doing an HTTP request to the web gateway.

<zooko> So if the cli can't connect and your web browser can then the first thing...

<zooko> What davidsarah said, actually. :-)

<zooko> p-static: how about you edit your tahoe code to print out the port and address that it was using when it fails to connect. :-)

[...]

<p-static> any pointers on where I could add a print? I could probably figure it out from there XD

* zooko looks in source code

<zooko> Well, what's the error message about "can't connect"

<zooko> does it include a stack trace?

<p-static> http://pastebin.com/t2KjAsHm

[...]

<zooko> http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/scripts/common_http.py?rev=11077ea74de4d59a

<zooko> Maybe put a try: except socket.error:

<zooko> around the line from the stack trace, which is (somewhat surprisingly) line 60

<zooko> c.endheaders()

<p-static> oh, hmm, think I might have it

<zooko> nooooo... now who will fix this issue for us?

<p-static> it's trying to connect to 127.0.0.1

<p-static> but the node is listening on a public port

<davidsarah> that ticket is #974

<davidsarah> oh, I'm the owner :-) so perhaps I should fix it

<davidsarah> it has milestone 1.8 already, anyway

<p-static> heh. so if I have the node listening on a public interface, do I need to set the CLI address somewhere else?

<zooko> You can have the node listen on multiple interfaces.

<zooko> And you can set the cli host+port with a cmdline option or with a config file option.

<p-static> oh that'd be useful. is it comma-separated or something for multiple interfaces?

<zooko> http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/configuration.txt?rev=1c7e71ee52caaf00#L26

<zooko> If you leave off interface= entirely then it will listen on all interfaces.

<p-static> saw that, but I only want to have it listen on one of the public interfaces

<p-static> was looking at the twisted strports documentation but none of the examples specify multiple interfaces

<zooko> hm. maybe i was wrong about that. i know i spec multiple interfaces for foolscap all the time, though.

<zooko> comma-separated

<p-static> aha!

<p-static> ~/.tahoe/node.url had 127.0.0.1 in it for some reason

<p-static> dunno if that's a default or just stale config

<zooko> I'm not sure either.

<davidsarah> ah, a fix for #974 would immediately have told p-static that ~/.tahoe/node.url had 127.0.0.1 in it

Changed at 2010-08-03T18:53:43Z by guyzmo

comment:5 Changed at 2010-08-03T18:54:19Z by guyzmo

I suggest a first patch (that does not tell exactly what configuration is wrong, but at least what is the URL tahoe tries to connect to). You'll find it posted here in attachment.

comment:6 Changed at 2010-08-03T19:16:24Z by guyzmo

< warner> hm. CLI tools pay attention to node.url (and only node.url: they do not even read tahoe.cfg)
< warner> but yeah, when we generate node.url for the first time (i.e. in 'tahoe start'), we could maybe use the data in tahoe.cfg to build the URL
< warner> it's not trivial, but probably not hard

comment:7 Changed at 2010-08-09T22:17:55Z by zooko

  • Milestone changed from 1.8.0 to soon

comment:8 Changed at 2010-08-09T23:45:53Z by davidsarah

  • Milestone changed from soon to 1.9.0

comment:9 Changed at 2011-05-05T05:27:07Z by zooko

  • Keywords test-needed added
  • Owner changed from davidsarah to guyzmo
  • Status changed from assigned to new

Guyzmo: thank you for the patch!

The patch in attachment:ticket-974.patch needs a unit test to show that the current code does the wrong thing in this case and the code after the patch does the right thing.

See test_cli.CreateAlias.test_create for an example of how to assert that the code under test printed out an acceptable string and exited with the right exit code.

See test_mv_error_if_DELETE_fails for an example of how to make the call to do_http() (that the code under test makes) raise socket.error.

Hm, also socket.error may be too specific. Probably if the call to do_http() raises any exception then tahoe add-alias ought to report that exception in a nice way instead of a Python traceback? Or at least any EnvironmentError, which indicates that there is a problem from the operating system, such as a socket error.

Adding the tag test-needed to this ticket.

comment:10 Changed at 2011-05-14T22:41:16Z by davidsarah

More IRC discussion of a problem where this ticket would have helped:

<secorp> so if you do something like "tahoe ls -d directory/ --dir-cap=<dircap>"

<secorp> That gives the socket error?

<bj0> yup

<secorp> What does the "node.url" file look like on the node that doesn't work?

<secorp> Meaning, is it something like:

<secorp> http://127.0.0.1:9001 ?

<bj0> it's throwing it at: resp = do_http("GET", url)

...

<secorp> Does the exception print out the url it is trying to use in the do_http call?

...

<bj0> no it doesn't

<bj0> would that be in a log somewhere?

<secorp> I'm not sure, it might be in the <directory>/logs/twistd.log file

<bj0> na it just something like 'port opened' 'port closed'

...

<zooko> Did you ever figure out your problem with a port not being open?

<bj0> no, can't figure out whats wrong

<bj0> the weird part is it's only the cli that has the problem

<bj0> i can browse the files with the webui

<zooko> Sounds like it is the connection from the tahoe cli to the wapi that is failing, right?

<bj0> the GET request fails

<zooko> So I guess the HTTP GET request is not aimed at the right host+port pair.

<bj0> secorp was out of ideas and i got nothing :/

<zooko> Should be the same host+port that your web browser is hitting.

<bj0> yea, it doesn't print out what url it's trying to connect to

<bj0> i wonder

<bj0> node.url has 127.0.0.1 as the ip, but in .cfg i changed the ip to eth0's ip, would that break it?

<zooko> #974

<zooko> Yes I think that would prevent connections to the wapi/wui over localhost. IIUC

<bj0> i'll try changing it back

<bj0> if i drop 'interface=' it should bind to 0.0.0.0 right?

<bj0> hah that fixed it

comment:11 Changed at 2011-05-14T22:47:32Z by davidsarah

guyzmo's patch only reports the error for the tahoe add-alias command. To avoid having to change code in all the CLI commands, a better approach would be to change do_http to throw an exception that includes the node URL in its message.

comment:12 Changed at 2011-05-14T22:49:29Z by davidsarah

  • Priority changed from major to critical

Since this would help with one of Tahoe's most commonly encountered usability problems (trying to connect to the wrong node URL), I'm bumping its priority to critical for 1.9.0.

comment:13 Changed at 2011-07-12T19:25:07Z by davidsarah

  • Owner changed from guyzmo to davidsarah
  • Status changed from new to assigned

comment:14 Changed at 2011-08-02T15:56:18Z by davidsarah

  • Milestone changed from 1.9.0 to 1.10.0

comment:15 Changed at 2012-03-29T16:10:01Z by zooko

  • Priority changed from critical to major

comment:16 Changed at 2012-03-31T21:09:43Z by amiller

  • Owner changed from davidsarah to amiller
  • Status changed from assigned to new

comment:17 Changed at 2012-03-31T23:07:41Z by amiller

  • Keywords review-needed added; test-needed removed

I used guyzmo's patch except for catching the exception within do_http, as per comment:13. Added a unit test based on suggestions from comment:9

https://github.com/amiller/tahoe-lafs/pull/3.patch

comment:18 Changed at 2012-04-01T03:40:24Z by davidsarah

  • Milestone changed from 1.11.0 to 1.10.0

Looks good, just a couple of style issues:

  • Write BadResponse as a top-level class like this:
    class BadResponse(object):
        def __init__(self, url, err):
            self.status = -1
            self.reason = "Error trying to connect to %s: %s" % (url, err)
        def read(self, _):
            return ""
    

comment:19 Changed at 2012-04-01T03:42:28Z by davidsarah

  • Keywords review-needed removed

comment:20 Changed at 2012-09-04T16:40:04Z by warner

  • Owner changed from amiller to warner

I'll land this

comment:21 Changed at 2012-09-04T22:49:05Z by Brian Warner <warner@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 15c95c2e1201d398:

Apply David-Sarah's recommended changes. Closes #974

comment:22 Changed at 2012-09-04T22:49:51Z by Brian Warner <warner@…>

In 15c95c2e1201d398:

Apply David-Sarah's recommended changes. Closes #974

comment:23 Changed at 2012-09-05T00:07:25Z by davidsarah

Reviewed 15c95c2e1201d398, +1.

comment:24 Changed at 2012-09-05T00:08:46Z by davidsarah

For reference, the original patch was applied as 4f19f2b4b4c038c7.

Note: See TracTickets for help on using tickets.