#1928 assigned defect

web redirects should use relative URLs

Reported by: leif Owned by: davidsarah
Priority: normal Milestone: soon
Component: code-frontend-web Version: 1.9.2
Keywords: http redirect webapi Cc: tahoe-lafs.org@…
Launchpad Bug:

Description (last modified by daira)

Certain uses of the web interface result in unfollowable redirects.

This request to the web interface returns a redirect to a newly created directory, as a relative URL:

curl -v -F t=mkdir -F redirect_to_result=true http://localhost:3456/uri

< Location: uri/URI%3ADIR2%3Ajhjqp....

I think this is good. Unfortunately, this relative URL does not include a trailing slash, so when it is followed the response is another redirect, to append the slash. This second redirect is not relative. It begins with http://hostname:port/uri/URI.... where "hostname" is the host part of the value in the request's Host header and "port" is the configured web.port. Even if the request's Host header includes a port number, the web.port is used in the absolute URL constructed.

I think all redirects should use relative URLs, because according to https://en.wikipedia.org/wiki/HTTP_location "most popular web browsers tolerate the passing of a relative URL as the value for a Location header.[citation needed]".

If absolute URLs must be constructed for some reason, the port from the Host header should be used.

The motivation for this request is to make the web interface more usable on ports that are not the configured web.port, for example via SSH port forwarding or a Tor hidden service.

This also makes it easier to run tahoe as an unprivileged user while proxying port 80 to it.

If I'm not mistaken, such a proxying configuration currently requires rewriting the absolute redirects in Tahoe's responses (perhaps with Apache's ProxyPassReverse directive) to keep certain functions (like the second redirect after creating a directory) from failing.

By always using relative redirects, simple TCP proxies (like SSH port forwarding) can be accommodated, and the web UI shouldn't need to think about port numbers in URLs at all.
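
To illustrate why relative redirects survive port forwarding, here is a minimal sketch of how a client resolves a Location header against the URL it actually requested. The forwarded port 34561, the placeholder cap "URI%3ADIR2%3Aexample", and the helper name resolve_location are assumptions made up for this example; they are not taken from Tahoe's code.

```python
from urllib.parse import urljoin


def resolve_location(request_url: str, location: str) -> str:
    """Resolve a Location header value the way an HTTP client would."""
    return urljoin(request_url, location)


# Hypothetical setup: the gateway listens on 3456, but the client reaches it
# through a forwarded port (34561). The cap below is a made-up placeholder.
forwarded_request = "http://localhost:34561/uri"

# A relative Location is resolved against the URL the client used,
# so the forwarded port is kept.
relative = resolve_location(forwarded_request, "uri/URI%3ADIR2%3Aexample/")

# An absolute Location built from the configured web.port replaces the
# client's port, which is the failure described in this ticket.
absolute = resolve_location(
    forwarded_request, "http://localhost:3456/uri/URI%3ADIR2%3Aexample/"
)

print(relative)  # http://localhost:34561/uri/URI%3ADIR2%3Aexample/
print(absolute)  # http://localhost:3456/uri/URI%3ADIR2%3Aexample/
```

The server never has to know which port the client used; the client fills it in from its own request URL.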

Change History (14)

comment:1 Changed at 2013-03-14T20:14:39Z by davidsarah

  • Component changed from unknown to code-frontend-web
  • Keywords http redirect webapi added
  • Milestone changed from undecided to 1.11.0
  • Status changed from new to assigned

+1.

comment:2 follow-up: Changed at 2013-03-15T23:48:42Z by bsd

I'm trying to reverse proxy the tahoe web frontend through nginx in order to use nginx's built-in auth_basic functionality, since there currently isn't a way to authenticate access to the web interface. I can successfully look at the welcome page, but if I want to access a tahoe URI, it'll try to use the port configured in tahoe.cfg, instead of passing through nginx. Since tahoe is running on localhost, my connection fails.

[node]
web.port = tcp:4567:interface=127.0.0.1
web.static = public_html

proxy_pass http://127.0.0.1:4567/;
autoindex off;
proxy_set_header Accept-Encoding '';
proxy_ignore_headers Cache-Control Expires;
proxy_set_header Referer $http_referer;
proxy_set_header Host $host;
proxy_set_header Cookie $http_cookie;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Server $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
auth_basic "Restricted";
auth_basic_user_file /path/to/htpasswd;

These are the relevant configuration bits for tahoe and nginx, respectively.

comment:3 Changed at 2013-09-14T22:39:08Z by daira

  • Description modified (diff)

#1861 seems to be in conflict with this ticket.

comment:4 Changed at 2014-09-26T23:17:13Z by daira

#2299 was a duplicate:

I'm forwarding :3456 to my local machine as :34561 via SSH, but whenever I click a link/button, like "View File or Directory" or "Recent and Active Operations", I get redirected to a page at :3456 and hit a 404. In the case of the "Recent and Active Operations" link, the anchor-tag just specifies "status" for the *href* and I don't think it's being preempted in JS (the next JS that runs seems to be something like "unloadEvent"). Therefore, it might be getting redirected at the web-server or the backend.

warner wrote on that ticket:

Hrm, I thought we'd fixed all of the href targets and form/button targets to use relative URLs. Originally there were lots of absolute URLs (which caused exactly this problem: we had some AllMyData servers that basically reverse-proxied requests into a localhost:3456 URL, and every once in a while the internal host+port would leak). I remember that some of the absolute URLs were not easy to fix (but I don't remember the reasons right now).

Nothing should be getting updated with JS; it should all be the responsibility of the HTML-generating code in src/allmydata/web/directory.py.

Last edited at 2014-09-26T23:17:25Z by daira

comment:5 Changed at 2014-09-26T23:49:50Z by daira

Lcstyle wrote on #461:

I just looked at:

  • the Welcome Page
  • Directory WUI page
  • more info page
  • <several other pages>

None of them has any reference to a wrong hostname; all of their links are relative.

I think this is true for the pages that are easily reachable via obvious links, but not for the case in the Description of this bug. I'm not sure what exactly is happening in #2299 / comment:4.

Last edited at 2014-09-26T23:51:31Z by daira

comment:6 in reply to: ↑ description Changed at 2014-09-27T12:40:16Z by daira

Replying to leif:

This request to the web interface returns a redirect to a newly created directory, as a relative URL:

curl -v -F t=mkdir -F redirect_to_result=true http://localhost:3456/uri

< Location: uri/URI%3ADIR2%3Ajhjqp....

I think this is good. Unfortunately, this relative URL does not include a trailing slash, so when it is followed the response is another redirect, to append the slash. This second redirect is not relative.

As well as fixing the redirect to be local, for efficiency we should change the mkdir to redirect to the URL ending with a slash.
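
A minimal sketch of what such a redirect target could look like, assuming the Location value is built from the new directory's cap. The function name mkdir_redirect_target and the placeholder cap are hypothetical and do not come from Tahoe's source.

```python
from urllib.parse import quote


def mkdir_redirect_target(dircap: str) -> str:
    """Build a relative Location value for a new directory (hypothetical helper).

    The trailing slash is included up front, so following the redirect
    does not trigger a second, slash-appending redirect.
    """
    # safe="" percent-encodes the ':' characters in the cap as well.
    return "uri/" + quote(dircap, safe="") + "/"


# "URI:DIR2:example" is a made-up placeholder, not a real capability.
print(mkdir_redirect_target("URI:DIR2:example"))  # uri/URI%3ADIR2%3Aexample/
```

Because the value stays relative, it is resolved against whatever host and port the client used for the original request.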

comment:7 in reply to: ↑ 2 Changed at 2015-01-22T11:36:00Z by lpirl

Replying to bsd:

if I want to access a tahoe URI, it'll try to use the port configured in tahoe.cfg, instead of passing through nginx. Since tahoe is running on localhost, my connection fails.

+1 - this also happens when creating directories.

IMHO, this is quite a show stopper for Tahoe in production environments. (e.g. in a virtual machine, behind the reverse proxy of the host)

comment:8 Changed at 2015-10-29T01:34:24Z by lpirl

  • Cc tahoe-lafs.org@… added

comment:9 follow-up: Changed at 2016-01-31T12:09:07Z by lpirl

As a workaround, you can make nginx modify responses accordingly, for example:

proxy_redirect http://example.com:8080/ http://example.com/;

See also nginx docs

comment:10 in reply to: ↑ 9 Changed at 2016-01-31T14:24:57Z by leif

Replying to lpirl:

As a workaround, you can make nginx modify responses accordingly, for example:

proxy_redirect http://example.com:8080/ http://example.com/;

See also nginx docs

I was going to suggest this should be added to lafs-rpg until this bug is fixed... but then I decided to do some digging and see if the bug would actually be difficult to fix in Tahoe.

What I found is that, at least with Twisted 13.0.0 and Nevow 0.11.1, the correct port number from the request's Host header is now included in the response's Location header! So, the workarounds shouldn't be necessary anymore and we can put TCP proxies in front of our web gateways and not end up with broken redirects.

Here is what I found while trying to determine how these redirects are made:

  • Various objects in Tahoe (in places like directory.py) subclass Nevow's rend.Page and use its addSlash feature.
  • Nevow's rend.Page then calls request.URLPath() which I believe is from twisted.web.server.Request.URLPath, which calls twisted.python.urlpath.URLPath.fromRequest which calls back to twisted.web.server.Request.prePathURL which calls _prePathURL which finally calls twisted.web.http.Request.getHost which returns a twisted.internet.tcp.Port from which it (Request.getHost) brazenly accesses the apparently-undocumented instance attribute port and stuffs it in a URL. (Or so it seems.)
    • I was going to say this seems problematic as it would prevent Twisted's webserver from being run on other transports, like a UNIX socket. So I created an example of that with mkdir -p foo/bar; twistd -n web -p unix://tmp/unixweb --path foo and made a TCP-to-unix proxy with socat TCP4-LISTEN:8080,fork,reuseaddr unix://tmp/unixweb and then sent a request with curl -v http://127.0.0.1:8080/bar ... but much to my surprise I got a response with Location: http://127.0.0.1:8080/bar/! So then I tested with Tahoe using my original instructions in this ticket description and found that the correct port number is now there as well. From reading the code I linked to above I'm not actually sure how this is happening, but it is.
      • I did find a case where a Twisted webserver listening on a UNIX socket produces bad redirects, though: for HTTP/1.0 requests (meaning, without a Host header), the Location header in the response begins with http://None/.
    • I wonder if Twisted actually needs to make absolute redirects for some reason, or if it would be OK for it to start making relative redirects all the time?
  • For more gory details of addSlash, check out warner's Nevow issue #52: deprecation warning in addSlash redirects on py2.6, regarding Tahoe issue #2312.

Anyway, although this problem is now not so bad anymore, I'm not going to close this ticket because I still think we should have relative redirects everywhere. Hopefully the links above will help me or someone else figure out how to make that happen in the future.

Last edited at 2016-01-31T14:38:49Z by leif

comment:11 follow-up: Changed at 2016-01-31T14:35:43Z by leif

Actually, as long as we have any absolute redirects, a rewrite workaround will still be necessary in the case that someone wants to put SSL in front of their web gateway because Twisted has no way of knowing that its absolute redirects should be https://.

comment:12 in reply to: ↑ 11 Changed at 2016-01-31T19:49:36Z by lpirl

Replying to leif:

Actually, as long as we have any absolute redirects, a rewrite workaround will still be necessary in the case that someone wants to put SSL in front of their web gateway because Twisted has no way of knowing that its absolute redirects should be https://.

Exactly, and using SSL is essential when accessing the WUI over the Internet (regarding the confidentiality of the URLs).

Couldn't Twisted look for X-Forwarded-* headers? Esp. X-Forwarded-Proto (or the more recent Forwarded header)?
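
As a rough sketch of that idea (not Twisted's actual API or behaviour), a server could derive the externally visible scheme and host from forwarding headers before building an absolute URL. The helper name external_base is hypothetical, and the approach assumes the headers are set by a trusted reverse proxy, since a client could otherwise forge them.

```python
def external_base(headers: dict, default_scheme: str = "http") -> str:
    """Guess the externally visible base URL from request headers.

    Hypothetical helper: only meaningful when a trusted reverse proxy
    sets X-Forwarded-Proto / X-Forwarded-Host on every request.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # A chain of proxies may append values; the first entry is the outermost.
    scheme = lowered.get("x-forwarded-proto", default_scheme).split(",")[0].strip()

    host = lowered.get("x-forwarded-host") or lowered.get("host")
    if host is None:
        # E.g. an HTTP/1.0 request without a Host header.
        raise ValueError("no Host or X-Forwarded-Host header present")
    return f"{scheme}://{host}"


print(external_base({"Host": "example.com", "X-Forwarded-Proto": "https"}))
# https://example.com
print(external_base({"Host": "127.0.0.1:8080"}))
# http://127.0.0.1:8080
```

A relative redirect avoids this guesswork entirely, which is why it remains the simpler fix.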

comment:13 Changed at 2016-01-31T20:01:09Z by leif

You could also use i2p or tor hidden services or SSH tunnels or various other things instead of HTTPS :)

Yes, I suppose if your TLS frontend is aware of HTTP and can add a header, Twisted could be made to look at that (maybe it started to at some point? as I said above I don't actually understand how it is getting the port number correct now).

It would be simpler if it could just make relative redirects, though.

comment:14 Changed at 2016-01-31T23:54:57Z by lpirl

True, but sometimes HTTP and ordinary Web browser access is desired. :) And yes, this ticket is still valid, since making the proxy fix the redirects is really nothing more than a workaround.
