web redirects should use relative URLs #1928

Open
opened 2013-03-11 08:33:11 +00:00 by leif · 13 comments
Owner

Certain uses of the web interface result in unfollowable redirects.

This request to the web interface returns a redirect to a newly created directory, as a relative URL:

curl -v -F t=mkdir -F redirect_to_result=true http://localhost:3456/uri

< Location: uri/URI%3ADIR2%3Ajhjqp....

I think this is good. Unfortunately, this relative URL does not include a trailing slash, so when it is followed the response is another redirect, to append the slash. This second redirect is not relative. It begins with http://hostname:port/uri/URI.... where "hostname" is the host part of the value in the request's Host header and "port" is the configured web.port. Even if the request's Host header includes a port number, the web.port is used in the absolute URL constructed.
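To illustrate why the second redirect cannot be followed through a port forward, here is a small sketch of how a client resolves each kind of `Location` value. The forwarded port 34561 and the shortened cap `URI%3ADIR2%3Aabc` are made up for the example; `urljoin` is the standard-library resolver for relative references.

```python
from urllib.parse import urljoin

# Hypothetical setup: the gateway listens on 3456 (web.port), but the client
# reaches it through a forward on port 34561.
request_url = "http://localhost:34561/uri"

# First redirect: relative, so it is resolved against the URL the client used
# and therefore keeps the forwarded port.
relative_location = "uri/URI%3ADIR2%3Aabc"
first_hop = urljoin(request_url, relative_location)

# Second redirect: absolute, built from the configured web.port, so the
# client is sent to a port it cannot reach.
absolute_location = "http://localhost:3456/uri/URI%3ADIR2%3Aabc/"
second_hop = urljoin(first_hop, absolute_location)

print(first_hop)
print(second_hop)
```

The first hop stays on the forwarded port; the second hop replaces it with 3456 because an absolute `Location` overrides everything in the base URL.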

I think all redirects should use relative URLs, because according to https://en.wikipedia.org/wiki/HTTP_location "most popular web browsers tolerate the passing of a relative URL as the value for a Location header.[citation needed]".

If absolute URLs must be constructed for some reason, the port from the Host header should be used.
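A minimal sketch of that fallback, assuming a hypothetical helper (`absolute_redirect` is not a Tahoe or Twisted function): the `Host` header value is reused verbatim, so whatever port the client connected to ends up in the redirect.

```python
def absolute_redirect(host_header, path, scheme="http"):
    """Build a redirect target from the request's Host header.

    Hypothetical helper for illustration only. The Host value already
    carries the port the client used (when it is not the scheme default),
    so it is copied as-is instead of consulting web.port.
    """
    if not host_header:
        # HTTP/1.0 requests may omit Host; an absolute-path reference
        # avoids inventing a host name.
        return path
    return f"{scheme}://{host_header}{path}"


print(absolute_redirect("localhost:34561", "/uri/URI%3ADIR2%3Aabc/"))
```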

The motivation for this request is to make the web interface more usable on ports that are not the configured web.port, for example via SSH port forwarding or a Tor hidden service.

This also makes it easier to run tahoe as an unprivileged user while proxying port 80 to it.

If I'm not mistaken, currently, such a proxying configuration would require rewriting the absolute redirects in Tahoe's responses (perhaps with Apache's ProxyPassReverse directive) to avoid having certain functions (like the 2nd redirect after creating a directory) fail.

By always using relative redirects, simple TCP proxies (like SSH port forwarding) can be accommodated and the web ui shouldn't need to think about port numbers in URLs at all.

tahoe-lafs added the
unknown
normal
defect
1.9.2
labels 2013-03-11 08:33:11 +00:00
tahoe-lafs added this to the undecided milestone 2013-03-11 08:33:11 +00:00
davidsarah commented 2013-03-14 20:14:39 +00:00
Author
Owner

+1.

tahoe-lafs added
code-frontend-web
and removed
unknown
labels 2013-03-14 20:14:39 +00:00
tahoe-lafs modified the milestone from undecided to 1.11.0 2013-03-14 20:14:39 +00:00
Author
Owner

I'm trying to reverse proxy the tahoe web frontend through nginx in order to use nginx's built-in auth_basic functionality, since there currently isn't a way to authenticate access to the web interface. I can successfully look at the welcome page, but if I want to access a tahoe URI, it'll try to use the port configured in tahoe.cfg, instead of passing through nginx. Since tahoe is running on localhost, my connection fails.

[node]
web.port = tcp:4567:interface=127.0.0.1
web.static = public_html

proxy_pass http://127.0.0.1:4567/;
autoindex off;
proxy_set_header Accept-Encoding '';
proxy_ignore_headers Cache-Control Expires;
proxy_set_header Referer $http_referer;
proxy_set_header Host $host;
proxy_set_header Cookie $http_cookie;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Server $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
auth_basic "Restricted";
auth_basic_user_file /path/to/htpasswd;

These are the relevant configuration bits for tahoe and nginx, respectively.

daira commented 2013-09-14 22:39:08 +00:00
Author
Owner

#1861 seems to be in conflict with this ticket.

daira commented 2014-09-26 23:17:13 +00:00
Author
Owner

#2299 was a duplicate:

I'm forwarding :3456 to my local machine as :34561 via SSH, but whenever I click a link/button, like "View File or Directory" or "Recent and Active Operations", I get redirected to a page at :3456 and hit a 404. In the case of the "Recent and Active Operations" link, the anchor-tag just specifies "status" for the href and I don't think it's being preempted in JS (the next JS that runs seems to be something like "unloadEvent"). Therefore, it might be getting redirected at the web-server or the backend.

warner wrote on that ticket:

Hrm, I thought we'd fixed all of the href targets and form/button targets to use relative URLs. Originally there were lots of absolute URLs (which caused exactly this problem: we had some AllMyData servers that basically reverse-proxied requests into a localhost:3456 URL, and every once in a while the internal host+port would leak). I remember that some of the absolute URLs were not easy to fix (but I don't remember the reasons right now).

Nothing should be getting updated with JS.. it should all be the responsibility of the HTML-generating code in source:src/allmydata/web/directory.py.

daira commented 2014-09-26 23:49:50 +00:00
Author
Owner

Lcstyle wrote on #461:

I just looked at:

the Welcome Page
Directory WUI page
more info page
<several other pages>

None of them have any references to wrong hostname and are all relative.

I think this is true for the pages that are easily reachable via obvious links, but not for the case in the Description of this bug. I'm not sure what exactly is happening in #2299 / comment:132619.

daira commented 2014-09-27 12:40:16 +00:00
Author
Owner

Replying to leif:

This request to the web interface returns a redirect to a newly created directory, as a relative URL:

curl -v -F t=mkdir -F redirect_to_result=true http://localhost:3456/uri

< Location: uri/URI%3ADIR2%3Ajhjqp....

I think this is good. Unfortunately, this relative URL does not include a trailing slash, so when it is followed the response is another redirect, to append the slash. This second redirect is not relative.

As well as fixing the redirect to be local, for efficiency we should change the mkdir to redirect to the URL ending with a slash.
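A sketch of what such a redirect target could look like, assuming a hypothetical helper name (`mkdir_redirect_target`) and a shortened, made-up dircap; it is not taken from the Tahoe code:

```python
from urllib.parse import quote


def mkdir_redirect_target(dircap):
    """Return a relative Location value for a freshly created directory.

    Hypothetical helper. The cap is percent-encoded in full (including ':')
    and the trailing slash is appended up front, so the client does not
    need a second redirect just to add it.
    """
    return "uri/" + quote(dircap, safe="") + "/"


print(mkdir_redirect_target("URI:DIR2:abc"))
```

Because the value stays relative, it resolves against whatever host and port the client used for the original request.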

lpirl commented 2015-01-22 11:36:00 +00:00
Author
Owner

Replying to bsd:

if I want to access a tahoe URI, it'll try to use the port configured in tahoe.cfg, instead of passing through nginx. Since tahoe is running on localhost, my connection fails.

+1 - this also happens when creating directories.

IMHO, this is quite a show stopper for Tahoe in production environments.
(e.g. in a virtual machine, behind the reverse proxy of the host)

lpirl commented 2016-01-31 12:09:07 +00:00
Author
Owner

As a workaround, you can make nginx modify responses accordingly, for example:

proxy_redirect http://example.com:8080/ http://example.com/;

See also [nginx docs](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_redirect)
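For readers unfamiliar with the directive, the effect of the plain (non-regex) form of `proxy_redirect` can be modelled roughly as a prefix replacement on the `Location` header. The function below is an illustrative model with made-up names, not nginx's implementation:

```python
def rewrite_location(location, upstream_prefix, public_prefix):
    """Model of a prefix rewrite on a Location header (illustrative only)."""
    if location.startswith(upstream_prefix):
        # Swap the internal host:port prefix for the public one.
        return public_prefix + location[len(upstream_prefix):]
    # Relative or unrelated values pass through untouched.
    return location


print(rewrite_location(
    "http://example.com:8080/uri/URI%3ADIR2%3Aabc/",
    "http://example.com:8080/",
    "http://example.com/",
))
```

A relative `Location` never matches the upstream prefix, which is why relative redirects would make the rewrite unnecessary.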

Author
Owner

Replying to lpirl:

As a workaround, you can make nginx modify responses accordingly, for example:

proxy_redirect http://example.com:8080/ http://example.com/;

See also nginx docs

I was going to suggest this should be added to [lafs-rpg](https://tahoe-lafs.org/pipermail/tahoe-dev/2012-January/007010.html) until this bug is fixed... but then I decided to do some digging and see if the bug would actually be difficult to fix in Tahoe.

What I found is that, at least with Twisted 13.0.0 and Nevow 0.11.1, the correct port number from the request's Host header is now included in the response's Location header! So, the workarounds shouldn't be necessary anymore and we can put TCP proxies in front of our web gateways and not end up with broken redirects.

Here is what I found while trying to determine how these redirects are made:

  • Various objects in Tahoe (in places like [directory.py](https://github.com/tahoe-lafs/tahoe-lafs/blob/master/src/allmydata/web/directory.py#L578)) subclass Nevow's rend.Page and use its addSlash feature.
  • Nevow's [rend.Page](https://github.com/twisted/nevow/blob/397f5f789753a919c6f21b58a487db1b9fe8567a/nevow/rend.py#L540) then calls request.URLPath(), which I believe is from [twisted.web.server.Request.URLPath](https://twistedmatrix.com/trac/browser/tags/releases/twisted-15.5.0/twisted/web/server.py#L434). That calls [twisted.python.urlpath.URLPath.fromRequest](https://twistedmatrix.com/trac/browser/tags/releases/twisted-15.5.0/twisted/python/urlpath.py#L174), which calls back to [twisted.web.server.Request.prePathURL](https://twistedmatrix.com/trac/browser/tags/releases/twisted-15.5.0/twisted/web/server.py#L430), which calls _prePathURL, which finally calls [twisted.web.http.Request.getHost](https://twistedmatrix.com/documents/current/api/twisted.web.http.Request.html#getHost). That returns a [twisted.internet.tcp.Port](https://twistedmatrix.com/documents/current/api/twisted.internet.tcp.Port.html), from which it (Request.getHost) brazenly accesses the apparently-undocumented instance attribute port and stuffs it in a URL. (Or so it seems.)
  • I was going to say this seems problematic as it would prevent Twisted's webserver from being run on other transports, like a UNIX socket. So I created an example of that with mkdir -p foo/bar; twistd -n web -p unix:/tmp/unixweb --path foo and made a TCP-to-unix proxy with socat TCP4-LISTEN:8080,fork,reuseaddr unix:/tmp/unixweb and then sent a request with curl -v http://127.0.0.1:8080/bar ... but much to my surprise I got a response with Location: http://127.0.0.1:8080/bar/! So then I tested with Tahoe using my original instructions in this ticket description and found that the correct port number is now there as well. From reading the code I linked to above I'm not actually sure how this is happening, but it is.
  • I did find a case where a Twisted webserver listening on a UNIX socket produces bad redirects, though: when there are HTTP/1.0 requests (meaning, without a Host header), the Location header in the response begins with http://None/.
  • I wonder if Twisted actually needs to make absolute redirects for some reason, or if it would be OK for it to start making relative redirects all the time?
  • For more gory details of addSlash, check out warner's [Nevow issue #52: deprecation warning in addSlash redirects on py2.6](https://github.com/twisted/nevow/issues/52), regarding Tahoe issue #2312.
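As a rough idea of what a relative add-slash redirect could compute, here is a sketch in plain Python. It does not use Twisted or Nevow, and `add_slash_target` is a hypothetical name; it only shows that the last path segment plus a slash is enough for the client to land on the right URL, with no host, port, or scheme involved.

```python
from urllib.parse import urljoin


def add_slash_target(request_path):
    """Relative Location value that appends a trailing slash.

    Hypothetical helper; expects a path that does not already end in '/'.
    The leading './' keeps a segment containing ':' from being read as a
    URL scheme.
    """
    last_segment = request_path.rsplit("/", 1)[-1]
    return "./" + last_segment + "/"


# The client resolves the value against the URL it actually requested, so
# the forwarded port (and an https:// frontend) is preserved.
target = add_slash_target("/bar")
print(target)
print(urljoin("http://127.0.0.1:8080/bar", target))
```

Since no host name is needed, this would also sidestep the `http://None/` case for requests that arrive without a `Host` header.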

Anyway, although this problem is now not so bad anymore, I'm not going to close this ticket because I still think we should have relative redirects everywhere. Hopefully the links above will help me or someone else figure out how to make that happen in the future.

Author
Owner

Actually, as long as we have any absolute redirects, a rewrite workaround will still be necessary in the case that someone wants to put SSL in front of their web gateway because Twisted has no way of knowing that its absolute redirects should be https://.

lpirl commented 2016-01-31 19:49:36 +00:00
Author
Owner

Replying to leif:

Actually, as long as we have any absolute redirects, a rewrite workaround will still be necessary in the case that someone wants to put SSL in front of their web gateway because Twisted has no way of knowing that its absolute redirects should be https://.

Exactly, and using SSL is essential when accessing the WUI over the Internet (regarding the confidentiality of the URLs).

Couldn't Twisted look for X-Forwarded-* headers? Esp. X-Forwarded-Proto (or the more recent Forwarded header)?
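A sketch of what honouring `X-Forwarded-Proto` might involve, written as a standalone function rather than against Twisted's request API. The function name and the plain-dict header representation are assumptions for illustration, and the header should only be believed when the request is known to come from a trusted proxy.

```python
def external_scheme(headers, default="http"):
    """Pick the scheme for absolute redirects from X-Forwarded-Proto.

    Hypothetical helper. `headers` is assumed to be a dict with lower-cased
    header names. Anything other than http/https falls back to the default.
    """
    proto = headers.get("x-forwarded-proto", "").strip().lower()
    return proto if proto in ("http", "https") else default


print(external_scheme({"x-forwarded-proto": "https"}))
print(external_scheme({}))
```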

Author
Owner

You could also use i2p or tor hidden services or SSH tunnels or various other things instead of HTTPS :)

Yes, I suppose if your TLS frontend is aware of HTTP and can add a header, Twisted could be made to look at that (maybe it started to at some point? as I said above I don't actually understand how it is getting the port number correct now).

It would be simpler if it could just make relative redirects, though.

lpirl commented 2016-01-31 23:54:57 +00:00
Author
Owner

True, but sometimes HTTP and ordinary Web browser access is desired. :)
And yes, this ticket is still valid, since making the proxy fix the redirects is really nothing more than a workaround.

Reference: tahoe-lafs/trac-2024-07-25#1928