
Rework host/hostname/authority implementation. #1561

Merged
merged 11 commits into from Feb 7, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -37,6 +37,7 @@ All notable changes to this project will be documented in this file. For info on
- `Rack::HeaderHash` is memoized by default. ([#1549](https://github.com/rack/rack/pull/1549), [@ioquatix](https://github.com/ioquatix))
- `Rack::Directory` allow directory traversal inside root directory. ([#1417](https://github.com/rack/rack/pull/1417), [@ThomasSevestre](https://github.com/ThomasSevestre))
- Sort encodings by server preference. ([#1184](https://github.com/rack/rack/pull/1184), [@ioquatix](https://github.com/ioquatix), [@wjordan](https://github.com/wjordan))
- Rework host/hostname/authority implementation in `Rack::Request`. ([#1561](https://github.com/rack/rack/pull/1561), [@ioquatix](https://github.com/ioquatix))

### Removed

1 change: 1 addition & 0 deletions lib/rack.rb
@@ -15,6 +15,7 @@

module Rack
HTTP_HOST = 'HTTP_HOST'
HTTP_PORT = 'HTTP_PORT'
HTTP_VERSION = 'HTTP_VERSION'
HTTPS = 'HTTPS'
PATH_INFO = 'PATH_INFO'
19 changes: 17 additions & 2 deletions lib/rack/lint.rb
@@ -271,13 +271,28 @@ def check_env(env)
## accepted specifications and must not be used otherwise.
##

%w[REQUEST_METHOD SERVER_NAME SERVER_PORT
QUERY_STRING
%w[REQUEST_METHOD SERVER_NAME QUERY_STRING
rack.version rack.input rack.errors
rack.multithread rack.multiprocess rack.run_once].each { |header|
assert("env missing required key #{header}") { env.include? header }
}

## The <tt>SERVER_PORT</tt> must be an integer if set.
assert("env[SERVER_PORT] is not an integer") do
server_port = env["SERVER_PORT"]
server_port.nil? || (Integer(server_port) rescue false)
Contributor

If we state that SERVER_PORT must be an integer, we should check env["SERVER_PORT"].is_a?(Integer). I don't think we need to check that it is convertible to an integer, unless we want that to be the spec.

Member Author

Okay, makes sense, I'll update the SPEC.
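To make the two options concrete, here is a minimal sketch. `convertible_port?` and `strict_integer_port?` are hypothetical helper names, not Rack API: the first mirrors the `Integer(server_port) rescue false` check in this diff, the second the `is_a?(Integer)` rule suggested above.

```ruby
# Hypothetical helper mirroring the Lint check in this diff:
# nil, or anything Kernel#Integer can convert, is accepted.
def convertible_port?(value)
  value.nil? || !!(Integer(value) rescue false)
end

# Hypothetical helper for the stricter rule suggested in review:
# nil, or an actual Integer object, is accepted.
def strict_integer_port?(value)
  value.nil? || value.is_a?(Integer)
end

convertible_port?("80")     # => true
strict_integer_port?("80")  # => false
convertible_port?("0x50")   # => true, since Integer("0x50") == 80
convertible_port?("http")   # => false
```

The difference matters because servers commonly populate `SERVER_PORT` as a String such as `"80"` (the CGI convention), which only the looser rule accepts; note also that `Integer()` accepts prefixed forms like `"0x50"`.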

end

## The <tt>SERVER_NAME</tt> must be a valid authority as defined by RFC7540.
assert("#{env[SERVER_NAME]} must be a valid authority") do
URI.parse("http://#{env[SERVER_NAME]}/") rescue false
end

## The <tt>HTTP_HOST</tt> must be a valid authority as defined by RFC7540.
assert("#{env[HTTP_HOST]} must be a valid authority") do
URI.parse("http://#{env[HTTP_HOST]}/") rescue false
end

## The environment must not contain the keys
## <tt>HTTP_CONTENT_TYPE</tt> or <tt>HTTP_CONTENT_LENGTH</tt>
## (use the versions without <tt>HTTP_</tt>).
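Both authority checks rely on `URI.parse` raising when the value cannot sit between `http://` and `/`. A minimal standalone sketch of the same idea, using a hypothetical `valid_authority?` helper:

```ruby
require "uri"

# Hypothetical helper mirroring the Lint checks for SERVER_NAME and HTTP_HOST:
# the value is accepted if it parses as the authority part of an http URI.
def valid_authority?(authority)
  URI.parse("http://#{authority}/")
  true
rescue URI::InvalidURIError
  false
end

valid_authority?("example.com:8080")  # => true
valid_authority?("[::1]:9292")        # => true
valid_authority?("bad host")          # => false
```

The Lint version uses a bare `rescue false`, which also swallows any other `StandardError`; narrowing the rescue to `URI::InvalidURIError` is a design choice of this sketch, not what the diff does.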
2 changes: 1 addition & 1 deletion lib/rack/recursive.rb
@@ -19,7 +19,7 @@ def initialize(url, env = {})
@env[PATH_INFO] = @url.path
@env[QUERY_STRING] = @url.query if @url.query
@env[HTTP_HOST] = @url.host if @url.host
@env["HTTP_PORT"] = @url.port if @url.port
@env[HTTP_PORT] = @url.port if @url.port
@env[RACK_URL_SCHEME] = @url.scheme if @url.scheme

super "forwarding to #{url}"
212 changes: 138 additions & 74 deletions lib/rack/request.rb
@@ -129,11 +129,23 @@ module Helpers
# to include the port in a generated URI.
DEFAULT_PORTS = { 'http' => 80, 'https' => 443, 'coffee' => 80 }

# The address of the client which connected to the proxy.
HTTP_X_FORWARDED_FOR = 'HTTP_X_FORWARDED_FOR'

# The contents of the host/:authority header sent to the proxy.
HTTP_X_FORWARDED_HOST = 'HTTP_X_FORWARDED_HOST'

# The value of the scheme sent to the proxy.
HTTP_X_FORWARDED_SCHEME = 'HTTP_X_FORWARDED_SCHEME'
HTTP_X_FORWARDED_PROTO = 'HTTP_X_FORWARDED_PROTO'
HTTP_X_FORWARDED_HOST = 'HTTP_X_FORWARDED_HOST'
HTTP_X_FORWARDED_PORT = 'HTTP_X_FORWARDED_PORT'
HTTP_X_FORWARDED_SSL = 'HTTP_X_FORWARDED_SSL'

# The protocol used to connect to the proxy.
HTTP_X_FORWARDED_PROTO = 'HTTP_X_FORWARDED_PROTO'

# The port used to connect to the proxy.
HTTP_X_FORWARDED_PORT = 'HTTP_X_FORWARDED_PORT'

# Another way of specifying that the https scheme was used.
HTTP_X_FORWARDED_SSL = 'HTTP_X_FORWARDED_SSL'

def body; get_header(RACK_INPUT) end
def script_name; get_header(SCRIPT_NAME).to_s end
@@ -207,8 +219,42 @@ def scheme
end
end

# The authority of the incoming request as defined by RFC3986.
# https://tools.ietf.org/html/rfc3986#section-3.2
#
# In HTTP/1, this is the `host` header.
# In HTTP/2, this is the `:authority` pseudo-header.
def authority
get_header(SERVER_NAME) + ':' + get_header(SERVER_PORT)
forwarded_authority || host_authority || server_authority
end

# The authority as defined by the `SERVER_NAME`/`SERVER_ADDR` and
# `SERVER_PORT` variables.
def server_authority
host = self.server_name
port = self.server_port

if host
if port
return "#{host}:#{port}"
else
return host
end
end
end

def server_name
if name = get_header(SERVER_NAME)
return name
Contributor

In #1557 (comment), you stated "I'm fine with implicit return when the cyclomatic complexity of a function is 1, or the branch depth is 1." To keep a reasonably consistent style, can we avoid explicit returns where the branch depth is <= 1? (I assume the branch depth is 1 here, and 0 outside the if.) Ditto for the rest of the file.

elsif address = get_header(SERVER_ADDR)
return wrap_ipv6(address)
end
end

def server_port
if port = get_header(SERVER_PORT)
return Integer(port)
Contributor

As SERVER_PORT is required, we don't need the if. Also, if we change SPEC to require it be an integer, we don't need to convert it, so this method could just be get_header(SERVER_PORT)

end
end
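Following the implicit-return suggestion above, the server helpers might be written roughly as below. This is a hypothetical standalone sketch: `ServerInfo` and the plain-Hash `env` are illustrative stand-ins, not Rack API. It keeps a nil guard in `server_port` because this diff makes `SERVER_PORT` optional in Lint ("must be an integer if set").

```ruby
# Hypothetical stand-in for the server_* helpers in this diff, written with
# implicit returns. ServerInfo is not Rack API; `env` is a plain Hash.
class ServerInfo
  def initialize(env)
    @env = env
  end

  # SERVER_NAME if present, otherwise SERVER_ADDR (bracketed if IPv6).
  def server_name
    if (name = @env["SERVER_NAME"])
      name
    elsif (address = @env["SERVER_ADDR"])
      wrap_ipv6(address)
    end
  end

  # Converts with Integer() in case the server supplies a String.
  def server_port
    port = @env["SERVER_PORT"]
    Integer(port) if port
  end

  # "host:port", just "host", or nil when no server name is known.
  def server_authority
    host = server_name
    port = server_port

    if host
      port ? "#{host}:#{port}" : host
    end
  end

  private

  # Copied from this diff: bracket IPv6 addresses that arrive unbracketed.
  def wrap_ipv6(host)
    if !host.start_with?("[") && host.count(":") > 1
      "[#{host}]"
    else
      host
    end
  end
end

ServerInfo.new({ "SERVER_NAME" => "example.org", "SERVER_PORT" => "8080" }).server_authority
# => "example.org:8080"
ServerInfo.new({ "SERVER_ADDR" => "::1", "SERVER_PORT" => "80" }).server_authority
# => "[::1]:80"
```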

def cookies
@@ -232,52 +278,87 @@ def xhr?
get_header("HTTP_X_REQUESTED_WITH") == "XMLHttpRequest"
end

def host_with_port
port = self.port
if port.nil? || port == DEFAULT_PORTS[scheme]
host
# The `HTTP_HOST` header.
def host_authority
get_header(HTTP_HOST)
end

def host_with_port(authority = self.authority)
host, address, port = split_authority(authority)

if port == DEFAULT_PORTS[self.scheme]
return host
else
host = self.host
# If host is IPv6
host = "[#{host}]" if host.include?(':')
"#{host}:#{port}"
return authority
end
end

# Returns a formatted host, suitable for being used in a URI.
def host
# Remove port number.
strip_port hostname.to_s
host, address, port = split_authority(self.authority)
Contributor

Unused local variables: the method body could probably just be split_authority(authority)[0]. Likewise, hostname could be split_authority(authority)[1]. If you really want to keep the existing style, use underscores for the unused local variables. A similar issue exists in the port method below.


return host
end

# Returns an address suitable for being used with `getaddrinfo`.
def hostname
host, address, port = split_authority(self.authority)

return address
end
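The unused-variable comment above suggests indexing into the result of `split_authority` instead. As a standalone sketch, with `AUTHORITY` and `split_authority` copied from this diff and the top-level `host_of`/`hostname_of` names being hypothetical:

```ruby
# Copied from this diff: host (possibly bracketed), bare address, optional port.
AUTHORITY = /(?<host>(\[(?<ip6>.*)\])|(?<ip4>[\d\.]+)|(?<name>[a-zA-Z0-9\.\-]+))(:(?<port>\d+))?/

# Returns [host, address, port], mirroring the diff's implementation.
def split_authority(authority)
  if (match = AUTHORITY.match(authority))
    if (address = match[:ip6])
      return match[:host], address, match[:port]&.to_i
    else
      return match[:host], match[:host], match[:port]&.to_i
    end
  end

  # Give up!
  return authority, authority, nil
end

# Hypothetical helpers in the style suggested in review.
def host_of(authority)
  split_authority(authority)[0]
end

def hostname_of(authority)
  split_authority(authority)[1]
end

host_of("[::1]:9292")      # => "[::1]"
hostname_of("[::1]:9292")  # => "::1"
host_of("example.com:80")  # => "example.com"
```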
Contributor

This looks like an inconsistency: the method hostname returns address… why? I think we should name it either hostname or address everywhere.

Member Author

ioquatix, Feb 6, 2020

It matches the implementation of URI. I'm not saying it's right. I just decided to follow the existing design/method names. But you are right. Maybe it should be address or something else...

Contributor

I mean, even just renaming the local variable to hostname would be better. I don't see address as a URI method or in its sources.

Member Author

Understood. It is actually confusing to me too in a way.

u = URI("http://[::1]/bar")
p u.hostname      #=> "::1"
p u.host          #=> "[::1]"

What does make sense to me is when you say "host_with_port". That means, the "escaped" host and :port appended. You can't have hostname_with_port. It's simply not possible because it becomes ambiguous to parse.

So, there's that logic... But yeah, I also see it's confusing.
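The `host` versus `hostname` distinction, and why a "hostname_with_port" would be ambiguous, can be checked with the standard library alone (`URI` and `IPAddr`):

```ruby
require "uri"
require "ipaddr"

uri = URI("http://[::1]:8080/bar")
uri.host      # => "[::1]"  (bracketed, safe to join with ":port")
uri.hostname  # => "::1"    (bare address, e.g. for getaddrinfo)
uri.port      # => 8080

# Joining the bare address with a port is ambiguous: "::1:8080" is itself
# a valid IPv6 address, so the port can no longer be split off reliably.
IPAddr.new("::1:8080").ipv6?  # => true
```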

Member Author

Yes, we could rename it hostname everywhere. I see what you are saying, and I don't think it would be a problem. Except that, technically, I don't even know whether 192.168.1.1 is a hostname, or whether 1:: is. To me, something like www.google.com is a hostname. I'm not sure what the right word/definition is here.

Member Author

I know this is about a local variable name, but the entire thing is kind of confusing, so we should try to get it right for our users or we'll just perpetuate the chaos.

Member Author

So we have several options:

1/ Rename local variable address -> hostname and return address to return hostname.
2/ Rename method def hostname to def address.
3/ Remove def hostname.
4/ Add def uri which returns an instance of URI which represents what the URI was used to make the request.
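Option 4 could look roughly like the following sketch. `request_uri` is a hypothetical name and signature, not part of Rack; it simply parses the scheme, authority, path, and query the request already exposes into a stdlib `URI`:

```ruby
require "uri"

# Hypothetical sketch of option 4: rebuild the request URI from values a
# request already exposes. Not Rack API; the signature is illustrative.
def request_uri(scheme, authority, path, query = nil)
  uri = URI.parse("#{scheme}://#{authority}#{path}")
  uri.query = query if query && !query.empty?
  uri
end

uri = request_uri("http", "[::1]:9292", "/bar", "a=1")
uri.to_s      # => "http://[::1]:9292/bar?a=1"
uri.host      # => "[::1]"
uri.hostname  # => "::1"
```

With a `URI` instance, callers pick `host` (bracketed) or `hostname` (bare) explicitly, so the request object would not need to settle the naming question itself.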

Contributor

1 is consistent with URI from the stdlib.
2 tries to improve the naming but, as far as I can see, still has logical inconsistencies.
3 is radical, and I guess we would still need a helper method. If not, the best code is no code.
4 seems interesting, and "URI" sounds general enough for both IPs and domains, especially with a URI instance inside.

So I can't say before trying, but I'd try them in this order: 3, 4, 1, 2.

Member Author

I'll wait for feedback from @jeremyevans before I make any further changes.

Member Author

Just for the record, there are several new methods, and that includes def hostname. It's not released in Rack 2.1.


def port
result =
if port = extract_port(hostname)
port
elsif port = get_header(HTTP_X_FORWARDED_PORT)
port
elsif has_header?(HTTP_X_FORWARDED_HOST)
DEFAULT_PORTS[scheme]
elsif has_header?(HTTP_X_FORWARDED_PROTO)
DEFAULT_PORTS[extract_proto_header(get_header(HTTP_X_FORWARDED_PROTO))]
else
get_header(SERVER_PORT)
if authority = self.authority
host, address, port = split_authority(self.authority)
if port
return port
end
end

if forwarded_port = self.forwarded_port
return forwarded_port.first
end

result.to_i unless result.to_s.empty?
if scheme = self.scheme
if port = DEFAULT_PORTS[self.scheme]
Contributor

If you are going to set the scheme local variable, you should use it as the argument to DEFAULT_PORTS.[]. Otherwise don't define the local variable.

return port
end
end

return self.server_port
end
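Stripped of the request object, the lookup order in `port` is: an explicit port in the authority, then the first `X-Forwarded-Port` value, then the scheme's default port, then `SERVER_PORT`. A hypothetical standalone `resolve_port` that follows that order, and uses the `scheme` local as the `DEFAULT_PORTS` key as the review comment suggests:

```ruby
# Copied from this diff.
DEFAULT_PORTS = { 'http' => 80, 'https' => 443, 'coffee' => 80 }

# Hypothetical standalone version of the lookup order in `port`.
# Arguments stand in for values the request object would compute.
def resolve_port(authority_port, forwarded_ports, scheme, server_port)
  return authority_port if authority_port
  return forwarded_ports.first if forwarded_ports

  # Use the local `scheme` directly, as suggested in review.
  if scheme && (default_port = DEFAULT_PORTS[scheme])
    return default_port
  end

  server_port
end

resolve_port(8080, [443], "http", 80)   # => 8080
resolve_port(nil, [8443], "https", 80)  # => 8443
resolve_port(nil, nil, "https", 9292)   # => 443
resolve_port(nil, nil, "gopher", 70)    # => 70
```

Because the scheme default is consulted before `SERVER_PORT`, a `Host` header without a port resolves to 80 or 443, matching the HTTP convention that an omitted port means the scheme's default.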

def forwarded_for
if value = get_header(HTTP_X_FORWARDED_FOR)
split_header(value).map do |authority|
split_authority(wrap_ipv6(authority))[1]
end
end
end

def forwarded_port
if value = get_header(HTTP_X_FORWARDED_PORT)
split_header(value).map(&:to_i)
end
end

def forwarded_authority
if value = get_header(HTTP_X_FORWARDED_HOST)
wrap_ipv6(split_header(value).first)
end
end

def ssl?
scheme == 'https' || scheme == 'wss'
end

def ip
remote_addrs = split_ip_addresses(get_header('REMOTE_ADDR'))
remote_addrs = split_header(get_header('REMOTE_ADDR'))
remote_addrs = reject_trusted_ip_addresses(remote_addrs)

return remote_addrs.first if remote_addrs.any?

forwarded_ips = split_ip_addresses(get_header('HTTP_X_FORWARDED_FOR'))
.map { |ip| strip_port(ip) }
forwarded_ips = self.forwarded_for

return reject_trusted_ip_addresses(forwarded_ips).last || forwarded_ips.first || get_header("REMOTE_ADDR")
end
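The `X-Forwarded-For` handling combines three helpers from this diff: split the list, bracket bare IPv6 entries, then keep only the address part of each authority. A standalone sketch (the helpers are copied or condensed from the diff; `forwarded_for` here is a hypothetical variant that takes the raw header value rather than reading the Rack env):

```ruby
# Copied from this diff: split a comma/whitespace separated list.
def split_header(value)
  value ? value.strip.split(/[,\s]+/) : []
end

# Copied from this diff: bracket IPv6 addresses that arrive unbracketed.
def wrap_ipv6(host)
  if !host.start_with?("[") && host.count(":") > 1
    "[#{host}]"
  else
    host
  end
end

# Condensed from this diff: returns [host, address, port].
AUTHORITY = /(?<host>(\[(?<ip6>.*)\])|(?<ip4>[\d\.]+)|(?<name>[a-zA-Z0-9\.\-]+))(:(?<port>\d+))?/

def split_authority(authority)
  if (match = AUTHORITY.match(authority))
    address = match[:ip6] || match[:host]
    return match[:host], address, match[:port]&.to_i
  end

  [authority, authority, nil]
end

# Hypothetical variant of `forwarded_for` taking the raw header value.
def forwarded_for(value)
  split_header(value).map do |entry|
    split_authority(wrap_ipv6(entry))[1]
  end
end

forwarded_for("192.0.2.43:47011, 2001:db8:cafe::17, [2001:db8::1]:443")
# => ["192.0.2.43", "2001:db8:cafe::17", "2001:db8::1"]
```

One difference worth noting: the diff's own `forwarded_for` returns `nil` when the header is absent (this sketch returns `[]`), and `ip` passes that result directly to `reject_trusted_ip_addresses`, so it may be worth confirming that method tolerates `nil`.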
@@ -466,6 +547,20 @@ def values_at(*keys)

def default_session; {}; end

# Assist with compatibility when processing `X-Forwarded-For`.
def wrap_ipv6(host)
# Even though IPv6 addresses should be wrapped in square brackets,
# sometimes this is not done in various legacy/underspecified headers.
# So we try to fix this situation for compatibility reasons.

# Try to detect IPv6 addresses which aren't escaped yet:
if !host.start_with?('[') && host.count(':') > 1
"[#{host}]"
else
host
end
end

def parse_http_accept_header(header)
header.to_s.split(/\s*,\s*/).map do |part|
attribute, parameters = part.split(/\s*;\s*/, 2)
@@ -489,37 +584,24 @@ def parse_multipart
Rack::Multipart.extract_multipart(self, query_parser)
end

def split_ip_addresses(ip_addresses)
ip_addresses ? ip_addresses.strip.split(/[,\s]+/) : []
end

def hostname
if forwarded = get_header(HTTP_X_FORWARDED_HOST)
forwarded.split(/,\s?/).last
else
get_header(HTTP_HOST) ||
get_header(SERVER_NAME) ||
get_header(SERVER_ADDR)
end
def split_header(value)
Contributor

A generic name like split_header implies it will split all headers, which I don't think is possible as headers have different formats. Maybe split_ip_addresses wasn't accurate, but I think split_header is more misleading as a name.

value ? value.strip.split(/[,\s]+/) : []
end
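The naming concern can be illustrated by feeding `split_header` (copied from this diff) a header it was not written for:

```ruby
# Copied from this diff.
def split_header(value)
  value ? value.strip.split(/[,\s]+/) : []
end

# Fine for address lists such as X-Forwarded-For:
split_header("203.0.113.7, 198.51.100.2")
# => ["203.0.113.7", "198.51.100.2"]

# But a header whose items contain spaces, such as Accept, is split apart:
split_header("text/html, application/json; q=0.9")
# => ["text/html", "application/json;", "q=0.9"]
```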

def strip_port(ip_address)
# IPv6 format with optional port: "[2001:db8:cafe::17]:47011"
# returns: "2001:db8:cafe::17"
sep_start = ip_address.index('[')
sep_end = ip_address.index(']')
if (sep_start && sep_end)
return ip_address[sep_start + 1, sep_end - 1]
end
AUTHORITY = /(?<host>(\[(?<ip6>.*)\])|(?<ip4>[\d\.]+)|(?<name>[a-zA-Z0-9\.\-]+))(:(?<port>\d+))?/
private_constant :AUTHORITY

# IPv4 format with optional port: "192.0.2.43:47011"
# returns: "192.0.2.43"
sep = ip_address.index(':')
if (sep && ip_address.count(':') == 1)
return ip_address[0, sep]
def split_authority(authority)
if match = AUTHORITY.match(authority)
if address = match[:ip6]
return match[:host], address, match[:port]&.to_i
else
return match[:host], match[:host], match[:port]&.to_i
end
end

ip_address
# Give up!
return authority, authority, nil
end

def reject_trusted_ip_addresses(ip_addresses)
@@ -544,24 +626,6 @@ def extract_proto_header(header)
end
end
end

def extract_port(uri)
# IPv6 format with optional port: "[2001:db8:cafe::17]:47011"
# change `uri` to ":47011"
sep_start = uri.index('[')
sep_end = uri.index(']')
if (sep_start && sep_end)
uri = uri[sep_end + 1, uri.length]
end

# IPv4 format with optional port: "192.0.2.43:47011"
# or ":47011" from IPv6 above
# returns: "47011"
sep = uri.index(':')
if (sep && uri.count(':') == 1)
return uri[sep + 1, uri.length]
end
end
end

include Env