Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parsing of HTTP_HOST header #2605

Merged
merged 4 commits into from Apr 26, 2021
Merged

Improve parsing of HTTP_HOST header #2605

merged 4 commits into from Apr 26, 2021

Conversation

pascalbetz
Copy link
Contributor

@pascalbetz pascalbetz commented Apr 20, 2021

IPV6 Host was not properly parsed.

With the insanely detailed bug report his was fixed in no time: 👍

Closes #2584

Description

Please describe your pull request. Thank you for contributing! You're the best.

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

IPV6 Host was not properly parsed.

#2584
@@ -231,7 +231,7 @@ def fetch_status_code(status)
#
def normalize_env(env, client)
if host = env[HTTP_HOST]
if colon = host.index(":")
if colon = host.index(/:\d+$/)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to extract this to a constant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run the benchmark hello.sh with your changes, and it seems that extracting to a constant will make it faster. You can give it a try if that makes sense to you as well 😃

No constant

❯ benchmarks/wrk/hello.sh
Puma starting in single mode...
* Puma version: 5.2.2 (ruby 2.7.0-p0) ("Fettisdagsbulle")
*  Min threads: 4
*  Max threads: 4
*  Environment: development
*          PID: 23551
* Listening on http://0.0.0.0:9292
Use Ctrl-C to stop
Running 30s test @ http://localhost:9292
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   699.53us  301.57us   4.14ms   81.63%
    Req/Sec     2.90k   622.92     3.97k    66.45%
  Latency Distribution
     50%  612.00us
     75%  813.00us
     90%    1.07ms
     99%    1.81ms
  173587 requests in 30.10s, 10.59MB read
Requests/sec:   5766.76
Transfer/sec:    360.42KB

Extract to constant

❯ benchmarks/wrk/hello.sh
Puma starting in single mode...
* Puma version: 5.2.2 (ruby 2.7.0-p0) ("Fettisdagsbulle")
*  Min threads: 4
*  Max threads: 4
*  Environment: development
*          PID: 23609
* Listening on http://0.0.0.0:9292
Use Ctrl-C to stop
Running 30s test @ http://localhost:9292
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   623.64us  271.01us  15.81ms   87.05%
    Req/Sec     3.24k   405.07     4.22k    74.09%
  Latency Distribution
     50%  570.00us
     75%  688.00us
     90%    0.89ms
     99%    1.49ms
  194270 requests in 30.10s, 11.86MB read
Requests/sec:   6454.13
Transfer/sec:    403.38KB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kuei0221

Thanks for the benchmark results. I tried to reproduce the results and could not find such a difference between requests/s of inline regexp and constant regexp. There was a small difference between the old (index(":")) and new version).

I'll try with different ruby version later today.

Copy link
Contributor

@kuei0221 kuei0221 Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can ignore my benchmark result if that doesn't make sense to you ( it defer from machine to machine, and I believe my machine is not very good 😂 )
I think it is a good idea to make it a constant in const.rb, so we don't need to allocated this value everytime it is called.
The following comment in that file give us a good reason to do so:

puma/lib/puma/const.rb

Lines 90 to 97 in a717b27

# Frequently used constants when constructing requests or responses. Many times
# the constant just refers to a string with the same contents. Using these constants
# gave about a 3% to 10% performance improvement over using the strings directly.
#
# The constants are frozen because Hash#[]= when called with a String key dups
# the String UNLESS the String is frozen. This saves us therefore 2 object
# allocations when creating the env hash later.
#

@@ -53,6 +53,36 @@ def new_connection
TCPSocket.new(@host, @port).tap {|sock| @ios << sock}
end

def test_normalize_host_header_missing
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is probably a smarter way to check those headers?

@pascalbetz pascalbetz marked this pull request as ready for review April 20, 2021 11:41
@nateberkopec nateberkopec added bug waiting-for-review Waiting on review from anyone labels Apr 20, 2021
@MSP-Greg
Copy link
Member

A few things:

  1. The comment about '3%' was committed fourteen years ago.
  2. If this is changed to a constant, it could also be placed in the file that calls it.
  3. Using rindex instead of index would also solve the bug that this is fixing.
  4. Generally, how to handle constants that are only called by one file/class/module is a matter of style. Often lookup values are treated differently than objects used for code control (branching, conditionals, etc).

@pascalbetz
Copy link
Contributor Author

Thanks for the feedback, I'll look into it.
I just realized that my testdata is faulty (missing brackets for ipv6).

I don't think that rindex allone would work though. The port is optional in the host header, so an IPV6 without port would return a faulty index.

@MSP-Greg
Copy link
Member

MSP-Greg commented Apr 22, 2021

The port is optional in the host header

Sorry, didn't know/check that.

@MSP-Greg
Copy link
Member

@pascalbetz

Not that familiar with IPv6, couldn't an address without a port also satisfy /:\d+$/?

@pascalbetz
Copy link
Contributor Author

@MSP-Greg I think my test data for IPV6 is wrong and the Regexp is faulty.

I need to look up the specs to be sure but IMHO the Host header can be

<hostname | IPV4 | ("[" IPV6 "]")>

along with optional

<colon><port>

So IPV6 would be

[3ffe:2a00:100:7031::1]

and if there is a port

[3ffe:2a00:100:7031::1]:1234

If this is correct, then I can write a better RE or do something around rindex("]:")

@MSP-Greg
Copy link
Member

MSP-Greg commented Apr 22, 2021

I'd vote for rindex ']:'

EDIT: Working on other things, but I think we're back to /:\d+$/ or /:\d+\z/ is the best choice?

IPV6 are bracketed but contain colons so I needed to adapt the logic to separate host/port.
@pascalbetz
Copy link
Contributor Author

@MSP-Greg thanks for the feedback. You make it easy for me to contribute.

I added rindex("]:") to detect IPV6 with port and added a "not IPV6 check" to the existing logic.

@MSP-Greg MSP-Greg self-requested a review April 23, 2021 14:46
@nateberkopec nateberkopec added this to the 5.3.0 milestone Apr 24, 2021
@@ -231,7 +231,11 @@ def fetch_status_code(status)
#
def normalize_env(env, client)
if host = env[HTTP_HOST]
if colon = host.index(":")
# host can be a hostname, ipv4 or bracketed ipv6. Followed by an optional port.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comments 👍

@nateberkopec nateberkopec merged commit 51a82b5 into puma:master Apr 26, 2021
@pascalbetz
Copy link
Contributor Author

Hey Puma Team

To me it's easy to contribute to puma. Which is hard work on your end.

Thanks a lot 🤩

@nateberkopec
Copy link
Member

Thank you for your contribution Pascal!

@kaorihinata
Copy link

I apologize. I was too busy to create the pull request myself at the time. Had I seen this one I would have actually suggested:

if host.include?("]:")
  env[SERVER_NAME], _, env[SERVER_PORT] = host.rpartition(":")
elsif not host.include?("[") and host.include?(":")
  env[SERVER_NAME], env[SERVER_PORT] = host.split(":")
else
  env[SERVER_NAME], env[SERVER_PORT] = host, default_server_port(env)
end

No offset juggling makes it look less like an awk script, and it's almost identical in performance. On my system it edges out the new code, but only very slightly (hanging in the 458k to 464k range, instead of the 448k to 461k range), but at that point you're averaging wrk runs.

I was also going to note that the hello.sh script being hard coded to bind to http://0.0.0.0:9292 is interpreted on some platforms (for example, macOS) to mean "all IPv4 addresses", and localhost is always resolved to the IPv4 loopback address, so wrk never hits anything beyond the IPv4 address:port scenario.

@pascalbetz
Copy link
Contributor Author

oh, did not know rpartition

@kaorihinata
Copy link

oh, did not know rpartition

Ruby does have a truly staggering number of methods on some objects.

JuanitoFatas pushed a commit to JuanitoFatas/puma that referenced this pull request Sep 9, 2022
* Improve parsing of HTTP_HOST header

IPV6 Host was not properly parsed.

puma#2584

* Extracted Regex to constant

* Incorporate feedback

IPV6 are bracketed but contain colons so I needed to adapt the logic to separate host/port.

Co-authored-by: pascal betz <pascal.betz@swisscom.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug waiting-for-review Waiting on review from anyone
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simple method of parsing HTTP_HOST in normalize_env breaks on IPv6 addresses.
5 participants