New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve parsing of HTTP_HOST header #2605
Conversation
IPV6 Host was not properly parsed. #2584
lib/puma/request.rb
Outdated
@@ -231,7 +231,7 @@ def fetch_status_code(status) | |||
# | |||
def normalize_env(env, client) | |||
if host = env[HTTP_HOST] | |||
if colon = host.index(":") | |||
if colon = host.index(/:\d+$/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to extract this to a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run the benchmark hello.sh
with your changes, and it seems that extracting to a constant will make it faster. You can give it a try if that makes sense to you as well 😃
No constant
❯ benchmarks/wrk/hello.sh
Puma starting in single mode...
* Puma version: 5.2.2 (ruby 2.7.0-p0) ("Fettisdagsbulle")
* Min threads: 4
* Max threads: 4
* Environment: development
* PID: 23551
* Listening on http://0.0.0.0:9292
Use Ctrl-C to stop
Running 30s test @ http://localhost:9292
2 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 699.53us 301.57us 4.14ms 81.63%
Req/Sec 2.90k 622.92 3.97k 66.45%
Latency Distribution
50% 612.00us
75% 813.00us
90% 1.07ms
99% 1.81ms
173587 requests in 30.10s, 10.59MB read
Requests/sec: 5766.76
Transfer/sec: 360.42KB
Extract to constant
❯ benchmarks/wrk/hello.sh
Puma starting in single mode...
* Puma version: 5.2.2 (ruby 2.7.0-p0) ("Fettisdagsbulle")
* Min threads: 4
* Max threads: 4
* Environment: development
* PID: 23609
* Listening on http://0.0.0.0:9292
Use Ctrl-C to stop
Running 30s test @ http://localhost:9292
2 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 623.64us 271.01us 15.81ms 87.05%
Req/Sec 3.24k 405.07 4.22k 74.09%
Latency Distribution
50% 570.00us
75% 688.00us
90% 0.89ms
99% 1.49ms
194270 requests in 30.10s, 11.86MB read
Requests/sec: 6454.13
Transfer/sec: 403.38KB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the benchmark results. I tried to reproduce the results and could not find such a difference between requests/s of inline regexp and constant regexp. There was a small difference between the old (index(":")
) and new version).
I'll try with different ruby version later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can ignore my benchmark result if that doesn't make sense to you ( it defer from machine to machine, and I believe my machine is not very good 😂 )
I think it is a good idea to make it a constant in const.rb
, so we don't need to allocated this value everytime it is called.
The following comment in that file give us a good reason to do so:
Lines 90 to 97 in a717b27
# Frequently used constants when constructing requests or responses. Many times | |
# the constant just refers to a string with the same contents. Using these constants | |
# gave about a 3% to 10% performance improvement over using the strings directly. | |
# | |
# The constants are frozen because Hash#[]= when called with a String key dups | |
# the String UNLESS the String is frozen. This saves us therefore 2 object | |
# allocations when creating the env hash later. | |
# |
@@ -53,6 +53,36 @@ def new_connection | |||
TCPSocket.new(@host, @port).tap {|sock| @ios << sock} | |||
end | |||
|
|||
def test_normalize_host_header_missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is probably a smarter way to check those headers?
A few things:
|
Thanks for the feedback, I'll look into it. I don't think that |
Sorry, didn't know/check that. |
Not that familiar with IPv6, couldn't an address without a port also satisfy |
@MSP-Greg I think my test data for IPV6 is wrong and the Regexp is faulty. I need to look up the specs to be sure but IMHO the Host header can be
along with optional
So IPV6 would be
and if there is a port
If this is correct, then I can write a better RE or do something around |
I'd vote for EDIT: Working on other things, but I think we're back to |
IPV6 are bracketed but contain colons so I needed to adapt the logic to separate host/port.
@MSP-Greg thanks for the feedback. You make it easy for me to contribute. I added |
@@ -231,7 +231,11 @@ def fetch_status_code(status) | |||
# | |||
def normalize_env(env, client) | |||
if host = env[HTTP_HOST] | |||
if colon = host.index(":") | |||
# host can be a hostname, ipv4 or bracketed ipv6. Followed by an optional port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good comments 👍
Hey Puma Team To me it's easy to contribute to puma. Which is hard work on your end. Thanks a lot 🤩 |
Thank you for your contribution Pascal! |
I apologize. I was too busy to create the pull request myself at the time. Had I seen this one I would have actually suggested: if host.include?("]:")
env[SERVER_NAME], _, env[SERVER_PORT] = host.rpartition(":")
elsif not host.include?("[") and host.include?(":")
env[SERVER_NAME], env[SERVER_PORT] = host.split(":")
else
env[SERVER_NAME], env[SERVER_PORT] = host, default_server_port(env)
end No offset juggling makes it look less like an awk script, and it's almost identical in performance. On my system it edges out the new code, but only very slightly (hanging in the 458k to 464k range, instead of the 448k to 461k range), but at that point you're averaging I was also going to note that the |
oh, did not know |
Ruby does have a truly staggering number of methods on some objects. |
* Improve parsing of HTTP_HOST header IPV6 Host was not properly parsed. puma#2584 * Extracted Regex to constant * Incorporate feedback IPV6 are bracketed but contain colons so I needed to adapt the logic to separate host/port. Co-authored-by: pascal betz <pascal.betz@swisscom.com>
IPV6 Host was not properly parsed.
With the insanely detailed bug report his was fixed in no time: 👍
Closes #2584
Description
Please describe your pull request. Thank you for contributing! You're the best.
Your checklist for this pull request
[ci skip]
to the title of the PR.#issue
" to the PR description or my commit messages.