Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parsing of HTTP_HOST header #2605

Merged
merged 4 commits into from Apr 26, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion lib/puma/request.rb
Expand Up @@ -231,7 +231,7 @@ def fetch_status_code(status)
#
def normalize_env(env, client)
if host = env[HTTP_HOST]
if colon = host.index(":")
if colon = host.index(/:\d+$/)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to extract this to a constant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run the benchmark hello.sh with your changes, and it seems that extracting to a constant will make it faster. You can give it a try if that makes sense to you as well 😃

No constant

❯ benchmarks/wrk/hello.sh
Puma starting in single mode...
* Puma version: 5.2.2 (ruby 2.7.0-p0) ("Fettisdagsbulle")
*  Min threads: 4
*  Max threads: 4
*  Environment: development
*          PID: 23551
* Listening on http://0.0.0.0:9292
Use Ctrl-C to stop
Running 30s test @ http://localhost:9292
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   699.53us  301.57us   4.14ms   81.63%
    Req/Sec     2.90k   622.92     3.97k    66.45%
  Latency Distribution
     50%  612.00us
     75%  813.00us
     90%    1.07ms
     99%    1.81ms
  173587 requests in 30.10s, 10.59MB read
Requests/sec:   5766.76
Transfer/sec:    360.42KB

Extract to constant

❯ benchmarks/wrk/hello.sh
Puma starting in single mode...
* Puma version: 5.2.2 (ruby 2.7.0-p0) ("Fettisdagsbulle")
*  Min threads: 4
*  Max threads: 4
*  Environment: development
*          PID: 23609
* Listening on http://0.0.0.0:9292
Use Ctrl-C to stop
Running 30s test @ http://localhost:9292
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   623.64us  271.01us  15.81ms   87.05%
    Req/Sec     3.24k   405.07     4.22k    74.09%
  Latency Distribution
     50%  570.00us
     75%  688.00us
     90%    0.89ms
     99%    1.49ms
  194270 requests in 30.10s, 11.86MB read
Requests/sec:   6454.13
Transfer/sec:    403.38KB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kuei0221

Thanks for the benchmark results. I tried to reproduce the results and could not find such a difference between requests/s of inline regexp and constant regexp. There was a small difference between the old (index(":")) and new version).

I'll try with different ruby version later today.

Copy link
Contributor

@kuei0221 kuei0221 Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can ignore my benchmark result if that doesn't make sense to you ( it defer from machine to machine, and I believe my machine is not very good 😂 )
I think it is a good idea to make it a constant in const.rb, so we don't need to allocated this value everytime it is called.
The following comment in that file give us a good reason to do so:

puma/lib/puma/const.rb

Lines 90 to 97 in a717b27

# Frequently used constants when constructing requests or responses. Many times
# the constant just refers to a string with the same contents. Using these constants
# gave about a 3% to 10% performance improvement over using the strings directly.
#
# The constants are frozen because Hash#[]= when called with a String key dups
# the String UNLESS the String is frozen. This saves us therefore 2 object
# allocations when creating the env hash later.
#

env[SERVER_NAME] = host[0, colon]
env[SERVER_PORT] = host[colon+1, host.bytesize]
else
Expand Down
30 changes: 30 additions & 0 deletions test/test_puma_server.rb
Expand Up @@ -53,6 +53,36 @@ def new_connection
TCPSocket.new(@host, @port).tap {|sock| @ios << sock}
end

def test_normalize_host_header_missing
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is probably a smarter way to check those headers?

server_run app: ->(env) do
[200, {}, [env["SERVER_NAME"], "\n", env["SERVER_PORT"]]]
end

data = send_http_and_read "GET / HTTP/1.0\r\n\r\n"
assert_equal "localhost\n80", data.split("\r\n").last
end

def test_normalize_host_header_ipv4
server_run app: ->(env) do
[200, {}, [env["SERVER_NAME"], "\n", env["SERVER_PORT"]]]
end

data = send_http_and_read "GET / HTTP/1.0\r\nHost: 123.123.123.123:456\r\n\r\n"
assert_equal "123.123.123.123\n456", data.split("\r\n").last
end

def test_normalize_host_header_ipv6
server_run app: ->(env) do
[200, {}, [env["SERVER_NAME"], "\n", env["SERVER_PORT"]]]
end

data = send_http_and_read "GET / HTTP/1.0\r\nHost: ::ffff:127.0.0.1:9292\r\n\r\n"
assert_equal "::ffff:127.0.0.1\n9292", data.split("\r\n").last

data = send_http_and_read "GET / HTTP/1.0\r\nHost: ::1:9292\r\n\r\n"
assert_equal "::1\n9292", data.split("\r\n").last
end

def test_proper_stringio_body
data = nil

Expand Down