
Performance regression since version 0.71 #836

Closed
misalcedo opened this issue Nov 20, 2023 · 7 comments · Fixed by #838

Comments

@misalcedo
Contributor

I am running a benchmark of excon hitting an httpbin-go container repeatedly using benchmark-ips.
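
For context, the harness is roughly shaped like the sketch below (the endpoint and options here are illustrative, not the exact script):

  require "benchmark/ips"
  require "excon"

  # Illustrative endpoint; the real benchmark hits a local httpbin-go container.
  connection = Excon.new("http://localhost:8080/get", persistent: true)

  Benchmark.ips do |x|
    x.report("Simulate-Excon") { connection.request(method: :get) }
    x.compare!
  end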

I found that version 0.71 was significantly faster than 0.104. Looking at a ruby-prof graph output, I see that for OpenSSL the underlying socket is already buffered, and that older versions made significantly more syscalls to the socket (903 for v0.71.0 versus 104 for v0.104.0).

The socket changes since version 0.71 are not many, but I see #796 as potentially being the culprit. Although it reduced the number of allocations and syscalls, performance can actually suffer for OpenSSL.

Excon.zip

v0.104.0: 258.3 i/s
v0.71.0:  364.4 i/s

I plan to look into this a bit, but the easiest fix is changing the condition here:

until @backend_eof || @read_buffer.length >= max_length

Most sockets (including Ruby's) treat the max length as an upper bound and do not try to fill the whole buffer. Excon, however, attempts to fill the entire buffer on each call to #read_nonblock. By changing the condition to until @backend_eof || @read_buffer.length != 0, we can improve performance without seriously impacting allocations or syscalls. I am doing some more testing to see if this is safe to do.
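
A minimal sketch of the modified loop (simplified, not Excon's actual Socket implementation; the error handling here is illustrative):

  # Stop as soon as any bytes are buffered, instead of trying to fill max_length.
  until @backend_eof || @read_buffer.length != 0
    begin
      @read_buffer << @socket.read_nonblock(@chunk_size)
    rescue IO::WaitReadable
      IO.select([@socket]) # wait for the socket to become readable again
    rescue EOFError
      @backend_eof = true
    end
  end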

@misalcedo
Contributor Author

Attached is a profile of the modified version. Same number of syscalls and pretty much the same number of allocations.

Excon Modified v1.html.zip

We can reduce allocations even further by re-using the same buffer on calls to read_nonblock:

Excon Modified v2.html.zip
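
The idea is to pass an explicit output buffer to read_nonblock so each call fills the same String instead of allocating a new one (a rough sketch; the buffer name is illustrative):

  # Reuse a single String as the output buffer for every read_nonblock call.
  @nonblock_buffer ||= String.new

  begin
    @socket.read_nonblock(@chunk_size, @nonblock_buffer)
    @read_buffer << @nonblock_buffer
  rescue IO::WaitReadable
    IO.select([@socket])
  rescue EOFError
    @backend_eof = true
  end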

I'll try to put up a PR with my changes soon.

@misalcedo
Contributor Author

Draft PR here: #837

Currently, I am stuck on how to handle streaming responses with a defined content length.

There are 2 cases in tests:

  • Stream has a full chunk
  • Stream has a partial chunk

Somehow the response wants to handle these differently, but I don't see how we can tell them apart. Still working through that.

@misalcedo
Contributor Author

I figured out a path forward.

In order to handle the streaming-with-a-response-block cases, I had to add a block argument to Socket#read. This allowed me to denote whether or not I wanted to wait for bytes to be available. That, together with a new Socket#read_chunk that tries to read a full chunk_size bytes, got the tests passing.
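
Roughly, the shape is something like the sketch below; the real signatures live in the draft PR, and the names here (including fill_read_buffer) are just illustrative:

  class Socket
    # The block lets the caller say whether to wait for bytes to be available.
    def read(max_length = nil)
      wait = block_given? ? yield : true
      fill_read_buffer if wait && @read_buffer.empty?
      @read_buffer.slice!(0, max_length || @read_buffer.length)
    end

    # Tries to read a full chunk_size bytes, for streaming with a response block.
    def read_chunk(chunk_size)
      fill_read_buffer until @backend_eof || @read_buffer.length >= chunk_size
      @read_buffer.slice!(0, chunk_size)
    end
  end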

@misalcedo
Contributor Author

Running into an issue now where the socket tests hang on drain. Still not sure why. Will try looking into it once I have some time.

@misalcedo
Contributor Author

New draft PR that stays closer to the original implementation while improving perf and passing all tests: #838

@misalcedo
Contributor Author

misalcedo commented Nov 22, 2023

@geemus The PR is now ready to review. With these changes, Excon is now the fastest HTTP client in my benchmark. On version 0.71.0, Excon matched the performance of Net::HTTP. On version 0.104.0, Excon was the slowest of the clients I tested (http.rb, httpx, net/http, excon, typhoeus).

Here is the result of one (of many) runs:

Simulate-Excon-modified:       369.9 i/s
Simulate-Net::HTTP-0.3.2:      353.8 i/s - same-ish: difference falls within error
Simulate-HTTPX-1.1.1:          331.4 i/s - 1.12x  slower
Simulate-Typhoeus-1.4.0:       322.0 i/s - 1.15x  slower
Simulate-HTTP-5.1.1:           308.2 i/s - 1.20x  slower

Comparison of a run with v0.71.0:

Simulate-Excon-0.71.0:         364.4 i/s
Simulate-Net::HTTP-0.3.2:      340.1 i/s - same-ish: difference falls within error
Simulate-Typhoeus-1.4.0:       312.4 i/s - 1.17x  slower
Simulate-HTTPX-1.1.1:          311.1 i/s - 1.17x  slower
Simulate-HTTP-5.1.1:           306.9 i/s - 1.19x  slower

Comparison of a run with v0.104.0:

Simulate-Net::HTTP-0.3.2:      333.3 i/s
Simulate-HTTPX-1.1.1:          321.3 i/s - same-ish: difference falls within error
Simulate-Typhoeus-1.4.0:       312.2 i/s - same-ish: difference falls within error
Simulate-HTTP-5.1.1:           303.9 i/s - same-ish: difference falls within error
Simulate-Excon-0.104.0:        258.3 i/s - 1.29x  slower

@misalcedo
Contributor Author

The PR includes a zip of HTML files that show allocations for all 3 versions. The proposed version is no worse than v0.104.0 in terms of the number of allocations.
