
Add Connection#requests_in_batches method #647

Merged
merged 4 commits into from Oct 18, 2017

Conversation

djberg96
Contributor

This is a simple wrapper for the Connection#requests method that allows users to chop up large request sets into smaller batches. What I noticed in practice is that at somewhere around 500 simultaneous requests, write time would increase sharply and unpredictably, resulting in a timeout error even after setting :write_timeout to 5 minutes.

Conversely, I was able to send multiple batches of 250-300 simultaneous requests with the default timeout and receive results in under 2 minutes. So what I end up doing in practice is slicing the requests into groups via each_slice and collating them into a single result set.

This PR is just a handy wrapper around Array#each_slice and Connection#requests, with a default slice size of 256.

Addresses #646
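A minimal sketch of the batching logic described above (an illustration, not the actual excon implementation): slice the request set into groups and collate the per-batch results into one array. In the real method the per-batch call would be `connection.requests(slice)`; here it is stubbed out as a block so the example is self-contained.

```ruby
# Hypothetical sketch of the batching wrapper: split requests into
# slices of batch_size and flatten the per-batch results.
DEFAULT_BATCH_SIZE = 256

def batch_requests(requests, batch_size = DEFAULT_BATCH_SIZE)
  requests.each_slice(batch_size).flat_map do |slice|
    # In the real method this would be connection.requests(slice).
    yield slice
  end
end

# Usage with a stubbed per-batch call that just echoes request paths:
reqs = (1..600).map { |i| { method: :get, path: "/item/#{i}" } }
slice_sizes = []
responses = batch_requests(reqs, 256) do |slice|
  slice_sizes << slice.length
  slice.map { |r| r[:path] }
end
# 600 requests with a batch size of 256 yields slices of 256, 256, and 88.
```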

@djberg96
Contributor Author

I'm not sure what happened with JRuby, but the CI error appears unrelated.

@geemus
Contributor

geemus commented Sep 29, 2017

Sorry for the delay, I was traveling last week and am still trying to play catch-up. This seems like it's on the right track, but I do have some points I'd like to discuss:

  1. I wouldn't worry about JRuby, it can be a bit flaky for some reason. I restarted it and we'll see if it goes through this time.
  2. Naming-wise, I'm not sure about requests_in_batches; I wonder if something like batch_requests might be nicer? Or perhaps something else altogether, what do you think?
  3. We should probably add at least one test around this. We can probably reuse something very similar to the existing https://github.com/excon/excon/blob/master/tests/pipeline_tests.rb, but with the batch size set to 1 (so that there would be 2 batches, and we can compare to the expected non-batched result). Does that make sense?
  4. I think it would be helpful to me, and I suspect to others as well, to see a usage example in addition to the comment description. Maybe we should add a short section to the README describing the new feature with an example. What do you think?

@djberg96
Contributor Author

djberg96 commented Oct 8, 2017

@geemus I looked at pipeline_tests.rb, but cannot tell what it's actually testing, so I'm loath to just copy and paste.

I also made a couple changes. First, I changed the name to batch_requests since you prefer it. Second, I changed the effective default value to a call to Process.getrlimit.
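For context, Process.getrlimit returns the soft and hard limits for a given resource. A sketch of how an open-file-based default batch size could be derived (the exact expression used in the PR may differ):

```ruby
# Derive a default batch size capped by the process's open-file soft
# limit, so a batch can never ask for more sockets than the OS allows.
# The 256 ceiling and the exact formula are assumptions for illustration.
soft, _hard = Process.getrlimit(:NOFILE)
batch_size = [256, soft].min
```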

@ghost

ghost commented Oct 8, 2017

@djberg96 it tests persistent connections.

  returns(%w{ 1 2 3 4 }, 'connection is persistent') do
    connection = Excon.new('http://127.0.0.1:9292', :persistent => true)

    ret = []
    ret << connection.requests([
      {:method => :get, :path => '/echo/request_count'},
      {:method => :get, :path => '/echo/request_count'}
    ]).map(&:body)
    ret << connection.requests([
      {:method => :get, :path => '/echo/request_count'},
      {:method => :get, :path => '/echo/request_count'}
    ]).map(&:body)
    ret.flatten
  end

The first call results in a body of '1', the second '2', the third '3', the fourth '4'. If persistence is working, we expect back an array equal to ['1', '2', '3', '4']. The 'good' server counts the number of requests for a given persistent connection. If the connection is not persistent (as in the test below this one), we expect ['1', '2', '1', '2']: '1' and '2' for the first two requests, and '1' and '2' for the following two. The count should never be greater than 2, since the server increments its count per connection AND request, not just per request (even if it originates from the same source).
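The per-connection counting behavior described above can be modeled with a toy class (an assumption for illustration, not the actual test-server code): each connection id gets its own counter, and the response body is the count as a string.

```ruby
# Toy model of the 'good' test server: a separate request count per
# connection, returned as the response body.
class CountingServer
  def initialize
    @counts = Hash.new(0)   # connection id => requests seen so far
  end

  def handle(conn_id)
    (@counts[conn_id] += 1).to_s
  end
end

# Persistent: all four requests arrive on the same connection.
server = CountingServer.new
persistent = Array.new(4) { server.handle(:conn_a) }

# Non-persistent: each requests() call opens a fresh connection,
# so the count resets after every pair of requests.
server = CountingServer.new
non_persistent = [:conn_1, :conn_1, :conn_2, :conn_2].map { |c| server.handle(c) }
```

With persistence the model yields ['1', '2', '3', '4']; without it, ['1', '2', '1', '2'], matching the expectations in the two tests.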

@geemus see my latest note on finishing up the new test suite with gherkin for these types of tests, for greater readability, in PR #575

@djberg96
Contributor Author

@starbelly Thanks.

@geemus Ok, I've pushed up some tests. Please let me know if they're ok.

@geemus
Contributor

geemus commented Oct 17, 2017

@djberg96 looks good. I think the tests might even be a bit of overkill now, but they definitely cover the regression case I was most worried about (we could probably remove all but a couple and still be in good shape in that regard). I think the only other outstanding thing I'd like to see would be an addition to the README with a description of this and a usage example showing how to call it and what you might get back. Would you be able to add that also? Thanks!

@djberg96
Contributor Author

djberg96 commented Oct 18, 2017

@geemus Ok, I've updated the README. Please let me know how it looks.

If you're happy with it, I can squash these commits if you like.

@geemus
Contributor

geemus commented Oct 18, 2017

Works for me, thanks!

@geemus geemus merged commit c92f240 into excon:master Oct 18, 2017
@ghost ghost mentioned this pull request Nov 13, 2017