Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(request): optionally retry network failures #353

Merged
merged 1 commit into from
Nov 23, 2015

Conversation

nbrustein
Copy link
Contributor

Retries can be turned on by setting

Stripe.max_retries_on_network_failure = 2 # or any other positive integer

Once the max_retries_on_network_failure is set to a positive integer, and request that fails on a network error will be retried. There will be a sleep between each retry, the length of which is determined using an exponential backoff algorithm.

If a post request is made without an idempotency key and retries are on, then an idempotency key will be added automatically, since it is unsafe to retry without one.

max_retry_sleep_seconds is a float that can be set to configure that maximum time to sleep between retries (default is 2 seconds)

base_retry_sleep_seconds is a float can be set to configure the time to wait before the initial retry (default is 0.5 seconds)

@nbrustein
Copy link
Contributor Author

This is a re-submission of #280

@kyleconroy, I rebased, added the exponential backoff, and am now only adding an idempotency key to post requests. A couple questions:

  1. Which README should I update? The README in this project has very little currently, so I don't think that's the place for it.
  2. Are you sure we don't need to add an idempotency key to DELETE requests? Seems like if the first one had already hit the server successfully, then the second one will get an InvalidRequest error.

@@ -232,6 +268,15 @@ def self.request_headers(api_key)
end
end

# the build machines run ruby 1.8.7, and so do not have SecureRandom
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer support Ruby 1.8.7, so we can assume that SecureRandom is defined.

@kyleconroy
Copy link
Contributor

  1. The main README is the best place we have right now for this documentation.
  2. Agreed, we should add the tokens for both POST and DELETE.

That leaves the interface. max_retries_on_network_failure is too verbose. Maybe network_retry_attempts, network_retry_max_sleep, network_retry_min_sleep instead?

cc @russelldavis @brandur for thoughts.

@russelldavis
Copy link
Contributor

How about just max_network_retries?

@kyleconroy
Copy link
Contributor

@russelldavis are you proposing that we don't expose the other values?

@russelldavis
Copy link
Contributor

Oh, sorry, I was just thinking of the first value. How about:

  • max_network_retries
  • initial_network_retry_delay
  • max_network_retry_delay

retry_count = retry_count + 1
sleep sleep_time(retry_count)
response = execute_request_with_rescues(request_opts, api_base_url, retry_count)
if self.on_successful_retry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for this callback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to know if we were ever actually hitting this thing, so I put some logging in that callback. That's how I know this code has actually caught errors for us. (A bit of an edge case, for sure. I wouldn't complain too much if you wanted to remove it)

@brandur
Copy link
Contributor

brandur commented Nov 17, 2015

Looks good to me generally!

Honestly, related to #313, this would be much more cleanly implemented as a Faraday or Excon middleware, but given that we're not there yet, it makes sense not to block on that.

One possible idea here is to just remove the options for max_retry_sleep_seconds and base_retry_sleep_seconds and just use reasonable defaults. Allow users to just set the maximum number of retries instead, and just add the additional configuration back if there's any demand for it (unlikely IMO). This would have the advantage of keeping the API a little smaller in case we want to re-implement it.

raise APIConnectionError.new(message + "\n\n(Network error: #{e.message})")
end

def self.should_retry?(e, retry_count)
return false unless self.max_retries_on_network_failure > retry_count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this seems easier to follow if rewritten as:

return false if retry_count >= self.max_retries_on_network_failure

@booleanbetrayal
Copy link

looking forward to seeing this PR land!

@kyleconroy
Copy link
Contributor

Agreed, keeping the surface area of this change low (one settable attribute) seems like a smart choice. I'd also be a fan of getting rid of the retry callback.

@nbrustein
Copy link
Contributor Author

@kyleconroy, summarizing the discussion above, I'm going to do the following:

  1. Rename max_retries_on_network_failure to max_network_retries
  2. Rename max_retry_sleep_seconds to max_network_retry_delay
  3. Rename base_retry_sleep_seconds to initial_network_retry_delay
  4. Document retry stuff in README.rdoc (this still seems a little strange to me given what else is in that file currently, but I don't mind adding it)
  5. For max_network_retry_delay and initial_network_retry_delay I can hardcode them and make them not configurable, or I can just leave them configurable but not bother to document them. (Let me know what you prefer.)
  6. Remove on_successful_retry (Or I can just leave it, add a comment for why it's there, and not document it. Let me know what you prefer.)

Anything else I missed? Thanks for the comments.

@kyleconroy
Copy link
Contributor

  1. Please hardcode the values, making them not configurable
  2. Please remove on_successful_retry

Other than that, looks good!

@nbrustein nbrustein force-pushed the retry-network-failures branch 2 times, most recently from 15d818b to 66ced30 Compare November 18, 2015 20:26
@nbrustein
Copy link
Contributor Author

Should be good to go now.

@booleanbetrayal
Copy link

@kyleconroy - anything else blocking this feature?

@kyleconroy
Copy link
Contributor

@nbrustein max_retries_on_network_failure needs to be renamed to max_network_retries. generate_random_idempotency_key should only use SecureRandom. We no longer support 1.8.7, so we can assume it's available.

@nbrustein
Copy link
Contributor Author

@kyleconroy those changes are in. Let me know if there's anything else.

@kyleconroy
Copy link
Contributor

LGTM @brandur can you take one last look as well?

@brandur
Copy link
Contributor

brandur commented Nov 23, 2015

@kyleconroy +1 from me.

@nbrustein Thanks for the patch!

kyleconroy added a commit that referenced this pull request Nov 23, 2015
feat(request): optionally retry network failures
@kyleconroy kyleconroy merged commit 4e01b13 into stripe:master Nov 23, 2015
@kyleconroy
Copy link
Contributor

Thanks again @nbrustein. We'll cut a new release in around a week, as we'll want to wait for the holidays to be over.

@booleanbetrayal
Copy link

👍

@dziemian007
Copy link

Hi @kyleconroy, could you please update rubygems so we can see see this change without specifying github repository in Gemfile? Thanks!

@brandur
Copy link
Contributor

brandur commented Jan 5, 2016

@dziemian007 Released as 1.32.0. Sorry about the delay!

@booleanbetrayal
Copy link

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants