Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default retry behaviour includes HTTP 4xx client errors #833

Open
rahim opened this issue Sep 25, 2023 · 6 comments
Open

Default retry behaviour includes HTTP 4xx client errors #833

rahim opened this issue Sep 25, 2023 · 6 comments
Labels

Comments

@rahim
Copy link

rahim commented Sep 25, 2023

See fog/fog-aws#690 for background on how I stumbled on this. I opened a PR in fog-aws to configure round this, but it feels like something that might be better addressed here.

#690 added the ability to configure which errors were retried.

Excon sets the following for retry_errors

DEFAULT_RETRY_ERRORS = [
Excon::Error::Timeout,
Excon::Error::Socket,
Excon::Error::HTTPStatus
]

Client, Success, Redirection, Informational and Server are all descendants of Excon::Error::HTTPStatus.

excon/lib/excon/error.rb

Lines 68 to 125 in 85556ae

class HTTPStatus < Error
attr_reader :request, :response
def initialize(msg, request = nil, response = nil)
super(msg)
@request = request
@response = response
end
end
# HTTP Error classes
class Informational < HTTPStatus; end
class Success < HTTPStatus; end
class Redirection < HTTPStatus; end
class Client < HTTPStatus; end
class Server < HTTPStatus; end
class Continue < Informational; end # 100
class SwitchingProtocols < Informational; end # 101
class OK < Success; end # 200
class Created < Success; end # 201
class Accepted < Success; end # 202
class NonAuthoritativeInformation < Success; end # 203
class NoContent < Success; end # 204
class ResetContent < Success; end # 205
class PartialContent < Success; end # 206
class MultipleChoices < Redirection; end # 300
class MovedPermanently < Redirection; end # 301
class Found < Redirection; end # 302
class SeeOther < Redirection; end # 303
class NotModified < Redirection; end # 304
class UseProxy < Redirection; end # 305
class TemporaryRedirect < Redirection; end # 307
class BadRequest < Client; end # 400
class Unauthorized < Client; end # 401
class PaymentRequired < Client; end # 402
class Forbidden < Client; end # 403
class NotFound < Client; end # 404
class MethodNotAllowed < Client; end # 405
class NotAcceptable < Client; end # 406
class ProxyAuthenticationRequired < Client; end # 407
class RequestTimeout < Client; end # 408
class Conflict < Client; end # 409
class Gone < Client; end # 410
class LengthRequired < Client; end # 411
class PreconditionFailed < Client; end # 412
class RequestEntityTooLarge < Client; end # 413
class RequestURITooLong < Client; end # 414
class UnsupportedMediaType < Client; end # 415
class RequestedRangeNotSatisfiable < Client; end # 416
class ExpectationFailed < Client; end # 417
class UnprocessableEntity < Client; end # 422
class TooManyRequests < Client; end # 429
class InternalServerError < Server; end # 500
class NotImplemented < Server; end # 501
class BadGateway < Server; end # 502
class ServiceUnavailable < Server; end # 503
class GatewayTimeout < Server; end # 504

Incomplete tree for some additional context:

StandardError
└──Excon::Error
   ├──Warning
   ├──Socket
   │  └──Certificate
   ├──InvalidHeaderKey
   ├──InvalidHeaderValue
   ├──Timeout
   ├──ResponseParse
   ├──ProxyConnectionError
   ├──ProxyParse
   ├──TooManyRedirects
   └──HttpStatus
      ├──Informational
      ├──Success
      │  ├──OK
      │  ├──Created
      │  ├──Accepted
      │  └──[...]
      ├──Redirection
      ├──Client
      │  ├──BadRequest
      │  ├──Unauthorized
      │  ├──PaymentRequired
      │  ├──Forbidden
      │  ├──NotFound
      │  ├──MethodNotAllowed
      │  ├──NotAcceptable
      │  ├──ProxyAuthenticationRequired
      │  ├──RequestTimeout
      │  ├──Conflict
      │  ├──Gone
      │  └──[...]
      └──Server
         ├──InternalServerError
         ├──BadGateway
         ├──ServiceUnavailable
         └──GatewayTimeout

In terms of descendants of HttpStatus I struggle to think of situations where retrying anything other than Excon::Error::Server and perhaps Excon::RequestTimeout and Excon::TooManyRequests could be useful. Blanketing the failures is certainly the conservative choice here though, there's surely scenarios I've just not had the imagination to think up.

When retry_interval is 0 (as is the case in Excon's default configuration), these choices tend to have minimal impact because repeat queries are typically so fast, but when the interval is increased waiting on retries for scenarios we have no reason to believe would change becomes a cost.

My suggestion is that we set

  DEFAULT_RETRY_ERRORS = [
    Excon::Error::Timeout,
    Excon::Error::Socket,
    Excon::Error::Server
  ]

Possibly also including Excon::RequestTimeout and Excon::TooManyRequests.

I'm happy to open a PR if there's agreement this is a reasonable approach.

@geemus
Copy link
Contributor

geemus commented Sep 26, 2023

Hey, I mentioned this on one of the fog-aws issues, but I think the main potential use case that I can recall is to smooth eventual consistency. ie a NotFound is retried, with some delay, making it so the end user doesn't have to necessarily deal with that themselves. (or at least I believe that was my intention back then).

To your points though, there are a few problems with this.

  1. Although this behavior might be helpful, the implementation is much broader than this and likely retries many things that would always fail, wasting time and effort. So even if we decided to keep this behavior, we would might want to narrow it to only apply to the not found case.
  2. The not found/eventual consistency case is not actually that common in the wild I don't think. Some of how this is probably reflects that this was originally just a portion of fog, which initially only supported AWS (where eventual consistency was definitely something you have to regularly contend with). Most things are not aws though, and so it might warrant different defaults in the general case vs the aws case.
  3. The appropriate intervals for the eventual consistency case and other cases may be quite different. ie it may make sense for no delay in some cases, but for eventual consistency and TooManyRequests something like exponential backoff might be more appropriate. Maybe that means these should be broken out or treated differently.

It's possible that the right approach might be to separate out the eventual consistency stuff and handle that specifically in fog-aws (and not here where it's rarely applicable). That could also give us more flexibility/leverage to make it configurable.

What do you think?

@rahim
Copy link
Author

rahim commented Sep 28, 2023

It's possible that the right approach might be to separate out the eventual consistency stuff and handle that specifically in fog-aws (and not here where it's rarely applicable). That could also give us more flexibility/leverage to make it configurable.

What do you think?

That sounds a promising path - I'd been starting to think along similar lines – that trying to cover off the eventual consistency case as the default behaviour in a HTTP library feels like overstretching.

Default retry of anything is somewhat surprising when compared with other HTTP libraries, to my knowledge, all of net-http/httparty/faraday/curl/python-requests/js-fetch/axios require some form of explicit opt-in to get any retry.

At the fog service level, where waiting on eventual consistency is something that might be desirable, it feels like something that would benefit from specific affordances that call out exactly what's going on.

A good comparison is the AWS Ruby SDK's waiters pattern:

ec2 = Aws::EC2::Client.new
ec2.wait_until(:instance_running, instance_ids:['i-12345678'])

@geemus
Copy link
Contributor

geemus commented Sep 29, 2023

@rahim yeah, in fog there is the somewhat similar Fog.wait_for though it isn't always used. I also agree that removing the client errors from retries sounds like a good fix in general (and even more so given our discussion in fog-aws about the state of everything).

I don't think there should be any retries by default though. The middleware is loaded by default, but I believe the actual retry is based on the truthiness of :idempotent, which defaults to false (see: https://github.com/excon/excon/blob/master/lib/excon/constants.rb#L151). So in that regard, I guess it's inline with other libraries, though I'm not sure they have a similar notion of idempotence and retries at all. Does that make sense?

@rahim
Copy link
Author

rahim commented Sep 29, 2023

I don't think there should be any retries by default though. The middleware is loaded by default, but I believe the actual retry is based on the truthiness of :idempotent, which defaults to false

Ah, thank you - that was a detail I'd missed.

When I was reviewing the various fog retry behaviours it sounds like I ought to have been grepping for where that idempotent flag got set too.

@geemus
Copy link
Contributor

geemus commented Sep 29, 2023

No worries, it's pretty easy to overlook I think.

Copy link

This issue has been marked inactive and will be closed if no further activity occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants