Separate timeout errors by type #850

Open · wants to merge 3 commits into master

Conversation

@maxim maxim commented Apr 21, 2024

Excon used to raise Excon::Error::Timeout for any type of timeout. This PR splits this error into multiple specific ones, with the following inheritance hierarchy:

Timeout
|_ReadTimeout
|_WriteTimeout
|_ConnectTimeout
  |_ConnectReadTimeout
  |_ConnectWriteTimeout

Since all timeout errors inherit from Excon::Error::Timeout, we maintain backwards compatibility.

A bonus benefit is that even when we raise a generic "request timeout reached" error, we are still raising the correct type of exception, so a user can tell in which phase of the request the timeout actually occurred. This makes the request timeout feature more powerful.
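For illustration, rescuing the parent class keeps working exactly as before, while the concrete subclass now reveals the phase. This is just a minimal sketch assuming the subclass names proposed above (ConnectTimeout etc. don't exist before this PR), using Excon's standard timeout options:

require 'excon'

begin
  Excon.get('http://example.com/', read_timeout: 1)
rescue Excon::Error::Timeout => e
  # Existing code that rescues the generic Timeout still catches everything,
  # but can now branch on the concrete subclass to learn the phase.
  warn 'timed out while connecting' if e.is_a?(Excon::Error::ConnectTimeout)
  raise
end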

Breaking changes

  • The DEFAULT_RETRY_ERRORS no longer includes read and write timeouts, only connect (see the sketch after this list). That seems like the safer option to me, but I wonder how you feel about making this change here.
  • I changed the exception message "connect timeout reached" to say "connect [read/write] timeout reached" instead. We probably shouldn't consider that a breaking change, but I can see it both ways.
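For context, a hypothetical sketch of what the narrowed constant might look like after this change; the non-timeout members shown are Excon's current defaults, and the connect-only class name follows the hierarchy above rather than being copied from the PR diff:

DEFAULT_RETRY_ERRORS = [
  Excon::Error::ConnectTimeout, # the connection was never established, so retrying is safe
  Excon::Error::Socket,
  Excon::Error::HTTPStatus
]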

Questions

We could probably use more tests. If you think we should, where should I add them (test, tests, or spec)?

Comment on lines +67 to +69
def self.described_as(human_name)
  new("#{human_name.to_s.tr('_', ' ')} timeout reached")
end
Author (maxim)

I found this to be the most elegant way to give us a little flexibility while maintaining DRY-ness and keeping the exceptions completely vanilla otherwise (not overriding their initialize, etc.).

Also, I was tempted to put .to_s.tr(…) into the caller, but held off on it for now. Curious what you think.
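For illustration, assuming the described_as helper above, the generated messages would look roughly like this:

Excon::Errors::ReadTimeout.described_as(:read).message
# => "read timeout reached"

Excon::Errors::ConnectReadTimeout.described_as(:connect_read).message
# => "connect read timeout reached"

# When the overall deadline is hit, the caller passes 'request', so the
# subclass still identifies the phase while the message stays generic:
Excon::Errors::ConnectReadTimeout.described_as('request').message
# => "request timeout reached"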

Contributor (geemus)

Yeah, I wasn't totally sure about this part. Other than request timeouts, I think they all just use their type for the human name. And especially if we made request timeouts their own thing (as I suggest in another comment), even that distinction is no longer true. At that point, it seems like it might be simpler to drop the human_name and described_as parts and move it all up into the by_type method as you suggest. What do you think?

Comment on lines +54 to +65
def self.by_type(type, human_name = type)
  case type
  when :connect_read
    Excon::Errors::ConnectReadTimeout.described_as(human_name)
  when :connect_write
    Excon::Errors::ConnectWriteTimeout.described_as(human_name)
  when :read
    Excon::Errors::ReadTimeout.described_as(human_name)
  when :write
    Excon::Errors::WriteTimeout.described_as(human_name)
  end
end
Author (maxim)

I'm not sure if by_type is clear enough. It seems to make sense in the caller's context.

Contributor (geemus)

It seems pretty good. The only further thing that comes to mind would be to tweak it toward something like new_by_type; since it is kind of a specialized constructor, that might make the intention a little more obvious. What do you think?
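To make the suggestion concrete, a rough sketch of what that consolidation might look like if described_as were folded into the renamed constructor (hypothetical, not taken from the PR diff):

def self.new_by_type(type, human_name = type)
  klass =
    case type
    when :connect_read  then Excon::Errors::ConnectReadTimeout
    when :connect_write then Excon::Errors::ConnectWriteTimeout
    when :read          then Excon::Errors::ReadTimeout
    when :write         then Excon::Errors::WriteTimeout
    end
  klass.new("#{human_name.to_s.tr('_', ' ')} timeout reached")
end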

@@ -364,6 +364,9 @@ def select_with_timeout(socket, type)
if @data.include?(:deadline)
  request_timeout = request_time_remaining

  # If request timeout is already reached, there's no need to proceed.
  raise(Excon::Errors::Timeout.by_type(type, 'request')) if request_timeout <= 0
Contributor (geemus)

I think, while we are at it, that it might actually make sense to break this out into Excon::Errors::RequestTimeout (another child of Timeout, I would think). That would be in line with the other changes, and again provide a bit more information to users (who can also just deal with the parent class as they used to, if they prefer).

Author
@maxim maxim Apr 23, 2024

The only concern is that if we do that, we lose the benefit of being able to tell which specific timeout caused the overall request timeout. It would be powerful if I could set a timeout on the entire request, but retry only on connect timeouts.

That's why I opted to raise specific exceptions, while describing them in the message as 'request timeout reached'.

Another alternative is to repeat the whole hierarchy for Request, and raise one of the request exceptions when it's supposed to be a request timeout:

Timeout
|_ReadTimeout
|_WriteTimeout
|_ConnectTimeout
  |_ConnectReadTimeout
  |_ConnectWriteTimeout
|_RequestTimeout
  |_RequestReadTimeout
  |_RequestWriteTimeout
  |_RequestConnectTimeout
    |_RequestConnectReadTimeout
    |_RequestConnectWriteTimeout

This seems a little excessive, however.

Are you sure you'd want to hide the specific timeout error behind RequestTimeout?

Author
@maxim maxim Apr 23, 2024

Forgot to mention this detail: there is already an exception called RequestTimeout (HTTP status code 408), which would conflict with this naming if we did it.

Edit: I'll wait for us to make this decision before we proceed, since this affects a few other things. (No rush or pressure, I'm already surprised at how responsive you are.)

Contributor (geemus)

Oops, right...

For some context, if I recall correctly the request timeout here is harder to pin down to a particular cause in the same way.

i.e. I expect a read timeout specifically means that I started waiting at the beginning of a read and it took longer than I wanted, so I raised. Same thing for write or connect errors.

I believe the intention with the error we are talking about here is actually a timeout for the entirety of the request. So the timer starts as soon as the request process begins, and it times out as soon as the limit is surpassed. That being said, it's possible, for instance, that the timeout actually occurs in between the other activities, during excon's own processing: if connect took just slightly less than the request limit, the timeout would technically occur in between connect and write. In that case, it wouldn't really be correct to say that the timeout was either connect or write (and hence the generic error we see here).

Does that make sense?

All that being said, I'm not sure that "request" is a very good descriptor of these expectations. Maybe it should be something like "OverallTimeout" or "TotalTimeout" or something (I'm definitely open to suggestions). It's also possible that this is more trouble than it's worth and something like what you have done is good enough. What do you think?

I'm glad to happily surprise you in terms of responsiveness, though paradoxically I was feeling bad for how long it took to get back to you...

Author (maxim)

It's also possible that this is more trouble than it's worth and something like what you have done is good enough.

Could be, but you're right that we have to think this through.

it's possible for instance that the timeout actually would occur in between the other activities

It's true, we only check the monotonic clock once per operation, so we can't know when the deadline was actually crossed. It could have been at the beginning of the current phase, at the end of the previous one, or in between.

But if you think about it, for "RequestTimeout" we don't actually care exactly when the timeout occurred. We only care which phases of the request had completed successfully at the moment this error is raised. So maybe we should indeed go with something like TotalTimeout (better name pending), but add a piece of metadata to it, like:

begin
  connection.request(method: :get) # a hypothetical request with an overall request timeout configured
rescue Excon::Errors::TotalTimeout => e
  puts e.completed_request_phases # => [:connect_write, :connect_read, :write]
end

If a request timeout is not set by the user, then we fall back to the specific timeout errors. I don't know the code deeply enough to tell if this is easy to do, but at least it avoids the need for impossible timeout precision. How does this sound?
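A minimal sketch of what such an error class could carry under the idea above; the TotalTimeout name and the completed_request_phases attribute are hypothetical and not part of this PR:

module Excon
  module Errors
    class TotalTimeout < Timeout
      # Phases of the request that had finished before the deadline was crossed.
      attr_reader :completed_request_phases

      def initialize(message, completed_request_phases = [])
        @completed_request_phases = completed_request_phases
        super(message)
      end
    end
  end
end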

Contributor (geemus)

Sounds good to me. Naming-wise, maybe we could combine the old and new ideas, so something like OverallRequestTimeout or TotalRequestTimeout? I kind of like overall between those, I think. What do you think?

Author (maxim)

OverallRequestTimeout looks like a good option. We could also consider a different angle: RequestDurationTimeout, RequestSpanTimeout. What do you think about these?

Contributor (geemus)

Oh, yeah. I really like RequestDurationTimeout. I think that is probably my new favorite, great idea!

geemus commented Apr 23, 2024

I like it so far, offered a few suggestions as well. Thanks!

Breaking Changes

  • I think it should be relatively safe. Though it may change behavior for some users, those behaviors were probably somewhat dangerous to begin with.
  • I wouldn't really consider changing the text of the errors breaking. I hope at least that those particulars aren't being relied upon, though it's not unlikely that someone has done something there.

Tests

  • Further tests should probably just go in tests, as I think that is still where the brunt of the tests live. It would be good to at least try to trigger each of the error cases if we can. The test setup is a bit of a mess and it would be great to consolidate it; someone started converting things and then lost steam, and I haven't had the bandwidth to go back to it (plus it's hard to prioritize when what we have works well enough, albeit in a bit of a messy setup).

maxim commented May 10, 2024

@geemus Couple of updates.

  1. I decided to abandon the idea of showing which steps succeeded in a request timeout, because I'm not sure how to figure that out.
  2. I tried to catch a connect timeout by connecting to the reserved black-hole IPv6 address 100::, but instead I'm getting: Excon::Error::Socket: Operation timed out - connect(2) for [100::]:9292 (Errno::ETIMEDOUT). It looks like Excon sometimes doesn't raise its own Timeout error. Is that expected? (A rough repro sketch is below.)
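For reference, the attempted reproduction looks roughly like this; the address and port are taken from the comment above, while the connect_timeout value is an assumption:

connection = Excon.new('http://[100::]:9292', connect_timeout: 1)
connection.get
# expected: an Excon timeout error (ideally the new connect-specific subclass)
# actual:   Excon::Error::Socket wrapping Errno::ETIMEDOUT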
