Separate timeout errors by type #850

Open · wants to merge 3 commits into master

Conversation

@maxim maxim commented Apr 21, 2024

Excon used to raise Excon::Error::Timeout for any type of timeout. This PR splits this error into multiple specific ones, with the following inheritance hierarchy:

Timeout
|_ReadTimeout
|_WriteTimeout
|_ConnectTimeout
  |_ConnectReadTimeout
  |_ConnectWriteTimeout

Since all timeout errors inherit from Excon::Error::Timeout, we maintain backwards compatibility.

A bonus benefit is that even when we raise a generic "request timeout reached" error, we are still raising the correct type of exception, so a user can tell in which phase of the request the timeout actually occurred. This makes the request timeout feature more powerful.
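For illustration, rescuing the parent class keeps working exactly as before, while the concrete subclass now reveals the phase. This is just a minimal sketch assuming the subclass names proposed above (ConnectTimeout etc. don't exist before this PR), using Excon's standard timeout options:

require 'excon'

begin
  Excon.get('http://example.com/', read_timeout: 1)
rescue Excon::Error::Timeout => e
  # Existing code that rescues the generic Timeout still catches everything,
  # but can now branch on the concrete subclass to learn the phase.
  warn 'timed out while connecting' if e.is_a?(Excon::Error::ConnectTimeout)
  raise
end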

Breaking changes

  • The DEFAULT_RETRY_ERRORS no longer includes read and write timeouts, only connect (see the sketch after this list). That seems like the safer option to me, but I wonder how you feel about making this change here.
  • I changed the exception message "connect timeout reached" to say "connect [read/write] timeout reached" instead. We probably shouldn't consider that a breaking change, but I can see it both ways.
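For context, a hypothetical sketch of what the narrowed constant might look like after this change; the non-timeout members shown are Excon's current defaults, and the connect-only class name follows the hierarchy above rather than being copied from the PR diff:

DEFAULT_RETRY_ERRORS = [
  Excon::Error::ConnectTimeout, # the connection was never established, so retrying is safe
  Excon::Error::Socket,
  Excon::Error::HTTPStatus
]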

Questions

We could probably use more tests. If you think we should, where should I add them (test, tests, or spec)?

Comment on lines +67 to +69
def self.described_as(human_name)
  new("#{human_name.to_s.tr('_', ' ')} timeout reached")
end
Author (maxim)

I found this to be the most elegant way to give us a little flexibility while maintaining DRY-ness and keeping the exceptions completely vanilla otherwise (not overriding their initialize, etc.).

Also, I was tempted to put .to_s.tr(…) into the caller, but held off on it for now. Curious what you think.
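For illustration, assuming the described_as helper above, the generated messages would look roughly like this:

Excon::Errors::ReadTimeout.described_as(:read).message
# => "read timeout reached"

Excon::Errors::ConnectReadTimeout.described_as(:connect_read).message
# => "connect read timeout reached"

# When the overall deadline is hit, the caller passes 'request', so the
# subclass still identifies the phase while the message stays generic:
Excon::Errors::ConnectReadTimeout.described_as('request').message
# => "request timeout reached"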

Contributor (geemus)

Yeah, I wasn't totally sure about this part. Other than request timeouts, I think they all just use their type for the human name. And especially if we made request timeouts their own thing (as I suggest in another comment), even that distinction is no longer true. At that point, it seems like it might be simpler to drop the human_name and described_as parts and move it all up into the by_type method as you suggest. What do you think?

Comment on lines +54 to +65
def self.by_type(type, human_name = type)
  case type
  when :connect_read
    Excon::Errors::ConnectReadTimeout.described_as(human_name)
  when :connect_write
    Excon::Errors::ConnectWriteTimeout.described_as(human_name)
  when :read
    Excon::Errors::ReadTimeout.described_as(human_name)
  when :write
    Excon::Errors::WriteTimeout.described_as(human_name)
  end
end
Author (maxim)

I'm not sure if by_type is clear enough. It seems to make sense in the caller's context.

Contributor (geemus)

It seems pretty good. The only further thing that comes to mind would be to tweak it toward something like new_by_type; since it is kind of a specialized constructor, that might make the intention a little more obvious. What do you think?
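To make the suggestion concrete, a rough sketch of what that consolidation might look like if described_as were folded into the renamed constructor (hypothetical, not taken from the PR diff):

def self.new_by_type(type, human_name = type)
  klass =
    case type
    when :connect_read  then Excon::Errors::ConnectReadTimeout
    when :connect_write then Excon::Errors::ConnectWriteTimeout
    when :read          then Excon::Errors::ReadTimeout
    when :write         then Excon::Errors::WriteTimeout
    end
  klass.new("#{human_name.to_s.tr('_', ' ')} timeout reached")
end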

@@ -364,6 +364,9 @@ def select_with_timeout(socket, type)
if @data.include?(:deadline)
  request_timeout = request_time_remaining

  # If request timeout is already reached, there's no need to proceed.
  raise(Excon::Errors::Timeout.by_type(type, 'request')) if request_timeout <= 0
Contributor (geemus)

I think, while we are at it, that it might actually make sense to break this out into Excon::Errors::RequestTimeout (another child of Timeout, I would think). That would be in line with the other changes, and again provide a bit more information to users (who can also just deal with the parent class as they used to, if they prefer).

Author
@maxim maxim Apr 23, 2024

The only concern is that if we do that, we lose the benefit of being able to tell which specific timeout caused the overall request timeout. It would be powerful if I could set a timeout on the entire request, but retry only on connect timeouts.

That's why I opted to raise specific exceptions, while describing them in the message as 'request timeout reached'.

Another alternative is to repeat the whole hierarchy for Request, and raise one of the request exceptions when it's supposed to be a request timeout:

Timeout
|_ReadTimeout
|_WriteTimeout
|_ConnectTimeout
  |_ConnectReadTimeout
  |_ConnectWriteTimeout
|_RequestTimeout
  |_RequestReadTimeout
  |_RequestWriteTimeout
  |_RequestConnectTimeout
    |_RequestConnectReadTimeout
    |_RequestConnectWriteTimeout

This seems a little excessive, however.

Are you sure you'd want to hide the specific timeout error behind RequestTimeout?

Author
@maxim maxim Apr 23, 2024

Forgot to mention this detail: there is already an exception called RequestTimeout (HTTP status code 408), which would conflict with this naming if we did it.

Edit: I'll wait for us to make this decision before we proceed, since this affects a few other things. (No rush or pressure, I'm already surprised at how responsive you are.)

Contributor (geemus)

Oops, right...

For some context, if I recall correctly the request timeout here is harder to pin down to a particular cause in the same way.

i.e. I expect a read timeout specifically means that I started waiting at the beginning of a read and it took longer than I wanted, so I raised. Same thing for write or connect errors.

I believe the intention with the error we are talking about here is actually a timeout for the entirety of the request. So the timer starts as soon as the request process begins, and it times out as soon as the limit is surpassed. That being said, it's possible, for instance, that the timeout actually occurs in between the other activities, during excon's own processing: if connect took just slightly less than the request limit, the timeout would technically occur in between connect and write. In that case, it wouldn't really be correct to say that the timeout was either connect or write (and hence the generic error we see here).

Does that make sense?

All that being said, I'm not sure that "request" is a very good descriptor of these expectations. Maybe it should be something like "OverallTimeout" or "TotalTimeout" or something (I'm definitely open to suggestions). It's also possible that this is more trouble than it's worth and something like what you have done is good enough. What do you think?

I'm glad to happily surprise you in terms of responsiveness, though paradoxically I was feeling bad for how long it took to get back to you...

Author (maxim)

It's also possible that this is more trouble than it's worth and something like what you have done is good enough.

Could be, but you're right that we have to think this through.

it's possible for instance that the timeout actually would occur in between the other activities

It's true, we only check the monotonic clock once per operation, so we can't know when the deadline was actually crossed. It could have been at the beginning of the current phase, at the end of the previous one, or in between.

But if you think about it, for "RequestTimeout" we don't actually care exactly when the timeout occurred. We only care which phases of the request had completed successfully at the moment this error is raised. So maybe we should indeed go with something like TotalTimeout (better name pending), but add a piece of metadata to it, like:

begin
  connection.request(method: :get) # a hypothetical request with an overall request timeout configured
rescue Excon::Errors::TotalTimeout => e
  puts e.completed_request_phases # => [:connect_write, :connect_read, :write]
end

If a request timeout is not set by the user, then we fall back to the specific timeout errors. I don't know the code deeply enough to tell if this is easy to do, but at least it avoids the need for impossible timeout precision. How does this sound?
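A minimal sketch of what such an error class could carry under the idea above; the TotalTimeout name and the completed_request_phases attribute are hypothetical and not part of this PR:

module Excon
  module Errors
    class TotalTimeout < Timeout
      # Phases of the request that had finished before the deadline was crossed.
      attr_reader :completed_request_phases

      def initialize(message, completed_request_phases = [])
        @completed_request_phases = completed_request_phases
        super(message)
      end
    end
  end
end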

Contributor (geemus)

Sounds good to me. Naming-wise, maybe we could combine the old and new ideas, so something like OverallRequestTimeout or TotalRequestTimeout? I kind of like overall between those, I think. What do you think?

Author (maxim)

OverallRequestTimeout looks like a good option. We could also consider a different angle: RequestDurationTimeout, RequestSpanTimeout. What do you think about these?

Contributor (geemus)

Oh, yeah. I really like RequestDurationTimeout. I think that is probably my new favorite, great idea!

geemus commented Apr 23, 2024

I like it so far, offered a few suggestions as well. Thanks!

Breaking Changes

  • I think it should be relatively safe. Though it may change behavior for some users, those behaviors were probably somewhat dangerous to begin with.
  • I wouldn't really consider changing the text of the errors breaking. I hope at least that those particulars aren't being relied upon, though it's not unlikely that someone has done something there.

Tests

  • Further tests should probably just go in tests, as I think that is still where the brunt of the tests live. It would be good to at least try to trigger each of the error cases if we can. The test setup is a bit of a mess and it would be great to consolidate it; someone started converting things and then lost steam, and I haven't had the bandwidth to go back to it (plus it's hard to prioritize when what we have works well enough, albeit in a bit of a messy setup).

maxim commented May 10, 2024

@geemus Couple of updates.

  1. I decided to abandon the idea of showing which steps succeeded in a request timeout, because I'm not sure how to figure that out.
  2. I tried to catch a connect timeout by connecting to the reserved black-hole IPv6 address 100::, but instead I'm getting: Excon::Error::Socket: Operation timed out - connect(2) for [100::]:9292 (Errno::ETIMEDOUT). It looks like Excon sometimes doesn't raise its own Timeout error. Is that expected? (A rough repro sketch is below.)
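For reference, the attempted reproduction looks roughly like this; the address and port are taken from the comment above, while the connect_timeout value is an assumption:

connection = Excon.new('http://[100::]:9292', connect_timeout: 1)
connection.get
# expected: an Excon timeout error (ideally the new connect-specific subclass)
# actual:   Excon::Error::Socket wrapping Errno::ETIMEDOUT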
