
Handling of idle timeout (aws nat gateway fixed idle timeout of 350 seconds) #3184

Open

ecerulm opened this issue Nov 8, 2023 · 3 comments · May be fixed by #3275
Comments

@ecerulm
Contributor

ecerulm commented Nov 8, 2023

Context

I'm using a connection pool to a given host where the connections go through an AWS NAT Gateway, which has a fixed idle timeout of 350 seconds.

The issue is that when a new request is performed after 350 seconds of inactivity, the connection fails with "Connection reset by peer" (the NAT gateway sends an explicit RST when you try to send on a connection past the idle timeout). urllib3 then retries, but the next connection in the pool fails with the same "reset by peer" error (it was idle too).

It would be nice to have an idle_timeout=349 parameter for PoolManager/ConnectionPool, so that urllib3 checks whether a connection has been idle for longer than that and discards it before using it.

I think today the behavior is as follows (from an earlier comment):

When a connection is pulled from the pool it's checked to see if the underlying socket is still open and if not the connection is reopened.

The problem with that is that the underlying socket is still open; you only get the RST from the AWS NAT Gateway after you try to use it.

So my proposal is: in addition to checking whether the underlying socket is still open, also check whether the idle_timeout has passed, and if it has, reopen the connection.
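To illustrate, here is a minimal sketch of the idea layered over today's pool. This is not a proposed API: idle_timeout and the _idle_since stamp are made-up names, and _get_conn/_put_conn are private urllib3 internals, so treat it as illustrative only.

```python
import time

from urllib3 import HTTPConnectionPool


class IdleAwareConnectionPool(HTTPConnectionPool):
    """Sketch: discard connections that sat idle longer than
    ``idle_timeout`` seconds instead of waiting for the gateway's RST."""

    def __init__(self, *args, idle_timeout=349.0, **kwargs):
        self.idle_timeout = idle_timeout  # hypothetical parameter
        super().__init__(*args, **kwargs)

    def _put_conn(self, conn):
        if conn is not None:
            # Stamp the connection when it is returned to the pool.
            conn._idle_since = time.monotonic()
        super()._put_conn(conn)

    def _get_conn(self, timeout=None):
        conn = super()._get_conn(timeout=timeout)
        idle_since = getattr(conn, "_idle_since", None)
        if idle_since is not None and time.monotonic() - idle_since > self.idle_timeout:
            # Too old to trust: close it so the next request opens a
            # fresh socket instead of running into the gateway's RST.
            conn.close()
        return conn
```

Closing the connection should be enough, because (per the quoted comment above) the pool already reopens a closed connection before use; the only new ingredient is the timestamp check.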

Is it something you currently cannot do?
Well, I guess I can catch the exception (after all the retries have failed) and drop the whole PoolManager/ConnectionPool.

Is this related to an existing issue/problem?

Yes. Connections via a NAT gateway have an implicit idle timeout and are not explicitly reset by the gateway; they are only reset when you try to use them. That plays badly with the current urllib3 behavior, because it retries, but all pooled connections for that host are "tainted", so it should not try to use them.

Alternatives

Can you achieve the same result doing it in an alternative way?
Is the alternative considerable?

I guess that would require some error handling: detect the failure, then drop the PoolManager/ConnectionPool and create a new one. A rough sketch is below.
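Something like the following could work as a caller-side workaround under current urllib3 (a sketch, not a recommendation; request_dropping_stale_pools is a made-up helper, and which exception actually surfaces depends on your retry configuration):

```python
import urllib3
from urllib3.exceptions import MaxRetryError, ProtocolError

http = urllib3.PoolManager()


def request_dropping_stale_pools(method, url, **kwargs):
    """Sketch: if the pooled connections were reset by the NAT gateway,
    drop the tainted pools and retry once on fresh connections."""
    global http
    try:
        return http.request(method, url, **kwargs)
    except (ProtocolError, MaxRetryError):
        http.clear()  # close every pooled connection
        http = urllib3.PoolManager()  # start over with an empty pool
        return http.request(method, url, **kwargs)
```

The downside, compared to the proposed idle_timeout, is that the first request after an idle period still pays for the failed attempt and the retries.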

Duplicate

Has the feature been requested before?
If so, please provide a link to the issue.

#2498 was dismissed on the grounds that it would require a background thread, but I think there is no need for that. The check can be done when the connection is pulled from the pool, reopening it if the idle_timeout has passed. I guess #2498 was asking for explicitly closing the connection after the idle_timeout, whereas I'm asking for "consider an idle connection dead after x seconds".

Contribution

Would you be willing to submit a PR?
(Help can be provided if you need assistance submitting a PR)

@sethmlarson
Member

sethmlarson commented Jan 6, 2024

I could see a feature like this being useful, especially as you describe it: checking a "last known good" time against the idle_timeout on the connection pool. It would cut down on the number of unnecessary retries as well.

What do other maintainers think about this feature? cc @pquentin @illia-v If we think it's useful we can add a bounty to this one.

@illia-v
Member

illia-v commented Jan 7, 2024

I agree that this can be useful.

@ecerulm
Contributor Author

ecerulm commented Jan 8, 2024

I think I can create a PR for this feature. It would be my first contribution to this project, so I'm not sure about the process; can you assign the issue to me?

@ecerulm ecerulm linked a pull request Jan 10, 2024 that will close this issue