New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hang inside call to PoolManager request #1817
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start, thanks! You've even included a test! 💯
I've left a few comments, we may need a few iterations before this can be merged, but that's perfectly normal and expected, please don't take any comments here personally. :)
src/urllib3/poolmanager.py
Outdated
except ( | ||
HTTPError, | ||
SocketError, | ||
BaseSSLError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain why we're catching less exceptions here compared to the existing drain_and_release_conn
?
urllib3/src/urllib3/connectionpool.py
Lines 763 to 776 in 8c7a43b
def drain_and_release_conn(response): | |
try: | |
# discard any remaining response body, the connection will be | |
# released back to the pool once the entire response is read | |
response.read() | |
except ( | |
TimeoutError, | |
HTTPException, | |
SocketError, | |
ProtocolError, | |
BaseSSLError, | |
SSLError, | |
): | |
pass |
(It's probably perfectly fine, I just haven't thought about that yet.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at exceptions.py, the exceptions TimeoutError, ProtocolError, and SSLError are subclassed from HTTPError and thus caught by HTTPError. Since no individual action is required, using HTTPError should suffice. Are there exeptions that should be passed on?
I could find no code for HTTPException. There are imports such as
from .packages.six.moves.http_client import HTTPException # noqa: F401
but I could not find the code. Does it exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My coding style is not to have silent failures as they can cause insidious bugs and undesired side-effects. Instead of "pass", would it be good practice to log a debug statement here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could find no code for HTTPException. There are imports such as
from .packages.six.moves.http_client import HTTPException # noqa: F401
but I could not find the code. Does it exist?
six is a package we use to help with Python 2/3 compatibility. six.moves
allows to use the same name for standard library modules that were renamed. See https://six.readthedocs.io/#module-six.moves for details. And indeed, httplib was renamed to http.client in Python 3. And this is where HTTPException
is defined: https://docs.python.org/3/library/http.client.html#http.client.HTTPException. (Unfortunately, urllib3 is deeply entangled with http.client/httplib, which is one reason why https://github.com/python-trio/hip exists.)
Why .packages
, you may ask? Because we vendor all urllib3 dependencies, six is currently included in the source: https://github.com/urllib3/urllib3/blob/master/src/urllib3/packages/six.py
Now, you're right that if we include HTTPException
, we don't need other classes, but that was already the case in the existing drain_and_release_conn
function. Unless we have another reason to have two different variations of that function, we should move it to a common place where it can be called. https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
This is why I was suggesting moving that function to src/urllib3/response.py. Does that make sense?
My coding style is not to have silent failures as they can cause insidious bugs and undesired side-effects. Instead of "pass", would it be good practice to log a debug statement here?
This makes sense in general, but in that case it's not a failure, we expect to see those exceptions in the happy path. (Maybe we'll want to refine the list in the future because some exceptions should not be silently ignored, but that's an another issue.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pquentin
I understand HTTPException and will add it to the list. As for logging, I agree with your reasoning.
I was looking at the code base, I and I see in response.py
def release_conn(self):
if not self._pool or not self._connection:
return
self._pool._put_conn(self._connection)
self._connection = None
The function drain_and_release_conn() acts on the response object and not the connection. Thus putting the function in response.py is good (DRY), but I think the function name should change. Being in HTTPReponse, I would suggest a name drain() so the function would be:
def drain(self):
try:
# Discard any remaining response body. As a result, the connection
# will be released back to the pool now the entire response has been read
self.read()
except (
HTTPError,
SocketError,
BaseSSLError,
HTTPException
):
pass
Your thoughts?
@pquentin |
I'm not sure I understand the question, since the dummy server uses Tornado. We could try to modify the port and the host between redirects to in order to get multiple connection pools just like in your reproducing example in #1808, but I'm not sure we need this. I think we could just have one redirect, and then make sure we have 1/ only one connection pool 2/ no connection left open in that pool. I'll be able to experiment more on Monday, I'm unlikely to have enough time until then. |
The only CI failure is about lint. You need to install black and run it on |
Regarding Tornado server. PS. Setting up the bindings for two ports was quite easy. As you said, not needed. |
@pquentin |
Thanks! This is looking much better. I'm very happy with the progress here. I think we can improve the tests by 1/ simplifying the existing PoolManager one 2/ testing the I don't know what to think about the drain() name right now. I think I would have just left the function out of the Response class. What I know is that we should not have to comment "this releases the connection!" every time we call this function, it's better to encode this in the name somehow. |
Here is how I would rewrite test_redirect_preload: def test_redirect_without_preload_releases_connection(self):
with PoolManager(block=True, maxsize=2) as http:
r = http.request(
"GET", "%s/redirect" % self.base_url, preload_content=False
)
assert r._pool.num_requests == 2
assert r._pool.num_connections == 1
assert len(http.pools) == 1 It's carefully designed to be as concise as possible but still:
We're checking that with a redirect and (By the way, since it's not the simplest case, I would not place it first in the TestPoolManager class.) Does that make sense? (The next step is to figure out how to test the raise_on_redirect case.) |
src/urllib3/response.py
Outdated
@@ -29,6 +30,17 @@ | |||
log = logging.getLogger(__name__) | |||
|
|||
|
|||
def drain_conn(response): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For what it's worth, I now believe having drain_conn
inside the response class is perfectly fine, I just needed a better name, and we now have that. But keeping the function outside of the class is fine too.)
@pquentin
Looks to be inside the AWS Python Boto library. Would this be a test platform failure? Can the Travis CI build be re-triggered? |
@pquentin
Looking at this code, I am wondering if function release_conn should internally call drain_conn. Initially, when I looked at the code for release_conn, I saw it was catching socket errors, eg timeout, which implied to me either a full response was received or it was not. However, with the pattern to stream content, if I have it right, I am wondering if I did not get http status code 200, then is there a possibility of outstanding data on the socket that is not being read but released back to the pool. So, should release_conn internally call drain_conn? |
What do you think of the test mentioned above? Does it make sense? If yes, I suggest including it in your code.
I don't know why Travis ran this integration test. @sethmlarson I thought this would only apply to release commits?
Oh, that's a really good question. I just looked at this, and there's no possibility of outstanding data, because before reusing the connection we call But that means we do have two very similar ways to drain connections. One option is to replace the call to drain_conn() with a call to release_conn(). For now let's keep things like they are, and I'll try to think about it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pquentin, @sethmlarson I had a look at rfc7231 Redirect and most redirect status codes do not state a HTTP body to be prohibited or optional (except 307, 308). It seems silly and very unlikely to have a HTTP body response for a HTTP redirect, but it is allowed. But this is my first time delving into standards and there is much context and practice I do not have. I am still of the mind to call drain_conn from inside release_conn as a precaution. Your thoughts? Separately, I like the simplicity of your test case and I will replace it in the code. Good thinking. I will think about the other test case you talked of. |
Oh, you're right, we don't want to close connections when we can avoid it. Thanks, I had totally missed that. Your understanding of all this is impressing!
Can you please open an issue so that we can discuss about this separately? It would be a much broader change than the one we're discussing here, and I'd like to get this code merged. The change currently looks really good, we just need to find a way to test the |
Okay, here is the final bit: diff --git a/test/with_dummyserver/test_poolmanager.py b/test/with_dummyserver/test_poolmanager.py
index 3f130a94..1160a2b6 100644
--- a/test/with_dummyserver/test_poolmanager.py
+++ b/test/with_dummyserver/test_poolmanager.py
@@ -115,6 +115,7 @@ class TestPoolManager(HTTPDummyServerTestCase):
% (self.base_url, self.base_url)
},
retries=1,
+ preload_content=False,
)
with pytest.raises(MaxRetryError):
@@ -126,8 +127,15 @@ class TestPoolManager(HTTPDummyServerTestCase):
% (self.base_url, self.base_url)
},
retries=Retry(total=None, redirect=1),
+ preload_content=False,
)
+ # Even with preload_content=False and raise on redirects, we reused the same
+ # connection
+ assert len(http.pools) == 1
+ pool = http.connection_from_host(self.host, self.port)
+ assert pool.num_connections == 1
+
def test_redirect_cross_host_remove_headers(self):
with PoolManager() as http:
r = http.request(
This changes allows to test the |
@tycarac We'll also need a rebase from master to make the new coverage checks happy. I'm happy to do this myself in another pull request. I'll make sure to credit you for this change, of course. |
@pquentin |
To be clear, you don't need to make another pull request, you can change your existing branch, by merging master, on rebasing on top of master. If I was to do this myself, then another pull request would have been needed, as I can't change your branch. That being said, feel free to open another pull request if it's easier.
Oh, sorry to hear that. Don't forget that health comes first! (And that many other things come before open source work.)
Yeah, I think we can open an issue to figure out a) later. b) however is ready to be included, as mentioned in #1817 (comment).
No worries, your pull request is not blocking a release. Code waits for releases, but releases don't wait for code. :) |
@pquentin |
Codecov Report
@@ Coverage Diff @@
## master #1817 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 22 22
Lines 2007 2008 +1
=========================================
+ Hits 2007 2008 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fantastic work, this is ready for inclusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a great first contribution, thank you so much for this! 🙌 ❤️
We hope you stay a while and help with more issues if you have the time. :)
Also big thank you to @pquentin for all the reviewing and comments! 💯
Patch for urllib3 hanging inside call to PoolManager request
Closes #1808