Hang inside call to PoolManager request #1817

tycarac · 2020-03-18T10:57:10Z

Patch for urllib3 hanging inside call to PoolManager request

pquentin

This is a great start, thanks! You've even included a test! 💯

I've left a few comments, we may need a few iterations before this can be merged, but that's perfectly normal and expected, please don't take any comments here personally. :)

src/urllib3/poolmanager.py

test/with_dummyserver/test_poolmanager.py

pquentin · 2020-03-18T11:50:20Z

src/urllib3/poolmanager.py

+            except (
+                HTTPError,
+                SocketError,
+                BaseSSLError,


Can you please explain why we're catching less exceptions here compared to the existing drain_and_release_conn?

urllib3/src/urllib3/connectionpool.py

Lines 763 to 776 in 8c7a43b

def drain_and_release_conn(response):

try:

# discard any remaining response body, the connection will be

# released back to the pool once the entire response is read

response.read()

except (

TimeoutError,

HTTPException,

SocketError,

ProtocolError,

BaseSSLError,

SSLError,

):

pass

(It's probably perfectly fine, I just haven't thought about that yet.)

Looking at exceptions.py, the exceptions TimeoutError, ProtocolError, and SSLError are subclassed from HTTPError and thus caught by HTTPError. Since no individual action is required, using HTTPError should suffice. Are there exeptions that should be passed on?

I could find no code for HTTPException. There are imports such as
from .packages.six.moves.http_client import HTTPException # noqa: F401
but I could not find the code. Does it exist?

My coding style is not to have silent failures as they can cause insidious bugs and undesired side-effects. Instead of "pass", would it be good practice to log a debug statement here?

I could find no code for HTTPException. There are imports such as
from .packages.six.moves.http_client import HTTPException # noqa: F401
but I could not find the code. Does it exist?

six is a package we use to help with Python 2/3 compatibility. six.moves allows to use the same name for standard library modules that were renamed. See https://six.readthedocs.io/#module-six.moves for details. And indeed, httplib was renamed to http.client in Python 3. And this is where HTTPException is defined: https://docs.python.org/3/library/http.client.html#http.client.HTTPException. (Unfortunately, urllib3 is deeply entangled with http.client/httplib, which is one reason why https://github.com/python-trio/hip exists.)

Why .packages, you may ask? Because we vendor all urllib3 dependencies, six is currently included in the source: https://github.com/urllib3/urllib3/blob/master/src/urllib3/packages/six.py

Now, you're right that if we include HTTPException, we don't need other classes, but that was already the case in the existing drain_and_release_conn function. Unless we have another reason to have two different variations of that function, we should move it to a common place where it can be called. https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

This is why I was suggesting moving that function to src/urllib3/response.py. Does that make sense?

My coding style is not to have silent failures as they can cause insidious bugs and undesired side-effects. Instead of "pass", would it be good practice to log a debug statement here?

This makes sense in general, but in that case it's not a failure, we expect to see those exceptions in the happy path. (Maybe we'll want to refine the list in the future because some exceptions should not be silently ignored, but that's an another issue.)

@pquentin
I understand HTTPException and will add it to the list. As for logging, I agree with your reasoning.

I was looking at the code base, I and I see in response.py

def release_conn(self): if not self._pool or not self._connection: return self._pool._put_conn(self._connection) self._connection = None

The function drain_and_release_conn() acts on the response object and not the connection. Thus putting the function in response.py is good (DRY), but I think the function name should change. Being in HTTPReponse, I would suggest a name drain() so the function would be:

def drain(self): try: # Discard any remaining response body. As a result, the connection # will be released back to the pool now the entire response has been read self.read() except ( HTTPError, SocketError, BaseSSLError, HTTPException ): pass

Your thoughts?

tycarac · 2020-03-19T01:26:20Z

@pquentin
Good feedback is hard to come by, so I appreciate it. Also, a code base has history, context, practices and styles of its own.
Are you OK with a unit test on Dummy Server? Initially I was experimenting with binding two listening ports with Tornado server, and then I found Dummy Server. The two ports were for the redirect handling but I did not get as far to create a response for the second (redirect) binding.

pquentin · 2020-03-19T07:36:52Z

Are you OK with a unit test on Dummy Server? Initially I was experimenting with binding two listening ports with Tornado server, and then I found Dummy Server. The two ports were for the redirect handling but I did not get as far to create a response for the second (redirect) binding.

I'm not sure I understand the question, since the dummy server uses Tornado. We could try to modify the port and the host between redirects to in order to get multiple connection pools just like in your reproducing example in #1808, but I'm not sure we need this. I think we could just have one redirect, and then make sure we have 1/ only one connection pool 2/ no connection left open in that pool. I'll be able to experiment more on Monday, I'm unlikely to have enough time until then.

pquentin · 2020-03-19T07:39:02Z

The only CI failure is about lint. You need to install black and run it on test/with_dummyserver/test_poolmanager.py. (I'd suggest configuring it in your editor too.) You can then run nox -s lint to see if the problem is fixed.

tycarac · 2020-03-19T08:08:33Z

Regarding Tornado server.
So blur of me. I remember what I did. I started with two ports as I thought I would need to endpoints to test redirection. Then I discovered I that hosts "localhost" and "127.0.0.1" map to two different pools and thus I did not need to use two ports. So please disregard.

PS. Setting up the bindings for two ports was quite easy. As you said, not needed.

tycarac · 2020-03-23T07:12:37Z

@pquentin
I figured out a bit more about git and that forked git repos do not update. Anyway, I made some changes and submitted a new pull request. I am curious to know what you think.

pquentin · 2020-03-24T10:20:08Z

Thanks! This is looking much better. I'm very happy with the progress here.

I think we can improve the tests by 1/ simplifying the existing PoolManager one 2/ testing the retries.raise_on_redirect case. Unfortunately I won't be able to figure out the details until tomorrow morning at best. Sorry about that.

I don't know what to think about the drain() name right now. I think I would have just left the function out of the Response class. What I know is that we should not have to comment "this releases the connection!" every time we call this function, it's better to encode this in the name somehow.

pquentin · 2020-03-26T06:38:37Z

Here is how I would rewrite test_redirect_preload:

def test_redirect_without_preload_releases_connection(self):
    with PoolManager(block=True, maxsize=2) as http:
        r = http.request(
            "GET", "%s/redirect" % self.base_url, preload_content=False
        )
        assert r._pool.num_requests == 2
        assert r._pool.num_connections == 1
        assert len(http.pools) == 1

It's carefully designed to be as concise as possible but still:

make sure we don't leak any connections
avoid hanging without the fix.

We're checking that with a redirect and preload_content=False, we did make two requests, but only used one connection because the first one was released (objective 1). We use maxsize=2 so that the test does not hang if we do use two connections (objective 2). And just to be sure, we check that we only used one connection pool, as we are at the PoolManager level (objective 1).

(By the way, since it's not the simplest case, I would not place it first in the TestPoolManager class.)

Does that make sense?

(The next step is to figure out how to test the raise_on_redirect case.)

pquentin · 2020-03-26T07:26:24Z

src/urllib3/response.py

@@ -29,6 +30,17 @@
 log = logging.getLogger(__name__)


+def drain_conn(response):


(For what it's worth, I now believe having drain_conn inside the response class is perfectly fine, I just needed a better name, and we now have that. But keeping the function outside of the class is fine too.)

tycarac · 2020-03-26T23:57:48Z

@pquentin
The Travis CI build failed with error in the integration testing:

FAIL: test_stress_test_token_bucket (tests.functional.retries.test_bucket.TestTokenBucketThreading)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/urllib3/urllib3/botocore/tests/functional/retries/test_bucket.py", line 109, in test_stress_test_token_bucket
    self.assertTrue(not any(x < (0.3 * mean) for x in distribution))
AssertionError: False is not true

Looks to be inside the AWS Python Boto library. Would this be a test platform failure? Can the Travis CI build be re-triggered?

tycarac · 2020-03-27T04:18:35Z

@pquentin
The code I have been using to fetch web content to local files is:

try:
    rsp = url_client.request('GET', url)
    _logger.debug(f'> {id:4d} resp code:  {rsp.status}')
    if rsp.status == 200:
        _logger.debug(f'> {id:4d} write:      "{filepath.name}"')
        with filepath.open('wb', buffering=_BUFFER_SIZE) as rfp:
            shutil.copyfileobj(rsp, rfp, length=_BUFFER_SIZE)
except urllib3.exceptions.HTTPError as ex:
    _logger.exception(f'> {id:4d} HTTP error')
except Exception as ex:
    _logger.exception('Unexpected')
    raise ex
finally:
    if rsp:
        rsp.release_conn()

Looking at this code, I am wondering if function release_conn should internally call drain_conn. Initially, when I looked at the code for release_conn, I saw it was catching socket errors, eg timeout, which implied to me either a full response was received or it was not. However, with the pattern to stream content, if I have it right, I am wondering if I did not get http status code 200, then is there a possibility of outstanding data on the socket that is not being read but released back to the pool.

So, should release_conn internally call drain_conn?

pquentin · 2020-03-27T07:01:51Z

What do you think of the test mentioned above? Does it make sense? If yes, I suggest including it in your code.

The Travis CI build failed with error in the integration testing:

I don't know why Travis ran this integration test. @sethmlarson I thought this would only apply to release commits?

So, should release_conn internally call drain_conn?

Oh, that's a really good question. I just looked at this, and there's no possibility of outstanding data, because before reusing the connection we call _get_conn, and it closes the socket if needed.

But that means we do have two very similar ways to drain connections. One option is to replace the call to drain_conn() with a call to release_conn(). For now let's keep things like they are, and I'll try to think about it.

sethmlarson

This change looks great, excellent work @tycarac and thank you for the great reviews @pquentin.

re integration tests, the name of the branch is release-1.25.9 so is being treated as a release candidate branch. That failure can be ignored. Are we good to merge? :)

tycarac · 2020-03-28T02:37:37Z

@pquentin, @sethmlarson
I had a look at _get_conn and other functions in connectionpool.py. I may be blur but I do not see the mechanism to clear any residual response data. If this mechanism closes the connection, then would we loose advantages of pooling with the overhead of opening connections? Perhaps redirects are infrequent enough that is not a real issue. I was thinking how I could test this.

I had a look at rfc7231 Redirect and most redirect status codes do not state a HTTP body to be prohibited or optional (except 307, 308). It seems silly and very unlikely to have a HTTP body response for a HTTP redirect, but it is allowed. But this is my first time delving into standards and there is much context and practice I do not have.

I am still of the mind to call drain_conn from inside release_conn as a precaution.

Your thoughts?

Separately, I like the simplicity of your test case and I will replace it in the code. Good thinking. I will think about the other test case you talked of.

pquentin · 2020-03-28T14:10:31Z

If this mechanism closes the connection, then would we loose advantages of pooling with the overhead of opening connections?

Oh, you're right, we don't want to close connections when we can avoid it. Thanks, I had totally missed that. Your understanding of all this is impressing!

I am still of the mind to call drain_conn from inside release_conn as a precaution.

Can you please open an issue so that we can discuss about this separately? It would be a much broader change than the one we're discussing here, and I'd like to get this code merged.

The change currently looks really good, we just need to find a way to test the if retries.raise_on_redirect: case. (Unfortunately our coverage measurement is broken right now so we need to be extra careful here.) I don't think we can test this without explicitly checking that drain_conn() was called since we raise an exception on the first redirect and I don't think we can figure out if the connection was released or not here.

pquentin · 2020-03-30T06:24:38Z

Okay, here is the final bit:

diff --git a/test/with_dummyserver/test_poolmanager.py b/test/with_dummyserver/test_poolmanager.py
index 3f130a94..1160a2b6 100644
--- a/test/with_dummyserver/test_poolmanager.py
+++ b/test/with_dummyserver/test_poolmanager.py
@@ -115,6 +115,7 @@ class TestPoolManager(HTTPDummyServerTestCase):
                         % (self.base_url, self.base_url)
                     },
                     retries=1,
+                    preload_content=False,
                 )
 
             with pytest.raises(MaxRetryError):
@@ -126,8 +127,15 @@ class TestPoolManager(HTTPDummyServerTestCase):
                         % (self.base_url, self.base_url)
                     },
                     retries=Retry(total=None, redirect=1),
+                    preload_content=False,
                 )
 
+            # Even with preload_content=False and raise on redirects, we reused the same
+            # connection
+            assert len(http.pools) == 1
+            pool = http.connection_from_host(self.host, self.port)
+            assert pool.num_connections == 1
+
     def test_redirect_cross_host_remove_headers(self):
         with PoolManager() as http:
             r = http.request(

This changes allows to test the retries_on_redirect case. With this included, I'll finally be able to merge this! I can't push to your branch, so you'll have to do it. Thanks

pquentin · 2020-04-01T05:56:53Z

@tycarac We'll also need a rebase from master to make the new coverage checks happy. I'm happy to do this myself in another pull request. I'll make sure to credit you for this change, of course.

tycarac · 2020-04-03T05:08:06Z

@pquentin
Sorry for the delay. I will make another pull request shortly. I have been unwell but and keen to continue. There are two issues that can be done later: a) possible issue with draining redirect responses and b) the test on redirect.
If the pull request needs tweaking, please do so to get the release out.

pquentin · 2020-04-03T05:15:45Z

Sorry for the delay. I will make another pull request shortly.

To be clear, you don't need to make another pull request, you can change your existing branch, by merging master, on rebasing on top of master. If I was to do this myself, then another pull request would have been needed, as I can't change your branch. That being said, feel free to open another pull request if it's easier.

I have been unwell but and keen to continue.

Oh, sorry to hear that. Don't forget that health comes first! (And that many other things come before open source work.)

There are two issues that can be done later: a) possible issue with draining redirect responses and b) the test on redirect.

Yeah, I think we can open an issue to figure out a) later. b) however is ready to be included, as mentioned in #1817 (comment).

If the pull request needs tweaking, please do so to get the release out.

No worries, your pull request is not blocking a release. Code waits for releases, but releases don't wait for code. :)

tycarac · 2020-04-03T07:41:20Z

@pquentin
Made the pull request correctly (I think). GitHub build checks now in progress. More firsts in learning to apply a git diff :-)

codecov · 2020-04-03T07:47:31Z

Codecov Report

Merging #1817 into master will not change coverage by %.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master     #1817   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           22        22           
  Lines         2007      2008    +1     
=========================================
+ Hits          2007      2008    +1

Flag	Coverage Δ
#unittests	`99.60% <100.00%> (+<0.01%)`	⬆️

Impacted Files	Coverage Δ
src/urllib3/connectionpool.py	`100.00% <100.00%> (ø)`
src/urllib3/poolmanager.py	`100.00% <100.00%> (ø)`
src/urllib3/response.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0286734...8b26a02. Read the comment docs.

pquentin

Thanks! Fantastic work, this is ready for inclusion.

sethmlarson

What a great first contribution, thank you so much for this! 🙌 ❤️
We hope you stay a while and help with more issues if you have the time. :)

Also big thank you to @pquentin for all the reviewing and comments! 💯

issue #1808

c4c7e39

Patch for urllib3 hanging inside call to PoolManager request

pquentin requested changes Mar 18, 2020

View reviewed changes

tycarac changed the title ~~issue #1808~~ urllib3 hanging inside call to PoolManager request Mar 18, 2020

tycarac changed the title ~~urllib3 hanging inside call to PoolManager request~~ Hang inside call to PoolManager request Mar 18, 2020

Fix for broken Python 2.7 build

94e0fc6

tycarac added 3 commits March 23, 2020 14:53

Merge remote-tracking branch 'origin-remote/master' into release-1.25.9

1766e96

Move drain function to HTTPResponse

5bc5e9c

Remove unused imports (exceptions)

ebc54ed

tycarac added 2 commits March 26, 2020 18:00

Move drain function out of HTTPResponse

685ca36

Merge remote-tracking branch 'origin/master' into release-1.25.9

cd3027d

pquentin reviewed Mar 26, 2020

View reviewed changes

Put drain_conn() back in class HTTPResponse

9e3751c

sethmlarson approved these changes Mar 27, 2020

View reviewed changes

Simplify test for redirect and no pre-load

72bc180

Merge remote-tracking branch 'origin/master' into 1817-redirect

65728ed

tycarac added 2 commits April 3, 2020 18:25

Improve test for raise_on_redirect

401d652

Merge remote-tracking branch 'origin/master' into 1817-redirect

8b26a02

pquentin approved these changes Apr 3, 2020

View reviewed changes

sethmlarson approved these changes Apr 3, 2020

View reviewed changes

sethmlarson merged commit 29b214a into urllib3:master Apr 3, 2020

sethmlarson pushed a commit to sethmlarson/urllib3 that referenced this pull request Apr 11, 2020

Drain connection after PoolManager redirect (urllib3#1817)

240aa20

Dobatymo pushed a commit to Dobatymo/urllib3 that referenced this pull request Mar 16, 2022

Drain connection after PoolManager redirect (urllib3#1817)

8b22090

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hang inside call to PoolManager request #1817

Hang inside call to PoolManager request #1817

tycarac commented Mar 18, 2020 •

edited by pquentin

pquentin left a comment

pquentin Mar 18, 2020 •

edited

tycarac Mar 18, 2020 •

edited

tycarac Mar 18, 2020

pquentin Mar 19, 2020

tycarac Mar 22, 2020 •

edited

tycarac commented Mar 19, 2020

pquentin commented Mar 19, 2020

pquentin commented Mar 19, 2020

tycarac commented Mar 19, 2020 •

edited

tycarac commented Mar 23, 2020

pquentin commented Mar 24, 2020

pquentin commented Mar 26, 2020

pquentin Mar 26, 2020

tycarac commented Mar 26, 2020

tycarac commented Mar 27, 2020

pquentin commented Mar 27, 2020

sethmlarson left a comment •

edited

tycarac commented Mar 28, 2020 •

edited

pquentin commented Mar 28, 2020

pquentin commented Mar 30, 2020

pquentin commented Apr 1, 2020

tycarac commented Apr 3, 2020

pquentin commented Apr 3, 2020

tycarac commented Apr 3, 2020

codecov bot commented Apr 3, 2020 •

edited

pquentin left a comment

sethmlarson left a comment •

edited

	def drain_and_release_conn(response):
	try:
	# discard any remaining response body, the connection will be
	# released back to the pool once the entire response is read
	response.read()
	except (
	TimeoutError,
	HTTPException,
	SocketError,
	ProtocolError,
	BaseSSLError,
	SSLError,
	):
	pass

		@@ -29,6 +30,17 @@
		log = logging.getLogger(__name__)


		def drain_conn(response):

Hang inside call to PoolManager request #1817

Hang inside call to PoolManager request #1817

Conversation

tycarac commented Mar 18, 2020 • edited by pquentin

pquentin left a comment

Choose a reason for hiding this comment

pquentin Mar 18, 2020 • edited

Choose a reason for hiding this comment

tycarac Mar 18, 2020 • edited

Choose a reason for hiding this comment

tycarac Mar 18, 2020

Choose a reason for hiding this comment

pquentin Mar 19, 2020

Choose a reason for hiding this comment

tycarac Mar 22, 2020 • edited

Choose a reason for hiding this comment

tycarac commented Mar 19, 2020

pquentin commented Mar 19, 2020

pquentin commented Mar 19, 2020

tycarac commented Mar 19, 2020 • edited

tycarac commented Mar 23, 2020

pquentin commented Mar 24, 2020

pquentin commented Mar 26, 2020

pquentin Mar 26, 2020

Choose a reason for hiding this comment

tycarac commented Mar 26, 2020

tycarac commented Mar 27, 2020

pquentin commented Mar 27, 2020

sethmlarson left a comment • edited

Choose a reason for hiding this comment

tycarac commented Mar 28, 2020 • edited

pquentin commented Mar 28, 2020

pquentin commented Mar 30, 2020

pquentin commented Apr 1, 2020

tycarac commented Apr 3, 2020

pquentin commented Apr 3, 2020

tycarac commented Apr 3, 2020

codecov bot commented Apr 3, 2020 • edited

Codecov Report

pquentin left a comment

Choose a reason for hiding this comment

sethmlarson left a comment • edited

Choose a reason for hiding this comment

tycarac commented Mar 18, 2020 •

edited by pquentin

pquentin Mar 18, 2020 •

edited

tycarac Mar 18, 2020 •

edited

tycarac Mar 22, 2020 •

edited

tycarac commented Mar 19, 2020 •

edited

sethmlarson left a comment •

edited

tycarac commented Mar 28, 2020 •

edited

codecov bot commented Apr 3, 2020 •

edited

sethmlarson left a comment •

edited