Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang inside call to PoolManager request #1817

Merged
merged 12 commits into from Apr 3, 2020
Merged

Hang inside call to PoolManager request #1817

merged 12 commits into from Apr 3, 2020

Conversation

tycarac
Copy link
Contributor

@tycarac tycarac commented Mar 18, 2020

Patch for urllib3 hanging inside call to PoolManager request

Closes #1808

Patch for urllib3 hanging inside call to PoolManager request
Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start, thanks! You've even included a test! 💯

I've left a few comments, we may need a few iterations before this can be merged, but that's perfectly normal and expected, please don't take any comments here personally. :)

src/urllib3/poolmanager.py Outdated Show resolved Hide resolved
test/with_dummyserver/test_poolmanager.py Outdated Show resolved Hide resolved
except (
HTTPError,
SocketError,
BaseSSLError,
Copy link
Member

@pquentin pquentin Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why we're catching less exceptions here compared to the existing drain_and_release_conn?

def drain_and_release_conn(response):
try:
# discard any remaining response body, the connection will be
# released back to the pool once the entire response is read
response.read()
except (
TimeoutError,
HTTPException,
SocketError,
ProtocolError,
BaseSSLError,
SSLError,
):
pass

(It's probably perfectly fine, I just haven't thought about that yet.)

Copy link
Contributor Author

@tycarac tycarac Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at exceptions.py, the exceptions TimeoutError, ProtocolError, and SSLError are subclassed from HTTPError and thus caught by HTTPError. Since no individual action is required, using HTTPError should suffice. Are there exeptions that should be passed on?

I could find no code for HTTPException. There are imports such as
from .packages.six.moves.http_client import HTTPException # noqa: F401
but I could not find the code. Does it exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My coding style is not to have silent failures as they can cause insidious bugs and undesired side-effects. Instead of "pass", would it be good practice to log a debug statement here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could find no code for HTTPException. There are imports such as
from .packages.six.moves.http_client import HTTPException # noqa: F401
but I could not find the code. Does it exist?

six is a package we use to help with Python 2/3 compatibility. six.moves allows to use the same name for standard library modules that were renamed. See https://six.readthedocs.io/#module-six.moves for details. And indeed, httplib was renamed to http.client in Python 3. And this is where HTTPException is defined: https://docs.python.org/3/library/http.client.html#http.client.HTTPException. (Unfortunately, urllib3 is deeply entangled with http.client/httplib, which is one reason why https://github.com/python-trio/hip exists.)

Why .packages, you may ask? Because we vendor all urllib3 dependencies, six is currently included in the source: https://github.com/urllib3/urllib3/blob/master/src/urllib3/packages/six.py

Now, you're right that if we include HTTPException, we don't need other classes, but that was already the case in the existing drain_and_release_conn function. Unless we have another reason to have two different variations of that function, we should move it to a common place where it can be called. https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

This is why I was suggesting moving that function to src/urllib3/response.py. Does that make sense?

My coding style is not to have silent failures as they can cause insidious bugs and undesired side-effects. Instead of "pass", would it be good practice to log a debug statement here?

This makes sense in general, but in that case it's not a failure, we expect to see those exceptions in the happy path. (Maybe we'll want to refine the list in the future because some exceptions should not be silently ignored, but that's an another issue.)

Copy link
Contributor Author

@tycarac tycarac Mar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pquentin
I understand HTTPException and will add it to the list. As for logging, I agree with your reasoning.

I was looking at the code base, I and I see in response.py

def release_conn(self):
    if not self._pool or not self._connection:
        return

    self._pool._put_conn(self._connection)
    self._connection = None

The function drain_and_release_conn() acts on the response object and not the connection. Thus putting the function in response.py is good (DRY), but I think the function name should change. Being in HTTPReponse, I would suggest a name drain() so the function would be:

def drain(self):
    try:
        # Discard any remaining response body. As a result, the connection 
        # will be released back to the pool now the entire response has been read
        self.read()
    except (
        HTTPError,
        SocketError,
        BaseSSLError,
        HTTPException
    ):
        pass

Your thoughts?

@tycarac tycarac changed the title issue #1808 urllib3 hanging inside call to PoolManager request Mar 18, 2020
@tycarac tycarac changed the title urllib3 hanging inside call to PoolManager request Hang inside call to PoolManager request Mar 18, 2020
@tycarac
Copy link
Contributor Author

tycarac commented Mar 19, 2020

@pquentin
Good feedback is hard to come by, so I appreciate it. Also, a code base has history, context, practices and styles of its own.
Are you OK with a unit test on Dummy Server? Initially I was experimenting with binding two listening ports with Tornado server, and then I found Dummy Server. The two ports were for the redirect handling but I did not get as far to create a response for the second (redirect) binding.

@pquentin
Copy link
Member

Are you OK with a unit test on Dummy Server? Initially I was experimenting with binding two listening ports with Tornado server, and then I found Dummy Server. The two ports were for the redirect handling but I did not get as far to create a response for the second (redirect) binding.

I'm not sure I understand the question, since the dummy server uses Tornado. We could try to modify the port and the host between redirects to in order to get multiple connection pools just like in your reproducing example in #1808, but I'm not sure we need this. I think we could just have one redirect, and then make sure we have 1/ only one connection pool 2/ no connection left open in that pool. I'll be able to experiment more on Monday, I'm unlikely to have enough time until then.

@pquentin
Copy link
Member

The only CI failure is about lint. You need to install black and run it on test/with_dummyserver/test_poolmanager.py. (I'd suggest configuring it in your editor too.) You can then run nox -s lint to see if the problem is fixed.

@tycarac
Copy link
Contributor Author

tycarac commented Mar 19, 2020

Regarding Tornado server.
So blur of me. I remember what I did. I started with two ports as I thought I would need to endpoints to test redirection. Then I discovered I that hosts "localhost" and "127.0.0.1" map to two different pools and thus I did not need to use two ports. So please disregard.

PS. Setting up the bindings for two ports was quite easy. As you said, not needed.

@tycarac
Copy link
Contributor Author

tycarac commented Mar 23, 2020

@pquentin
I figured out a bit more about git and that forked git repos do not update. Anyway, I made some changes and submitted a new pull request. I am curious to know what you think.

@pquentin
Copy link
Member

Thanks! This is looking much better. I'm very happy with the progress here.

I think we can improve the tests by 1/ simplifying the existing PoolManager one 2/ testing the retries.raise_on_redirect case. Unfortunately I won't be able to figure out the details until tomorrow morning at best. Sorry about that.

I don't know what to think about the drain() name right now. I think I would have just left the function out of the Response class. What I know is that we should not have to comment "this releases the connection!" every time we call this function, it's better to encode this in the name somehow.

@pquentin
Copy link
Member

Here is how I would rewrite test_redirect_preload:

def test_redirect_without_preload_releases_connection(self):
    with PoolManager(block=True, maxsize=2) as http:
        r = http.request(
            "GET", "%s/redirect" % self.base_url, preload_content=False
        )
        assert r._pool.num_requests == 2
        assert r._pool.num_connections == 1
        assert len(http.pools) == 1

It's carefully designed to be as concise as possible but still:

  1. make sure we don't leak any connections
  2. avoid hanging without the fix.

We're checking that with a redirect and preload_content=False, we did make two requests, but only used one connection because the first one was released (objective 1). We use maxsize=2 so that the test does not hang if we do use two connections (objective 2). And just to be sure, we check that we only used one connection pool, as we are at the PoolManager level (objective 1).

(By the way, since it's not the simplest case, I would not place it first in the TestPoolManager class.)

Does that make sense?

(The next step is to figure out how to test the raise_on_redirect case.)

@@ -29,6 +30,17 @@
log = logging.getLogger(__name__)


def drain_conn(response):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For what it's worth, I now believe having drain_conn inside the response class is perfectly fine, I just needed a better name, and we now have that. But keeping the function outside of the class is fine too.)

@tycarac
Copy link
Contributor Author

tycarac commented Mar 26, 2020

@pquentin
The Travis CI build failed with error in the integration testing:

FAIL: test_stress_test_token_bucket (tests.functional.retries.test_bucket.TestTokenBucketThreading)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/urllib3/urllib3/botocore/tests/functional/retries/test_bucket.py", line 109, in test_stress_test_token_bucket
    self.assertTrue(not any(x < (0.3 * mean) for x in distribution))
AssertionError: False is not true

Looks to be inside the AWS Python Boto library. Would this be a test platform failure? Can the Travis CI build be re-triggered?

@tycarac
Copy link
Contributor Author

tycarac commented Mar 27, 2020

@pquentin
The code I have been using to fetch web content to local files is:

try:
    rsp = url_client.request('GET', url)
    _logger.debug(f'> {id:4d} resp code:  {rsp.status}')
    if rsp.status == 200:
        _logger.debug(f'> {id:4d} write:      "{filepath.name}"')
        with filepath.open('wb', buffering=_BUFFER_SIZE) as rfp:
            shutil.copyfileobj(rsp, rfp, length=_BUFFER_SIZE)
except urllib3.exceptions.HTTPError as ex:
    _logger.exception(f'> {id:4d} HTTP error')
except Exception as ex:
    _logger.exception('Unexpected')
    raise ex
finally:
    if rsp:
        rsp.release_conn()

Looking at this code, I am wondering if function release_conn should internally call drain_conn. Initially, when I looked at the code for release_conn, I saw it was catching socket errors, eg timeout, which implied to me either a full response was received or it was not. However, with the pattern to stream content, if I have it right, I am wondering if I did not get http status code 200, then is there a possibility of outstanding data on the socket that is not being read but released back to the pool.

So, should release_conn internally call drain_conn?

@pquentin
Copy link
Member

What do you think of the test mentioned above? Does it make sense? If yes, I suggest including it in your code.

The Travis CI build failed with error in the integration testing:

I don't know why Travis ran this integration test. @sethmlarson I thought this would only apply to release commits?

So, should release_conn internally call drain_conn?

Oh, that's a really good question. I just looked at this, and there's no possibility of outstanding data, because before reusing the connection we call _get_conn, and it closes the socket if needed.

But that means we do have two very similar ways to drain connections. One option is to replace the call to drain_conn() with a call to release_conn(). For now let's keep things like they are, and I'll try to think about it.

Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks great, excellent work @tycarac and thank you for the great reviews @pquentin.

re integration tests, the name of the branch is release-1.25.9 so is being treated as a release candidate branch. That failure can be ignored. Are we good to merge? :)

@tycarac
Copy link
Contributor Author

tycarac commented Mar 28, 2020

@pquentin, @sethmlarson
I had a look at _get_conn and other functions in connectionpool.py. I may be blur but I do not see the mechanism to clear any residual response data. If this mechanism closes the connection, then would we loose advantages of pooling with the overhead of opening connections? Perhaps redirects are infrequent enough that is not a real issue. I was thinking how I could test this.

I had a look at rfc7231 Redirect and most redirect status codes do not state a HTTP body to be prohibited or optional (except 307, 308). It seems silly and very unlikely to have a HTTP body response for a HTTP redirect, but it is allowed. But this is my first time delving into standards and there is much context and practice I do not have.

I am still of the mind to call drain_conn from inside release_conn as a precaution.

Your thoughts?

Separately, I like the simplicity of your test case and I will replace it in the code. Good thinking. I will think about the other test case you talked of.

@pquentin
Copy link
Member

If this mechanism closes the connection, then would we loose advantages of pooling with the overhead of opening connections?

Oh, you're right, we don't want to close connections when we can avoid it. Thanks, I had totally missed that. Your understanding of all this is impressing!

I am still of the mind to call drain_conn from inside release_conn as a precaution.

Can you please open an issue so that we can discuss about this separately? It would be a much broader change than the one we're discussing here, and I'd like to get this code merged.

The change currently looks really good, we just need to find a way to test the if retries.raise_on_redirect: case. (Unfortunately our coverage measurement is broken right now so we need to be extra careful here.) I don't think we can test this without explicitly checking that drain_conn() was called since we raise an exception on the first redirect and I don't think we can figure out if the connection was released or not here.

@pquentin
Copy link
Member

Okay, here is the final bit:

diff --git a/test/with_dummyserver/test_poolmanager.py b/test/with_dummyserver/test_poolmanager.py
index 3f130a94..1160a2b6 100644
--- a/test/with_dummyserver/test_poolmanager.py
+++ b/test/with_dummyserver/test_poolmanager.py
@@ -115,6 +115,7 @@ class TestPoolManager(HTTPDummyServerTestCase):
                         % (self.base_url, self.base_url)
                     },
                     retries=1,
+                    preload_content=False,
                 )
 
             with pytest.raises(MaxRetryError):
@@ -126,8 +127,15 @@ class TestPoolManager(HTTPDummyServerTestCase):
                         % (self.base_url, self.base_url)
                     },
                     retries=Retry(total=None, redirect=1),
+                    preload_content=False,
                 )
 
+            # Even with preload_content=False and raise on redirects, we reused the same
+            # connection
+            assert len(http.pools) == 1
+            pool = http.connection_from_host(self.host, self.port)
+            assert pool.num_connections == 1
+
     def test_redirect_cross_host_remove_headers(self):
         with PoolManager() as http:
             r = http.request(

This changes allows to test the retries_on_redirect case. With this included, I'll finally be able to merge this! I can't push to your branch, so you'll have to do it. Thanks

@pquentin
Copy link
Member

pquentin commented Apr 1, 2020

@tycarac We'll also need a rebase from master to make the new coverage checks happy. I'm happy to do this myself in another pull request. I'll make sure to credit you for this change, of course.

@tycarac
Copy link
Contributor Author

tycarac commented Apr 3, 2020

@pquentin
Sorry for the delay. I will make another pull request shortly. I have been unwell but and keen to continue. There are two issues that can be done later: a) possible issue with draining redirect responses and b) the test on redirect.
If the pull request needs tweaking, please do so to get the release out.

@pquentin
Copy link
Member

pquentin commented Apr 3, 2020

Sorry for the delay. I will make another pull request shortly.

To be clear, you don't need to make another pull request, you can change your existing branch, by merging master, on rebasing on top of master. If I was to do this myself, then another pull request would have been needed, as I can't change your branch. That being said, feel free to open another pull request if it's easier.

I have been unwell but and keen to continue.

Oh, sorry to hear that. Don't forget that health comes first! (And that many other things come before open source work.)

There are two issues that can be done later: a) possible issue with draining redirect responses and b) the test on redirect.

Yeah, I think we can open an issue to figure out a) later. b) however is ready to be included, as mentioned in #1817 (comment).

If the pull request needs tweaking, please do so to get the release out.

No worries, your pull request is not blocking a release. Code waits for releases, but releases don't wait for code. :)

@tycarac
Copy link
Contributor Author

tycarac commented Apr 3, 2020

@pquentin
Made the pull request correctly (I think). GitHub build checks now in progress. More firsts in learning to apply a git diff :-)

@codecov
Copy link

codecov bot commented Apr 3, 2020

Codecov Report

Merging #1817 into master will not change coverage by %.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master     #1817   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           22        22           
  Lines         2007      2008    +1     
=========================================
+ Hits          2007      2008    +1     
Flag Coverage Δ
#unittests 99.60% <100.00%> (+<0.01%) ⬆️
Impacted Files Coverage Δ
src/urllib3/connectionpool.py 100.00% <100.00%> (ø)
src/urllib3/poolmanager.py 100.00% <100.00%> (ø)
src/urllib3/response.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0286734...8b26a02. Read the comment docs.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fantastic work, this is ready for inclusion.

Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a great first contribution, thank you so much for this! 🙌 ❤️
We hope you stay a while and help with more issues if you have the time. :)

Also big thank you to @pquentin for all the reviewing and comments! 💯

@sethmlarson sethmlarson merged commit 29b214a into urllib3:master Apr 3, 2020
sethmlarson pushed a commit to sethmlarson/urllib3 that referenced this pull request Apr 11, 2020
Dobatymo pushed a commit to Dobatymo/urllib3 that referenced this pull request Mar 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

urllib3 hanging inside call to PoolManager request
3 participants