Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass response_kw to HTTPConnectionPool through HTTPAdapter.send #6523

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

joren485
Copy link
Contributor

@joren485 joren485 commented Sep 9, 2023

Summary

This PR adds kwargs arguments to HTTPAdapter.send, which it passes to HTTPConnectionPool.urlopen in urllib3.

Description

As discussed in #4956, urrllib3 recently changed the default value of enforce_content_length from False to True. The new default seems like a sane choice, but in some use cases the Content-Length header should not be enforced. To change the default behavior, urllib3 allows application code to set the enforce_content_length argument. As far as I know, requests does not have a way to pass this argument to urllib3.

The HTTPConnectionPool.urlopen method in urllib3 has the **response_kw kwargs argument to pass extra arguments down to the response parser. This PR adds a similar to argument to HTTPAdapter.send.

With this PR, users can override the HTTPAdapter.send method to pass extra arguments to HTTPConnectionPool.urlopen. For example, this enables users to explicitly set the enforce_content_length.

Example

import requests

from requests.adapters import HTTPAdapter

class EnforceContentLengthAdapter(HTTPAdapter):
    def send(self, *args, **kwargs):
        kwargs["enforce_content_length"] = False
        return super().send(*args, **kwargs)

s = requests.Session()
s.mount("http://", EnforceContentLengthAdapter())
s.mount("https://", EnforceContentLengthAdapter())

r = s.get("http://localhost:8080/")
print(r.raw.enforce_content_length) # Returns False

@sigmavirus24
Copy link
Contributor

Hi @joren485 ,

Requests is under a feature freeze. Further, even though this may seem like a backwards compatible change, it in fact will likely cause issues for a good percentage of users if they are unware. Many people implement customer HTTPAdapters for various reasons and I'd estimate at least 50% copy the exact signature out of requests when they write the code and don't expect to need to update things after the fact. This effectively expands the signature in a way that isn't backwards compatible as now, someone may start using a library that has too permissive of a lower bound on requests with an adapter that requires this and it will cause confusing and unexpected errors in passing too many arguments. This will cost countless people an unnecessary amount of time trying to debug things.

With that in mind, I could see a potential benefit to adding a separate attribute on the HTTPAdapter that is not added to any function signature (i.e., not to __init__, send, etc.) and is pickled properly that is a dictionary of urllib3_response_options, e.g.,

self.urllib3_response_options = {}

As the default that could then be updated with self.urllib3_response_options.setdefault("enforce_content_length", True). This could then be used in

resp = conn.urlopen(
method=request.method,
url=url,
body=request.body,
headers=request.headers,
redirect=False,
assert_same_host=False,
preload_content=False,
decode_content=False,
retries=self.max_retries,
timeout=timeout,
chunked=chunked,
)
where I would create a copy of the dictionary and then override it with what we have in the kwargs to that call, and the explode (**) into the urlopen call.

Does that make sense to you?

@joren485
Copy link
Contributor Author

@sigmavirus24

That definitely makes sense to me! I pushed a first version of your idea. Is this what you had in mind?

It would work like this:

import requests

from requests.adapters import HTTPAdapter


class EnforceContentLengthAdapter(HTTPAdapter):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.urllib3_response_options.setdefault("enforce_content_length", False)


s = requests.Session()
s.mount("http://", EnforceContentLengthAdapter())
s.mount("https://", EnforceContentLengthAdapter())

r = s.get("http://localhost:8080/")
print(r.raw.enforce_content_length)

I am aware of the feature freeze, but I dot not consider this change a feature, but rather backwards compatibility as the default behavior of requests has changed. Is this PR allowed?

@joren485
Copy link
Contributor Author

joren485 commented Sep 12, 2023

@sigmavirus24 I implemented your suggestions.

I removed the need for self.urllib3_response_options.copy() by providing **self.urllib3_response_options directly to urlopen, because it minimizes the code changes and looks cleaner imo. This way, user code is still not able to override the given arguments (and only provide new arguments like enforce_content_length), as providing duplicate arguments throws a TypeError.

However, if you prefer the use of a separate dictionary with self.urllib3_response_options.copy(), I will happily change it back.

Copy link
Member

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +0 on this. I understand the use case, although it's fairly straightforward to work around. If we're going to add an override hatch directly onto the adapter, it should just override anything in the urlopen kwargs at that point.

@@ -130,6 +130,7 @@ class HTTPAdapter(BaseAdapter):
"_pool_connections",
"_pool_maxsize",
"_pool_block",
"urllib3_response_options",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; the values in here could be anything for the urlopen/_make_request methods. Defining them as response_options seems too tightly scoped. I left another comment below, but perhaps urllib3_urlopen_kwargs or just urllib3_send_kwargs would be more correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to be very clear this week only affect one thing because naming things too generically makes others and they should work for other things. Based on your other comments, I'm thinking we should add the word extra to the band to be clear these are only extra args

Comment on lines +490 to +506
urlopen_kwargs = self.urllib3_response_options.copy()
urlopen_kwargs.update(
{
"method": request.method,
"url": url,
"body": request.body,
"headers": request.headers,
"redirect": False,
"assert_same_host": False,
"preload_content": False,
"decode_content": False,
"retries": self.max_retries,
"timeout": timeout,
"chunked": chunked,
}
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this should be done in the opposite order (defaults updated by self.urllib3_response_options). If we add this feature, we'll almost certainly get future issues with questions like "Why can't I change preload_content or decode_content with the <variable name>?".

If we're going to expose something like this to avoid custom send implementations, it should probably be the full set of arguments. We can still keep the same defaults, but whatever users want to pass us, we should just forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience if there's something like this designed to help people access something as an escape hatch it will be misunderstood and misused. Copying the dictionary and updating it will avoid footguns. No one will override the headers, method, data, or anything else. I'm not particularly strongly invested in this feature as a whole but the other way I think we can enable this (because overriding send is a nightmare) is breaking out the actual call to urlopen into a different method that can be overridden.

@joren485
Copy link
Contributor Author

@nateprewitt Thank you very much for your suggestions, I am happy to implement them. @sigmavirus24 do you agree with the suggested changes?

I understand the use case, although it's fairly straightforward to work around.

Could you expand on this? The only way to work around the new enforce_content_length behavior, that I know of, is to monkey patch urllib3. Do you have a better workaround?

@sigmavirus24
Copy link
Contributor

The only way to work around the new enforce_content_length behavior, that I know of, is to monkey patch urllib3

You can inherit from the adapter and override the send method that you are modifying here.

You could also create a pool manager that sets this parameter and set it on the adapter. Specifically, a Pool manager accepts ConnectionPool kwargs and the Connection Pool accepts response kwargs.

Honestly, it's probably best to update the adapter to pass this to how we create the pool manager now that we're talking about it. It will be far fewer copies and updates here and will look like how folks can already use this today

@sigmavirus24
Copy link
Contributor

Sorry for the twists here @joren485. I've been reviewing this entirely from my phone so much harder to review docs etc at the same time and make a better decision

@joren485
Copy link
Contributor Author

You can inherit from the adapter and override the send method that you are modifying here.

I wanted to prevent the need for such an override with this PR, as HTTPAdapter.send is quite a large method and overriding it to change a single call in the middle of the method would require manually copying any upstream changes.

You could also create a pool manager that sets this parameter and set it on the adapter. Specifically, a Pool manager accepts ConnectionPool kwargs and the Connection Pool accepts response kwargs.

I do not see how this would work, as the connection_pool_kw kwargs set by PoolManager._new_pool are not used in HTTPConnectionPool.urlopen (or HTTPConnectionPool._make_request). PoolManager does not seem to have control over the arguments passed to HTTPConnectionPool.urlopen. Please correct me if I am wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants