Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance when merging cookies #2875

Open
1 task done
gyula-lakatos opened this issue Oct 2, 2023 · 2 comments
Open
1 task done

Slow performance when merging cookies #2875

gyula-lakatos opened this issue Oct 2, 2023 · 2 comments
Labels
perf Issues relating to performance

Comments

@gyula-lakatos
Copy link

gyula-lakatos commented Oct 2, 2023

There is a quite significant performance degradation when cookies are being used with requests.

httpx._client.BaseClient._merge_cookies calls into the cookiejar here:

        if cookies or self.cookies:
            merged_cookies = Cookies(self.cookies)
            merged_cookies.update(cookies)
            return merged_cookies
        return cookies

When either cookies or self.cookies is not None, then httpx._models.Cookies.__bool__ will call into a method called deepvalues in http.cookiejar.CookieJar by indirectly calling http.cookiejar.CookieJar.__iter__.

httpx._models.Cookies.__bool__:

    def __bool__(self) -> bool:
        for _ in self.jar:
            return True
        return False

http.cookiejar.CookieJar.__iter__:

    def __iter__(self):
        return deepvalues(self._cookies)

Deepvalues is a significant performance hog because it does a lot of things recursively.

A suggested solution is to check cookies and self.cookies for None instead of using __bool__.

I dropped a performance snapshot to illustrate the problem:
image

Here deepvalues takes up almost 17% of the CPU time.

@karpetrosyan karpetrosyan added the perf Issues relating to performance label Oct 3, 2023
Tunglies added a commit to Tunglies/httpx that referenced this issue Oct 14, 2023
@tomchristie
Copy link
Member

Pulling in the stdlib http.cookiejar and urllib.request modules brings in quite a lot of submerged complexity... I'd wonder if we really need that for the httpx.Cookies implementation.

  • Could we instead be using a plain dictionary format under-the-hood for the cookie persistence?
  • Which cookie attributes does httpx honor? For example, does "expires" actually take affect?
  • Can we point to a really clear easy-to-read functional description of http cookies?

@vergeev
Copy link

vergeev commented Nov 5, 2023

Could we instead be using a plain dictionary format under-the-hood for the cookie persistence?

I think the biggest obstacle here is the fact that the jar attribute of Cookies class is a part of a public interface:

* `.jar` - **CookieJar**

Users may want to access it in order to customize the cookie policy (which is set during initialization and accessed when adding a header). So a compatibility layer may be warranted, and it can get complex.

Other than enforcing a cookie policy, the CookieJar appears to be mostly concerned about thread-safety, and it could be replicated in the Cookies class.

Which cookie attributes does httpx honor? For example, does "expires" actually take affect?

It's domain, path and expires. The domain and path are tested, and expires is enforced at the CookieJar level. Each time httpx.Cookies.set_cookie_header is called, it calls http.cookiejar.CookieJar. add_cookie_header, which clears the expired cookies: https://github.com/python/cpython/blob/24b5cbd3dce3fe37cdc787ccedd1e73a4f8cfc3c/Lib/http/cookiejar.py#L1387.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf Issues relating to performance
Projects
None yet
Development

No branches or pull requests

4 participants