Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor cookie handling in CookiesMiddleware #5946

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

emarondan
Copy link
Contributor

@emarondan emarondan commented Jun 9, 2023

  • Refactoring cookie handling in CookiesMiddleware
  • Adding test case for off-domain jar storage

Can close #5841

@codecov
Copy link

codecov bot commented Jun 22, 2023

Codecov Report

Merging #5946 (c1a5e15) into master (4dacad0) will decrease coverage by 46.39%.
Report is 366 commits behind head on master.
The diff coverage is 65.84%.

❗ Current head c1a5e15 differs from pull request most recent head 4d58842. Consider uploading reports for the commit 4d58842 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5946       +/-   ##
===========================================
- Coverage   88.85%   42.47%   -46.39%     
===========================================
  Files         162      163        +1     
  Lines       11057    11550      +493     
  Branches     1801     1880       +79     
===========================================
- Hits         9825     4906     -4919     
- Misses        954     6269     +5315     
- Partials      278      375       +97     
Files Changed Coverage Δ
scrapy/__init__.py 84.21% <0.00%> (ø)
scrapy/cmdline.py 0.00% <0.00%> (-67.97%) ⬇️
scrapy/commands/shell.py 0.00% <0.00%> (-92.86%) ⬇️
scrapy/downloadermiddlewares/cookies.py 37.50% <0.00%> (-60.40%) ⬇️
scrapy/downloadermiddlewares/decompression.py 32.75% <0.00%> (-67.25%) ⬇️
scrapy/downloadermiddlewares/httpcache.py 33.73% <ø> (-60.25%) ⬇️
scrapy/linkextractors/__init__.py 75.00% <ø> (-25.00%) ⬇️
scrapy/mail.py 38.88% <0.00%> (-39.77%) ⬇️
scrapy/squeues.py 54.32% <ø> (-45.68%) ⬇️
scrapy/contracts/__init__.py 18.51% <20.00%> (-65.08%) ⬇️
... and 74 more

... and 64 files with indirect coverage changes

📢 Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy).

@emarondan emarondan changed the title [WIP] Refactor cookie handling in CookiesMiddleware Refactor cookie handling in CookiesMiddleware Jun 22, 2023
@wRAR wRAR requested a review from Gallaecio June 27, 2023 13:56
@emarondan emarondan force-pushed the cookie-setting-5841 branch 2 times, most recently from 7fa0e79 to 69bb49d Compare June 29, 2023 19:24
cookie.domain = request_domain

jar.set_cookie_if_ok(cookie, request)
cookie.domain = request_domain
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wRAR With this modification the cookie is always set on the jar. The cookie domain must be the same as the request one in order to be set by jar.set_cookie_if_ok()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand why the logic is inverted by this change.

jar.set_cookie_if_ok(cookie, request)
else:
print(f'seting cookie {cookie.__dict__}')
jar.set_cookie(cookie)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the domain is not public, the cookie should be set anyway.

@@ -646,7 +646,7 @@ def test_user_set_cookie_domain_suffix_public_period(self):
"https://foo.co.uk",
"https://bar.co.uk",
"co.uk",
cookies1=False,
cookies1=True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the modification made on the handler, I think this test should change since the cookie will be present on the first request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why should the cookies be present on the first request and absent on the second one, as both requests are similar? Does https://datatracker.ietf.org/doc/html/rfc6265#section-5.3 say the cookie should be modified to be for foo.co.uk? I'm not sure yet.

@@ -655,7 +655,7 @@ def test_user_set_cookie_domain_suffix_public_private(self):
"https://foo.blogspot.com",
"https://bar.blogspot.com",
"blogspot.com",
cookies1=False,
cookies1=True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above test.

@Gallaecio
Copy link
Member

I don’t think the changes to existing tests are correct. I think we need to change the implementation instead to make sure they continue to pass as they were.

Also, the cookie processing method affects both requests and responses, so it would be good to make sure we have a test to make sure that, if we get a Set-Cookie header from a request with a cookie set for an unrelated domain, that the cookie is not added to the cookiejar. We trust users, not servers.

- adding test to make sure we don't add a cookie with an unrelated domain from a response
Comment on lines 758 to 761
response.headers["Set-Cookie"] = "asd=fgh; domain=d.example"
self.mw.process_response(request, response, spider=None)
jar = self.mw.jars[None]
assert not jar._cookies.get("c.example", {}).get("/", {}).get("asd")
Copy link
Member

@Gallaecio Gallaecio Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the new approach is, I think, that it also applies to server-side cookies.

I think this test is not right, and fixing it as follows:

Suggested change
response.headers["Set-Cookie"] = "asd=fgh; domain=d.example"
self.mw.process_response(request, response, spider=None)
jar = self.mw.jars[None]
assert not jar._cookies.get("c.example", {}).get("/", {}).get("asd")
response.headers["Set-Cookie"] = "asd=fgh; domain=c.example"
self.mw.process_response(request, response, spider=None)
jar = self.mw.jars[None]
assert not jar._cookies.get("c.example", {}).get("/", {}).get("asd") # I wonder if we can just assert that c.example is not in the jar to simplify the test.

Will probably make it fail with the new approach.

It would not be very clean, but I wonder if it would be possible to change the policy right before setting user-defined cookies (and only user-defined cookies, not Set-Cookie headers from servers), and restore it right after.

If not that way, we need to think of something else.

Emmanuel Rondan and others added 2 commits September 7, 2023 16:27
…dd a cookie or not is made on process_response

- Update test case for CookiesMiddleware. Now it test if the domain is present on the cookie jar
@Gallaecio
Copy link
Member

@emarondan I have made a small change to test_process_response_unrelated_domain, to cover the issue I fear we might introduce here. When receiving a cookie in a response for a different domain, the cookie should be ignored altogether; we should not add it to the cookie jar, not for the actual response domain, not for the cookie domain.

Please, see if you can find a way, however unclean, to get test_off_domain_jar_storage to pass without breaking the new test_process_response_unrelated_domain in the process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setting a cookie for a different domain does not work
3 participants