-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process cookies from header only once #4812
base: master
Are you sure you want to change the base?
Conversation
What if a retry (instead of a redirect) needs to happen, because the server sent e.g. a 503 response. Wouldn鈥檛 the retry miss some required cookies? |
Codecov Report
@@ Coverage Diff @@
## master #4812 +/- ##
==========================================
+ Coverage 87.52% 87.54% +0.02%
==========================================
Files 160 160
Lines 9761 9762 +1
Branches 1440 1440
==========================================
+ Hits 8543 8546 +3
+ Misses 955 954 -1
+ Partials 263 262 -1
|
I don't think it would, once the header cookies are processed they are stored in the corresponding cookiejar and added to the appropriate requests. It would be an interesting test case though. |
I think that this and other similar cases (like previously mentioned 302s) required to be covered by tests first (maybe implemented in separate pull request). It think it should be.. regular testcase with
|
I am not able to reproduce this issue on recent master script.pyfrom scrapy import FormRequest
import scrapy
from scrapy.crawler import CrawlerProcess
class LoginSpider(scrapy.Spider):
name = 'login'
# allowed_domains = ['quotes.toscrape.com']
start_urls = ['http://quotes.toscrape.com/login']
def parse(self, response):
csrf_token = response.xpath('//input[@name="csrf_token"]/@value').get()
return FormRequest.from_response(
response,
# formname='',
# formcss='',
formxpath='//form',
formdata={
'csrf_token': csrf_token,
'username': 'admin',
'password': 'admin'
},
callback=self.after_login
)
def after_login(self, response):
logout = response.xpath("//a[text()='Logout']/text()").get()
self.logger.info(logout)
if __name__ == "__main__":
process = CrawlerProcess();
process.crawl(LoginSpider);
process.start() log_output
From this test and on test proposed on [WIP] #6141 I don't see issue with 302's mentioned on #4717 (comment) . Proposed tests rely on possibility to look into CookieJars stored in CookiesMiddleware variable which is not trivial as we don't have implementation of #1878 |
@Gallaecio |
馃毀 Work in progress 馃毀
A first approach to solving #4717. The idea is to process cookies from the header only once, the actual method of solving it can vary (meta key, request attribute, etc). I know it's not much, but I'd like your opinions @Gallaecio @kmike @wRAR.
The following spider works with this patch, while it breaks with the current master (5a38639) - I'm working on some tests to reproduce this behaviour.
Fixes #1992