Importing reactor in spider's file can cause `ReactorAlreadyInstalledError` #5525

adg-mh · 2022-06-09T21:53:17Z

Description

When running a spider with scrapy crawl ... <spider>, due to a change in the way the reactor is installed (possibly introduced in 60c8838), importing reactor from twisted at the top of that spider's python file will now cause an error (in the case where TWISTED_REACTOR is not defined):

twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed

I was able to work around the issue by explicitly setting TWISTED_REACTOR which causes scrapy's install_reactor() helper to be used (which explicitly suppresses that error). This doesn't allow me to use twisted's default reactor per platform (without recreating the logic to do so myself).

Steps to Reproduce

from twisted.internet import reactor at the top of a spider's python file
(importing the reactor module installs the default reactor)

Versions

Scrapy       : 2.6.1
lxml         : 4.9.0.0
libxml2      : 2.9.12
cssselect    : 1.1.0
parsel       : 1.6.0
w3lib        : 1.22.0
Twisted      : 22.4.0
Python       : 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)]
pyOpenSSL    : 22.0.0 (OpenSSL 3.0.3 3 May 2022)
cryptography : 37.0.2
Platform     : Windows-10-10.0.17763-SP0

The text was updated successfully, but these errors were encountered:

Gallaecio · 2022-06-10T05:02:16Z

I think importing the reactor from any code that is executed before the Scrapy code can handle reactor initialization has become a bad practice. I believe the best way forward is to avoid that import until Scrapy has setup the right reactor.

Even if we made it so that your code works, it would still fail if in the future you decide to switch reactors, or Scrapy changes the default reactor.

Now, you have not shared your code, but I imagine you can rewrite it so that the reactor is imported within a function or method that uses it, so that importing the spider file or class does not cause this issue.

That said, @wRAR, could you have a look into this? I wonder if this could be breaking pre-2.6 code, and if we could make it so that things do not break where the installed reactor is the one Scrapy would install.

If we can fix it, maybe we could also log a deprecation warning (e.g. “Importing twisted.internet.reactor before Scrapy initializes it is deprecated. You should move such imports within scopes (e.g. functions, methods) that are only called once Scrapy has initialized the reactor.”), and we should discuss if this is worth adding to 2.6.2 or 2.6.3.

wRAR · 2022-06-10T09:10:11Z

Without thinking too much about this:

ideally no code should import twisted.internet.reactor at the module import time (so at the top level, directly or indirectly), but this is not always possible and also I don't think this is documented at our side
technically, there are several ways to deal with the case where Scrapy wants to install the rector but it's already installed (at least if it's the same type that it wanted); we shouldn't change them before understanding the implications and reasons of the current behavior
it's possible that this indeed breaks pre-2.6 code but I need to try some test cases to check that

adg-mh · 2022-06-10T13:43:37Z

I understand if this isn't going to be supported.

I did consider importing the reactor where it's used, which is a reactor.callLater call inside a spider_opened signal handler.
Generally I try to avoid placing imports inside any code but this is probably a case where I should let that go.

wRAR · 2022-06-10T14:54:26Z

Well, it's possible that it will be supported unless TWISTED_REACTOR is set to non-default, we just need to think about possible problems.

Gallaecio · 2022-06-14T08:45:10Z

I have verified that scrapy runspider works on 2.5.1 but not on 2.6.1 for:

from scrapy import Spider
from twisted.internet import reactor


class ToScrapeComSpider(Spider):
    name = 'toscrape_com'
    start_urls = ['https://toscrape.com']

    def parse(self, response):
        yield {'html': response.text}

Gallaecio · 2022-06-14T10:40:37Z

I don't think this is documented at our side

Turns out we do.

Gallaecio added this to the 2.6.2 milestone Jun 10, 2022

Gallaecio mentioned this issue Jun 13, 2022

2.6.0 breaks calling multiple Spider in CrawlerProcess() #5435

Closed

Gallaecio self-assigned this Jun 14, 2022

Gallaecio mentioned this issue Jun 14, 2022

If TWISTED_REACTOR is None, reuse any pre-installed reactor #5528

Merged

Gallaecio closed this as completed Jun 17, 2022

Gallaecio added a commit to Gallaecio/scrapy that referenced this issue Jun 17, 2022

Cover scrapy#5525 in the 2.6.2 release notes

1261647

Gallaecio mentioned this issue Jun 17, 2022

Update 2.6 release notes #5535

Merged

Gallaecio added a commit that referenced this issue Jun 17, 2022

Cover #5525 in the 2.6.2 release notes (#5535)

54bfb96

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Importing reactor in spider's file can cause `ReactorAlreadyInstalledError` #5525

Importing reactor in spider's file can cause `ReactorAlreadyInstalledError` #5525

adg-mh commented Jun 9, 2022

Gallaecio commented Jun 10, 2022

wRAR commented Jun 10, 2022

adg-mh commented Jun 10, 2022

wRAR commented Jun 10, 2022

Gallaecio commented Jun 14, 2022 •

edited

Gallaecio commented Jun 14, 2022

Importing reactor in spider's file can cause ReactorAlreadyInstalledError #5525

Importing reactor in spider's file can cause ReactorAlreadyInstalledError #5525

Comments

adg-mh commented Jun 9, 2022

Description

Steps to Reproduce

Versions

Gallaecio commented Jun 10, 2022

wRAR commented Jun 10, 2022

adg-mh commented Jun 10, 2022

wRAR commented Jun 10, 2022

Gallaecio commented Jun 14, 2022 • edited

Gallaecio commented Jun 14, 2022

Importing reactor in spider's file can cause `ReactorAlreadyInstalledError` #5525

Importing reactor in spider's file can cause `ReactorAlreadyInstalledError` #5525

Gallaecio commented Jun 14, 2022 •

edited