
[asyncio] ERROR: Task was destroyed but it is pending! #233

Closed
gottogethelp opened this issue Sep 12, 2023 · 1 comment
Labels: duplicate


gottogethelp commented Sep 12, 2023

Running the following MRE (minimal reproducible example) scraper, I intermittently get the error from the title partway through processing the URLs:

import logging

from scrapy import Request, Spider
from scrapy.crawler import CrawlerProcess


class FlashscoreSpider(Spider):
    name = "flashscore"
    custom_settings = {
        # Route both HTTP and HTTPS downloads through scrapy-playwright
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
        "LOG_LEVEL": logging.ERROR,
    }

    start_urls = [
        "https://www.flashscore.com/match/WKM03Vff/#/match-summary/match-summary",
        "https://www.flashscore.com/match/6go7eBHA/#/match-summary/match-summary",
        "https://www.flashscore.com/match/W0rJh91T/#/match-summary/match-summary",
        "https://www.flashscore.com/match/4lDXBW1i/#/match-summary/match-summary",
        "https://www.flashscore.com/match/v75p9UoA/#/match-summary/match-summary",
        "https://www.flashscore.com/match/4EjNNOJF/#/match-summary/match-summary",
        "https://www.flashscore.com/match/rT3yPgsm/#/match-summary/match-summary",
        "https://www.flashscore.com/match/hbHHSeqM/#/match-summary/match-summary",
        "https://www.flashscore.com/match/pjiBe6zN/#/match-summary/match-summary",
        "https://www.flashscore.com/match/KOs4lXMu/#/match-summary/match-summary",
        "https://www.flashscore.com/match/UswTO2Nc/#/match-summary/match-summary",
        "https://www.flashscore.com/match/OtjFfQkT/#/match-summary/match-summary",
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield Request(
                url=url,
                # playwright=True sends the request through the Playwright handler
                meta=dict(dont_redirect=True, playwright=True),
                callback=self.parse,
            )

    def parse(self, response):
        print(f"Parsing {response.url}")


if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(FlashscoreSpider)
    process.start()

It doesn't happen every time, but when it does it usually repeats. Here's the terminal output:

Parsing https://www.flashscore.com/match/4EjNNOJF/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/6go7eBHA/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/v75p9UoA/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/4lDXBW1i/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/W0rJh91T/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/hbHHSeqM/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/rT3yPgsm/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/WKM03Vff/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/KOs4lXMu/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/UswTO2Nc/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/OtjFfQkT/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/pjiBe6zN/#/match-summary/match-summary
2023-09-12 00:09:20 [asyncio] ERROR: Task was destroyed but it is pending!
task: <Task pending name='Task-4858' coro=<ScrapyPlaywrightDownloadHandler._make_request_handler.<locals>._request_handler() running at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/scrapy_playwright/handler.py:529> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[gather.<locals>._done_callback() at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/asyncio/tasks.py:720]>
2023-09-12 00:09:20 [asyncio] ERROR: Task was destroyed but it is pending!
task: <Task pending name='Task-4857' coro=<Page._on_route() running at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/playwright/_impl/_page.py:249> wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()]> cb=[AsyncIOEventEmitter._emit_run.<locals>.callback() at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/pyee/asyncio.py:65]>

When running a longer sequence of URLs, the errors appear intermittently, usually in blocks of quite a few together.

It doesn't affect further processing, though: when I create items they flow into the pipelines and are processed successfully there despite the errors all over the place.

Am I doing something wrong here?
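For context, my understanding is that asyncio emits this message from Task.__del__ when a Task object is garbage-collected while still pending, i.e. it never ran to completion and was never cancelled or awaited. Judging by the tracebacks, the pending tasks here appear to be the handler's route/request coroutines still in flight when their page goes away. Here is a minimal standalone sketch of the mechanism as I understand it (plain asyncio, nothing Scrapy-specific; the worker name is just for illustration):

import asyncio
import gc


async def worker():
    await asyncio.sleep(10)  # still pending when the loop shuts down


loop = asyncio.new_event_loop()
task = loop.create_task(worker())
loop.run_until_complete(asyncio.sleep(0))  # let worker() start, then stop the loop
loop.close()  # closes the loop without cancelling the still-pending task

del task
gc.collect()  # the pending Task is destroyed here and asyncio logs:
              # "Task was destroyed but it is pending!"

I don't create any tasks myself in the spider above, so these presumably originate inside scrapy-playwright / Playwright, which is why I'm reporting it here.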

python: 3.10.8
scrapy: 2.8.0
scrapy-playwright: 0.0.32
macOS: 13.5.1

@elacuesta (Member) commented Sep 12, 2023

Indeed, I can reproduce. This was already reported in #188, but that issue was lacking a reproducible example; let's continue the discussion at #188.

elacuesta closed this as not planned on Sep 12, 2023
elacuesta added the duplicate label on Sep 12, 2023