Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question]: Task was destroyed but it is pending! in route.abort() #1402

Closed
cyberfuhrer opened this issue Jul 4, 2022 · 11 comments · Fixed by #1412 or #1430
Closed

[Question]: Task was destroyed but it is pending! in route.abort() #1402

cyberfuhrer opened this issue Jul 4, 2022 · 11 comments · Fixed by #1412 or #1430
Assignees

Comments

@cyberfuhrer
Copy link

Your question

Why am I getting an error in route.abort()?
Example:

from playwright.sync_api import sync_playwright

def handle_route(route):
    if route.request.resource_type == "image":
        response = page.request.fetch(route.request)
	if int(response.headers['content-length']) > 500:
	    route.abort()
	else:
	    route.continue_()
    else:
        route.continue_()

playwright = sync_playwright().start()
browser = playwright.chromium.launch()
context = browser.new_context()
context.route('**/*', handle_route)
page = context.new_page()
page.goto(url='https://www.pinterest.com/', timeout=10000)
page.wait_for_timeout(3000)
page.close()
context.close()
browser.close()
Task was destroyed but it is pending!
task: <Task pending coro=<BrowserContext._on_route() running at /usr/local/lib/python3.7/dist-packages/playwright/_impl/_browser_context.py:187> wait_for=<_GatheringFuture pending cb=[<TaskWakeupMethWrapper object at 0x7f2c168870d8>()]>>
...

Playwright version 1.23.0

@rwoll
Copy link
Member

rwoll commented Jul 5, 2022

Thanks for the report—your code looks good and this is likely a leak introduced by the new requests handling code. It will be addressed, but it should be safe to ignore the warning for now.

@x0day
Copy link
Contributor

x0day commented Jul 6, 2022

Thanks for the report—your code looks good and this is likely a leak introduced by the new requests handling code. It will be addressed, but it should be safe to ignore the warning for now.

@rwoll

after this commit 7b424eb, _on_route calling method has changed, where channel closed before _on_route called not taken into account, and this warning is appeared.

1.23.0

self._channel.on(
"route",
lambda params: asyncio.create_task(
self._on_route(
from_channel(params["route"]), from_channel(params["request"])
)
),
)

1.22.0

self._channel.on(
"route",
lambda params: self._on_route(
from_channel(params["route"]), from_channel(params["request"])
),
)

and with this issue and same reason, we also meet this error after upgrade to 1.23.0.

task: <Task pending name='Task-82983' coro=<Page._on_route() running at /opt/venv/lib/python3.8/site-packages/playwright/_impl/_page.py:246> wait_for=<_GatheringFuture pending cb=[<TaskWakeupMethWrapper object at 0x7f1c36070250>()]> cb=[AsyncIOEventEmitter._emit_run.<locals>._callback() at /opt/venv/lib/python3.8/site-packages/pyee/_asyncio.py:55]>
[E 220706 03:05:34 PID:10] [asyncio] Task exception was never retrieved
future: <Task finished name='Task-80531' coro=<Page._disable_interception() done, defined at /opt/venv/lib/python3.8/site-packages/playwright/_impl/_page.py:623> exception=Error('Playwright connection closed')>
Traceback (most recent call last):
  File "/opt/venv/lib/python3.8/site-packages/playwright/_impl/_page.py", line 624, in _disable_interception
    await self._channel.send("setNetworkInterceptionEnabled", dict(enabled=False))
  File "/opt/venv/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 43, in send
    return await self._connection.wrap_api_call(
  File "/opt/venv/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 370, in _
    return await result
  File "/opt/venv/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 64, in inner_send
    callback = self._connection._send_message_to_server(self._guid, method, params)
  File "/opt/venv/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 265, in _send_message_to_server
    self._transport.send(message)
  File "/opt/venv/lib/python3.8/site-packages/playwright/_impl/_transport.py", line 250, in send
    raise Error("Playwright connection closed")
playwright._impl._api_types.Error: Playwright connection closed

rwoll added a commit to rwoll/playwright-python that referenced this issue Jul 7, 2022
rwoll added a commit to rwoll/playwright-python that referenced this issue Jul 7, 2022
rwoll added a commit that referenced this issue Jul 7, 2022
rwoll added a commit to rwoll/playwright-python that referenced this issue Jul 7, 2022
@rwoll
Copy link
Member

rwoll commented Jul 8, 2022

Re-opening. The fix we landed might be non-exhaustive, and I'm not longer convinced it's the correct fix.

@rwoll rwoll reopened this Jul 8, 2022
@ja8zyjits
Copy link

ja8zyjits commented Jul 12, 2022

@rwoll shouldn't we add a self._background_task_tracker.close() in the page._on_close() method too?

@rwoll
Copy link
Member

rwoll commented Jul 12, 2022

@rwoll shouldn't we add a self._background_task_tracker.close() in the page._on_close() method too?

I don't think that's exhaustive. There's a bunch of places where we call create_task and don't await, so it looks like the proper fix is to loop over the pending tasks when we go to exit the Python PW context manager cancelling. I'm currently finishing off some other work, but I will be looping back to this shortly. (To my knowledge, the current release has a leak and annoying log line sometimes, but is not functionally broken/blocking folks. If this is incorrect, please let me know.)

Thanks for your patience!

@ja8zyjits
Copy link

ja8zyjits commented Jul 12, 2022

I don't think that's exhaustive.

At present the issue is mostly at the Page class only, so we need not worry about other areas. If we could clean up at the end of page then that must fix the issue, according to my understanding. Cleaning from the Python PW context manager will be good, but that shall be something that we can look into later.

I'm currently finishing off some other work,

Thats fine, we are thankful for you to maintain this.

the current release has a leak and annoying log line sometimes, but is not functionally broken/blocking folks. If this is incorrect, please let me know

It is a memory leak, so all issues, related to them does arrive once the code runs for a longer period of time. Say 1 hour or so, since we keep on building the pages and forget to clean its related components. So even though the page is deleted the async task objects survive till the code exits.

Run this piece of code:
memory_leak.py

import asyncio
from playwright.async_api import async_playwright


async def run(playwright):
    chromium = playwright.chromium # or "firefox" or "webkit".
    browser = await chromium.launch()
    all_tasks = set()
    for each in range(10):
        await create_navigate_and_destroy(browser)
        tasks = asyncio.all_tasks()
        for task in tasks:
            if "Page._on_route" in str(task.get_coro()):
                all_tasks.add(task.__hash__())
        print(f"unclosed task length is {len(all_tasks)}")
    # other actions...
    await browser.close()

async def create_navigate_and_destroy(browser):
    page = await browser.new_page()
    await page.route("*", intercept_request)
    await page.goto("http://httpbin.org")
    await page.close()

async def intercept_request(route, request):
    if request.resource_type in ["stylesheet", "font"]:
        await route.abort()
    else:
        await route.continue_()

async def main():
    async with async_playwright() as playwright:
        await run(playwright)

async def run_main():
    await main()
    print(f"{len(asyncio.all_tasks())} is the final list")
asyncio.run(run_main())

After running this

$ python memory_leak.py 
unclosed task length is 2
unclosed task length is 4
unclosed task length is 6
unclosed task length is 8
unclosed task length is 10
unclosed task length is 12
unclosed task length is 14
unclosed task length is 16
unclosed task length is 19
unclosed task length is 20
21 is the final list

@cyberfuhrer
Copy link
Author

On some sites, I began to receive such errors

from playwright.sync_api import sync_playwright

def handle_route(route):
	if route.request.resource_type == "image":
		response = page.request.fetch(route.request)
		if 'content-length' in response.headers and int(response.headers['content-length']) > 5000:
			route.abort()
		else:
			route.continue_()
	else:
		route.continue_()

playwright = sync_playwright().start()
browser = playwright.chromium.launch()
context = browser.new_context()
context.route('**/*', handle_route)
page = context.new_page()
page.goto(url='https://www.slideshare.net/', timeout=10000)
page.wait_for_timeout(7000)
page.screenshot(full_page=True, path="block.png")
page.close()
context.close()
browser.close()
Exception in callback SyncBase._sync.<locals>.<lambda>(<Task finishe...===========')>) at /usr/local/lib/python3.7/dist-packages/playwright/_impl/_sync_base.py:85
handle: <Handle SyncBase._sync.<locals>.<lambda>(<Task finishe...===========')>) at /usr/local/lib/python3.7/dist-packages/playwright/_impl/_sync_base.py:85>
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_sync_base.py", line 85, in <lambda>
    task.add_done_callback(lambda _: g_self.switch())
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_helper.py", line 282, in impl
    )(route, request)
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_impl_to_api_mapping.py", line 116, in wrapper_func
    *list(map(lambda a: self.from_maybe_impl(a), args))[:arg_count]
  File "plw_block2.py", line 46, in handle_route
    response = page.request.fetch(route.request)
  File "/usr/local/lib/python3.7/dist-packages/playwright/sync_api/_generated.py", line 14231, in fetch
    ignoreHTTPSErrors=ignore_https_errors,
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_sync_base.py", line 89, in _sync
    return task.result()
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_fetch.py", line 321, in fetch
    "ignoreHTTPSErrors": ignoreHTTPSErrors,
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_connection.py", line 44, in send
    lambda: self.inner_send(method, params, False)
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_connection.py", line 370, in _
    return await result
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_connection.py", line 78, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Request context disposed.
=========================== logs ===========================
→ GET https://cdn.slidesharecdn.com/ss_thumbnails/beagreatproductleader-amplifyoct2019v5-191007205738-thumbnail-3.jpg?cb=1580173593
  user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/104.0.5112.20 Safari/537.36
  accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
  accept-encoding: gzip,deflate,br
  referer: https://www.slideshare.net/
  sec-ch-ua: 
  sec-ch-ua-mobile: ?0
  sec-ch-ua-platform: 
============================================================

@ja8zyjits
Copy link

I have hacked a piece of code, could be used till the official fix is released.

from playwright._impl._connection import from_channel
from playwright._impl._api_types import TimeoutError, Error
# ...........(check above piece of code)
async def create_navigate_and_destroy(browser):
    page = await browser.new_page()
    del page._impl_obj._channel._events["route"]
    page._background_pending_tasks = []
    page._impl_obj._channel.on(
        "route",
        lambda params: page._background_pending_tasks.append(
            asyncio.create_task(
                page._impl_obj._on_route(
                    from_channel(params["route"]), from_channel(params["request"])
                )
            )
        ),
    )
    page.on("close", cleanup_bg_tasks)
    await page.route("*", intercept_request)
    await page.goto("http://httpbin.org")
    await page.close()
# ...........(check above piece of code)

The output shows that there are no pending tasks

$ python memory_leak.py 
unclosed task length is 0
unclosed task length is 0
unclosed task length is 0
unclosed task length is 0
unclosed task length is 0
unclosed task length is 0
unclosed task length is 0
unclosed task length is 0
unclosed task length is 0
unclosed task length is 0
1 is the final list

@ja8zyjits
Copy link

ja8zyjits commented Jul 13, 2022

I don't think that's exhaustive

I believe we need to clean up the routes on every goto, go_forward, reload, go_back and also every time before the page loads a new content based on click or something else 🤔 like before page.on("load") or page.on("request") etc.
Else the pending tasks keeps on increasing.

@ja8zyjits
Copy link

@rwoll this wont resolve the leak issue for longer running tests. kindly investigate from that perspective.

@rwoll
Copy link
Member

rwoll commented Jul 14, 2022

Thanks @ja8zyjits. That's correct, there may be additional patches depending on what guarantees we want to make about task cleanup (and their timing).

1.23.1 patch (likely releasing tomorrow) at least cleans up with the ContextManager close. Playwright may additionally opt to track tasks and clean them up on page and/or context close. Sorry for not putting the update here. I've just created #1433 for folks to follow if they re: additional patches in this area.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants