Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] cleanup tasks on page and context close #1433

Closed
rwoll opened this issue Jul 14, 2022 · 16 comments
Closed

[Feature] cleanup tasks on page and context close #1433

rwoll opened this issue Jul 14, 2022 · 16 comments

Comments

@rwoll
Copy link
Member

rwoll commented Jul 14, 2022

Follow up to #1402

@cyberfuhrer
Copy link

After update:

Task was destroyed but it is pending!
task: <Task pending coro=<BrowserContext._on_route() running at /usr/local/lib/python3.7/dist-packages/playwright/_impl/_browser_context.py:187> wait_for=<_GatheringFuture pending cb=[<TaskWakeupMethWrapper object at 0x7f109de7faf8>()]>>
Task was destroyed but it is pending!
task: <Task pending coro=<BrowserContext._on_route() running at /usr/local/lib/python3.7/dist-packages/playwright/_impl/_browser_context.py:187> wait_for=<_GatheringFuture pending cb=[<TaskWakeupMethWrapper object at 0x7f109dd42eb8>()]>>
Task was destroyed but it is pending!
...
...

many many warnings, and

Exception in callback SyncBase._sync.<locals>.<lambda>(<Task finishe...===========')>) at /usr/local/lib/python3.7/dist-packages/playwright/_impl/_sync_base.py:85
handle: <Handle SyncBase._sync.<locals>.<lambda>(<Task finishe...===========')>) at /usr/local/lib/python3.7/dist-packages/playwright/_impl/_sync_base.py:85>
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_sync_base.py", line 85, in <lambda>
    task.add_done_callback(lambda _: g_self.switch())
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_helper.py", line 282, in impl
    )(route, request)
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_impl_to_api_mapping.py", line 116, in wrapper_func
    *list(map(lambda a: self.from_maybe_impl(a), args))[:arg_count]
  File "plw.py", line 162, in handle_route
    response = page.request.fetch(route.request)
  File "/usr/local/lib/python3.7/dist-packages/playwright/sync_api/_generated.py", line 14231, in fetch
    ignoreHTTPSErrors=ignore_https_errors,
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_sync_base.py", line 89, in _sync
    return task.result()
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_fetch.py", line 321, in fetch
    "ignoreHTTPSErrors": ignoreHTTPSErrors,
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_connection.py", line 44, in send
    lambda: self.inner_send(method, params, False)
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_connection.py", line 370, in _
    return await result
  File "/usr/local/lib/python3.7/dist-packages/playwright/_impl/_connection.py", line 78, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Request context disposed.

@ja8zyjits
Copy link

Will closing all the bg tasks the best idea? #1430

@rwoll
Copy link
Member Author

rwoll commented Jul 18, 2022

I dont think closing all the background tasks would be the right thing to do. Since we might have background async tasks of the remaining tests running.
This will cancel everything and make the running program hang.

@ja8zyjits Can you share an example script/repo where you see it hanging and/or what kind of background async tasks you are creating? I want to make sure I understand the issue properly. Where/when are you entering/exiting the Context Manager?

To clarify: the patch you linked to cancels the tasks on Context Manager exit, and only if Playwright "owns" the event loop (i.e. only if it created the event loop). I think it's reasonable design to say: when you enter the Playwright Context and don't provide an event loop, Playwright will create the event loop, and cancel all tasks on Context Manager exit.

Thanks!

@ja8zyjits
Copy link

the patch you linked to cancels the tasks on Context Manager exit, and only if Playwright "owns" the event loop (i.e. only if it created the event loop)

I just read the full code, I missed the part where own loop condition was being checked. I think that wont kill the background tasks that was created by the already running loop.
My bad.

@ja8zyjits
Copy link

The tracemalloc gives us an insight into the memory allocation rising. Run this piece of code

# memory_leak.py
import asyncio
import tracemalloc
from playwright._impl._connection import from_channel
from playwright.async_api import async_playwright
from playwright._impl._api_types import Error


async def create_browser_and_page(playwright):
    chromium = playwright.chromium  # or "firefox" or "webkit".
    browser = await chromium.launch()
    page = await create_page(browser)
    return browser, page


async def create_page(browser):
    page = await browser.new_page()
    await page.route("*", intercept_request)
    return page


async def intercept_request(route, request):
    if request.resource_type in ["stylesheet", "font"]:
        await route.abort()
    else:
        await route.continue_()


async def snap_shot():
    tracemalloc.start()
    session = await async_playwright().start()
    browser, page = await create_browser_and_page(session)
    snapshot1 = tracemalloc.take_snapshot()
    for _ in range(10):
        await page.goto("http://httpbin.org")
        snapshot = tracemalloc.take_snapshot()
        diff_snaps = snapshot.compare_to(snapshot1, "lineno")
        print_snaps(diff_snaps)
    snapshot2 = tracemalloc.take_snapshot()
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')
    print("############ final difference in snapshot is ###############")
    print_snaps(top_stats)
    await page.close()
    await browser.close()
    await session.stop()


def print_snaps(diff_snaps):
    print("\n\n****************************")
    print("[ Top 10 differences ]")
    for stat in diff_snaps[:10]:
        print(stat)
    print("****************************\n\n")


asyncio.run(snap_shot())

The result would be somewhat similar to this

$ python memory_leak.py
############ final difference in snapshot is ###############


****************************
[ Top 10 differences ]
/usr/lib/python3.8/json/decoder.py:353: size=400 KiB (+334 KiB), count=5206 (+4337), average=79 B
/usr/lib/python3.8/site-packages/playwright/_impl/_network.py:575: size=197 KiB (+197 KiB), count=1948 (+1948), average=103 B
/usr/lib/python3.8/site-packages/pyee/_base.py:41: size=112 KiB (+102 KiB), count=1315 (+1199), average=87 B
/usr/lib/python3.8/site-packages/playwright/_impl/_connection.py:354: size=166 KiB (+89.4 KiB), count=808 (+429), average=210 B
/usr/lib/python3.8/tracemalloc.py:185: size=69.2 KiB (+69.2 KiB), count=1477 (+1477), average=48 B
/usr/lib/python3.8/linecache.py:137: size=1873 KiB (+59.4 KiB), count=18033 (+647), average=106 B
/usr/lib/python3.8/tracemalloc.py:121: size=58.9 KiB (+58.9 KiB), count=837 (+837), average=72 B
/usr/lib/python3.8/site-packages/playwright/_impl/_connection.py:352: size=82.4 KiB (+57.4 KiB), count=1318 (+918), average=64 B
/usr/lib/python3.8/tracemalloc.py:472: size=46.8 KiB (+46.8 KiB), count=1186 (+1186), average=40 B
/usr/lib/python3.8/tracemalloc.py:111: size=46.2 KiB (+46.2 KiB), count=591 (+591), average=80 B
****************************

The network and connection module is adding to the memory. Shouldn't we have a stable memory allocation over a certain number of page requests?

@rwoll
Copy link
Member Author

rwoll commented Jul 18, 2022

Looks like the snippet is starting (session = await async_playwright().start()) and stopping (await session.stop()) the PW session manually. async_playwright().start() is mostly for REPL interaction.

Code should use context manager:

async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("http://playwright.dev")
        print(await page.title())
        await browser.close()

which will ensure more proper cleanup the way it's currently implemented.

Also, when filing the perf and mem issues, can you clarify the scenario under which you are hitting them? In other words, are you hitting them organically (and then producing an inorganic repro) and is it preventing you from using Playwright, or are you finding them only when stress testing/hunting?

For prioritizing, it's helpful for us to understand the context in which you're seeing these issues. Thanks!

@ja8zyjits
Copy link

ja8zyjits commented Jul 19, 2022

or are you finding them only when stress testing/hunting

yes, even if you are having a context manager, after the browser and page object is created, if we keep hitting the website continuously, we could see the memory profile rising.

We detected it when our docker containers running tests failed after oom. Long tests, fails. But if we do the unittests based on pytest plugins we didn't detect any anomaly yet.

We had similar tests in pyppeteer, and no such memory issue was found.

I wrote the above script as a simpler reproducing step and it could properly point out at the issue I believe. Shouldn't the network tasks destroy unwanted objects before the next request?

@rwoll
Copy link
Member Author

rwoll commented Jul 19, 2022

Shouldn't the network tasks destroy unwanted objects before the next request?

Oh, I misread your snippet. No,

    for _ in range(100000):
        await page.goto("http://httpbin.org")

is known to retain per-request metadata.

Each test should close the BrowserContext (if it's not auto-closed by your test runner).

But if we do the unittests based on pytest plugins we didn't detect any anomaly yet.

The official pytest plugin uses a BrowserContext per test and closes it, so that's likely why you aren't seeing the issue.

/cc @mxschmitt

@ja8zyjits
Copy link

Each test should close the BrowserContext (if it's not auto-closed by your test runner).

Oh, It would have been great if we could have a method to clear up the request meta-data.

We have tests that use the cookies that we set at the first request, being used at the 100+ tests later. This makes it sure that some pages cannot be visited without visiting a number of other pages.

If we clear the context then that will make it difficult for us to test such sequential actions.

@rwoll
Copy link
Member Author

rwoll commented Jul 19, 2022

We have tests that use the cookies that we set at the first request, being used at the 100+ tests later. This makes it sure that some pages cannot be visited without visiting a number of other pages.

Tests shouldn't depend on other tests, otherwise it makes them impossible to retry without trying all the tests before them. This can also prevent parallelizing your tests, as well as introduce flakes.

We recommend using a setup fixture/function that exports storage state (https://playwright.dev/python/docs/api/class-browsercontext#browser-context-storage-state) that can then be loaded into each isolated test via https://playwright.dev/python/docs/api/class-browser#browser-new-context-option-storage-state. Having a new context per test is the assumed model for tests.

@ja8zyjits
Copy link

We recommend using a setup fixture/function that exports storage state (https://playwright.dev/python/docs/api/class-browsercontext#browser-context-storage-state) that can then be loaded into each isolated test via https://playwright.dev/python/docs/api/class-browser#browser-new-context-option-storage-state. Having a new context per test is the assumed model for tests.

This sounds good and very useful too. I will surely try this.

Also, can you point out the exact line of code where this retention of per-request metadata is happening? I would like to hack and play around it in the repl env, would also help me understand the code base a bit better.

@rwoll
Copy link
Member Author

rwoll commented Jul 19, 2022

Also, can you point out the exact line of code where this retention of per-request metadata is happening? I would like to hack and play around it in the repl env, would also help me understand the code base a bit better.

This isn't a specific line, but hopefully this helps (along with some example code and print statements):

Here's some pointers:

  • The Request objects extend ChannelOwner (see definition in _network.py of Request)
  • ChannelOwner has a concept of parent and retains objects (see _connection.py)
  • So as long as we have a reference to a ChannelOwner we keep references to the object(s) in its tree around
  • ChannelOwner has a _dispose method that cleans/removes itself and its tree

Given the following print statements added:

diff --git a/playwright/_impl/_connection.py b/playwright/_impl/_connection.py
index f9790ff..967cb43 100644
--- a/playwright/_impl/_connection.py
+++ b/playwright/_impl/_connection.py
@@ -117,6 +117,8 @@ class ChannelOwner(AsyncIOEventEmitter):
         self._connection._objects[guid] = self
         if self._parent:
             self._parent._objects[guid] = self
+        if (self._type in ["BrowserContext", "Request"]):
+            print(f"CREATE of {self._guid} with PARENT {self._parent._guid}")
 
     def _dispose(self) -> None:
         # Clean up from parent and connection.
@@ -129,6 +131,9 @@ class ChannelOwner(AsyncIOEventEmitter):
             object._dispose()
         self._objects.clear()
 
+        if (self._type in ["BrowserContext", "Request"]):
+            print(f"DISPOSE of {self._guid}")
+
 
 class ProtocolCallback:
     def __init__(self, loop: asyncio.AbstractEventLoop) -> None:

If we run:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        page = await context.new_page()

        for i in range(0, 10):
          print(f"GOTO…https://httpbin.org/status/200")
          await page.goto("https://httpbin.org/status/200")
          print("END GOTO")
        
        print("CLOSING CONTEXT")
        await context.close()
        print("CONTEXT CLOSED")
        await browser.close()

asyncio.run(main())

We get the following output:

CREATE of browser-context@ae7c236c2be1bfca6392e89211f2fcd8 with PARENT browser@5a977e1c91d7c3b3f5d2064f120499b2
GOTO…https://httpbin.org/status/200
CREATE of request@a3e3f745f734e0244cc5e1bb6b62771b with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@e7c0c2d94214aed4e055657a55ef0409 with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@5507188480fe2fa06342f327c82fc5ea with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@ca15845df5a8d2daefe6a34f28b71a92 with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@9440dae25eb460396d716896e59d9967 with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@2e0128e3a72f8fcfa82a81edd3cf2b96 with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@da364febf26d8dde6f71759a157a598a with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@7bcab194c95a8fb4a400452d840b4359 with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@b83aa8ee827287422e15613f792cc1cb with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
GOTO…https://httpbin.org/status/200
CREATE of request@3c99b702976d1900e558b9a85a3d32ec with PARENT browser-context@ae7c236c2be1bfca6392e89211f2fcd8
END GOTO
CLOSING CONTEXT
DISPOSE of request@a3e3f745f734e0244cc5e1bb6b62771b
DISPOSE of request@e7c0c2d94214aed4e055657a55ef0409
DISPOSE of request@5507188480fe2fa06342f327c82fc5ea
DISPOSE of request@ca15845df5a8d2daefe6a34f28b71a92
DISPOSE of request@9440dae25eb460396d716896e59d9967
DISPOSE of request@2e0128e3a72f8fcfa82a81edd3cf2b96
DISPOSE of request@da364febf26d8dde6f71759a157a598a
DISPOSE of request@7bcab194c95a8fb4a400452d840b4359
DISPOSE of request@b83aa8ee827287422e15613f792cc1cb
DISPOSE of request@3c99b702976d1900e558b9a85a3d32ec
DISPOSE of browser-context@ae7c236c2be1bfca6392e89211f2fcd8
CONTEXT CLOSED

Note that we don't dispose of the request objects until their parent (BrowserContext) is disposed.

Does that help?

@ja8zyjits
Copy link

Does that help?

Yes it does, thank you for taking time to explain this. This does help me dig into what is happening underneath. Playing with that piece of code help me understand what all details are being retained.

Also If I may ask, shouldn't we be able to clear up all the request-meta and frame-meta data once the page is cleared? Are those data required later?

@rwoll
Copy link
Member Author

rwoll commented Jul 20, 2022

Also If I may ask, shouldn't we be able to clear up all the request-meta and frame-meta data once the page is cleared? Are those data required later?

Before I answer, I think it's important to point out the recommendation is to use a BrowserContext per-test, so on one-hand it doesn't matter that it's retained. Playwright's model doesn't expect you to call page.goto(…) 1000 times in a row.

To answer: yes, if a goto response (or other request) is stored (as in you store it in a variable) and then interact with it after other navigations, it should still have data. When you call context.close(), that's considered the end of the lifetime of the objects, so it's ok for it to be disposed then.

@ja8zyjits
Copy link

Before I answer, I think it's important to point out the recommendation is to use a BrowserContext per-test,

Yes I understand the utility of context here.

with it after other navigations, it should still have data

Shouldn't this end with the page being disposed rather than the context? Since a request-response is linked to the parent page rather than the entire context?

@rwoll rwoll removed the v1.24 label Jul 20, 2022
@rwoll
Copy link
Member Author

rwoll commented Jul 20, 2022

Shouldn't this end with the page being disposed rather than the context? Since a request-response is linked to the parent page rather than the entire context?

On one hand yes, but we consider it context-scoped.


Closing this as part of triage in anticipation of the 1.24 cut. Once that's done, if there are related items, please file specific tickets that include repros and the usecases from which they were derived.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants