[Feature] cleanup tasks on page and context close #1433
Comments
After the update there are many, many warnings. Would closing all the background tasks be the best idea? #1430
@ja8zyjits Can you share an example script/repo where you see it hanging, and/or what kind of background async tasks you are creating? I want to make sure I understand the issue properly. Where/when are you entering/exiting the context manager? To clarify: the patch you linked to cancels the tasks on context manager exit, and only if Playwright "owns" the event loop (i.e. only if it created the event loop). I think it's a reasonable design to say: when you enter the Playwright context and don't provide an event loop, Playwright will create the event loop, and cancel all tasks on context manager exit. Thanks!
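To make the described design concrete, here is a minimal sketch in plain asyncio of a context manager that "owns" the loop it creates and cancels any still-pending tasks on exit. The class name `OwnedLoopRunner` is hypothetical; this is not Playwright's actual implementation.

```python
import asyncio


class OwnedLoopRunner:
    """Hypothetical sketch: a context manager that creates (and therefore
    'owns') an event loop, and cancels any still-pending tasks on exit."""

    def __enter__(self):
        self.loop = asyncio.new_event_loop()
        return self.loop

    def __exit__(self, exc_type, exc, tb):
        # Cancel everything still scheduled on the loop we created...
        pending = asyncio.all_tasks(self.loop)
        for task in pending:
            task.cancel()
        # ...and let the cancellations actually propagate before closing.
        if pending:
            self.loop.run_until_complete(
                asyncio.gather(*pending, return_exceptions=True)
            )
        self.loop.close()


async def background():
    await asyncio.sleep(3600)  # a task that would otherwise never finish


with OwnedLoopRunner() as loop:
    task = loop.create_task(background())
    loop.run_until_complete(asyncio.sleep(0))  # let the task start
```

On exit the long-running `background()` task is cancelled instead of being left to hang, which is the behavior the linked patch aims for when Playwright created the loop itself.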
I just read the full code; I had missed a part of it earlier.
The following script reproduces it:

```python
# memory_leak.py
import asyncio
import tracemalloc

from playwright._impl._api_types import Error
from playwright._impl._connection import from_channel
from playwright.async_api import async_playwright


async def create_browser_and_page(playwright):
    chromium = playwright.chromium  # or "firefox" or "webkit"
    browser = await chromium.launch()
    page = await create_page(browser)
    return browser, page


async def create_page(browser):
    page = await browser.new_page()
    await page.route("*", intercept_request)
    return page


async def intercept_request(route, request):
    if request.resource_type in ["stylesheet", "font"]:
        await route.abort()
    else:
        await route.continue_()


async def snap_shot():
    tracemalloc.start()
    session = await async_playwright().start()
    browser, page = await create_browser_and_page(session)
    snapshot1 = tracemalloc.take_snapshot()
    for _ in range(10):
        await page.goto("http://httpbin.org")
        snapshot = tracemalloc.take_snapshot()
        diff_snaps = snapshot.compare_to(snapshot1, "lineno")
        print_snaps(diff_snaps)
    snapshot2 = tracemalloc.take_snapshot()
    top_stats = snapshot2.compare_to(snapshot1, "lineno")
    print("############ final difference in snapshot is ###############")
    print_snaps(top_stats)
    await page.close()
    await browser.close()
    await session.stop()


def print_snaps(diff_snaps):
    print("\n\n****************************")
    print("[ Top 10 differences ]")
    for stat in diff_snaps[:10]:
        print(stat)
    print("****************************\n\n")


asyncio.run(snap_shot())
```

The result would be somewhat similar to this:

```
$ python memory_leak.py
############ final difference in snapshot is ###############
****************************
[ Top 10 differences ]
/usr/lib/python3.8/json/decoder.py:353: size=400 KiB (+334 KiB), count=5206 (+4337), average=79 B
/usr/lib/python3.8/site-packages/playwright/_impl/_network.py:575: size=197 KiB (+197 KiB), count=1948 (+1948), average=103 B
/usr/lib/python3.8/site-packages/pyee/_base.py:41: size=112 KiB (+102 KiB), count=1315 (+1199), average=87 B
/usr/lib/python3.8/site-packages/playwright/_impl/_connection.py:354: size=166 KiB (+89.4 KiB), count=808 (+429), average=210 B
/usr/lib/python3.8/tracemalloc.py:185: size=69.2 KiB (+69.2 KiB), count=1477 (+1477), average=48 B
/usr/lib/python3.8/linecache.py:137: size=1873 KiB (+59.4 KiB), count=18033 (+647), average=106 B
/usr/lib/python3.8/tracemalloc.py:121: size=58.9 KiB (+58.9 KiB), count=837 (+837), average=72 B
/usr/lib/python3.8/site-packages/playwright/_impl/_connection.py:352: size=82.4 KiB (+57.4 KiB), count=1318 (+918), average=64 B
/usr/lib/python3.8/tracemalloc.py:472: size=46.8 KiB (+46.8 KiB), count=1186 (+1186), average=40 B
/usr/lib/python3.8/tracemalloc.py:111: size=46.2 KiB (+46.2 KiB), count=591 (+591), average=80 B
****************************
```
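For readers unfamiliar with the tracemalloc API used above, here is a minimal self-contained illustration of the same snapshot/`compare_to` pattern, with a deliberately retained allocation standing in for the suspected leak:

```python
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

retained = []
for i in range(1000):
    retained.append(str(i) * 50)  # deliberately keep allocations alive

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.compare_to(baseline, "lineno")

# Each entry is a StatisticDiff carrying size/count deltas per source line;
# the retaining line above should show a positive size_diff.
for stat in top_stats[:3]:
    print(stat)
```

A line whose `size_diff` keeps growing across repeated snapshots, as `_network.py` and `_connection.py` do in the report above, is the usual signature of retained objects.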
Looks like the snippet is starting Playwright manually (`async_playwright().start()`). Code should use the context manager instead:

```python
async with async_playwright() as p:
    browser = await p.chromium.launch()
    page = await browser.new_page()
    await page.goto("http://playwright.dev")
    print(await page.title())
    await browser.close()
```

which will ensure proper cleanup the way it's currently implemented. Also, when filing perf and memory issues, can you clarify the scenario under which you are hitting them? In other words, are you hitting them organically (and then producing an inorganic repro), and is it preventing you from using Playwright, or are you finding them only when stress testing/hunting? It's helpful for prioritization to understand the context in which you're seeing these issues. Thanks!
Yes, even when using a context manager: after the browser and page objects are created, if we keep hitting the website continuously we can see the memory profile rising. We detected it when our Docker containers running tests started failing after a while. We had similar tests in pyppeteer, and no such memory issue was found there. I wrote the above script as a simpler reproduction step, and I believe it properly points at the issue. Shouldn't the network tasks destroy unwanted objects before the next request?
Oh, I misread your snippet. No: per-request metadata is known to be retained. Each test should close its BrowserContext (if it's not auto-closed by your test runner). The official pytest plugin uses a BrowserContext per test and closes it, so that's likely why the issue doesn't show up there. /cc @mxschmitt
It would be great if we had a method to clear the request metadata. We have tests that use cookies set during the first request in tests that run 100+ tests later; this ensures that some pages cannot be visited without visiting a number of other pages first. If we clear the context, that will make it difficult for us to test such sequential actions.
Tests shouldn't depend on other tests; otherwise it's impossible to retry one without re-running all the tests before it. Inter-test dependencies can also prevent parallelizing your tests and introduce flakes. We recommend a setup fixture/function that exports storage state (https://playwright.dev/python/docs/api/class-browsercontext#browser-context-storage-state), which can then be loaded into each isolated test via https://playwright.dev/python/docs/api/class-browser#browser-new-context-option-storage-state. Having a new BrowserContext per test is cheap.
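The suggested pattern — export state once, then seed each isolated test from it — can be sketched as below. Plain JSON files and dicts stand in for Playwright's `storage_state` file and `new_context(storage_state=...)`; all function names and the state shape are illustrative, not Playwright APIs.

```python
import json
import os
import tempfile


def global_setup(state_path):
    """One-time setup: run the expensive login/cookie-seeding sequence once
    and export the resulting state (stand-in for
    context.storage_state(path=...))."""
    state = {"cookies": [{"name": "session", "value": "abc123"}], "origins": []}
    with open(state_path, "w") as f:
        json.dump(state, f)


def isolated_test(state_path):
    """Each test starts from a fresh, isolated context seeded with the saved
    state (stand-in for browser.new_context(storage_state=state_path))."""
    with open(state_path) as f:
        state = json.load(f)
    # The test sees the cookies without re-running the login sequence.
    return {c["name"]: c["value"] for c in state["cookies"]}


state_path = os.path.join(tempfile.mkdtemp(), "state.json")
global_setup(state_path)
results = [isolated_test(state_path) for _ in range(3)]
```

Each test gets the session cookie without depending on any earlier test having run, so tests stay retryable and parallelizable.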
This sounds good and very useful too; I will surely try it. Also, can you point out the exact line of code where this retention of per-request metadata happens? I would like to play around with it in a REPL, which would also help me understand the codebase a bit better.
This isn't a specific line, but hopefully some example code and print statements help. Given the following print statements added:

```diff
diff --git a/playwright/_impl/_connection.py b/playwright/_impl/_connection.py
index f9790ff..967cb43 100644
--- a/playwright/_impl/_connection.py
+++ b/playwright/_impl/_connection.py
@@ -117,6 +117,8 @@ class ChannelOwner(AsyncIOEventEmitter):
         self._connection._objects[guid] = self
         if self._parent:
             self._parent._objects[guid] = self
+        if (self._type in ["BrowserContext", "Request"]):
+            print(f"CREATE of {self._guid} with PARENT {self._parent._guid}")

     def _dispose(self) -> None:
         # Clean up from parent and connection.
@@ -129,6 +131,9 @@ class ChannelOwner(AsyncIOEventEmitter):
             object._dispose()
         self._objects.clear()

+        if (self._type in ["BrowserContext", "Request"]):
+            print(f"DISPOSE of {self._guid}")
+
 class ProtocolCallback:
     def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
```

If we run:

```python
import asyncio
from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        page = await context.new_page()
        for i in range(0, 10):
            print("GOTO https://httpbin.org/status/200")
            await page.goto("https://httpbin.org/status/200")
            print("END GOTO")
        print("CLOSING CONTEXT")
        await context.close()
        print("CONTEXT CLOSED")
        await browser.close()


asyncio.run(main())
```

we get output with a CREATE line for every request during the navigations, while the matching DISPOSE lines only appear after CLOSING CONTEXT. Note that we don't dispose of the request objects until their parent context is disposed. Does that help?
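The parent/child ownership being described can be sketched as a toy version of the guid-keyed registry. This is a hypothetical simplification of `ChannelOwner` in `_connection.py`, not the real class: children register themselves on their parent and are only disposed when the parent is.

```python
class Owner:
    """Toy model of a guid-keyed parent/child registry: each child is
    retained in its parent's registry until the parent is disposed."""

    def __init__(self, guid, parent=None):
        self.guid = guid
        self.parent = parent
        self.objects = {}  # child guid -> child object
        self.disposed = False
        if parent is not None:
            parent.objects[guid] = self

    def dispose(self):
        # Dispose children first (depth-first), then clear the registry.
        for child in list(self.objects.values()):
            child.dispose()
        self.objects.clear()
        if self.parent is not None:
            self.parent.objects.pop(self.guid, None)
        self.disposed = True


context = Owner("browser-context")
requests = [Owner(f"request-{i}", parent=context) for i in range(10)]

# Navigations come and go, but the request objects stay registered...
alive_before_close = sum(not r.disposed for r in requests)

# ...until the context itself is disposed (context.close() in Playwright).
context.dispose()
alive_after_close = sum(not r.disposed for r in requests)
```

This mirrors the CREATE/DISPOSE pattern in the output: requests accumulate in the context's registry across navigations and are released together when the context is disposed.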
Yes it does, thank you for taking the time to explain this. It helps me dig into what is happening underneath, and playing with that piece of code helped me understand what details are being retained. If I may ask: shouldn't we be able to clear all the request and frame metadata once the page is closed? Is that data required later?
Before I answer, it's important to point out that the recommendation is to use a BrowserContext per test, so on one hand it doesn't matter that the data is retained; Playwright's model doesn't expect a single long-lived page to be reused across many tests. To answer the question: yes, if a goto response (or another request) is stored in a variable and then interacted with after other navigations, it should still have its data. That data is disposed when the context closes.
Yes, I understand the utility of contexts here. But shouldn't this end with the page being disposed rather than the context, since a request/response is linked to its parent page rather than to the entire context?
On one hand, yes, but we consider it context-scoped. Closing this as part of triage in anticipation of the 1.24 cut. Once that's done, if there are related items, please file specific tickets that include repros and the use cases from which they were derived.
Follow up to #1402