
[Bug]: Browser.close does not shutdown the process immediately #12186

Open
heaven opened this issue Mar 31, 2024 · 16 comments

Comments

heaven commented Mar 31, 2024

Error string

MaxListenersExceededWarning: Possible EventEmitter memory leak detected.

Bug behavior

  • Flaky
  • PDF

Background

The code example above is a simplified version of our Lambda function. Basically, we don't do anything tricky: we load the page, give it a reasonable time to load while monitoring its ready state and source code, and extract the data from it.

The crawler works great, except sometimes the browser freezes. In most cases we catch this when closing the browser. We scrape hundreds of thousands of pages and weren't able to make it work reliably while reusing the same browser instance, even with periodic restarts. Therefore, we close and open the browser on each crawl.

At some point it freezes on browser.close(), so we had to implement a timeout function and try to kill the process, which doesn't work well. Even after sending the SIGKILL signal, the process stays, probably in a zombie state, or just stuck somewhere; I don't know, and we have no instruments to investigate. It must be an upstream bug in Chrome itself.

Expectation

There should be a way to get rid of a faulty process. Regardless of the nature of the failure, we should be able to kill it and start a new one.

Reality

The fact that we're sending SIGKILL and it gets us nowhere tells me the child process is spawned incorrectly and wasn't properly detached from its parent. When something goes wrong, such processes stay around as long as their parent, which can take forever since we have (almost) no control over Lambda's lifecycle. The only solution I can think of is to call exit in the main function, which forcefully restarts the Lambda. Then we handle the faulty response and restart the job.

We started seeing this problem after upgrading to 112+. We are on 122 now but the picture is the same.

(Screenshot attached: 2024-03-31 at 17:22:51)

Puppeteer configuration file (if used)

No response

Puppeteer version

22.4.1

Node version

20.11.1

Package manager

npm

Package manager version

10.2.4

Operating system

Linux

heaven added the bug label Mar 31, 2024

github-actions bot commented Mar 31, 2024

This issue has an outdated Puppeteer version: 22.4.1. Please verify your issue on the latest 22.6.2 version. Then update the form accordingly.



github-actions bot added and removed the invalid label Mar 31, 2024
OrKoN (Collaborator) commented Mar 31, 2024

Thanks for reporting the issue, but @sparticuz/chromium is not a browser version we officially support. Also, we cannot help with debugging Lambda environments. I'd suggest raising an issue with @sparticuz/chromium first, and checking that the Lambda function has sufficient memory and CPU for Chromium, and that no CPU throttling is going on (some serverless environments throttle the CPU once the response has been provided).

OrKoN (Collaborator) commented Mar 31, 2024

MaxListenersExceededWarning: Possible EventEmitter memory leak detected.

note that this is not an error but a warning, and it's very likely not accurate if you expect to launch multiple browser instances from one Node process. You set max listeners to one, but it appears that one Node process actually launches multiple browsers. The code does not appear to check whether a browser instance has already been launched, and it overrides the module-scoped variable. What I suspect happens is that multiple events arrive and multiple browsers are launched, but only the last one launched is closed.
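
A sketch of the guard being described: cache the launch promise so concurrent events reuse one browser instead of overwriting a module-scoped variable. `launchBrowser` and `closeBrowser` are hypothetical stand-ins for `puppeteer.launch()` and `browser.close()`:

```javascript
// Cache the launch *promise*, not the browser object, so two events that
// arrive back-to-back both await the same launch instead of starting two
// browsers and leaking the first one.
let browserPromise = null;

function getBrowser(launchBrowser) {
  if (!browserPromise) browserPromise = launchBrowser();
  return browserPromise;
}

async function releaseBrowser(closeBrowser) {
  const pending = browserPromise;
  browserPromise = null; // let the next event launch a fresh browser
  if (pending) await closeBrowser(await pending);
}
```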

OrKoN closed this as not planned Mar 31, 2024
heaven (Author) commented Mar 31, 2024

@OrKoN I've installed the versions following this guide: https://pptr.dev/chromium-support.

Memory was the first thing I suspected, but in this case there's enough free memory, and I had this issue even after increasing the Lambda's memory a few times. No CPU throttling was ever reported either.

I think this is a similar issue: #9195

We launch only one instance; the problem is not the warning itself but that the child process is not killed properly. Another problem is that I have no idea in which state it stays. If it's a zombie, that's probably not a big deal, as it has freed up most of its resources and just hangs around. But the problem remains.

The code I provided is a simplified version; we launch just one instance and stop it once finished. The process.setMaxListeners(1) call was there to identify process leakage, which it did.

When we launch the browser, we also subscribe to the disconnected event, so if it disconnects unexpectedly we try to kill the process, which likewise leaves the process behind.

OrKoN (Collaborator) commented Mar 31, 2024

@heaven could you please provide a reproducible example outside of Lambda using the Chrome for Testing binaries mentioned on the version support page? Perhaps you could include the events triggering the Lambda function as well (I assume they are concurrent)? The current example cannot be executed. Note that @sparticuz/chromium builds, even with matching versions, can still differ, since they have been customized to run in the AWS Lambda environment. In the sample code you provided, the browserProcess variable is overwritten on concurrent events, and the kill command might not reach the right process.

OrKoN reopened this Mar 31, 2024
github-actions bot added and removed the invalid label Mar 31, 2024
heaven (Author) commented Apr 1, 2024

I wish I could, but I'm struggling to reproduce this. I've been debugging this all day, though I found one website that causes the bug in 4 of 5 crawls: https://canpromos.ca/size/3xl. My observation is that it freezes on pages that take longer to load.

Lambdas are concurrent only in the sense that multiple instances are started; a single instance is always dedicated to one request at a time. So when we initiate 500 calls to a Lambda function concurrently, 500 Lambda instances are started. That being said, the browserProcess variable is "thread safe": it survives between calls but cannot be accessed concurrently by separate invocations. When a consecutive call is received, an already running, idling Lambda is reused.

I've resolved my issue for now by restarting the entire function once there are 5 dead processes hanging around. So I stopped killing the browser process manually when it refuses to quit and only do that when the browser disconnects. Sometimes it takes a few more seconds, other times a few minutes. I think the browser freezes on something; I have no idea where exactly, might be some args 🤷‍♂️. Every time I retry crawling a page that blocked the browser, it succeeds.

The event we send to it could be as simple as:

{
  "url": "https://canpromos.ca/size/3xl"
}

The layer:

{
  "dependencies": {
    "puppeteer-core": "^22.4.1"
  },
  "devDependencies": {
    "@sparticuz/chromium": "^122.0.0"
  }
}

So basically, just create a nodejs folder, create a package.json there, and run npm install. Then make a zip archive of the nodejs folder and create a layer from it (you'd need to upload the archive to S3 first because of the size limit).

OrKoN (Collaborator) commented Apr 1, 2024

Thanks for the details, but I'm not able to test using AWS Lambda. Perhaps open an issue with @sparticuz/chromium still? That project is focused on serverless environments, so perhaps someone there knows what might be causing this.

OrKoN (Collaborator) commented Apr 1, 2024

@heaven have you tried passing dumpio: true to the launch function to see what logs the browser process produces?

OrKoN (Collaborator) commented Apr 1, 2024

Also, try the env var DEBUG=puppeteer:browsers:launcher to log messages from the launch/kill functions.
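
Taken together, the two suggestions amount to something like this sketch; the executable path is a placeholder, and note that DEBUG is normally set in the shell or in the Lambda function's environment-variable configuration, before Node starts:

```javascript
// Launch options enabling browser-side logging; dumpio forwards
// Chromium's own stdout/stderr into the Node process's output.
const debugLaunchOptions = {
  dumpio: true,
  // executablePath: "/tmp/chromium", // assumption: path from @sparticuz/chromium
};

// The launcher trace is controlled by the DEBUG env var, e.g.:
//   DEBUG=puppeteer:browsers:launcher node index.js
// In Lambda, set DEBUG in the function's environment variables instead.
```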

heaven (Author) commented Apr 1, 2024

Yeah, I'll definitely submit one to @sparticuz/chromium. My initial thought was that Puppeteer should be able to get rid of a dead browser whatever the underlying issue is, but now I'm not so certain; it feels like the browser goes into a faulty state and none of Puppeteer's commands ever reach it. I'll try with dumpio, thank you.

heaven (Author) commented Apr 2, 2024

So I tried dumpio: true but did not find anything suspicious there, just some GPU-related complaints when graphicsMode is set to false. When set to true, those complaints are gone, but that doesn't change the overall picture.

A typical log would look like this:

2024-04-02T11:41:29.560Z	2f9f5ec4-58fd-4dce-a2ec-7de890e1981a	INFO	Starting: https://canpromos.ca/size/3xl
2024-04-02T11:41:29.560Z	2f9f5ec4-58fd-4dce-a2ec-7de890e1981a	INFO	Launching browser
START RequestId: 2f9f5ec4-58fd-4dce-a2ec-7de890e1981a Version: $LATEST
2024-04-02T11:41:29.561Z puppeteer:browsers:launcher Launching /tmp/chromium --allow-pre-commit-input --disable-background-networking --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-field-trial-config --disable-hang-monitor --disable-infobars --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-search-engine-choice-screen --disable-sync --enable-automation --export-tagged-pdf --generate-pdf-document-outline --force-color-profile=srgb --metrics-recording-only --no-first-run --password-store=basic --use-mock-keychain --disable-features=Translate,AcceptCHFrame,MediaRouter,OptimizationHints,ProcessPerSiteUpToMainFrameThreshold,Translate,BackForwardCache,AcceptCHFrame,MediaRouter,OptimizationHints,AudioServiceOutOfProcess,IsolateOrigins,site-per-process --enable-features=NetworkServiceInProcess2,NetworkServiceInProcess2,SharedArrayBuffer --headless=new --hide-scrollbars --mute-audio about:blank --allow-pre-commit-input --disable-background-networking --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --enable-automation --enable-blink-features=IdleDetection --export-tagged-pdf --force-color-profile=srgb --metrics-recording-only --no-first-run --password-store=basic --use-mock-keychain --disable-domain-reliability --disable-print-preview --disable-speech-api 
--disk-cache-size=33554432 --mute-audio --no-default-browser-check --no-pings --single-process --font-render-hinting=none --hide-scrollbars --ignore-gpu-blocklist --in-process-gpu --window-size=1920,1080 --disable-webgl --allow-running-insecure-content --disable-setuid-sandbox --disable-site-isolation-trials --disable-web-security --no-sandbox --no-zygote --headless='chrome-headless-shell' --disk-cache-size=268435456 --window-size=1366,4608 --ignore-certificate-errors --disable-remote-fonts --user-data-dir=/tmp/chrome-user-data --remote-debugging-pipe {
  detached: true,
  env: {},
  stdio: [ 'ignore', 'pipe', 'pipe', 'pipe', 'pipe' ]
}
2024-04-02T11:41:29.563Z puppeteer:browsers:launcher Launched 176
[0402/114129.574263:ERROR:egl_util.cc(44)] Failed to load GLES library: /tmp/libGLESv2.so: /tmp/libGLESv2.so: cannot open shared object file: No such file or directory
[0402/114129.600608:ERROR:check.cc(319)] Check failed: false. NOTREACHED log messages are omitted in official builds. Sorry!
[0402/114129.600646:ERROR:gpu_channel_manager.cc(955)] Failed to create GLES3 context, fallback to GLES2.
[0402/114129.600652:ERROR:check.cc(319)] Check failed: false. NOTREACHED log messages are omitted in official builds. Sorry!
[0402/114129.600657:ERROR:gpu_channel_manager.cc(966)] ContextResult::kFatalFailure: Failed to create shared context for virtualization.
[0402/114129.600661:ERROR:shared_image_stub.cc(728)] SharedImageStub: unable to create context
[0402/114129.600668:ERROR:gpu_channel.cc(767)] GpuChannel: Failed to create SharedImageStub
[0402/114129.600686:ERROR:check.cc(319)] Check failed: false. NOTREACHED log messages are omitted in official builds. Sorry!
[0402/114129.600692:ERROR:gpu_channel_manager.cc(955)] Failed to create GLES3 context, fallback to GLES2.
[0402/114129.600695:ERROR:check.cc(319)] Check failed: false. NOTREACHED log messages are omitted in official builds. Sorry!
[0402/114129.600700:ERROR:gpu_channel_manager.cc(966)] ContextResult::kFatalFailure: Failed to create shared context for virtualization.
[0402/114129.600704:ERROR:shared_image_stub.cc(728)] SharedImageStub: unable to create context
[0402/114129.600708:ERROR:gpu_channel.cc(767)] GpuChannel: Failed to create SharedImageStub
2024-04-02T11:41:29.621Z	2f9f5ec4-58fd-4dce-a2ec-7de890e1981a	INFO	Opening page
2024-04-02T11:41:29.646Z	2f9f5ec4-58fd-4dce-a2ec-7de890e1981a	INFO	Loading: https://canpromos.ca/size/3xl
[0402/114138.505549:INFO:CONSOLE(403)] "Turbo mode file downloaded!", source: https://canpromos.ca/wp-content/plugins/woocommerce-products-filter/ext/turbo_mode/js/turbo_mode.js?ver=1.3.4.5 (403)
[0402/114138.560148:INFO:CONSOLE(0)] "Third-party cookie will be blocked. Learn more in the Issues tab.", source: https://canpromos.ca/size/3xl/ (0)
[0402/114139.783212:INFO:CONSOLE(0)] "Third-party cookie will be blocked. Learn more in the Issues tab.", source: https://canpromos.ca/size/3xl/ (0)
[0402/114139.803421:INFO:CONSOLE(0)] "Third-party cookie will be blocked. Learn more in the Issues tab.", source: https://canpromos.ca/size/3xl/ (0)
[0402/114139.919333:INFO:CONSOLE(1)] "[OptinMonster] The origin header was not set in the request.", source: https://a.omappapi.com/app/js/api.min.js (1)
[0402/114141.388245:INFO:CONSOLE(0)] "Third-party cookie will be blocked. Learn more in the Issues tab.", source: https://canpromos.ca/size/3xl/ (0)
[0402/114141.724074:INFO:CONSOLE(0)] "Third-party cookie will be blocked. Learn more in the Issues tab.", source: https://canpromos.ca/size/3xl/ (0)
[0402/114142.900913:INFO:CONSOLE(24)] "[Meta Pixel] - An invalid email address was specified for 'em'. This data will not be sent with any events for this Pixel.", source: https://connect.facebook.net/en_US/fbevents.js (24)

When calling browser.close(), the line puppeteer:browsers:launcher Browser process 176 onExit never appears.

In my opinion, the problem is that the page keeps loading. I set a timeout of 25 seconds, which aborts the promise, but the page keeps loading; then I close it and something doesn't go according to plan.

So instead of closing the page, I did this:

// Close the pages, unbind event listeners, and shut the browser down.
async function shutdown(browser) {
  console.log("Shutting down");

  try {
    const pages = await browser.pages();

    console.log("Pages open:", pages.length);

    for (const page of pages) {
      // Remove our CDP and page-level listeners before navigating away.
      if (page.crawler_cdp_client)
        page.crawler_cdp_client.off("Network.requestIntercepted");

      page.off("request");
      page.off("response");
      page.off("framenavigated");

      // Navigating to a blank page interrupts any in-flight load.
      await page.goto("about:blank");
    }
  } catch (e) {
    console.error("Unexpected error when closing page:", e);
  }

  if (browser) {
    try {
      browser.off("disconnected");

      // Give the browser 5 seconds to shut down gracefully.
      await timeout(
        browser.close().catch((e) => { console.error("Error closing browser:", e); }),
        5000
      ).catch(async () => monitorBrowserStop(browser));
    } catch (e) {
      console.error("Unexpected error when closing browser:", e);
    }
  }
}

And that worked!
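
The shutdown function above relies on a timeout helper that isn't shown in the thread; a minimal sketch of what such a helper might look like (my assumption about its shape, not the author's actual code):

```javascript
// Race the wrapped promise against a timer; whichever settles first wins.
// The timer is always cleared so it cannot keep the process alive.
function timeout(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```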

A reproducible script would probably be: load a heavy page that takes too long (preferably in a limited environment with a slow vCPU), set a navigation timeout, and try closing the page and the browser.
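
Such a reproduction might look roughly like this. It is a sketch, not a verified repro: it assumes puppeteer-core is installed and that executablePath points at a matching Chrome for Testing or @sparticuz/chromium build; the 25-second timeout mirrors the thread:

```javascript
// Navigate to a slow page, abort the navigation with a timeout, then time
// how long browser.close() takes; in the reported bug it never returns.
async function reproduceHang(executablePath, url) {
  const puppeteer = require("puppeteer-core"); // lazy require so the sketch loads without deps
  const browser = await puppeteer.launch({ executablePath, dumpio: true });
  const page = await browser.newPage();

  try {
    await page.goto(url, { timeout: 25000, waitUntil: "networkidle0" });
  } catch (e) {
    console.error("Navigation aborted:", e.message);
  }

  console.time("browser.close");
  await browser.close(); // hangs here in the report
  console.timeEnd("browser.close");
}
```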

github-actions bot added and removed the invalid label Apr 2, 2024
heaven (Author) commented Apr 2, 2024

After this change, the log output looks like this (I've removed dumpio since there was nothing useful in it):

2024-04-02T12:03:51.576Z	ab07639c-8d0b-4d0b-bfda-9567bad35d19	INFO	Starting: https://canpromos.ca/size/3xl
2024-04-02T12:03:51.576Z	ab07639c-8d0b-4d0b-bfda-9567bad35d19	INFO	Launching browser
2024-04-02T12:03:51.577Z puppeteer:browsers:launcher Launching /tmp/chromium --allow-pre-commit-input --disable-background-networking --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-field-trial-config --disable-hang-monitor --disable-infobars --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-search-engine-choice-screen --disable-sync --enable-automation --export-tagged-pdf --generate-pdf-document-outline --force-color-profile=srgb --metrics-recording-only --no-first-run --password-store=basic --use-mock-keychain --disable-features=Translate,AcceptCHFrame,MediaRouter,OptimizationHints,ProcessPerSiteUpToMainFrameThreshold,Translate,BackForwardCache,AcceptCHFrame,MediaRouter,OptimizationHints,AudioServiceOutOfProcess,IsolateOrigins,site-per-process --enable-features=NetworkServiceInProcess2,NetworkServiceInProcess2,SharedArrayBuffer --headless=new --hide-scrollbars --mute-audio about:blank --allow-pre-commit-input --disable-background-networking --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --enable-automation --enable-blink-features=IdleDetection --export-tagged-pdf --force-color-profile=srgb --metrics-recording-only --no-first-run --password-store=basic --use-mock-keychain --disable-domain-reliability --disable-print-preview --disable-speech-api 
--disk-cache-size=33554432 --mute-audio --no-default-browser-check --no-pings --single-process --font-render-hinting=none --hide-scrollbars --ignore-gpu-blocklist --in-process-gpu --window-size=1920,1080 --disable-webgl --allow-running-insecure-content --disable-setuid-sandbox --disable-site-isolation-trials --disable-web-security --no-sandbox --no-zygote --headless='chrome-headless-shell' --disk-cache-size=268435456 --window-size=1366,4608 --ignore-certificate-errors --disable-remote-fonts --user-data-dir=/tmp/chrome-user-data --remote-debugging-pipe {
  detached: true,
  env: {},
  stdio: [ 'ignore', 'ignore', 'ignore', 'pipe', 'pipe' ]
}
2024-04-02T12:03:51.579Z puppeteer:browsers:launcher Launched 60
2024-04-02T12:03:51.626Z	ab07639c-8d0b-4d0b-bfda-9567bad35d19	INFO	Opening page
2024-04-02T12:03:51.661Z	ab07639c-8d0b-4d0b-bfda-9567bad35d19	INFO	Loading: https://canpromos.ca/size/3xl
2024-04-02T12:04:13.760Z	ab07639c-8d0b-4d0b-bfda-9567bad35d19	INFO	Pages open: 1
2024-04-02T12:04:14.006Z puppeteer:browsers:launcher Browser process 60 onExit
END RequestId: ab07639c-8d0b-4d0b-bfda-9567bad35d19
REPORT RequestId: ab07639c-8d0b-4d0b-bfda-9567bad35d19	Duration: 22490.72 ms	Billed Duration: 22491 ms	Memory Size: 1536 MB	Max Memory Used: 767 MB	

So I think there might still be something wrong with Puppeteer.

OrKoN (Collaborator) commented Apr 2, 2024

Interesting, but not what I expected. browser.close() calls https://chromedevtools.github.io/devtools-protocol/tot/Browser/#method-close, which should just close the browser. I wonder if this is a browser issue where it waits for the page to load before closing. Is it possible that the page runs some infinite JS loop that could cause calls to be blocked? Also, perhaps in your scenario you don't care about a graceful close? In that case, calling browser.process().kill() should also work. Why did kill not work before?
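
The non-graceful fallback mentioned here could be sketched as follows; closeOrKill is a hypothetical helper, not Puppeteer API, while browser.close() and browser.process() (which returns the spawned ChildProcess, or null for a remote connection) are real Puppeteer methods:

```javascript
// Try a graceful browser.close() first; if it doesn't finish within `ms`
// milliseconds, fall back to SIGKILL on the spawned browser process.
async function closeOrKill(browser, ms = 5000) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(resolve, ms, "timeout");
  });

  const result = await Promise.race([
    browser.close().then(() => "closed").catch(() => "close-error"),
    deadline,
  ]);
  clearTimeout(timer);

  if (result !== "closed") {
    const proc = browser.process(); // null when connected over WebSocket
    if (proc) proc.kill("SIGKILL");
  }
  return result;
}
```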

OrKoN (Collaborator) commented Apr 2, 2024

@heaven could you include the CDP log (DEBUG=puppeteer:*) for the hanging run?

heaven (Author) commented Apr 2, 2024

Interesting, but not what I expected. browser.close() calls https://chromedevtools.github.io/devtools-protocol/tot/Browser/#method-close, which should just close the browser. I wonder if this is a browser issue where it waits for the page to load before closing. Is it possible that the page runs some infinite JS loop that could cause calls to be blocked? Also, perhaps in your scenario you don't care about a graceful close? In that case, calling browser.process().kill() should also work. Why did kill not work before?

Previously we just did browser.close(); after another upgrade this started hanging occasionally, and we found that closing the pages before closing the browser helps. In the Lambda environment, a single function may run for an indefinite amount of time, so a graceful shutdown is preferable; otherwise it's easy to start hitting limits, from filesystem overflow to memory and even file descriptors. We do currently try to kill the browser when it suddenly disconnects, but that's more of a counter-measure than a solution, and in most cases it disconnects because it has already crashed ;)

@heaven could you include the CDP log (DEBUG=puppeteer:*) for the hanging run?

The log is heavy, attaching as a file:

OrKoN (Collaborator) commented Apr 2, 2024

Thanks. I see that Browser.close sends a success response but does not actually close the browser.

OrKoN closed this as completed Apr 2, 2024
OrKoN reopened this Apr 2, 2024
OrKoN added the confirmed label and removed the invalid label Apr 2, 2024
github-actions bot removed the invalid label Apr 2, 2024
OrKoN added the P2 label Apr 2, 2024
OrKoN changed the title from "[Bug]: Crashed browser processes stuck forever" to "[Bug]: Browser.close does not shutdown the process immediately" Apr 2, 2024
OrKoN added the upstream label and removed the invalid label Apr 2, 2024