Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check Firefox compatibility for Windows #7668

Closed
wants to merge 4 commits into from

Conversation

whimboo
Copy link
Collaborator

@whimboo whimboo commented Oct 11, 2021

As it looks like there might be some incompatibilities with our CDP implementation when Fission is enabled on Windows. This was mentioned in #7659.

I'm going to force disable Fission for now. Maybe this helps to make the failure go away.

@google-cla google-cla bot added the cla: yes label Oct 11, 2021
@whimboo
Copy link
Collaborator Author

whimboo commented Oct 11, 2021

Moving to draft for now because I would like to do some experiments, which might trigger to a backout of e7f8262 (PR #7610).

@whimboo whimboo marked this pull request as draft October 11, 2021 07:56
@whimboo
Copy link
Collaborator Author

whimboo commented Oct 11, 2021

Forcing Fission disabled doesn't actually fix the test. So some other change that I might have done recently in Firefox might have caused it.

I'll check with custom Firefox builds where it might have been introduced.

@whimboo whimboo force-pushed the windows_fission branch 4 times, most recently from 127a948 to 8588ede Compare October 12, 2021 07:41
@whimboo
Copy link
Collaborator Author

whimboo commented Nov 10, 2021

The problem that we currently have with Windows is that Firefox uses a launcher process that initially gets started, and then itself launches the real Firefox process. As such Puppeteer only knows about the launcher process, and that is the process that gets killed by a couple of tests. By doing that the real Firefox process remains running and is blocking the temporary profile from being removed. And these are the failures that happen most right now.

As such I would suggest to find a solution for that problem first, which means that we would need to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1656962 on our side. Requiring client to find the real process I don't see as a possible solution.

CC @OrKoN.

@OrKoN
Copy link
Collaborator

OrKoN commented Nov 10, 2021

@whimboo thanks for the info, can we disable that test for FireFox on Windows for now?

@whimboo
Copy link
Collaborator Author

whimboo commented Nov 10, 2021

@whimboo thanks for the info, can we disable that test for FireFox on Windows for now?

It might not be only a single test. Basically each test that causes BrowserRunner.kill() to be executed.

But hey, I just saw some code which looks kinda weird! Why are we trying to remove the temporary user data directory BEFORE we kill the browser process?

Is that somehow on purpose? From our point of view this will clearly cause issues when the process still has hold on some of the files of the profile. Can we flip that around? I think that this might already help in some cases.

@whimboo
Copy link
Collaborator Author

whimboo commented Nov 10, 2021

I created #7761 to check that.

@whimboo
Copy link
Collaborator Author

whimboo commented Nov 10, 2021

@OrKoN we also kill the browser here:

try {
const connection = await runner.setupConnection({
usePipe: pipe,
timeout,
slowMo,
preferredRevision: this._preferredRevision,
});
const browser = await Browser.create(
connection,
[],
ignoreHTTPSErrors,
defaultViewport,
runner.proc,
runner.close.bind(runner)
);
if (waitForInitialPage)
await browser.waitForTarget((t) => t.type() === 'page', { timeout });
return browser;
} catch (error) {
runner.kill();
throw error;
}
}
.

So what I see is that when waitForTarget() fails for the initial page, we also kill the browser while at this time we could also call browser.close() for a soft-close. Any reason why this isn't done?

OrKoN pushed a commit that referenced this pull request Nov 11, 2021
#7762)

When the browser has been started and we have a valid reference lets make use of it instead of force-killing the process. A force kill should probably be the last resort in cleaning up the process.

This will help with Firefox as described on #7668 (comment).
@whimboo whimboo changed the title Check Fission compatibility on Windows Check Firefox compatibility for Windows Nov 18, 2021
@whimboo
Copy link
Collaborator Author

whimboo commented Nov 18, 2021

We landed a fix in Firefox which connects the launcher process as used by default on Windows with all the other Firefox processes within the same job. At the same time we also set --remote-debugging-port to automatically use the --wait-for-browser command line argument on Windows, which is required by automation clients to wait for the real Firefox process to be launched. With these two changes the results on Windows are looking way better now!

Nevertheless there are remaining issues, and maybe I can continue next week.

@whimboo
Copy link
Collaborator Author

whimboo commented Nov 18, 2021

Actually the test Launcher.spec should filter out ignored default arguments causes troubles. In case of Firefox it removes the --no-remote and --headless command line arguments on Windows. This means the following:

  1. Without --no-remote specified an existing Firefox process will be re-used, and a new Firefox with the Puppeteer profile will not be spawned.
  2. Removing --headless will launch the browser in non-headless mode and ignoring the HEADLESS setting.

While this might work for Chrome it's not appropriate for Firefox. Also depending on the platform and some other options different default arguments are getting set for Firefox, so the above problem will not only happen on Windows.

@OrKoN any suggestions? IMHO we could update the test so that it only removes the very last argument, which seems to always be the about:blank one?

@OrKoN
Copy link
Collaborator

OrKoN commented Nov 19, 2021

@whimboo sorry, I am not really familiar with this part. It looks like ignoreDefaultArgs is an optional parameter (https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#puppeteerlaunchoptions). Do you mean that the test is failing on FF? I didn't get the suggestion about updating the test to only remove the last argument.

@whimboo
Copy link
Collaborator Author

whimboo commented Nov 22, 2021

@whimboo sorry, I am not really familiar with this part. It looks like ignoreDefaultArgs is an optional parameter (https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#puppeteerlaunchoptions). Do you mean that the test is failing on FF? I didn't get the suggestion about updating the test to only remove the last argument.

Ok, so IMHO there should be a distinction between default arguments and mandatory arguments. As such it might be that some of the currently used default arguments for Firefox should actually be mandatory:

firefoxArguments.push('--wait-for-browser');

First the --no-remote argument. It's used across platforms to always create a new instance of Firefox. If that argument is removed (as what the test does) a new window for an already existing Firefox process will be opened. As such this causes side-effects for any following test.

Beside that the test also removes the 3rd default argument which in case of Windows and a headless CI job (which is done by default) will be --headless.

As such I would propose the following:

  1. We move the --no-remote argument to launch(), which would be [similar to --remote-debugging-port] argument, and does NOT allow to ignore it.

  2. We update the test so that it removes one or two other default arguments that are really optional. Not sure if we could even push some other non existent arguments as defaults first, to get these removed before starting the browser.

@OrKoN
Copy link
Collaborator

OrKoN commented Nov 22, 2021

@whimboo is there a case when a user might want to remove --no-remote so that a new window for an already existing Firefox process will be opened? In general, I think it makes sense to move the flags that are mandatory out of the default arguments. Thanks for the explanation! Also, does it make sense to have separate tests for Firefox and Chromium to make sure that the right arguments are removed/not removed?

@whimboo
Copy link
Collaborator Author

whimboo commented Apr 21, 2022

@whimboo is there a case when a user might want to remove --no-remote so that a new window for an already existing Firefox process will be opened? In general, I think it makes sense to move the flags that are mandatory out of the default arguments. Thanks for the explanation! Also, does it make sense to have separate tests for Firefox and Chromium to make sure that the right arguments are removed/not removed?

FYI a patch for that actually landed in #8233 and #8250. Now there are two distinct tests for both Chrome and Firefox. I'll rebase and trigger new checks in a moment.

@whimboo
Copy link
Collaborator Author

whimboo commented Apr 21, 2022

Note that right now there seems to be only a single remaining failure on Windows which would prevent us from re-enabling results in CI. The failing test is:

 Page.evaluate
   should transfer 100Mb of data from page to node.js:

@whimboo
Copy link
Collaborator Author

whimboo commented Apr 25, 2022

 Page.evaluate
   should transfer 100Mb of data from page to node.js:

@OrKoN the error trace as visible in the CI logs is far from being helpful. Do you see what could be wrong here? I'm not able to see what is actually failing here, and I cannot reproduce it locally on my local Windows machine.

@OrKoN
Copy link
Collaborator

OrKoN commented Apr 26, 2022

@whimboo I am not sure what exactly fails but if I read the code right: a CDPSession receives a message that has an ID (thus, a callback is expected as the message is a response to a message from Puppeteer); yet the callback is missing so it looks like an unsolicited response from the browser. I suggest adding DEBUG=puppeteer.* env var to the workflow and inspect the CDP traffic.

@OrKoN
Copy link
Collaborator

OrKoN commented Apr 26, 2022

Ah sorry, you should also probably include a .only (possible with a eslint comment disabling rules we might have). Otherwise, it seems that logging would cause jobs to time out.

@whimboo
Copy link
Collaborator Author

whimboo commented Apr 26, 2022

As it looks like the job on Windows doesn't finish at all. It's running 5:30h now and I think that I should cancel. Not sure how to continue investigation given that I cannot reproduce locally.

https://github.com/puppeteer/puppeteer/runs/6171558887?check_suite_focus=true

@OrKoN
Copy link
Collaborator

OrKoN commented Apr 26, 2022

I think it actually might be the problem with GitHub actions. Let's try to restart it and see if it times out again. I can also try on my Windows locally.

@whimboo
Copy link
Collaborator Author

whimboo commented Apr 26, 2022

@OrKoN it doesn't look like that this has made any difference. As such if you could have a quick look on your local machine I would appreciate! Thanks.

@OrKoN
Copy link
Collaborator

OrKoN commented Apr 27, 2022

Unfortunately, I am unable to repro on my Win too :(

@OrKoN
Copy link
Collaborator

OrKoN commented Apr 27, 2022

Let me run the full test suite: perhaps there is some state leaking from previous tests.

@OrKoN
Copy link
Collaborator

OrKoN commented Apr 27, 2022

Screenshot 2022-04-27 at 10 26 36

I have the following failures locally with this PR (not sure if I have any failures on main).

@whimboo
Copy link
Collaborator Author

whimboo commented Apr 27, 2022

Hm, this is sad to hear that you cannot reproduce the failure locally and that the CI checks do not have the failures that you are seeing locally. :(

As such I think that due to time constraints I'll have to post-pone any work on this PR for some time.

@OrKoN
Copy link
Collaborator

OrKoN commented Dec 12, 2022

I believe we can close this one.

@OrKoN OrKoN closed this Dec 12, 2022
@whimboo
Copy link
Collaborator Author

whimboo commented Dec 12, 2022

@OrKoN, are tests on Windows run again for Firefox? I cannot remember if / that we enabled those.

@OrKoN
Copy link
Collaborator

OrKoN commented Dec 12, 2022

@whimboo no, we have not enabled them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants