Problems with page.screenshot #5300

Closed
timparker183 opened this issue Jan 10, 2020 · 3 comments

@timparker183

We're having problems getting screenshots to work with the latest Puppeteer. Using page.screenshot({fullPage: true}) worked with much older versions (Puppeteer 1.1/Chromium 69/Node 8), at least for reasonably sized pages, but it crashes with the latest (Puppeteer 2.0/Chromium 79/Node 12), and the older workarounds we've found are also failing.

Our experiments suggest that page.screenshot crashes (Puppeteer 2.0.0/Chromium 79) if the captured image is larger than about 11 million pixels, whether with fullPage: true or with a specified clipping region.

An example page for this problem is:
https://www.dpreview.com/archive/2019/12

If we set the viewport to 1920w x 1080h, go to the page, and let things settle, document.body.clientHeight for this page is 28671. At 1920 pixels wide, that is far beyond the apparent 11-megapixel limit.
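To put those numbers side by side, here is a small arithmetic sketch. The 11,000,000 figure is only the approximate threshold observed in this report, not a documented Chromium or Puppeteer constant, and the helper names are made up for illustration.

```javascript
// Approximate threshold observed in this report (an assumption, not a
// documented Chromium or Puppeteer constant).
const OBSERVED_PIXEL_LIMIT = 11000000;

// Total number of pixels in an image of the given dimensions.
function pixelCount(width, height) {
  return width * height;
}

// True if a capture of this size would exceed the observed threshold.
function exceedsObservedLimit(width, height, limit = OBSERVED_PIXEL_LIMIT) {
  return pixelCount(width, height) > limit;
}

const fullPagePixels = pixelCount(1920, 28671); // 55,048,320
const viewportPixels = pixelCount(1920, 1080);  // 2,073,600

console.log(fullPagePixels, exceedsObservedLimit(1920, 28671)); // over the limit
console.log(viewportPixels, exceedsObservedLimit(1920, 1080));  // under the limit
```

The full page works out to roughly five times the size at which we start seeing crashes.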

With the older Puppeteer release, we could either capture the contents by making several calls to page.screenshot with a smaller clipping region (and splicing the resulting tiles together into a single output image), or be content with capturing just the top half. The latest Puppeteer, however, clips the contents to the viewport, so only the initial 1080 lines have useful data.
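For reference, the slicing half of slice-and-splice amounts to something like the sketch below. computeTiles is a hypothetical helper, not part of Puppeteer; it only produces the clip rectangles that would be passed to page.screenshot, and the tile height of 5000 is just a value that keeps each tile under the apparent limit at this width.

```javascript
// Hypothetical helper: split a page of pageWidth x pageHeight into
// full-width horizontal strips of at most tileHeight rows each.
// Each entry has the shape Puppeteer expects for the `clip` option.
function computeTiles(pageWidth, pageHeight, tileHeight) {
  if (!(tileHeight > 0)) {
    throw new RangeError('tileHeight must be a positive number');
  }
  const tiles = [];
  for (let y = 0; y < pageHeight; y += tileHeight) {
    tiles.push({
      x: 0,
      y,
      width: pageWidth,
      // The last strip may be shorter than tileHeight.
      height: Math.min(tileHeight, pageHeight - y),
    });
  }
  return tiles;
}

const tiles = computeTiles(1920, 28671, 5000);
console.log(tiles.length, tiles[tiles.length - 1]);
```

Splicing the resulting images back together needs an image library and is not shown here.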

Someone seems to think that this clipping behavior is desirable. See issue #2079, which includes this item in its initial post:
"make page.screenshot clip elements to the viewport per upstream Chromium changes. This matches the clipping behavior of elements in inner scrollers, i.e. document and overflow scroll clipping now work the same."

A partial work-around is to reset the viewport to be the size of the rendered page (1920x28671 in this case) - which allows the slice-and-splice strategy to work - but this results in a resize event being fired. For this example page, the resize event is not a problem - but for any pages which do have resize event handlers, we're hosed.

So... this boils down to a cry for help, a problem to solve, and several enhancement requests:

  1. (Problem to solve) - page.screenshot(fullPage=true) should not crash if the page in question is too large. If the dimensions of 'full page' are too large, it should return what it can without crashing.

  2. (Enhancement) - provide a startup option to revert the clipping behavior of page.screenshot to NOT clip to the current viewport so a slice-and-splice implementation can be used without altering the viewport

  3. (Enhancement - alternate to item 2) - add a new boolean argument 'clipToViewport' to page.screenshot to allow the caller to specify whether to clip to the current viewport or not. I'd suggest that the default for this should be 'false' to maximize compatibility with earlier Puppeteer releases.

  4. (Enhancement) - provide an option to page.setViewport to suppress any resulting onResize event

  5. (Enhancement) - allow much larger page.screenshot output (there will always be a 'too large', but this should be much larger than the current maximum of approx. 11,000,000 pixels). This would eliminate the need for users to implement their own slice-and-splice to capture large pages

  6. (Enhancement) - define 'too large': add new startup options and/or a new 'page.setMaxScreenshotSize' method to specify the maximum dimensions for an image produced by page.screenshot. I'd suggest the default should be something like 2000w by 50000h. If the actual content is larger than that, clip to the maximum and return.
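To illustrate the "clip to the maximum and return" behavior requested in items 1 and 6, here is a minimal sketch of the clamping logic. Nothing in it exists in Puppeteer: clampToMaxSize and DEFAULT_MAX_SIZE are hypothetical names, and the 2000 x 50000 default is just the value suggested above.

```javascript
// Hypothetical default, taken from the suggestion in item 6.
const DEFAULT_MAX_SIZE = { width: 2000, height: 50000 };

// Hypothetical helper: given the measured page size, return a clip
// rectangle that never exceeds the configured maximum dimensions.
function clampToMaxSize(pageSize, maxSize = DEFAULT_MAX_SIZE) {
  return {
    x: 0,
    y: 0,
    width: Math.min(pageSize.width, maxSize.width),
    height: Math.min(pageSize.height, maxSize.height),
  };
}

// The example page fits within the suggested default, so it is unchanged.
const fits = clampToMaxSize({ width: 1920, height: 28671 });
// An oversized page is cut down to the maximum instead of failing.
const clamped = clampToMaxSize({ width: 3000, height: 60000 });
console.log(fits, clamped);
```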

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: 2.0.0, HeadlessChrome/79.0.3945.0
  • Platform / OS version: AWS Lambda (using latest from alixaxel's chrome-aws-lambda layer (version 2.0.2))
  • URLs (if applicable):
  • Node.js version: 12 (AWS Lambda)

======= code fragments =======
====== Initial page load - problems below assume this setup in place ======

const page = await browser.newPage();
const viewportSize = { width: 1920, height: 1080 };

await page.setViewport(viewportSize);

// Wait until the network has been idle before measuring the page.
const result = await page.goto('https://www.dpreview.com/archive/2019/12', { timeout: 30000, waitUntil: 'networkidle0' });

// Measure the rendered page size inside the browser context.
const actualPageHeight = await page.evaluate(() => document.body.clientHeight);
const actualPageWidth = await page.evaluate(() => document.body.clientWidth);

======= first problem ====

await page.screenshot({path:"/tmp/screenshot.png",fullPage:true});
// crash - actualPageHeight * actualPageWidth is greater than 11 million

======= second problem =======

await page.screenshot({path:"/tmp/screenshot.png",clip:{x:0,y:0,width:actualPageWidth,height:actualPageHeight}});
// crash - page.screenshot chokes on oversized clip region

====== third problem =========

await page.screenshot({path:"/tmp/screenshot.png",clip:{x:0,y:0,width:1920,height:5000}});
// no crash, but resulting image is blank below y=1080
await page.screenshot({path:"/tmp/screenshot2.png",clip:{x:0,y:5000,width:1920,height:5000}});
// no crash, but resulting image is completely blank

====== fourth problem =========
await page.setViewport({width:actualPageWidth, height:actualPageHeight});
await page.screenshot({path:"/tmp/screenshot.png",clip:{x:0,y:0,width:1920,height:5000}});
// no crash, image contains all expected pixels, but page sees a resize event
await page.screenshot({path:"/tmp/screenshot2.png",clip:{x:0,y:5000,width:1920,height:5000}});
// no crash, resulting image has expected content, but only if page does not react to resize event
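Putting the fourth fragment into a loop gives the following sketch of the viewport-resize workaround. captureTiles is a hypothetical helper; it relies only on page.viewport(), page.setViewport() and page.screenshot({clip}), and it still triggers the resize events described above (once when enlarging the viewport, and again when restoring it).

```javascript
// Hypothetical helper: enlarge the viewport to the full page size, then
// capture the page as a series of horizontal strips. Returns whatever
// page.screenshot resolves to for each strip (image buffers in Puppeteer).
async function captureTiles(page, pageWidth, pageHeight, tileHeight) {
  const originalViewport = page.viewport();
  // This is the step that fires a resize event in the page.
  await page.setViewport({ width: pageWidth, height: pageHeight });

  const images = [];
  try {
    for (let y = 0; y < pageHeight; y += tileHeight) {
      const height = Math.min(tileHeight, pageHeight - y);
      images.push(
        await page.screenshot({ clip: { x: 0, y, width: pageWidth, height } })
      );
    }
  } finally {
    // Restore the previous viewport (this fires another resize event).
    if (originalViewport) {
      await page.setViewport(originalViewport);
    }
  }
  return images;
}
```

With the setup above, a call would look like `await captureTiles(page, actualPageWidth, actualPageHeight, 5000)`, and the returned images would then need to be spliced together with an image library.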

@timparker183
Author

update - the worst of these problems go away when we revert to 1.20 - this is fine until we're forced to migrate from Node 10 to Node 12 (we're seeing a missing library problem there), but we have a little more than a year before we lose Node 10 support.

That said... with 1.20 we don't have to mess with the viewport size, but we still have to make multiple calls to page.screenshot, and it appears that we have to limit the clip region to no more than (height=4000) to ensure that nothing is missing.

so... to reiterate... what we really need here is for page.screenshot (with fullPage: true) to return an image bounded by the actual page dimensions, up to reasonable size limits (configurable by startup arguments and/or a new method like 'page.setMaxImageDimensions').

@stale

stale bot commented Jun 26, 2022

We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days.

@stale stale bot added the unconfirmed label Jun 26, 2022
@stale

stale bot commented Jul 26, 2022

We are closing this issue. If the issue still persists in the latest version of Puppeteer, please reopen the issue and update the description. We will try our best to accommodate it!

@stale stale bot closed this as completed Jul 26, 2022