This repository has been archived by the owner on May 8, 2020. It is now read-only.

Script hangs when clicking through to the next page in headless mode #201

Open
nickliqian opened this issue Feb 15, 2019 · 7 comments

nickliqian commented Feb 15, 2019

1. Environment

OS: Ubuntu 16.04
Python version: 3.6.2
pyppeteer version: 0.0.25
Chromium revision: 575458 (default); I also tried other revisions such as 579032 and 609904

2. What happened?

I want to visit this CSDN page, https://blog.csdn.net/xlgen157387, with pyppeteer in headless mode, then keep clicking the next-page button until the last page is reached.
Opening the URL shows that the list has 28 pages in total.
I ran my code for this purpose, but it gets stuck on page 24.
In fact, I catch a "Navigation Timeout Exceeded" error on page 24 and then retry clicking the next-page button.
The script then hangs on this line: await page.screenshot({"path": "img/exp{}.png".format(i), "fullPage": True}).
Important: with headless=False, the issue does not occur!

3. Origin code

You can reproduce this error with the code below.
I suspect the connection is lost. If you can click through to the last page without any problem, could you share your environment versions with me? Thanks!
Simplified version

import asyncio
import pyppeteer


async def main():
    browser = await pyppeteer.launch(
        headless=True
    )

    page = await browser.newPage()
    await page.setViewport({"width": 1900, "height": 1100})
    await page.setUserAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36")
    await page.goto('https://blog.csdn.net/xlgen157387')

    for i in range(1, 200):
        print("page {} =====".format(i))
        ua_ele = await page.xpath("//li[@class='js-page-next js-page-action ui-pager']")
        print("before click count {}".format(len(ua_ele)))
        ele = ua_ele[0]
        await ele.click()
        await page.waitForNavigation()  # stuck (suspended) here when click to <page i=25>

        ua_ele = await page.xpath("//li[@class='js-page-next js-page-action ui-pager']")
        print("after click count {}".format(len(ua_ele)))
        if len(ua_ele) == 0:
            print("last page!")
            break
        await page.waitForXPath("//li[@class='js-page-next js-page-action ui-pager']")
        await page.waitFor(800)
        await page.content()

    await browser.close()
    return "a"


loop = asyncio.get_event_loop()
task = asyncio.ensure_future(main())
loop.run_until_complete(task)

Detailed version

import asyncio
import pyppeteer
import time
from pyppeteer.errors import NetworkError, TimeoutError
import os


async def main():
    browser = await pyppeteer.launch(
        headless=True
    )

    page = await browser.newPage()
    await page.setViewport({"width": 1900, "height": 1100})
    await page.setUserAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36")
    await page.goto('https://blog.csdn.net/xlgen157387')

    for i in range(1, 200):
        print("page {} =====".format(i))
        ua_ele = await page.xpath("//li[@class='js-page-next js-page-action ui-pager']")
        print("before click count {}: {}".format(len(ua_ele), ua_ele))
        ele = ua_ele[0]
        print("-a")

        click_count = 0
        while click_count < 3:
            try:
                print("hu")
                await page.screenshot({"path": "img/exp{}.png".format(i), "fullPage": True})
                # if can not click, waitForNavigation function will fail
                print("-b b")
                ec = await ele.click()
                print(ec)
                print("-b")
                await page.waitForNavigation()  # stuck (suspended) here when click to <page i=25>
                print("-b b b")
                break
            except TimeoutError as e:
                print("some error after click: {}".format(e))
                click_count += 1
                print("usi")

        ua_ele = await page.xpath("//li[@class='js-page-next js-page-action ui-pager']")
        print("after click count {}: {}".format(len(ua_ele), ua_ele))
        if len(ua_ele) == 0:
            print("last page!")
            break
        await page.waitForXPath("//li[@class='js-page-next js-page-action ui-pager']")
        print("-c")
        await page.waitFor(1200)
        print("-d")
        content = await page.content()
        print("-e")
        # await page.deleteCookie()

    await browser.close()
    return "a"


s = time.time()
loop = asyncio.get_event_loop()
task = asyncio.ensure_future(main())
loop.run_until_complete(task)

print(task.result())
e = time.time()
print("{}s".format(e-s))
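Both scripts above call click() and only afterwards start waitForNavigation(). That ordering is not shown anywhere in this thread to be the cause of the freeze, but it can produce spurious navigation timeouts on its own, so it may be worth ruling out. The sketch below schedules the waiter together with the click. `click_and_wait` and `timeout_ms` are hypothetical names introduced here; the helper only assumes that `page` exposes `waitForNavigation(options)` and `element` exposes `click()`, as pyppeteer's Page and ElementHandle do.

```python
import asyncio


async def click_and_wait(page, element, timeout_ms=30000):
    """Click `element` and wait for the navigation that the click triggers.

    Hypothetical helper: `page` must provide `waitForNavigation(options)`
    and `element` must provide `click()`, both as coroutines.
    """
    # Schedule the navigation waiter together with the click instead of
    # after it, so a fast navigation cannot complete before anyone listens.
    await asyncio.gather(
        page.waitForNavigation({"timeout": timeout_ms}),
        element.click(),
    )
```

Inside the loop this would replace the two separate awaits with `await click_and_wait(page, ele)`.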

nokados commented Feb 17, 2019

I can reproduce this error, but for me the script freezes on page 24.
UPD: Indeed, the last log message with a page number is "page 24 =====", but the URL is https://blog.csdn.net/xlgen157387/article/list/25?


nokados commented Feb 17, 2019

I logged events in frame_manager.py, as mishaberezi did for a similar issue in Puppeteer.
The freeze occurs right after an execution context is created. The message "execution context created" is logged on the last line of the function _onExecutionContextCreated, so I assume the context was created successfully. However, the messages expected next, such as lifecycle events, are never received.

log
page 23 =====
before click count 1
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '4249AAE0943906B495E9D1DDB1AE1D52', 'name': 'init', 'timestamp': 40460.853683}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'init', 'timestamp': 40461.103231}
execution context destrouyed 70
frame detached 
execution context destrouyed 69
frame detached 
execution context destrouyed 68
execution context cleared
frame navigated {'id': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'url': 'https://blog.csdn.net/xlgen157387/article/list/24?', 'securityOrigin': 'https://blog.csdn.net', 'mimeType': 'text/html'}
execution context created {'id': 71, 'origin': 'https://blog.csdn.net', 'name': '', 'auxData': {'isDefault': True, 'frameId': '47F975CF196F8FBBB9423C5D462B7D79'}}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'firstPaint', 'timestamp': 40461.271383}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'firstContentfulPaint', 'timestamp': 40461.271383}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'firstTextPaint', 'timestamp': 40461.271383}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'firstImagePaint', 'timestamp': 40461.271383}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'firstMeaningfulPaintCandidate', 'timestamp': 40461.271383}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'DOMContentLoaded', 'timestamp': 40461.448674}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'firstMeaningfulPaintCandidate', 'timestamp': 40461.37856}
frame attached 9A96ED59951C6DC44C5CB6E69B220AFD 47F975CF196F8FBBB9423C5D462B7D79 False
lifecycleevevent {'frameId': '9A96ED59951C6DC44C5CB6E69B220AFD', 'loaderId': '93FA7757E124633729295113644641C6', 'name': 'DOMContentLoaded', 'timestamp': 40461.501684}
lifecycleevevent {'frameId': '9A96ED59951C6DC44C5CB6E69B220AFD', 'loaderId': 'B0D77E7EDDD9833CF17E70790D6EFE39', 'name': 'init', 'timestamp': 40461.502118}
frame navigated {'id': '9A96ED59951C6DC44C5CB6E69B220AFD', 'parentId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': 'B0D77E7EDDD9833CF17E70790D6EFE39', 'name': '', 'url': 'about:blank', 'securityOrigin': '://', 'mimeType': 'text/html'}
execution context created {'id': 72, 'origin': 'https://blog.csdn.net', 'name': '', 'auxData': {'isDefault': True, 'frameId': '9A96ED59951C6DC44C5CB6E69B220AFD'}}
lifecycleevevent {'frameId': '9A96ED59951C6DC44C5CB6E69B220AFD', 'loaderId': 'B0D77E7EDDD9833CF17E70790D6EFE39', 'name': 'load', 'timestamp': 40461.50941}
frame stopped loading 
lifecycleevevent {'frameId': '9A96ED59951C6DC44C5CB6E69B220AFD', 'loaderId': 'B0D77E7EDDD9833CF17E70790D6EFE39', 'name': 'DOMContentLoaded', 'timestamp': 40461.512303}
frame attached 52E46163F744BC16C379A8F3B0E9C76E 47F975CF196F8FBBB9423C5D462B7D79 False
lifecycleevevent {'frameId': '52E46163F744BC16C379A8F3B0E9C76E', 'loaderId': 'D177D80179268253AF8685D377AC5757', 'name': 'DOMContentLoaded', 'timestamp': 40461.88449}
lifecycleevevent {'frameId': '52E46163F744BC16C379A8F3B0E9C76E', 'loaderId': '355A57893FF868ADA3D83CD2225D62E7', 'name': 'init', 'timestamp': 40461.884764}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'load', 'timestamp': 40461.888697}
lifecycleevevent {'frameId': '52E46163F744BC16C379A8F3B0E9C76E', 'loaderId': '47E7F26FD1BD1D3C12F60AAC3988DB54', 'name': 'init', 'timestamp': 40461.894251}
frame navigated {'id': '52E46163F744BC16C379A8F3B0E9C76E', 'parentId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '47E7F26FD1BD1D3C12F60AAC3988DB54', 'name': 'BAIDU_DUP_fp_iframe', 'url': 'https://pos.baidu.com/wh/o.htm?ltr=', 'securityOrigin': 'https://pos.baidu.com', 'mimeType': 'text/html'}
execution context created {'id': 73, 'origin': 'https://pos.baidu.com', 'name': '', 'auxData': {'isDefault': True, 'frameId': '52E46163F744BC16C379A8F3B0E9C76E'}}
lifecycleevevent {'frameId': '52E46163F744BC16C379A8F3B0E9C76E', 'loaderId': '47E7F26FD1BD1D3C12F60AAC3988DB54', 'name': 'load', 'timestamp': 40461.915709}
frame stopped loading 
frame stopped loading 
lifecycleevevent {'frameId': '52E46163F744BC16C379A8F3B0E9C76E', 'loaderId': '47E7F26FD1BD1D3C12F60AAC3988DB54', 'name': 'DOMContentLoaded', 'timestamp': 40461.916774}
after click count 1
lifecycleevevent {'frameId': '9A96ED59951C6DC44C5CB6E69B220AFD', 'loaderId': 'B0D77E7EDDD9833CF17E70790D6EFE39', 'name': 'networkAlmostIdle', 'timestamp': 40461.512309}
lifecycleevevent {'frameId': '9A96ED59951C6DC44C5CB6E69B220AFD', 'loaderId': 'B0D77E7EDDD9833CF17E70790D6EFE39', 'name': 'networkIdle', 'timestamp': 40461.512309}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'firstMeaningfulPaint', 'timestamp': 40461.37856}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': '3E997CE5D24DF006A1FF1FDC7394F390', 'name': 'networkAlmostIdle', 'timestamp': 40461.818695}
lifecycleevevent {'frameId': '52E46163F744BC16C379A8F3B0E9C76E', 'loaderId': '47E7F26FD1BD1D3C12F60AAC3988DB54', 'name': 'networkAlmostIdle', 'timestamp': 40461.91678}
lifecycleevevent {'frameId': '52E46163F744BC16C379A8F3B0E9C76E', 'loaderId': '47E7F26FD1BD1D3C12F60AAC3988DB54', 'name': 'networkIdle', 'timestamp': 40461.916793}
page 24 =====
before click count 1
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': 'B513C68D3230A62C47CC3ECF2C3DD0D1', 'name': 'init', 'timestamp': 40462.838782}
lifecycleevevent {'frameId': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': 'FF9136930CB61A71978C5AD85CBE9F00', 'name': 'init', 'timestamp': 40463.086377}
execution context destrouyed 73
frame detached 
execution context destrouyed 72
frame detached 
execution context destrouyed 71
execution context cleared
frame navigated {'id': '47F975CF196F8FBBB9423C5D462B7D79', 'loaderId': 'FF9136930CB61A71978C5AD85CBE9F00', 'url': 'https://blog.csdn.net/xlgen157387/article/list/25?', 'securityOrigin': 'https://blog.csdn.net', 'mimeType': 'text/html'}
execution context created {'id': 74, 'origin': 'https://blog.csdn.net', 'name': '', 'auxData': {'isDefault': True, 'frameId': '47F975CF196F8FBBB9423C5D462B7D79'}}
Traceback (most recent call last):
  File "freeze.py", line 38, in <module>
    loop.run_until_complete(task)
  File "/home/nokados/anaconda3/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "freeze.py", line 21, in main
    await page.waitForNavigation()  # stuck (suspended) here when click to <page i=25>
  File "/home/nokados/.local/lib/python3.7/site-packages/pyppeteer/page.py", line 938, in waitForNavigation
    raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms 

In addition, if I start from any other page, the freeze still occurs after 24 iterations, right after the context with id=74 is created. For example, if I start at page 3, the script gets stuck while loading page 27.
However, if I load only page 3 over and over, I get the exception at iteration 25, i.e. when the context with id=77 is created.

@nickliqian
Author

> I can reproduce this error, but for me the script freezes on page 24.
> UPD: Indeed, the last log message with a page number is "page 24 =====", but the URL is https://blog.csdn.net/xlgen157387/article/list/25?

Thank you very much for your feedback!
That was a mistake in my issue description. The page on which the script freezes is page 24.

@nickliqian
Author

> I logged events in frame_manager.py, as mishaberezi did for a similar issue in Puppeteer.
> The freeze occurs right after an execution context is created. The message "execution context created" is logged on the last line of the function _onExecutionContextCreated, so I assume the context was created successfully. However, the messages expected next, such as lifecycle events, are never received.
>
> log
> In addition, if I start from any other page, the freeze still occurs after 24 iterations, right after the context with id=74 is created. For example, if I start at page 3, the script gets stuck while loading page 27.
> However, if I load only page 3 over and over, I get the exception at iteration 25, i.e. when the context with id=77 is created.

Hi @nokados.
Getting the exception at 25 iterations is a very interesting finding.
I read this issue: "Page.Content freezes without error after running 101 times" (#4011).
I also ran into this problem with Node.js and puppeteer@1.12.1. After I switched to puppeteer@1.7.1 and chromium@579032, the example code ran normally. According to this answer, puppeteer@1.11.1 has no freeze issue.


nokados commented Feb 24, 2019

I dug deeper into this issue and have two findings.

1. A similar freeze can be caused by an exception raised in the _recv_loop coroutine in connection.py.

The following modification produces the same symptoms:

async def _recv_loop(self) -> None:
        async with self._ws as connection:
            self._connected = True
            self.connection = connection
            while self._connected:
                try:
                    resp = await self.connection.recv()
                    raise Exception('dummy exception') # <--- this is what we added
                    if resp:
                        await self._on_message(resp)
                except (websockets.ConnectionClosed, ConnectionResetError):
                    logger.info('connection closed')
                    break
                await asyncio.sleep(0)
        if self._connected:
            self._loop.create_task(self.dispose())

However, this is not what happens in your case, because no exception is thrown here. Still, there may be another coroutine with the same bug...
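To illustrate why an exception in a receive loop looks like a freeze rather than an error, here is a minimal, self-contained sketch using plain asyncio. It is not pyppeteer code: `recv_loop`, `fail_pending_on_crash`, and `demo` are hypothetical names. The background task dies, nobody awaits it, and the caller keeps waiting on a future that will never be resolved unless the failure is forwarded explicitly.

```python
import asyncio


async def recv_loop():
    """Stand-in for a receive loop that dies on an unexpected exception."""
    await asyncio.sleep(0)
    raise RuntimeError("dummy exception")


def fail_pending_on_crash(task, pending):
    """Forward an unexpected failure of `task` to the future callers await."""
    def _on_done(done):
        if done.cancelled() or pending.done():
            return
        exc = done.exception()
        if exc is not None:
            pending.set_exception(exc)
    task.add_done_callback(_on_done)


async def demo(forward_errors):
    loop = asyncio.get_running_loop()
    pending = loop.create_future()  # the reply the caller is waiting for
    task = loop.create_task(recv_loop())
    if forward_errors:
        fail_pending_on_crash(task, pending)
    try:
        # Without forwarding, only this timeout ends the wait.
        await asyncio.wait_for(pending, timeout=0.2)
        outcome = "reply"
    except asyncio.TimeoutError:
        outcome = "hang"
    except RuntimeError:
        outcome = "error surfaced"
    # Retrieve the task's exception so asyncio does not report it as unhandled.
    await asyncio.gather(task, return_exceptions=True)
    return outcome
```

With `demo(False)` the caller only escapes through the timeout, while `demo(True)` receives the RuntimeError as soon as the loop crashes.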

2. The last executed line of code I could trace is not in pyppeteer. It is in protocol.py of the websockets library.

The relevant code is:

    async def recv(self) -> Data:
            # skip some code ...
            pop_message_waiter: asyncio.Future[None] = self.loop.create_future()
            self._pop_message_waiter = pop_message_waiter
            try:
                await asyncio.wait(
                    [pop_message_waiter, self.transfer_data_task],
                    loop=self.loop,
                    return_when=asyncio.FIRST_COMPLETED,
                )
            finally:
                self._pop_message_waiter = None

pop_message_waiter is a future that is created here and does nothing by itself; it only serves as a wake-up signal.
transfer_data_task should read incoming messages and put them in a queue. In fact, when the freeze occurs, its code is never executed.
I suppose control passes to some other coroutine, which then goes wrong.
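The waiting pattern in that excerpt can be reproduced in isolation. The sketch below is a simplified stand-in and not the websockets implementation: the `Receiver` class and the `demo` coroutine are hypothetical, and the `loop=` argument is omitted because newer Python versions no longer accept it in `asyncio.wait`. It shows that `recv()` can only return once the transfer task either queues a message and resolves the waiter, or finishes.

```python
import asyncio
import collections


class Receiver:
    """Simplified stand-in for the recv()/transfer_data_task interplay."""

    def __init__(self):
        self.messages = collections.deque()
        self._pop_message_waiter = None
        self.transfer_data_task = None

    async def transfer_data(self, incoming):
        # Pretend to read frames from the network and queue them.
        for message in incoming:
            await asyncio.sleep(0.01)
            self.messages.append(message)
            waiter = self._pop_message_waiter
            if waiter is not None and not waiter.done():
                waiter.set_result(None)  # wake up a pending recv()

    async def recv(self):
        while not self.messages:
            if self.transfer_data_task.done():
                raise ConnectionError("no more data will arrive")
            waiter = asyncio.get_running_loop().create_future()
            self._pop_message_waiter = waiter
            try:
                # Returns when a message was queued or the transfer task ended.
                await asyncio.wait(
                    [waiter, self.transfer_data_task],
                    return_when=asyncio.FIRST_COMPLETED,
                )
            finally:
                self._pop_message_waiter = None
        return self.messages.popleft()


async def demo():
    receiver = Receiver()
    receiver.transfer_data_task = asyncio.get_running_loop().create_task(
        receiver.transfer_data(["a", "b"])
    )
    return [await receiver.recv(), await receiver.recv()]
```

If the transfer task never gets to run, neither condition is ever met and `recv()` waits indefinitely, which matches the symptom described above.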


nokados commented Feb 24, 2019

In addition, I get this message in the Chrome console on every page:

A parser-blocking, cross site (i.e. different eTLD+1) script, https://csdnimg.cn/cdn/content-toolbar/iconfont.js?_=4567%22, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.

This may be the cause of some error, but it does not justify the whole script hanging. We should still receive an exception and be able to continue, but currently that is not possible.
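As a stopgap on the caller's side, each step can be given its own deadline so that a call that never answers becomes a catchable timeout. This is a generic asyncio sketch, not a fix for the underlying problem: `never_answers` and `crawl` are hypothetical names, and the approach only helps as long as the event loop itself keeps running.

```python
import asyncio


async def never_answers():
    """Stand-in for a call whose reply never arrives."""
    await asyncio.get_running_loop().create_future()


async def crawl(steps, deadline=0.1):
    """Run each (name, coroutine function) step under its own deadline."""
    results = []
    for name, step in steps:
        try:
            await asyncio.wait_for(step(), timeout=deadline)
            results.append((name, "ok"))
        except asyncio.TimeoutError:
            # The hung step is cancelled; the loop moves on to the next one.
            results.append((name, "timed out"))
    return results
```

In the scripts from this thread, the same wrapper could be put around calls such as page.screenshot(...) so that a retry or a browser restart becomes possible.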

@aidataguy

Hi, just adding my two cents...

You could try using element.appendChild() or a similar approach instead of document.write to tackle this. See this article on the Google Developers site:

https://developers.google.com/web/updates/2016/08/removing-document-write
