page.wait_for_navigation().await?; seems to return before all of the page's assets (images, JS, CSS) are fully loaded #184

Open
mxaddict opened this issue Oct 18, 2023 · 4 comments

Comments

@mxaddict

page.wait_for_navigation().await?; seems to return before all of the page's assets (images, JS, CSS) are fully loaded.

I'm trying to load a page whose images are added by JS logic after an API request completes.

I was under the impression that the page.wait_for_navigation().await?; call would wait for these to load, but it seems it does not.

Is there a way to get it to behave the way I expected?

@mxaddict
Author

I have a workaround that I've implemented on my end:

An event listener records the timestamp of the most recent request.

Then, in a separate task, I have a loop that checks whether the last request is older than a timeout.

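    // Imports assumed for this fragment (chromiumoxide, futures, tracing).
    use chromiumoxide::cdp::browser_protocol::fetch::{ContinueRequestParams, EventRequestPaused};
    use futures::StreamExt;
    use std::sync::{Arc, Mutex};
    use std::time::{Duration, Instant};
    use tracing::{error, info};

    let page = Arc::new(browser.new_page("about:blank").await?);
    // Timestamp of the most recent request, shared with the listener task.
    let last_request = Arc::new(Mutex::new(Instant::now()));
    let xlast_request = last_request.clone();

    // EventRequestPaused is only emitted when request interception (CDP Fetch domain) is enabled.
    let mut request_paused = page.event_listener::<EventRequestPaused>().await.unwrap();
    let xpage = page.clone();
    let interceptor_handle = tokio::spawn(async move {
        while let Some(event) = request_paused.next().await {
            // Record when this request was seen, then let it proceed.
            *xlast_request.lock().unwrap() = Instant::now();
            info!(event.request.url);
            if let Err(e) = xpage.execute(ContinueRequestParams::new(event.request_id.clone())).await {
                error!("Failed to continue request: {e}");
            }
        }
    });

    /// Returns once no request has been seen for longer than `timeout`.
    /// Note: there is no upper bound, so this never returns while requests keep arriving.
    pub async fn wait_for_page(last: Arc<Mutex<Instant>>, timeout: Duration) {
        loop {
            tokio::time::sleep(timeout).await;
            // The lock is released at the end of this expression, before the next await.
            if (last.lock().unwrap()).elapsed() > timeout {
                return;
            }
        }
    }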
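
One thing to watch with the loop above: it has no overall deadline, so a page that polls an API continuously would keep it waiting forever. Below is a minimal sketch of the same "quiet period" idea with a maximum wait added. It uses only the standard library and blocking sleeps (not tokio) so it can be read in isolation; the `IdleTracker` type and its method names are hypothetical and not part of chromiumoxide.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical helper: remembers when the most recent request was seen.
#[derive(Clone)]
pub struct IdleTracker {
    last_request: Arc<Mutex<Instant>>,
}

impl IdleTracker {
    pub fn new() -> Self {
        Self {
            last_request: Arc::new(Mutex::new(Instant::now())),
        }
    }

    /// Call this from the request listener each time a request is observed.
    pub fn touch(&self) {
        *self.last_request.lock().unwrap() = Instant::now();
    }

    /// Blocks until no request has been seen for `quiet`, or until `max_wait`
    /// has elapsed. Returns `true` if the quiet period was reached and
    /// `false` if the deadline expired first.
    pub fn wait_until_idle(&self, quiet: Duration, max_wait: Duration) -> bool {
        let deadline = Instant::now() + max_wait;
        loop {
            let since_last = self.last_request.lock().unwrap().elapsed();
            if since_last >= quiet {
                return true;
            }
            let now = Instant::now();
            if now >= deadline {
                return false;
            }
            // Sleep only as long as needed: until the quiet period could be
            // satisfied, or until the deadline, whichever comes first.
            let remaining_quiet = quiet - since_last;
            let remaining_total = deadline - now;
            std::thread::sleep(remaining_quiet.min(remaining_total));
        }
    }
}
```

In an async setting the same logic would presumably use tokio::time::sleep in place of std::thread::sleep, with `touch()` called where the listener currently updates the timestamp.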

@shulcsm
Contributor

shulcsm commented Oct 19, 2023

I guess duplicate of #36

@mxaddict
Author

I believe so, I did not see that issue before posting 😄

@beckend

beckend commented Nov 5, 2023

Here is my approach:

    page
      .evaluate(
        r#"() =>
            new Promise((resolve) => {
              if (document.readyState === 'complete') {
                resolve('completed-no-event')
              } else {
                addEventListener('load', () => {
                  resolve('complete-event')
                })
              }
            })
        "#,
      )
      .await?;

This even makes it possible to scrape single-page applications, so web pages do not need to be server-side rendered.
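
A caveat on the snippet above: the window load event covers resources the document referenced while it was loading, so images inserted later by script (as in the original question) are not awaited by it. As a rough sketch, the same promise-based pattern can be pointed at the images currently in the document instead. The helper name `wait_for_images_js`, the result strings, and the `__TIMEOUT_MS__` placeholder are all hypothetical; the function only builds the JavaScript source, and passing it to page.evaluate is assumed to behave as in the snippet above.

```rust
/// Hypothetical helper: builds a JS function expression that resolves once
/// every <img> currently in the document has finished loading (or failed),
/// or once `timeout_ms` milliseconds have passed.
pub fn wait_for_images_js(timeout_ms: u64) -> String {
    const TEMPLATE: &str = r#"() =>
  new Promise((resolve) => {
    const pending = Array.from(document.images).filter((img) => !img.complete);
    if (pending.length === 0) {
      resolve('images-already-loaded');
      return;
    }
    let left = pending.length;
    const done = () => {
      left -= 1;
      if (left === 0) resolve('images-loaded');
    };
    for (const img of pending) {
      img.addEventListener('load', done, { once: true });
      img.addEventListener('error', done, { once: true });
    }
    // Give up after the timeout so the promise always settles.
    setTimeout(() => resolve('images-timeout'), __TIMEOUT_MS__);
  })"#;
    // Substitute the placeholder rather than using format!, which would
    // require escaping every brace in the JavaScript source.
    TEMPLATE.replace("__TIMEOUT_MS__", &timeout_ms.to_string())
}
```

This only sees images that exist when the script runs, so it would need to run after the API response has been rendered, for example after waiting for requests to go quiet.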
