Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request interception with caching (rev 8695759) fails when loading stylesheet-initiated fonts (?) #7038

Closed
starrify opened this issue Mar 30, 2021 · 8 comments · Fixed by #7060

Comments

@starrify
Copy link
Contributor

starrify commented Mar 30, 2021

0. Overview

TL;DR:

  • The recent fix of request interception and caching in revision 8695759 (Request interception and caching compatibility #6996) does not work properly in some conditions.
  • The error may be stably reproduced.
  • While I'm not 100% sure before understanding the relevant Chromium implement details, this seems to be caused by CSS-initiated font resources

1. Test Configuration

My tests are conducted with:

  • GNU/Linux Debian 10 (buster)
  • Node.js 14.16.0 and npm 7.7.6
  • Puppeteer 8695759 (also its parent bf60a30 for comparison)
  • Chromium r856583 (the bundled version from Puppeteer)

Here is a Dockerfile that may duplicate the exact test environment:

FROM node:14.16.0-buster
# Just for the dependencies.
RUN apt-get update && apt-get install -y chromium
# Puppeteer installation from git requires npm>=7
# See https://github.com/puppeteer/puppeteer/issues/7034
RUN npm install -g npm@7.7.6
WORKDIR /app
ARG PPTR_RVSN=8695759
RUN npm install git+ssh://git@github.com/puppeteer/puppeteer.git#${PPTR_RVSN}
ADD test.js .

Here is the sample JavaScript code I used to trigger the issue:

(async () => {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    // Needed only when inside a Docker container
    args: ['--no-sandbox'],
  });
  const page = await browser.newPage();
  const enableCaching = process.env.TEST_ENABLE_CACHING == 'true';
  await page.setRequestInterception(true, enableCaching);
  page.on('request', (request) => {
    request.continue();
  });
  const url = process.env.TEST_URL || 'https://github.com/';
  try {
    await page.goto(url);
  } catch (err) {
    console.error(err);
  }
  await browser.close();
})();

This script accesses https://github.com/ by default, while in practice I observed the same issue from several other frequently visited websites.

2. Tests and Results

From the above configurations one may set up the test environment and launch the quick test.
Here is an example of triggering the issue:

$ docker build -t tmptest-pptr -f Dockerfile --build-arg PPTR_RVSN=8695759 . &> /dev/null
$ docker run --entrypoint=/bin/bash --env TEST_ENABLE_CACHING=true tmptest-pptr -c "node test.js"
TimeoutError: Navigation timeout of 30000 ms exceeded
    at /app/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111

Here are the respective results per test:

Puppeteer ↓ \ caching → enabled disabled
8695759 error no error
bf60a30 (previous one) N/A no error

Therefore the implement of 8695759 seems not fully functional, at least not for rendering https://github.com/ as tested.

3. Further Analysis

Thanks to Puppeteer's debugging ability, I tried to dump the CDP communication of the failed test via the below command:

$ docker run --entrypoint=/bin/bash --env TEST_ENABLE_CACHING=true --env DEBUG="puppeteer:*" tmptest-pptr -c "node test.js" &> 8695759_caching_on.log

In case that may save you some time, I've uploaded the recorded 8695759_caching_on.log file here: https://gist.github.com/starrify/e1f5ad68c358b932890933be3fb2b736

By comparing the URLs in the log file, it is observed that there are four .woff resources for fonts, all initiated by the stylesheets, that had the Network.requestWillBeSent event but not a Network.responseReceived:

$ tmp_filter() { grep -Po "(?<=protocol:RECV ◀ ).*" | jq 'select(.method=="'$1'")|.params|(.response // .request)|.url' | sort; }
$ diff <(cat 8695759_caching_on.log | tmp_filter Network.requestWillBeSent) <(cat 8695759_caching_on.log | tmp_filter Network.responseReceived)
37,40d36
< "https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-Bold.woff"
< "https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-ExtraBold.woff"
< "https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-Regular.woff"
< "https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-SemiBold.woff"

These four font resources are apparently not cached since the Docker container has a cold start. Also that's confirmed from the extracted CDP log file (e.g. no Network.requestServedFromCache event observed).

Taking one resource Alliance-No-1-SemiBold.woff as an example for further inspection, I've got these from the log file:

$ cat 8695759_caching_on.log | grep "Alliance-No-1-SemiBold.woff" | grep -Po "(?<=protocol:RECV ◀ ).*" | jq '.params|(.requestId//.networkId)' -r
interception-job-28.0
51.164
interception-job-38.0

$ cat 8695759_caching_on.log | grep -P '\b(51\.164|interception-job-28\.0|interception-job-38\.0)\b'
2021-03-30T22:25:48.770Z puppeteer:protocol:RECV ◀ {"method":"Fetch.requestPaused","params":{"requestId":"interception-job-28.0","request":{"url":"https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-SemiBold.woff","method":"GET","headers":{"sec-ch-ua":"","Origin":"https://github.com","sec-ch-ua-mobile":"?0","User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4427.0 Safari/537.36","Accept":"*/*","Referer":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css"},"initialPriority":"VeryHigh","referrerPolicy":"strict-origin-when-cross-origin"},"frameId":"0979EF16C3B9EFE96B9BD45E76E24204","resourceType":"Font","networkId":"51.164"},"sessionId":"E5290B3CCB2B7FB6607E6186D7644353"}
2021-03-30T22:25:48.935Z puppeteer:protocol:RECV ◀ {"method":"Network.requestWillBeSent","params":{"requestId":"51.164","loaderId":"5423DC2D96468E21561099D7F1FA7D90","documentURL":"https://github.com/","request":{"url":"https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-SemiBold.woff","method":"GET","headers":{"sec-ch-ua":"","Referer":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css","Origin":"https://github.com","sec-ch-ua-mobile":"?0","User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4427.0 Safari/537.36"},"mixedContentType":"none","initialPriority":"VeryHigh","referrerPolicy":"strict-origin-when-cross-origin"},"timestamp":781832.362451,"wallTime":1617143148.769689,"initiator":{"type":"parser","url":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css"},"type":"Font","frameId":"0979EF16C3B9EFE96B9BD45E76E24204","hasUserGesture":false},"sessionId":"E5290B3CCB2B7FB6607E6186D7644353"}
2021-03-30T22:25:48.935Z puppeteer:protocol:SEND ► {"sessionId":"E5290B3CCB2B7FB6607E6186D7644353","method":"Fetch.continueRequest","params":{"requestId":"interception-job-28.0"},"id":46}
2021-03-30T22:25:48.966Z puppeteer:protocol:RECV ◀ {"method":"Fetch.requestPaused","params":{"requestId":"interception-job-38.0","request":{"url":"https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-SemiBold.woff","method":"GET","headers":{"sec-ch-ua":"","Origin":"https://github.com","sec-ch-ua-mobile":"?0","User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4427.0 Safari/537.36","Accept":"*/*","Referer":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css"},"initialPriority":"VeryHigh","referrerPolicy":"strict-origin-when-cross-origin"},"frameId":"0979EF16C3B9EFE96B9BD45E76E24204","resourceType":"Font","networkId":"51.164"},"sessionId":"E5290B3CCB2B7FB6607E6186D7644353"}

This suggests something interesting: for the same request (as identified by the networkId field), Chrome actually issued twice the Fetch.requestPaused event for some reason I do not yet know.

While Puppeteer only manages to resume the first pause, such requests are therefore blocked forever.

In addition, since such resources are initiated in the stylesheets, having the requests pending indefinitely would block the browser from issuing the Page.loadEventFired event, therefore Puppeteer would also keep blocking on the page.goto line.

@starrify
Copy link
Contributor Author

In addition, I have added a few more lines to the above testing JavaScript code to add a new mode which configures request interception via CDP directly than via Puppeteer.

The updated JS code and the recorded CDP debugging details have been uploaded here: https://gist.github.com/starrify/b1d0d632006aa3ed4288950e1d13fe21

With the new direct-CDP mode, the request interception indeed works properly with caching enabled upon rendering the same web page: still Chrome emits twice the Fetch.requestPaused event for the same request, but both are properly resumed.

Some details:

$ docker build -t tmptest-pptr -f Dockerfile --build-arg PPTR_RVSN=8695759 . &> /dev/null
$ docker run --entrypoint=/bin/bash --env TEST_ENABLE_CACHING=true --env TEST_INTERCEPTION_TYPE=cdp --env DEBUG="puppeteer:*" tmptest-pptr -c "node test.js" &> 8695759_cdp_interception.log

$ cat 8695759_cdp_interception.log | grep "Alliance-No-1-SemiBold.woff" | grep -Po "(?<=protocol:RECV ◀ ).*" | jq '.params|(.requestId//.networkId)' -r
interception-job-28.0
53.164
interception-job-38.0
53.164
53.164
$ cat 8695759_cdp_interception.log | grep -P '\b(53\.164|interception-job-28\.0|interception-job-38\.0)\b'
2021-03-31T00:47:15.045Z puppeteer:protocol:RECV ◀ {"method":"Fetch.requestPaused","params":{"requestId":"interception-job-28.0","request":{"url":"https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-SemiBold.woff","method":"GET","headers":{"sec-ch-ua":"","Origin":"https://github.com","sec-ch-ua-mobile":"?0","User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4427.0 Safari/537.36","Accept":"*/*","Referer":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css"},"initialPriority":"VeryHigh","referrerPolicy":"strict-origin-when-cross-origin"},"frameId":"B8BD7028568C5135DC4142ECDB7D1510","resourceType":"Font","networkId":"53.164"},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}
2021-03-31T00:47:15.045Z puppeteer:protocol:SEND ► {"sessionId":"F59A5983FAE804121A3D13A663DBE353","method":"Fetch.continueRequest","params":{"requestId":"interception-job-28.0"},"id":45}
2021-03-31T00:47:15.210Z puppeteer:protocol:RECV ◀ {"method":"Network.requestWillBeSent","params":{"requestId":"53.164","loaderId":"650DB92CD495C2741E363A74558A791F","documentURL":"https://github.com/","request":{"url":"https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-SemiBold.woff","method":"GET","headers":{"sec-ch-ua":"","Referer":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css","Origin":"https://github.com","sec-ch-ua-mobile":"?0","User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4427.0 Safari/537.36"},"mixedContentType":"none","initialPriority":"VeryHigh","referrerPolicy":"strict-origin-when-cross-origin"},"timestamp":790318.637285,"wallTime":1617151635.044523,"initiator":{"type":"parser","url":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css"},"type":"Font","frameId":"B8BD7028568C5135DC4142ECDB7D1510","hasUserGesture":false},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}
2021-03-31T00:47:15.228Z puppeteer:protocol:RECV ◀ {"method":"Fetch.requestPaused","params":{"requestId":"interception-job-38.0","request":{"url":"https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-SemiBold.woff","method":"GET","headers":{"sec-ch-ua":"","Origin":"https://github.com","sec-ch-ua-mobile":"?0","User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4427.0 Safari/537.36","Accept":"*/*","Referer":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css"},"initialPriority":"VeryHigh","referrerPolicy":"strict-origin-when-cross-origin"},"frameId":"B8BD7028568C5135DC4142ECDB7D1510","resourceType":"Font","networkId":"53.164"},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}
2021-03-31T00:47:15.228Z puppeteer:protocol:SEND ► {"sessionId":"F59A5983FAE804121A3D13A663DBE353","method":"Fetch.continueRequest","params":{"requestId":"interception-job-38.0"},"id":55}
2021-03-31T00:47:15.231Z puppeteer:protocol:RECV ◀ {"method":"Network.requestWillBeSentExtraInfo","params":{"requestId":"53.164","associatedCookies":[],"headers":{":method":"GET",":authority":"github.githubassets.com",":scheme":"https",":path":"/static/fonts/alliance/Alliance-No-1-SemiBold.woff","sec-ch-ua":"","origin":"https://github.com","sec-ch-ua-mobile":"?0","user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4427.0 Safari/537.36","accept":"*/*","sec-fetch-site":"cross-site","sec-fetch-mode":"cors","sec-fetch-dest":"font","referer":"https://github.githubassets.com/assets/site-7c8868efa7b2b508d6f89c0e94f94864.css","accept-encoding":"gzip, deflate, br","accept-language":"en-US"},"clientSecurityState":{"initiatorIsSecureContext":true,"initiatorIPAddressSpace":"Public","privateNetworkRequestPolicy":"WarnFromInsecureToMorePrivate"}},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}
2021-03-31T00:47:15.289Z puppeteer:protocol:RECV ◀ {"method":"Network.responseReceivedExtraInfo","params":{"requestId":"53.164","blockedCookies":[],"headers":{"last-modified":"Mon, 21 Sep 2020 17:43:29 GMT","etag":"\"b623e9ecb8d774231962775086fc197b\"","content-type":"application/font-woff","server":"AmazonS3","via":"1.1 varnish, 1.1 varnish","accept-ranges":"bytes","date":"Wed, 31 Mar 2021 00:47:15 GMT","age":"3450","x-served-by":"cache-dca17733-DCA, cache-ams21077-AMS","x-cache":"HIT, HIT","x-cache-hits":"9, 2","access-control-allow-origin":"*","strict-transport-security":"max-age=31536000","x-fastly-request-id":"7d7399f29a57fb72b2813f21fb391f957e1a6318","content-length":"44400"},"resourceIPAddressSpace":"Public"},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}
2021-03-31T00:47:15.480Z puppeteer:protocol:RECV ◀ {"method":"Network.responseReceived","params":{"requestId":"53.164","loaderId":"650DB92CD495C2741E363A74558A791F","timestamp":790319.070877,"type":"Font","response":{"url":"https://github.githubassets.com/static/fonts/alliance/Alliance-No-1-SemiBold.woff","status":200,"statusText":"","headers":{"x-fastly-request-id":"7d7399f29a57fb72b2813f21fb391f957e1a6318","date":"Wed, 31 Mar 2021 00:47:15 GMT","via":"1.1 varnish, 1.1 varnish","last-modified":"Mon, 21 Sep 2020 17:43:29 GMT","server":"AmazonS3","age":"3450","etag":"\"b623e9ecb8d774231962775086fc197b\"","x-served-by":"cache-dca17733-DCA, cache-ams21077-AMS","strict-transport-security":"max-age=31536000","x-cache":"HIT, HIT","content-type":"application/font-woff","access-control-allow-origin":"*","accept-ranges":"bytes","content-length":"44400","x-cache-hits":"9, 2"},"mimeType":"application/font-woff","connectionReused":true,"connectionId":71,"remoteIPAddress":"185.199.111.154","remotePort":443,"fromDiskCache":false,"fromServiceWorker":false,"fromPrefetchCache":false,"encodedDataLength":153,"timing":{"requestTime":790318.822413,"proxyStart":-1,"proxyEnd":-1,"dnsStart":-1,"dnsEnd":-1,"connectStart":-1,"connectEnd":-1,"sslStart":-1,"sslEnd":-1,"workerStart":-1,"workerReady":-1,"workerFetchStart":-1,"workerRespondWithSettled":-1,"sendStart":0.468,"sendEnd":0.575,"pushStart":0,"pushEnd":0,"receiveHeadersEnd":59.157},"responseTime":1.617151635288706e+12,"protocol":"h2","securityState":"secure","securityDetails":{"protocol":"TLS 1.3","keyExchange":"","keyExchangeGroup":"X25519","cipher":"AES_128_GCM","certificateId":0,"subjectName":"*.githubassets.com","sanList":["*.githubassets.com","githubassets.com"],"issuer":"DigiCert SHA2 High Assurance Server CA","validFrom":1604275200,"validTo":1636502399,"signedCertificateTimestampList":[],"certificateTransparencyCompliance":"unknown"}},"frameId":"B8BD7028568C5135DC4142ECDB7D1510"},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}
2021-03-31T00:47:15.480Z puppeteer:protocol:RECV ◀ {"method":"Network.dataReceived","params":{"requestId":"53.164","timestamp":790319.070969,"dataLength":44400,"encodedDataLength":0},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}
2021-03-31T00:47:15.481Z puppeteer:protocol:RECV ◀ {"method":"Network.dataReceived","params":{"requestId":"53.164","timestamp":790319.073044,"dataLength":0,"encodedDataLength":44427},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}
2021-03-31T00:47:15.481Z puppeteer:protocol:RECV ◀ {"method":"Network.loadingFinished","params":{"requestId":"53.164","timestamp":790318.883865,"encodedDataLength":44580,"shouldReportCorbBlocking":false},"sessionId":"F59A5983FAE804121A3D13A663DBE353"}

@starrify
Copy link
Contributor Author

As illustrated above in this issue report, the newly observed Chrome behavior of possibly multiple request pauses (the Fetch.requestPaused event) of one request would cause troubles to Puppeteer.

The problem, as I assume, comes from Puppeteer's design / implement of its Page.request event:

Due to the complexity of the design, the network manager code then assumes a strict one-to-one relation of the two upstream events (Network.requestWillBeSent and Fetch.requestPaused). For example, a _requestIdToInterceptionId object is used to record the mapping from networkId to requestId of the Fetch.requestPaused event:

this._requestIdToInterceptionId.set(requestId, interceptionId);

Therefore such code may be easily broken once multiple Fetch.requestPaused events are issued for one request. In the above example, multiple requestId values would arrive for one networkId.

I would propose a backward-incompatible change to the API design as a solution, which is to separate Puppeteer's Page.request event into what may better reflect the upstream CDP:

  • One for forwarding Network.requestWillBeSent: users may have read access on requests to get information.
  • The other for forwarding Fetch.requestPaused: users may use this for request interception purposes, and possibly modify the request.

@Androbin
Copy link
Contributor

Androbin commented Apr 6, 2021

Very nice analysis

@Androbin
Copy link
Contributor

Androbin commented Apr 6, 2021

@starrify
Copy link
Contributor Author

starrify commented Apr 6, 2021

I have got some further studies on this very issue and here's a ticket created in the Chromium issue tracker: https://bugs.chromium.org/p/chromium/issues/detail?id=1196004 , in which I further discussed the specific conditions that may trigger this double pausing issue, and attached a few minimal test cases for triggering that.

@Androbin I appreciate your efforts and have responded in your pull request page some of my thoughts.

@Androbin
Copy link
Contributor

Androbin commented Apr 6, 2021

It turns out that 2.4 is not strictly necessary in order to reproduce the issue. While it's true that this will help make the font delay the load event, one can simply wait directly for the font to load, like I did in the unit test in the PR I submitted.

@Androbin
Copy link
Contributor

During testing I observed the following:
Stylesheet, cache disabled: Network.requestWillBeSent, Fetch.requestPaused
Stylesheet, cache enabled: Fetch.requestPaused, Network.requestWillBeSent
Font, cache disabled: Network.requestWillBeSent, Fetch.requestPaused
Font, cache enabled: Fetch.requestPaused, Network.requestWillBeSent, Fetch.requestPaused
Seems like the caching does play an essential role.

@Androbin
Copy link
Contributor

Someone offered a workaround upstream: https://bugs.chromium.org/p/chromium/issues/detail?id=1196004#c10

... for the time being, please try --disable-feature=WebFontsCacheAwareTimeoutAdaption as a workaround.

This seems to confirm that this issue is specific to fonts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants