bypass headless chrome detection #396

Closed
user-ge3e567tlw opened this issue Jun 18, 2019 · 24 comments

Comments

@user-ge3e567tlw

What versions are you running?

$ go list -m github.com/chromedp/chromedp
go list -m: not using modules

$ google-chrome --version
/Applications/Chromium.app/Contents/MacOS/Chromium --version
Chromium 77.0.3830.0

$ go version
go version go1.12.6 darwin/amd64

What did you do? Include clear steps.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/pkg/errors"

	"github.com/chromedp/cdproto/runtime"
	"github.com/chromedp/chromedp"
)

func main() {
	opts := []chromedp.ExecAllocatorOption{
		chromedp.ExecPath(`/Applications/Chromium.app/Contents/MacOS/Chromium`),
		chromedp.UserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3830.0 Safari/537.36"),
		chromedp.WindowSize(1920, 1080),
		chromedp.NoFirstRun,
		chromedp.NoDefaultBrowserCheck,
		chromedp.Headless,
		chromedp.DisableGPU,
	}

	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()

	ctx, cancel = chromedp.NewContext(ctx)
	defer cancel()

	err := chromedp.Run(ctx,
		evalJS(`Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5, 7, 8, 9]});`),
		evalJS(`navigator.plugins.length`),
		chromedp.Navigate("https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html"),
		chromedp.ActionFunc(func(ctx context.Context) error {
			time.Sleep(3 * time.Second)
			return nil
		}),
		evalJS(`navigator.plugins.length`),
	)
	if err != nil {
		panic(err)
	}
}

func evalJS(js string) chromedp.Tasks {
	var res *runtime.RemoteObject
	return chromedp.Tasks{
		chromedp.EvaluateAsDevTools(js, &res),
		chromedp.ActionFunc(func(ctx context.Context) error {
			b, err := res.MarshalJSON()
			if err != nil {
				return errors.Wrap(err, "marshal")
			}
			fmt.Println("result: ", string(b))
			return nil
		}),
	}
}

What did you expect to see?

result:  {"type":"object","className":"Navigator","description":"Navigator","objectId":"{\"injectedScriptId\":1,\"id\":1}"}
result:  {"type":"number","value":8,"description":"8"}
result:  {"type":"number","value":8,"description":"8"}

I am trying to bypass headless browser detection as described in this article: https://intoli.com/blog/not-possible-to-block-chrome-headless/. The approach works fine with puppeteer but not with chromedp. I need to understand how to execute JS code in chromedp the way puppeteer does, because every time chromedp.Navigate() runs, the overridden values are reset to their defaults.

Is it possible to do this with chromedp?

What did you see instead?

result:  {"type":"object","className":"Navigator","description":"Navigator","objectId":"{\"injectedScriptId\":1,\"id\":1}"}
result:  {"type":"number","value":8,"description":"8"}
result:  {"type":"number","value":0,"description":"0"} // after chromedp.Navigate(), navigator.plugins.length is back to its default value of 0

Works fine with puppeteer.

const puppeteer = require('puppeteer');

const userAgent = 'Mozilla/5.0 (X11; Linux x86_64) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.39 Safari/537.36';

(async () => {
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();
  
  await page.setUserAgent(userAgent);

  await page.evaluateOnNewDocument(() => {
    // Overwrite the `plugins` property to use a custom getter.
    Object.defineProperty(navigator, 'plugins', {
      // This just needs to have `length > 0` for the current test,
      // but we could mock the plugins too if necessary.
      get: () => [1, 2, 3, 4, 5, 6, 7, 8],
    });
  });

  await page.goto('https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html');
  await page.screenshot({path: 'result.png'});

  await browser.close();
})();
@mvdan
Contributor

mvdan commented Jun 18, 2019

It seems to me like this is more about how you can get the equivalent of evaluateOnNewDocument with chromedp. I don't think we support that right now, and it sounds useful; I wonder how it actually works under the hood on Puppeteer.

If you have any insight into that, or know how this could be done with the devtools protocol, any help is appreciated.

@user-ge3e567tlw
Author

user-ge3e567tlw commented Jun 18, 2019

Yes, you are absolutely right: evaluateOnNewDocument is what I need in chromedp.
I looked at the puppeteer source code. Under the hood it calls the DevTools protocol method Page.addScriptToEvaluateOnNewDocument (https://chromedevtools.github.io/devtools-protocol/tot/Page#method-addScriptToEvaluateOnNewDocument).

puppeteer/lib/Page.js (https://github.com/GoogleChrome/puppeteer/blob/master/lib/Page.js#L789)

/**
   * @param {Function|string} pageFunction
   * @param {!Array<*>} args
   */
  async evaluateOnNewDocument(pageFunction, ...args) {
    const source = helper.evaluationString(pageFunction, ...args);
    await this._client.send('Page.addScriptToEvaluateOnNewDocument', { source });
  }
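
As a rough sketch of what that call puts on the wire, the snippet below builds the JSON command for Page.addScriptToEvaluateOnNewDocument using only the Go standard library. The cdpCommand type, the newDocumentScriptCommand helper and the message id are illustrative assumptions and are not part of puppeteer or chromedp; only the method name and its source parameter come from the DevTools protocol page linked above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdpCommand is a hypothetical, minimal model of a DevTools protocol command.
type cdpCommand struct {
	ID     int               `json:"id"`
	Method string            `json:"method"`
	Params map[string]string `json:"params"`
}

// newDocumentScriptCommand builds the command that registers source to be
// evaluated in every new document before the page's own scripts run.
func newDocumentScriptCommand(id int, source string) (string, error) {
	b, err := json.Marshal(cdpCommand{
		ID:     id,
		Method: "Page.addScriptToEvaluateOnNewDocument",
		Params: map[string]string{"source": source},
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	msg, err := newDocumentScriptCommand(1, "window.injected = true;")
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```

Because the script is registered with the browser rather than evaluated once, it survives navigation, which is the difference from the evalJS approach in the original report.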

@user-ge3e567tlw
Author

cdproto already supports this via page.AddScriptToEvaluateOnNewDocument (https://github.com/chromedp/cdproto/blob/master/page/page.go#L37)

chromedp.ActionFunc(func(ctx context.Context) error {
			identifier, err := page.AddScriptToEvaluateOnNewDocument(`Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5, 7, 8, 9]});`).Do(ctx)
			if err != nil {
				return err
			}
			fmt.Println("identifier: ", identifier.String())
			return nil
		}),

page.AddScriptToEvaluateOnNewDocument works fine!

@kenshaw
Member

kenshaw commented Jun 18, 2019

I was just about to point out the AddScript... in the page package/domain. Blocking proper detection of headless is doable. You just need to do the same stuff Puppeteer suggests doing.

@kenshaw
Member

kenshaw commented Jun 18, 2019

BTW -- I will post code later to pass this test. It's just a matter of setting up the JavaScript. Please note that neither the chromedp project nor I personally have any interest in helping people "evade" bot detection. However, I will always demonstrate how things are done.

@kenshaw
Member

kenshaw commented Jun 18, 2019

package main

import (
	"context"
	"io/ioutil"
	"log"

	"github.com/chromedp/cdproto/page"
	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background(), chromedp.WithDebugf(log.Printf))
	defer cancel()

	// see: https://intoli.com/blog/not-possible-to-block-chrome-headless/
	const script = `(function(w, n, wn) {
  // Pass the Webdriver Test.
  Object.defineProperty(n, 'webdriver', {
    get: () => false,
  });

  // Pass the Plugins Length Test.
  // Overwrite the plugins property to use a custom getter.
  Object.defineProperty(n, 'plugins', {
    // This just needs to have length > 0 for the current test,
    // but we could mock the plugins too if necessary.
    get: () => [1, 2, 3, 4, 5],
  });

  // Pass the Languages Test.
  // Overwrite the languages property to use a custom getter.
  Object.defineProperty(n, 'languages', {
    get: () => ['en-US', 'en'],
  });

  // Pass the Chrome Test.
  // We can mock this in as much depth as we need for the test.
  w.chrome = {
    runtime: {},
  };

  // Pass the Permissions Test.
  // Bind the original so it keeps its Permissions receiver when called below.
  const originalQuery = wn.permissions.query.bind(wn.permissions);
  return wn.permissions.query = (parameters) => (
    parameters.name === 'notifications' ?
      Promise.resolve({ state: Notification.permission }) :
      originalQuery(parameters)
  );

})(window, navigator, window.navigator);`

	var buf []byte
	if err := chromedp.Run(
		ctx,
		//chromedp.Emulate(device.IPhone7),
		chromedp.ActionFunc(func(ctx context.Context) error {
			// Register the script so it runs on every new document,
			// before the page's own scripts. The returned script
			// identifier is not needed here.
			_, err := page.AddScriptToEvaluateOnNewDocument(script).Do(ctx)
			return err
		}),
		chromedp.Navigate("https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html"),
		chromedp.CaptureScreenshot(&buf),
	); err != nil {
		log.Fatal(err)
	}

	if err := ioutil.WriteFile("screenshot.png", buf, 0644); err != nil {
		log.Fatal(err)
	}
}

It also works when emulating other devices, such as an iPhone 7.

See the attached screenshot:

screenshot

Note: I'm using the headless-shell that's built without the HeadlessShell user-agent string.
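
The navigator overrides in the script above all follow the same Object.defineProperty pattern, so they could be generated rather than written by hand. A minimal sketch, assuming a hypothetical navigatorOverride helper (not part of chromedp or cdproto) that JSON-encodes the property name and value:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// navigatorOverride is a hypothetical helper that returns JavaScript which
// replaces navigator.<prop> with a getter returning value.
func navigatorOverride(prop string, value interface{}) (string, error) {
	name, err := json.Marshal(prop)
	if err != nil {
		return "", err
	}
	val, err := json.Marshal(value)
	if err != nil {
		return "", err
	}
	// The value is parenthesized so that an encoded object literal is not
	// parsed as the body of the arrow function.
	return fmt.Sprintf("Object.defineProperty(navigator, %s, {get: () => (%s)});", name, val), nil
}

func main() {
	overrides := []struct {
		prop  string
		value interface{}
	}{
		{"webdriver", false},
		{"plugins", []int{1, 2, 3, 4, 5}},
		{"languages", []string{"en-US", "en"}},
	}
	for _, o := range overrides {
		js, err := navigatorOverride(o.prop, o.value)
		if err != nil {
			panic(err)
		}
		fmt.Println(js)
	}
}
```

The generated lines could then be joined and passed to page.AddScriptToEvaluateOnNewDocument in place of the hand-written constant.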

@kenshaw kenshaw closed this as completed Jun 18, 2019
@user-ge3e567tlw
Author

user-ge3e567tlw commented Jul 1, 2020 via email

@Leezj9671

Leezj9671 commented Jul 1, 2020 via email

@dwisiswant0

@kenshaw, is it possible to get a chromedp.Evaluate result from a script added with page.AddScriptToEvaluateOnNewDocument? The goal is the same: to bypass headless detection.

@rusq

rusq commented May 2, 2021

I'm trying to get through Cloudflare to download this file:
https://www.rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/tables/b1/hb1-daily.xlsx?revision=5fa61401-a877-4607-b7ae-2e060c09935d

My test results are:

  1. Headless Chrome, No script: FAIL
  2. Headless Chrome, with script: FAIL
  3. Chrome, No script: FAIL
  4. Chrome, with script: Download successful

I tried the stealth.JS script mentioned in #669, but it doesn't work in headless mode and panics when Chrome is visible.

Has anyone been able to work around the Cloudflare protection?

P.S. I'm using a slightly modified example:
main.txt

@ZekeLu
Member

ZekeLu commented May 3, 2021

@rusq Your issue is not the same as the one discussed here. In your case, there is no JS code running in the browser to detect headless mode; the detection happens on the server side. What you should do is provide a custom user-agent string (when Chrome runs in headless mode, the user-agent string contains something like HeadlessChrome), and then configure the download behavior. Check #807 for how to configure the download behavior as of now.

Update: Sorry, I just realized #807 was created by you.
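
To illustrate why no injected script can help with server-side detection, here is a minimal sketch of a server that rejects requests by looking only at the User-Agent header. The blockHeadless handler and statusFor helper are hypothetical, and real services such as Cloudflare presumably apply more checks than this; the sketch only shows that the decision is made before any page JavaScript runs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// blockHeadless is a hypothetical handler that refuses any request whose
// User-Agent header contains the HeadlessChrome token.
func blockHeadless() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.UserAgent(), "HeadlessChrome") {
			http.Error(w, "blocked", http.StatusForbidden)
			return
		}
		fmt.Fprintln(w, "ok")
	})
}

// statusFor sends an in-memory request with the given User-Agent to h and
// returns the resulting HTTP status code.
func statusFor(h http.Handler, userAgent string) int {
	req := httptest.NewRequest(http.MethodGet, "/file.xlsx", nil)
	req.Header.Set("User-Agent", userAgent)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	const prefix = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
	fmt.Println(statusFor(blockHeadless(), prefix+"HeadlessChrome/90.0.4430.93 Safari/537.36"))
	fmt.Println(statusFor(blockHeadless(), prefix+"Chrome/90.0.4430.93 Safari/537.36"))
}
```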

@rusq

rusq commented May 3, 2021

Hey @ZekeLu your help is invaluable, thank you again!

2021/05/03 22:09:50 Download Complete: /var/folders/pw/0rmmgypj1sjfhtyhhjs20s6c0000gn/T//fb79ab3d-66ea-431c-952a-4ddfc3c28882

I modified the options to include the user agent and it worked like a charm!

	opts := []chromedp.ExecAllocatorOption{
		chromedp.NoFirstRun,
		chromedp.NoDefaultBrowserCheck,
		chromedp.Headless,
		chromedp.UserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"),
	}

@rusq

rusq commented May 3, 2021

Hey @ZekeLu I have combined the solutions above that got the download going into a small library draft: https://github.com/rusq/chromedl

Going to test it on production soon (best way to test things, current http.Get() doesn't work anyway)

@ZekeLu
Member

ZekeLu commented May 3, 2021

Better to create allocator options like this:

opts := append(chromedp.DefaultExecAllocatorOptions[:],
	chromedp.UserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"),
)
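
The reason this pattern is safe can be sketched without chromedp. Slicing a fixed-size array with [:] yields a slice whose length equals its capacity, so appending at least one element allocates a new backing array and the defaults stay untouched. The option type and defaultOptions array below are stand-ins with made-up labels, not chromedp's real values:

```go
package main

import "fmt"

// option stands in for chromedp.ExecAllocatorOption; here it is just a label.
type option string

// defaultOptions mimics a package-level array of default options.
var defaultOptions = [...]option{"no-first-run", "no-default-browser-check", "headless"}

// withExtra returns the defaults followed by extra.
func withExtra(extra ...option) []option {
	// defaultOptions[:] has len == cap, so append copies into a new backing
	// array whenever extra is non-empty and never writes into defaultOptions.
	return append(defaultOptions[:], extra...)
}

func main() {
	opts := withExtra("user-agent=custom")
	opts[0] = "changed" // modifies only the copy

	fmt.Println(len(opts), len(defaultOptions))
	fmt.Println(defaultOptions[0])
}
```

Starting from the defaults also keeps every flag chromedp sets by default, which a hand-written list like the previous one would drop.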

@larytet

larytet commented May 22, 2021

I cannot get the content of this (warning!) malicious link https://kentabucket56.s3.eu-de.cloud-object-storage.appdomain.cloud/underjudging/index.html
The page does not load correctly.

cbimage

I tried to modify the user-agent

Content:Could not find node with given id (-32000).

I have the same problem with https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.js
The headless Chrome tests are all green.

My code

ActionFunc(func(ctx context.Context) error {
	node, err := dom.GetDocument().Do(ctx)
	if err != nil {
		return err
	}
	*content, err = dom.GetOuterHTML().WithNodeID(node.NodeID).Do(ctx)
	return err
}),

@ZekeLu
Member

ZekeLu commented May 22, 2021

@larytet Do not call dom.GetDocument in your code. See #762.

@larytet

larytet commented May 24, 2021

Here is another malicious URL
http://www.5091711893.s-fx.com/redirect/YWNjb3VudHNyZWNlaXZhYmxlQGF1LnNpa2EuY29t
protected by Cloudflare ("checking your browser before accessing preview-domain.com"). Is it a matter of when Navigate() completes?
preview-domain.com is one of the servers serving the JS.

P.S. The page returns 404 now

@psidex

psidex commented Jun 20, 2021

Sorry to revive this closed issue, but I'm trying to screenshot this website:

https://www.royalnavy.mod.uk/qhm/portsmouth/shipping-movements

Even when following the techniques discussed in this thread, it just comes out as a blank white screenshot. Other Cloudflare-protected sites work, just not this one for some reason.

I am changing the user agent to one I know works (tested with puppeteer) and running the script that passes the intoli.com test.

Any ideas anyone?

@ZekeLu
Member

ZekeLu commented Jun 21, 2021

@psidex See #396 (comment)

@psidex

psidex commented Jun 21, 2021

> @psidex See #396 (comment)

Yep, as I said, I tried the fixes suggested, such as changing the user agent and inserting the script. I still just get a blank white screen.

@ZekeLu
Member

ZekeLu commented Jun 21, 2021

It works on my computer. And maybe it's a screenshot issue, see #863.

If it still does not work on your computer, see the comment in the source code for how to debug the issue.

package main

import (
	"context"
	"io/ioutil"
	"log"

	"github.com/chromedp/chromedp"
)

func main() {
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		// set up a proxy (such as Fiddler) and uncomment the next two lines to see the network requests if it still does not work.
		//chromedp.ProxyServer("localhost:8866"),
		//chromedp.Flag("ignore-certificate-errors", true),
		chromedp.UserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"),
	)
	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()
	ctx, cancel = chromedp.NewContext(
		ctx,
		// uncomment the next line to see the CDP messages
		//chromedp.WithDebugf(log.Printf),
	)
	defer cancel()

	var buf []byte
	err := chromedp.Run(ctx,
		chromedp.Navigate(`https://www.royalnavy.mod.uk/qhm/portsmouth/shipping-movements`),
		chromedp.CaptureScreenshot(&buf),
	)
	if err != nil {
		log.Fatal(err)
	}

	if err := ioutil.WriteFile("screenshot.png", buf, 0o644); err != nil {
		log.Fatal(err)
	}
}

@psidex

psidex commented Jun 21, 2021

OK interesting, thanks

@pricemok

Thank you! It works for me.

@Davincible

Regarding this issue: I was also running into some problems and created this lib to fix them: undetected-chromedp
