hide that webdriver is being used #557

Closed
xthz opened this issue Jan 16, 2020 · 17 comments

Comments

@xthz

xthz commented Jan 16, 2020

What versions are you running?

Latest

chrome 79 and chromium 80/81

Setting enable-automation has no effect; WebDriver is still detected from JavaScript via:

window.navigator.webdriver
@mvdan
Contributor

mvdan commented Jan 16, 2020

Sorry, I don't understand. Fill the template properly and try to explain with an example.

@xthz
Author

xthz commented Jan 16, 2020

When I run chromedp, JavaScript can still detect that navigator.webdriver is true.
image

But in a normal browser it is undefined.
image

version:
chrome 69
chromium: 80...
chromium: 81.0.4030.0

@mvdan
Contributor

mvdan commented Jan 16, 2020

chromedp does not try to hide the fact that you are a bot, or that the Chrome browser is being automated. Doing that properly is quite a bit of work. If someone wants to send patches for actions that one could run to try to hide those details a bit better, I'd be happy to review.

@xthz
Author

xthz commented Jan 16, 2020

chromedp does not try to hide the fact that you are a bot, or that the Chrome browser is being automated. Doing that properly is quite a bit of work. If someone wants to send patches for actions that one could run to try to hide those details a bit better, I'd be happy to review.

If chromedp cannot pass as a human rather than a robot, it is no different from Selenium.

@mvdan
Contributor

mvdan commented Jan 16, 2020

Selenium is very different technology. Like I said, trying to hide the fact that you're a bot is possible, but it's not a planned feature right now. If you want to send patches, I'll be happy to review.

@xthz
Author

xthz commented Jan 16, 2020

Selenium is very different technology. Like I said, trying to hide the fact that you're a bot is possible, but it's not a planned feature right now. If you want to send patches, I'll be happy to review.

OK, thank you for your reply. Happy New Year!

mvdan changed the title from "about chrome79 and chromium 80/81 webdriver" to "hide that webdriver is being used" on Jan 16, 2020
@pmurley
Contributor

pmurley commented Jan 17, 2020

If all you're looking for is to hide that specific JS flag, you can just inject JavaScript to do that before you navigate to the page:

err = chromedp.Run(cxt, chromedp.ActionFunc(func(cxt context.Context) error {
   _, err := page.AddScriptToEvaluateOnNewDocument("Object.defineProperty(navigator, 'webdriver', { get: () => false, });").Do(cxt)
   if err != nil {
      return err
   }
   return nil
}))
if err != nil {
   fmt.Println(err)
}

There are lots of other ways to detect automation, though.

@xthz
Author

xthz commented Jan 19, 2020

If all you're looking for is to hide that specific JS flag, you can just inject JavaScript to do that before you navigate to the page:

err = chromedp.Run(cxt, chromedp.ActionFunc(func(cxt context.Context) error {
   _, err := page.AddScriptToEvaluateOnNewDocument("Object.defineProperty(navigator, 'webdriver', { get: () => false, });").Do(cxt)
   if err != nil {
      return err
   }
   return nil
}))
if err != nil {
   fmt.Println(err)
}

There are lots of other ways to detect automation, though.

This method cannot keep the override in place when a new tab is loaded.

@pmurley
Contributor

pmurley commented Jan 19, 2020

Can you help me understand your use case a little better? Why do you care if it doesn't work on the new tab page?

@xthz
Author

xthz commented Jan 19, 2020

Can you help me understand your use case a little better? Why do you care if it doesn't work on the new tab page?

The navigator.webdriver override is reset when chromedp opens a new tab, and some websites need to open a new tab.

@pmurley
Contributor

pmurley commented Jan 19, 2020

Can you give a real world example of this? What website?

@lxwang42

Can you help me understand your use case a little better? Why do you care if it doesn't work on the new tab page?

The navigator.webdriver override is reset when chromedp opens a new tab, and some websites need to open a new tab.

Maybe try the CDP Target domain?
https://chromedevtools.github.io/devtools-protocol/tot/Target/#event-targetCreated

@0xhiroki

I'm trying to use chromedp for twitter.com to scrape some data after logging in. I can log in using chromedp with no problems, but once I do it, the html rendered on the home page (twitter.com) says:

We’ve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Center.

I tried injecting the script mentioned above, but no luck.

If anyone has been able to solve this, please let me know. Thanks in advance!

@ZekeLu
Member

ZekeLu commented Apr 14, 2021

@0xhiroki First of all, it may be better to use the Twitter API to access Twitter resources.

Anyway, here is the answer to your question. According to my test, it seems that Twitter just checks the user-agent header on the server. When it sees Headless in that header, it responds with status code 400 (bad request). So the key is to set a custom user-agent header. You can diagnose further issues yourself with the following source code:

package main

import (
	"context"
	"io/ioutil"
	"log"
	"math"
	"time"

	"github.com/chromedp/cdproto/cdp"
	"github.com/chromedp/cdproto/emulation"
	"github.com/chromedp/cdproto/input"
	"github.com/chromedp/cdproto/page"
	"github.com/chromedp/chromedp"
)

func main() {
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		// set headless to false, and it works without any special configuration.
		//chromedp.Flag("headless", false),
		// use something like Fiddler (https://www.telerik.com/fiddler/fiddler-everywhere) to capture network traffic.
		//chromedp.ProxyServer("http://localhost:8866"),
		// when a proxy is used to capture HTTPS traffic, it may be necessary to ignore certificate errors.
		//chromedp.Flag("ignore-certificate-errors", true),
		// the default user-agent header is something like: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/89.0.4389.114 Safari/537.36
		chromedp.UserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"),
		// to make sure the login form is visible.
		chromedp.WindowSize(1200, 800),
	)
	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()
	ctx, cancel = chromedp.NewContext(ctx,
		// enable the debug log to see the CDP traffic.
		//chromedp.WithDebugf(log.Printf),
	)
	defer cancel()

	var nodes []*cdp.Node

	err := chromedp.Run(ctx,
		chromedp.Navigate("https://twitter.com/"),
		// capture the login page in headless mode
		chromedp.ActionFunc(func(ctx context.Context) error {
			return screenshot(ctx, "login-page.png")
		}),
		// get username, password and login button nodes on the page.
		// it could become invalid in the future.
		chromedp.Nodes(`input[name*="session"],div[data-testid="LoginForm_Login_Button"]`, &nodes, chromedp.ByQueryAll, chromedp.AtLeast(3)),
	)
	if err != nil {
		log.Fatal(err)
	}
	err = chromedp.Run(ctx,
		chromedp.MouseClickNode(nodes[0]),
		input.InsertText("your username/email"),
		chromedp.MouseClickNode(nodes[1]),
		input.InsertText("your password"),
		chromedp.MouseClickNode(nodes[2]),
		// I'm lazy and just wait. You had better check something else to make sure the page is loaded.
		chromedp.Sleep(3*time.Second),
		// capture the page after login in headless mode
		chromedp.ActionFunc(func(ctx context.Context) error {
			return screenshot(ctx, "after-login.png")
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
}

func screenshot(ctx context.Context, output string) error {
	// get layout metrics
	_, _, cssContentSize, err := page.GetLayoutMetrics().Do(ctx)
	if err != nil {
		return err
	}

	width, height := int64(math.Ceil(cssContentSize.Width)), int64(math.Ceil(cssContentSize.Height))

	// force viewport emulation
	err = emulation.SetDeviceMetricsOverride(width, height, 1, false).
		WithScreenOrientation(&emulation.ScreenOrientation{
			Type:  emulation.OrientationTypePortraitPrimary,
			Angle: 0,
		}).
		Do(ctx)
	if err != nil {
		return err
	}

	// capture screenshot
	buf, err := page.CaptureScreenshot().
		WithClip(&page.Viewport{
			X:      cssContentSize.X,
			Y:      cssContentSize.Y,
			Width:  cssContentSize.Width,
			Height: cssContentSize.Height,
			Scale:  1,
		}).Do(ctx)
	if err != nil {
		return err
	}

	err = ioutil.WriteFile(output, buf, 0o644)

	return err
}
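
The server-side check described above can be mimicked locally without a browser. The following is only a sketch under stated assumptions: `headlessGate` and `statusFor` are hypothetical names, and the handler is a guess at the kind of check a site might perform, not Twitter's actual logic. It shows why replacing HeadlessChrome with Chrome in the user-agent string is enough to get past a check of this kind.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// headlessGate is a hypothetical stand-in for a server that rejects
// requests whose User-Agent header mentions "Headless".
func headlessGate(w http.ResponseWriter, r *http.Request) {
	if strings.Contains(r.UserAgent(), "Headless") {
		http.Error(w, "unsupported browser", http.StatusBadRequest)
		return
	}
	fmt.Fprintln(w, "ok")
}

// statusFor runs the handler in memory with the given User-Agent
// and returns the status code it produced.
func statusFor(userAgent string) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("User-Agent", userAgent)
	rec := httptest.NewRecorder()
	headlessGate(rec, req)
	return rec.Code
}

func main() {
	const headlessUA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/89.0.4389.114 Safari/537.36"
	// The same string with the "Headless" marker removed, matching the
	// custom value passed to chromedp.UserAgent in the program above.
	customUA := strings.Replace(headlessUA, "HeadlessChrome", "Chrome", 1)

	fmt.Println(statusFor(headlessUA)) // 400
	fmt.Println(statusFor(customUA))   // 200
}
```

A real site may look at more than this one header, so a 200 here says nothing about other detection methods.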

@ZekeLu
Member

ZekeLu commented May 19, 2021

chromedp.DefaultExecAllocatorOptions contains the command-line option enable-automation, according to https://peter.sh/experiments/chromium-command-line-switches/

--enable-automation: Enable indication that browser is controlled by automation.

Just remove this option and window.navigator.webdriver will be false. All the chromedp tests pass without this option.

	// ...
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.Flag("enable-automation", false),
	)
	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	// ...

But as mvdan pointed out, trying to hide the fact that you're a bot is not a planned feature right now. Closing.

Update: chromedp.Flag("enable-automation", false) alone does not work since Chromium 79.0.3922.0. #557 (comment) is better. See #881 (comment) for more information.

@LovelyColor

If all you're looking for is to hide that specific JS flag, you can just inject JavaScript to do that before you navigate to the page:

err = chromedp.Run(cxt, chromedp.ActionFunc(func(cxt context.Context) error {
   _, err := page.AddScriptToEvaluateOnNewDocument("Object.defineProperty(navigator, 'webdriver', { get: () => false, });").Do(cxt)
   if err != nil {
      return err
   }
   return nil
}))
if err != nil {
   fmt.Println(err)
}

There are lots of other ways to detect automation, though.

This method cannot keep the override in place when a new tab is loaded.

You can try the following flag; it helps hide the webdriver property:

chromedp.Flag("disable-blink-features","AutomationControlled"),

@ciaoSora

Thank you LovelyColor and ZekeLu, I am able to bypass the detection after adding both of the following flags.

chromedp.Flag("enable-automation", false),
chromedp.Flag("disable-blink-features","AutomationControlled"),

If I omit one of them, the detection cannot be bypassed.
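
To make the effect of these two options concrete, here is a small model of how flag values could end up as command-line switches. It is a simplified sketch based on my understanding that a false value drops a switch, true emits the bare switch, and a string emits name=value. `buildArgs` is a hypothetical helper, not chromedp's code, and the two default flags shown are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// buildArgs is a simplified, hypothetical model of turning a flag map into
// command-line switches: true becomes "--name", a string becomes
// "--name=value", and false drops the switch entirely.
func buildArgs(flags map[string]interface{}) []string {
	var args []string
	for name, value := range flags {
		switch v := value.(type) {
		case bool:
			if v {
				args = append(args, "--"+name)
			}
		case string:
			args = append(args, "--"+name+"="+v)
		}
	}
	sort.Strings(args) // map iteration order is random; sort for stable output
	return args
}

func main() {
	// Assumed defaults, for illustration only; the real default list is longer.
	flags := map[string]interface{}{
		"enable-automation": true,
		"headless":          true,
	}
	// Later settings override earlier ones, mirroring append(defaults, overrides...).
	flags["enable-automation"] = false
	flags["disable-blink-features"] = "AutomationControlled"

	fmt.Println(buildArgs(flags))
	// [--disable-blink-features=AutomationControlled --headless]
}
```

Under this model the first option removes --enable-automation from the launch arguments and the second adds --disable-blink-features=AutomationControlled, so they change different things, which fits the observation that both are needed.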

8 participants