hide that webdriver is being used #557
Comments
Sorry, I don't understand. Fill the template properly and try to explain with an example. |
chromedp does not try to hide the fact that you are a bot, or that the Chrome browser is being automated. Doing that properly is quite a bit of work. If someone wants to send patches for actions that one could run to try to hide those details a bit better, I'd be happy to review. |
If chromedp cannot pretend to be a human rather than a robot, it is no different from Selenium. |
Selenium is very different technology. Like I said, trying to hide the fact that you're a bot is possible, but it's not a planned feature right now. If you want to send patches, I'll be happy to review. |
OK, thank you for your reply. Happy New Year! |
If all you're looking for is to hide that specific JS flag, you can just inject JavaScript to do that before you navigate to the page:
|
This method does not keep the override in place when a new tab page is loaded. |
Can you help me understand your use case a little better? Why do you care if it doesn't work on the new tab page? |
|
Can you give a real world example of this? What website? |
Maybe try the CDP Target domain?
I'm trying to use chromedp for twitter.com to scrape some data after logging in. I can log in using chromedp with no problems, but once I do it, the html rendered on the home page (twitter.com) says:
I tried injecting the script mentioned above, but no luck. If anyone has been able to solve this, please let me know. Thanks in advance! |
@0xhiroki First of all, it may be better to use the Twitter API to access Twitter resources. Anyway, here is the answer to your question. According to my test, it seems that Twitter just checks the User-Agent header:
package main

import (
	"context"
	"io/ioutil"
	"log"
	"math"
	"time"

	"github.com/chromedp/cdproto/cdp"
	"github.com/chromedp/cdproto/emulation"
	"github.com/chromedp/cdproto/input"
	"github.com/chromedp/cdproto/page"
	"github.com/chromedp/chromedp"
)

func main() {
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		// set headless to false, and it works without any special configuration.
		//chromedp.Flag("headless", false),
		// use something like Fiddler (https://www.telerik.com/fiddler/fiddler-everywhere) to capture network traffic.
		//chromedp.ProxyServer("http://localhost:8866"),
		// when a proxy is used to capture HTTPS traffic, it may be necessary to ignore certificate errors.
		//chromedp.Flag("ignore-certificate-errors", true),
		// the default user-agent header is something like: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/89.0.4389.114 Safari/537.36
		chromedp.UserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"),
		// to make sure the login form is visible.
		chromedp.WindowSize(1200, 800),
	)
	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()

	ctx, cancel = chromedp.NewContext(ctx,
		// enable the debug log to see the CDP traffic.
		//chromedp.WithDebugf(log.Printf),
	)
	defer cancel()

	var nodes []*cdp.Node
	err := chromedp.Run(ctx,
		chromedp.Navigate("https://twitter.com/"),
		// capture the login page in headless mode
		chromedp.ActionFunc(func(ctx context.Context) error {
			return screenshot(ctx, "login-page.png")
		}),
		// get the username, password and login button nodes on the page.
		// this selector could become invalid in the future.
		chromedp.Nodes(`input[name*="session"],div[data-testid="LoginForm_Login_Button"]`, &nodes, chromedp.ByQueryAll, chromedp.AtLeast(3)),
	)
	if err != nil {
		log.Fatal(err)
	}

	err = chromedp.Run(ctx,
		chromedp.MouseClickNode(nodes[0]),
		input.InsertText("your username/email"),
		chromedp.MouseClickNode(nodes[1]),
		input.InsertText("your password"),
		chromedp.MouseClickNode(nodes[2]),
		// I'm lazy and just wait. You had better check something else to make sure the page is loaded.
		chromedp.Sleep(3*time.Second),
		// capture the page after login in headless mode
		chromedp.ActionFunc(func(ctx context.Context) error {
			return screenshot(ctx, "after-login.png")
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
}

func screenshot(ctx context.Context, output string) error {
	// get layout metrics
	_, _, cssContentSize, err := page.GetLayoutMetrics().Do(ctx)
	if err != nil {
		return err
	}
	width, height := int64(math.Ceil(cssContentSize.Width)), int64(math.Ceil(cssContentSize.Height))

	// force viewport emulation
	err = emulation.SetDeviceMetricsOverride(width, height, 1, false).
		WithScreenOrientation(&emulation.ScreenOrientation{
			Type:  emulation.OrientationTypePortraitPrimary,
			Angle: 0,
		}).
		Do(ctx)
	if err != nil {
		return err
	}

	// capture screenshot
	buf, err := page.CaptureScreenshot().
		WithClip(&page.Viewport{
			X:      cssContentSize.X,
			Y:      cssContentSize.Y,
			Width:  cssContentSize.Width,
			Height: cssContentSize.Height,
			Scale:  1,
		}).Do(ctx)
	if err != nil {
		return err
	}

	err = ioutil.WriteFile(output, buf, 0o644)
	return err
} |
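The demo above hard-codes a User-Agent string that differs from the headless default only in the `HeadlessChrome` product token. As a small illustrative sketch, the same string can be derived from whatever the headless default happens to be. The helper name `stripHeadless` is made up for illustration and is not part of chromedp.

```go
package main

import (
	"fmt"
	"strings"
)

// stripHeadless (hypothetical helper) replaces the first "HeadlessChrome/"
// product token in a User-Agent string with "Chrome/". Strings that do not
// contain the token are returned unchanged.
func stripHeadless(ua string) string {
	return strings.Replace(ua, "HeadlessChrome/", "Chrome/", 1)
}

func main() {
	// Default headless User-Agent quoted in the demo above.
	headless := "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/89.0.4389.114 Safari/537.36"
	fmt.Println(stripHeadless(headless))
}
```

The result could then be passed to `chromedp.UserAgent` in place of the literal string, so the override keeps following the installed Chrome version.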
Just remove this option:
// ...
opts := append(chromedp.DefaultExecAllocatorOptions[:],
	chromedp.Flag("enable-automation", false),
)
ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
// ...
But as mvdan pointed out, trying to hide the fact that you're a bot is not a planned feature right now. Closing.
Update: |
|
Thank you LovelyColor and ZekeLu, I am able to bypass the detection after adding both of the following flags.
If I omit one of them, the detection cannot be bypassed. |
What versions are you running?
Latest
chrome 79 and chromium 80/81
Using enable-automation has no effect; WebDriver is still detected using JavaScript.