Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid modifying WSS URLs since parameters are dropped through the detectURL method #972

Closed
AlexLoyola opened this issue Dec 6, 2021 · 5 comments · Fixed by #1184
Closed

Comments

@AlexLoyola
Copy link

Bug description

Ever since the chromedp v0.7.3, the detectURL method was implemented in util.go to make it easier to send the base URL of a WS to NewRemoteAllocator. The rest of the URL is built by fetching the webSocketDebuggerUrl from json/version and constructing the URL again; however if the websocket is passed on with parameters, such as the case of an authentication token to a WSS for a hosted browser service, then it is impacting connection, since the parameter is dropped and the connection isn't successful.

Steps to reproduce the problem:

  1. Use chromedp v0.7.3 or above
  2. Pass a WSS to the NewRemoteAllocator

This throws the following error
2021/12/03 14:42:06 Failed getting title of example.com: could not dial "ws://chrome.browserless.io": unexpected HTTP response status: 301

$ go list -m github.com/chromedp/chromedp
github.com/chromedp/chromedp v0.7.3
$ google-chrome --version
Google Chrome 96.0.4664.55
$ go version
go version go1.17.3 darwin/amd64

What was run to get this error message?

Run the Go script below

package main

import (
  "context"
  "flag"
  "log"

  "github.com/chromedp/chromedp"
)

func main() {
  var devToolWsUrl string
  var title string

  flag.StringVar(&devToolWsUrl, "devtools-ws-url", "wss://chrome.browserless.io?token=my_token_here", "DevTools Websocket URL")
  flag.Parse()

  actxt, cancelActxt := chromedp.NewRemoteAllocator(context.Background(), devToolWsUrl)
  defer cancelActxt()

  ctxt, cancelCtxt := chromedp.NewContext(actxt) // create new tab
  defer cancelCtxt()                             // close tab afterwards

  if err := chromedp.Run(ctxt,
    chromedp.Navigate("https://example.com"),
    chromedp.Title(&title),
  ); err != nil {
    log.Fatalf("Failed getting title of example.com: %v", err)
  }

  log.Println("Got title of:", title)
}

What did you expect to see?

I was expecting the WebSocket to connect properly and scrape the Title, which occurs when using v0.7.2.

$ go get -u github.com/chromedp/chromedp@v0.7.2
$ go run chrome.go                             
2021/12/03 14:42:35 Got title of: Example Domain

What did you see instead?

I saw the following error message, with an unexpected HTTP response due to the fact that the token was not provided when the URL was reformatted through the detectURL method in util.go

$ go get -u github.com/chromedp/chromedp@v0.7.3
$ go run chrome.go
2021/12/03 14:42:06 Failed getting title of example.com: could not dial "ws://chrome.browserless.io": unexpected HTTP response status: 301
exit status 1

Possible issue

This most likely has to do with PR #817
While trying to simplify the way the URL is passed on to NewRemoteAllocator, it is unfortunately removing the arguments passed after the hostname at the moment it reconstructs the URL with the /json/version payload. Important arguments are lost in the process such as the token that is used to authenticate to many hosted browser services.

Possible fixes and justification

If the detectURL method returns Secure WebSockets as is, then they won't be modified by this function.
Why would this be important for WSS? Because most hosted browser services are running on WSS and are mainly the ones that require to authenticate through a token, which sometimes are passed through an argument such as this case and would make it important for it not be dropped from the URL.

Possible fixes

  1. Ignore urls containing "/devtools/browser/" and also the ones containing "wss" since these endpoints could use params as authentication methods to hosted services. Trusting that WSS will most likely be hosted services and are probably going to point exactly to the correct path anyways, and would not require the assistance that the PR make NewRemoteAllocator accept url without devtools/browser/... #817 provides.
    Something like this could work for the util.go:
func detectURL(urlstr string) string {
	if strings.Contains(urlstr, "/devtools/browser/") || strings.HasPrefix(urlstr, "wss") {
		return urlstr
	}
	// replace the scheme and path to construct the URL like:
	// http://127.0.0.1:9222/json/version
	u, err := url.Parse(urlstr) 
  1. Modify the detectURL method to extract the arguments provided after the hostname, so that they can be appended to the webSocketDebuggerUrl that is retrieved from json/version.
@joelgriffith
Copy link

Hey folks -- wanted to bump this up as we're more than happy to implement the work, but just wanted to get quick consensus that the approach is right. Please do let us know your thoughts and PR should follow up!

@ZekeLu
Copy link
Member

ZekeLu commented Dec 16, 2021

Hi @joelgriffith, thank you first!
I think it's better to make it explicit that we want to use the orignal URL. So my suggestion is to make NewRemoteAllocator accept options (just like NewExecAllocator), and add a pre-defined option NoDetectURL (maybe there is a better name). chromedp should not try to modify the URL when this option is present (including not calling forceIP which could modify the URL too).
@joelgriffith and @AlexLoyola what do you think?

@AlexLoyola
Copy link
Author

Hi @ZekeLu, thank you for your feedback!
@joelgriffith and I think it's a great approach - we'll start working on a PR based on that.
We might take a few weeks though due to the holiday season, so expect some delay on this task.

Thanks again for weighing in!

@chuprik
Copy link

chuprik commented Apr 8, 2022

any solution here? can't use browserless with token :)

@AlexLoyola
Copy link
Author

any solution here? can't use browserless with token :)

Hey @kotchuprik , until we finish up this change, the workaround for now is to "pin" chromedp to an older version... Anything prior to "v0.7.3" should work. Please let me know if this workaround works for you in the meantime.

We've been working on a new product with browserless so we haven't got to working on this PR lately. We'll try to put in the time soon. Thanks for understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants