Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling absoluteTo on URIs that are already absolute causes error #355

Closed
mikayla-maki opened this issue Aug 17, 2017 · 4 comments
Closed

Comments

@mikayla-maki
Copy link

mikayla-maki commented Aug 17, 2017

To reproduce, install the library with npm and then create a file with the following lines:

var URI = require("urijs");
var google = new URI('http://www.google.com');
google.absoluteTo("http://");

Running node test.js gives me this error: TypeError: Hostname cannot be empty, if protocol is http

@rodneyrehm
Copy link
Member

hey there!

unfortunately the example makes little sense - what are you trying to achieve?

@mikayla-maki
Copy link
Author

mikayla-maki commented Aug 17, 2017

Since posting this I've actually discovered that this example from the docs:

var u = new URI('//example.com/path');
u.absoluteTo('https://');

No longer works and that it's due to this shortcut in the body of the absoluteTo() method:

    if (!(base instanceof URI)) {
      base = new URI(base);
    }

Breaking when there's nothing but a protocol.

Anyway, I discovered this while trying to use this library called CrawlKit on a project, here's the relevant function:

    const addUrl = (u) => {
      let url = urijs(u);
      url = url.absoluteTo(defaultAbsoluteTo); //This variable's value is 'http://'
      url.normalize();
      url = url.toString();

      if (!seen.has(url)) {
        logger.info(`Adding ${url}`);
        seen.add(url);
        q.push(new Scope(url));
        logger.info(`${q.length()} task(s) in the queue.`);
      } else {
        logger.debug(`Skipping ${url} - already seen.`);
      }
    };

I believe they're trying to clean up the random URLs or domain names or what not that's passed to their library, so just returning the same URL would be good for my use case.

@rodneyrehm
Copy link
Member

As the hostname validation caused some kerfuffle I've decided to make this feature opt-in, thereby returning to the previous default behavior of v1.18.10. As of v1.19.0 you can activate hostname validation globally by setting URI.preventInvalidHostname = true, or on an instance by calling .preventInvalidHostname(true).

I'm sorry for the confusion, inconvenience caused by the original change and the delayed rectification.

@mikayla-maki
Copy link
Author

Thanks for the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants