This repository has been archived by the owner on Mar 7, 2021. It is now read-only.

Skip faulty URLs such as those missing a hostname or having an invali… #387

Closed
wants to merge 1 commit into from

Conversation

konstantinblaesi
Contributor

See #385

This needs fixing either in URI.js or in simplecrawler. I agree that it makes more sense to do it in URI.js, but I think it will take a lot more effort to refactor their validation code.

This prevents the crawler from crashing on pages containing links such as:

<a href="http://">http://</a><br>
<a href="http://hostname:port/foo/bar">http://hostname:port/foo/bar</a>
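
For illustration only, here is a minimal sketch of the "skip instead of crash" idea. It is not the change made in `lib/crawler.js`, and it uses Node's built-in WHATWG `URL` class rather than URI.js; the helper name `safeResolve` and the base URL `http://example.com/` are assumptions made up for this example.

```javascript
// Hypothetical helper (not simplecrawler's API): resolve a link against the
// page it was found on and return null instead of throwing when it is faulty.
function safeResolve(href, base) {
    try {
        // The WHATWG URL parser throws a TypeError for a missing hostname
        // ("http://") and for a non-numeric port ("http://hostname:port/...").
        const url = new URL(href, base);

        // Assumption: only http(s) links are of interest to the crawler.
        if (url.protocol !== "http:" && url.protocol !== "https:") {
            return null;
        }
        return url.href;
    } catch (error) {
        // Faulty link: report it as unusable so the caller can skip it.
        return null;
    }
}

// Usage: the two faulty links from the description are dropped,
// the valid ones are kept and normalized.
const links = [
    "http://",
    "http://hostname:port/foo/bar",
    "/about",
    "http://example.com:8080/x"
];
const queue = links
    .map(href => safeResolve(href, "http://example.com/"))
    .filter(url => url !== null);
```

Returning `null` keeps the decision to skip a link in one place, so the queueing code never has to handle a parser exception itself.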

@konstantinblaesi
Contributor Author

konstantinblaesi commented Jul 28, 2017

It looks like 30e38b9 somehow broke the integration test for Node 8.x?

…d port number.

 Changes to be committed:
	modified:   lib/crawler.js
@konstantinblaesi
Contributor Author

Closing, as this is currently being discussed and pursued in medialize/URI.js#344 and medialize/URI.js#345.
