This repository has been archived by the owner on Mar 7, 2021. It is now read-only.

Commit

Added try/catch block around URL parsing in Crawler#getRobotsTxt redirection logic

More in #363
fredrikekelund committed Jul 25, 2017
1 parent d7af4f1 commit 9f4e4ec
Showing 1 changed file with 12 additions and 3 deletions.
lib/crawler.js: 12 additions & 3 deletions
@@ -650,9 +650,18 @@ Crawler.prototype.getRobotsTxt = function(url, callback) {
 
 response.destroy();
 
-var redirectTarget = uri(response.headers.location)
-    .absoluteTo(robotsTxtUrl)
-    .normalize();
+var redirectTarget;
+
+try {
+    redirectTarget = uri(response.headers.location)
+        .absoluteTo(robotsTxtUrl)
+        .normalize();
+} catch (error) {
+    var robotsTxtHost = uri(robotsTxtUrl).pathname("").href();
+    errorMsg = util.format("Faulty redirect URL when fetching robots.txt for %s", robotsTxtHost);
+
+    return callback(new Error(errorMsg));
+}
 
 if (crawler.domainValid(redirectTarget.hostname())) {
     crawler.getRobotsTxt(redirectTarget.href(), callback);
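
For context, the sketch below isolates the pattern this commit introduces: wrapping the URI.js ("urijs") parsing of the Location header in a try/catch so that a faulty redirect URL is reported through the callback instead of escaping as an uncaught exception. The resolveRedirect function name, the example URLs, and the standalone errorMsg declaration are illustrative assumptions, not part of the actual patch.

    // Minimal sketch of the pattern added in this commit, assuming the same
    // URI.js ("urijs") dependency that lib/crawler.js uses. Function name and
    // URLs are illustrative, not taken from the patch.
    var uri = require("urijs"),
        util = require("util");

    function resolveRedirect(location, robotsTxtUrl, callback) {
        var redirectTarget;

        try {
            // uri() / absoluteTo() can throw on a malformed Location value;
            // without the try/catch, that exception would escape the caller.
            redirectTarget = uri(location)
                .absoluteTo(robotsTxtUrl)
                .normalize();
        } catch (error) {
            // Strip the path so the error message only names the host,
            // mirroring what the patch does with robotsTxtHost.
            var robotsTxtHost = uri(robotsTxtUrl).pathname("").href();
            var errorMsg = util.format(
                "Faulty redirect URL when fetching robots.txt for %s", robotsTxtHost);

            return callback(new Error(errorMsg));
        }

        callback(null, redirectTarget.href());
    }

    // Illustrative usage: a relative Location header resolved against the
    // robots.txt URL it redirected from.
    resolveRedirect("/robots.txt", "http://example.com/robots.txt", function(error, target) {
        console.log(error || target); // http://example.com/robots.txt
    });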
