This repository has been archived by the owner on Mar 7, 2021. It is now read-only.

Releases: simplecrawler/simplecrawler

simplecrawler 1.1.9

12 Apr 07:59

Important notice

Dropped support for Node.js 6.

Bug fixes

  • #485 - Fix format in FS cache backend error constructor

simplecrawler 1.1.8

13 Jun 13:50

Bug fixes

  • #375 - Cache doesn't work as expected

simplecrawler 1.1.7

09 Apr 13:11

Important notice

Dropped support for Node 4. The lowest supported version is now 6.13.0. This is due to requirements of some dependencies.

New features

  • The README is now generated using jsdoc-to-markdown.

Bug fixes

  • #419 - srcset source termination
  • #432 - Add a replacement in cleanURL
  • #439 - Links getting skipped due to escape sequence in href
  • #441 - Pattern for CSS url() syntax also wrongly matched some JS url() function calls
  • #443 - Oldest unfetched item duplicates
  • #447 - Fix TypeError ERR_INVALID_CALLBACK on fs.writeFile for node.js v10

simplecrawler 1.1.6

06 Oct 12:14

New features

  • Sitemap directives in /robots.txt are now added to the queue if Crawler#respectRobotsTxt is truthy.
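When Crawler#respectRobotsTxt is enabled, the crawler reads `Sitemap:` lines out of /robots.txt. The sketch below shows roughly what that extraction involves. The `extractSitemapUrls` helper is hypothetical and is not simplecrawler's own code. It picks out the directive line by line, matches case-insensitively, and ignores comments:

```javascript
// Hypothetical helper, not part of simplecrawler's API: collect the URLs
// declared by "Sitemap:" directives in the body of a robots.txt file.
function extractSitemapUrls(robotsTxt) {
  const urls = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Strip trailing comments ("# ...") and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    // Directive names are matched case-insensitively ("Sitemap", "sitemap", ...).
    const match = /^sitemap\s*:\s*(\S+)/i.exec(line);
    if (match) {
      urls.push(match[1]);
    }
  }
  return urls;
}

const robotsTxt = [
  "User-agent: *",
  "Disallow: /private/",
  "Sitemap: https://example.com/sitemap.xml",
  "sitemap: https://example.com/news.xml # news sitemap",
].join("\n");

console.log(extractSitemapUrls(robotsTxt));
```

The sitemaps.org protocol expects these locations to be complete URLs, so the sketch does not try to resolve relative ones.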

Bug fixes

  • #398 - fixed an issue where multiple cookies weren't properly serialized for outbound requests
  • #400 - fixed an issue where <meta name="robots"> tags weren't properly parsed
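The rule behind #398 is that a request carries all of its cookies in a single `Cookie` header, with `name=value` pairs separated by a semicolon and a space (RFC 6265, section 5.4). The minimal sketch below uses a hypothetical `serializeCookieHeader` helper and plain `{ name, value }` objects, not simplecrawler's own cookie jar types:

```javascript
// Hypothetical helper: build one Cookie request header value from several
// cookies. Per RFC 6265, pairs are joined with "; " on a single header line
// rather than being sent as separate Cookie headers.
function serializeCookieHeader(cookies) {
  return cookies.map(({ name, value }) => `${name}=${value}`).join("; ");
}

const header = serializeCookieHeader([
  { name: "session", value: "abc123" },
  { name: "theme", value: "dark" },
]);

console.log(header); // session=abc123; theme=dark
```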

simplecrawler 1.1.5

15 Aug 20:21

Administrative

  • Welcome @konstantinblaesi! We invited a few people who have made recent and significant contributions to come on board as simplecrawler collaborators. @konstantinblaesi heeded our call and has already submitted several PRs, including an ambitious upstream patch!
  • simplecrawler now has its own GitHub organisation and lives at simplecrawler/simplecrawler. This enables more fine-grained access controls for current and future collaborators. If you are interested in joining us, see #388.

Important notice

  • We have dropped support for Node.js 0.12. The lowest supported version is now 4.x. This lets us use more modern language features and will hopefully make it easier to land more patches soon. See #382 for more details.

Bug fixes

  • #364 - improved performance of Crawler#cleanExpandResources. See #382 for relevant patch
  • #385 and #363 - @konstantinblaesi submitted an upstream patch to URI.js that employs the same validation logic when calling the URI constructor as when calling the hostname and port methods. See #393 and medialize/URI.js#345 for relevant patches

simplecrawler 1.1.4

16 Jul 14:10

Bug fixes

  • #377 - fixed multiple issues with Crawler#removeFetchCondition and Crawler#removeDownloadCondition. Previously, those methods promised to throw an error if they couldn't find the targeted fetch/download condition, but they did not. The previous system for condition IDs was not stable either, since the IDs were indexes into an array whose length could change. Crawler#removeFetchCondition also had a bug where it looked for fetch condition references in the Crawler#_downloadConditions array rather than Crawler#_fetchConditions. All of these issues have now been fixed. Thanks to @venning for a great bug report and PR!
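The fix for #377 comes down to two things. Each condition needs an ID that stays valid when other conditions are removed, and removal should throw when its target can't be found. The sketch below illustrates both ideas with a `Map` keyed by an increasing counter. `ConditionRegistry` is a hypothetical class. Accepting either an ID or the original function reference is an assumption made for illustration, not a description of simplecrawler's internals.

```javascript
// Hypothetical sketch, not simplecrawler's implementation: conditions are stored
// under IDs that never change, so removing one cannot shift another's ID the way
// array indexes do.
class ConditionRegistry {
  constructor(kind) {
    this.kind = kind; // e.g. "fetch" or "download", used in error messages
    this.nextId = 0;
    this.conditions = new Map();
  }

  add(condition) {
    if (typeof condition !== "function") {
      throw new TypeError(`A ${this.kind} condition must be a function`);
    }
    const id = this.nextId++;
    this.conditions.set(id, condition);
    return id;
  }

  // Accepts either the ID returned by add() or the condition function itself,
  // and throws if nothing matches instead of failing silently.
  remove(idOrCondition) {
    for (const [id, condition] of this.conditions) {
      if (id === idOrCondition || condition === idOrCondition) {
        this.conditions.delete(id);
        return true;
      }
    }
    throw new Error(`Unable to find a ${this.kind} condition to remove`);
  }
}

// Separate registries mean a removal only ever searches the collection it was
// meant for (the mix-up between fetch and download conditions described above).
const fetchConditions = new ConditionRegistry("fetch");
const downloadConditions = new ConditionRegistry("download");

const skipPdf = (queueItem) => !/\.pdf$/i.test(queueItem.path);
const first = fetchConditions.add(() => true);
const second = fetchConditions.add(skipPdf);

fetchConditions.remove(first);   // "second" is still a valid ID afterwards
fetchConditions.remove(skipPdf); // removing by function reference also works
```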

simplecrawler 1.1.3

21 Jun 11:47

New features

  • #376 - added Crawler#sortQueryParameters option. This option will sort the query parameters in a URL, making it simple to avoid fetching duplicate pages on sites that use a lot of query parameters. Thanks @HaroldPutman!
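The idea behind Crawler#sortQueryParameters is URL canonicalization. Two URLs that differ only in the order of their query parameters should map to the same queue entry. In simplecrawler the option is set on the crawler instance (`crawler.sortQueryParameters = true`). The sketch below shows the normalization step itself, using Node's built-in WHATWG `URL` API. The `sortQuery` helper is hypothetical, and simplecrawler's own implementation may differ.

```javascript
// Hypothetical helper: canonicalize a URL by sorting its query parameters by
// name. URLSearchParams#sort() is stable, so repeated keys keep their order.
function sortQuery(urlString) {
  const url = new URL(urlString);
  url.searchParams.sort();
  return url.toString();
}

const a = sortQuery("http://example.com/list?page=2&sort=asc&filter=new");
const b = sortQuery("http://example.com/list?filter=new&page=2&sort=asc");

console.log(a === b); // true
console.log(a); // http://example.com/list?filter=new&page=2&sort=asc
```

One caveat applies: after `sort()`, the query string is re-serialized as `application/x-www-form-urlencoded`, so an encoded space such as `%20` comes back as `+`. Applying the same step to every URL keeps comparisons consistent.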

simplecrawler 1.1.2

21 Jun 11:44

Bug fixes

  • #371 - fixed an issue where custom cache back-ends would not be properly type checked

simplecrawler 1.1.1

19 Mar 20:22

Bug fixes

  • #360 - updated README to correctly reflect new async fetch/download conditions API
  • #353 - updated default srcset discovery function to be more permissive
  • #357 - ensure that the port parameter is properly removed from the Crawler#getRequestOptions return object when using a custom HTTP agent
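Permissive `srcset` discovery (#353) cannot simply split on commas, because some URLs, such as data URIs, contain commas themselves. The sketch below is a simplified tokenizer loosely modeled on the HTML specification's srcset parsing:

- a candidate URL runs until whitespace;
- a trailing comma ends a candidate;
- otherwise the descriptor is skipped up to the next comma.

`parseSrcsetUrls` is hypothetical and is not simplecrawler's discovery function. It also ignores edge cases such as parentheses inside descriptors.

```javascript
// Hypothetical, simplified srcset URL extractor (not simplecrawler's code).
function parseSrcsetUrls(srcset) {
  const urls = [];
  let i = 0;
  while (i < srcset.length) {
    // Skip whitespace and stray commas between candidates.
    while (i < srcset.length && /[\s,]/.test(srcset[i])) i++;
    if (i >= srcset.length) break;

    // The URL runs until the next whitespace character, so commas inside
    // a URL (e.g. in a data URI) are kept.
    const start = i;
    while (i < srcset.length && !/\s/.test(srcset[i])) i++;
    const url = srcset.slice(start, i);

    if (url.endsWith(",")) {
      // A trailing comma ends the candidate without a descriptor.
      urls.push(url.replace(/,+$/, ""));
      continue;
    }
    urls.push(url);

    // Skip the descriptor (e.g. "2x" or "480w") up to the next comma.
    while (i < srcset.length && srcset[i] !== ",") i++;
  }
  return urls;
}

console.log(parseSrcsetUrls("small.jpg 480w, data:image/png;base64,iVBORw0 2x,large.jpg"));
```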

simplecrawler 1.1.0

10 Mar 13:12

New features

  • Added the ability to make both fetch conditions and download conditions async. This change also deprecates the previous synchronous behavior, which we will remove in the next major release. This suggestion was originally brought up by @maxcorbeau in #345.
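With the asynchronous API, a condition reports its decision through a callback instead of a return value. The self-contained sketch below shows one way such callback-style conditions could be evaluated in sequence, stopping at the first error or negative answer. Each condition follows the `(queueItem, referrerQueueItem, callback)` shape used in simplecrawler's README examples for async fetch conditions. `evaluateConditions` is hypothetical. Running conditions one after another, rather than in parallel, is a design choice of this sketch and not a statement about simplecrawler's internals.

```javascript
// Hypothetical sketch: evaluate callback-style conditions one after another.
// Each condition is called as condition(queueItem, referrerQueueItem, callback)
// and answers with callback(error, shouldFetch). Evaluation stops at the first
// error or falsy answer; if every condition agrees, done(null, true) is called.
function evaluateConditions(conditions, queueItem, referrerQueueItem, done) {
  let index = 0;

  function next() {
    if (index >= conditions.length) {
      done(null, true);
      return;
    }
    const condition = conditions[index++];
    condition(queueItem, referrerQueueItem, (error, shouldFetch) => {
      if (error) {
        done(error);
      } else if (!shouldFetch) {
        done(null, false);
      } else {
        next();
      }
    });
  }

  next();
}

// An async condition: it answers on a later tick, just as it might after
// consulting a database or a remote service.
function skipPdf(queueItem, referrerQueueItem, callback) {
  setImmediate(() => callback(null, !/\.pdf$/i.test(queueItem.path)));
}

evaluateConditions([skipPdf], { path: "/report.PDF" }, null, (error, shouldFetch) => {
  console.log(error, shouldFetch); // null false
});
```

In simplecrawler itself, a condition such as `skipPdf` would be registered with `crawler.addFetchCondition(skipPdf)`.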