
Releases: adbar/courlan

courlan-1.1.0

30 Apr 11:20
2b11567
  • replace langcodes by babel and use its information on locales (#89, #92)
  • simplified and faster code: domain extraction, cleaning, filters and UrlStore (#90, #93, #94, #95)
  • UrlStore: better url batches, replace timelimit parameter by time_limit (#91)
  • maintenance: update readme and convert it to markdown (#97)

courlan-1.0.0

01 Feb 14:56
1cfb7db
  • license change from GPLv3+ to Apache 2.0 (#81)
  • UrlStore: write() method and load_store() function added (#83)
  • add parameter trailing_slash to keep or discard slashes at the end of URLs (#52)
  • maintenance: fix whitespace in clean_url() (#77), simplify code (#79)

courlan-0.9.5

28 Nov 11:34
b61b1b3
  • IRI to URI normalization: encode path, query and fragments (#58, #60)
  • normalization: strip common trackers (#65)
  • new function is_valid_url() (#63)
  • hardening of domain filter (#64)
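
To give a rough idea of what the IRI-to-URI normalization and tracker stripping listed above involve, here is a minimal sketch using only the Python standard library. It is not courlan's implementation: the function name `normalize_iri` and the `TRACKERS` set are hypothetical, and the choice of tracker parameters is an assumption made for illustration.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Hypothetical tracker list for illustration; the library's actual list may differ.
TRACKERS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def normalize_iri(url: str) -> str:
    """Percent-encode non-ASCII characters and drop tracking parameters."""
    parts = urlsplit(url)
    # Encode the path, keeping slashes and existing percent-escapes intact.
    path = quote(parts.path, safe="/%")
    # Drop known trackers, then re-encode the remaining query pairs.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKERS
    ]
    query = urlencode(pairs)
    # Encode the fragment as well.
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(normalize_iri("https://example.org/café?utm_source=x&q=naïve#résumé"))
# https://example.org/caf%C3%A9?q=na%C3%AFve#r%C3%A9sum%C3%A9
```

The host part is left untouched in this sketch; internationalized domain names need separate handling.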

Full Changelog: v0.9.4...v0.9.5

courlan-0.9.4

06 Sep 15:17
869912c
  • new UrlStore functions: add_from_html() (#42), discard() (#44), get_unvisited_domains()
  • CLI: removed --samplesize, use --sample with an integer instead (#54)
  • added plausibility filter for domains/hosts (#48)
  • speedups and more efficient processing (#47, #49, #50)
  • fixed handling of relative URLs with @feltcat in #46
  • fixed bugs and ensured compatibility (#41, #43, #51, #56)
  • official support for Python 3.12

Full Changelog: v0.9.3...v0.9.4

courlan-0.9.3

31 May 14:41
05c6e20
  • more efficient URL parsing (#33)
  • refined link extraction and link filters (#30, #36)
  • more efficient normalization (#32)
  • more efficient sampling strategy (#31, #35)
  • added meta function to clear LRU caches (#34)
  • added parallel option in command-line interface (#37, #39)
  • added get_unvisited_domains() method to UrlStore (#40)

Full Changelog: v0.9.2...v0.9.3

courlan-0.9.2

02 May 17:02
eb23b9b
  • add blogspot archives to type filter
  • maintenance: upgrade urllib3 and review code

courlan-0.9.1

24 Apr 16:13
a144749
  • network tests: larger throughput
  • UrlStore: optional compression of rules (#21), added reset() (#22) and get_all_counts() methods
  • UrlStore fixes: signal in #18, total_url_number
  • updated Readme

Full Changelog: v0.9.0...v0.9.1

courlan-0.9.0

07 Mar 12:28
  • hardening of filters and URL parsing (#14)
  • normalize punycode to unicode
  • methods added to UrlStore: get_crawl_delay(), print_unvisited_urls()
  • UrlStore now triggers exit code 1 when interrupted
  • argument added to extract_links(): no_filter
  • code refactoring: simplifications
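
For readers unfamiliar with the punycode item above, the following sketch shows one way to decode an internationalized hostname with Python's built-in `idna` codec. It only illustrates the idea and is not necessarily how courlan does it; the helper name `punycode_to_unicode` is hypothetical.

```python
def punycode_to_unicode(host: str) -> str:
    """Decode an IDNA (punycode) hostname to Unicode, leaving other hosts unchanged."""
    try:
        # The stdlib "idna" codec decodes each "xn--" label of the hostname.
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # Already Unicode or malformed: return the input as-is.
        return host


print(punycode_to_unicode("xn--mnchen-3ya.de"))  # münchen.de
```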

Full Changelog: v0.8.3...v0.9.0

courlan-0.8.3

28 Jul 16:53
  • fixed bug in domain name extraction
  • uniform logging parameters

Full Changelog: v0.8.2...v0.8.3

courlan-0.8.2

26 Jul 16:22
  • full type hinting
  • maintenance: code linted

Full Changelog: v0.8.1...v0.8.2