Releases: archivesunleashed/aut

aut-1.2.0

17 Nov 02:08
569b3fe

Closed issues:

  • Include last modified date for a resource #546

Merged pull requests:

aut-1.1.1

31 Oct 19:14
5468f21

Fixed bugs:

  • DomainGraph should use YYYYMMDD not YYYYMMDDHHMMSS #544
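The two formats named in #544 differ only in the time-of-day suffix. A minimal sketch in plain Python of reducing a 14-digit timestamp to its 8-digit day form follows; it is not AUT's own code, and the helper name `to_day` is hypothetical.

```python
def to_day(crawl_date: str) -> str:
    """Reduce a YYYYMMDDHHMMSS (or YYYYMMDD) string to YYYYMMDD.

    Hypothetical helper for illustration; not part of the AUT API.
    """
    # Accept only all-digit strings of the two expected lengths.
    if not crawl_date.isdigit() or len(crawl_date) not in (8, 14):
        raise ValueError(f"unexpected crawl date: {crawl_date!r}")
    # The first eight characters are the year, month, and day.
    return crawl_date[:8]
```

Slicing keeps the value as a string, so it still sorts and groups the same way the longer form does.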

Merged pull requests:

aut-1.1.0

17 Jun 15:52
1a221bb

Fixed bugs:

  • org.apache.tika.mime.MimeTypeException: Invalid media type name: application/rss+xml lang=utf-8 #542
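The string in #542 has its `lang=utf-8` parameter separated from the type by a space rather than a semicolon, which is presumably why it is rejected as a media type name. As an illustration only, a defensive caller could keep just the `type/subtype` part before handing the value on. This is not how AUT or Tika resolve the issue, and `base_media_type` is a hypothetical helper.

```python
import re


def base_media_type(raw: str) -> str:
    """Return the lowercase type/subtype portion of a Content-Type value.

    Hypothetical helper for illustration; not part of AUT or Tika.
    """
    # Cut at the first semicolon or whitespace character, whichever
    # comes first, so malformed parameters are dropped as well.
    return re.split(r"[;\s]", raw.strip(), maxsplit=1)[0].lower()
```

With this sketch, `base_media_type("application/rss+xml lang=utf-8")` yields `"application/rss+xml"`.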

Closed issues:

  • Add ARCH text files derivatives #540

Merged pull requests:

aut-1.0.0

11 Jun 17:11
4655448

Implemented enhancements:

  • Remove http headers, and html on webpages() #538
  • Add domain column to webpages() #534
  • Replace Java ARC/WARC record processing library #494
  • Method to perform finer-grained selection of ARCs and WARCs #247
  • Unnecessary buffer copying #18

Fixed bugs:

  • Discard date RDD filter only takes a single string, not a list of strings. #532
  • Extract gzip data from transfer-encoded WARC #493
  • ARC reader string vs int error on record length #492

Closed issues:

  • java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Set$Set1 Set(liberal.ca) #529
  • Improve CommandLineApp.scala test coverage #262
  • Improve ExtractBoilerpipeText.scala test coverage #261
  • Improve ArchiveRecord.scala test coverage #260
  • Unit testing for RecordLoader #182
  • Improve ArchiveRecordWritable.java test coverage #76
  • Improve WarcRecordUtils.java test coverage #74
  • Improve ArcRecordUtils.java test coverage #73
  • Improve ExtractDate.scala test coverage #64
  • Remove org.apache.commons.httpclient #23

Merged pull requests:

aut-0.91.0

21 Jan 15:03
2d03904

Implemented enhancements:

  • Include timestamp in crawl date #525

Merged pull requests:

  • Change crawl_date format to YYYYMMDDHHMMSS, update hasDate filter. #526 (ruebot)
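To illustrate the 14-digit format named in #526, here is a small sketch using only the Python standard library. It is not AUT's implementation: `parse_crawl_date` and `has_date_prefix` are hypothetical names, and prefix matching is only an assumption about how a date filter over such strings might behave.

```python
from datetime import datetime
from typing import Iterable


def parse_crawl_date(crawl_date: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS string into a datetime (illustrative helper)."""
    return datetime.strptime(crawl_date, "%Y%m%d%H%M%S")


def has_date_prefix(crawl_date: str, prefixes: Iterable[str]) -> bool:
    """Return True if crawl_date starts with any of the given prefixes.

    Assumed behaviour for illustration: a prefix such as "2022" or
    "202201" selects a whole year or month of 14-digit timestamps.
    """
    return any(crawl_date.startswith(prefix) for prefix in prefixes)
```

Because the fields run from most to least significant, a plain string prefix is enough to select a year, month, or day without parsing.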

aut-0.90.4

01 Nov 16:36
145354c

Implemented enhancements:

  • Replace scala-uri library from ExtractDomain and just parse public_suffix_list.dat #521

Fixed bugs:

  • Scaladocs haven't been created since 0.90.0 release #522

Merged pull requests:

aut-0.90.3

22 Oct 14:09
2df52a5

Fixed bugs:

  • ExtractDomains returns non-Apex Domains #519

Merged pull requests:

aut-0.90.2

12 May 16:09
2af038d

Fixed bugs:

  • ARC file name appearing in url list #516
  • WARC-Target-URI in Wget warc files is not parsed properly #514

Merged pull requests:

  • Filter out filedesc and dns records from arcs. #517 (ruebot)
  • Handle wget WARC-Target-URI formatting. #515 (ruebot)

aut-0.90.1

29 Apr 18:14
f185d91

Fixed bugs:

  • crawl_date is not included on binary information jobs when documentation says it is #512

Merged pull requests:

  • Add missing crawl_date column to binary information jobs. #513 (ruebot)
  • Update jsoup to 1.13.1 #511 (ruebot)

aut-0.90.0

27 Jan 15:21
c0872c7

Fixed bugs:

  • Python implementation of .all() has .keepValidPages() incorrectly applied to it #502
  • Extract hyperlinks from wayback machine #501
  • Release 0.80.0 JAR produces error; built 0.80.1 fatjar built on repo works #495

Closed issues:

  • Migrate CI infrastructure from TravisCI to GitHub Action #506
  • Split tf into its own repo #498
  • Change master branch to main branch #490
  • GitHub action - Run isort and black on Python code #488
  • Add scalafmt GitHub action #486
  • Add Google Java Formatter as a GitHub action #484
  • Packages build is often broken - should we support it? #483
  • Implement SaveToDisk in Python #478
  • Java 11 support #356

Merged pull requests: