Releases: G-Research/spark-extension

[2.12.0] - 2024-04-26

29 Apr 09:35

Fixed

  • Diff change column should respect comparators (#238)

Changed

[2.11.0] - 2024-01-04

04 Jan 14:29

Added

  • Add count_null aggregate function (#206)
  • Support reading parquet schema (#208)
  • Add more columns to reading parquet metadata (#209, #211)
  • Provide groupByKey shortcuts for groupBy.as (#213)
  • Allow installing pip packages into PySpark jobs (#215)
  • Allow installing Poetry projects into PySpark jobs (#216)

[2.10.0] - 2023-09-27

04 Oct 18:19

Fixed

  • Update setup.py to include parquet methods in python package (#191)

Added

  • Add --statistics option to diff app (#189)
  • Add --filter option to diff app (#190)

[2.9.0] - 2023-08-23

23 Aug 15:42

Added

  • Add key order sensitive map comparator (#187)
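To illustrate what key order sensitivity means for a map comparison, here is a minimal sketch in plain Python. It is not the library's comparator or its API: the function name `maps_equal` and the `key_order_sensitive` flag are hypothetical, and maps are modelled as ordered sequences of (key, value) entries on the assumption that entry order is observable.

```python
from typing import Hashable, Sequence, Tuple

# A map modelled as an ordered sequence of (key, value) entries (assumption).
Entries = Sequence[Tuple[Hashable, object]]


def maps_equal(left: Entries, right: Entries, key_order_sensitive: bool = False) -> bool:
    """Compare two maps, optionally treating the order of keys as significant."""
    if key_order_sensitive:
        # Same entries in the same order.
        return list(left) == list(right)
    # Same entries, order ignored.
    return dict(left) == dict(right)


a = [("a", 1), ("b", 2)]
b = [("b", 2), ("a", 1)]
print(maps_equal(a, b))                            # order ignored
print(maps_equal(a, b, key_order_sensitive=True))  # order matters
```

With the order-insensitive comparison, `a` and `b` count as equal; with the order-sensitive one, they count as different, so a diff would report the row as changed.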

Changed

  • Use dataset encoder rather than implicit value encoder for implicit dataset extension class (#183)

Fixed

  • Fix key-sensitivity in map comparator (#186)

[2.8.0] - 2023-05-24

25 May 05:58

Added

  • Add method to set and automatically unset Spark job description. (#172)
  • Add column function that converts between .NET (C#, F#, Visual Basic) DateTime.Ticks and Spark timestamps / Unix epoch timestamps. (#153)
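The arithmetic behind such a ticks conversion can be sketched in plain Python. This is an illustration only, not the library's column functions: the helper names below are hypothetical. It relies on the documented definition of DateTime.Ticks as the number of 100-nanosecond intervals since 0001-01-01T00:00:00.

```python
from datetime import datetime, timedelta, timezone

# One tick is 100 nanoseconds, so there are 10 million ticks per second.
TICKS_PER_SECOND = 10_000_000
# Ticks between 0001-01-01 and the Unix epoch: 719162 days * 86400 s * 10^7.
UNIX_EPOCH_TICKS = 621_355_968_000_000_000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ticks_to_unix_seconds(ticks: int) -> float:
    """Convert ticks to seconds since the Unix epoch (float, so sub-microsecond precision may be lost)."""
    return (ticks - UNIX_EPOCH_TICKS) / TICKS_PER_SECOND


def unix_seconds_to_ticks(seconds: int) -> int:
    """Convert whole seconds since the Unix epoch to ticks."""
    return seconds * TICKS_PER_SECOND + UNIX_EPOCH_TICKS


def ticks_to_datetime(ticks: int) -> datetime:
    """Convert ticks to a UTC datetime, truncating to microseconds (10 ticks)."""
    micros = (ticks - UNIX_EPOCH_TICKS) // 10
    return UNIX_EPOCH + timedelta(microseconds=micros)


print(ticks_to_datetime(unix_seconds_to_ticks(86_400)))
```

Integer arithmetic is used wherever possible because tick values exceed the range in which a 64-bit float can represent every integer exactly.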

[2.7.0] - 2023-05-05

05 May 07:19

Added

  • Add Spark app to diff files or tables and write the result back to a file or table. (#160)
  • Add null value count to parquetBlockColumns and parquet_block_columns. (#162)
  • Add parallelism argument to Parquet metadata methods. (#164)

Changed

  • Change data type of column name in parquetBlockColumns and parquet_block_columns to array of strings.
    Cast to string to get earlier behaviour (string column name). (#162)

[2.6.0] - 2023-04-11

19 Apr 10:03

Added

  • Add reader for parquet metadata. (#154)

[2.5.0] - 2023-03-23

24 Mar 08:01

Added

  • Add whitespace agnostic diff comparator. (#137)
  • Add Python whl package build. (#151)
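As a rough illustration of what a whitespace agnostic comparison can look like, the sketch below treats two strings as equal when they differ only in leading, trailing, or repeated whitespace. This is one plausible definition assumed for the example, not necessarily the exact rule the library's comparator applies; the function names are hypothetical.

```python
import re
from typing import Optional

# Matches any run of whitespace characters (spaces, tabs, newlines).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_whitespace(text: str) -> str:
    """Trim the string and collapse each whitespace run into a single space."""
    return _WHITESPACE_RUN.sub(" ", text.strip())


def whitespace_agnostic_equal(left: Optional[str], right: Optional[str]) -> bool:
    """Compare two nullable strings, ignoring differences in whitespace layout."""
    if left is None or right is None:
        # Null only equals null (an assumption made for this sketch).
        return left is None and right is None
    return normalize_whitespace(left) == normalize_whitespace(right)


print(whitespace_agnostic_equal("a  b", " a b\t"))
print(whitespace_agnostic_equal("ab", "a b"))
```

Under this definition, whitespace is never removed entirely, so "ab" and "a b" still count as different values.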

This is the first version that releases Python packages to PyPI: https://pypi.org/project/pyspark-extension/

[2.4.0] - 2022-12-08

08 Dec 19:01

Added

  • Allow for custom diff equality. (#127)

Fixed

  • Fix Python API calling into Scala code. (#132)

[2.3.0] - 2022-10-26

26 Oct 16:09

Added

  • Add diffWith to Scala, Java and Python Diff API. (#109)

Changed

  • Diff similar Datasets with ignoreColumns. Previously, only similar DataFrames could be diffed with ignoreColumns. (#111)

Fixed

  • Cache before writing via partitionedBy to work around SPARK-40588. Unpersist via UnpersistHandle. (#124)