Releases: housepower/spark-clickhouse-connector

v0.7.3

06 Feb 09:47
07ff823

This release is built against clickhouse-java 0.4.6 and should also work with clickhouse-java 0.5.0 and 0.6.0.

Change Logs

  • Core: Remove usage of ClickHouse client deprecated API (#291)
  • Spark: Add int long BigInteger cases for Decimal (#266)

v0.7.2

14 Jul 13:51
bf02ab9

Change Logs

  • Build: Upgrade clickhouse-java 0.4.6 (#235)
  • Test: Upload ClickHouse server logs when CI failed (#249)
  • Spark: ClickHouse FixedString map to Spark BinaryType (#251)
  • Spark: Fix ArrayIndexOutOfBoundsException when all columns are pruned and agg pushdown does not take effect (#256)
  • Core: Support parsing Date64 values that contain nanoseconds (#258)

v0.7.1

22 May 11:09
496e546

Change Logs

  • Spark: Fix Decimal precision in JSON mode on reading

v0.7.0

27 Apr 07:55
cfc5d74

Notable Changes

This release supports Spark 3.3 and 3.4, and is compatible with clickhouse-jdbc:0.4.5.

As of this version, gRPC is deprecated and may be removed in the future.

Change Logs

  • Core: Bump clickhouse-java 0.4.5 (#211)
  • Core: Deprecate gRPC protocol (#233)
  • Spark: Initial support Spark 3.4 (#228)
  • Spark: Polish configuration's doc
  • Spark: Fix custom options (#231)
  • Spark 3.3: Remove ConfigurationSuite
  • Docs: Mention 0.6.0 important changes
  • Docs: Add Compatible Matrix
  • Docs: Fix link for configuration page
  • Docs: Correct the Spark version for integration tests
  • Test: Test against ClickHouse 23.3 (#232)
  • Playground: Update Kyuubi 1.7.0

v0.6.1

27 Apr 05:31
dec2f42

Change Logs

  • Spark 3.3: Fix custom options (#234)
  • Build: Enable CI on release branch

v0.6.0

13 Mar 15:59
52175c6

Notable Changes

This release supports only Spark 3.3, and is compatible with clickhouse-jdbc:0.3.2-patch11.

The default protocol is changed to HTTP, as suggested by ClickHouse/clickhouse-java#1252 (comment).

gRPC is experimental and problematic; I should probably drop it someday to avoid confusion.

Change Logs

  • Core: Respect ClickHouse ORDER BY Clause default behavior
  • Spark: Change default protocol to HTTP (#190)
  • Spark: Fix Decimal reading in JSON format (#220)
  • Spark: Support Date type as partition column in dropPartition (#218)
  • Spark: Support tcp_port in catalog option (#223)
  • Spark: Fix timestamp value transformation (#216)
  • Spark: Use clickhouse java client to parse schema (#215)
  • Spark: Allow setting arbitrary options for ClickHouseClient (#203)
  • Spark: Support reading Bool type (#207)
  • Spark: Rename and reorganize functions (#198)
  • Spark: Simplify spark.clickhouse.write.format values
  • Spark: Support RowBinary format in reading (#195)
  • Spark: Support read metrics (#191)
  • Spark: Test parse LowCardinality column definition (#217)
  • Playground: Switch minio image back to bitnami/minio
  • Playground: Restructure directories and upgrade components (#212)
  • Playground: Remove python
  • Playground: Fix S3 magic committer confs
  • Playground: Use eclipse-temurin:8-focal as base image (#188)
  • Docs: Syntax improvements
  • Docs: Remove incubating from Kyuubi reference (#209)
  • Docs: Bump mkdocs-material 9.0.9
  • Docs: Remove unused var spark_version
  • Docs: Auto generate configuration docs
  • Docs: Fix documentation --jars/--packages usage (#186)
  • Docs: polish sentence
  • Docs: Supply demo for native SQL execution
  • Docs: Use docker compose V2 command
  • Docs: Update Rebalance image
  • Docs: Improve sentence
  • Docs: Enrich Catalog Management
  • Infra: Enable spotless (#208)
  • Infra: Upgrade CI runner image and actions (#214)
  • Build: Polish gradle scripts
  • Build: Bump Spark 3.3.2 (#219)
  • Build: Bump Gradle 7.6 (#213)
  • Build: Bump testcontainers-scala 0.40.12
  • Build: Bump gradle rat plugin 0.8.0
  • Build: Bump gradle scoverage plugin 7.0.1 (#193)
  • Build: Remove unused snapshot repo
  • Build: Remove Spark 3.2 support (#189)
  • Build: Bump Jackson 2.13.4 (#192)
  • Build: Rename SonarQube workflow
  • Build: Testing w/ multi clickhouse versions (#183)
  • Test: Allow testing w/ non-grpc versions (#182)
  • Test: Correct configuring log4j2

v0.5.0

09 Aug 09:34
8e1a114

Notable Changes

As of 0.5.0, this connector switches from the raw ClickHouse gRPC client to the official ClickHouse Java client, which brings HTTP protocol support and extends the range of supported ClickHouse Server versions. Meanwhile, gzip and zstd write compression support has been removed; the currently supported codecs are none and lz4 (default).

If you are upgrading from a previous version, ONE of the following jars should be used with spark-clickhouse-connector-3.3_2.12-0.5.0.jar, instead of the single clickhouse-spark-runtime-3.3_2.12-0.4.0.jar.

To connect to ClickHouse through gRPC, use:

$SPARK_HOME/bin/spark-shell \
  --conf spark.sql.catalog.clickhouse=xenon.clickhouse.ClickHouseCatalog \
  --conf spark.sql.catalog.clickhouse.host=<clickhouse-host> \
  --conf spark.sql.catalog.clickhouse.protocol=grpc \
  --conf spark.sql.catalog.clickhouse.grpc_port=<clickhouse-grpc-port> \
  --conf spark.sql.catalog.clickhouse.user=<username> \
  --conf spark.sql.catalog.clickhouse.password=<password> \
  --conf spark.sql.catalog.clickhouse.database=<default-database> \
  --jars /path/clickhouse-spark-runtime-3.3_2.12-0.5.0.jar,/path/clickhouse-jdbc-0.3.2-patch11-all.jar

If you prefer HTTP, change

  --conf spark.sql.catalog.clickhouse.protocol=grpc \
  --conf spark.sql.catalog.clickhouse.grpc_port=<clickhouse-grpc-port> \

to

  --conf spark.sql.catalog.clickhouse.protocol=http \
  --conf spark.sql.catalog.clickhouse.http_port=<clickhouse-http-port> \
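
The two protocol-specific options above differ only in the protocol name and the port key, so they can be generated from a single value. The following is a minimal sketch, not part of the connector: `protocol_conf` is a hypothetical helper name, and 8123 is assumed to be the server's HTTP port (the ClickHouse default). It only prints the options and does not launch Spark.

```shell
#!/bin/sh
# Sketch: emit the protocol-specific catalog options for spark-shell.
# `protocol_conf` is a hypothetical helper, not something the connector ships.
protocol_conf() {
  protocol="$1"   # grpc or http
  port="$2"       # the matching ClickHouse server port
  case "$protocol" in
    grpc|http)
      # The port key follows the pattern shown above: <protocol>_port.
      printf '%s\n' \
        "--conf spark.sql.catalog.clickhouse.protocol=${protocol}" \
        "--conf spark.sql.catalog.clickhouse.${protocol}_port=${port}"
      ;;
    *)
      echo "unsupported protocol: ${protocol}" >&2
      return 1
      ;;
  esac
}

# Example: print the two options for HTTP (port 8123 is an assumption).
protocol_conf http 8123
```

The printed lines can be pasted into the spark-shell command in place of the two gRPC options; the remaining options (host, user, password, database, --jars) stay unchanged.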

Change Logs

  • Core: Deserializer consumes InputStream instead of ByteString (#162)
  • Core: Throw ClickHouseException instead of gRPC Exception (#163)
  • Core: Rename CHException (#164)
  • Core: Use ClickHouse Java client
  • Core: Remove gRPC
  • Core: Support compression on reading
  • Core: Simplify deserializeStream (#173)
  • Core: CHException should propagate root cause (#181)
  • Spark: Use ClickHouse Java client
  • Spark: Support compression on reading
  • Spark: Add column comment when create clickhouse table (#176)
  • Spark: Fix reading decimal values (#180)
  • Spark 3.3: Remove zstd support in writing (#166)
  • Spark 3.3: Support write metrics (#169)
  • Docs: Remove gzip compression
  • Docs: Mention HTTP support
  • Docs: Upgrade mkdocs-material
  • Docs: Replace versions w/ variables
  • Docs: Add spark.clickhouse.read.compression.codec
  • Docs: Enrich internal docs
  • Build: Bump clickhouse-jdbc 0.3.2-patch11
  • Build: Bump gradle rat plugin 0.7.1
  • Build: Remove scala-xml version restriction (#175)
  • Build: Align Jackson version w/ Spark (#177)
  • Build: Bump Gradle 7.5.1
  • Playground: Switch to ClickHouse Java client
  • Playground: Fix dev setup
  • Test: Remove unused SparkClickHouseSingleTestHelper
  • Test: Bump testcontainers-scala 0.40.10 (#168)

v0.4.0

21 Jul 14:03
be97ae4

Notable Changes

  • Core: Fix DistributedEngineSpec#is_distributed
  • Core: Support parse ColumnExprPrecedence
  • Core: Replace Using by tryWithResource
  • Spark: Support ignore unsupported transform
  • Spark: Support constructing InputPartition by virtual col _partition_id
  • Spark: Improve writer's memory usage efficiency
  • Spark: Improve writer's log format
  • Spark: Reorganize test suites
  • Spark 3.2: Bump Spark 3.2.2 (#158)
  • Spark 3.2: Support GZIP, LZ4 in write
  • Spark 3.3: Support GZIP, LZ4, ZSTD in write
  • Spark 3.3: Support writing format ArrowStream
  • Spark 3.3: Cast non-nullable if the table column is not null
  • Spark 3.3: ArrowStream should close out in each batch
  • Spark 3.3: Fix ArrowStream writer summary
  • Spark 3.3: Fix ArrowStream writer memory leak and add metrics
  • Spark 3.3: Count serialize time of writeRow
  • Spark 3.3: Remove spark.clickhouse.write.batchSize upper bound limitation
  • Build: Bump gRPC 1.47.0 (#150)
  • Build: Switch default Maven Central mirror to Apache
  • Build: Daily SonarQube report
  • Build: Shade Jackson to avoid class conflict (#153)
  • Build: Bump Gradle 7.5
  • Test: Align isTesting w/ Spark
  • Test: Bump clickhouse-jdbc 0.3.2-patch10 (#151)
  • Test: Remove obsolete settings (#156)
  • Test: Bump testcontainers-scala 0.40.8
  • Docs: Document Spark versions support policy
  • Docs: Add overview image
  • Docs: Update developers docs
  • Docs: Basic internal docs
  • Playground: Expose ports of clickhouse-s1r1
  • Playground: Add back iceberg
  • Playground: Bump Iceberg 0.14.0
  • Playground: Upgrade Kyuubi TPC-DS/TPC-H connector version

v0.3.0

18 Jun 16:37
0b14ab8

Notable Changes

  • Support Spark 3.3 (#105)
  • Add conf spark.clickhouse.write.repartitionByPartition
  • Add conf spark.clickhouse.write.localSortByPartition
  • Add conf spark.clickhouse.write.localSortByKey
  • Support use gzip on writing
  • Always retry to write same instance
  • Remove SupportsTruncate
  • Simplify configuration
  • Initial support LowCardinality
  • Bump Jackson 2.13.3
  • Core: Fix retry method
  • Core: gRPC client exposes inputCompressionType option
  • Spark 3.3: Implement SupportsPushDownLimit (#112)
  • Spark 3.3: Enable tests which were failed because of SPARK-39313
  • Spark 3.3: Hack version for partition toYYYYMMDD(toDate(col))
  • Spark 3.3: Add conf spark.clickhouse.write.repartitionStrictly
  • Performance: Avoid creating JacksonGenerator when writing each row
  • Playground: Support for using buildx to build cross-platform images (#128)
  • Playground: Add missing container_name for Zookeeper
  • Playground: Change ClickHouse catalog name
  • Playground: Upgrade Spark 3.3.0 (#134)
  • Playground: Remove iceberg
  • Playground: Add CloudBeaver (#115)
  • Playground: Fix ARG PROJECT_VERSION
  • Playground: Fix KYUUBI_CONF_DIR
  • Playground: Add TPCHCatalog
  • Logs: Refine node information
  • Logs: Add clickhouse node info when error occurs (#129)
  • Logs: Print compression cost time
  • Docs: Update README for ARM platform users
  • Docs: Update README for testing w/ different Spark Scala versions
  • Test: Support overwrite ClickHouse image by CLICKHOUSE_IMAGE (#132)
  • Test: Testing TPC-DS tiny scale (#121)
  • Test: Move test to package org.apache.spark.sql.clickhouse
  • Docs: Update document adding Developers sections
  • Docs: Hide navigation in home page
  • Git: Ignore patch files
  • Minor: Remove unused try

v0.2.1

20 May 04:49
b924fe0
  • Playground: Upgrade kyuubi-spark-connector-tpcds
  • Playground: Add missing container_name for Zookeeper
  • Clean up code for Spark 3.2