Description
Use predicates in the merge operation to read only parts of the affected partition into memory.
Use Case
I only ever write to one partition at a time.
I have a large table into which I want to merge 2000 rows (or even a single row; the issue remains) using `when_not_matched_insert_all`. The merge operation respects partitions, so merging into a new partition is fast. However, the merge operation currently appears to read the entire partition into memory, even though in my case the predicate and file metadata could restrict the search to a single file in the partition, or even to just one or a few Parquet row groups when the data is Z-ordered. This would greatly speed up the merge operation in my case.

Currently, my query to merge 2000 rows into a partition of 2 GB of uncompressed Parquet files takes 30 seconds, which forces me to track internally whether the data has already been written, which in turn exposes me to data inconsistencies.
Related Issue(s)