Use predicates to restrict scan in merge operation #2411

Open
Tommel71 opened this issue Apr 12, 2024 · 0 comments
Labels
binding/rust (Issues for the Rust crate), enhancement (New feature or request)

Description

Use the merge predicate, together with file and row-group metadata, so that the merge operation reads only the relevant parts of the affected partition into memory.

Use Case

I only ever write to one partition at a time.
I have a large table and want to merge 2000 rows into it using when_not_matched_insert_all (merging a single row shows the same problem). The merge operation respects partitions, so merging into a new partition is fast. However, the merge operation currently appears to read the entire target partition into memory. In my case, the predicate and the file metadata could be used to restrict the scan to a single file in the partition, or, when my data is z-ordered, to just one or a few Parquet row groups. This would greatly speed up the merge operation for my use case.
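
The pruning idea above can be sketched as follows. This is a minimal, hypothetical illustration, not delta-rs code: it assumes the merge predicate is an equality on a single integer key column (`target.key = source.key`) and that min/max statistics for that column are available per data file (or per Parquet row group). The names `ChunkStats` and `prune_chunks` are invented for this sketch. A chunk whose `[min, max]` range contains no source key cannot contain a matching row, so it can be skipped, and any source row whose key falls in no remaining chunk is guaranteed to be "not matched".

```rust
/// Hypothetical min/max statistics for the key column of one data file
/// (or one Parquet row group). Illustrative only, not a delta-rs type.
#[derive(Debug)]
struct ChunkStats {
    path: String,
    min: i64,
    max: i64,
}

/// Returns the chunks whose [min, max] range contains at least one source key.
/// Under the assumed predicate `target.key = source.key`, only these chunks
/// can hold matching rows, so every other chunk can be skipped during the scan.
fn prune_chunks<'a>(chunks: &'a [ChunkStats], source_keys: &[i64]) -> Vec<&'a ChunkStats> {
    // Sort the source keys once so each range check is a binary search.
    let mut keys = source_keys.to_vec();
    keys.sort_unstable();
    chunks
        .iter()
        .filter(|c| {
            // Index of the first source key that is >= c.min.
            let i = keys.partition_point(|&k| k < c.min);
            // The chunk can match only if such a key exists and is <= c.max.
            i < keys.len() && keys[i] <= c.max
        })
        .collect()
}

fn main() {
    // Hypothetical statistics for three files in one partition.
    let files = vec![
        ChunkStats { path: "part-0.parquet".into(), min: 0, max: 999 },
        ChunkStats { path: "part-1.parquet".into(), min: 1000, max: 1999 },
        ChunkStats { path: "part-2.parquet".into(), min: 2000, max: 2999 },
    ];
    let source_keys = [1500, 1720, 1999];
    for c in prune_chunks(&files, &source_keys) {
        // Prints "scan part-1.parquet": only that file overlaps the source keys.
        println!("scan {}", c.path);
    }
}
```

The same check applies at row-group granularity; z-ordering helps because it keeps each row group's min/max range narrow. A real implementation would also have to scan any chunk whose statistics are missing or that may contain nulls in the key column.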

Currently, merging 2000 rows into a partition of 2 GB uncompressed Parquet files takes 30 seconds. This forces me to keep track internally of whether the data has already been written, which in turn exposes me to data inconsistencies.

Related Issue(s)