Description
Use predicates in the merge operation to read only parts of the affected partition into memory.
Use Case
I only ever write to one partition at a time.
I have a large table into which I want to merge 2000 rows (or even a single row; the issue remains) using `when_not_matched_insert_all`. The merge operation respects partitions, so merging into a new partition is fast. However, the merge operation currently appears to read the entire partition into memory, even though in my case the predicate and file metadata could restrict the search to a single file in the partition, or even to just one or a few Parquet row groups when the data is Z-ordered. This would greatly speed up the merge operation in my case.

Currently, my query to merge 2000 rows into a partition of 2 GB of uncompressed Parquet files takes 30 seconds, which forces me to track internally whether the data has already been written, which in turn exposes me to data inconsistencies.
Related Issue(s)