Add function for row alignment with page mask #1790

Closed
Tracked by #1749
Ted-Jiang opened this issue Jun 5, 2022 · 0 comments · Fixed by #1791
Labels
enhancement Any new improvement worthy of a entry in the changelog parquet Changes to the parquet crate

Ted-Jiang commented Jun 5, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, the row group filter in DataFusion passes a closure to arrow-rs:

fn build_row_group_predicate(
    pruning_predicate: &PruningPredicate,
    metrics: ParquetFileMetrics,
) -> Box<dyn FnMut(&RowGroupMetaData, usize) -> bool> {

https://github.com/apache/arrow-datafusion/blob/585bc3a629b92ea7a86ebfe8bf762dbef4155710/datafusion/core/src/physical_plan/file_format/parquet.rs#L559-L562

For page-level filtering, DataFusion could similarly define a filter_predicate:

 Box<dyn FnMut(&[pageIndex], &[pageLocation], usize) -> &[bool]>

DataFusion will send a mask (&[bool], one entry per page) to arrow-rs,
which then uses the mask to call compute_row_ranges and construct RowRanges: the row ranges selected within a row group for one column. If the column is sorted, the resulting vec will have size 1.
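
A minimal sketch of how a page mask could be turned into row ranges. This is an illustration, not the arrow-rs implementation: the `PageLocation` struct here is a simplified stand-in that keeps only the first row index of each page, and `RowRange`, the `total_rows` parameter, and the exact signature of `compute_row_ranges` are assumptions made for the example.

```rust
/// Simplified, hypothetical stand-in for a Parquet page location.
/// Only the index of the page's first row (within the row group) is kept.
#[derive(Debug, Clone, Copy)]
pub struct PageLocation {
    pub first_row_index: usize,
}

/// Half-open row range `[start, end)` within a row group (assumed type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowRange {
    pub start: usize,
    pub end: usize,
}

/// Convert a per-page mask into the row ranges covered by the selected pages.
/// Selected pages that are next to each other are merged into one range.
pub fn compute_row_ranges(
    mask: &[bool],
    locations: &[PageLocation],
    total_rows: usize,
) -> Vec<RowRange> {
    assert_eq!(mask.len(), locations.len(), "one mask entry per page");
    let mut ranges: Vec<RowRange> = Vec::new();
    for (i, (&keep, loc)) in mask.iter().zip(locations).enumerate() {
        if !keep {
            continue;
        }
        let start = loc.first_row_index;
        // A page ends where the next page starts; the last page ends at
        // the end of the row group.
        let end = locations
            .get(i + 1)
            .map_or(total_rows, |next| next.first_row_index);
        // Extend the previous range if this page directly follows it.
        if let Some(last) = ranges.last_mut() {
            if last.end == start {
                last.end = end;
                continue;
            }
        }
        ranges.push(RowRange { start, end });
    }
    ranges
}
```

With pages starting at rows 0, 100, 200 and 300 in a 350-row row group, a mask that selects one contiguous run of pages (as a filter on a sorted column would) yields a single range, which matches the "vec size will be 1" case above.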
Combining multiple filters:
If two filters are joined with AND, use RowRanges::intersection to get the final RowRanges; if they are joined with OR, use RowRanges::union.
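
The combination step could look roughly like the sketch below. It assumes `RowRanges` is a sorted list of non-overlapping half-open ranges; the type layout, the `new` constructor, and the `ranges` accessor are hypothetical and only `intersection` and `union` are names taken from the proposal.

```rust
/// Half-open row range `[start, end)` within a row group (assumed type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowRange {
    pub start: usize,
    pub end: usize,
}

/// Sorted, non-overlapping, non-adjacent row ranges (assumed layout).
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct RowRanges {
    ranges: Vec<RowRange>,
}

impl RowRanges {
    /// Normalize arbitrary input: drop empty ranges, sort by start,
    /// and merge ranges that overlap or touch.
    pub fn new(mut ranges: Vec<RowRange>) -> Self {
        ranges.retain(|r| r.start < r.end);
        ranges.sort_by_key(|r| r.start);
        let mut merged: Vec<RowRange> = Vec::with_capacity(ranges.len());
        for r in ranges {
            if let Some(last) = merged.last_mut() {
                if r.start <= last.end {
                    last.end = last.end.max(r.end);
                    continue;
                }
            }
            merged.push(r);
        }
        Self { ranges: merged }
    }

    pub fn ranges(&self) -> &[RowRange] {
        &self.ranges
    }

    /// Rows selected by either side (filters joined with OR).
    pub fn union(&self, other: &Self) -> Self {
        let mut all = self.ranges.clone();
        all.extend_from_slice(&other.ranges);
        Self::new(all)
    }

    /// Rows selected by both sides (filters joined with AND),
    /// computed with a two-pointer sweep over the sorted ranges.
    pub fn intersection(&self, other: &Self) -> Self {
        let (a, b) = (&self.ranges, &other.ranges);
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < a.len() && j < b.len() {
            let start = a[i].start.max(b[j].start);
            let end = a[i].end.min(b[j].end);
            if start < end {
                out.push(RowRange { start, end });
            }
            // Advance whichever range ends first.
            if a[i].end < b[j].end {
                i += 1;
            } else {
                j += 1;
            }
        }
        Self { ranges: out }
    }
}
```

Keeping the ranges normalized means `intersection` runs in one linear pass, while `union` here simply re-normalizes the concatenated input, which is simpler but costs a sort.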


@Ted-Jiang Ted-Jiang added the enhancement Any new improvement worthy of a entry in the changelog label Jun 5, 2022
@Ted-Jiang Ted-Jiang changed the title Implement page filtering with Row Alignment Add function for row alignment with page mask Jun 6, 2022
@alamb alamb added the parquet Changes to the parquet crate label Jun 9, 2022