[EPIC] Parquet filter pushdown into scan #3462

Open
26 of 27 tasks
alamb opened this issue Sep 13, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Sep 13, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
DataFusion offers sophisticated "filter pushdown" optimizations into LogicalPlan::TableScan by passing predicates into TableProvider::scan.

This ticket tracks the work to make use of these predicates in the table provider for parquet files, ParquetFileReader. Much of this work had already been completed by the time this ticket was written, but I wanted to capture it here to show both how far DataFusion has come and how close we are to done.

There are three types of predicate pushdown:

  • Prune row groups based on statistics (do not fetch or decode any pages)
  • Prune column pages based on page-level statistics, and skip decode of corresponding positions in other columns
  • Prune row indexes based on Expr predicates, and skip decode of corresponding positions in other columns
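
To make the three levels concrete, here is a minimal, std-only sketch of the idea. It is not DataFusion's or the parquet crate's implementation: the `Stats` and `GreaterThan` types and the `prune_by_stats`, `select_rows`, and `take` functions are hypothetical names invented for illustration, and the predicate is restricted to `column > threshold`. The same statistics check stands in for both row-group and page pruning, since the two differ only in the granularity of the statistics.

```rust
/// Min/max statistics for one row group or one data page.
/// Hypothetical type for illustration, not the parquet crate's API.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct Stats {
    min: i64,
    max: i64,
}

/// A toy predicate of the form `column > threshold`.
#[derive(Debug, Clone, Copy)]
struct GreaterThan {
    threshold: i64,
}

impl GreaterThan {
    /// Conservative check: returns false only when the statistics
    /// prove that no value in the container can match.
    fn may_match(&self, stats: Stats) -> bool {
        stats.max > self.threshold
    }

    /// Exact check against a single decoded value.
    fn matches(&self, value: i64) -> bool {
        value > self.threshold
    }
}

/// Levels 1 and 2: return the indexes of the containers (row groups or
/// pages) whose statistics cannot rule out a match. Everything else is
/// skipped without being fetched or decoded.
fn prune_by_stats(pred: GreaterThan, stats: &[Stats]) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| pred.may_match(**s))
        .map(|(i, _)| i)
        .collect()
}

/// Level 3: evaluate the predicate on the decoded filter column and
/// return the row indexes that pass.
fn select_rows(pred: GreaterThan, filter_column: &[i64]) -> Vec<usize> {
    filter_column
        .iter()
        .enumerate()
        .filter(|(_, v)| pred.matches(**v))
        .map(|(i, _)| i)
        .collect()
}

/// Materialize only the selected positions of another column, standing in
/// for "skip decode of corresponding positions in other columns".
fn take<T: Clone>(column: &[T], rows: &[usize]) -> Vec<T> {
    rows.iter().map(|&i| column[i].clone()).collect()
}
```

With a threshold of 10, a container whose maximum is 5 is pruned by `prune_by_stats`, while one spanning 3 to 12 must still be read because it may contain matches. `select_rows` then narrows the surviving rows exactly, and `take` touches only those positions in the remaining columns.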

Work Items

Related arrow-rs items:

@alamb alamb added the enhancement New feature or request label Sep 13, 2022
@alamb alamb changed the title from "[EPIC] Parquet filter pushdown" to "[EPIC] Parquet filter pushdown into scan" Oct 11, 2022
@alamb
Contributor Author

alamb commented Jan 8, 2024

We are so close to making this happen -- the last remaining step is to turn predicate pushdown on by default.
