Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Row filtering #2271

Closed
wants to merge 6 commits into from
Closed

Row filtering #2271

wants to merge 6 commits into from

Conversation

thinkharderdev
Copy link
Contributor

Which issue does this PR close?

Closes #2270

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 1, 2022
Comment on lines +303 to +311
// TODO Why does this cause problems?
// if skipped != front.row_count {
// return Some(Err(general_err!(
// "failed to skip rows, expected {}, got {}",
// front.row_count,
// skipped
// )
// .into()));
// }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what's up here. This is the correct assertion to make here but I think there is a bug somewhere in the skip logic that skips the right values but gets the accounting wrong. Leaving this check in causes issues and leaving it out seems to produce the correct results (if the selections are created correctly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tustvold can you help out here? I haven't been following along with this PR / work closely enough to provide useful guidance here?

@thinkharderdev
Copy link
Contributor Author

@tustvold @alamb @Ted-Jiang

@alamb alamb requested a review from tustvold August 2, 2022 19:32
@tustvold
Copy link
Contributor

tustvold commented Aug 3, 2022

@thinkharderdev how would you like to proceed with this, in particular with regards to the suggestion on the ticket of pushing predicates into the ParquetRecordBatchReader? I would be happy to stub out an API for this if that would be helpful?

@thinkharderdev
Copy link
Contributor Author

@thinkharderdev how would you like to proceed with this, in particular with regards to the suggestion on the ticket of pushing predicates into the ParquetRecordBatchReader? I would be happy to stub out an API for this if that would be helpful?

I can take a crack at it and post a draft PR when the API is nailed down

@thinkharderdev
Copy link
Contributor Author

Closing in favor of #2310

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changes to ParquetRecordBatchStream to support row filtering in DataFusion
3 participants