New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support RowFilter within ParquetRecordBatchReader (#2431) #2452
Conversation
@@ -1121,6 +1174,19 @@ mod tests { | |||
} | |||
} | |||
|
|||
fn with_row_filter(self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty chuffed with this fuzz test harness, in particular how easy it is to add new test cases. It has definitely been worth the investment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL "chuffed": https://www.merriam-webster.com/dictionary/chuffed (not common in American usage 😆 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- cc @thinkharderdev and @Ted-Jiang
let mut filter = self.filter; | ||
let mut selection = self.selection; | ||
|
||
if let Some(filter) = filter.as_mut() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the mut
here -- it seems off to me because I wouldn't expect that the filter / predicate contains state that could be modified. Maybe I misunderstand something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I purposefully opted to make ArrowPredicate take &mut self
, this can be exploited to allow for tracking of the current position - something this PR in fact makes use of.
@@ -1121,6 +1174,19 @@ mod tests { | |||
} | |||
} | |||
|
|||
fn with_row_filter(self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL "chuffed": https://www.merriam-webster.com/dictionary/chuffed (not common in American usage 😆 )
@@ -283,9 +283,6 @@ where | |||
batch_size: usize, | |||
) -> ReadResult<T> { | |||
// TODO: calling build_array multiple times is wasteful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this TODO still valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I intend to fix in a follow up
Benchmark runs are scheduled for baseline = 0a1f71c and contender = 569c781. 569c781 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2431
Rationale for this change
Supports RowFilter within the sync interface and uses this to hook into the fuzz tests for better coverage
What changes are included in this PR?
Are there any user-facing changes?