Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter for run end array #5573

Merged
merged 5 commits into from Apr 4, 2024
Merged

Conversation

fabianmurariu
Copy link
Contributor

Which issue does this PR close?

Related to #3520

Rationale for this change

Attempt at adding filter support for RunArray for i64 run_ends

What changes are included in this PR?

Support for filtering RunArray

Are there any user-facing changes?

the filter function now works for RunArray

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 30, 2024
arrow-select/src/filter.rs Outdated Show resolved Hide resolved
arrow-select/src/filter.rs Outdated Show resolved Hide resolved
arrow-select/src/filter.rs Outdated Show resolved Hide resolved
Comment on lines +403 to +407
new_run_ends.truncate(i);

if values_filter.is_empty() {
new_run_ends.clear();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you allocate the new_run_ends as just an empty Vec but with run_ends.len() capacity and push to it, you probably won't need this part

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using this trick

        new_run_ends[i] = count;
        i += keep as usize;

to make it branchless, I can't do the same thing with push

@fabianmurariu fabianmurariu changed the title initial version of filter for run end array with i64 run_ends filter for run end array Apr 1, 2024
arrow-select/src/filter.rs Outdated Show resolved Hide resolved
#[allow(clippy::type_complexity)]
fn filter_run_end_array_generic<R: RunEndIndexType>(
re_arr: &RunArray<R>,
pred: &FilterPredicate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can leave a TODO item to utilize the IterationStrategy within FilterPredicate for potential performance benefit to keep this PR as more an initial version of filter for run end arrays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could handle the None case in this PR, the index based ones are a poor fit for REE I suspect unless the selectivity of the filter is high. I'd need a benchmark but in short I would prefer to leave it as TODO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think None case is already handled by the parent

IterationStrategy::None => Ok(new_empty_array(values.data_type())),

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Jefffrey Jefffrey merged commit bc2a73d into apache:master Apr 4, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants