Speedup take_boolean / take_bits for non-null indices (~4 - 5x speedup) #2307

Merged
merged 1 commit into from Aug 4, 2022
40 changes: 29 additions & 11 deletions arrow/src/compute/kernels/take.rs
@@ -614,23 +614,41 @@ where
let mut output_buffer = MutableBuffer::new_null(len);
let output_slice = output_buffer.as_slice_mut();

indices
.iter()
.enumerate()
.try_for_each::<_, Result<()>>(|(i, index)| {
if let Some(index) = index {
let index = ToPrimitive::to_usize(&index).ok_or_else(|| {
let indices_has_nulls = indices.null_count() > 0;

if indices_has_nulls {
indices
.iter()
.enumerate()
.try_for_each::<_, Result<()>>(|(i, index)| {
if let Some(index) = index {
let index = ToPrimitive::to_usize(&index).ok_or_else(|| {
Review comment (Contributor):

I was trying to figure out how to use https://docs.rs/arrow/19.0.0/arrow/datatypes/trait.ArrowNativeType.html#method.to_usize here to make the code slightly prettier, but it wasn't obvious to me 🤷

Reply (Contributor Author):

Yeah, this pattern occurs quite often. Maybe we can at least cut down on the repeated ok_or_else(|| {... by making a helper function available, so it becomes something like let index = ToPrimitive::to_usize_or_error(&index)?; everywhere?

Reply (Contributor):

Perhaps more Rust-like:

ToPrimitive::try_to_usize(&index)?
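
To make the suggestion in this thread concrete, here is a minimal standalone sketch of what such a helper could look like. Everything in it is an assumption for illustration: try_to_usize does not exist on ToPrimitive, the TryToUsize trait is hypothetical, and ComputeError is a local stand-in for ArrowError::ComputeError so that the snippet compiles without the arrow crate.

```rust
use std::convert::TryFrom;

/// Local stand-in for `ArrowError::ComputeError` (assumption: defined here
/// only so the sketch has no dependency on the arrow crate).
#[derive(Debug, PartialEq)]
pub struct ComputeError(pub String);

/// Hypothetical helper trait; not part of arrow or num-traits.
pub trait TryToUsize {
    /// Convert to `usize`, turning a failed cast into an error value
    /// instead of an `Option`, so call sites can use `?` directly.
    fn try_to_usize(&self) -> Result<usize, ComputeError>;
}

macro_rules! impl_try_to_usize {
    ($($t:ty),*) => {$(
        impl TryToUsize for $t {
            fn try_to_usize(&self) -> Result<usize, ComputeError> {
                // `usize::try_from` rejects negative and too-large values.
                usize::try_from(*self)
                    .map_err(|_| ComputeError("Cast to usize failed".to_string()))
            }
        }
    )*};
}

// Cover the integer widths commonly used as take indices.
impl_try_to_usize!(i8, i16, i32, i64, u8, u16, u32, u64);
```

With a helper of this shape, each call site in the kernel would shrink from the multi-line ok_or_else closure to a single `let index = index.try_to_usize()?;`, assuming the surrounding function's error type can be built from the helper's error.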

ArrowError::ComputeError("Cast to usize failed".to_string())
})?;

if bit_util::get_bit(values_slice, values_offset + index) {
bit_util::set_bit(output_slice, i);
}
}

Ok(())
})?;
} else {
indices
.values()
.iter()
.enumerate()
.try_for_each::<_, Result<()>>(|(i, index)| {
let index = ToPrimitive::to_usize(index).ok_or_else(|| {
ArrowError::ComputeError("Cast to usize failed".to_string())
})?;

if bit_util::get_bit(values_slice, values_offset + index) {
bit_util::set_bit(output_slice, i);
}
}

Ok(())
})?;

Ok(())
})?;
}
Ok(output_buffer.into())
}
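
The idea behind the change above (check once whether any index is null, then pick a loop that either tests validity per element or skips that test entirely) can be sketched without the arrow crate. This is a simplified illustration, not the kernel itself: take_bits_sketch, the Option<&[bool]> validity slice, and the String error are assumptions made for the example, and indices are fixed to i32.

```rust
use std::convert::TryFrom;

/// Read bit `i` from a packed bitmap, least significant bit first.
fn get_bit(data: &[u8], i: usize) -> bool {
    data[i / 8] & (1u8 << (i % 8)) != 0
}

/// Set bit `i` in a packed bitmap, least significant bit first.
fn set_bit(data: &mut [u8], i: usize) {
    data[i / 8] |= 1u8 << (i % 8);
}

/// Hypothetical, simplified version of the take-on-bits loop.
/// `validity` is `None` when no index is null (the fast path).
/// An index past the end of `values` panics in this sketch.
fn take_bits_sketch(
    values: &[u8],
    indices: &[i32],
    validity: Option<&[bool]>,
) -> Result<Vec<u8>, String> {
    // Output bitmap starts zeroed, one bit per index.
    let mut out = vec![0u8; (indices.len() + 7) / 8];

    match validity {
        // Slow path: every element pays for a validity check.
        Some(valid) => {
            for (i, (&index, &is_valid)) in indices.iter().zip(valid).enumerate() {
                if is_valid {
                    let index = usize::try_from(index)
                        .map_err(|_| "Cast to usize failed".to_string())?;
                    if get_bit(values, index) {
                        set_bit(&mut out, i);
                    }
                }
            }
        }
        // Fast path: iterate the raw index values with no per-element branch
        // on nullness.
        None => {
            for (i, &index) in indices.iter().enumerate() {
                let index = usize::try_from(index)
                    .map_err(|_| "Cast to usize failed".to_string())?;
                if get_bit(values, index) {
                    set_bit(&mut out, i);
                }
            }
        }
    }

    Ok(out)
}
```

Note that in the slow path a null slot is skipped before the cast, so a garbage value stored under a null index never produces an error; the fast path casts every value, which is safe only because the caller has already established that no index is null.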
