Fix take string on sliced indices #2960

Merged
merged 1 commit into from Oct 28, 2022
28 changes: 19 additions & 9 deletions arrow-select/src/take.rs
@@ -21,9 +21,7 @@ use std::{ops::AddAssign, sync::Arc};

use arrow_array::types::*;
use arrow_array::*;
-use arrow_buffer::{
-    bit_util, buffer::buffer_bin_and, ArrowNativeType, Buffer, MutableBuffer,
-};
+use arrow_buffer::{bit_util, ArrowNativeType, Buffer, MutableBuffer};
use arrow_data::{ArrayData, ArrayDataBuilder};
use arrow_schema::{ArrowError, DataType, Field};

@@ -675,12 +673,7 @@ where
*offset = length_so_far;
}

-nulls = match indices.data_ref().null_buffer() {
-    Some(buffer) => {
-        Some(buffer_bin_and(buffer, 0, &null_buf.into(), 0, data_len))
@tustvold (Contributor, Author), Oct 27, 2022

The bug is that this doesn't take into account any offset that indices may have; it should be:

buffer_bin_and(buffer, indices.offset(), &null_buf.into(), 0, data_len)

However, this code is completely redundant: we already check the validity of indices in the loop above when constructing this mask, so we can just remove it.
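To make the offset problem concrete, here is a minimal std-only sketch. It is a plain-Rust model of the situation, not the arrow-rs implementation: `get_bit`, `pack`, `and_bits`, and `compare_offsets` are hypothetical helpers introduced for illustration, and the only assumptions carried over from Arrow are that validity bitmaps are packed least-significant bit first and that a sliced array shares its parent's buffers while recording an offset. The data mirrors the `test_take_slice_string` case added in this PR.

```rust
/// Read bit `i` from a bitmap packed least-significant bit first.
fn get_bit(bits: &[u8], i: usize) -> bool {
    bits[i / 8] & (1 << (i % 8)) != 0
}

/// Pack a slice of bools into a bitmap, least-significant bit first.
fn pack(valid: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (valid.len() + 7) / 8];
    for (i, &v) in valid.iter().enumerate() {
        if v {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

/// Hypothetical stand-in for a binary AND over two bit ranges:
/// result[i] = left[left_offset + i] & right[right_offset + i].
fn and_bits(
    left: &[u8],
    left_offset: usize,
    right: &[u8],
    right_offset: usize,
    len: usize,
) -> Vec<bool> {
    (0..len)
        .map(|i| get_bit(left, left_offset + i) && get_bit(right, right_offset + i))
        .collect()
}

/// Returns (validity with offset 0, validity with the slice offset).
fn compare_offsets() -> (Vec<bool>, Vec<bool>) {
    // Validity of the parent indices [Some(0), Some(1), None, Some(0), Some(2)].
    let parent_validity = pack(&[true, true, false, true, true]);
    // slice(1, 4) keeps the parent buffer and records offset = 1, len = 4.
    let (offset, len) = (1, 4);
    // Mask built while taking, already relative to the slice and already
    // combining value validity with index validity.
    let null_buf = pack(&[false, false, true, true]);

    let with_zero_offset = and_bits(&parent_validity, 0, &null_buf, 0, len);
    let with_slice_offset = and_bits(&parent_validity, offset, &null_buf, 0, len);
    (with_zero_offset, with_slice_offset)
}
```

Calling `compare_offsets()` returns `[false, false, false, true]` for offset 0, which wrongly nulls out the "hello" slot, and `[false, false, true, true]` for the slice offset. The second result is identical to the mask itself, which is the sense in which the extra AND is redundant.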

-    }
-    None => Some(null_buf.into()),
-};
+nulls = Some(null_buf.into())
}

let array_data = ArrayData::builder(GenericStringArray::<OffsetSize>::DATA_TYPE)
@@ -1547,6 +1540,23 @@ mod tests {
_test_take_string::<LargeStringArray>()
}

+#[test]
+fn test_take_slice_string() {
+    let strings =
+        StringArray::from(vec![Some("hello"), None, Some("world"), None, Some("hi")]);
+    let indices = Int32Array::from(vec![Some(0), Some(1), None, Some(0), Some(2)]);
+    let indices_slice = indices.slice(1, 4);
+    let indices_slice = indices_slice
+        .as_ref()
+        .as_any()
+        .downcast_ref::<Int32Array>()
+        .unwrap();
+
+    let expected = StringArray::from(vec![None, None, Some("hello"), Some("world")]);
+    let result = take(&strings, indices_slice, None).unwrap();
+    assert_eq!(result.as_ref(), &expected);
+}

macro_rules! test_take_list {
($offset_type:ty, $list_data_type:ident, $list_array_type:ident) => {{
// Construct a value array, [[0,0,0], [-1,-2,-1], [2,3]]