New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FixedSizeBinaryArray::try_from_sparse_iter_with_size #3054
Changes from 6 commits
8fac8fc
edc745e
f5db19a
c7ea585
702fe25
8f787a7
761fa15
85faf04
7162712
763b9da
6210d69
4d7498c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -129,17 +129,29 @@ impl FixedSizeBinaryArray { | |||||||||||||||
/// # Errors | ||||||||||||||||
/// | ||||||||||||||||
/// Returns error if argument has length zero, or sizes of nested slices don't match. | ||||||||||||||||
pub fn try_from_sparse_iter<T, U>(mut iter: T) -> Result<Self, ArrowError> | ||||||||||||||||
pub fn try_from_sparse_iter<T, U>(iter: T) -> Result<Self, ArrowError> | ||||||||||||||||
where | ||||||||||||||||
T: Iterator<Item = Option<U>>, | ||||||||||||||||
U: AsRef<[u8]>, | ||||||||||||||||
{ | ||||||||||||||||
Self::try_from_sparse_iter_with_size(iter, None) | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
pub fn try_from_sparse_iter_with_size<T, U>( | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
I think this makes for better ergonomics There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I made the parameter Optional instead |
||||||||||||||||
mut iter: T, | ||||||||||||||||
asserted_size: Option<i32>, | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't follow the need for a separate asserted_size and detected_size. Looking at the logic I think we could just have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think if we were to just have a single size option we would need to have separate implementations of (IMO, I think the former is problematic and should be deprecated / removed) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see why you need to distinguish between the two different sizes in the implementation, it doesn't change if the size is determined by being passed in or from the first non-null element. They must all be equal regardless? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I separated the instances of |
||||||||||||||||
) -> Result<Self, ArrowError> | ||||||||||||||||
where | ||||||||||||||||
T: Iterator<Item = Option<U>>, | ||||||||||||||||
U: AsRef<[u8]>, | ||||||||||||||||
{ | ||||||||||||||||
let mut len = 0; | ||||||||||||||||
let mut size = None; | ||||||||||||||||
let mut detected_size = None; | ||||||||||||||||
let mut byte = 0; | ||||||||||||||||
let mut null_buf = MutableBuffer::from_len_zeroed(0); | ||||||||||||||||
let mut buffer = MutableBuffer::from_len_zeroed(0); | ||||||||||||||||
let mut prepend = 0; | ||||||||||||||||
|
||||||||||||||||
iter.try_for_each(|item| -> Result<(), ArrowError> { | ||||||||||||||||
// extend null bitmask by one byte per each 8 items | ||||||||||||||||
if byte == 0 { | ||||||||||||||||
|
@@ -150,21 +162,21 @@ impl FixedSizeBinaryArray { | |||||||||||||||
|
||||||||||||||||
if let Some(slice) = item { | ||||||||||||||||
let slice = slice.as_ref(); | ||||||||||||||||
if let Some(size) = size { | ||||||||||||||||
if size != slice.len() { | ||||||||||||||||
if let Some(detected_size) = detected_size { | ||||||||||||||||
if detected_size != slice.len() { | ||||||||||||||||
return Err(ArrowError::InvalidArgumentError(format!( | ||||||||||||||||
"Nested array size mismatch: one is {}, and the other is {}", | ||||||||||||||||
size, | ||||||||||||||||
detected_size, | ||||||||||||||||
slice.len() | ||||||||||||||||
))); | ||||||||||||||||
} | ||||||||||||||||
} else { | ||||||||||||||||
size = Some(slice.len()); | ||||||||||||||||
detected_size = Some(slice.len()); | ||||||||||||||||
buffer.extend_zeros(slice.len() * prepend); | ||||||||||||||||
} | ||||||||||||||||
bit_util::set_bit(null_buf.as_slice_mut(), len); | ||||||||||||||||
buffer.extend_from_slice(slice); | ||||||||||||||||
} else if let Some(size) = size { | ||||||||||||||||
} else if let Some(size) = detected_size { | ||||||||||||||||
buffer.extend_zeros(size); | ||||||||||||||||
} else { | ||||||||||||||||
prepend += 1; | ||||||||||||||||
|
@@ -175,16 +187,43 @@ impl FixedSizeBinaryArray { | |||||||||||||||
Ok(()) | ||||||||||||||||
})?; | ||||||||||||||||
|
||||||||||||||||
if len == 0 { | ||||||||||||||||
if len == 0 && asserted_size.is_none() { | ||||||||||||||||
return Err(ArrowError::InvalidArgumentError( | ||||||||||||||||
"Input iterable argument has no data".to_owned(), | ||||||||||||||||
)); | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
let size = size.unwrap_or(0); | ||||||||||||||||
let size = if let Some(detected_size) = detected_size { | ||||||||||||||||
// Either asserted size is zero (because we are entering this | ||||||||||||||||
// function by way of FixedSizeBinaryArray::try_from_sparse_iter), or | ||||||||||||||||
// the detected size has to be equal to the asserted size. If neither | ||||||||||||||||
// of these conditions hold then we need to report an error. | ||||||||||||||||
if let Some(asserted_size) = asserted_size { | ||||||||||||||||
if detected_size as i32 != asserted_size { | ||||||||||||||||
return Err(ArrowError::InvalidArgumentError(format!( | ||||||||||||||||
"Nested array size mismatch: detected size is {}, and the asserted size is {}", | ||||||||||||||||
detected_size, | ||||||||||||||||
asserted_size | ||||||||||||||||
))); | ||||||||||||||||
} | ||||||||||||||||
asserted_size | ||||||||||||||||
} else { | ||||||||||||||||
detected_size as i32 | ||||||||||||||||
} | ||||||||||||||||
} else { | ||||||||||||||||
if let Some(asserted_size) = asserted_size { | ||||||||||||||||
// If we _haven't_ detected a size, then our provided iterator has | ||||||||||||||||
// only produced `None` values. We need to extend the buffer to the | ||||||||||||||||
// expected size else we'll segfault if we attempt to use this array. | ||||||||||||||||
buffer.extend_zeros(len * (asserted_size as usize)); | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
asserted_size.unwrap_or(0) | ||||||||||||||||
}; | ||||||||||||||||
|
||||||||||||||||
let array_data = unsafe { | ||||||||||||||||
ArrayData::new_unchecked( | ||||||||||||||||
DataType::FixedSizeBinary(size as i32), | ||||||||||||||||
DataType::FixedSizeBinary(size), | ||||||||||||||||
len, | ||||||||||||||||
None, | ||||||||||||||||
Some(null_buf.into()), | ||||||||||||||||
|
@@ -582,6 +621,19 @@ mod tests { | |||||||||||||||
assert_eq!(5, arr.len()) | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
#[test] | ||||||||||||||||
fn test_fixed_size_binary_array_from_sparse_iter_with_size_all_none() { | ||||||||||||||||
let input_arg = vec![None, None, None, None, None] as Vec<Option<Vec<u8>>>; | ||||||||||||||||
|
||||||||||||||||
let arr = FixedSizeBinaryArray::try_from_sparse_iter_with_size( | ||||||||||||||||
input_arg.into_iter(), | ||||||||||||||||
Some(16), | ||||||||||||||||
) | ||||||||||||||||
.unwrap(); | ||||||||||||||||
assert_eq!(16, arr.value_length()); | ||||||||||||||||
assert_eq!(5, arr.len()) | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
#[test] | ||||||||||||||||
fn test_fixed_size_binary_array_from_vec() { | ||||||||||||||||
let values = vec!["one".as_bytes(), b"two", b"six", b"ten"]; | ||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2270,6 +2270,18 @@ macro_rules! typed_compares { | |
as_largestring_array($RIGHT), | ||
$OP, | ||
), | ||
(DataType::FixedSizeBinary(_), DataType::FixedSizeBinary(_)) => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This change does not appear to be related? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, it's not, but it was something we were hoping to upstream and I think it accidentally fell into the branch. Would you like me to update this PR to mention this change? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think either we need tests of it, or we need to roll it back There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tests added. |
||
let lhs = $LEFT | ||
.as_any() | ||
.downcast_ref::<FixedSizeBinaryArray>() | ||
.unwrap(); | ||
let rhs = $RIGHT | ||
.as_any() | ||
.downcast_ref::<FixedSizeBinaryArray>() | ||
.unwrap(); | ||
|
||
compare_op(lhs, rhs, $OP) | ||
} | ||
(DataType::Binary, DataType::Binary) => compare_op( | ||
as_generic_binary_array::<i32>($LEFT), | ||
as_generic_binary_array::<i32>($RIGHT), | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a doc comment similar to the one above would be good, especially to explain what the size argument is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.