Validate dictionary key in TypedDictionaryArray (#2578) #2589

Merged (1 commit) on Aug 25, 2022
arrow/src/array/array.rs: 8 changes (8 additions, 0 deletions)
@@ -336,6 +336,14 @@ impl<'a, T: Array> Array for &'a T {
 }

 /// A generic trait for accessing the values of an [`Array`]
+///
+/// # Validity
+///
+/// An [`ArrayAccessor`] must always return a well-defined value for an index that is
+/// within the bounds `0..Array::len`, including for null indexes where [`Array::is_null`] is true.
+///
+/// The value at null indexes is unspecified, and implementations must not rely on a specific
+/// value such as [`Default::default`] being returned, however, it must not be undefined
 pub trait ArrayAccessor: Array {
     type Item: Send + Sync;

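The validity contract on `ArrayAccessor` can be illustrated with a small, self-contained sketch. `Accessor`, `ToyInt32Array` and `sum_non_null` are hypothetical names used only for illustration, not arrow-rs APIs. The toy array returns whatever its buffer holds at a null slot, which is well-defined but unspecified, so the consumer checks `is_null` itself instead of assuming the slot reads as zero.

```rust
/// Hypothetical, simplified stand-in for arrow's `ArrayAccessor` (not the real trait).
trait Accessor {
    type Item;
    fn len(&self) -> usize;
    fn is_null(&self, index: usize) -> bool;
    /// Must return a well-defined, but unspecified, value for every index < len(),
    /// including null indexes.
    fn value(&self, index: usize) -> Self::Item;
}

/// Toy nullable i32 array: a values buffer plus one validity flag per slot.
struct ToyInt32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl Accessor for ToyInt32Array {
    type Item = i32;

    fn len(&self) -> usize {
        self.values.len()
    }

    fn is_null(&self, index: usize) -> bool {
        !self.validity[index]
    }

    fn value(&self, index: usize) -> i32 {
        assert!(index < self.len(), "index {} out of bounds for length {}", index, self.len());
        // A null slot still holds *some* i32 in the buffer; it is returned as-is.
        self.values[index]
    }
}

/// Sums the non-null values, checking `is_null` itself instead of assuming
/// that null slots read as 0.
fn sum_non_null<A: Accessor<Item = i32>>(array: &A) -> i32 {
    (0..array.len())
        .filter(|&i| !array.is_null(i))
        .map(|i| array.value(i))
        .sum()
}

fn main() {
    // The null slot at index 1 holds 999, not Default::default().
    let array = ToyInt32Array { values: vec![1, 999, 3], validity: vec![true, false, true] };
    assert_eq!(array.value(1), 999); // well-defined, but not something to rely on
    println!("{}", sum_non_null(&array)); // prints 4
}
```

Reading index 1 neither panics nor invokes undefined behaviour, yet a correct consumer never depends on what it returns.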
arrow/src/array/array_dictionary.rs: 24 changes (16 additions, 8 deletions)
@@ -475,8 +475,7 @@ impl<'a, K: ArrowPrimitiveType, V: Sync> Array for TypedDictionaryArray<'a, K, V
 impl<'a, K, V> IntoIterator for TypedDictionaryArray<'a, K, V>
 where
     K: ArrowPrimitiveType,
-    V: Sync + Send,
-    &'a V: ArrayAccessor,
+    Self: ArrayAccessor,
 {
     type Item = Option<<Self as ArrayAccessor>::Item>;
     type IntoIter = ArrayIter<Self>;
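Bounding the `IntoIterator` impl on `Self: ArrayAccessor`, rather than repeating `V: Sync + Send` and `&'a V: ArrayAccessor`, lets the iterator impl inherit whatever bounds the `ArrayAccessor` impl declares, including the new `Item: Default`. A minimal sketch of that pattern, assuming hypothetical stand-ins `Accessor`, `TypedView` and `AccessorIter` (not arrow-rs types):

```rust
/// Hypothetical stand-in for arrow's `ArrayAccessor` (not the real trait).
trait Accessor {
    type Item;
    fn len(&self) -> usize;
    fn value(&self, index: usize) -> Self::Item;
}

/// Toy typed view over keys and a values container `V` (hypothetical type).
struct TypedView<'a, V> {
    keys: &'a [usize],
    values: &'a V,
}

// The accessor impl is the single place that declares the real bounds.
// Here the element type must be Clone + Default (Default supplies the fallback).
impl<'a, T: Clone + Default> Accessor for TypedView<'a, Vec<T>> {
    type Item = T;

    fn len(&self) -> usize {
        self.keys.len()
    }

    fn value(&self, index: usize) -> T {
        // Out-of-range keys fall back to T::default().
        self.values.get(self.keys[index]).cloned().unwrap_or_default()
    }
}

/// Iterator over any `Accessor`, similar in spirit to arrow's `ArrayIter`.
struct AccessorIter<A> {
    array: A,
    pos: usize,
}

impl<A: Accessor> Iterator for AccessorIter<A> {
    type Item = A::Item;

    fn next(&mut self) -> Option<A::Item> {
        if self.pos < self.array.len() {
            let item = self.array.value(self.pos);
            self.pos += 1;
            Some(item)
        } else {
            None
        }
    }
}

// `where Self: Accessor` reuses every bound of the Accessor impl above, so
// tightening that impl (e.g. adding Default) needs no change here.
impl<'a, V> IntoIterator for TypedView<'a, V>
where
    Self: Accessor,
{
    type Item = <Self as Accessor>::Item;
    type IntoIter = AccessorIter<Self>;

    fn into_iter(self) -> Self::IntoIter {
        AccessorIter { array: self, pos: 0 }
    }
}

fn main() {
    let values = vec!["a".to_string(), "b".to_string()];
    // Key 7 is out of range, so it yields String::default(), i.e. "".
    let keys: [usize; 3] = [1, 0, 7];
    let view = TypedView { keys: &keys, values: &values };
    let collected: Vec<String> = view.into_iter().collect();
    println!("{:?}", collected); // prints ["b", "a", ""]
}
```

The design trade-off is that the `IntoIterator` impl silently disappears for any `V` that no longer satisfies the accessor bounds, instead of failing with a bound mismatch at the impl site.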
@@ -491,21 +490,30 @@ where
     K: ArrowPrimitiveType,
     V: Sync + Send,
     &'a V: ArrayAccessor,
+    <&'a V as ArrayAccessor>::Item: Default,

Review comment (Contributor, PR author): As we don't currently support dictionaries containing complex types such as ListArray (neither does C++), nor even implement ArrayAccessor for those types, the new `Item: Default` bound is unlikely to matter in practice.

 {
     type Item = <&'a V as ArrayAccessor>::Item;

     fn value(&self, index: usize) -> Self::Item {
-        assert!(self.dictionary.is_valid(index), "{}", index);
-        let value_idx = self.dictionary.keys.value(index).to_usize().unwrap();
-        // Dictionary indexes should be valid
-        unsafe { self.values.value_unchecked(value_idx) }
+        assert!(
+            index < self.len(),
+            "Trying to access an element at index {} from a TypedDictionaryArray of length {}",
+            index,
+            self.len()
+        );
+        unsafe { self.value_unchecked(index) }
     }

     unsafe fn value_unchecked(&self, index: usize) -> Self::Item {
         let val = self.dictionary.keys.value_unchecked(index);
         let value_idx = val.to_usize().unwrap();
-        // Dictionary indexes should be valid
-        self.values.value_unchecked(value_idx)
+
+        // As dictionary keys are only verified for non-null indexes
+        // we must check the value is within bounds
+        match value_idx < self.values.len() {
+            true => self.values.value_unchecked(value_idx),
+            false => Default::default(),

Review comment (Member): Looks good. This has little or no performance impact.

+        }
     }
 }
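The fix rests on one observation: keys are only validated at non-null slots, so `value_unchecked` must bounds-check the looked-up key and fall back to `Default::default()` when it is out of range, while `value` now asserts only that the index lies within the array. The sketch below mirrors that structure on a hypothetical `ToyDictionary` type. It uses only std and is not the arrow-rs implementation, and it models the validity bitmap as a `Vec<bool>` for simplicity.

```rust
/// Toy dictionary-encoded string column (hypothetical; not arrow's DictionaryArray).
/// Keys are only meaningful at non-null slots; under a null slot a key may be arbitrary.
struct ToyDictionary<'a> {
    keys: Vec<u32>,
    validity: Vec<bool>,
    values: Vec<&'a str>,
}

impl<'a> ToyDictionary<'a> {
    fn len(&self) -> usize {
        self.keys.len()
    }

    fn is_null(&self, index: usize) -> bool {
        !self.validity[index]
    }

    /// Checks that `index` is within the array (not that the slot is valid),
    /// then defers to the unchecked path.
    fn value(&self, index: usize) -> &'a str {
        assert!(
            index < self.len(),
            "Trying to access an element at index {} from a ToyDictionary of length {}",
            index,
            self.len()
        );
        // SAFETY: `index < self.len()` was asserted above.
        unsafe { self.value_unchecked(index) }
    }

    /// # Safety
    /// The caller must ensure `index < self.len()`.
    unsafe fn value_unchecked(&self, index: usize) -> &'a str {
        // SAFETY: the caller guarantees `index < self.keys.len()`.
        let key = unsafe { *self.keys.get_unchecked(index) };
        let value_idx = key as usize;
        // The key may come from a null slot and was never validated, so bounds-check it.
        match value_idx < self.values.len() {
            // SAFETY: `value_idx` was just checked to be in bounds.
            true => unsafe { *self.values.get_unchecked(value_idx) },
            false => Default::default(),
        }
    }
}

fn main() {
    // Slot 1 is null and its key (42) points past the end of `values`.
    let dict = ToyDictionary {
        keys: vec![0, 42, 1],
        validity: vec![true, false, true],
        values: vec!["foo", "bar"],
    };
    let rendered: Vec<Option<&str>> = (0..dict.len())
        .map(|i| if dict.is_null(i) { None } else { Some(dict.value(i)) })
        .collect();
    println!("{:?}", rendered); // prints [Some("foo"), None, Some("bar")]
    // Reading the null slot directly is still well-defined: it yields "".
    assert_eq!(dict.value(1), "");
}
```

Under the old code, `value_unchecked(1)` on such a column would have read past the end of `values`. The extra comparison is the only added cost on the hot path.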

Expand Down
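The `<&'a V as ArrayAccessor>::Item: Default` bound is satisfied by the standard library for simple item types such as integers, floats, `bool`, `&str` and `&[u8]`, and the fallback is that type's zero or empty value. The hypothetical helper `fallback` below compiles only for types that implement `Default`, and shows what a null slot with an out-of-range key would read as under this fallback. Which arrow array types produce which item types is not asserted here.

```rust
/// Returns `T::default()`. It only compiles for types that implement `Default`,
/// the same requirement the new bound places on the accessor's item type.
fn fallback<T: Default>() -> T {
    T::default()
}

fn main() {
    let i: i32 = fallback();
    let f: f64 = fallback();
    let b: bool = fallback();
    let s: &str = fallback();
    let bytes: &[u8] = fallback();
    println!("{} {} {} {:?} {:?}", i, f, b, s, bytes); // prints 0 0 false "" []
    // A trait object such as Arc<dyn Debug> has no Default impl, so this would not compile:
    // let _x: std::sync::Arc<dyn std::fmt::Debug> = fallback();
}
```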