Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark typed buffer APIs safe (#996) (#1027) #1866

Merged
merged 5 commits into from Jun 15, 2022

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #996
Closes #1027

Rationale for this change

These APIs were originally marked unsafe because ArrowNativeType could be implemented outside the crate. Since #1819 this is no longer the case, as a result these methods are no longer unsafe as they can't lead to UB.

What changes are included in this PR?

Removes unsafe from MutableBuffer::typed_data_mut and Buffer::typed_data.

Are there any user-facing changes?

No

@@ -299,7 +296,7 @@ impl MutableBuffer {
/// assert_eq!(buffer.len(), 8) // u32 has 4 bytes
/// ```
#[inline]
pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T]) {
pub fn extend_from_slice<T: ArrowNativeType>(&mut self, items: &[T]) {
Copy link
Contributor Author

@tustvold tustvold Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was potentially unsound, as ToByteSlice is not sealed and so could theoretically be implemented for a type that is not trivially transmutable (which the implementation of this method implicitly assumes).

Edit: this is an API change

@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels Jun 13, 2022
@tustvold tustvold added the api-change Changes to the arrow API label Jun 13, 2022
@@ -576,7 +576,7 @@ macro_rules! def_get_binary_array_fn {
fn $name(array: &$ty) -> Vec<ByteArray> {
let mut byte_array = ByteArray::new();
let ptr = crate::util::memory::ByteBufferPtr::new(
unsafe { array.value_data().typed_data::<u8>() }.to_vec(),
array.value_data().as_slice().to_vec(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this was ever using typed_data...

@tustvold
Copy link
Contributor Author

Clippy fix in #1868

Copy link
Contributor

@jhorstmann jhorstmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Agree that this should be safe.

/// correctly for type `T`.
pub fn typed_data<T: ArrowNativeType>(&self) -> &[T] {
// SAFETY
// ArrowNativeType are trivially transmutable, and this method checks alignment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ArrowNativeType are trivially transmutable, and this method checks alignment
// ArrowNativeType is sealed (can't be implemented outside the arrow crate,
// trivially transmutable, and this method checks alignment

let (prefix, offsets, suffix) = self.as_slice().align_to::<T>();
/// This function panics if the underlying buffer is not aligned
/// correctly for type `T`.
pub fn typed_data<T: ArrowNativeType>(&self) -> &[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is truly "safe" -- is it really true that any bit pattern is a valid ArrowNativeType? I am thinking about floating point representations in particular -- I wonder if this API could potentially create invalid f32 / f64 which seems like it would thus still be unsafe 🤔

Copy link
Contributor Author

@tustvold tustvold Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think https://doc.rust-lang.org/std/primitive.f32.html#method.from_bits is relevant here, the short answer is it is perfectly safe to transmute arbitrary bytes to floats, it may not be wise, but it is not UB.

In particular the standard library provides safe functions that transmute u32 -> f32, u64 -> f64, and so I think it is fair to say all bit sequences are valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it is safe because there are no actual undefined bit patterns in any of the native types (as opposed to bool or Option<...> for example). Certain bit patterns might get canonicalized when interpreted as floating point values, but I don't think that would be considered undefined behavior. There are more details about specific behavior in the docs for f64::from_bits (which is considered safe).

@codecov-commenter
Copy link

Codecov Report

Merging #1866 (a33e53d) into master (3073a26) will decrease coverage by 0.00%.
The diff coverage is 87.09%.

@@            Coverage Diff             @@
##           master    #1866      +/-   ##
==========================================
- Coverage   83.48%   83.47%   -0.01%     
==========================================
  Files         201      201              
  Lines       57000    57001       +1     
==========================================
- Hits        47584    47583       -1     
- Misses       9416     9418       +2     
Impacted Files Coverage Δ
arrow/src/array/builder.rs 86.88% <ø> (ø)
arrow/src/datatypes/native.rs 66.66% <ø> (ø)
parquet/src/arrow/record_reader/buffer.rs 92.42% <ø> (ø)
...et/src/arrow/array_reader/byte_array_dictionary.rs 83.91% <33.33%> (ø)
arrow/src/array/data.rs 84.16% <50.00%> (ø)
arrow/src/buffer/mutable.rs 90.25% <75.00%> (-0.29%) ⬇️
arrow/src/array/array_union.rs 90.92% <100.00%> (ø)
arrow/src/buffer/immutable.rs 98.48% <100.00%> (ø)
arrow/src/buffer/ops.rs 96.77% <100.00%> (ø)
arrow/src/compute/kernels/cast.rs 95.77% <100.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3073a26...a33e53d. Read the comment docs.

@alamb alamb merged commit b10d58b into apache:master Jun 15, 2022
@alamb alamb changed the title Mark typed buffer APIs safe (#996) (#1027) Mark typed buffer APIs safe (#996) (#1027) Jun 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MutableBuffer::typed_data_mut is not safe Remove unsafe from Buffer::typed_data
4 participants