Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unsafe from Buffer::typed_data #996

Closed
alamb opened this issue Dec 3, 2021 · 5 comments · Fixed by #1866
Closed

Remove unsafe from Buffer::typed_data #996

alamb opened this issue Dec 3, 2021 · 5 comments · Fixed by #1866
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented Dec 3, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As pointed out by @jhorstmann on #921 at #921 (comment),

Buffer::typed_data does not actually need to be marked unsafe since it checks the alignment requirements. The very similarly implemented MutableBuffer::typed_data_mut is not marked as unsafe. The safety notes mention bool as a special case, but that is no longer an ArrowNativeType since a while.

https://github.com/apache/arrow-rs/blob/6a6e7f7/arrow/src/buffer/immutable.rs#L160-L181

Describe the solution you'd like

  1. Remove unsafe from Buffer::typed_data
  2. Remove note about bool in docstrings
@alamb alamb added enhancement Any new improvement worthy of a entry in the changelog arrow Changes to the arrow crate labels Dec 3, 2021
@alamb
Copy link
Contributor Author

alamb commented Dec 4, 2021

I reviewed the code a bit more, and the comments make a great point:

    /// `ArrowNativeType` is public so that it can be used as a trait bound for other public
    /// components, such as the `ToByteSlice` trait.  However, this means that it can be
    /// implemented by user defined types, which it is not intended for.

Meaning that if a user implements ArrowNativeType for their types, this will result in undefined behavior.

I also looked more carefully at Buffer and it effectively allows reinterpreting arbitrary bytes as different types, so I am not sure that behavior is safe...

What could
be done is make MutableBuffer::typed_data unsafe instead as the docs there say very specifically

    /// # Safety
    /// This function must only be used when this buffer was extended with items of type `T`.
    /// Failure to do so results in undefined behavior.

@alamb
Copy link
Contributor Author

alamb commented Dec 4, 2021

Closing this one as I don't think it is actionable upon further review. FYI @jhorstmann

@jhorstmann
Copy link
Contributor

Ok, I wasn't aware that implementing ArrowNativeType for arbitrary types by a user is a supported usecase. Maybe making that into a sealed trait would also be an option. Anyway, my comment was more about the inconsistency between MutableBuffer::typed_data and Buffer::typed_data, so marking both unsafe is also fine.

@alamb
Copy link
Contributor Author

alamb commented Dec 5, 2021 via email

@alamb
Copy link
Contributor Author

alamb commented Dec 11, 2021

tustvold added a commit to tustvold/arrow-rs that referenced this issue Jun 13, 2022
alamb pushed a commit that referenced this issue Jun 15, 2022
* Mark typed buffer APIs safe (#996) (#1027)

* Fix parquet

* Format

* Review feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants