Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace RawPtrBox with Safe Abstraction #1811

Closed
tustvold opened this issue Jun 7, 2022 · 9 comments · Fixed by #3743
Closed

Replace RawPtrBox with Safe Abstraction #1811

tustvold opened this issue Jun 7, 2022 · 9 comments · Fixed by #3743
Assignees
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Copy link
Contributor

tustvold commented Jun 7, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently the Array implementations make use of RawPtrBox to reference data on their attached ArrayData. Not only is this construction deeply unsafe, but requires further unsafe pointer manipulation to use.

Describe the solution you'd like

I would like for arrays to store strongly typed slices, instead of strongly typed pointers. This necessarily implies a self-referential struct, fortunately ouroboros provides a safe interface that handles this complexity for you. The hardest part is actually spelling the crates' name...

#[self_referencing]
pub struct PrimitiveArray<T: ArrowPrimitiveType> {
    data: ArrayData,
    #[borrows(data)]
    values: &'this [T::Native]
}

This would eliminate a lot of unsafe, without requiring any breaking changes.

Describe alternatives you've considered

We could not do this, this might just be a way to reduce the amount of unsafe in arrow-rs without requiring any breaking changes.

We could also abstract the self-referential shenanigans in a typed buffer abstraction, something like

#[self_referencing]
struct TypedBuffer<T: ArrowNativeType> {
  inner: Buffer,
  #[borrows(inner)]
  values: &'this [T]
}

Possibly even dropping ouroborous in favor of a little bit of custom unsafe.

Related Context

Somewhat related to #1799 which introduces stronger typing within ArrayData itself.

We used ouroboros for a while in IOx to handle flatbuffers, and it did the job. We have since removed flatbuffers in favor of a columnar protobuf representation, but that was more a reflection on flatbuffers than ouroboros.

Thoughts @alamb @jhorstmann @jorgecarleitao @HaoYang670 ?

@tustvold tustvold added question Further information is requested enhancement Any new improvement worthy of a entry in the changelog labels Jun 7, 2022
@HaoYang670
Copy link
Contributor

The ouroboros is a personal crate, I am not sure whether it is safe to depend on it.

@tustvold
Copy link
Contributor Author

tustvold commented Jun 7, 2022

Safe in what sense of the word? 😅

@HaoYang670
Copy link
Contributor

Safe in what sense of the word? 😅

If there is a bug in ouroboros, then how to fix it?
What if the crate is out of maintenance?

@tustvold
Copy link
Contributor Author

tustvold commented Jun 7, 2022

Definitely valid concerns, I think they need to be balanced against the likelihood of a bug in the current custom self-referential struct logic, vs an upstream that's had more eyes on it.

TBC I'm not totally sold on this proposal, I happen to think the current logic is probably fine, but opinions on the dangers of unsafe vary. Let's see what other people think 😅

@HaoYang670
Copy link
Contributor

Sorry, I think I lack of the knowledge about why we have to use raw pointers here?

@tustvold
Copy link
Contributor Author

tustvold commented Jun 7, 2022

The TLDR is

  • ArrayData stores untyped ref-counted buffers of bytes inline with the arrow specification
  • We want to provide a safe, strongly typed array interface to users (Array)
  • We want to perform alignment, etc... checks once on creation
  • We therefore construct typed RawPtrBox that reference the bytes owned by the Buffers stored in ArrayData on construction of an Array

Whilst I write this I realise we could encapsulate the self-referential shenanigans in a typed buffer abstraction. I'll update the issue 👍

@jhorstmann
Copy link
Contributor

Interesting, haven't seen or tried that crate before, but seems to solve the exact problem that our Array types have where we have data owned by ArrayData but to access it without indirections using a pointer or slice. The crate looks healthy, several contributors and releases. The macro might slow down compilation a bit, but I think that's a worthwhile tradeoff.

RawPtrBox also implements Send + Sync, I think that should also work automatically for slices.

@alamb
Copy link
Contributor

alamb commented Jun 7, 2022

I like the idea of a typed buffer 👍 that lives in arrow and doesn't rely on some other crate

I agree with @HaoYang670 that hiding the unsafely behind someone else's library may just be fooling ourselves into thinking arrow is safer than it is.

@tustvold tustvold changed the title Replace RawPtrBox with Ouroboros Replace RawPtrBox with Safe Abstraction Jun 8, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue Jun 8, 2022
@tustvold
Copy link
Contributor Author

tustvold commented Jun 8, 2022

I bashed out what this might look like in #1820 PTAL

tustvold added a commit that referenced this issue Jun 9, 2022
* Add ScalarBuffer abstraction (#1811)

* Lint fixes
tustvold added a commit to tustvold/arrow-rs that referenced this issue Jun 9, 2022
Fix value_offsets for empty variable length arrays (apache#1824)
@alamb alamb removed the question Further information is requested label Jun 10, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue Feb 20, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Feb 21, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Feb 21, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Feb 21, 2023
tustvold added a commit that referenced this issue Feb 23, 2023
* Remove RawPtrBox (#1811) (#1176)

* Clippy

* Extract get_offsets function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
4 participants