Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make FFI support optional, change APIs to be safe (#2302) #2303

Merged
merged 2 commits into from Aug 4, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Aug 3, 2022

Which issue does this PR close?

Closes #2301
Closes #2302

Rationale for this change

Most applications don't need FFI support, and putting this behind a feature flag allows them to reduce their dependency footprint

What changes are included in this PR?

Are there any user-facing changes?

Yes, ffi is no longer enabled by default, and Array::to_raw is removed

@tustvold tustvold added the api-change Changes to the arrow API label Aug 3, 2022
Comment on lines -736 to -771
/// Creates a new array from two FFI pointers. Used to import arrays from the C Data Interface
/// # Safety
/// Assumes that these pointers represent valid C Data Interfaces, both in memory
/// representation and lifetime via the `release` mechanism.
pub unsafe fn make_array_from_raw(
array: *const ffi::FFI_ArrowArray,
schema: *const ffi::FFI_ArrowSchema,
) -> Result<ArrayRef> {
let array = ffi::ArrowArray::try_from_raw(array, schema)?;
let data = ArrayData::try_from(array)?;
Ok(make_array(data))
}

/// Exports an array to raw pointers of the C Data Interface provided by the consumer.
/// # Safety
/// Assumes that these pointers represent valid C Data Interfaces, both in memory
/// representation and lifetime via the `release` mechanism.
///
/// This function copies the content of two FFI structs [ffi::FFI_ArrowArray] and
/// [ffi::FFI_ArrowSchema] in the array to the location pointed by the raw pointers.
/// Usually the raw pointers are provided by the array data consumer.
pub unsafe fn export_array_into_raw(
src: ArrayRef,
out_array: *mut ffi::FFI_ArrowArray,
out_schema: *mut ffi::FFI_ArrowSchema,
) -> Result<()> {
let data = src.data();
let array = ffi::FFI_ArrowArray::new(data);
let schema = ffi::FFI_ArrowSchema::try_from(data.data_type())?;

std::ptr::write_unaligned(out_array, array);
std::ptr::write_unaligned(out_schema, schema);

Ok(())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are moved to the ffi module to make the feature flags easier

#[deprecated(
note = "use from_custom_allocation instead which makes it clearer that the allocation is in fact owned"
)]
pub unsafe fn from_unowned(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was already deprecated, so I opted to just remove it

/// # Memory Leaks
/// This method releases `buffers`. Consumers of this struct *must* call `release` before
/// releasing this struct, or contents in `buffers` leak.
pub fn try_new(data: ArrayData) -> Result<Self> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method wasn't actually unsafe, and was being called by safe methods previously without any additional checks #2301

@@ -216,15 +214,6 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
self.data_ref().get_array_memory_size() + std::mem::size_of_val(self)
- std::mem::size_of::<ArrayData>()
}

/// returns two pointers that represent this array in the C Data Interface (FFI)
fn to_raw(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see a compelling reason for this API to exist, ultimately the use of std::any::Any means users can't implement Array for custom types, and I can't see why arrow-rs would ever make use of this extension point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel comfortable reviewing this change -- I think @viirya / @kszucs may know more about its usecase

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this to_raw usage, ArrowArray provides into_raw which should be enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use ArrowArray.into_raw actually.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 3, 2022
@alamb alamb changed the title Make FFI support optional (#2302) Make FFI support optional, change APIs to be safe (#2302) Aug 3, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of an optional flag looks good to me. I don't feel comfortable reviewing the FFI changes as I am not familiar with that code

# force_validate runs full data validation for all arrays that are created
# this is not enabled by default as it is too computationally expensive
# but is run as part of our CI checks
force_validate = []
# Enable ffi support
ffi = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I was going to recommend adding this to the "list of features of this crate" but then it turns out we don't seem to have any docs for them 🤦

https://docs.rs/arrow/19.0.0/arrow/

@@ -216,15 +214,6 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
self.data_ref().get_array_memory_size() + std::mem::size_of_val(self)
- std::mem::size_of::<ArrayData>()
}

/// returns two pointers that represent this array in the C Data Interface (FFI)
fn to_raw(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel comfortable reviewing this change -- I think @viirya / @kszucs may know more about its usecase

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good direction to me. I will take a close look later.

@tustvold tustvold merged commit d87f6a4 into apache:master Aug 4, 2022
@ursabot
Copy link

ursabot commented Aug 4, 2022

Benchmark runs are scheduled for baseline = 4b15b7e and contender = d87f6a4. d87f6a4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make FFI support optional via Feature Flag ffi Mark ffi::ArrowArray::try_new is safe
4 participants