Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split out arrow-data into a separate crate #2746

Merged
merged 3 commits into from Sep 22, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Sep 16, 2022

Which issue does this PR close?

Part of #2594

Rationale for this change

ArrayData represents the canonical representation of an arrow array, and is the language of the C Data interface. It therefore makes sense to split into its own crate. In theory this could form the basis for interoperability between different Rust array abstractions.

What changes are included in this PR?

Splits out ArrayData and related structures into a separate crate.

Are there any user-facing changes?

We can no longer implement From<ArrayRef> for ArrayData. In practice this only appears to be being used by pyarrow code, and can be replaced with calls to make_array or Array::data() as appropriate.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 16, 2022
@@ -284,7 +284,11 @@ impl<T: DecimalType> DecimalArray<T> {

// safety: self.data is valid DataType::Decimal as checked above
let new_data_type = Self::TYPE_CONSTRUCTOR(precision, scale);
Ok(self.data().clone().with_data_type(new_data_type).into())
let data = self.data().clone().into_builder().data_type(new_data_type);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was previously pub(crate), I opted to just remove it in favour of the into_builder plumbing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @liukun4515 -- I think it is fine

@@ -56,6 +56,10 @@ impl Bitmap {
unsafe { bit_util::get_bit_raw(self.bits.as_ptr(), i) }
}

pub fn buffer(&self) -> &Buffer {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to deprecate buffer_ref in a follow up PR

@@ -0,0 +1,672 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved from array/transform/mod.rs but with the tests split out into arrow/tests/array_transform.rs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think moving out the tests is a great idea -- I wonder if we need to do anything more to make them discoverable 🤔

@@ -358,6 +358,90 @@ pub trait ArrayAccessor: Array {
unsafe fn value_unchecked(&self, index: usize) -> Self::Item;
}

impl PartialEq for dyn Array {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are moved from arrow/equal/mod.rs

@@ -1,1464 +0,0 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved to arrow-data/equal/mod.rs with the array equality moved to arrow/array/array.rs and the tests moved to arrow/tests/array_equal.rs

@@ -1771,1124 +1746,30 @@ mod tests {
}

#[test]
#[should_panic(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are moved to arrow/tests/array_validation.rs

@@ -88,30 +88,3 @@ pub(super) fn boolean_equal(
})
}
}

#[cfg(test)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are moved to arrow/tests/array_equal.rs

@@ -149,52 +148,3 @@ pub(super) fn list_equal<T: OffsetSizeTrait>(
})
}
}

#[cfg(test)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are moved to arrow/tests/array_equal.rs

@@ -0,0 +1,171 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved from arrow/array/equal/mod.rs

@tustvold tustvold marked this pull request as ready for review September 21, 2022 13:15
@alamb alamb changed the title Split out arrow data Split out arrow-data into a new crate Sep 21, 2022
@alamb alamb added the api-change Changes to the arrow API label Sep 21, 2022
@alamb alamb changed the title Split out arrow-data into a new crate Split out arrow-data into a separate crate Sep 21, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I reviewed every single line in this PR super carefully, but I did read them all and looked at the changes, and verified the tests are still present.

I also like the way that several of the larger tests are moved into arrow/tests/... integration style tests. This is similar to what arrow2 does 👍 Nice

Do we think it is appropriate to run any benchmarks?

LGTM I say we ship it

@@ -0,0 +1,672 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think moving out the tests is a great idea -- I wonder if we need to do anything more to make them discoverable 🤔

buffer: &mut MutableBuffer,
offsets: &[T],
values: &[u8],
start: usize,
len: usize,
) {
let start_values = offsets[start].to_usize().unwrap();
let end_values = offsets[start + len].to_usize().unwrap();
let start_values = offsets[start].as_();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: https://docs.rs/num/latest/num/cast/trait.AsPrimitive.html#tymethod.as_

as_() is something I have not seen before. Fascinating.

@@ -284,7 +284,11 @@ impl<T: DecimalType> DecimalArray<T> {

// safety: self.data is valid DataType::Decimal as checked above
let new_data_type = Self::TYPE_CONSTRUCTOR(precision, scale);
Ok(self.data().clone().with_data_type(new_data_type).into())
let data = self.data().clone().into_builder().data_type(new_data_type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @liukun4515 -- I think it is fine

@tustvold
Copy link
Contributor Author

This does not appear to have a discernible impact on the benchmarks - some fluctuate as is to be expected when you're talking nanoseconds, but no major change in either direction

@tustvold tustvold merged commit a927d7a into apache:master Sep 22, 2022
@ursabot
Copy link

ursabot commented Sep 22, 2022

Benchmark runs are scheduled for baseline = 80c0f1a and contender = a927d7a. a927d7a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants