Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FixedSizeBinaryArray::try_from_sparse_iter_with_size #3054

Merged
merged 12 commits into from Nov 13, 2022

Conversation

maxburke
Copy link
Contributor

@maxburke maxburke commented Nov 9, 2022

Closes #1390

Rationale for this change

FixedSizeBinaryArray::try_from_sparse_iter generates incorrect results when the provided iterator only produces None values.

What changes are included in this PR?

This change also implements the comparison kernel for FixedSizeBinary types.

Are there any user-facing changes?

This change adds a new function, FixedSizeBinaryArray::try_from_iter_with_size, which takes a default size parameter, and markes the existing try_from_sparse_iter as deprecated.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 9, 2022
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments


pub fn try_from_sparse_iter_with_size<T, U>(
mut iter: T,
asserted_size: Option<i32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the need for a separate asserted_size and detected_size. Looking at the logic I think we could just have mut size: Option<i32> and everything would work correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we were to just have a single size option we would need to have separate implementations of try_from_sparse_iter and try_from_sparse_iter_with_size.

(IMO, I think the former is problematic and should be deprecated / removed)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why you need to distinguish between the two different sizes in the implementation, it doesn't change if the size is determined by being passed in or from the first non-null element. They must all be equal regardless?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated the instances of try_from_sparse_iter and try_from_sparse_iter_with_size because I think the corner cases of making the two implementations match up wasn't worth the complexity.

@@ -2270,6 +2270,18 @@ macro_rules! typed_compares {
as_largestring_array($RIGHT),
$OP,
),
(DataType::FixedSizeBinary(_), DataType::FixedSizeBinary(_)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does not appear to be related?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not, but it was something we were hoping to upstream and I think it accidentally fell into the branch. Would you like me to update this PR to mention this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either we need tests of it, or we need to roll it back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests added.

Self::try_from_sparse_iter_with_size(iter, None)
}

pub fn try_from_sparse_iter_with_size<T, U>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a doc comment similar to the one above would be good, especially to explain what the size argument is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 139 to 140

pub fn try_from_sparse_iter_with_size<T, U>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn try_from_sparse_iter_with_size<T, U>(
pub fn try_from_sparse_iter_with_size<T, U>(iter: T, size: i32) -> Result<Self, ArrowError> {
try_from_sparse_iter_with_size_opt::<T, U>(iter, Some(size))
}
fn try_from_sparse_iter_with_size_opt<T, U>(

I think this makes for better ergonomics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the parameter Optional instead

@alamb
Copy link
Contributor

alamb commented Nov 11, 2022

If we can get this PR merged in the next 4-5 hours it will be included in arrow 27.0.0.

@maxburke
Copy link
Contributor Author

If we can get this PR merged in the next 4-5 hours it will be included in arrow 27.0.0.

rats

@alamb
Copy link
Contributor

alamb commented Nov 12, 2022

If we can get this PR merged in the next 4-5 hours it will be included in arrow 27.0.0.

😢

@alamb alamb requested a review from tustvold November 12, 2022 13:33
@viirya
Copy link
Member

viirya commented Nov 12, 2022

But there are a few places using the deprecated function, see CI.

error: use of deprecated associated function `arrow_array::FixedSizeBinaryArray::try_from_sparse_iter`: This function will fail if the iterator produces only None values; prefer `try_from_sparse_iter_with_size`
   --> arrow/src/array/ffi.rs:215:43
    |
215 |         let array = FixedSizeBinaryArray::try_from_sparse_iter(values.into_iter())?;
    |                                           ^^^^^^^^^^^^^^^^^^^^
    |
    = note: `-D deprecated` implied by `-D warnings`

@maxburke
Copy link
Contributor Author

But there are a few places using the deprecated function, see CI.

error: use of deprecated associated function `arrow_array::FixedSizeBinaryArray::try_from_sparse_iter`: This function will fail if the iterator produces only None values; prefer `try_from_sparse_iter_with_size`
   --> arrow/src/array/ffi.rs:215:43
    |
215 |         let array = FixedSizeBinaryArray::try_from_sparse_iter(values.into_iter())?;
    |                                           ^^^^^^^^^^^^^^^^^^^^
    |
    = note: `-D deprecated` implied by `-D warnings`

Yup, just saw the build failure and submitted a fix. I wasn't testing locally with the ffi feature enabled.

@viirya
Copy link
Member

viirya commented Nov 12, 2022

There are some clippy errors that are not related to this change. Submitted #3096 to fix that.

@tustvold
Copy link
Contributor

tustvold commented Nov 13, 2022

@maxburke could you possibly merge master into this branch, so the CI goes green. I don't seem to have permissions to do this, there is a checkbox to allow maintainers the ability to edit PRed branches

@maxburke
Copy link
Contributor Author

@maxburke could you possibly merge master into this branch, so the CI goes green. I don't seem to have permissions to do this, there is a checkbox to allow maintainers the ability to edit PRed branches

done!

@viirya viirya merged commit 20d81f5 into apache:master Nov 13, 2022
@viirya
Copy link
Member

viirya commented Nov 13, 2022

Thanks @maxburke

@ursabot
Copy link

ursabot commented Nov 13, 2022

Benchmark runs are scheduled for baseline = b7af85c and contender = 20d81f5. 20d81f5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb
Copy link
Contributor

alamb commented Nov 14, 2022

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FixedSizeBinaryArray::try_from_sparse_iter failed when given all Nones
5 participants