Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always validate the array data (except the Decimal) when creating array in IPC reader #2547

Merged
merged 2 commits into from Aug 23, 2022

Conversation

HaoYang670
Copy link
Contributor

Signed-off-by: remzi 13716567376yh@gmail.com

Which issue does this PR close?

Closes #2541.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 marked this pull request as draft August 22, 2022 02:32
@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 22, 2022
@HaoYang670
Copy link
Contributor Author

5 tests failures so far

failures:

---- ipc::reader::tests::read_generated_streams_014 stdout ----
thread 'ipc::reader::tests::read_generated_streams_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("-11697 is too small to store in a Decimal128 of precision 3. Min is -999")', arrow/src/ipc/reader.rs:497:18

---- ipc::reader::tests::read_generated_files_014 stdout ----
thread 'ipc::reader::tests::read_generated_files_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("-11697 is too small to store in a Decimal128 of precision 3. Min is -999")', arrow/src/ipc/reader.rs:497:18

---- json::reader::tests::test_row_type_validation stdout ----
thread 'json::reader::tests::test_row_type_validation' panicked at 'assertion failed: `(left == right)`
  left: `"Json error: Expected JSON record to be an object, found Array([Number(1), String(\"hello\")])"`,
 right: `"Json error: Expected JSON record to be an object, found Array [Number(1), String(\"hello\")]"`', arrow/src/json/reader.rs:2624:9

---- ipc::writer::tests::read_and_rewrite_generated_streams_014 stdout ----
thread 'ipc::writer::tests::read_and_rewrite_generated_streams_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("-11697 is too small to store in a Decimal128 of precision 3. Min is -999")', arrow/src/ipc/reader.rs:497:18

---- ipc::writer::tests::read_and_rewrite_generated_files_014 stdout ----
thread 'ipc::writer::tests::read_and_rewrite_generated_files_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("-11697 is too small to store in a Decimal128 of precision 3. Min is -999")', arrow/src/ipc/reader.rs:497:18

@HaoYang670 HaoYang670 marked this pull request as ready for review August 22, 2022 02:33
@viirya
Copy link
Member

viirya commented Aug 22, 2022

test_row_type_validation passes in current master. You can retry after cargo clean.

@viirya
Copy link
Member

viirya commented Aug 22, 2022

For the following tests, there are some out of range decimal values for the given precision in older version of generated files. A known issue.

---- ipc::reader::tests::read_generated_streams_014 stdout ----
thread 'ipc::reader::tests::read_generated_streams_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("-11697 is too small to store in a Decimal128 of precision 3. Min is -999")', arrow/src/ipc/reader.rs:497:18

---- ipc::reader::tests::read_generated_files_014 stdout ----
thread 'ipc::reader::tests::read_generated_files_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("-11697 is too small to store in a Decimal128 of precision 3. Min is -999")', arrow/src/ipc/reader.rs:497:18

---- ipc::writer::tests::read_and_rewrite_generated_streams_014 stdout ----
thread 'ipc::writer::tests::read_and_rewrite_generated_streams_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("-11697 is too small to store in a Decimal128 of precision 3. Min is -999")', arrow/src/ipc/reader.rs:497:18

---- ipc::writer::tests::read_and_rewrite_generated_files_014 stdout ----
thread 'ipc::writer::tests::read_and_rewrite_generated_files_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("-11697 is too small to store in a Decimal128 of precision 3. Min is -999")', arrow/src/ipc/reader.rs:497:18

@HaoYang670
Copy link
Contributor Author

A known issue.

Is there any ticket to track this?

@viirya
Copy link
Member

viirya commented Aug 22, 2022

A known issue.

Is there any ticket to track this?

I'd opened a ticket before to know why there are these values: https://issues.apache.org/jira/browse/ARROW-16696

BTW, I said "a known issue", not meaning it is some bug needed to fix. Just a known topic/problem if you try to validate these decimal values from the generated files.

@liukun4515
Copy link
Contributor

why the data in the ipc can't be trusted from this issue #2541 ? @HaoYang670 @tustvold

@tustvold
Copy link
Contributor

The IPC APIs are safe, it therefore cannot be possible to create UB using them. Currently reading a file with invalid list offsets, for example, will result in UB

@HaoYang670
Copy link
Contributor Author

At least, we should make the implementation consistent.
So far, Utf8 / Binary arrays use checked build function, however other arrays use build_unchecked.

It is weird for me that in create_primitative_array some arrays are checked, but others are not. If there are some reasons, we should doc them.

@tustvold
Copy link
Contributor

Regarding the Decimal validation there is an open ticket regarding how we handle this #2387

Perhaps we could back out the change to validating DecimalArray with a TODO tied to that ticket, and get the rest of this PR in?

@HaoYang670
Copy link
Contributor Author

Regarding the Decimal validation there is an open ticket regarding how we handle this #2387

Perhaps we could back out the change to validating DecimalArray with a TODO tied to that ticket, and get the rest of this PR in?

Sounds great. I will update the code and add more docs.

Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 changed the title Validate array data when creating array in ipc reader Always validate the array data (except the Decimal) when creating array in IPC reader Aug 23, 2022
@tustvold tustvold merged commit 3430537 into apache:master Aug 23, 2022
@ursabot
Copy link

ursabot commented Aug 23, 2022

Benchmark runs are scheduled for baseline = 7670c5f and contender = 3430537. 3430537 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@HaoYang670 HaoYang670 deleted the validate_array_data_in_ipc_reader branch August 23, 2022 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Always validate the array data when creating array in IPC reader
5 participants