Move JSON Test Format To integration-testing #2724

tustvold · 2022-09-13T17:55:46Z

Which issue does this PR close?

Part of #2594
Part of #2300
Related to #2723

Rationale for this change

Colocates the code related to the integration tests, reduces coupling to serde_json, and potentially improves compilation times.

What changes are included in this PR?

Follow-on to #2598; this moves the JSON schema parsing logic to integration-testing. I had incorrectly assumed this was a canonical representation; however, it is a custom format specific to these tests - https://github.com/apache/arrow/blob/master/docs/source/format/Integration.rst#json-test-data-format. I therefore think it can safely be moved to live alongside the other logic.
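For readers unfamiliar with the format in question, here is a minimal, dependency-free sketch of what a schema looks like in the integration JSON layout. It is not the code moved by this PR: `SketchType`, `SketchField`, and `schema_to_json` are hypothetical names introduced only for illustration, and the sketch covers just two types and none of the optional keys (such as `metadata` or `dictionary`). The key names (`fields`, `name`, `nullable`, `type`, `children`, `bitWidth`, `isSigned`) are taken from the linked Integration.rst document.

```rust
// Sketch only: `SketchType` and `SketchField` are hypothetical stand-ins,
// not arrow-rs types. Key names follow the Arrow integration JSON format.

/// A tiny subset of Arrow logical types, enough for illustration.
enum SketchType {
    Int { bit_width: u8, signed: bool },
    Utf8,
}

/// A flat (non-nested) field; real fields may also carry children,
/// dictionaries, and metadata.
struct SketchField {
    name: &'static str,
    nullable: bool,
    data_type: SketchType,
}

impl SketchType {
    /// Render the `type` object for this type.
    fn to_json(&self) -> String {
        match self {
            SketchType::Int { bit_width, signed } => format!(
                r#"{{"name":"int","bitWidth":{},"isSigned":{}}}"#,
                bit_width, signed
            ),
            SketchType::Utf8 => r#"{"name":"utf8"}"#.to_string(),
        }
    }
}

impl SketchField {
    /// Render one field object. Assumes the field name needs no JSON escaping.
    fn to_json(&self) -> String {
        format!(
            r#"{{"name":"{}","nullable":{},"type":{},"children":[]}}"#,
            self.name,
            self.nullable,
            self.data_type.to_json()
        )
    }
}

/// Render a schema object containing only the `fields` array.
fn schema_to_json(fields: &[SketchField]) -> String {
    let rendered: Vec<String> = fields.iter().map(SketchField::to_json).collect();
    format!(r#"{{"fields":[{}]}}"#, rendered.join(","))
}

fn main() {
    let fields = [
        SketchField {
            name: "id",
            nullable: false,
            data_type: SketchType::Int { bit_width: 64, signed: true },
        },
        SketchField {
            name: "name",
            nullable: true,
            data_type: SketchType::Utf8,
        },
    ];
    println!("{}", schema_to_json(&fields));
}
```

The hand-rolled string formatting is deliberate: it keeps the sketch free of serde_json so the shape of the format is visible directly. Real code would use a JSON library and handle escaping.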

Are there any user-facing changes?

Yes, these APIs were public, but I'm fairly confident they weren't in use, especially as the logic to actually parse the files themselves, along with the array data, is not public.

alamb

The idea makes sense to me -- I only worry that someone perhaps was using the JSON serialization code for some other purpose.

If this turns out to be the case, we can always extract the schema parsing code into its own crate as well.

Thanks @tustvold

cc @nevi-me and @viirya

ursabot · 2022-09-14T13:52:31Z

Benchmark runs are scheduled for baseline = 4f52a25 and contender = 5146663. 5146663 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

viirya

Looks good. The JSON format is meant for integration testing. As @alamb said, it may be in use for other purposes since it is currently public, but that is outside the design of the format, and we can extract it later if it is really needed.

domoritz · 2022-09-20T14:04:03Z

I was using the JSON format for schemas in https://github.com/domoritz/csv2parquet. Could you provide a crate for reading/writing schemas from JSON? I don't need the same format.

tustvold · 2022-09-20T14:05:24Z

@domoritz The schema types implement serde::Serialize and Deserialize, and can therefore be used directly with serde_json. Would that be workable?

domoritz · 2022-09-20T14:07:04Z

That should probably work. Do you have a code snippet I could look at?

tustvold · 2022-09-20T14:07:56Z

https://github.com/apache/arrow-rs/blob/master/arrow/src/datatypes/schema.rs#L292

domoritz · 2022-09-20T14:17:01Z

Perfect. That works!

maxburke · 2022-10-07T22:07:06Z

@alamb was right: someone (my company) was using this code, and we are now broken when trying to move beyond arrow 22. Specifically, we depend on `Schema::from::<serde_json::Value>`.

alamb · 2022-10-11T11:07:48Z

Based on some discussions, I think the plan is that @tustvold was going to extract this code into its own crate for reuse.

I also believe @tustvold is out for a few days, so I would expect a delayed response. Please let me know if it would be helpful to file a ticket for this issue.

The actual 13.0.0 DF release uses Arrow 24.0.0, but we need to pick up 25.0.0, since it brings back the Arrow Schema/Field-to-JSON serialization code (albeit in a different crate for integration tests). apache/arrow-rs#2868 apache/arrow-rs#2724

Move JSON Test Format To integration-testing

4588632

tustvold added the api-change label Sep 13, 2022

tustvold requested a review from liukun4515 September 13, 2022 17:55

github-actions bot added the arrow label Sep 13, 2022

This was referenced Sep 13, 2022

Don't Derive Serialize/Deserialize Serde Implementations for Schema Types #2723

Closed

Split out arrow-schema (#2594) #2711

Merged

Fix RAT

ddb91c5

alamb approved these changes Sep 14, 2022


tustvold merged commit 5146663 into apache:master Sep 14, 2022

viirya reviewed Sep 14, 2022


domoritz mentioned this pull request Sep 20, 2022

Add back JSON import/export for schema #2762

Closed

tustvold mentioned this pull request Oct 13, 2022

Split out arrow-integration-test crate #2868

Merged

mildbyte mentioned this pull request Oct 27, 2022

Upgrade to DataFusion 13 (784f10bb) / Arrow 25.0.0 splitgraph/seafowl#176

Merged
