Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to arrow 26, change timezones #4039

Merged
merged 12 commits into from Nov 4, 2022
Merged

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Oct 31, 2022

Which issue does this PR close?

re apache/arrow-rs#2953

Rationale for this change

Get latest arrow
Fixes #3938

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added core Core datafusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql labels Oct 31, 2022
@@ -219,7 +219,7 @@ pub fn return_type(
}
BuiltinScalarFunction::Now => Ok(DataType::Timestamp(
TimeUnit::Nanosecond,
Some("UTC".to_owned()),
Some("+00:00".to_owned()),
Copy link
Contributor Author

@tustvold tustvold Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is necessary because by default (without the chrono-tz feature) arrow-rs doesn't know how to interpret this timezone. With the recent improvements to timezone handling, it now complains - (it prints "Unknown Timezone" when rending the timestamps). TBC this is an improvement over silently ignoring the timezone which it did before

FYI @waitingkuo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically applied this same pattern to a few other tests and now things looks great 👍

@tustvold
Copy link
Contributor Author

I'm getting failures in test_sort_fetch_memory_calculation which appears to have been added as part of #3593 by @isidentical. I'm having a hard time working out why it is failing, perhaps someone might be able help out?

@Dandandan
Copy link
Contributor

Dandandan commented Oct 31, 2022

I'm getting failures in test_sort_fetch_memory_calculation which appears to have been added as part of #3593 by @isidentical. I'm having a hard time working out why it is failing, perhaps someone might be able help out?

Somehow the implementation spilled memory, because it believes memory usage exceeded the limit. Maybe some change happened in arrow 26 not shrinking the buffer somewhere?

@isidentical
Copy link
Contributor

@tustvold @Dandandan it seems like the limit is never applied for multi-column sorts when lexsort_to_indices is called. Probably due to the changes in apache/arrow-rs#2929, where we previously created an array of (&value_indices)[0..len] but now just create it from value_indices directly without any limit (the single-column case is handled separately; and it still respects the limit).

@Dandandan
Copy link
Contributor

@tustvold @Dandandan it seems like the limit is never applied for multi-column sorts when lexsort_to_indices is called. Probably due to the changes in apache/arrow-rs#2929, where we previously created an array of (&value_indices)[0..len] but now just create it from value_indices directly without any limit (the single-column case is handled separately; and it still respects the limit).

Ah, great find. @tustvold might be worth to block the RC to fix this issue?

@tustvold
Copy link
Contributor Author

Aah nice find will workaround.

Given it can easily be worked around by slicing the returned array I don't think we need to block the RC

@alamb
Copy link
Contributor

alamb commented Oct 31, 2022

First thing I recommend we do is to file an arrow-rs ticket explaining the issue and then we can discuss a new RC based on that issue. Doing that now

@alamb
Copy link
Contributor

alamb commented Oct 31, 2022

Filed apache/arrow-rs#2990 -- will work on that shortly

@alamb
Copy link
Contributor

alamb commented Oct 31, 2022

Proposed fix apache/arrow-rs#2991

@alamb
Copy link
Contributor

alamb commented Oct 31, 2022

I created a new release candidate for arrow 26.0.0

@alamb
Copy link
Contributor

alamb commented Nov 3, 2022

Arrow 26 has been released -- apache/arrow-rs#2953

@andygrove
Copy link
Member

I plan on creating DataFusion 14.0.0 rc1 once this is merged.

@alamb
Copy link
Contributor

alamb commented Nov 4, 2022

I am working on updating this PR and polishing it for merge

let arr_micros = TimestampMicrosecondArray::from_opt_vec(ts_micros, None);
let arr_millis = TimestampMillisecondArray::from_opt_vec(ts_millis, None);
let arr_secs = TimestampSecondArray::from_opt_vec(ts_secs, None);
let arr_nanos = TimestampNanosecondArray::from(ts_nanos);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb marked this pull request as ready for review November 4, 2022 17:19
@alamb alamb changed the title Update to arrow 26 Update to arrow 26, change timezones Nov 4, 2022
@@ -237,25 +235,26 @@ fn timestamp_nano_ts_none_predicates() -> Result<()> {
// a scan should have the now()... predicate folded to a single
// constant and compared to the column without a cast so it can be
// pushed down / pruned
let expected = "Projection: test.col_int32\n Filter: test.col_ts_nano_utc < TimestampNanosecond(1666612093000000000, Some(\"UTC\"))\
\n TableScan: test projection=[col_int32, col_ts_nano_none]";
let expected =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this also fixes #3938 🎉

@andygrove
Copy link
Member

sorry @alamb, I merged a PR that created a small conflict here

@alamb
Copy link
Contributor

alamb commented Nov 4, 2022

sorry @alamb, I merged a PR that created a small conflict here

No worries -- I just fixed it

@andygrove andygrove merged commit dd081d6 into apache:master Nov 4, 2022
@ursabot
Copy link

ursabot commented Nov 4, 2022

Benchmark runs are scheduled for baseline = 4a67d0d and contender = dd081d6. dd081d6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb
Copy link
Contributor

alamb commented Nov 4, 2022

🚀 FYI @Ted-Jiang

Dandandan pushed a commit to yuuch/arrow-datafusion that referenced this pull request Nov 5, 2022
* Update to arrow 26

* Update datafusion-cli

* Update datafusion-cli lockfile

* More test fixes

* Re-add dyn_cmp_dict dev-dependencies

* Update datafusion-cli lock

* fix: Update tests

* Update datafusion optimizer tests

* Fix test

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Ted-Jiang pushed a commit to Ted-Jiang/arrow-datafusion that referenced this pull request Nov 5, 2022
* Update to arrow 26

* Update datafusion-cli

* Update datafusion-cli lockfile

* More test fixes

* Re-add dyn_cmp_dict dev-dependencies

* Update datafusion-cli lock

* fix: Update tests

* Update datafusion optimizer tests

* Fix test

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql
Projects
None yet
6 participants