Tweak list of optimization rules #3841

Merged (8 commits) on Oct 17, 2022

@Dandandan Dandandan commented Oct 15, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

I ran all the optimization rules twice and looked at the change in output in the TPC-H regression tests.
Next, I removed all the rules that didn't have any effect, and tried to remove the earlier passes.
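The experiment described above can be sketched in a few lines of Rust. This is a toy model, not DataFusion code: `Plan`, `Rule`, `dedup_filters`, `fold_constants`, `run_pass`, and `effective_in_second_pass` are all hypothetical names, and a plan is reduced to a flat list of filter strings. It only illustrates how running the rule list twice reveals which rules still change the plan in the second pass.

```rust
/// Toy stand-in for a logical plan: a flat list of filter predicates.
/// (Hypothetical; a real logical plan is a tree of operators.)
type Plan = Vec<String>;

/// A toy optimizer rule: a name plus a pure plan-to-plan function.
struct Rule {
    name: &'static str,
    apply: fn(&Plan) -> Plan,
}

/// Remove duplicate predicates, keeping the first occurrence of each.
fn dedup_filters(plan: &Plan) -> Plan {
    let mut out: Plan = Vec::new();
    for p in plan {
        if !out.contains(p) {
            out.push(p.clone());
        }
    }
    out
}

/// Very crude constant folding: rewrites the literal text "1 + 1" to "2".
fn fold_constants(plan: &Plan) -> Plan {
    plan.iter().map(|p| p.replace("1 + 1", "2")).collect()
}

/// Run every rule once, in order. Returns the resulting plan and the
/// names of the rules that actually changed it.
fn run_pass(rules: &[Rule], plan: &Plan) -> (Plan, Vec<&'static str>) {
    let mut current = plan.clone();
    let mut changed = Vec::new();
    for rule in rules {
        let next = (rule.apply)(&current);
        if next != current {
            changed.push(rule.name);
        }
        current = next;
    }
    (current, changed)
}

/// Run the rule list twice and report which rules still had an effect
/// in the second pass. Rules missing from the result are candidates
/// for removal from that pass.
fn effective_in_second_pass(rules: &[Rule], plan: &Plan) -> Vec<&'static str> {
    let (after_first, _) = run_pass(rules, plan);
    let (_, changed) = run_pass(rules, &after_first);
    changed
}
```

With the rules ordered as `[dedup_filters, fold_constants]` and the input `["x > 2", "x > 1 + 1"]`, folding in the first pass creates a duplicate that only the second run of `dedup_filters` can remove, so only that rule is worth keeping in the second pass.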

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Oct 15, 2022
@@ -163,6 +162,14 @@ impl Optimizer {
rules.push(Arc::new(FilterPushDown::new()));
rules.push(Arc::new(LimitPushDown::new()));
rules.push(Arc::new(SingleDistinctToGroupBy::new()));
rules.push(Arc::new(ProjectionPushDown::new()));
This one can simply be run after the other optimizations.

@github-actions github-actions bot added the core Core datafusion crate label Oct 15, 2022
@Dandandan Dandandan marked this pull request as ready for review October 15, 2022 15:59
@alamb alamb left a comment

The new plans look a lot nicer to me. The only thing I start to worry about is how long our planning will take -- Perhaps it would be good to add some logging to improve the results.
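One way to get the kind of visibility into planning time suggested here is to time each rule as it runs. The sketch below uses only the Rust standard library; `run_with_timing` is a hypothetical helper, rules are modelled as plain function pointers rather than DataFusion's optimizer rule trait, and `eprintln!` stands in for whatever logging facility the project would actually use.

```rust
use std::time::{Duration, Instant};

/// Apply each named rule in order, recording how long each one took.
/// `P` is whatever plan type the caller uses (hypothetical here).
fn run_with_timing<P>(
    rules: &[(&'static str, fn(P) -> P)],
    plan: P,
) -> (P, Vec<(&'static str, Duration)>) {
    let mut timings = Vec::with_capacity(rules.len());
    let mut current = plan;
    // Both tuple fields are `Copy`, so they can be destructured by value.
    for &(name, rule) in rules {
        let start = Instant::now();
        current = rule(current);
        let elapsed = start.elapsed();
        // Stand-in for a real logging call.
        eprintln!("optimizer rule {} took {:?}", name, elapsed);
        timings.push((name, elapsed));
    }
    (current, timings)
}
```

Returning the timings alongside the plan lets a caller sum them per query or flag rules whose cost grows as more passes are added.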

Maybe related to #1160

@@ -7,12 +7,12 @@ Sort: numwait DESC NULLS FIRST, supplier.s_name ASC NULLS LAST
Inner Join: l1.l_orderkey = orders.o_orderkey
Inner Join: supplier.s_suppkey = l1.l_suppkey
TableScan: supplier projection=[s_suppkey, s_name, s_nationkey]
Filter: l1.l_receiptdate > l1.l_commitdate AND l1.l_receiptdate > l1.l_commitdate
Filter: l1.l_receiptdate > l1.l_commitdate
👍

@Dandandan

The new plans look a lot nicer to me. The only thing I start to worry about is how long our planning will take -- Perhaps it would be good to add some logging to improve the results.

Maybe related to #1160

Yes, good idea. Also some optimization rules might be improved to e.g. not create filters that already exist etc. But in the end we would also need to keep the passes as simple as possible - a hard balancing act :)
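As an illustration of "not creating filters that already exist", here is a minimal sketch of removing repeated conjuncts from a predicate such as `a > b AND a > b`. The `Expr` enum and both functions are hypothetical and far simpler than DataFusion's real expression type; the point is only the split, deduplicate, and rebuild pattern.

```rust
/// Toy expression type (hypothetical; much smaller than a real one).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Flatten nested ANDs into a list of references to the conjuncts.
fn split_conjunction<'a>(expr: &'a Expr, out: &mut Vec<&'a Expr>) {
    match expr {
        Expr::And(left, right) => {
            split_conjunction(left, out);
            split_conjunction(right, out);
        }
        other => out.push(other),
    }
}

/// Rebuild the predicate with each distinct conjunct appearing once,
/// preserving the order of first occurrence.
fn dedup_conjunction(expr: &Expr) -> Expr {
    let mut parts = Vec::new();
    split_conjunction(expr, &mut parts);

    let mut unique: Vec<&Expr> = Vec::new();
    for part in parts {
        if !unique.contains(&part) {
            unique.push(part);
        }
    }

    // `split_conjunction` always yields at least one conjunct.
    let mut iter = unique.into_iter().cloned();
    let first = iter.next().expect("at least one conjunct");
    iter.fold(first, |acc, e| Expr::And(Box::new(acc), Box::new(e)))
}
```

The check is purely structural equality, so it would not catch equivalent but differently written predicates. That keeps the pass simple, in line with the balancing act mentioned above.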

@Dandandan Dandandan merged commit 431a412 into apache:master Oct 17, 2022
@ursabot
ursabot commented Oct 17, 2022

Benchmark runs are scheduled for baseline = 743ff28 and contender = 431a412. 431a412 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@andygrove

I've just started reviewing this PR and noticed one difference in one of the queries I tested with that might not be optimal.

before

Table scan filter: CAST(col1 AS Int64) BETWEEN Int64(4) AND Int64(4) + Int64(3)

after

Table scan filter: CAST(col1 AS Int64) >= Int64(4), CAST(col1 AS Int64) <= Int64(7)

It looks like we are now performing the CAST twice.
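To make the observation concrete, the sketch below models how expanding `x BETWEEN low AND high` into two comparisons copies the operand `x`, so a `CAST` inside it appears twice. Everything here is hypothetical (the `Expr` enum, `fold_constants`, `expand_between`, `count_casts`), and it makes no claim about which rule in the real optimizer produces the plan shown above.

```rust
/// Toy expression type (hypothetical). The cast target is fixed to
/// Int64 to keep the example short.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    CastInt64(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    GtEq(Box<Expr>, Box<Expr>),
    LtEq(Box<Expr>, Box<Expr>),
    Between(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Fold additions of integer literals; any other expression is
/// returned unchanged (no recursion into non-Add nodes).
fn fold_constants(e: &Expr) -> Expr {
    match e {
        Expr::Add(l, r) => match (fold_constants(l), fold_constants(r)) {
            (Expr::Int64(a), Expr::Int64(b)) => Expr::Int64(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other.clone(),
    }
}

/// Expand `x BETWEEN low AND high` into `x >= low` and `x <= high`.
/// Note that `x` is cloned into both comparisons.
fn expand_between(e: &Expr) -> Vec<Expr> {
    match e {
        Expr::Between(x, low, high) => vec![
            Expr::GtEq(x.clone(), Box::new(fold_constants(low))),
            Expr::LtEq(x.clone(), Box::new(fold_constants(high))),
        ],
        other => vec![other.clone()],
    }
}

/// Count how many cast nodes an expression contains.
fn count_casts(e: &Expr) -> usize {
    match e {
        Expr::Column(_) | Expr::Int64(_) => 0,
        Expr::CastInt64(inner) => 1 + count_casts(inner),
        Expr::Add(l, r) | Expr::GtEq(l, r) | Expr::LtEq(l, r) => {
            count_casts(l) + count_casts(r)
        }
        Expr::Between(x, lo, hi) => count_casts(x) + count_casts(lo) + count_casts(hi),
    }
}
```

Applied to `CAST(col1 AS Int64) BETWEEN 4 AND 4 + 3`, the expansion yields two filters with one cast each, which matches the shape of the "after" plan. Whether the cast is actually evaluated twice at run time depends on the execution engine, for example on whether common subexpressions are shared.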

@andygrove

Actually, I am testing against the latest DF that includes this PR but also has other changes ... so this change could possibly be unrelated to this PR. I will post an update here soon.

@andygrove

Filed #3863 for this issue. It is not related to this PR after all.

Labels
core Core datafusion crate optimizer Optimizer rules

4 participants